Monday, January 29, 2018

Building Ceph master with C++17 support on openSUSE Leap 42.3

Ceph now requires C++17 support, which is available with modern compilers such as gcc-7. openSUSE Leap 42.3, my current OS of choice, includes gcc-7. However, it's not used by default.

Using gcc-7 for the Ceph build is a simple matter of:
> sudo zypper in gcc7-c++
> CC=gcc-7 CXX=/usr/bin/g++-7 ./ ...
> cd build && make -j$(nproc)
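Where gcc-7 may or may not be installed on a given build host, the compiler choice can be made conditional. The following is only a minimal sketch: first_available is a hypothetical helper name and is not part of the Ceph build scripts.

```shell
#!/bin/sh
# first_available CMD...: print the location of the first command in the
# argument list that can be found in PATH, or fail if none exist.
# Hypothetical helper, not part of Ceph.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}

# Example (commented out): prefer g++-7, fall back to the default g++.
# CXX=$(first_available g++-7 g++) || echo "no C++ compiler found" >&2
```

The resulting value could then be passed via CXX= in the same way as the hard-coded /usr/bin/g++-7 path above.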

Monday, July 3, 2017

Multipath Failover Simulation with QEMU

While working on a Ceph OSD multipath issue, I came across a helpful post from Dan Horák on how to simulate a multipath device under QEMU.

qemu-kvm ... -device virtio-scsi-pci,id=scsi \
  -drive if=none,id=hda,file=<path>,cache=none,format=raw,serial=MPIO \
  -device scsi-hd,drive=hda \
  -drive if=none,id=hdb,file=<path>,cache=none,format=raw,serial=MPIO \
  -device scsi-hd,drive=hdb
  • <path> should be replaced with a file or device path (the same for each)
  • serial= specifies the SCSI logical unit serial number
This attaches two virtual SCSI devices to the VM, both of which are backed by the same file and share the same SCSI logical unit identifier.
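The same pattern extends to more than two paths. As a rough sketch, the per-path arguments could be generated by a small helper; mpio_args and the mpN drive IDs are hypothetical names chosen for illustration, and the function only prints arguments without starting QEMU.

```shell
#!/bin/sh
# mpio_args IMAGE SERIAL NPATHS
# Print one -drive/-device argument pair per path. Every path is backed by
# the same image and carries the same serial, mirroring the example above.
# Hypothetical helper; the "mpN" drive IDs are arbitrary.
mpio_args() {
    image=$1 serial=$2 npaths=$3
    i=0
    while [ "$i" -lt "$npaths" ]; do
        printf '%s\n' \
            "-drive if=none,id=mp$i,file=$image,cache=none,format=raw,serial=$serial" \
            "-device scsi-hd,drive=mp$i"
        i=$((i + 1))
    done
}
```

The output is intended to be appended, unquoted, to a qemu-kvm command line after the virtio-scsi-pci device, so image paths containing whitespace would need extra care.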
Once booted, the SCSI devices for each corresponding path appear as sda and sdb, which are then detected as multipath enabled and subsequently mapped as dm-0:

         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Started Device-Mapper Multipath Device Controller.
[    1.329668] device-mapper: multipath service-time: version 0.3.0 loaded
rapido1:/# multipath -ll
size=2.0G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 0:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 0:0:1:0 sdb 8:16 active ready running

QEMU additionally allows for virtual device hot(un)plug at runtime, which can be done from the QEMU monitor CLI (accessed via ctrl-a c) using the drive_del command. This can be used to trigger a multipath failover event:

rapido1:/# mkfs.xfs /dev/dm-0
meta-data=/dev/dm-0              isize=256    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
rapido1:/# mount /dev/dm-0 /mnt/
[   96.846919] XFS (dm-0): Mounting V4 Filesystem
[   96.851383] XFS (dm-0): Ending clean mount

rapido1:/# QEMU 2.6.2 monitor - type 'help' for more information
(qemu) drive_del hda

rapido1:/# echo io-to-trigger-path-failure > /mnt/failover-trigger
[  190.926579] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  190.926588] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x2 [current] 
[  190.926589] sd 0:0:0:0: [sda] tag#0 ASC=0x3a ASCQ=0x0 
[  190.926590] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 00 02 00 00 01 00
[  190.926591] blk_update_request: I/O error, dev sda, sector 2
[  190.926597] device-mapper: multipath: Failing path 8:0.

rapido1:/# multipath -ll
size=2.0G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 0:0:0:0 sda 8:0  failed faulty running
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:1:0 sdb 8:16 active ready  running

The above procedure demonstrates cable-pull simulation while the broken path is used by the mounted dm-0 device. The subsequent I/O failure triggers multipath failover to the remaining good path.
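For scripted tests, the failed path can be picked out of the multipath -ll output rather than read by eye. This is a minimal sketch which assumes the output layout shown above (an H:C:T:L tuple followed by the device name); failed_paths is a hypothetical helper name.

```shell
#!/bin/sh
# failed_paths: read `multipath -ll` output on stdin and print the device
# name of every path reported as "failed faulty".
# Assumes the device name directly follows the H:C:T:L field.
failed_paths() {
    awk '/failed faulty/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) print $(i + 1)
    }'
}

# Example (commented out):
# multipath -ll | failed_paths
```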

I've added this functionality to Rapido (pull-request) so that multipath failover can be performed in a couple of minutes directly from kernel source. I encourage you to give it a try for yourself!

Friday, June 9, 2017

Rapido: Quick Kernel Testing From Source (Video)

I presented a short talk at the 2017 openSUSE Conference on Linux kernel testing using Rapido.

There were many other interesting talks during the conference, all of which can be viewed on the oSC 2017 media site.
A video of my presentation is available below, and on YouTube. Many thanks to the organisers and sponsors for putting on a great event.

Tuesday, December 27, 2016

Adding Reviewed-by and Acked-by Tags with Git

This week's "Git Rocks!" moment came while I was investigating how I could automatically add Reviewed-by, Acked-by, Tested-by, etc. tags to a given commit message.

Git's interpret-trailers command is capable of testing for and manipulating arbitrary Key: Value tags in commit messages.

For example, appending a Reviewed-by: NAME <EMAIL> tag to the top commit message is as simple as running:

> GIT_EDITOR='git interpret-trailers --trailer \
 "Reviewed-by: $(git config user.name) <$(git config user.email)>" \
 --in-place' git commit --amend

Or with the help of a "git rb" alias, via:
> git config alias.rb "interpret-trailers --trailer \
 \"Reviewed-by: $(git config user.name) <$(git config user.email)>\" \
 --in-place"
> GIT_EDITOR="git rb" git commit --amend

The above examples work by replacing the normal git commit editor with a call to git interpret-trailers, which appends the desired tag to the commit message and then exits.

My specific use case is to add Reviewed-by: tags to specific commits during interactive rebase, e.g.:
> git rebase --interactive HEAD~3

This brings up an editor with a list of the top three commits in the current branch. Assuming the aforementioned rb alias has been configured, an individual commit is given a Reviewed-by tag by adding the following line directly after its pick entry:

exec GIT_EDITOR="git rb" git commit --amend

As an example, the following will see three commits applied, with the commit message for two of them (d9e994e and 5f8c115) appended with my Reviewed-by tag.

pick d9e994e ctdb: Fix CID 1398179 Argument cannot be negative
exec GIT_EDITOR="git rb" git commit --amend
pick 0fb313c ctdb: Fix CID 1398178 Argument cannot be negative
#    ^^^^^^^ don't add a Reviewed-by tag for this one just yet 
pick 5f8c115 ctdb: Fix CID 1398175 Dereference after null check
exec GIT_EDITOR="git rb" git commit --amend
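When many commits need tagging, typing the exec lines by hand gets tedious. The sketch below inserts them automatically for a given list of abbreviated hashes. It assumes the rb alias described earlier, and tag_todo is a hypothetical helper name rather than anything provided by Git.

```shell
#!/bin/sh
# tag_todo HASH...: read a rebase todo list on stdin and print it with an
# exec line added after each "pick" entry whose abbreviated hash is listed.
# Assumes the "rb" alias from above; tag_todo itself is hypothetical.
tag_todo() {
    awk -v hashes="$*" '
        BEGIN {
            n = split(hashes, h, " ")
            for (i = 1; i <= n; i++) want[h[i]] = 1
        }
        { print }
        $1 == "pick" && ($2 in want) {
            print "exec GIT_EDITOR=\"git rb\" git commit --amend"
        }'
}
```

Git's GIT_SEQUENCE_EDITOR variable names a command that receives the todo file path as its argument, so one possible way to wire this up would be a small wrapper that rewrites that file in place using tag_todo.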

Bonus: By default, the vim editor includes git rebase --interactive syntax highlighting and key-bindings - if you press K while hovering over a commit hash (e.g. d9e994e from above), vim will call git show <commit-hash>, making reviewing and tagging even faster!

Thanks to:
  • Upstream Git developers, especially those who implemented the interpret-trailers functionality.
  • My employer, SUSE.

Tuesday, December 13, 2016

Rapido: A Glorified Wrapper for Dracut and QEMU


I've blogged a few times about how Dracut and QEMU can be combined to greatly improve Linux kernel dev/test turnaround.
  • My first post covered the basics of building the kernel, running dracut, and booting the resultant image with qemu-kvm.
  • A later post took a closer look at network configuration, and focused on bridging VMs with the hypervisor.
  • Finally, my third post looked at how this technique could be combined with Ceph, to provide a similarly efficient workflow for Ceph development.
In bringing this series to a conclusion, I'd like to introduce the newly released Rapido project. Rapido combines all of the procedures and techniques described in the articles above into a handful of scripts, which can be used to test specific Linux kernel functionality, standalone or alongside other technologies such as Ceph.



Usage - Standalone Linux VM

The following procedure was tested on openSUSE Leap 42.2 and SLES 12SP2, but should work fine on many other Linux distributions.


Step 1: Checkout and Build

Checkout the Linux kernel and Rapido source repositories:

~/> cd ~
~/> git clone
~/> git clone

Build the kernel (using a config provided with the Rapido source):
~/> cp rapido/kernel/vanilla_config linux/.config
~/> cd linux
~/linux/> make -j6
~/linux/> make modules
~/linux/> INSTALL_MOD_PATH=./mods make modules_install
~/linux/> sudo ln -s $PWD/mods/lib/modules/$(make kernelrelease) \
                        /lib/modules/$(make kernelrelease)

Step 2: Configuration 

Install Rapido dependencies: dracut, qemu, brctl (bridge-utils) and tunctl.

Edit rapido.conf, the master Rapido configuration file:
~/linux/> cd ~/rapido
~/rapido/> vi rapido.conf
  • set KERNEL_SRC="/home/<user>/linux"
  • set TAP_USER="<user>"
  • set MAC_ADDR1 to a valid MAC address, e.g. "b8:ac:24:45:c5:01"
  • set MAC_ADDR2 to a valid MAC address, e.g. "b8:ac:24:45:c5:02"
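A malformed MAC address is an easy mistake to make here. As a small sanity check before booting, the values could be validated with something like the following; valid_mac is a hypothetical helper, not a Rapido tool, and it only checks the textual format.

```shell
#!/bin/sh
# valid_mac ADDR: succeed if ADDR consists of six colon-separated hex
# octets, e.g. b8:ac:24:45:c5:01. Hypothetical helper, format check only.
valid_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Example (commented out):
# valid_mac "$MAC_ADDR1" || echo "MAC_ADDR1 looks malformed" >&2
```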

Configure the bridge and tap network devices. This must be done as root:
~/rapido/> sudo tools/
~/rapido/> ip addr show br0
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet scope global br0

Step 3: Image Generation 

Generate a minimal Linux VM image which includes binaries, libraries and kernel modules for filesystem testing:
~/rapido/> ./
 dracut: *** Creating initramfs image file 'initrds/myinitrd' done ***
~/rapido/> ls -lah initrds/myinitrd
-rw-r--r-- 1 ddiss users 30M Dec 13 18:17 initrds/myinitrd

Step 4: Boot!

 ~/rapido/> ./
+ mount -t btrfs /dev/zram1 /mnt/scratch
[    3.542927] BTRFS info (device zram1): disk space caching is enabled
btrfs filesystem mounted at /mnt/test and /mnt/scratch

In a whopping four seconds, or thereabouts, the VM should have booted to a rapido:/# bash prompt, leaving you with two zram-backed Btrfs filesystems mounted at /mnt/test and /mnt/scratch.

Everything, including the VM's root filesystem, is in memory, so any changes will not persist across reboot. Use the rapido.conf QEMU_EXTRA_ARGS parameter if you wish to add persistent storage to a VM.

Although the network isn't used in this case, you should be able to observe that the VM's network adapter can be reached from the hypervisor, and vice-versa.
rapido1:/# ip a show dev eth0
    inet brd scope global eth0
rapido1:/# ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=1.97 ms

Once you're done playing around, you can shut down:
rapido1:/# shutdown
[  267.304313] sysrq: SysRq : sysrq: Power Off
rapido1:/# [  268.168447] ACPI: Preparing to enter system sleep state S5
[  268.169493] reboot: Power down
+ exit 0



Usage - Ceph cluster and CephFS client VM

This usage guide builds on the previous standalone Linux VM procedure, but this time adds Ceph to the mix. If you're not interested in Ceph (how could you not be!) then feel free to skip to the next section.


Step I - Checkout and Build

We already have a clone of the Rapido and Linux kernel repositories. All that's needed for CephFS testing is a Ceph build:
~/> git clone
~/> cd ceph
<install Ceph build dependencies>
~/ceph/> cd build
~/ceph/build/> make -j4 


Step II - Start a Ceph "cluster"

Once Ceph has finished compiling, the following command configures and locally starts three OSDs, one monitor process, and one MDS:
~/ceph/build/> OSD=3 MON=1 RGW=0 MDS=1 ../src/ -i -n
~/ceph/build/> bin/ceph -c status
     health HEALTH_OK
     monmap e2: 1 mons at {a=}
            election epoch 4, quorum 0 a
      fsmap e5: 1/1/1 up {0=a=up:active}
        mgr no daemons active 
     osdmap e10: 3 osds: 3 up, 3 in


Step III - Rapido configuration

Edit rapido.conf, the master Rapido configuration file:
~/ceph/build/> cd ~/rapido
~/rapido/> vi rapido.conf
  • set CEPH_SRC="/home/<user>/ceph/src"
  • KERNEL_SRC and network parameters were configured earlier

Step IV - Image Generation

The script generates a VM image with the Ceph configuration and keyring from the cluster, as well as the CephFS kernel module.
~/rapido/> ./
 dracut: *** Creating initramfs image file 'initrds/myinitrd' done ***


Step V - Boot!

Booting the newly generated image should bring you to a shell prompt, with the provisioned CephFS filesystem mounted under /mnt/cephfs:
~/rapido/> ./
+ mount -t ceph /mnt/cephfs -o name=admin,secret=...
[    3.492742] libceph: mon0 session established
rapido1:/# df -h /mnt/cephfs
Filesystem             Size  Used Avail Use% Mounted on
                       1.3T  611G  699G  47% /mnt/cephfs

CephFS is a clustered filesystem, so testing from multiple clients is also of interest. From another window, boot a second VM:
~/rapido/> ./



Further Use Cases

Rapido ships with a bunch of scripts for testing different kernel components:
  • (shown above)
    • Image: includes Ceph config, credentials and CephFS kernel module
    • Boot: mounts CephFS filesystem
    • Image: includes CIFS (SMB client) kernel module
    • Boot: mounts share using details and credentials specified in rapido.conf
    • Image: includes dropbear SSH server
    • Boot: starts an SSH server with SSH_AUTHORIZED_KEY
    • Image: includes xfstests and CephFS kernel client
    • Boot: mounts CephFS filesystem and runs FSTESTS_AUTORUN_CMD
  • (shown above)
    • Image: includes xfstests and local Btrfs and XFS dependencies
    • Boot: provisions local xfstest zram devices. Runs FSTESTS_AUTORUN_CMD
    • Image: includes LIO, loopback dev and dm-delay kernel modules
    • Boot: provisions an iSCSI target, with three LUs exposed
    • Image: includes LIO and Ceph RBD kernel modules
    • Boot: provisions an iSCSI target backed by CEPH_RBD_IMAGE, using target_core_rbd
    • Image: CEPH_RBD_IMAGE is attached to the VM using qemu-block-rbd
    • Boot: runs shell only
    • Image: includes Ceph config, credentials and Ceph RBD kernel module
    • Boot: maps CEPH_RBD_IMAGE using the RBD kernel client
    • Image: includes Ceph config, librados, librbd, and pulls in tcmu-runner from TCMU_RUNNER_SRC
    • Boot: starts tcmu-runner and configures a tcmu+rbd backstore exposing CEPH_RBD_IMAGE via the LIO loopback fabric
  • (see
    • Image: usb_f_mass_storage, zram, dm-crypt, and RBD_USB_SRC
    • Boot: starts the script from RBD_USB_SRC




  • Dracut and QEMU can be combined for super-fast Linux kernel testing and development.
  • Rapido is mostly just a glorified wrapper around these utilities, but does provide some useful tools for automated testing of specific Linux kernel functionality.

If you run into any problems, or wish to provide any kind of feedback (always appreciated), please feel free to leave a message below, or raise a ticket in the Rapido issue tracker.

Update 20170106:
  • Add details and fix the example CEPH_SRC path.

Tuesday, June 28, 2016

Linux USB Gadget Application Testing

Developing a USB gadget application that runs on Linux?
Following a recent Ceph USB gateway project, I was looking at ways to test a Linux USB device without the need to fiddle with cables, or deal with slow embedded board boot times.

Ideally USB gadget testing could be performed by running the USB device code within a virtual machine, and attaching the VM's virtual USB device port to an emulated USB host controller on the hypervisor system.

I was unfortunately unable to find support for virtual USB device ports in QEMU, so I abandoned the above architecture, and discovered dummy_hcd.ko instead.

The dummy_hcd Linux kernel module is an excellent utility for USB device testing from within a standalone system or VM.

dummy_hcd.ko offers the following features:
  • Re-route USB device traffic back to the local system
    • Effectively providing device loopback functionality
  • USB high-speed and super-speed connection simulation
It can be enabled via the USB_DUMMY_HCD kernel config parameter. Once the module is loaded, no further configuration is required.

Tuesday, May 10, 2016

Rapid Ceph Kernel Module Testing with


Ceph's utility is very useful for deploying and testing a mock cluster directly from the Ceph source repository. It can:
  • Generate a cluster configuration file and authentication keys
  • Provision and deploy a number of OSDs
    • Backed by local disk, or memory using the --memstore parameter
  • Deploy an arbitrary number of monitor, MDS or rados-gateway nodes
All services are deployed as the running user, i.e. root access is not needed.

Once deployed, the mock cluster can be used with any of the existing Ceph client utilities, or exercised with the unit tests in the Ceph src/test directory.

When developing or testing Linux kernel changes for CephFS or RBD, it's useful to also be able to use these kernel clients against a deployed Ceph cluster.

Test Environment Overview - image based on content by Sage Weil

The instructions below walk through configuration and deployment of all components needed to test Linux kernel RBD and CephFS modules against a mock Ceph cluster. The procedure was performed on openSUSE Leap 42.1, but should also be applicable to other Linux distributions.

Network Setup

First off, configure a bridge interface to connect the Ceph cluster with a kernel client VM network:

> sudo /sbin/brctl addbr br0
> sudo ip addr add dev br0
> sudo ip link set dev br0 up

br0 will not be bridged with any physical adapters, just the kernel VM via a TAP interface which is configured with:

> sudo /sbin/tunctl -u $(whoami) -t tap0
> sudo /sbin/brctl addif br0 tap0
> sudo ip link set tap0 up
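The bridge and TAP steps above can be collected into one place for review before anything is run as root. The following is a sketch only: bridge_cmds is a hypothetical helper, it prints the commands instead of executing them, and the address passed to it is a placeholder to be replaced with the address chosen for br0.

```shell
#!/bin/sh
# bridge_cmds BRIDGE TAP USER CIDR: print (not run) the bridge and TAP
# setup commands from above. Hypothetical helper; CIDR is whatever
# address/prefix is to be assigned to the bridge.
bridge_cmds() {
    bridge=$1 tap=$2 user=$3 cidr=$4
    cat <<EOF
/sbin/brctl addbr $bridge
ip addr add $cidr dev $bridge
ip link set dev $bridge up
/sbin/tunctl -u $user -t $tap
/sbin/brctl addif $bridge $tap
ip link set $tap up
EOF
}
```

Once the printed commands look right, they could be piped to a root shell, for example via sudo sh -e.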

For more information on the bridge setup, see:

Ceph Cluster Deployment

The Ceph cluster can now be deployed, with all nodes accepting traffic on the bridge network:

> cd $ceph_source_dir
<build Ceph>
> cd src
> OSD=3 MON=1 RGW=0 MDS=1 ./ -i -n --memstore

$ceph_source_dir should be replaced with the actual path. Be sure to specify the same IP address with -i as was assigned to the br0 interface.

More information about usage can be found at:

Kernel VM Deployment

Build a kernel:

> cd $kernel_source_dir
> make menuconfig 
$kernel_source_dir should be replaced with the actual path. Ensure CONFIG_BLK_DEV_RBD=m, CONFIG_CEPH_FS=y, CONFIG_CEPH_LIB=y, CONFIG_E1000=y and CONFIG_IP_PNP=y are set in the kernel config. A sample can be found here.
> make
> INSTALL_MOD_PATH=./mods make modules_install
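The required options are easy to miss in menuconfig, so it can help to check the generated .config before kicking off the build. A minimal sketch follows, with missing_kconfig being a hypothetical helper; note that it matches each OPTION=VALUE line exactly, so an option built in (=y) where =m is requested would be reported as missing.

```shell
#!/bin/sh
# missing_kconfig FILE OPT=VAL...: print every OPT=VAL line that is not
# present in FILE, and return non-zero if any were missing.
# Hypothetical helper; matches whole lines literally.
missing_kconfig() {
    config=$1; shift
    rc=0
    for want in "$@"; do
        if ! grep -qxF -- "$want" "$config"; then
            echo "$want"
            rc=1
        fi
    done
    return $rc
}

# Example (commented out), using the options listed above:
# missing_kconfig .config CONFIG_BLK_DEV_RBD=m CONFIG_CEPH_FS=y \
#     CONFIG_CEPH_LIB=y CONFIG_E1000=y CONFIG_IP_PNP=y
```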

Create a link to the modules directory ./mods, so that Dracut can find them:
> sudo ln -s $PWD/mods/lib/modules/$(make kernelrelease) \
                /lib/modules/$(make kernelrelease)

Generate an initramfs with Dracut. This image will be used as the test VM.
> export CEPH_SRC=$ceph_source_dir/src
> dracut --no-compress  --kver "$(cat include/config/kernel.release)" \
        --install "tail blockdev ps rmdir resize dd vim grep find df sha256sum \
                   strace mkfs.xfs /lib64/" \
        --include "$CEPH_SRC/mount.ceph" "/sbin/mount.ceph" \
        --include "$CEPH_SRC/ceph.conf" "/etc/ceph/ceph.conf" \
        --add-drivers "rbd" \
        --no-hostonly --no-hostonly-cmdline \
        --modules "bash base network ifcfg" \
        --force myinitrd

Boot the kernel and initramfs directly using QEMU/KVM:
> qemu-kvm -smp cpus=2 -m 512 \
        -kernel arch/x86/boot/bzImage -initrd myinitrd \
        -device e1000,netdev=network1,mac=b8:ac:6f:31:45:70 \
        -netdev tap,id=network1,script=no,downscript=no,ifname=tap0 \
        -append "ip= \
       console=ttyS0 rd.lvm=0 rd.luks=0"

This should bring up a Dracut debug shell in the VM, with a network configuration matching the values passed in via the ip= kernel parameter.

dracut:/# ip a
2: eth0: ... mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:ac:6f:31:45:70 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0

For more information on kernel setup, see:

RBD Image Provisioning

An RBD volume can be provisioned using the regular Ceph utilities in the Ceph source directory:

> cd $ceph_source_dir/src
> ./rados lspools

An rbd pool is created by default, which can be used for RBD images:
> ./rbd create --image-format 1 --size 1024 1g_vstart_img
> ./rbd ls -l
1g_vstart_img 1024M          1

Note: "--image-format 1" is specified to ensure that the kernel supports all features of the provisioned RBD image.

Kernel RBD Usage

From the Dracut shell, the newly provisioned 1g_vstart_img image can be mapped locally using the sysfs filesystem:
dracut:/# modprobe rbd
[    9.031056] rbd: loaded
dracut:/# echo -n " name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w== rbd 1g_vstart_img -" > /sys/bus/rbd/add
[  347.743272] libceph: mon0 session established
[  347.744284] libceph: client4121 fsid 234b432f-a895-43d2-23fd-9127a1837b32
[  347.749516] rbd: rbd0: added with size 0x40000000

Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.
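The string written to /sys/bus/rbd/add follows a fixed field order: monitor address, options, pool, image and snapshot. As a sketch, it could be assembled by a small helper instead of being typed out by hand. rbd_add_line is a hypothetical name, and the monitor address and key in the example are placeholders.

```shell
#!/bin/sh
# rbd_add_line MON NAME SECRET POOL IMAGE [SNAP]
# Print the string expected by /sys/bus/rbd/add, in the same field order as
# the example above. SNAP defaults to "-", meaning no snapshot.
# Hypothetical helper; no trailing newline is printed (like echo -n).
rbd_add_line() {
    printf '%s name=%s,secret=%s %s %s %s' "$1" "$2" "$3" "$4" "$5" "${6:--}"
}

# Example (commented out); <mon-address> and <key> are placeholders:
# rbd_add_line <mon-address> admin <key> rbd 1g_vstart_img > /sys/bus/rbd/add
```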

The /dev/rbd0 mapped image can now be used like any other block device:
dracut:/# mkfs.xfs /dev/rbd0 
dracut:/# mkdir -p /mnt/rbdfs
dracut:/# mount /dev/rbd0 /mnt/rbdfs
[  415.841757] XFS (rbd0): Mounting V4 Filesystem
[  415.917595] XFS (rbd0): Ending clean mount
dracut:/# df -h /mnt/rbdfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /mnt/rbdfs

Kernel CephFS Usage

The mock cluster deployment already goes to the effort of deploying a filesystem:
> cd $ceph_source_dir/src
> ./ceph fs ls
name: cephfs_a, metadata pool: cephfs_metadata_a, data pools: [cephfs_data_a ]

All that's left is to mount it from the kernel VM using the mount.ceph binary that was copied into the initramfs:
dracut:/# mkdir -p /mnt/mycephfs
dracut:/# mount.ceph /mnt/mycephfs \
                -o name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w==
[  723.103153] libceph: mon0 session established
[  723.184978] libceph: client4122 fsid 234b432f-a895-43d2-23fd-9127a1837b32

dracut:/# df -h /mnt/mycephfs/
Filesystem            Size  Used Avail Use% Mounted on
                      3.0G  4.0M  3.0G   1% /mnt/mycephfs

Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.


Unmount CephFS:
dracut:/# umount /mnt/mycephfs

Unmount the RBD image:
dracut:/# umount /dev/rbd0
[ 1592.592510] XFS (rbd0): Unmounting Filesystem

Unmap the RBD image (0 is derived from /dev/rbdX):
dracut:/# echo -n 0 > /sys/bus/rbd/remove
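In a script, the ID can be derived from the device path rather than hard-coded. A minimal sketch, assuming whole-device paths of the form /dev/rbdX; rbd_dev_id is a hypothetical helper name.

```shell
#!/bin/sh
# rbd_dev_id DEV: print the numeric ID X for a /dev/rbdX device path, or
# fail if DEV does not have that form (partitions are not handled).
# Hypothetical helper.
rbd_dev_id() {
    id=$(printf '%s\n' "$1" | sed -n 's|^/dev/rbd\([0-9][0-9]*\)$|\1|p')
    [ -n "$id" ] && printf '%s\n' "$id"
}

# Example (commented out):
# rbd_dev_id /dev/rbd0 > /sys/bus/rbd/remove
```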

Power-off the VM:
dracut:/# echo 1 > /proc/sys/kernel/sysrq && echo o > /proc/sysrq-trigger
[ 1766.387417] sysrq: SysRq : Power Off
dracut:/# [ 1766.811686] ACPI: Preparing to enter system sleep state S5
[ 1766.812217] reboot: Power down

Shut down the Ceph cluster:
> cd $ceph_source_dir/src
> ./


A mock Ceph cluster can be deployed from source in a matter of seconds using the utility.
Likewise, a kernel can be booted directly from source alongside a throwaway VM and connected to the mock Ceph cluster in a couple of minutes with Dracut and QEMU/KVM.

This environment is ideal for rapid development and integration testing of Ceph user-space and kernel components, including RBD and CephFS.