Tuesday, May 10, 2016

Rapid Ceph Kernel Module Testing with vstart.sh


Ceph's vstart.sh utility is very useful for deploying and testing a mock cluster directly from the Ceph source repository. It can:
  • Generate a cluster configuration file and authentication keys
  • Provision and deploy a number of OSDs
    • Backed by local disk, or memory using the --memstore parameter
  • Deploy an arbitrary number of monitor, MDS or rados-gateway nodes
All services are deployed as the running user, i.e. root access is not needed.

Once deployed, the mock cluster can be used with any of the existing Ceph client utilities, or exercised with the unit tests in the Ceph src/test directory.

When developing or testing Linux kernel changes for CephFS or RBD, it's useful to also be able to use these kernel clients against a vstart.sh deployed Ceph cluster.

Test Environment Overview - image based on content by Sage Weil

The instructions below walk through configuration and deployment of all components needed to test the Linux kernel RBD and CephFS modules against a mock Ceph cluster. The procedure was performed on openSUSE Leap 42.1, but should also be applicable to other Linux distributions.

Network Setup

First off, configure a bridge interface to connect the Ceph cluster with a kernel client VM network:

> sudo /sbin/brctl addbr br0
> sudo ip addr add dev br0
> sudo ip link set dev br0 up

br0 will not be bridged with any physical adapters. It only connects to the kernel VM, via a TAP interface which is configured with:

> sudo /sbin/tunctl -u $(whoami) -t tap0
> sudo /sbin/brctl addif br0 tap0
> sudo ip link set tap0 up
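
On systems where brctl and tunctl are not installed, the same bridge and TAP devices can usually be created with iproute2 alone. The sketch below is an assumption rather than part of the procedure that was tested: bridge_setup_cmds is a hypothetical helper that only prints the equivalent commands, so they can be reviewed before being run with sudo. The address assignment step is left out, as it is unchanged from the commands above.

```shell
#!/bin/sh
# bridge_setup_cmds BRIDGE TAP USER
# Print (do not run) iproute2 commands that create a bridge and attach a
# TAP device owned by USER to it.
bridge_setup_cmds() {
    bridge=$1
    tap=$2
    user=$3
    echo "ip link add name $bridge type bridge"
    echo "ip link set dev $bridge up"
    echo "ip tuntap add dev $tap mode tap user $user"
    echo "ip link set dev $tap master $bridge"
    echo "ip link set dev $tap up"
}

# Show the commands for the interface names used in this post.
bridge_setup_cmds br0 tap0 "$(whoami)"
```

Each printed line needs to be run as root, e.g. by prefixing it with sudo.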

For more information on the bridge setup, see:

Ceph Cluster Deployment

The Ceph cluster can now be deployed, with all nodes accepting traffic on the bridge network:

> cd $ceph_source_dir
<build Ceph>
> cd src
> OSD=3 MON=1 RGW=0 MDS=1 ./vstart.sh -i -n --memstore

$ceph_source_dir should be replaced with the actual path. Be sure to specify the same IP address with -i as was assigned to the br0 interface.

More information about vstart.sh usage can be found at:

Kernel VM Deployment

Build a kernel:

> cd $kernel_source_dir
> make menuconfig

$kernel_source_dir should be replaced with the actual path. Ensure CONFIG_BLK_DEV_RBD=m, CONFIG_CEPH_FS=y, CONFIG_CEPH_LIB=y, CONFIG_E1000=y and CONFIG_IP_PNP=y are set in the kernel config. A sample can be found here.

> make
> INSTALL_MOD_PATH=./mods make modules_install
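
Since a missing option only shows up once the VM fails to find the network or the rbd module, it can be worth checking the config before a long build. A minimal sketch follows; check_kconfig is a hypothetical helper name, and the option list is simply the one given above.

```shell
#!/bin/sh
# check_kconfig FILE
# Report every required option that is not set to the expected value in the
# given kernel config file. Returns non-zero if anything is missing.
check_kconfig() {
    config_file=$1
    missing=0
    for want in CONFIG_BLK_DEV_RBD=m CONFIG_CEPH_FS=y CONFIG_CEPH_LIB=y \
                CONFIG_E1000=y CONFIG_IP_PNP=y; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        if ! grep -qxF "$want" "$config_file"; then
            echo "not set as expected: $want"
            missing=1
        fi
    done
    return $missing
}

# Possible usage from the kernel source directory:
#   check_kconfig .config && make
```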

Create a symlink under /lib/modules pointing to the modules installed in ./mods, so that Dracut can find them:
> sudo ln -s $PWD/mods/lib/modules/$(make kernelrelease) \
                /lib/modules/$(make kernelrelease)

Generate an initramfs with Dracut. This image will be used as the test VM.
> export CEPH_SRC=$ceph_source_dir/src
> dracut --no-compress  --kver "$(cat include/config/kernel.release)" \
        --install "tail blockdev ps rmdir resize dd vim grep find df sha256sum \
                   strace mkfs.xfs /lib64/libkeyutils.so.1" \
        --include "$CEPH_SRC/mount.ceph" "/sbin/mount.ceph" \
        --include "$CEPH_SRC/ceph.conf" "/etc/ceph/ceph.conf" \
        --add-drivers "rbd" \
        --no-hostonly --no-hostonly-cmdline \
        --modules "bash base network ifcfg" \
        --force myinitrd

Boot the kernel and initramfs directly using QEMU/KVM:
> qemu-kvm -smp cpus=2 -m 512 \
        -kernel arch/x86/boot/bzImage -initrd myinitrd \
        -device e1000,netdev=network1,mac=b8:ac:6f:31:45:70 \
        -netdev tap,id=network1,script=no,downscript=no,ifname=tap0 \
        -append "ip= \
                rd.shell=1 console=ttyS0 rd.lvm=0 rd.luks=0" \

This should bring up a Dracut debug shell in the VM, with a network configuration matching the values passed in via the ip= kernel parameter.

dracut:/# ip a
2: eth0: ... mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:ac:6f:31:45:70 brd ff:ff:ff:ff:ff:ff
    inet brd scope global eth0

For more information on kernel setup, see:

RBD Image Provisioning

An RBD volume can be provisioned using the regular Ceph utilities in the Ceph source directory:

> cd $ceph_source_dir/src
> ./rados lspools

By default, an rbd pool is created by vstart.sh, which can be used for RBD images:
> ./rbd create --image-format 1 --size 1024 1g_vstart_img
> ./rbd ls -l
1g_vstart_img 1024M          1

Note: "--image-format 1" is specified to ensure that the kernel supports all features of the provisioned RBD image.

Kernel RBD Usage

From the Dracut shell, the newly provisioned 1g_vstart_img image can be mapped locally using the sysfs filesystem:
dracut:/# modprobe rbd
[    9.031056] rbd: loaded
dracut:/# echo -n " name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w== rbd 1g_vstart_img -" > /sys/bus/rbd/add
[  347.743272] libceph: mon0 session established
[  347.744284] libceph: client4121 fsid 234b432f-a895-43d2-23fd-9127a1837b32
[  347.749516] rbd: rbd0: added with size 0x40000000

Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.
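
To avoid copying the key by hand, the string written to /sys/bus/rbd/add can be assembled on the host. This is a minimal sketch, assuming the keyring uses the usual INI-style layout with a "key = ..." line under [client.admin]; admin_key and rbd_add_string are hypothetical helper names. The monitor address is taken as an argument and still has to be read from ceph.conf.

```shell
#!/bin/sh
# admin_key KEYRING
# Print the key stored in the [client.admin] section of a keyring file.
admin_key() {
    awk '
        /^\[/ { in_admin = ($0 == "[client.admin]") }
        in_admin && $1 == "key" && $2 == "=" { print $3; exit }
    ' "$1"
}

# rbd_add_string MON_ADDR USER KEY POOL IMAGE
# Print the string expected by /sys/bus/rbd/add. The trailing "-" means
# that no snapshot is being mapped.
rbd_add_string() {
    printf '%s name=%s,secret=%s %s %s -' "$1" "$2" "$3" "$4" "$5"
}

# Possible usage from the Ceph src directory, with $mon_addr set by hand:
#   rbd_add_string "$mon_addr" admin "$(admin_key ./keyring)" rbd 1g_vstart_img
```

The printed string can then be pasted into the echo -n "..." > /sys/bus/rbd/add command in the Dracut shell.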

The /dev/rbd0 mapped image can now be used like any other block device:
dracut:/# mkfs.xfs /dev/rbd0 
dracut:/# mkdir -p /mnt/rbdfs
dracut:/# mount /dev/rbd0 /mnt/rbdfs
[  415.841757] XFS (rbd0): Mounting V4 Filesystem
[  415.917595] XFS (rbd0): Ending clean mount
dracut:/# df -h /mnt/rbdfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /mnt/rbdfs
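
As a quick cross-check, the size the kernel reported when the image was mapped (0x40000000 bytes) can be converted to compare it against the 1024M shown by rbd ls. The variable names below are illustrative only. The slightly smaller 1014M reported by df is presumably down to space that XFS reserves for itself.

```shell
#!/bin/sh
# Size reported by the kernel when the image was mapped, in bytes.
size_bytes=$((0x40000000))
# Convert bytes to MiB.
size_mib=$((size_bytes / 1024 / 1024))
echo "$size_mib"   # prints: 1024
```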

Kernel CephFS Usage

vstart.sh already goes to the effort of deploying a filesystem:
> cd $ceph_source_dir/src
> ./ceph fs ls
name: cephfs_a, metadata pool: cephfs_metadata_a, data pools: [cephfs_data_a ]

All that's left is to mount it from the kernel VM using the mount.ceph binary that was copied into the initramfs:
dracut:/# mkdir -p /mnt/mycephfs
dracut:/# mount.ceph /mnt/mycephfs \
                -o name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w==
[  723.103153] libceph: mon0 session established
[  723.184978] libceph: client4122 fsid 234b432f-a895-43d2-23fd-9127a1837b32

dracut:/# df -h /mnt/mycephfs/
Filesystem            Size  Used Avail Use% Mounted on
                      3.0G  4.0M  3.0G   1% /mnt/mycephfs

Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.


Cleanup

Unmount CephFS:
dracut:/# umount /mnt/mycephfs

Unmount the RBD image:
dracut:/# umount /dev/rbd0
[ 1592.592510] XFS (rbd0): Unmounting Filesystem

Unmap the RBD image (0 is derived from /dev/rbdX):
dracut:/# echo -n 0 > /sys/bus/rbd/remove
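
When several images are mapped, the ID to write can be derived from the device path rather than typed by hand. A minimal sketch, with rbd_dev_id as a hypothetical helper name; it also strips a partition suffix such as p1 so that /dev/rbd0p1 yields 0.

```shell
#!/bin/sh
# rbd_dev_id DEVICE
# Print the numeric ID of an /dev/rbdX (or /dev/rbdXpN) device path.
rbd_dev_id() {
    case $1 in
        /dev/rbd[0-9]*)
            id=${1#/dev/rbd}   # drop the "/dev/rbd" prefix
            echo "${id%%p*}"   # drop any partition suffix
            ;;
        *)
            echo "not an rbd device: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage in the VM:
#   echo -n "$(rbd_dev_id /dev/rbd0)" > /sys/bus/rbd/remove
```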

Power off the VM:
dracut:/# echo 1 > /proc/sys/kernel/sysrq && echo o > /proc/sysrq-trigger
[ 1766.387417] sysrq: SysRq : Power Off
dracut:/# [ 1766.811686] ACPI: Preparing to enter system sleep state S5
[ 1766.812217] reboot: Power down

Shut down the Ceph cluster:
> cd $ceph_source_dir/src
> ./stop.sh


Conclusion

A mock Ceph cluster can be deployed from source in a matter of seconds using the vstart.sh utility.
Likewise, a kernel can be booted directly from source alongside a throwaway VM and connected to the mock Ceph cluster in a couple of minutes with Dracut and QEMU/KVM.

This environment is ideal for rapid development and integration testing of Ceph user-space and kernel components, including RBD and CephFS.
