Tuesday, May 10, 2016

Rapid Ceph Kernel Module Testing with vstart.sh

Introduction

Ceph's vstart.sh utility is very useful for deploying and testing a mock cluster directly from the Ceph source repository. It can:
  • Generate a cluster configuration file and authentication keys
  • Provision and deploy a number of OSDs
    • Backed by local disk, or memory using the --memstore parameter
  • Deploy an arbitrary number of monitor, MDS or rados-gateway nodes
All services are deployed as the running user, i.e. root access is not needed.

Once deployed, the mock cluster can be used with any of the existing Ceph client utilities, or exercised with the unit tests in the Ceph src/test directory.

When developing or testing Linux kernel changes for CephFS or RBD, it's useful to also be able to use these kernel clients against a vstart.sh deployed Ceph cluster.

Test Environment Overview - image based on content by Sage Weil

The instructions below walk through configuration and deployment of all components needed to test Linux kernel RBD and CephFS modules against a mock Ceph cluster. The procedure was performed on openSUSE Leap 42.1, but should also be applicable for other Linux distributions.

Network Setup

First off, configure a bridge interface to connect the Ceph cluster with a kernel client VM network:

> sudo /sbin/brctl addbr br0
> sudo ip addr add 192.168.155.1/24 dev br0
> sudo ip link set dev br0 up

br0 is not bridged with any physical adapter. It only connects to the kernel VM, via a TAP interface which is configured with:

> sudo /sbin/tunctl -u $(whoami) -t tap0
> sudo /sbin/brctl addif br0 tap0
> sudo ip link set tap0 up

For more information on the bridge setup, see:
http://blog.elastocloud.org/2015/07/qemukvm-bridged-network-with-tap.html
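
On systems where brctl and tunctl are not installed, the same bridge and TAP plumbing can usually be done with iproute2 alone. The following is a minimal sketch rather than part of the procedure above: the net_setup function name and the RUN variable are made up for illustration. With RUN left at its default of echo, the commands are only printed.

```shell
# Sketch: bridge + TAP setup using only iproute2 (an alternative to the
# brctl/tunctl commands used above).
# RUN is a hypothetical dry-run switch: by default the commands are only
# printed; set RUN=sudo to actually execute them.
RUN=${RUN:-echo}

net_setup() (
    br=$1; tap=$2; cidr=$3; user=$4
    $RUN ip link add name "$br" type bridge
    $RUN ip addr add "$cidr" dev "$br"
    $RUN ip link set dev "$br" up
    # Create a TAP device owned by the given user and attach it to the bridge
    $RUN ip tuntap add dev "$tap" mode tap user "$user"
    $RUN ip link set dev "$tap" master "$br"
    $RUN ip link set dev "$tap" up
)

net_setup br0 tap0 192.168.155.1/24 "$(id -un)"
```

Printing first makes it easy to review the commands before running them with root privileges.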

Ceph Cluster Deployment

The Ceph cluster can now be deployed, with all nodes accepting traffic on the bridge network:

> cd $ceph_source_dir
<build Ceph>
> cd src
> OSD=3 MON=1 RGW=0 MDS=1 ./vstart.sh -i 192.168.155.1 -n --memstore

$ceph_source_dir should be replaced with the actual path. Be sure to specify the same IP address with -i as was assigned to the br0 interface.

More information about vstart.sh usage can be found at:
 http://docs.ceph.com/docs/hammer/dev/dev_cluster_deployement/

Kernel VM Deployment

Build a kernel:

 
> cd $kernel_source_dir
> make menuconfig 
$kernel_source_dir should be replaced with the actual path. Ensure CONFIG_BLK_DEV_RBD=m, CONFIG_CEPH_FS=y, CONFIG_CEPH_LIB=y, CONFIG_E1000=y and CONFIG_IP_PNP=y are set in the kernel config. A sample can be found here.
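
Before starting a long build, it can be worth confirming that the options listed above actually ended up in .config. A minimal sketch, assuming the options appear as exact NAME=value lines; check_kconfig is a hypothetical helper name, not a kernel tool:

```shell
# Sketch: report which expected option lines are missing from a kernel
# .config file. check_kconfig is a hypothetical helper name.
check_kconfig() {
    config=$1; shift
    missing=0
    for opt in "$@"; do
        # Each argument is an exact line expected in .config,
        # e.g. CONFIG_CEPH_FS=y
        if ! grep -qxF "$opt" "$config"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from the kernel source directory:
#   check_kconfig .config CONFIG_BLK_DEV_RBD=m CONFIG_CEPH_FS=y \
#       CONFIG_CEPH_LIB=y CONFIG_E1000=y CONFIG_IP_PNP=y
```

The function returns non-zero if any option is missing, so it can gate the subsequent make.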
 
> make
> INSTALL_MOD_PATH=./mods make modules_install
 

Create a link to the modules directory ./mods, so that Dracut can find them:
 
> sudo ln -s $PWD/mods/lib/modules/$(make kernelrelease) \
                /lib/modules/$(make kernelrelease)

Generate an initramfs with Dracut. This image will be used as the test VM.
 
> export CEPH_SRC=$ceph_source_dir/src
> dracut --no-compress  --kver "$(cat include/config/kernel.release)" \
        --install "tail blockdev ps rmdir resize dd vim grep find df sha256sum \
                   strace mkfs.xfs /lib64/libkeyutils.so.1" \
        --include "$CEPH_SRC/mount.ceph" "/sbin/mount.ceph" \
        --include "$CEPH_SRC/ceph.conf" "/etc/ceph/ceph.conf" \
        --add-drivers "rbd" \
        --no-hostonly --no-hostonly-cmdline \
        --modules "bash base network ifcfg" \
        --force myinitrd

Boot the kernel and initramfs directly using QEMU/KVM:
 
> qemu-kvm -smp cpus=2 -m 512 \
        -kernel arch/x86/boot/bzImage -initrd myinitrd \
        -device e1000,netdev=network1,mac=b8:ac:6f:31:45:70 \
        -netdev tap,id=network1,script=no,downscript=no,ifname=tap0 \
        -append "ip=192.168.155.2:::255.255.255.0:myhostname \
                rd.shell=1 console=ttyS0 rd.lvm=0 rd.luks=0" \
        -nographic

This should bring up a Dracut debug shell in the VM, with a network configuration matching the values passed in via the ip= kernel parameter.

dracut:/# ip a
...
2: eth0: ... mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether b8:ac:6f:31:45:70 brd ff:ff:ff:ff:ff:ff
    inet 192.168.155.2/24 brd 192.168.155.255 scope global eth0
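
The static ip= value used above follows the client:server:gateway:netmask:hostname field order described in the kernel's nfsroot documentation, with the server and gateway fields left empty. A small sketch for building it; ip_param is a hypothetical helper name:

```shell
# Sketch: build a static ip= kernel parameter with empty server and
# gateway fields, matching the qemu-kvm command line above.
# ip_param is a hypothetical helper name.
ip_param() {
    # $1: client IP, $2: netmask, $3: hostname
    printf 'ip=%s:::%s:%s\n' "$1" "$2" "$3"
}

ip_param 192.168.155.2 255.255.255.0 myhostname
```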

For more information on kernel setup, see:
http://blog.elastocloud.org/2015/06/rapid-linux-kernel-devtest-with-qemu.html

RBD Image Provisioning

An RBD volume can be provisioned using the regular Ceph utilities in the Ceph source directory:

> cd $ceph_source_dir/src
> ./rados lspools
rbd
...

By default, an rbd pool is created by vstart.sh, which can be used for RBD images:
 
> ./rbd create --image-format 1 --size 1024 1g_vstart_img
> ./rbd ls -l
NAME           SIZE PARENT FMT PROT LOCK
1g_vstart_img 1024M          1

Note: "--image-format 1" is specified to ensure that the kernel supports all features of the provisioned RBD image.

Kernel RBD Usage

From the Dracut shell, the newly provisioned 1g_vstart_img image can be mapped locally using the sysfs filesystem:
dracut:/# modprobe rbd
[    9.031056] rbd: loaded
dracut:/# echo -n "192.168.155.1:6789 name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w== rbd 1g_vstart_img -" > /sys/bus/rbd/add
[  347.743272] libceph: mon0 192.168.155.1:6789 session established
[  347.744284] libceph: client4121 fsid 234b432f-a895-43d2-23fd-9127a1837b32
[  347.749516] rbd: rbd0: added with size 0x40000000

Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.
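
The string written to /sys/bus/rbd/add can be assembled from the keyring rather than copied by hand. This is a rough sketch, assuming the usual keyring layout of a [client.admin] section containing a "key = ..." line. The helper names and the src/keyring path in the usage comment are assumptions, so check where your vstart.sh run wrote the keyring.

```shell
# Sketch: extract a secret from a Ceph keyring and build the string
# expected by /sys/bus/rbd/add. Both helper names are hypothetical.
keyring_secret() {
    # $1: keyring path, $2: entity name, e.g. client.admin
    awk -v section="[$2]" '
        $0 == section        { found = 1; next }  # entered the wanted section
        /^\[/                { found = 0 }        # any other section header
        found && $1 == "key" { print $3; exit }   # line is "key = <secret>"
    ' "$1"
}

rbd_add_spec() {
    # $1: monitor address, $2: user, $3: secret, $4: pool, $5: image
    printf '%s name=%s,secret=%s %s %s -' "$1" "$2" "$3" "$4" "$5"
}

# Usage (the keyring path is an assumption):
#   secret=$(keyring_secret "$ceph_source_dir/src/keyring" client.admin)
#   rbd_add_spec 192.168.155.1:6789 admin "$secret" rbd 1g_vstart_img
# The resulting string is what gets written to /sys/bus/rbd/add in the VM.
```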

The /dev/rbd0 mapped image can now be used like any other block device:
dracut:/# mkfs.xfs /dev/rbd0 
...
dracut:/# mkdir -p /mnt/rbdfs
dracut:/# mount /dev/rbd0 /mnt/rbdfs
[  415.841757] XFS (rbd0): Mounting V4 Filesystem
[  415.917595] XFS (rbd0): Ending clean mount
dracut:/# df -h /mnt/rbdfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /mnt/rbdfs


Kernel CephFS Usage

vstart.sh already goes to the effort of deploying a filesystem:
> cd $ceph_source_dir/src
> ./ceph fs ls
name: cephfs_a, metadata pool: cephfs_metadata_a, data pools: [cephfs_data_a ]

All that's left is to mount it from the kernel VM using the mount.ceph binary that was copied into the initramfs:
dracut:/# mkdir -p /mnt/mycephfs
dracut:/# mount.ceph 192.168.155.1:6789:/ /mnt/mycephfs \
                -o name=admin,secret=AQBPiuhd9389dh28djASE32Ceiojc234AF345w==
[  723.103153] libceph: mon0 192.168.155.1:6789 session established
[  723.184978] libceph: client4122 fsid 234b432f-a895-43d2-23fd-9127a1837b32

dracut:/# df -h /mnt/mycephfs/
Filesystem            Size  Used Avail Use% Mounted on
192.168.155.1:6789:/  3.0G  4.0M  3.0G   1% /mnt/mycephfs


Note: The monitor address and admin credentials can be retrieved from the ceph.conf and keyring files respectively, located in the Ceph source directory.

Cleanup

Unmount CephFS:
dracut:/# umount /mnt/mycephfs

Unmount the RBD image:
dracut:/# umount /dev/rbd0
[ 1592.592510] XFS (rbd0): Unmounting Filesystem

Unmap the RBD image (0 is derived from /dev/rbdX):
dracut:/# echo -n 0 > /sys/bus/rbd/remove

Power off the VM:
dracut:/# echo 1 > /proc/sys/kernel/sysrq && echo o > /proc/sysrq-trigger
[ 1766.387417] sysrq: SysRq : Power Off
dracut:/# [ 1766.811686] ACPI: Preparing to enter system sleep state S5
[ 1766.812217] reboot: Power down

Shut down the Ceph cluster:
> cd $ceph_source_dir/src
> ./stop.sh

Conclusion

A mock Ceph cluster can be deployed from source in a matter of seconds using the vstart.sh utility.
Likewise, a kernel can be booted directly from source alongside a throwaway VM and connected to the mock Ceph cluster in a couple of minutes with Dracut and QEMU/KVM.

This environment is ideal for rapid development and integration testing of Ceph user-space and kernel components, including RBD and CephFS.

Monday, March 28, 2016

Efficient Microsoft Azure Uploads and Downloads

With the release of version 0.7.1, Elasto is now capable of efficient (sparse aware) uploads and downloads to/from Microsoft Azure, using the Blob and File services.

Example of a Microsoft Azure Page Blob Download


This is done by determining which regions of a Page Blob, File Service file, or local file are allocated and only transferring those regions, which improves both network and storage utilisation.
  • For Azure Page Blobs, the Get Page Ranges API request is used to obtain a list of allocated regions.
  • For Azure File Service files, the List Ranges API request is used.
  • For local files, SEEK_DATA and SEEK_HOLE are used to determine which regions of a file are allocated.
  • Amazon S3 Objects and Azure Block Blobs are still downloaded and uploaded in entirety.
    • Sparse regions are unsupported by these services.
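
To get a feel for what sparse awareness saves, compare a local file's apparent size with the space it actually occupies. This sketch only illustrates sparse files in general and is not how Elasto detects them. It assumes GNU coreutils (truncate, stat -c) and a filesystem with sparse file support.

```shell
# Sketch: create a sparse file and compare its apparent size with the
# space actually allocated on disk. Assumes GNU coreutils and a
# filesystem that supports sparse files.
f=$(mktemp)
truncate -s 1M "$f"     # 1 MiB file consisting entirely of a hole
# Write a single 4 KiB block of data at offset 512 KiB
dd if=/dev/urandom of="$f" bs=4k count=1 seek=128 conv=notrunc 2>/dev/null

apparent=$(stat -c %s "$f")                                  # length in bytes
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))     # bytes on disk
echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```

A sparse-aware transfer of such a file only needs to move the allocated data rather than the full apparent size.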
Elasto is free software, and can be obtained for openSUSE and many other Linux distributions from the openSUSE Build Service. Be safe, take backups before experimenting with this new feature.

Tuesday, December 15, 2015

Ceph USB Storage Gateway


Last week was Hackweek, a week full of fun and innovation at SUSE. I decided to use the time to work on a USB storage gateway for Ceph.



The concept is simple - create a USB device that, when configured and connected, exposes remote Ceph RADOS Block Device (RBD) images for access as USB mass storage, allowing for:
  • Ceph storage usage by almost any system with a USB port
    • Including dumb systems such as TVs, MP3 players and mobile phones
  • Boot from RBD images
    • Many systems are capable of booting from a USB mass storage device
  • Minimal configuration
    • Network, Ceph credentials and image details should be all that's needed for configuration


Hardware

I already own a Cubietruck, which has the following desirable characteristics for this project:
  • Works with a mainline Linux Kernel
  • Is reasonably small and portable
  • Supports power and data transfer via a single mini-USB port
  • Performs relatively well
    • Dual-core 1GHz processor and 2GB RAM
    • Gigabit network adapter and WiFi 802.11 b/g/n

Possible alternatives worth evaluation include C.H.I.P (smaller and cheaper), NanoPi2, and UP (faster). I should take this opportunity to mention that I do gladly accept hardware donations!


Base System

I decided on using openSUSE Tumbleweed as the base operating system for this project. An openSUSE Tumbleweed ARM port for the Cubietruck is available for download at:
http://download.opensuse.org/ports/armv7hl/factory/images/openSUSE-Tumbleweed-ARM-JeOS-cubietruck.armv7l-Current.raw.xz

Installation is as straightforward as copying the image to an SD card and booting - I documented the installation procedure on the openSUSE Wiki.
Releases prior to Build350 exhibit boot issues due to the U-Boot device-tree path. However, this has been fixed in recent builds.


Kernel

The Linux kernel currently shipped with the openSUSE image does not include support for acting as a USB mass storage gadget, nor does it include Ceph RBD support. In order to obtain these features, and also reduce the size of the base image, I built a mainline Linux kernel (4.4-rc4) using a minimal custom kernel configuration:
~/> sudo zypper install --no-recommends git-core gcc make ncurses-devel bc
~/> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
~/> cd linux
~/linux/> wget https://raw.githubusercontent.com/ddiss/ceph_usb_gateway/master/.config
          # or `make sunxi_defconfig menuconfig`
          #     ->enable Ceph, sunxi and USB gadget modules
~/linux/> make oldconfig
~/linux/> make -j2 zImage dtbs modules
~/linux/> sudo make install modules_install
~/linux/> sudo cp arch/arm/boot/zImage /boot/zImage-$(make kernelrelease)
~/linux/> sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/dtb-4.3.0-2/
~/linux/> sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/dtb/

This build procedure takes a long time to complete. Cross compilation could be used to improve build times.
I plan on publishing a USB gadget enabled ARM kernel on the Open Build Service in the future, which would allow for simple installation via zypper - watch this space!


Ceph RADOS Block Device (RBD) mapping

To again save space, I avoided installation of user-space Ceph packages by using the bare kernel sysfs interface for RBD image mapping.
The Ceph RBD kernel module must be loaded prior to use:
# modprobe rbd

Ceph RADOS block devices can be mapped using the following command:
# echo -n "${MON_IP}:6789 name=${AUTH_NAME},secret=${AUTH_SECRET} " \
          "${CEPH_POOL} ${CEPH_IMG} -" > /sys/bus/rbd/add

$MON_IP can be obtained from ceph.conf. Similarly, the $AUTH_NAME and $AUTH_SECRET credentials can be retrieved from a regular Ceph keyring file.
$CEPH_POOL and $CEPH_IMG correspond to the location of the RBD image.

A locally mapped RBD block device can be subsequently removed via:
# echo -n "${DEV_ID}" > /sys/bus/rbd/remove

$DEV_ID can be determined from the numeric suffix assigned to the /dev/rbdX device path.
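
Deriving the ID from the device path is easy to script. A minimal sketch, with rbd_dev_id as a hypothetical helper name:

```shell
# Sketch: derive the numeric ID expected by /sys/bus/rbd/remove from a
# /dev/rbdX path. rbd_dev_id is a hypothetical helper name.
rbd_dev_id() {
    id=${1#/dev/rbd}      # strip the /dev/rbd prefix
    case $id in
        # Reject anything that is not purely numeric, e.g. partitions
        ''|*[!0-9]*) echo "not an rbd device path: $1" >&2; return 1 ;;
    esac
    echo "$id"
}

rbd_dev_id /dev/rbd0
```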

Images can't be provisioned without the Ceph user-space utilities installed, so provisioning should be performed on a separate system (e.g. an OSD) prior to mapping on the Cubietruck. E.g. to provision a 10GB image:
# rbd create --size=10240 --pool ${CEPH_POOL} ${CEPH_IMG}

With my Cubietruck connected to the network via the ethernet adapter, I observed streaming read (/dev/rbd -> /dev/null) throughput at ~37MB/s, and the same value for streaming writes (/dev/zero -> /dev/rbd). Performance appears to be constrained by limitations of the Cubietruck hardware.


USB Mass Storage Gadget

The Linux kernel mass storage gadget module is configured via configfs. A device can be exposed as a USB mass storage device with the following procedure:
# modprobe -a sunxi configfs libcomposite usb_f_mass_storage

# mount -t configfs configfs /sys/kernel/config
# cd /sys/kernel/config/usb_gadget/
# mkdir -p ceph
# cd ceph

# mkdir -p strings/0x409
# echo "fedcba9876543210" > strings/0x409/serialnumber
# echo "openSUSE" > strings/0x409/manufacturer
# echo "Ceph USB Drive" > strings/0x409/product

# mkdir -p functions/mass_storage.usb0
# echo 1 > functions/mass_storage.usb0/stall
# echo 0 > functions/mass_storage.usb0/lun.0/cdrom
# echo 0 > functions/mass_storage.usb0/lun.0/ro
# echo 0 > functions/mass_storage.usb0/lun.0/nofua
# echo "$DEV" > functions/mass_storage.usb0/lun.0/file

# mkdir -p configs/c.1/strings/0x409
# echo "Config 1: mass-storage" > configs/c.1/strings/0x409/configuration
# echo 250 > configs/c.1/MaxPower
# ln -s functions/mass_storage.usb0 configs/c.1/

# ls /sys/class/udc > UDC

$DEV corresponds to a /dev/X device path, which should be a locally mapped RBD device path. The module can however also use local files as backing for USB mass storage.
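
Since both block devices and regular files are accepted as backing, a quick sanity check before writing to lun.0/file can catch a mistyped path. A small sketch, with valid_backing as a hypothetical helper name:

```shell
# Sketch: check that a path is usable as mass-storage backing, i.e. that
# it is a block device or a regular file.
# valid_backing is a hypothetical helper name.
valid_backing() {
    if [ -b "$1" ] || [ -f "$1" ]; then
        return 0
    fi
    echo "unsupported backing: $1" >&2
    return 1
}

# Usage, from within the gadget's configfs directory:
#   valid_backing "$DEV" && echo "$DEV" > functions/mass_storage.usb0/lun.0/file
```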


Boot-Time Automation

By default, Cubietruck boots when the board is connected to a USB host via the mini-USB connection.
With RBD image mapping and USB mass storage exposure now working, the process can be run automatically on boot via a simple script: rbd_usb_gw.sh
Furthermore, a systemd service can be added:
[Unit]
Wants=network-online.target
After=network-online.target

[Service]
# XXX assume that rbd_usb_gw.sh is present in /bin
ExecStart=/bin/rbd_usb_gw.sh %i
Type=oneshot
RemainAfterExit=yes

Finally, this service can be triggered by Wicked when the network interface comes online, with the following entry added to /etc/sysconfig/network/config:
POST_UP_SCRIPT="systemd:rbd-mapper@.service"


Boot Performance Optimisation

A significant reduction in boot time can be achieved by running everything from initramfs, rather than booting to the full Linux distribution.
Generating a minimal initramfs image, with support for mapping and exposing RBD images is straightforward, thanks to the Dracut utility:
# dracut --no-compress  \
         --kver "$(uname -r)" \
         --install "ps rmdir dd vim grep find df modinfo" \
         --add-drivers "rbd musb_hdrc sunxi configfs" \
         --no-hostonly --no-hostonly-cmdline \
         --modules "bash base network ifcfg" \
         --include /bin/rbd_usb_gw.sh /lib/dracut/hooks/emergency/02_rbd_usb_gw.sh \
         myinitrd

The rbd_usb_gw.sh script is installed into the initramfs image as a Dracut emergency hook, which sees it executed as soon as initramfs has booted.

To ensure that the network is up prior to the launch of rbd_usb_gw.sh, the kernel DHCP client (CONFIG_IP_PNP_DHCP) can be used by appending ip=dhcp to the boot-time kernel parameters. This can be set from the U-Boot bootloader prompt:
=> setenv append 'ip=dhcp'
=> boot

The new initramfs image must be committed to the boot partition via:

# cp myinitrd /boot/
# rm /boot/initrd
# ln -s /boot/myinitrd /boot/initrd

Note: In order to boot back to the full Linux distribution, you will have to mount the /boot partition and revert the /boot/initrd symlink to its previous target.
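
Reverting is simpler if the previous symlink target is recorded before it is replaced. A minimal sketch, assuming /boot/initrd is a symlink as above. The helper names and the initrd.prev file name are made up for illustration.

```shell
# Sketch: point <bootdir>/initrd at a new image while recording the
# previous symlink target in <bootdir>/initrd.prev, so that the change
# can be reverted later. All names here are hypothetical.
initrd_switch() {
    bootdir=$1; image=$2
    # Only record the target once, so repeated switches keep the original
    if [ -L "$bootdir/initrd" ] && [ ! -e "$bootdir/initrd.prev" ]; then
        readlink "$bootdir/initrd" > "$bootdir/initrd.prev"
    fi
    ln -sfn "$image" "$bootdir/initrd"
}

initrd_revert() {
    bootdir=$1
    [ -s "$bootdir/initrd.prev" ] || return 1
    ln -sfn "$(cat "$bootdir/initrd.prev")" "$bootdir/initrd" &&
        rm -f "$bootdir/initrd.prev"
}

# Usage (as root):
#   initrd_switch /boot myinitrd
#   initrd_revert /boot
```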


Future Improvements

  • Support configuration of the device without requiring console access
    • Run an embedded web-server, or expose a configuration filesystem via USB 
  • Install the operating system onto on-board NAND storage
  • Further improve boot times
    • Avoid U-Boot device probes
  • Experiment with the new f_tcm USB gadget module
    • Expose RBD images via USB and iSCSI


Credits

Many thanks to:
  • My employer, SUSE Linux, for encouraging me to work on projects like this during Hackweek.
  • The linux-sunxi community, for their excellent contributions to the mainline Linux kernel.
  • Colleagues Dirk, Bernhard, Alex and Andreas for their help in bringing up openSUSE Tumbleweed on my Cubietruck board.

Sunday, July 12, 2015

QEMU/KVM Bridged Network with TAP interfaces

In my previous post, Rapid Linux Kernel Dev/Test with QEMU, KVM and Dracut, I described how to build and boot a Linux kernel quickly, making use of port forwarding between hypervisor and guest VM for virtual network traffic.

This post describes how to plumb the Linux VM directly into a hypervisor network, through the use of a bridge.

Start by creating a bridge on the hypervisor system:
> sudo /sbin/brctl addbr br0

Clear the IP address on the network interface that you'll be bridging (e.g. eth0).
Note: This will disable network traffic on eth0!
> sudo ip addr flush dev eth0
Add the interface to the bridge:
> sudo /sbin/brctl addif br0 eth0


Next up, create a TAP interface:
> sudo /sbin/tunctl -u $(whoami)
Set 'tap0' persistent and owned by uid 1001
The -u parameter ensures that the current user will be able to connect to the TAP interface.

Add the TAP interface to the bridge:
> sudo /sbin/brctl addif br0 tap0

Make sure everything is up:
> sudo ip link set dev br0 up
> sudo ip link set dev tap0 up

The TAP interface is now ready for use. Assuming that a DHCP server is available on the bridged network, the VM can now obtain an IP address during boot via:
> qemu-kvm -kernel arch/x86/boot/bzImage \
           -initrd initramfs \
           -device e1000,netdev=network0,mac=52:55:00:d1:55:01 \
           -netdev tap,id=network0,ifname=tap0,script=no,downscript=no \
           -append "ip=dhcp rd.shell=1 console=ttyS0" -nographic

The MAC address is explicitly specified, so care should be taken to ensure its uniqueness.
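
When running several VMs, one way to reduce the chance of a clash is to randomise the last three octets. The sketch below uses the 52:54:00 prefix conventionally used for QEMU/KVM guests. random_mac is a hypothetical helper name, and a random address makes collisions unlikely rather than impossible.

```shell
# Sketch: generate a random MAC address under the 52:54:00 prefix
# conventionally used for QEMU/KVM guests.
# random_mac is a hypothetical helper name.
random_mac() {
    # Read three random bytes as hex octets into $1 $2 $3
    set -- $(od -An -N3 -tx1 /dev/urandom)
    printf '52:54:00:%s:%s:%s\n' "$1" "$2" "$3"
}

random_mac
```

The result can be passed to qemu-kvm via -device e1000,netdev=network0,mac=$(random_mac).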

The DHCP server response details are printed alongside network interface configuration. E.g.
[    3.792570] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    3.796085] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    3.812083] Sending DHCP requests ., OK
[    4.824174] IP-Config: Got DHCP answer from 10.155.0.42, my address is 10.155.0.1
[    4.825119] IP-Config: Complete:
[    4.825476]      device=eth0, hwaddr=52:55:00:d1:55:01, ipaddr=10.155.0.1, mask=255.255.0.0, gw=10.155.0.254
[    4.826546]      host=rocksolid-sles, domain=suse.de, nis-domain=suse.de
...

Didn't get an IP address? There are a few things to check:
  • Confirm that the kernel is built with boot-time DHCP client (CONFIG_IP_PNP_DHCP=y) and E1000 network driver (CONFIG_E1000=y) support.
  • Check the -device and -netdev arguments specify a valid e1000 TAP interface.
  • Ensure that ip=dhcp is provided as a kernel boot parameter, and that the DHCP server is up and running.
Happy hacking!

Wednesday, June 10, 2015

Rapid Linux Kernel Dev/Test with QEMU, KVM and Dracut

Inspired by Stefan Hajnoczi's excellent blog post, I recently set about constructing an environment for rapid testing of Linux kernel changes, particularly focused on the LIO iSCSI target. Such an environment would help me in a number of ways:
  • Faster dev / test turnaround.
    • A modified kernel can be compiled and booted in a matter of seconds.
  • Improved resource utilisation.
    • No need to boot external test hosts or heavyweight VMs.
  • Simplified and speedier debugging.

My requirements were slightly different to Stefan's, in that:
  • I'd prefer to be lazy and use Dracut for initramfs generation.
  • I need a working network connection between VM and hypervisor system
    • The VM will act as the iSCSI target, the hypervisor as the initiator.

Starting with the Linux kernel, the first step is to build a bzImage:
~/> git clone \
        git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
hack, hack, hack.
~/linux/> make menuconfig
Set CONFIG_IP_PNP_DHCP=y and CONFIG_E1000=y to enable IP address assignment on boot.
~/linux/> make -j6
~/linux/> make modules
~/linux/> INSTALL_MOD_PATH=./mods make modules_install
~/linux/> sudo ln -s $PWD/mods/lib/modules/$(make kernelrelease) \
                     /lib/modules/$(make kernelrelease)
This leaves us with a compressed kernel image file at arch/x86/boot/bzImage, and corresponding modules installed under mods/lib/modules/$(make kernelrelease), where $(make kernelrelease) evaluates to 4.1.0-rc7+ in this example. The /lib/modules/4.1.0-rc7+ symlink allows Dracut to locate the modules.

The next step is to generate an initial RAM filesystem, or initramfs, which includes a minimal set of user-space utilities, and kernel modules needed for testing:

~/linux/> dracut --kver "$(make kernelrelease)" \
                 --add-drivers "iscsi_target_mod target_core_mod" \
                 --add-drivers "target_core_file target_core_iblock" \
                 --add-drivers "configfs" \
                 --install "ps grep netstat" \
                 --no-hostonly --no-hostonly-cmdline \
                 --modules "bash base shutdown network ifcfg" initramfs
...
*** Creating image file done ***

We now have an initramfs file in the current directory, with the following contents:
  • LIO kernel modules obtained from /lib/modules/4.1.0-rc7+, as directed via the --kver and --add-drivers parameters.
  • User-space shell, boot and network helpers, as directed via the --modules parameter.

We're now ready to use QEMU/KVM to boot our test kernel and initramfs:

~/linux/> qemu-kvm -kernel arch/x86/boot/bzImage \
                   -initrd initramfs \
                   -device e1000,netdev=network0 \
                   -netdev user,id=network0 \
                   -redir tcp:51550::3260 \
                   -append "ip=dhcp rd.shell=1 console=ttyS0" \
                   -nographic

This boots the test environment, with the kernel and initramfs previously generated:

[    3.216596] dracut Warning: dracut: FATAL: No or empty root= argument
[    3.217998] dracut Warning: dracut: Refusing to continue
...
Dropping to debug shell.

dracut:/#

From the dracut shell, confirm that the QEMU DHCP server assigned the VM an IP address:

dracut:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
...
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0

Port 3260 (iSCSI) on this interface is forwarded to/from port 51550 on the hypervisor, as configured via the qemu-kvm -redir parameter.
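
Recent QEMU releases have dropped the -redir option. The same forwarding is expressed with a hostfwd= setting on the user-mode netdev. A small sketch for building that argument, with user_netdev as a hypothetical helper name:

```shell
# Sketch: build a -netdev argument that forwards a TCP port on the
# hypervisor to a port in the guest, replacing the older -redir option.
# user_netdev is a hypothetical helper name.
user_netdev() {
    # $1: netdev id, $2: host port, $3: guest port
    printf 'user,id=%s,hostfwd=tcp::%s-:%s\n' "$1" "$2" "$3"
}

user_netdev network0 51550 3260
```

The output would be passed as -netdev "$(user_netdev network0 51550 3260)", in place of the separate -netdev user and -redir arguments.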

Now onto LIO iSCSI target setup. First off, load the appropriate kernel modules:

dracut:/# modprobe iscsi_target_mod
dracut:/# cat /proc/modules
iscsi_target_mod 246669 0 - Live 0xffffffffa006a000
target_core_mod 289004 1 iscsi_target_mod, Live 0xffffffffa000b000
configfs 22407 3 iscsi_target_mod,target_core_mod, Live 0xffffffffa0000000

LIO configuration requires a mounted configfs filesystem:

dracut:/# mount -t configfs configfs /sys/kernel/config/
dracut:/# cat /sys/kernel/config/target/version 
Target Engine Core ConfigFS Infrastructure v4.1.0 on Linux/x86_64 on 4.1.0-rc1+

An iSCSI target can be provisioned by manipulating corresponding configfs entries. I used the lio_dump output on an existing setup as reference:

dracut:/# mkdir /sys/kernel/config/target/iscsi
dracut:/# echo -n 0 > /sys/kernel/config/target/iscsi/discovery_auth/enforce_discovery_auth
dracut:/# mkdir -p /sys/kernel/config/target/iscsi/<iscsi_iqn>/tpgt_1/np/10.0.2.15:3260
...

Finally, we're ready to connect to the LIO target using the local hypervisor port that forwards to the VM's virtual network adapter:

~/linux/> iscsiadm --mode discovery \
                   --type sendtargets \
                   --portal 127.0.0.1:51550
10.0.2.15:3260,1 iqn.2015-04.suse.arch:5eca2313-028d-435c-9131-53a5ab256a83

It works!

There are a few things that can be adjusted:
  • Port forwarding to the VM network is a bit fiddly - I'm now using a bridge/TAP configuration instead.
  • When dropping into the emergency boot shell, Dracut executes scripts carried under /lib/dracut/hooks/emergency/. This means that a custom script can be triggered on boot via:
    ~/linux/> dracut -i runme.sh /lib/dracut/hooks/emergency/02-runme.sh ...
    
  • It should be possible to have Dracut pull the kernel modules in from the temporary directory, but I wasn't able to get this working:
    ~/linux/> INSTALL_MOD_PATH=./mods make modules_install
    ~/linux/> dracut --kver "$(make kernelrelease)" --kmoddir ./mods/lib/...
    
  • Boot time and initramfs file IO performance can be improved by disabling compression. This is done by specifying the --no-compress Dracut parameter.

Update 20150722:
  • Don't install kernel modules as root, set up a /lib/modules symlink for Dracut instead.
  • Link to bridge/TAP networking post.
  • Describe boot script usage.
Update 20150813:
  • Use $(make kernelrelease) rather than a hard-coded 4.1.0-rc7+ kernel version string - thanks Aurélien!
Update 20150908: Describe initramfs --no-compress optimisation.

Saturday, May 23, 2015

Azure File Service IO with Elasto on Linux

In an earlier post I described the basics of the Microsoft Azure File Service, and how it can be used on Linux with the cifs.ko kernel client.

Since that time I've been hacking away on the Elasto cloud storage client, to the point that it now (with version 0.6.0) supports Azure File Service share provisioning as well as file and directory IO.


To play with Elasto yourself:
  • Install the packages
  • Download your Azure PublishSettings credentials
  • Run
    elasto_cli -s Azure_PublishSettings_File -u afs://
Keep in mind that Elasto is still far from mature, so don't be surprised if it corrupts your data or causes a fire.
With the warning out of the way, I'd like to thank:
  • My employer SUSE Linux, for supporting my Elasto development efforts during Hack Week.
  • Samba Experience conference organisers, for giving me the chance to talk about the project.
  • Kdenlive developers, for writing great video editing software.

Wednesday, December 10, 2014

Accénts & Ümlauts - A Custom Keyboard Layout on Linux

As a native English speaker living in Germany, I need to be able to reach the full Germanic alphabet without using long key combinations or (gasp) resorting to a German keyboard layout.

Accents and umlauts on US keyboards aren't only useful for expats. They're also enjoyed (or abused) by a number of English speaking subcultures:
  • Hipsters: "This is such a naïve café."
  • Metal heads: "Did you hear Spın̈al Tap are touring with Motörhead this year?"
  • Teenage gamers: "über pwnage!"

The standard US system keyboard layout can be enhanced to offer German characters via the following key mappings:
Key   Key + Shift   Key + AltGr (Right Alt)   Key + AltGr + Shift
e     E             é                         É
u     U             ü                         Ü
o     O             ö                         Ö
a     A             ä                         Ä
s     S             ß                         ß
5     %             €

With openSUSE 13.2, this can be configured by first defining the mappings in /usr/share/X11/xkb/symbols/us_de:
partial default alphanumeric_keys
xkb_symbols "basic" {
    name[Group1]= "US/ASCII";
    include "us"

    key <AD03> {[e,          E,           eacute,         Eacute]};
    key <AD07> {[u,          U,           udiaeresis,     Udiaeresis]};
    key <AD09> {[o,          O,           odiaeresis,     Odiaeresis]};
    key <AC01> {[a,          A,           adiaeresis,     Adiaeresis]};
    key <AC02> {[s,          S,           ssharp,         ssharp]};
    key <AE05> {[NoSymbol, NoSymbol,      EuroSign]};

    key <RALT> {type[Group1]="TWO_LEVEL",
                [ISO_Level3_Shift, ISO_Level3_Shift]};

    modifier_map Mod5   {<RALT>};
};

Secondly, specify the keyboard layout as the system default in /etc/X11/xorg.conf.d/00-keyboard.conf:

Section "InputClass"
        Identifier "system-keyboard"
        MatchIsKeyboard "on"
        Option "XkbLayout" "us_de"
EndSection

Achtung!: IBus may be configured to override the system keyboard layout - ensure this is not the case in IBus Preferences.
Once in place, the key mappings can be easily modified to suit specific tastes or languages - viel Spaß!