Tuesday, December 15, 2015

Ceph USB Storage Gateway


Last week was Hackweek, a week full of fun and innovation at SUSE. I decided to use the time to work on a USB storage gateway for Ceph.



The concept is simple - create a USB device that, when configured and connected, exposes remote Ceph RADOS Block Device (RBD) images for access as USB mass storage, allowing for:
  • Ceph storage usage by almost any system with a USB port
    • Including dumb systems such as TVs, MP3 players and mobile phones
  • Boot from RBD images
    • Many systems are capable of booting from a USB mass storage device
  • Minimal configuration
    • Network, Ceph credentials and image details should be all that's needed for configuration


Hardware

I already own a Cubietruck, which has the following desirable characteristics for this project:
  • Works with a mainline Linux Kernel
  • Is reasonably small and portable
  • Supports power and data transfer via a single mini-USB port
  • Performs relatively well
    • Dual-core 1GHz processor and 2GB RAM
    • Gigabit network adapter and WiFi 802.11 b/g/n

Possible alternatives worth evaluation include C.H.I.P (smaller and cheaper), NanoPi2, and UP (faster). I should take this opportunity to mention that I do gladly accept hardware donations!


Base System

I decided on using openSUSE Tumbleweed as the base operating system for this project. An openSUSE Tumbleweed ARM port for the Cubietruck is available for download at:
http://download.opensuse.org/ports/armv7hl/factory/images/openSUSE-Tumbleweed-ARM-JeOS-cubietruck.armv7l-Current.raw.xz

Installation is as straightforward as copying the image to an SD card and booting - I documented the installation procedure on the openSUSE Wiki.
Releases prior to Build350 exhibit boot issues due to the U-Boot device-tree path. However, this has been fixed in recent builds.


Kernel

The Linux kernel currently shipped with the openSUSE image does not include support for acting as a USB mass storage gadget, nor does it include Ceph RBD support. In order to obtain these features, and also reduce the size of the base image, I built a mainline Linux kernel (4.4-rc4) using a minimal custom kernel configuration:
~/> sudo zypper install --no-recommends git-core gcc make ncurses-devel bc
~/> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
~/> cd linux
~/linux/> wget https://raw.githubusercontent.com/ddiss/ceph_usb_gateway/master/.config
          # or `make sunxi_defconfig menuconfig`
          #     ->enable Ceph, sunxi and USB gadget modules
~/linux/> make oldconfig
~/linux/> make -j2 zImage dtbs modules
~/linux/> sudo make install modules_install
~/linux/> sudo cp arch/arm/boot/zImage /boot/zImage-$(make kernelrelease)
~/linux/> sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/dtb-4.3.0-2/
~/linux/> sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/dtb/

This build procedure takes a long time to complete. Cross compilation could be used to improve build times.
I plan on publishing a USB gadget enabled ARM kernel on the Open Build Service in the future, which would allow for simple installation via zypper - watch this space!
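
A minimal sketch of the cross compilation route, assuming a cross toolchain with the arm-linux-gnueabihf- prefix is installed on a faster build host. Both the prefix and the cross_make helper name are illustrative assumptions; substitute whatever your toolchain provides:

```shell
# Hypothetical wrapper around make for cross-building an ARM kernel.
# CROSS_COMPILE defaults to an assumed toolchain prefix; override it to
# match the cross compiler installed on the build host.
cross_make() {
    make ARCH=arm \
         CROSS_COMPILE="${CROSS_COMPILE:-arm-linux-gnueabihf-}" \
         -j"$(getconf _NPROCESSORS_ONLN)" \
         "$@"
}

# Example invocations from the top of the kernel tree (not run here):
#   cross_make oldconfig
#   cross_make zImage dtbs modules
```

The resulting zImage, device tree blob and modules would still need to be copied over to the Cubietruck. Modules can be staged into a directory for transfer by passing INSTALL_MOD_PATH to the modules_install target.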


Ceph RADOS Block Device (RBD) mapping

To again save space, I avoided installation of user-space Ceph packages by using the bare kernel sysfs interface for RBD image mapping.
The Ceph RBD kernel module must be loaded prior to use:
# modprobe rbd

Ceph RADOS block devices can be mapped using the following command:
# echo -n "${MON_IP}:6789 name=${AUTH_NAME},secret=${AUTH_SECRET} " \
          "${CEPH_POOL} ${CEPH_IMG} -" > /sys/bus/rbd/add

$MON_IP can be obtained from ceph.conf. Similarly, the $AUTH_NAME and $AUTH_SECRET credentials can be retrieved from a regular Ceph keyring file.
$CEPH_POOL and $CEPH_IMG correspond to the location of the RBD image.
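
As an illustration, the pieces of the mapping string can be assembled by a couple of small helpers. This is only a sketch under assumptions: the keyring is assumed to carry the usual "key = <base64>" line, and the keyring_secret and rbd_add_string function names are hypothetical:

```shell
# Hypothetical helper: print the first "key = ..." value found in a
# Ceph keyring file ($1). Only the "key = " prefix is stripped, so any
# trailing base64 padding ("=") in the secret is preserved.
keyring_secret() {
    sed -n 's/^[[:space:]]*key[[:space:]]*=[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: build the string written to /sys/bus/rbd/add.
# Arguments: monitor IP, auth name, auth secret, pool, image.
# No trailing newline is emitted, matching the "echo -n" usage above.
rbd_add_string() {
    printf '%s:6789 name=%s,secret=%s %s %s -' "$1" "$2" "$3" "$4" "$5"
}

# Example (as root, not run here; keyring path is a placeholder):
#   AUTH_SECRET=$(keyring_secret /path/to/keyring)
#   rbd_add_string "$MON_IP" "$AUTH_NAME" "$AUTH_SECRET" \
#                  "$CEPH_POOL" "$CEPH_IMG" > /sys/bus/rbd/add
```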

A locally mapped RBD block device can be subsequently removed via:
# echo -n "${DEV_ID}" > /sys/bus/rbd/remove

$DEV_ID can be determined from the numeric suffix assigned to the /dev/rbdX device path.
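
For scripting, the numeric suffix can be extracted with plain shell parameter expansion. A minimal sketch follows; the rbd_dev_id name is hypothetical, and partition paths such as /dev/rbd0p1 are not handled:

```shell
# Hypothetical helper: derive the RBD device ID from a /dev/rbdX path
# by stripping everything up to and including the final "/rbd".
rbd_dev_id() {
    printf '%s\n' "${1##*/rbd}"
}

# Example (as root, not run here):
#   rbd_dev_id /dev/rbd0 > /sys/bus/rbd/remove
```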

Images can't be provisioned without the Ceph user-space utilities installed, so provisioning should be performed on a separate system (e.g. an OSD) prior to mapping on the Cubietruck. For example, to provision a 10GB image:
# rbd create --size=10240 --pool ${CEPH_POOL} ${CEPH_IMG}

With my Cubietruck connected to the network via the ethernet adapter, I observed streaming read (/dev/rbd -> /dev/null) throughput at ~37MB/s, and the same value for streaming writes (/dev/zero -> /dev/rbd). Performance appears to be constrained by limitations of the Cubietruck hardware.
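
Streaming tests of this kind can be approximated with dd. The following is a sketch rather than the exact commands behind the numbers above: the function names, the 4MiB block size and the default block count are all assumptions, and without direct I/O the page cache may inflate the reported figures:

```shell
# Hypothetical helper: stream-read $2 blocks of 4MiB (default 256) from
# the device or file $1 into /dev/null. GNU dd reports the achieved
# throughput on stderr when it completes.
stream_read() {
    dd if="$1" of=/dev/null bs=4194304 count="${2:-256}"
}

# Hypothetical helper: stream-write zeros to $1.
# WARNING: this destroys any data stored on the target.
stream_write() {
    dd if=/dev/zero of="$1" bs=4194304 count="${2:-256}"
}

# Example (as root, not run here):
#   stream_read /dev/rbd0
#   stream_write /dev/rbd0
```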


USB Mass Storage Gadget

The Linux kernel mass storage gadget module is configured via configfs. A device can be exposed as a USB mass storage device with the following procedure:
# modprobe -a sunxi configfs libcomposite usb_f_mass_storage

# mount -t configfs configfs /sys/kernel/config
# cd /sys/kernel/config/usb_gadget/
# mkdir -p ceph
# cd ceph

# mkdir -p strings/0x409
# echo "fedcba9876543210" > strings/0x409/serialnumber
# echo "openSUSE" > strings/0x409/manufacturer
# echo "Ceph USB Drive" > strings/0x409/product

# mkdir -p functions/mass_storage.usb0
# echo 1 > functions/mass_storage.usb0/stall
# echo 0 > functions/mass_storage.usb0/lun.0/cdrom
# echo 0 > functions/mass_storage.usb0/lun.0/ro
# echo 0 > functions/mass_storage.usb0/lun.0/nofua
# echo "$DEV" > functions/mass_storage.usb0/lun.0/file

# mkdir -p configs/c.1/strings/0x409
# echo "Config 1: mass-storage" > configs/c.1/strings/0x409/configuration
# echo 250 > configs/c.1/MaxPower
# ln -s functions/mass_storage.usb0 configs/c.1/

# ls /sys/class/udc > UDC

$DEV corresponds to a /dev/X device path, which should be a locally mapped RBD device path. The module can however also use local files as backing for USB mass storage.
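
For testing the gadget without a Ceph cluster, a sparse local file can serve as the backing store. A minimal sketch, assuming GNU dd; the make_backing_file name and the example path are hypothetical:

```shell
# Hypothetical helper: create a sparse backing file.
# $1 = path, $2 = size in MiB. With count=0, dd writes no data and only
# extends the file to the seek offset, so no disk blocks are allocated
# up front.
make_backing_file() {
    dd if=/dev/zero of="$1" bs=1048576 count=0 seek="$2" 2>/dev/null
}

# Example (not run here), from within the gadget configfs directory:
#   make_backing_file /var/tmp/usb_backing.img 1024
#   echo /var/tmp/usb_backing.img > functions/mass_storage.usb0/lun.0/file
```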


Boot-Time Automation

By default, Cubietruck boots when the board is connected to a USB host via the mini-USB connection.
With RBD image mapping and USB mass storage exposure now working, the process can be run automatically on boot via a simple script: rbd_usb_gw.sh
Furthermore, a systemd service can be added:
[Unit]
Wants=network-online.target
After=network-online.target

[Service]
# XXX assume that rbd_usb_gw.sh is present in /bin
ExecStart=/bin/rbd_usb_gw.sh %i
Type=oneshot
RemainAfterExit=yes

Finally, this service can be triggered by Wicked when the network interface comes online, with the following entry added to /etc/sysconfig/network/config:
POST_UP_SCRIPT="systemd:rbd-mapper@.service"


Boot Performance Optimisation

A significant reduction in boot time can be achieved by running everything from initramfs, rather than booting to the full Linux distribution.
Generating a minimal initramfs image with support for mapping and exposing RBD images is straightforward, thanks to the Dracut utility:
# dracut --no-compress  \
         --kver "$(uname -r)" \
         --install "ps rmdir dd vim grep find df modinfo" \
         --add-drivers "rbd musb_hdrc sunxi configfs" \
         --no-hostonly --no-hostonly-cmdline \
         --modules "bash base network ifcfg" \
         --include /bin/rbd_usb_gw.sh /lib/dracut/hooks/emergency/02_rbd_usb_gw.sh \
         myinitrd

The rbd_usb_gw.sh script is installed into the initramfs image as a Dracut emergency hook, which sees it executed as soon as initramfs has booted.

To ensure that the network is up prior to the launch of rbd_usb_gw.sh, the kernel DHCP client (CONFIG_IP_PNP_DHCP) can be used by appending ip=dhcp to the boot-time kernel parameters. This can be set from the U-Boot bootloader prompt:
=> setenv append 'ip=dhcp'
=> boot

The new initramfs image must be committed to the boot partition via:

# cp myinitrd /boot/
# rm /boot/initrd
# ln -s /boot/myinitrd /boot/initrd

Note: In order to boot back to the full Linux distribution, you will have to mount the /boot partition and revert the /boot/initrd symlink to its previous target.
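
To make the revert easier, the previous symlink target can be recorded before it is replaced. The following is a sketch under assumptions: the switch_initrd and restore_initrd names and the initrd.previous file are hypothetical, and the boot directory is passed as a parameter so the helpers can be tried out on a scratch directory first:

```shell
# Hypothetical helper: point <boot_dir>/initrd at a new image, saving
# the old symlink target in <boot_dir>/initrd.previous.
# $1 = boot directory, $2 = new initrd file name inside it.
switch_initrd() {
    boot_dir=$1
    new_name=$2
    if [ -L "$boot_dir/initrd" ]; then
        readlink "$boot_dir/initrd" > "$boot_dir/initrd.previous"
    fi
    ln -sfn "$new_name" "$boot_dir/initrd"
}

# Hypothetical helper: restore the symlink target saved by switch_initrd.
# Returns non-zero if no saved target exists.
restore_initrd() {
    boot_dir=$1
    [ -f "$boot_dir/initrd.previous" ] || return 1
    ln -sfn "$(cat "$boot_dir/initrd.previous")" "$boot_dir/initrd"
}

# Example (as root, not run here):
#   switch_initrd /boot myinitrd
#   restore_initrd /boot
```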


Future Improvements

  • Support configuration of the device without requiring console access
    • Run an embedded web-server, or expose a configuration filesystem via USB 
  • Install the operating system onto on-board NAND storage
  • Further improve boot times
    • Avoid U-Boot device probes
  • Experiment with the new f_tcm USB gadget module
    • Expose RBD images via USB and iSCSI


Credits

Many thanks to:
  • My employer, SUSE Linux, for encouraging me to work on projects like this during Hackweek.
  • The linux-sunxi community, for their excellent contributions to the mainline Linux kernel.
  • Colleagues Dirk, Bernhard, Alex and Andreas for their help in bringing up openSUSE Tumbleweed on my Cubietruck board.

6 comments:

  1. Great initiative!

    Is there smaller hardware available to support this ?
    I can think about the size of USB armory, FST-01, ....
    In one side USB type A, and in the other side RJ45.
    Even with USB2 - 100Mbps Ethernet would already be a first start.

    Replies
    1. The newly released ~$10 NanoPi NEO is a 40mm x 40mm board with (power/data) micro USB OTG and 100Mbps Ethernet. Upstream kernel support for the SoC is also mostly complete.

  2. The Zsun WiFi SD Card Reader (https://wiki.hackerspace.pl/projects:zsun-wifi-card-reader) could be made to work with a bit of HW and SW hacking. Other alternatives include the NanoPi M1 (http://nanopi.io/nanopi-m1.html), and the ODROID-C2 (http://www.hardkernel.com/main/products/prdt_info.php?g_code=G145457216438).

    That said, I still consider C.H.I.P the most promising at this stage, especially given the $9 price tag.

  3. I've been trying unsuccessfully for over 3 days to get the image to boot following the instructions provided on https://en.opensuse.org/HCL:Cubietruck. All I get is a black screen. It appears that the "current" image on the site and all the mirrors are corrupt when trying to boot using my 5 64GB SD cards. I can only boot when programming the image to a 7 year old 2G SD card. The command that I am using to put the data on the card is the one given in the instructions: xzcat openSUSE-Tumbleweed-ARM-JeOS-cubietruck.armv7l-2016.06.12-Build2.10.raw.xz | dd bs=4M of=/dev/sde iflag=fullblock oflag=direct; syn

    What might I be doing wrong? I was able to program the same 64GB card back in early April using the XFCE image that was available at the time, but since then there is only the JeOS image from June 25th available, and it won't boot from my 64GB cards.

    John

    Replies
    1. Hi John,

      There's a good chance that the JeOS image doesn't include display support. I'd suggest that you contact the opensuse-arm mailing list to resolve these issues.
      For this project I wasn't using the VGA/HDMI outputs at all - only ssh (by default, the image requests an IP address via DHCP) and UART (see https://linux-sunxi.org/Cubietruck#Adding_a_serial_port).

  4. Hi, I managed to get it resolved by trying a different USB card reader. I am not sure why the card reader really makes such a difference, because I am otherwise able to transfer files to and from the same SD cards without corruption. I confirmed this by checking the MD5 sums of files being saved to and from the cards.
