Tuesday, November 29, 2022

Btrfs Seed Devices for A/B System Updates

Sunflower seedling - Creative Commons Attribution-Share Alike 3.0 Unported
A/B system updates, as described here, provide a way for an operating system (OS) to seamlessly update from an old version to a new version, while ensuring that any failure in the upgrade process will allow for fallback to the known-working old version of the OS.

Typically, A/B updates are implemented using separate old and new filesystem images atop separate, equally sized disk partitions. However, modern copy-on-write filesystems offer more performant and space-efficient possibilities, as described below.

A/B Updates Using Btrfs Subvolume Snapshots

Linux's Btrfs filesystem provides support for snapshots at a subvolume level, which can be used for A/B system updates. A typical procedure would be:

  • The current OS version is running atop an old read-only subvolume
  • When an update is available, the old subvolume is cloned as a writeable snapshot under a newly created path within the filesystem
  • The upgrade is written to the new snapshot subvolume path (e.g. via btrfs receive)
  • The new snapshot is configured as the default subvolume, causing it to be mounted on next boot
  • If any issues are encountered during or post update, any default subvolume change is reverted, the old OS version is booted and the new subvolume is subsequently discarded
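The snapshot procedure above could be scripted roughly as follows. This is a minimal sketch, not a complete tool. The function names (ab_update, ab_rollback, ab_discard), the subvolume paths and the update command are all hypothetical, and bootloader integration is ignored. It also assumes a btrfs-progs release whose `subvolume set-default` accepts a subvolume path. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a snapshot-based A/B update (not a complete tool).
# With DRY_RUN=1 in the environment, commands are printed, not executed.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# ab_update OLD_SUBVOL NEW_SUBVOL UPDATE_CMD [ARGS...]
# UPDATE_CMD is a placeholder; it receives NEW_SUBVOL as its last argument.
ab_update() {
	old="$1" new="$2"
	shift 2
	# Clone the running OS subvolume as a writeable snapshot.
	run btrfs subvolume snapshot "$old" "$new" || return 1
	# Write the update to the new snapshot path only.
	run "$@" "$new" || return 1
	# Make the new snapshot the default subvolume for the next boot.
	run btrfs subvolume set-default "$new"
}

# ab_rollback OLD_SUBVOL: revert the default subvolume change.
ab_rollback() {
	run btrfs subvolume set-default "$1"
}

# ab_discard NEW_SUBVOL: delete the failed update once the old OS has been
# booted again and NEW_SUBVOL is no longer mounted.
ab_discard() {
	run btrfs subvolume delete "$1"
}
```

For example, `DRY_RUN=1 ab_update /os/v1 /os/v2 my-update-tool` would print the snapshot, update and set-default commands for a hypothetical my-update-tool without touching the filesystem.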

This procedure works well: it's space efficient, allows as many old versions to be retained as desired, and doesn't require any specific block device partitioning scheme. Given these benefits, it's unsurprising that SUSE uses a similar approach to provide Transactional Update functionality. However, there are still some minor caveats:

  • Currently Btrfs only provides atomic snapshots for single subvolumes, meaning that the above procedure shouldn't be used if an OS update modifies multiple subvolumes
  • The update procedure must be aware of the new subvolume path to target for I/O
    • An alternative may be to create a read-only snapshot before upgrading in-place, similar to snapper-based rollback
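The read-only snapshot alternative could look something like the sketch below. It is loosely modelled on the snapper approach rather than taken from it, and the function names and paths are hypothetical. A read-only snapshot can't act as a writeable root, so rolling back first clones it into a new writeable snapshot and then makes that clone the default.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: guard an in-place upgrade with a read-only snapshot.
# With DRY_RUN=1 in the environment, commands are printed, not executed.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# preupdate_snapshot ROOT_SUBVOL SNAPSHOT: take a read-only (-r) fallback
# copy before the in-place upgrade starts.
preupdate_snapshot() {
	run btrfs subvolume snapshot -r "$1" "$2"
}

# rollback_to SNAPSHOT NEW_ROOT: make a writeable clone of SNAPSHOT the
# default subvolume, so that it is mounted on next boot.
rollback_to() {
	run btrfs subvolume snapshot "$1" "$2" || return 1
	run btrfs subvolume set-default "$2"
}
```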

A/B Updates Using Btrfs Seed Devices

Btrfs seed devices offer copy-on-write support at a block device level, which also can be used to provide A/B system updates, with fallback between new and old block devices instead of subvolumes.

The following seed device example requires two or more separate block devices (or partitions), with one acting as a read-only seed device and one a read-write "sprout" device.

  • The currently running OS version is backed by an old block device, flagged as a read-only seed via
    btrfstune -S 1 /dev/old_block_dev
  • When an update is available, the new writeable "sprout" device is added to the Btrfs filesystem via
    btrfs device add /dev/new_block_dev /
  • The filesystem is remounted read-write
  • The update is written in-place, with Btrfs ensuring that all update I/O is written to the newly added block device
  • The new block device is flagged for the bootloader as the default boot device
  • If any issues are encountered during or post update, any default boot device change is reverted and the new block device can be discarded
    • The previous OS version remains untouched on the old device for fallback
  • Once the new OS version is deemed stable, the old seed device should be removed from the filesystem, which will cause dependent data from the old device to be merged into the new
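The seed device steps map onto a handful of btrfs-progs commands. The sketch below is hypothetical: the function names, device paths and default mount point are placeholders. It leaves out the bootloader step, which is distribution specific. btrfstune can only flag an unmounted filesystem, so the seed flag would be set ahead of time, for example when the OS image is provisioned.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of a seed device based A/B update.
# With DRY_RUN=1 in the environment, commands are printed, not executed.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# seed_flag OLD_DEV: flag the (unmounted) OS filesystem as a read-only seed.
seed_flag() {
	run btrfstune -S 1 "$1"
}

# sprout_add NEW_DEV [MNT]: add a writeable sprout device to the mounted
# seed filesystem, then remount it read-write so the update can proceed.
sprout_add() {
	mnt="${2:-/}"
	run btrfs device add "$1" "$mnt" || return 1
	run mount -o remount,rw "$mnt"
}

# sprout_discard NEW_DEV: after falling back to the old OS, wipe the
# filesystem signature from the abandoned sprout device.
sprout_discard() {
	run wipefs -a "$1"
}

# seed_retire OLD_DEV [MNT]: once the new OS is deemed stable, remove the
# seed; Btrfs moves the data still held on it over to the sprout.
seed_retire() {
	mnt="${2:-/}"
	run btrfs device remove "$1" "$mnt"
}
```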

This seed device approach removes some of the constraints of the Btrfs subvolume approach, namely:

  • The update procedure can atomically apply changes across multiple subvolumes, with seed-device rollback safely reverting all subvolume changes made
  • After read-write remount, the update process can perform I/O to the running system in-place, without any specific knowledge of the seed device usage or underlying filesystem

This functionality may be attractive for Linux distributions, particularly if adding A/B update support to an existing update process with little filesystem integration. However, there remain a number of trade-offs to consider:

  • Seed devices are significantly less space efficient compared to snapshot based A/B updates
    • Each block device must have sufficient capacity to store the OS
  • Removing the old seed device from the updated filesystem incurs significant I/O overhead, which snapshot-based A/B updates avoid
    • Btrfs at least partially compensates for this by verifying data checksums as the data is moved
  • Btrfs seed device support appears somewhat niche compared to regular subvolume snapshots, so it likely receives less filesystem test focus


Btrfs subvolume snapshots and seed devices can both be used to provide seamless and reliable A/B system updates. Snapshot based updates offer more efficient storage and CPU resource utilization, so should likely be considered the optimal choice for implementers.
Seed device based updates are a viable alternative, particularly for multi-subvolume updates, but implementers should carefully consider the described trade-offs.
Animated gif of a sunflower seed sprouting - Creative Commons Attribution-Share Alike 4.0 International



Saturday, April 21, 2018

Samsung Android Full Device Backup with TWRP


Following these instructions, correctly or incorrectly, may leave you with a completely broken or bricked device. Furthermore, flashing your device may void your warranty - Samsung uses eFuses to permanently flag occurrences of a device running non-Samsung software, such as TWRP.
I take no responsibility for what may come of using these instructions.

With the warning out of the way, I will say that I tested this process with the following environment:
  • Android Device: Samsung Galaxy S3 (i9300)
  • TWRP: 3.2.1-0
  • Desktop OS: openSUSE Leap 42.3

Flashing and Booting into Recovery

  • Download the official TWRP image for your device, and corresponding PGP signature
    • https://dl.twrp.me
  • Use gpg to verify your TWRP image
  • Download and install Heimdall on your Linux or Windows PC
  • Boot your Samsung device into Download Mode
    • Simultaneously hold the Volume-down + Home/Bixby + Power buttons
  • Using Heimdall on your desktop, flash the TWRP image to your device's recovery partition:
    • heimdall flash --no-reboot --RECOVERY <recovery.img>
    • Wait for Heimdall to output "RECOVERY upload successful"
  • From Download Mode, boot your Samsung device into TWRP
    • Simultaneously hold the Volume-up + Home/Bixby + Power buttons
    • If you accidentally boot into regular Android, then you'll likely have to boot into Download Mode and reflash, as regular boot restores the recovery partition to its default contents
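The verify and flash steps can be chained so that an image whose signature fails to verify is never flashed. This sketch assumes the TWRP signing key has already been imported into your GnuPG keyring. flash_twrp and the file names used with it are hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical helper: verify a TWRP image, then flash it with Heimdall.
# With DRY_RUN=1 in the environment, commands are printed, not executed.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# flash_twrp IMG SIG: IMG is the recovery image and SIG its detached PGP
# signature. The device must already be in Download Mode.
flash_twrp() {
	img="$1" sig="$2"
	# gpg exits non-zero for a bad signature or an unknown key, which stops
	# the function before anything is flashed.
	run gpg --verify "$sig" "$img" || return 1
	run heimdall flash --no-reboot --RECOVERY "$img"
}
```

For example: `flash_twrp twrp.img twrp.img.asc`, with the file names adjusted to match the download.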

Exposing the Device as USB Mass Storage

  • Unmount all partitions:
    • From the TWRP main menu, select Mount, then uncheck all partitions
  • Bring up a shell
    • From the TWRP main menu, select Advanced -> Terminal 
    • adb shell could be used instead here, but the adb connection from the desktop to the device will be lost when all USB roles are disabled
  • Determine which block device you wish to backup
  • # cat /etc/fstab
    • In my case (i9300), all data is stored on /dev/block/mmcblk0 partitions
  • Check the current state of the TWRP USB gadget
  • # cat /sys/devices/virtual/android_usb/android0/functions
  • Configure a read-only USB Mass Storage gadget
  • # echo 1 > /sys/devices/virtual/android_usb/android0/f_mass_storage/lun0/ro
    # echo /dev/block/mmcblk0 > /sys/devices/virtual/android_usb/android0/f_mass_storage/lun0/file
  • Disable all USB roles
  • # echo 0 > /sys/devices/virtual/android_usb/android0/enable
  • Enable the Mass Storage gadget USB role
  • # echo mass_storage,adb > /sys/devices/virtual/android_usb/android0/functions
    # echo 1 > /sys/devices/virtual/android_usb/android0/enable
  • If not already done, connect the device to your desktop or laptop
    • The attached device should appear as regular USB storage
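The TWRP terminal steps can be collected into a single shell function. This is a sketch that assumes the android_usb sysfs layout shown above, which varies between kernels and devices. expose_block_dev is a hypothetical name. Its optional second argument exists only so that the logic can be exercised against a scratch directory instead of sysfs.

```shell
#!/bin/sh
# Hypothetical helper for the TWRP terminal: export a block device to the
# host as a read-only USB Mass Storage LUN via the android_usb gadget.
# expose_block_dev BLOCK_DEV [GADGET_DIR]
expose_block_dev() {
	dev="$1"
	gadget="${2:-/sys/devices/virtual/android_usb/android0}"
	lun="$gadget/f_mass_storage/lun0"
	# Mark the LUN read-only before attaching the backing device.
	echo 1 > "$lun/ro" || return 1
	echo "$dev" > "$lun/file" || return 1
	# Disable all USB roles, switch functions, then re-enable the gadget.
	echo 0 > "$gadget/enable" || return 1
	echo mass_storage,adb > "$gadget/functions" || return 1
	echo 1 > "$gadget/enable"
}
```

On the i9300 this would be invoked as `expose_block_dev /dev/block/mmcblk0`.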


Any Linux, Windows or macOS program capable of fully backing up a USB storage device should be usable from this point. The procedure below uses the dd command on Linux.
  • From your computer, determine which USB storage device to back up
  • ddiss@desktop:~> lsscsi
    [2:0:0:0]    disk    SAMSUNG  File-Stor Gadget 0001  /dev/sdb 
  • As root, start copying the data from the device
  • ddiss@desktop:~> sudo dd if=/dev/sdb of=/home/ddiss/samsung_backup.img bs=1M
  • dd will take a long time to complete, depending on the size of your device, USB connection speed, etc.
  • Once completed, unplug your Android device and reboot it
  • The image file can then be compressed to save space, e.g. with gzip or xz
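The copy and compression steps could be combined as shown below. This is a sketch: backup_usb_disk is a hypothetical name, /dev/sdb must be replaced with whatever lsscsi reported for your device, and gzip stands in for any compressor. The cmp check re-reads the whole device, which roughly doubles the run time, in exchange for confirming that the image matches. GNU dd's status=progress option can be added to monitor the copy.

```shell
#!/usr/bin/env sh
# Hypothetical helper: image a (read-only exported) USB disk, check the
# copy, then compress it.
# backup_usb_disk SRC_DEV OUT_IMG  -> leaves OUT_IMG.gz behind
backup_usb_disk() {
	src="$1" img="$2"
	# Copy the whole device in 1 MiB blocks; conv=fsync flushes the image
	# to disk before dd exits.
	dd if="$src" of="$img" bs=1M conv=fsync || return 1
	# Compare the copy byte-for-byte against the source.
	cmp "$src" "$img" || return 1
	# Compress in place, replacing OUT_IMG with OUT_IMG.gz.
	gzip "$img"
}
```

For example, as root: `backup_usb_disk /dev/sdb /home/ddiss/samsung_backup.img`.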
With the image now obtained, you could mount it on your desktop, or restore it to the device at a later date. I'll hopefully get around to writing separate posts for both in future.

Monday, January 29, 2018

Building Ceph master with C++17 support on openSUSE Leap 42.3

Ceph now requires C++17 support, which is available with modern compilers such as gcc-7. openSUSE Leap 42.3, my current OS of choice, includes gcc-7. However, it's not used by default.

Using gcc-7 for the Ceph build is a simple matter of:
> sudo zypper in gcc7-c++
> CC=gcc-7 CXX=/usr/bin/g++-7 ./do_cmake.sh ...
> cd build && make -j$(nproc)

Monday, July 3, 2017

Multipath Failover Simulation with QEMU

While working on a Ceph OSD multipath issue, I came across a helpful post from Dan Horák on how to simulate a multipath device under QEMU.

qemu-kvm ... -device virtio-scsi-pci,id=scsi \
  -drive if=none,id=hda,file=<path>,cache=none,format=raw \
  -device scsi-hd,drive=hda,serial=MPIO \
  -drive if=none,id=hdb,file=<path>,cache=none,format=raw \
  -device scsi-hd,drive=hdb,serial=MPIO
  • <path> should be replaced with a file or device path (the same for each)
  • serial= specifies the SCSI logical unit serial number
This attaches two virtual SCSI devices to the VM, both of which are backed by the same file and share the same SCSI logical unit identifier.
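Each path is just another -drive/-device pair pointing at the same file with the same serial, so the options can be generated for any number of paths. The sketch below uses a hypothetical mpath_qemu_args helper. It sets serial= on the scsi-hd device. It also adds file.locking=off, because QEMU 2.10 and later lock image files, which would otherwise prevent the same file from being opened twice. Drop that option on older releases such as the QEMU 2.6.2 used here.

```shell
#!/usr/bin/env sh
# Hypothetical helper: print QEMU options (one option/value pair per line)
# exposing the same backing FILE to the guest via NPATHS virtio-scsi disks.
# mpath_qemu_args FILE NPATHS
mpath_qemu_args() {
	file="$1" n="$2" i=0
	printf '%s\n' "-device virtio-scsi-pci,id=scsi"
	while [ "$i" -lt "$n" ]; do
		# file.locking=off lets QEMU >= 2.10 open the same file more than once.
		printf '%s\n' "-drive if=none,id=path$i,file=$file,cache=none,format=raw,file.locking=off"
		# An identical serial makes the guest see one LU behind several paths.
		printf '%s\n' "-device scsi-hd,drive=path$i,serial=MPIO"
		i=$((i + 1))
	done
}
```

Usage: `qemu-kvm ... $(mpath_qemu_args /path/to/mpath.img 2)`. This relies on word splitting, so the backing file path must not contain whitespace. The drives are named path0, path1 and so on, so a later drive_del would target one of those names.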
Once booted, the SCSI devices for each corresponding path appear as sda and sdb, which are then detected as multipath enabled and subsequently mapped as dm-0:

         Starting Device-Mapper Multipath Device Controller...
[  OK  ] Started Device-Mapper Multipath Device Controller.
[    1.329668] device-mapper: multipath service-time: version 0.3.0 loaded
rapido1:/# multipath -ll
size=2.0G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=1 status=active
| `- 0:0:0:0 sda 8:0  active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 0:0:1:0 sdb 8:16 active ready running

QEMU additionally allows for virtual device hot(un)plug at runtime, which can be done from the QEMU monitor CLI (accessed via ctrl-a c) using the drive_del command. This can be used to trigger a multipath failover event:

rapido1:/# mkfs.xfs /dev/dm-0
meta-data=/dev/dm-0              isize=256    agcount=4, agsize=131072 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=524288, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
rapido1:/# mount /dev/dm-0 /mnt/
[   96.846919] XFS (dm-0): Mounting V4 Filesystem
[   96.851383] XFS (dm-0): Ending clean mount

rapido1:/# QEMU 2.6.2 monitor - type 'help' for more information
(qemu) drive_del hda

rapido1:/# echo io-to-trigger-path-failure > /mnt/failover-trigger
[  190.926579] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[  190.926588] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x2 [current] 
[  190.926589] sd 0:0:0:0: [sda] tag#0 ASC=0x3a ASCQ=0x0 
[  190.926590] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 00 00 02 00 00 01 00
[  190.926591] blk_update_request: I/O error, dev sda, sector 2
[  190.926597] device-mapper: multipath: Failing path 8:0.

rapido1:/# multipath -ll
size=2.0G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 0:0:0:0 sda 8:0  failed faulty running
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:1:0 sdb 8:16 active ready  running

The above procedure demonstrates cable-pull simulation while the broken path is used by the mounted dm-0 device. The subsequent I/O failure triggers multipath failover to the remaining good path.
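When scripting failover tests, it helps to be able to check path states without reading multipath -ll output by eye. Below is a sketch of a hypothetical awk-based parser. It assumes the multipath -ll output format shown above, where each path line carries a Host:Channel:Target:LUN tuple followed by the device name, the major:minor number, the device-mapper state and the path checker state.

```shell
#!/usr/bin/env sh
# Hypothetical helpers for parsing `multipath -ll` output on stdin.

# path_states: print "<dev> <dm_state> <checker_state>" per path, e.g.
# "sdb active ready".
path_states() {
	awk '{
		for (i = 1; i <= NF; i++) {
			# A path line has a Host:Channel:Target:LUN field, followed
			# by the device name, major:minor and the two states.
			if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
				print $(i + 1), $(i + 3), $(i + 4)
				break
			}
		}
	}'
}

# active_path_count: number of paths in the device-mapper "active" state.
active_path_count() {
	path_states | awk '$2 == "active" { n++ } END { print n + 0 }'
}
```

After the drive_del above, `multipath -ll | active_path_count` would be expected to report a single remaining active path.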

I've added this functionality to Rapido (pull-request) so that multipath failover can be performed in a couple of minutes directly from kernel source. I encourage you to give it a try for yourself!

Friday, June 9, 2017

Rapido: Quick Kernel Testing From Source (Video)

I presented a short talk at the 2017 openSUSE Conference on Linux kernel testing using Rapido.

There were many other interesting talks during the conference, all of which can be viewed on the oSC 2017 media site.
A video of my presentation is embedded below.
Many thanks to the organisers and sponsors for putting on a great event.

Tuesday, December 27, 2016

Adding Reviewed-by and Acked-by Tags with Git

This week's "Git Rocks!" moment came while I was investigating how I could automatically add Reviewed-by, Acked-by, Tested-by, etc. tags to a given commit message.

Git's interpret-trailers command is capable of testing for and manipulating arbitrary Key: Value tags in commit messages.

For example, appending Reviewed-by: MY NAME <my@email.com> to the top commit message is as simple as running:

> GIT_EDITOR='git interpret-trailers --trailer \
 "Reviewed-by: $(git config user.name) <$(git config user.email)>" \
 --in-place' git commit --amend 

Or with the help of a "git rb" alias, via:
> git config alias.rb "interpret-trailers --trailer \
 \"Reviewed-by: $(git config user.name) <$(git config user.email)>\" \
 --in-place"
> GIT_EDITOR="git rb" git commit --amend

The above examples work by replacing the normal git commit editor with a call to git interpret-trailers, which appends the desired tag to the commit message and then exits.

My specific use case is to add Reviewed-by: tags to specific commits during interactive rebase, e.g.:
> git rebase --interactive HEAD~3

This brings up an editor with a list of the top three commits in the current branch. Assuming the aforementioned rb alias has been configured, an individual commit can be given a Reviewed-by tag by adding the following line after it:

exec GIT_EDITOR="git rb" git commit --amend

As an example, the following will see three commits applied, with the commit message for two of them (d9e994e and 5f8c115) appended with my Reviewed-by tag.

pick d9e994e ctdb: Fix CID 1398179 Argument cannot be negative
exec GIT_EDITOR="git rb" git commit --amend
pick 0fb313c ctdb: Fix CID 1398178 Argument cannot be negative
#    ^^^^^^^ don't add a Reviewed-by tag for this one just yet 
pick 5f8c115 ctdb: Fix CID 1398175 Dereference after null check
exec GIT_EDITOR="git rb" git commit --amend
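Adding the exec lines by hand can itself be scripted. Git runs the editor named in GIT_SEQUENCE_EDITOR on the rebase todo file, passing the file path as the last argument. The hypothetical rb_todo helper below rewrites that file, appending the Reviewed-by exec line after each pick of the listed commits. It is a sketch that assumes the rb alias above. Either hash may be an abbreviation of the other, so hashes that are too short could match more than one commit.

```shell
#!/usr/bin/env sh
# Hypothetical helper: append a Reviewed-by "exec" line after selected
# commits in a git rebase todo file. Assumes the "git rb" alias exists.
# rb_todo HASH... TODO_FILE  (git passes the todo file as the last argument)
rb_todo() {
	# The todo file is the last argument; all preceding ones are hashes.
	for todo in "$@"; do :; done
	hashes=""
	while [ "$#" -gt 1 ]; do
		hashes="$hashes $1"
		shift
	done
	tmp="$todo.tmp.$$"
	awk -v hashes="$hashes" '
		BEGIN { n = split(hashes, h, " ") }
		# True if commit c matches a requested hash, allowing either one
		# to be an abbreviation of the other.
		function wanted(c,    i) {
			for (i = 1; i <= n; i++)
				if (index(c, h[i]) == 1 || index(h[i], c) == 1)
					return 1
			return 0
		}
		{ print }
		($1 == "pick" || $1 == "p") && wanted($2) {
			print "exec GIT_EDITOR=\"git rb\" git commit --amend"
		}
	' "$todo" > "$tmp" && mv "$tmp" "$todo"
}
```

To use it, save it in an executable script on your PATH (hypothetically named rb-todo) that ends with `rb_todo "$@"`, then run `GIT_SEQUENCE_EDITOR="rb-todo d9e994e 5f8c115" git rebase --interactive HEAD~3`. If every commit in the range should be tagged, git rebase's own --exec option is sufficient: `git rebase --interactive --exec 'GIT_EDITOR="git rb" git commit --amend' HEAD~3`.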

Bonus: By default, the vim editor includes git rebase --interactive syntax highlighting and key-bindings - if you press K while hovering over a commit hash (e.g. d9e994e from above), vim will call git show <commit-hash>, making reviewing and tagging even faster!

Note taking: Arbitrary notes can also be appended to commits using the same technique. For example, from the git interactive rebase editor:

pick 12dd8972f6e fix build
x GIT_EDITOR='git interpret-trailers --trailer "TODO-ddiss: squash with prior" --in-place' git commit --amend

Thanks to:
  • Upstream Git developers, especially those who implemented the interpret-trailers functionality.
  • My employer, SUSE.

Update 20190123:
  • Add commit message note taking example

Tuesday, December 13, 2016

Rapido: A Glorified Wrapper for Dracut and QEMU


I've blogged a few times about how Dracut and QEMU can be combined to greatly improve Linux kernel dev/test turnaround.
  • My first post covered the basics of building the kernel, running dracut, and booting the resultant image with qemu-kvm.
  • A later post took a closer look at network configuration, and focused on bridging VMs with the hypervisor.
  • Finally, my third post looked at how this technique could be combined with Ceph, to provide a similarly efficient workflow for Ceph development.
In bringing this series to a conclusion, I'd like to introduce the newly released Rapido project. Rapido combines all of the procedures and techniques described in the articles above into a handful of scripts, which can be used to test specific Linux kernel functionality, standalone or alongside other technologies such as Ceph.

Usage - Standalone Linux VM

The following procedure was tested on openSUSE Leap 42.3 and SLES 12SP3, but should work fine on many other Linux distributions.

Step 1: Checkout and Build

Checkout the Linux kernel and Rapido source repositories:

~/> cd ~
~/> git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
~/> git clone https://github.com/rapido-linux/rapido.git

Build the kernel (using a config provided with the Rapido source):
~/> cp rapido/kernel/vanilla_config linux/.config
~/> cd linux
~/linux/> make -j6
~/linux/> make modules
~/linux/> INSTALL_MOD_PATH=./mods make modules_install

Step 2: Configuration 

Install Rapido dependencies: dracut and qemu.

Create a master rapido.conf configuration file using the example template:
~/linux/> cd ~/rapido
~/rapido/> cp rapido.conf.example rapido.conf
~/rapido/> vi rapido.conf
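  • set KERNEL_SRC="/home/<user>/linux"
  • set KERNEL_INSTALL_MOD_PATH="/home/<user>/linux/mods", matching the INSTALL_MOD_PATH used for modules_install above
  • the remaining options can be left as is for now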
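Instead of editing rapido.conf by hand, the values could be set from a script. This is a minimal sketch. It assumes rapido.conf consists of shell-style KEY="value" assignments, some of which may be commented out, as the settings above suggest. conf_set is a hypothetical helper and not part of Rapido.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of Rapido): set KEY="VALUE" in a shell-style
# config file such as rapido.conf. The first assignment to KEY, commented out
# or not, is replaced; otherwise the assignment is appended.
# Note: awk -v expands backslash escapes, so VALUE shouldn't contain any.
# conf_set FILE KEY VALUE
conf_set() {
	file="$1" key="$2" val="$3"
	tmp="$file.tmp.$$"
	awk -v key="$key" -v val="$val" '
		BEGIN { line = key "=\"" val "\"" }
		!replaced && $0 ~ ("^#?[ \t]*" key "=") {
			print line
			replaced = 1
			next
		}
		{ print }
		END { if (!replaced) print line }
	' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example: `conf_set rapido.conf KERNEL_SRC "$HOME/linux"`.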

Step 3: Image Generation 

Generate a minimal Linux VM image which includes binaries, libraries and kernel modules for filesystem testing:
~/rapido/> ./cut_fstests_local.sh
 dracut: *** Creating initramfs image file 'initrds/myinitrd' done ***
~/rapido/> ls -lah initrds/myinitrd
-rw-r--r-- 1 ddiss users 30M Dec 13 18:17 initrds/myinitrd

Step 4: Boot!

~/rapido/> ./vm.sh
+ mount -t btrfs /dev/zram1 /mnt/scratch
[    3.542927] BTRFS info (device zram1): disk space caching is enabled
btrfs filesystem mounted at /mnt/test and /mnt/scratch

In a whopping four seconds, or thereabouts, the VM should have booted to a rapido1:/# bash prompt, leaving you with two zram-backed Btrfs filesystems mounted at /mnt/test and /mnt/scratch.

Everything, including the VM's root filesystem, is in memory, so any changes will not persist across reboot. Use the rapido.conf QEMU_EXTRA_ARGS parameter if you wish to add persistent storage to a VM.

Once you're done playing around, you can shutdown:
rapido1:/# shutdown
[  267.304313] sysrq: SysRq : sysrq: Power Off
rapido1:/# [  268.168447] ACPI: Preparing to enter system sleep state S5
[  268.169493] reboot: Power down
+ exit 0

Step 5: Network Configuration

The fstests_local VM above is networkless, so it doesn't require bridge network configuration. For VMs that do (e.g. the CephFS client below), edit rapido.conf:
  • set TAP_USER="<user>"
  • set MAC_ADDR1 to a valid MAC address, e.g. "b8:ac:24:45:c5:01"
  • set MAC_ADDR2 to a valid MAC address, e.g. "b8:ac:24:45:c5:02"
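MAC_ADDR1 and MAC_ADDR2 need to differ. One way to avoid colliding with vendor-assigned addresses is to generate locally administered unicast addresses, whose first octet has bit 0x02 set and bit 0x01 clear. Here is a sketch with a hypothetical gen_mac helper that uses od to read random bytes:

```shell
#!/usr/bin/env sh
# Hypothetical helper: print a random locally administered unicast MAC
# address (first octet 0x02: local bit set, multicast bit clear).
gen_mac() {
	# Five random bytes as two-digit hex, split into positional params.
	set -- $(od -An -N5 -tx1 /dev/urandom)
	printf '02:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

Run it twice and copy the two results into rapido.conf as MAC_ADDR1 and MAC_ADDR2.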

Configure the isolated bridge and tap network devices. This must be done as root:
~/rapido/> sudo tools/br_setup.sh
~/rapido/> ip addr show br0
4: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet scope global br0

Usage - Ceph vstart.sh cluster and CephFS client VM

This usage guide builds on the previous standalone Linux VM procedure, but this time adds Ceph to the mix. If you're not interested in Ceph (how could you not be!) then feel free to skip to the next section.

Step I - Checkout and Build

We already have clones of the Rapido and Linux kernel repositories. All that's needed for CephFS testing is a Ceph build:
~/> git clone https://github.com/ceph/ceph
~/> cd ceph
<install Ceph build dependencies>
~/ceph/> ./do_cmake.sh
~/ceph/> cd build
~/ceph/build/> make -j4

Step II - Start a vstart.sh Ceph "cluster"

Once Ceph has finished compiling, vstart.sh can be run with the following parameters to configure and locally start three OSDs, one monitor process, and one MDS.
~/ceph/build/> OSD=3 MON=1 RGW=0 MDS=1 ../src/vstart.sh -i -n
~/ceph/build/> bin/ceph -c status
     health HEALTH_OK
     monmap e2: 1 mons at {a=}
            election epoch 4, quorum 0 a
      fsmap e5: 1/1/1 up {0=a=up:active}
        mgr no daemons active 
     osdmap e10: 3 osds: 3 up, 3 in

Step III - Rapido configuration

Edit rapido.conf, the master Rapido configuration file:
~/ceph/build/> cd ~/rapido
~/rapido/> vi rapido.conf
  • set CEPH_SRC="/home/<user>/ceph/src"
  • KERNEL_SRC and network parameters were configured earlier

Step IV - Image Generation

The cut_cephfs.sh script generates a VM image with the Ceph configuration and keyring from the vstart.sh cluster, as well as the CephFS kernel module.
~/rapido/> ./cut_cephfs.sh
 dracut: *** Creating initramfs image file 'initrds/myinitrd' done ***

Step V - Boot!

Booting the newly generated image should bring you to a shell prompt, with the vstart.sh provisioned CephFS filesystem mounted under /mnt/cephfs:
~/rapido/> ./vm.sh
+ mount -t ceph /mnt/cephfs -o name=admin,secret=...
[    3.492742] libceph: mon0 session established
rapido1:/# df -h /mnt/cephfs
Filesystem             Size  Used Avail Use% Mounted on
...                    1.3T  611G  699G  47% /mnt/cephfs

As CephFS is a clustered filesystem, testing from multiple clients is also of interest. From another window, boot a second VM:
~/rapido/> ./vm.sh



Further Use Cases

Rapido ships with a bunch of scripts for testing different kernel components:
  • cut_cephfs.sh (shown above)
    • Image: includes Ceph config, credentials and CephFS kernel module
    • Boot: mounts CephFS filesystem
  • cut_cifs.sh
    • Image: includes CIFS (SMB client) kernel module
    • Boot: mounts share using details and credentials specified in rapido.conf
  • cut_dropbear.sh
    • Image: includes dropbear SSH server
    • Boot: starts an SSH server with SSH_AUTHORIZED_KEY
  • cut_fstests_cephfs.sh
    • Image: includes xfstests and CephFS kernel client
    • Boot: mounts CephFS filesystem and runs FSTESTS_AUTORUN_CMD
  • cut_fstests_local.sh (shown above)
    • Image: includes xfstests and local Btrfs and XFS dependencies
    • Boot: provisions local xfstest zram devices. Runs FSTESTS_AUTORUN_CMD
  • cut_lio_local.sh
    • Image: includes LIO, loopback dev and dm-delay kernel modules
    • Boot: provisions an iSCSI target, with three LUs exposed
  • cut_lio_rbd.sh
    • Image: includes LIO and Ceph RBD kernel modules
    • Boot: provisions an iSCSI target backed by CEPH_RBD_IMAGE, using target_core_rbd
  • cut_qemu_rbd.sh
    • Image: CEPH_RBD_IMAGE is attached to the VM using qemu-block-rbd
    • Boot: runs shell only
  • cut_rbd.sh
    • Image: includes Ceph config, credentials and Ceph RBD kernel module
    • Boot: maps CEPH_RBD_IMAGE using the RBD kernel client
  • cut_samba_cephfs.sh
    • Image: includes Ceph vstart config, credentials and libcephfs from CEPH_SRC, and additionally pulls in Samba from a (pre-compiled) SAMBA_SRC
    • Boot: configures smb.conf with a CephFS backed share and starts Samba
  • cut_samba_local.sh
    • Image: includes local kernel filesystem utils, and pulls in Samba from SAMBA_SRC
    • Boot: configures smb.conf with a zram backed share and starts Samba
  • cut_tcmu_rbd_loop.sh
    • Image: includes Ceph config, librados, librbd, and pulls in tcmu-runner from TCMU_RUNNER_SRC
    • Boot: starts tcmu-runner and configures a tcmu+rbd backstore exposing CEPH_RBD_IMAGE via the LIO loopback fabric
  • cut_usb_rbd.sh (see https://github.com/ddiss/rbd-usb)
    • Image: includes usb_f_mass_storage, zram, dm-crypt, and RBD_USB_SRC
    • Boot: starts the conf-fs.sh script from RBD_USB_SRC




  • Dracut and QEMU can be combined for super-fast Linux kernel testing and development.
  • Rapido is mostly just a glorified wrapper around these utilities, but does provide some useful tools for automated testing of specific Linux kernel functionality.

If you run into any problems, or wish to provide any kind of feedback (always appreciated), please feel free to leave a message below, or raise a ticket in the Rapido issue tracker.

Update 20170106:
  • Add cut_tcmu_rbd_loop.sh details and fix the example CEPH_SRC path.
Update 20180312:
  • Use KERNEL_INSTALL_MOD_PATH instead of an ugly symlink
  • Update Github links to refer to new project URL
  • Remove old brctl and tunctl dependencies
  • Split network setup into a separate section, as fstests_local VMs are now networkless
  • Add cut_samba_cephfs.sh and cut_samba_local.sh details