Typically, A/B updates are implemented using separate old and new filesystem images on separate, equally sized disk partitions. However, modern copy-on-write filesystems offer faster and more space-efficient alternatives, as described below.
A/B Updates Using Btrfs Subvolume Snapshots
Linux's Btrfs filesystem provides support for snapshots at a subvolume level, which can be used for A/B system updates. A typical procedure would be:
- The current OS version is running atop an old read-only subvolume
- When an update is available, the old subvolume is cloned as a writeable snapshot under a newly created path within the filesystem
- The upgrade is written to the new snapshot subvolume path (e.g. via btrfs receive)
- The new snapshot is configured as the default subvolume, causing it to be mounted on next boot
- If any issues are encountered during or after the update, the default subvolume change is reverted, the old OS version is booted, and the new subvolume is subsequently discarded
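The steps above can be sketched as a small shell helper. This is a minimal illustration rather than a definitive implementation: the subvolume paths, the /mnt/btrfs-top mount point and the function names are hypothetical, it assumes a btrfs-progs version that accepts a subvolume path for set-default, and the step that actually writes the upgrade is left as a placeholder. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: paths are hypothetical and assume the top-level Btrfs
# subvolume is mounted at /mnt/btrfs-top.
OLD_SUBVOL="${OLD_SUBVOL:-/mnt/btrfs-top/os-old}"
NEW_SUBVOL="${NEW_SUBVOL:-/mnt/btrfs-top/os-new}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

snapshot_stage_update() {
    # Without -r, the snapshot is writeable.
    run btrfs subvolume snapshot "$OLD_SUBVOL" "$NEW_SUBVOL" || return 1
    # Placeholder: write the upgrade into "$NEW_SUBVOL" here.
    # Make the new snapshot the default subvolume for the next boot.
    run btrfs subvolume set-default "$NEW_SUBVOL"
}

snapshot_rollback_update() {
    # Point the default back at the old subvolume, then discard the new one.
    run btrfs subvolume set-default "$OLD_SUBVOL" || return 1
    run btrfs subvolume delete "$NEW_SUBVOL"
}
```

With the functions sourced into a shell, calling snapshot_stage_update prints the two commands it would run; snapshot_rollback_update covers the failure path.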
This procedure works well: it's space-efficient, allows as many old versions to be retained as desired, and doesn't require any specific block device partitioning scheme. Given these benefits, it's unsurprising that SUSE uses a similar approach to provide Transactional Update functionality. However, there are still some minor caveats:
- Currently Btrfs only provides atomic snapshots for single subvolumes, meaning that the above procedure shouldn't be used if an OS update modifies multiple subvolumes
- The update procedure must be aware of the new subvolume path to target for I/O
  - An alternative may be to create a read-only snapshot before upgrading in place, similar to snapper-based rollback
A/B Updates Using Btrfs Seed Devices
Btrfs seed devices offer copy-on-write support at a block device level, which can also be used to provide A/B system updates, with fallback between new and old block devices instead of subvolumes.
The following seed device example requires two or more separate block devices (or partitions), with one acting as a read-only seed device and another as a read-write "sprout" device.
- The currently running OS version is backed by an old block device, flagged as a read-only seed via
btrfstune -S 1 /dev/old_block_dev
- When an update is available, the new writeable "sprout" device is added to the Btrfs filesystem via
btrfs device add /dev/new_block_device /
- The filesystem is remounted read-write
- The update is written in-place, with Btrfs ensuring that all update I/O is written to the newly added block device
- The new block device is flagged for the bootloader as the default boot device
- If any issues are encountered during or after the update, the default boot device change is reverted and the new block device can be discarded
  - The previous OS version remains untouched on the old device for fallback
- Once the new OS version is deemed stable, the old seed device should be removed from the filesystem, which will cause dependent data from the old device to be merged into the new
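The seed device steps above can be sketched in the same way. Again, this is only an illustration under stated assumptions: the function names are hypothetical, the device paths are the placeholders used above, the bootloader change is omitted because it depends on the bootloader in use, and the update itself is a placeholder. Note that btrfstune operates on an unmounted filesystem, so the seed flag has to be set before the old device is mounted as the running root. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: device paths are placeholders taken from the steps above.
OLD_DEV="${OLD_DEV:-/dev/old_block_dev}"
NEW_DEV="${NEW_DEV:-/dev/new_block_device}"
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

seed_prepare() {
    # Flag the old device as a seed; run this while it is unmounted.
    run btrfstune -S 1 "$OLD_DEV"
}

seed_stage_update() {
    # Add the writeable sprout device, then allow writes to the filesystem.
    run btrfs device add "$NEW_DEV" / || return 1
    run mount -o remount,rw /
    # Placeholder: apply the update in place here; new writes go to NEW_DEV.
}

seed_finalize_update() {
    # Once the new version is deemed stable, drop the seed device.
    # Data still needed from it is relocated to the new device.
    run btrfs device remove "$OLD_DEV" /
}
```

Splitting the flow into three functions mirrors the three points in time at which they run: before first boot from the seed, when an update arrives, and after the new version has proven stable.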
This seed device approach removes some of the constraints of the Btrfs subvolume approach, namely:
- The update procedure can atomically apply changes across multiple subvolumes, with seed-device rollback safely reverting all subvolume changes made
- After read-write remount, the update process can perform I/O to the running system in-place, without any specific knowledge of the seed device usage or underlying filesystem
This functionality may be attractive for Linux distributions, particularly if adding A/B update support to an existing update process with little filesystem integration. However, there remain a number of trade-offs to consider:
- Seed devices are significantly less space-efficient than snapshot-based A/B updates
  - Each block device must have sufficient capacity to store the OS
- Removing the old seed device from the updated filesystem incurs significant I/O overhead, which snapshot-based A/B updates avoid
  - Btrfs at least provides some compensation for this by verifying data checksums
- Btrfs seed device support appears somewhat niche compared to regular subvolume snapshots, so it likely receives less filesystem test focus