Provisioning without hardware support
This page has notes for performing provisioning without hardware support.
It is desirable to let people test without requiring extra hardware. Many people will not go to the trouble and expense of obtaining it, so such a requirement would dramatically shrink the pool of willing testers.
There are two main requirements for hardware for automated provisioning (and automated testing):
- the ability to reboot the board
- the ability to control menus for provisioning
I conducted some experiments to see whether sufficient software control was available to provide for mostly-automated provisioning. The idea was to use software control of the board for the majority of interactions, and fall back to user (manual) intervention if required.
That is, the experiment was to see whether, in cases where the system did not hang or corrupt itself, it could recover automatically to a "safe" mode in which it could be controlled by software alone.
I would like to develop a set of criteria for what attributes of a system make this type of operation possible. It is possible that certain features could be designed into the system, to support software-only provisioning and board automation.
Most test labs use a serial connection to the board to control the bootloader, to select the kernel under test or to provide command line parameters or RAM addresses. However, this requires that the board expose a serial port, and that the user install a serial cable from the management host to the board. Many products either do not expose a serial port at all, or the port is accessible only with great difficulty (for example, by taking the product apart and soldering to the back of the board).
Many test boards load their system software from SD cards. A test system can avoid having to interact with the firmware or product menus on the device under test by using specialized hardware (called an SDMux) to switch access to storage between the target board's SD card slot and a host machine. SDMux hardware solves a large number of problems, but it is also something most end users will not have.
Notes from GRUB-based provisioning, using a safe/test system
GRUB is a bootloader commonly used on desktop Linux systems to load Linux or other operating systems at boot time. It provides a menu, which is based on a configuration file that is most often auto-generated by scripts on the target Linux system.
Grub boots using files that are (by convention) in /boot in the root filesystem of the machine being booted. By convention, the kernel image, config file, System.map, and initrd in /boot have the kernel version as part of their filenames.
The grub menu is in the file: /boot/grub/grub.cfg, and it is auto-generated by scripts in /etc/grub.d. One of the main scripts is /etc/grub.d/10_linux, which collects all the image names from /boot and creates menu items for them, based on settings in the config file /etc/default/grub.
The tool used to rebuild the grub.cfg file is called 'update-grub'.
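After a script in /etc/grub.d is added or changed, it can be useful to check which entries ended up in the regenerated menu, and in what order. The following is a minimal sketch, assuming the usual `menuentry 'Title' ... {` line format in the generated file. The helper name list_menu_entries is made up for this example and is not part of GRUB.

```shell
#!/bin/sh
# Print the title of every menuentry in a grub.cfg, in file order.
# list_menu_entries is a hypothetical helper, not a GRUB tool.
list_menu_entries() {
    cfg="${1:-/boot/grub/grub.cfg}"
    # Titles are quoted with ' or " right after the menuentry keyword.
    sed -nE "s/^[[:space:]]*menuentry[[:space:]]+['\"]([^'\"]*)['\"].*/\1/p" "$cfg"
}

# Typical use on the target, after editing /etc/grub.d (needs root):
#   update-grub && list_menu_entries
```

The first title printed is the one that would be booted by default if no other default has been configured.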
Grub is supposed to be able to read and write information to /boot/grub/grubenv, to allow for control of grub operation. This file has special properties that allow it to be read and written with minimal disruption to the filesystem in which it resides. (Grub accesses it using its own I/O routines and those of EFI/BIOS.)
Grub uses its ability to read from grubenv to determine whether it should boot an alternate image. Specifically, it uses data from grubenv to determine whether a previous boot failed, to avoid booting into that same kernel. However, grub must also be able to write to grubenv in order to use this feature, and grub can read more filesystem types than it can currently write to. (Specifically, grub cannot write to a btrfs filesystem.)
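The environment block is small enough to inspect with ordinary text tools. The following is a sketch of reading one variable from it, assuming the plain-text layout GRUB 2 normally uses (a "# GRUB Environment Block" header, name=value lines, and "#" padding). grubenv_get is a name invented for this example. Changes should still go through grub-editenv, which keeps the file at its fixed size.

```shell
#!/bin/sh
# grubenv_get FILE NAME: print the value of NAME stored in a grubenv file.
# Hypothetical read-only helper; prints nothing if NAME is not set.
grubenv_get() {
    file="$1"
    name="$2"
    # Header and padding lines start with '#', so they never match NAME=.
    sed -n "s/^${name}=//p" "$file"
}

# On a system with the GRUB tools installed, the supported equivalent is:
#   grub-editenv /boot/grub/grubenv list
```

For example, `grubenv_get /boot/grub/grubenv next_entry` shows whether a one-time boot request is still pending.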
Design of the boot-once system
The system I tried to use for doing automated provisioning consisted of 3 main parts:
- image preparation and placement
- grub menu rebuild
- reboot logic
To provision the system, the host first boots the system into a "safe" kernel. (This step may be skipped to save time, if the test kernel appears to be functioning correctly and able to handle network traffic and filesystem operations).
I added a script to grub, called /etc/grub.d/50_test, that adds an entry for a test kernel.
The script adds a new menu entry called "Test kernel" to the grub menu, which expects the following files to be present on the system:
The provisioning system places the kernel images into the /boot directory, with the indicated names (prefixed by 'test'). We can't use the conventional name for the kernel (with the kernel version number), because the grub menu update process would find the test kernel and put it in the list of other detected kernels, possibly first. Grub boots the first kernel in the list by default, and we always want the default to be a "safe" kernel (one that we know will work on the device).
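A script along the following lines could produce such an entry. This is only a sketch, not the script that was actually used: the file names /boot/test-vmlinuz and /boot/test-initrd.img, the root device, and the TEST_* variables are placeholders chosen for illustration. It also assumes that GRUB's root variable already points at the filesystem holding /boot when the entry runs.

```shell
#!/bin/sh
# Sketch of an /etc/grub.d/50_test script. update-grub runs each script in
# /etc/grub.d and appends whatever it prints to grub.cfg.
# All paths below are hypothetical placeholders, not the original names.
TEST_KERNEL="${TEST_KERNEL:-/boot/test-vmlinuz}"
TEST_INITRD="${TEST_INITRD:-/boot/test-initrd.img}"
TEST_ROOT="${TEST_ROOT:-/dev/mmcblk0p2}"

emit_test_entry() {
    # The title must match the name later stored in next_entry.
    cat <<EOF
menuentry 'Test kernel' {
    linux ${TEST_KERNEL} root=${TEST_ROOT} ro
    initrd ${TEST_INITRD}
}
EOF
}

emit_test_entry
```

Because the script sorts after 10_linux, its entry lands below the automatically detected kernels, so the default entry stays a "safe" kernel. The script has to be executable for update-grub to run it.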
When the host prepares to reboot the system, it runs the following command on the target board:
grub-editenv /boot/grub/grubenv set next_entry="Test kernel"
This makes the next boot select the test kernel.
Then the host calls (on the target board) 'reboot', to cause a software reboot of the system.
When grub boots, it reads 'next_entry' from grubenv, then clears the value for 'next_entry' in grubenv, and then boots the requested kernel.
If no other modifications are made, the test kernel will boot only once.
If the kernel hangs, then the user may have to manually reboot the machine.
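The host-side part of this sequence (set next_entry on the target, then reboot it) can be sketched as follows. The use of ssh, the root login, and the TARGET_HOST variable are assumptions made for this example; any mechanism for running a command on the target would do.

```shell
#!/bin/sh
# Hypothetical host-side helpers; TARGET_HOST and the root login are assumptions.
TARGET_HOST="${TARGET_HOST:-board}"

# Run one command line on the target board over ssh.
run_on_target() {
    ssh "root@${TARGET_HOST}" "$1"
}

# boot_once ENTRY: ask GRUB to boot ENTRY on the next boot only, then reboot.
# ENTRY must not contain single quotes (it is quoted for the remote shell).
boot_once() {
    entry="$1"
    run_on_target "grub-editenv /boot/grub/grubenv set 'next_entry=${entry}'" || return 1
    # The ssh session is usually cut off by the reboot, so its exit
    # status is not a reliable success signal.
    run_on_target "reboot"
    return 0
}

# Example: boot_once "Test kernel"
```

If setting next_entry fails, the function returns without rebooting, so the board stays on the kernel it is currently running.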
Grub Issues (on potato)
The potato board SD card image that I used (one with Ubuntu 18.04) was partitioned with a VFAT partition (for EFI data), a BTRFS partition (with 3 sub-volumes), and a swap partition.
Since grub could not
Criteria for hardware-less updates (using a boot-once system)
- Firmware can support boot-once mode only if it has the following capability:
- Ability to write a bit to a persistent location before starting the system software (the software under test)
- Automation can be greatly enhanced if there is a hardware watchdog that can reboot the board when the software under test hangs
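The first criterion amounts to reading and then clearing a persistent flag before control is handed to the software under test. The sketch below simulates that decision in shell, with an ordinary file standing in for the persistent location. The function name, the flag file, and the "safe" label are invented for illustration and do not correspond to any particular firmware.

```shell
#!/bin/sh
# choose_boot_entry FLAG_FILE: print the entry the firmware should boot.
# Simulation only: FLAG_FILE stands in for a persistent firmware location.
choose_boot_entry() {
    flag_file="$1"
    if [ -s "$flag_file" ]; then
        entry=$(cat "$flag_file")
        # Clear the request BEFORE booting, so a hang or crash in the
        # requested image cannot cause it to be selected again.
        if { printf '' > "$flag_file"; } 2>/dev/null; then
            printf '%s\n' "$entry"
            return 0
        fi
        # The flag could not be cleared (storage not writable):
        # fall through to the safe entry rather than risk a boot loop.
    fi
    printf '%s\n' "safe"
}
```

Calling the function twice on a file containing a one-time request returns the requested entry the first time and "safe" the second time. If the flag cannot be cleared, the sketch never boots the requested entry at all, which is one conservative way to handle storage the firmware cannot write.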