---
title: Automatic Boot Assessment
category: Booting
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Automatic Boot Assessment

systemd provides support for automatically reverting to the previous
version of the OS or kernel in case the system consistently fails to boot. This
support is built into several of its components. When used together these
components provide a complete solution on UEFI systems, built as an add-on to the
[Boot Loader Specification](BOOT_LOADER_SPECIFICATION.md).
However, the different components may also be used independently, and in
combination with other software, to implement similar schemes, for example with
other boot loaders or for non-UEFI systems. Here's a brief overview of the
complete set of components:

* The
  [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
  boot loader optionally maintains a per-boot-loader-entry counter that is
  decreased by one on each attempt to boot the entry, prioritizing entries that
  have non-zero counters over those which already reached a counter of zero
  when choosing the entry to boot.

* The
  [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
  service automatically marks a boot loader entry, for which boot counting as
  mentioned above is enabled, as "good" when a boot has been determined to be
  successful, thus turning off boot counting for it.

* The
  [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
  generator automatically pulls in `systemd-bless-boot.service` when use of
  `systemd-boot` with boot counting enabled is detected.

* The
  [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
  service is a simple health check tool that determines whether the boot
  completed successfully. When enabled it becomes an indirect dependency of
  `systemd-bless-boot.service` (by means of `boot-complete.target`, see
  below), ensuring that the boot will not be considered successful if there are
  any failed services.

* The `boot-complete.target` target unit (see
  [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
  serves as a generic extension point both for units that are necessary to
  consider a boot successful (example: `systemd-boot-check-no-failures.service`
  as described above), and units that want to act only if the boot is
  successful (example: `systemd-bless-boot.service` as described above).

* The
  [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
  script can optionally create boot loader entries that carry an initial boot
  counter (the initial counter is configurable in `/etc/kernel/tries`).

## Details

The boot counting data `systemd-boot` and `systemd-bless-boot.service`
manage is stored in the name of the boot loader entries. If a boot loader entry
file name contains `+` followed by one or two numbers (if two numbers, then
those need to be separated by `-`) right before the `.conf` suffix, then boot
counting is enabled for it. The first number is the "tries left" counter
encoding how many attempts to boot this entry shall still be made. The second
number is the "tries done" counter, encoding how many failed attempts to boot
it have already been made. Each time a boot loader entry marked this way is
booted the first counter is decreased by one, and the second one increased by
one.
(If the second counter is missing, then it is assumed to be equivalent to
zero.) If the "tries left" counter is above zero the entry is still considered
for booting (the entry's state is considered to be "indeterminate"); as soon as
it reaches zero the entry is not tried anymore (entry state "bad"). If the boot
attempt completed successfully the entry's counters are removed from the name
(entry state "good"), thus turning off boot counting for the future.

## Walkthrough

Here's an example walkthrough of how this all fits together.

1. The user runs `echo 3 > /etc/kernel/tries` to enable boot counting.

2. A new kernel is installed. `kernel-install` is used to generate a new boot
   loader entry file for it. Let's say the version string for the new kernel is
   `4.14.11-300.fc27.x86_64`; a new boot loader entry
   `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.

3. The system is booted for the first time after the new kernel is
   installed. The boot loader now sees the `+3` counter in the entry file
   name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`,
   indicating that one attempt has now been started and two are left. After
   the rename has completed the entry is booted as usual.

4. Let's say this attempt to boot fails. On the following boot the boot loader
   will hence see the `+2-1` tag in the name, and hence rename the entry file to
   `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.

5. Let's say the boot fails again. On the subsequent boot the loader hence will
   see the `+1-2` tag, and rename the file to
   `4.14.11-300.fc27.x86_64+0-3.conf` and boot it.

6. If this boot also fails, on the next boot the boot loader will see the
   tag `+0-3`, i.e. the counter reached zero. At this point the entry will be
   considered "bad", and ordered to the beginning of the list of entries. The
   next newest boot entry is now tried, i.e.
   the system automatically reverted
   back to an earlier version.

The above describes the walkthrough when the selected boot entry continuously
fails. Let's have a look at an alternative ending to this walkthrough. In this
scenario the first four steps are the same as above:

1. *as above*

2. *as above*

3. *as above*

4. *as above*

5. Let's say the second boot succeeds. The kernel initializes properly, systemd
   is started and invokes all generators.

6. One of the generators started is `systemd-bless-boot-generator` which
   detects that boot counting is used. It hence pulls
   `systemd-bless-boot.service` into the initial transaction.

7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
   `boot-complete.target` unit. This unit is hence also pulled into the initial
   transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
   that are required to succeed for the boot process to be considered
   successful. One such unit is `systemd-boot-check-no-failures.service`.

9. `systemd-boot-check-no-failures.service` is run after all its own
   dependencies completed, and assesses that the boot completed
   successfully. It hence exits cleanly.

10. This allows `boot-complete.target` to be reached. This signifies to the
    system that this boot attempt shall be considered successful.

11. This in turn permits `systemd-bless-boot.service` to run. It now
    determines which boot loader entry file was used to boot the system, and
    renames it dropping the counter tag. Thus
    `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
    `4.14.11-300.fc27.x86_64.conf`. From this moment boot counting is turned
    off.

12. On the following boot (and all subsequent boots after that) the entry is
    now seen with boot counting turned off, no further renaming takes place.

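
The renames in both walkthrough endings follow directly from the file name
convention described under "Details". The following is a minimal Python sketch
of that convention only, not the code `systemd-boot` or
`systemd-bless-boot.service` actually run; the function names
(`parse_counters`, `name_for_next_attempt`, `blessed_name`) are hypothetical
and chosen for illustration.

```python
import re

# Matches "<base>+<tries left>[-<tries done>].conf".
# Entry names without a counter tag do not match at all.
COUNTER_RE = re.compile(r"^(?P<base>.+)\+(?P<left>\d+)(?:-(?P<done>\d+))?\.conf$")


def parse_counters(name):
    """Return (base, tries_left, tries_done), or None if boot counting is off."""
    m = COUNTER_RE.match(name)
    if m is None:
        return None
    # A missing "tries done" counter is treated as zero.
    return m["base"], int(m["left"]), int(m["done"] or 0)


def name_for_next_attempt(name):
    """Name the entry would carry once the next boot attempt has started."""
    parsed = parse_counters(name)
    if parsed is None:
        return name  # "good": boot counting is off, no rename
    base, left, done = parsed
    if left == 0:
        return name  # "bad": the entry is not tried anymore
    return f"{base}+{left - 1}-{done + 1}.conf"


def blessed_name(name):
    """Name after a successful boot: the counter tag is dropped."""
    parsed = parse_counters(name)
    return name if parsed is None else f"{parsed[0]}.conf"
```

Applying `name_for_next_attempt` repeatedly to
`4.14.11-300.fc27.x86_64+3.conf` yields the `+2-1`, `+1-2` and `+0-3` names
from the first walkthrough, while `blessed_name` applied to the `+1-2` name
gives the counter-free name from step 11 of the alternative ending.
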
## How to adapt this scheme to other setups

Of the stack described above many components may be replaced or augmented. Here
are a couple of recommendations.

1. To support alternative boot loaders in place of `systemd-boot` two scenarios
   are recommended:

    a. Boot loaders already implementing the Boot Loader Specification can simply
       implement an equivalent file rename based logic, and thus integrate fully
       with the rest of the stack.

    b. Boot loaders that want to implement boot counting and store the counters
       elsewhere can provide their own replacements for
       `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
       continue to use `boot-complete.target` and thus support any services
       ordered before that.

2. To support additional components that shall succeed before the boot is
   considered successful, simply place them in units (if they aren't already)
   and order them before the generic `boot-complete.target` target unit,
   combined with `Requires=` dependencies from the target, so that the target
   cannot be reached when any of the units fail. You may add any number of
   units like this, and only if they all succeed the boot entry is marked as
   good. Note that the target unit shall pull in these boot checking units, not
   the other way around.

3. To support additional components that shall only run on boot success, simply
   wrap them in a unit and order them after `boot-complete.target`, pulling it
   in.

## FAQ

1. *Why do you use file renames to store the counter? Why not a regular file?*
   — Mainly two reasons: it's relatively likely that renames can be implemented
   atomically even in simpler file systems, while writing to file contents has
   a much bigger chance to result in incomplete or corrupt data, as renaming
   generally avoids allocating or releasing data blocks.
   Moreover it has the
   benefit that the boot count metadata is directly attached to the boot loader
   entry file, and thus the lifecycle of the metadata and the entry itself are
   bound together. This means no additional clean-up needs to take place to
   drop the boot loader counting information for an entry when it is removed.

2. *Why not use EFI variables for storing the boot counter?* — The memory chips
   used to back the persistent EFI variables are generally not of the highest
   quality, hence shouldn't be written to more than necessary. This means we
   can't really use them for changes made regularly during boot, but can use
   them only for rarely made configuration changes.

3. *I have a service which — when it fails — should immediately cause a
   reboot. How does that fit in with the above?* — Well, that's orthogonal to
   the above, please use `FailureAction=` in the unit file for this.

4. *Under some condition I want to mark the current boot loader entry as bad
   right away, so that it never is tried again, how do I do that?* — You may
   invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
   current boot loader entry as "bad" right away so that it isn't tried again
   on later boots.
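
As an illustration of the last answer, the following minimal Python sketch
wraps the invocation of `/usr/lib/systemd/systemd-bless-boot bad` so that a
program can trigger it once its own condition is met. The function names and
the `run` parameter are hypothetical; only the binary path and the `bad` verb
are taken from the answer above. The sketch assumes it runs with sufficient
privileges on a system booted with boot counting support.

```python
import subprocess

# Path and verb as given in the FAQ answer above.
BLESS_BOOT = "/usr/lib/systemd/systemd-bless-boot"


def mark_current_entry_bad(run=subprocess.run):
    """Invoke `systemd-bless-boot bad` for the currently booted entry.

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError because check=True is passed.
    """
    return run([BLESS_BOOT, "bad"], check=True)


def mark_bad_if(condition, run=subprocess.run):
    """Mark the current entry as bad only if the caller's condition holds.

    Returns True if the tool was invoked, False otherwise.
    """
    if condition:
        mark_current_entry_bad(run)
        return True
    return False
```

The `run` parameter exists only so the call can be replaced by a stub during
testing; in normal use the default `subprocess.run` is sufficient.
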