---
title: Automatic Boot Assessment
category: Booting
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Automatic Boot Assessment

systemd provides support for automatically reverting back to the previous
version of the OS or kernel in case the system consistently fails to boot. This
support is built into several of its components. When used together, these
components provide a complete solution on UEFI systems, built as an add-on to the
[Boot Loader Specification](BOOT_LOADER_SPECIFICATION.md).
However, the different components may also be used independently, and in
combination with other software, to implement similar schemes, for example with
other boot loaders or for non-UEFI systems. Here's a brief overview of the
complete set of components:

* The
  [`systemd-boot(7)`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html)
  boot loader optionally maintains a per-boot-loader-entry counter that is
  decreased by one on each attempt to boot the entry, prioritizing entries that
  have non-zero counters over those which already reached a counter of zero
  when choosing the entry to boot.

* The
  [`systemd-bless-boot.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot.service.html)
  service automatically marks a boot loader entry, for which boot counting as
  mentioned above is enabled, as "good" when a boot has been determined to be
  successful, thus turning off boot counting for it.

* The
  [`systemd-bless-boot-generator(8)`](https://www.freedesktop.org/software/systemd/man/systemd-bless-boot-generator.html)
  generator automatically pulls in `systemd-bless-boot.service` when use of
  `systemd-boot` with boot counting enabled is detected.

* The
  [`systemd-boot-check-no-failures.service(8)`](https://www.freedesktop.org/software/systemd/man/systemd-boot-check-no-failures.service.html)
  service is a simple health check tool that determines whether the boot
  completed successfully. When enabled, it becomes an indirect dependency of
  `systemd-bless-boot.service` (by means of `boot-complete.target`, see
  below), ensuring that the boot will not be considered successful if there are
  any failed services.

* The `boot-complete.target` target unit (see
  [`systemd.special(7)`](https://www.freedesktop.org/software/systemd/man/systemd.special.html))
  serves as a generic extension point both for units that are necessary to
  consider a boot successful (example: `systemd-boot-check-no-failures.service`
  as described above), and units that want to act only if the boot is
  successful (example: `systemd-bless-boot.service` as described above).

* The
  [`kernel-install(8)`](https://www.freedesktop.org/software/systemd/man/kernel-install.html)
  script can optionally create boot loader entries that carry an initial boot
  counter (the initial counter is configurable in `/etc/kernel/tries`).

## Details

The boot counting data that `systemd-boot` and `systemd-bless-boot.service`
manage is stored in the names of the boot loader entry files. If a boot loader
entry file name contains `+` followed by one or two numbers (if two numbers,
then those need to be separated by `-`) right before the `.conf` suffix, then
boot counting is enabled for it. The first number is the "tries left" counter,
encoding how many attempts to boot this entry shall still be made. The second
number is the "tries done" counter, encoding how many failed attempts to boot
it have already been made. Each time a boot loader entry marked this way is
booted, the first counter is decreased by one and the second one is increased
by one. (If the second counter is missing, it is assumed to be zero.) As long
as the "tries left" counter is above zero, the entry is still considered for
booting (the entry's state is "indeterminate"). As soon as it reaches zero, the
entry is not tried anymore (entry state "bad"). If the boot attempt completes
successfully, the entry's counters are removed from the name (entry state
"good"), thus turning off boot counting for the future.
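
The naming rule described above can be sketched in a few lines of Python. This
is an illustrative sketch only, not the parser used by `systemd-boot`: the names
`COUNTER_RE`, `BootCount`, `parse_boot_count` and `entry_state` are made up for
this example, and treating an entry without a counter tag as "good" is an
assumption derived from the description above.

```python
import re
from typing import NamedTuple, Optional

# Matches "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix.
COUNTER_RE = re.compile(r"\+(\d+)(?:-(\d+))?\.conf$")


class BootCount(NamedTuple):
    tries_left: int
    tries_done: int


def parse_boot_count(filename: str) -> Optional[BootCount]:
    """Return the counters encoded in an entry file name, or None if
    boot counting is not enabled for that entry."""
    m = COUNTER_RE.search(filename)
    if m is None:
        return None
    # A missing second number is treated as zero.
    return BootCount(int(m.group(1)), int(m.group(2) or 0))


def entry_state(filename: str) -> str:
    """Classify an entry as "good", "indeterminate" or "bad"."""
    count = parse_boot_count(filename)
    if count is None:
        return "good"  # assumption: no counter tag means counting is off
    return "indeterminate" if count.tries_left > 0 else "bad"


print(entry_state("4.14.11-300.fc27.x86_64+1-2.conf"))
```

Anchoring the pattern at the `.conf` suffix keeps a `+` that appears elsewhere
in the file name from being mistaken for a counter tag.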

## Walkthrough

Here's an example walkthrough of how this all fits together.

1. The user runs `echo 3 > /etc/kernel/tries` to enable boot counting.

2. A new kernel is installed. `kernel-install` is used to generate a new boot
   loader entry file for it. Let's say the version string for the new kernel is
   `4.14.11-300.fc27.x86_64`; a new boot loader entry
   `/boot/loader/entries/4.14.11-300.fc27.x86_64+3.conf` is hence created.

3. The system is booted for the first time after the new kernel is
   installed. The boot loader now sees the `+3` counter in the entry file
   name. It hence renames the file to `4.14.11-300.fc27.x86_64+2-1.conf`,
   indicating that at this point one attempt has been started and thus one
   fewer is left. After the rename has completed, the entry is booted as usual.

4. Let's say this attempt to boot fails. On the following boot the boot loader
   will hence see the `+2-1` tag in the name, and hence rename the entry file to
   `4.14.11-300.fc27.x86_64+1-2.conf`, and boot it.

5. Let's say the boot fails again. On the subsequent boot the loader hence will
   see the `+1-2` tag, and rename the file to
   `4.14.11-300.fc27.x86_64+0-3.conf` and boot it.

6. If this boot also fails, on the next boot the boot loader will see the
   tag `+0-3`, i.e. the counter reached zero. At this point the entry will be
   considered "bad", and ordered to the beginning of the list of entries. The
   next newest boot entry is now tried, i.e. the system automatically reverted
   back to an earlier version.
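
The sequence of renames in the steps above can be replayed with a small
simulation. This is a sketch, not the boot loader's code: `initial_entry_name`
and `rename_for_attempt` are hypothetical helper names, no files are touched,
and leaving an entry whose counter already reached zero unchanged is a
simplification made for this example.

```python
import re

# Matches "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix.
COUNTER_RE = re.compile(r"\+(\d+)(?:-(\d+))?\.conf$")


def initial_entry_name(version: str, tries: int) -> str:
    """Entry name as created for a new kernel with an initial counter."""
    return f"{version}+{tries}.conf"


def rename_for_attempt(filename: str) -> str:
    """Return the entry name after one more boot attempt has been started."""
    m = COUNTER_RE.search(filename)
    if m is None:
        return filename  # boot counting is off: no rename
    left, done = int(m.group(1)), int(m.group(2) or 0)
    if left == 0:
        return filename  # simplification: a "bad" entry is left untouched
    stem = filename[: m.start()]
    return f"{stem}+{left - 1}-{done + 1}.conf"


name = initial_entry_name("4.14.11-300.fc27.x86_64", 3)
history = [name]
for _ in range(3):  # three boot attempts that all fail
    name = rename_for_attempt(name)
    history.append(name)

print("\n".join(history))
```

The printed names follow the tags from the walkthrough: `+3`, `+2-1`, `+1-2`
and finally `+0-3`.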

The above describes the walkthrough when the selected boot entry continuously
fails. Let's have a look at an alternative ending to this walkthrough. In this
scenario the first four steps are the same as above:

1. *as above*

2. *as above*

3. *as above*

4. *as above*

5. Let's say the second boot succeeds. The kernel initializes properly, systemd
   is started and invokes all generators.

6. One of the generators started is `systemd-bless-boot-generator`, which
   detects that boot counting is used. It hence pulls
   `systemd-bless-boot.service` into the initial transaction.

7. `systemd-bless-boot.service` is ordered after and `Requires=` the generic
   `boot-complete.target` unit. This unit is hence also pulled into the initial
   transaction.

8. The `boot-complete.target` unit is ordered after and pulls in various units
   that are required to succeed for the boot process to be considered
   successful. One such unit is `systemd-boot-check-no-failures.service`.

9. `systemd-boot-check-no-failures.service` is run after all its own
   dependencies have completed, and assesses that the boot completed
   successfully. It hence exits cleanly.

10. This allows `boot-complete.target` to be reached. This signifies to the
    system that this boot attempt shall be considered successful.

11. This in turn permits `systemd-bless-boot.service` to run. It now
    determines which boot loader entry file was used to boot the system, and
    renames it, dropping the counter tag. Thus
    `4.14.11-300.fc27.x86_64+1-2.conf` is renamed to
    `4.14.11-300.fc27.x86_64.conf`. From this moment on, boot counting is
    turned off for this entry.

12. On the following boot (and all subsequent boots after that) the entry is
    now seen with boot counting turned off; no further renaming takes place.
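
The rename performed in the final steps above can be sketched as follows. This
only illustrates the file name transformation and is not the implementation of
`systemd-bless-boot.service`: `blessed_name` and `bless_entry` are hypothetical
names, the step of finding out which entry was actually booted is left out, and
the demonstration works on a temporary scratch directory rather than on
`/boot/loader/entries`.

```python
import os
import re
import tempfile

# Matches "+LEFT" or "+LEFT-DONE" right before the ".conf" suffix.
COUNTER_RE = re.compile(r"\+(\d+)(?:-(\d+))?\.conf$")


def blessed_name(filename: str) -> str:
    """Drop the counter tag, leaving names without a tag unchanged."""
    return COUNTER_RE.sub(".conf", filename)


def bless_entry(entries_dir: str, filename: str) -> str:
    """Rename an entry in entries_dir to its counter-free name."""
    new_name = blessed_name(filename)
    if new_name != filename:
        # A single rename: the entry's content is never rewritten.
        os.rename(os.path.join(entries_dir, filename),
                  os.path.join(entries_dir, new_name))
    return new_name


# Demonstration on a scratch directory standing in for the entries directory.
with tempfile.TemporaryDirectory() as scratch:
    booted = "4.14.11-300.fc27.x86_64+1-2.conf"
    open(os.path.join(scratch, booted), "w").close()
    print(bless_entry(scratch, booted))
```

Because the counters live only in the file name, marking the entry as good
needs nothing more than this one rename.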

## How to adapt this scheme to other setups

Many components of the stack described above may be replaced or augmented. Here
are a few recommendations.

1. To support alternative boot loaders in place of `systemd-boot` two scenarios
   are recommended:

    a. Boot loaders already implementing the Boot Loader Specification can simply
       implement an equivalent file rename based logic, and thus integrate fully
       with the rest of the stack.

    b. Boot loaders that want to implement boot counting and store the counters
       elsewhere can provide their own replacements for
       `systemd-bless-boot.service` and `systemd-bless-boot-generator`, but should
       continue to use `boot-complete.target` and thus support any services
       ordered before that.

2. To support additional components that shall succeed before the boot is
   considered successful, simply place them in units (if they aren't already)
   and order them before the generic `boot-complete.target` target unit,
   combined with `Requires=` dependencies from the target, so that the target
   cannot be reached when any of the units fail. You may add any number of
   units like this, and only if they all succeed is the boot entry marked as
   good. Note that the target unit shall pull in these boot checking units, not
   the other way around.

3. To support additional components that shall only run on boot success, simply
   wrap them in a unit and order that unit after `boot-complete.target`,
   pulling the target in.
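
As a rough illustration of recommendations 2 and 3, the following Python sketch
renders unit file text with the dependencies described above. Everything
specific in it is an assumption made for the example: the helper names
`render_check_unit` and `render_on_success_unit`, the descriptions and the
`/usr/local/bin/...` commands are hypothetical. Only the relationship to
`boot-complete.target` is taken from the text: a check is ordered before the
target and required by it, while a success-only unit is ordered after the
target and requires it.

```python
def render_check_unit(description: str, command: str) -> str:
    """Unit text for a check that must succeed before the boot counts as good."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "Before=boot-complete.target",
        "",
        "[Service]",
        "Type=oneshot",
        "RemainAfterExit=yes",
        f"ExecStart={command}",
        "",
        "[Install]",
        # When the unit is enabled, the target gains a Requires= dependency
        # on it, so the target pulls in the check, not the other way around.
        "RequiredBy=boot-complete.target",
        "",
    ])


def render_on_success_unit(description: str, command: str) -> str:
    """Unit text for work that shall only run once the boot was successful."""
    return "\n".join([
        "[Unit]",
        f"Description={description}",
        "Requires=boot-complete.target",
        "After=boot-complete.target",
        "",
        "[Service]",
        "Type=oneshot",
        f"ExecStart={command}",
        "",
    ])


# Hypothetical commands, used for illustration only.
print(render_check_unit("Check that the application answers",
                        "/usr/local/bin/my-health-check"))
print(render_on_success_unit("Report a successful boot",
                             "/usr/local/bin/my-report-success"))
```

A success-only unit written this way still has to be pulled into the boot
transaction by something else, for example a `WantedBy=` install section,
which the sketch leaves out.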

## FAQ

1. *Why do you use file renames to store the counter? Why not a regular file?*
   Mainly two reasons: it's relatively likely that renames can be implemented
   atomically even in simpler file systems, while writing to file contents has
   a much bigger chance of resulting in incomplete or corrupt data, as renaming
   generally avoids allocating or releasing data blocks. Moreover, it has the
   benefit that the boot count metadata is directly attached to the boot loader
   entry file, and thus the lifecycle of the metadata and the entry itself are
   bound together. This means no additional clean-up needs to take place to
   drop the boot loader counting information for an entry when it is removed.

2. *Why not use EFI variables for storing the boot counter?* The memory chips
   used to back the persistent EFI variables are generally not of the highest
   quality, hence shouldn't be written to more than necessary. This means we
   can't really use them for changes made regularly during boot, but only for
   infrequent configuration changes.

3. *I have a service which, when it fails, should immediately cause a
   reboot. How does that fit in with the above?* Well, that's orthogonal to
   the above; please use `FailureAction=` in the unit file for this.

4. *Under some condition I want to mark the current boot loader entry as bad
   right away, so that it never is tried again. How do I do that?* You may
   invoke `/usr/lib/systemd/systemd-bless-boot bad` at any time to mark the
   current boot loader entry as "bad" right away so that it isn't tried again
   on later boots.