README
        Daemontools and runit

Tired of PID files, needing root access, and writing init scripts just
to have your UNIX apps start when your server boots? Want a simpler,
better alternative that will also restart them if they crash? If so,
this is an introduction to process supervision with runit/daemontools.


        Background

Classic init scripts, e.g. /etc/init.d/apache, are widely used for
starting processes at system boot time, when they are executed by init.
Sadly, init scripts are cumbersome and error-prone to write, they must
typically be edited and run as root, and the processes they launch do
not get restarted automatically if they crash.

In an alternative scheme called "process supervision", each important
process is looked after by a tiny supervising process, which deals with
starting and stopping the important process on request, and re-starting
it when it exits unexpectedly. Those supervising processes can in turn
be supervised by other supervising processes.

Dan Bernstein wrote the process supervision toolkit, "daemontools",
which is a set of small, reliable programs that cooperate in the
UNIX tradition to manage process supervision trees.

Runit is a more conveniently licensed and more actively maintained
reimplementation of daemontools, written by Gerrit Pape.

Here I'll use runit; however, the ideas are the same for other
daemontools-like projects (there are several).


        Service directories and scripts

In runit parlance a "service" is simply a directory containing a script
named "run".

There are just two key programs in runit. Firstly, runsv supervises the
process for an individual service. Service directories themselves sit
inside a containing directory, and the runsvdir program supervises that
directory, running one child runsv process for the service in each
subdirectory. A typical choice is to start an instance of runsvdir
which supervises services in subdirectories of /var/service/.
45
46If /var/service/log/ exists, runsv will supervise two services,
47and will connect stdout of main service to the stdin of log service.
48This is primarily used for logging.
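
The layout described above can be sketched as follows. This is only an
illustration built in a scratch directory: the service name "mydaemon",
its command line and its --foreground flag are made up, and the log path
simply follows the /var/log/service/NAME pattern used by the examples in
this directory.

```shell
#!/bin/sh
# Sketch: lay out a service directory with a log service.
# "mydaemon", /usr/sbin/mydaemon and --foreground are hypothetical.
base=$(mktemp -d)              # scratch stand-in for /var/service
svdir="$base/mydaemon"
mkdir -p "$svdir/log"

# Main service: exec the daemon in the foreground, so that runsv
# supervises the daemon itself and not an intermediate shell.
cat >"$svdir/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec /usr/sbin/mydaemon --foreground
EOF

# Log service: receives the main service's stdout on its stdin.
cat >"$svdir/log/run" <<'EOF'
#!/bin/sh
exec svlogd -tt /var/log/service/mydaemon
EOF

chmod +x "$svdir/run" "$svdir/log/run"
echo "created $svdir"
```

Running "runsv" on such a directory would start both scripts; nothing in
the sketch itself starts a supervisor.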

You can debug an individual service by running its SERVICE_DIR/run script.
In this case, its stdout and stderr go to your terminal.

You can also run "runsv SERVICE_DIR", which runs both the service
and its logger service (SERVICE_DIR/log/run) if logger service exists.
If logger service exists, the output will go to it instead of the terminal.

"runsvdir /var/service" merely runs "runsv SERVICE_DIR" for every subdirectory
in /var/service.


        Examples

This directory contains some examples of services:

    var_service/getty_<tty>

Runs a getty on <tty>. (run script looks at $PWD and extracts suffix
after "_" as tty name). Create copies (or symlinks) of this directory
with different names to run many gettys on many ttys.
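
A minimal sketch of that naming trick, assuming plain POSIX parameter
expansion is enough. The helper name suffix_after_underscore is made up
for illustration, and the real run script may extract the suffix
differently.

```shell
#!/bin/sh
# Sketch: derive a name from the part of a directory name after "_".
# suffix_after_underscore is a hypothetical helper, not part of runit.
suffix_after_underscore() {
    dir=${1%/}                  # drop a trailing slash, if any
    name=${dir##*/}             # last path component, e.g. getty_tty1
    printf '%s\n' "${name#*_}"  # text after the first "_", e.g. tty1
}

suffix_after_underscore /var/service/getty_tty1
suffix_after_underscore /var/service/dhcp_eth0
```

In a run script this would be used as tty=$(suffix_after_underscore "$PWD").
Note that a directory name without any "_" is returned unchanged.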

    var_service/gpm

Runs gpm, the cut and paste utility and mouse server for text consoles.

    var_service/inetd

Runs inetd. This is an example of a service with log. Log service
writes timestamped, rotated log data to /var/log/service/inetd/*
using "svlogd -tt". p_log and w_log scripts demonstrate how you can
"page log" and "watch log".

Other services which have logs handle them in the same way.

    var_service/nmeter

Runs nmeter '%t %c ....' with output to /dev/tty9. This gives you
a 1-second sampling of server load and health on a dedicated text console.


        Networking examples

In many cases, network configuration makes it necessary to run several daemons:
dhcp, zeroconf, ppp, openvpn and such. They need to be controlled,
and in many cases you also want to babysit them.

They present a case where different services need to control (start, stop,
restart) each other.

    var_service/dhcp_if

Controls a udhcpc instance which provides DHCP-assigned IP
address on interface named "if". Copy/rename this directory as needed to run
udhcpc on other interfaces (var_service/dhcp_if/run script uses _foo suffix
of the parent directory as interface name).

When IP address is obtained or lost, var_service/dhcp_if/dhcp_handler is run.
It saves new config data to /var/run/service/fw/dhcp_if.ipconf and (re)starts
/var/service/fw service. This example can be used as a template for other
dynamic network link services (ppp/vpn/zcip).

This is an example of a service which has a "finish" script. When the service
is downed ("sv d"), "finish" is executed. For this service, it removes the DHCP
address from the interface. This is useful when ifplugd detects that the link
is dead (cable is no longer attached anywhere) and downs us: keeping
DHCP-configured addresses on the interface would make the kernel still try
to use it.

    var_service/zcip_if

Zeroconf IP service: assigns a 169.254.x.y/16 address to interface "if".
This allows talking to other devices on a network without a DHCP server
(if they also assign 169.254 addresses to themselves).

    var_service/ifplugd_if

Watches link status of interface "if". Downs and ups /var/service/dhcp_if
service accordingly. In effect, it allows you to unplug/plug-to-different-network
and have your IP properly re-negotiated at once.

    var_service/dhcp_if_pinger

Uses var_service/dhcp_if's data to determine router IP. Pings it.
If ping fails, restarts /var/service/dhcp_if service.
Basically, an example of watchdog service for networks which are not reliable
and need babysitting.

    var_service/supplicant_if

Wireless supplicant (wifi association and encryption daemon) service for
interface "if".

    var_service/fw

"Firewall" script, although it is tasked with much more than setting up firewall.
It is responsible for all aspects of network configuration.

This is an example of *one-shot* service.

It reconfigures network based on current known state of ALL interfaces.
Uses conf/*.ipconf (static config) and /var/run/service/fw/*.ipconf
(dynamic config from dhcp/ppp/vpn/etc) to determine what to do.

One-shot-ness of this service means that it shuts itself off after single run.
IOW: it is not a constantly running daemon sort of thing.
It starts, it configures the network, it shuts down, all done
(unlike infamous NetworkManagers which sit in RAM forever).

However, any dhcp/ppp/vpn or similar service can restart it anytime
when it senses the change in network configuration.
This even works while fw service runs: if dhcp signals fw to (re)start
while fw runs, fw will not stop after its execution, but will re-execute once,
picking up dhcp's new configuration.
This is achieved very simply by having
    # Make ourself one-shot
    sv o .
at the very beginning of fw/run script, not at the end.

Therefore, any "sv u fw" command by any other script "undoes" o(ne-shot)
command if fw still runs, thus runsv will rerun it; or start it
in a normal way if fw is not running.

This mechanism is the reason why fw is a service, not just a script.

System administrators are expected to edit fw/run script, since
network configuration needs are likely to be very complex and different
for non-trivial installations.

    var_service/ftpd
    var_service/httpd
    var_service/tftpd
    var_service/ntpd

Examples of typical network daemons.


        Process tree

Here is an example of the process tree from a live system with these services
(and a few others). An interesting detail is the ftpd and vpnc services, where
you can see only the logger process. These services are "downed" at the moment:
their daemons are not launched.

PID   TIME COMMAND
553   0:04 runsvdir -P /var/service
561   0:00   runsv sshd
576   0:00     svlogd -tt /var/log/service/sshd
589   0:00     /usr/sbin/sshd -D -e -p22 -u0 -h /var/service/sshd/ssh_host_rsa_key
562   0:00   runsv dhcp_eth0
568   0:00     svlogd -tt /var/log/service/dhcp_eth0
850   0:00     udhcpc -vv --foreground --interface=eth0
                 --pidfile=/var/service/dhcp_eth0/udhcpc.pid
                 --script=/var/service/dhcp_eth0/dhcp_handler
                 -x hostname bbox
563   0:00   runsv ntpd
573   0:01     svlogd -tt /var/log/service/ntpd
845   0:00     busybox ntpd -dddnNl -S ./ntp.script -p 10.x.x.x -p 10.x.x.x
564   0:00   runsv ifplugd_wlan0
598   0:00     svlogd -tt /var/log/service/ifplugd_wlan0
614   0:05     ifplugd -apqns -t3 -u0 -d0 -i wlan0
                 -r /var/service/ifplugd_wlan0/ifplugd_handler
565   0:08   runsv dhcp_wlan0_pinger
911   0:00     sleep 67
566   0:00   runsv unscd
583   0:03     svlogd -tt /var/log/service/unscd
599   0:02     nscd -dddd
567   0:00   runsv dhcp_wlan0
591   0:00     svlogd -tt /var/log/service/dhcp_wlan0
802   0:00     udhcpc -vv -C -o -V --foreground --interface=wlan0
                 --pidfile=/var/service/dhcp_wlan0/udhcpc.pid
                 --script=/var/service/dhcp_wlan0/dhcp_handler
569   0:00   runsv fw
570   0:00   runsv ifplugd_eth0
597   0:00     svlogd -tt /var/log/service/ifplugd_eth0
612   0:05     ifplugd -apqns -t3 -u8 -d8 -i eth0
                 -r /var/service/ifplugd_eth0/ifplugd_handler
571   0:00   runsv zcip_eth0
590   0:00     svlogd -tt /var/log/service/zcip_eth0
607   0:01     zcip -fvv eth0 /var/service/zcip_eth0/zcip_handler
572   0:00   runsv ftpd
604   0:00     svlogd -tt /var/log/service/ftpd
574   0:00   runsv vpnc
603   0:00     svlogd -tt /var/log/service/vpnc
575   0:00   runsv httpd
602   0:00     svlogd -tt /var/log/service/httpd
622   0:00     busybox httpd -p80 -vvv -f -h /home/httpd_root
577   0:00   runsv supplicant_wlan0
627   0:00     svlogd -tt /var/log/service/supplicant_wlan0
638   0:03     wpa_supplicant -i wlan0
                 -c /var/service/supplicant_wlan0/wpa_supplicant.conf -d

README_distro_proposal.txt
        A distro which already uses runit

I installed Void Linux, in order to see what they have.
Xfce desktop looks fairly okay, network is up.
ps tells me they did put X, dbus, NM and udev into runsvdir-supervised tree:

    1 ?        00:00:01 runit
  623 ?        00:00:00   runsvdir
  629 ?        00:00:00     runsv
  650 tty1     00:00:00       agetty
  630 ?        00:00:00     runsv
  644 ?        00:00:09       NetworkManager
 1737 ?        00:00:00         dhclient
  631 ?        00:00:00     runsv
  639 tty4     00:00:00       agetty
  632 ?        00:00:00     runsv
  640 ?        00:00:00       sshd
 1804 ?        00:00:00         sshd
 1809 pts/3    00:00:00           sh
 1818 pts/3    00:00:00             ps
  633 ?        00:00:00     runsv
  637 tty5     00:00:00       agetty
  634 ?        00:00:00     runsv
  796 ?        00:00:00       dhclient
  635 ?        00:00:00     runsv
  649 ?        00:00:00       uuidd
  636 ?        00:00:00     runsv
  647 ?        00:00:00       acpid
  638 ?        00:00:00     runsv
  652 ?        00:00:00       console-kit-dae
  641 ?        00:00:00     runsv
  651 tty6     00:00:00       agetty
  642 ?        00:00:00     runsv
  660 tty2     00:00:00       agetty
  643 ?        00:00:00     runsv
  657 ?        00:00:02       dbus-daemon
  645 ?        00:00:00     runsv
  658 ?        00:00:00       cgmanager
  648 ?        00:00:00     runsv
  656 tty3     00:00:00       agetty
  653 ?        00:00:00     runsv
  655 ?        00:00:00       lxdm-binary
  698 tty7     00:00:14         Xorg
  729 ?        00:00:00         lxdm-session
  956 ?        00:00:00           sh
  982 ?        00:00:00             xfce4-session
 1006 ?        00:00:04               nm-applet
  654 ?        00:00:00     runsv
  659 ?        00:00:00       udevd

Here is a link to Void Linux's wiki:

    https://wiki.voidlinux.eu/Runit

Void Linux packages install their services as subdirectories of /etc/sv,
such as /etc/sv/sshd, with a script file, "run", and a link
"supervise" -> /run/runit/supervise.sshd

For sshd, "run" contains:

    #!/bin/sh
    ssh-keygen -A >/dev/null 2>&1 # generate host keys if they don't exist
    [ -r conf ] && . ./conf
    exec /usr/bin/sshd -D $OPTS

That's it from the POV of the packager.

This is pretty minimalistic, and yet, it is already distro-specific:
the link to /run/runit/* is conceptually wrong: it requires packagers
to know that /etc/sv should not be mutable and thus they need to use
a different location in filesystem for supervise/ directory.

I think a good thing would be to require just one file: the "run" script.
The rest should be handled by distro tooling, not by packager.

A similar issue is arising with logging. It would be ideal if packagers
would not need to know how a particular distro manages logs.
Whatever their daemons print to stdout/stderr, should be automagically logged
in a way distro prefers.

* * * * * * * *

        Proposed "standard" on how distros should use runit

The original idea of services-as-directories belongs to D.J.Bernstein (djb),
and his project to implement it is daemontools: https://cr.yp.to/daemontools.html

There are several reimplementations of daemontools:
- runit: by Gerrit Pape, http://smarden.org/runit/
  (busybox has it included)
- s6: by Laurent Bercot, http://skarnet.org/software/s6/

It is not required that a specific clone should be used. Let evolution work.


        Terminology

daemon: any long running background program. Common examples are sshd, getty,
ntpd, dhcp client...

service: daemon controlled by a service monitor.

service directory: a directory with an executable file (script) named "run"
which (usually) execs some daemon, possibly after some preparatory steps.
It should start it not as a child or daemonized process, but by exec'ing it
(inheriting the same PID and the place in the process tree).

service monitor: a tool which watches a set of service directories.
In daemontools package, it is called "svscan". In runit, it is called
"runsvdir". In s6, it is called "s6-svscan".
Service monitor starts a supervisor for each service directory.
If a supervisor dies, service monitor restarts it. If a service directory
disappears, its supervisor will not be restarted if it dies.
runit's service monitor (runsvdir) sends SIGTERM to supervisors
whose directories disappeared.

supervisor: a tool which monitors one service directory.
It runs "run" script as its child. It restarts it if it dies.
It can be instructed to start/stop/signal its child.
In daemontools package, it is called "supervise". In runit, it is called
"runsv". In s6, it is called "s6-supervise".

Conceptually, a daemontools clone can be designed such that it does not *have*
the supervisor component: service monitor can directly monitor all its daemons
(for example, this may be a good idea for memory-constrained systems).
However all three existing projects (daemontools/runit/s6) do have a per-service
supervisor process.

log service: a service which is exclusively tasked with logging
the output of another service. It is implemented as log/ subdirectory
in a service directory. It has the same structure as "normal"
service dirs: it has a "run" script which starts a logging tool.

If log service exists, stdout of its "main" service is piped
to log service. Stops/restarts of either of them do not sever the pipe
between them.

If log service exists, daemontools and s6 run a pair of supervisors
(one for the daemon, one for the logger); runit runs only one supervisor
per service, which is handling both of them (presumably this is done
to use fewer processes and thus, fewer resources).


        User API

"Users" of service monitoring are authors of software which has daemons.
They need to package their daemons to be installed as services at package
install time. And they need to do this for many distros.
The less distros diverge, the easier users' lives are.

System-wide service dirs reside in a distro-specific location.
The recommended location is /var/service. (However, since it is not
a mandatory location, avoid depending on it in your run scripts.
Void Linux wanted to have it somewhere in /run/*, and they solved this
by making /var/service a symlink).

The install location for service dirs is /etc/rc:
when e.g. ntpd daemon is installed, it creates the /etc/rc/ntpd
directory with (minimally) one executable file (script) named "run"
which starts ntpd daemon. It can have other files there.

At boot, distro should copy /etc/rc/* to a suitable writable
directory (common choices are /var/service, /run/service etc).
It should create log/ directories in each subdirectory
and create "run" files in them with suitable (for this particular distro)
logging tool invocation, unless this distro chose to channel
all logging from all daemons through service monitor process
and log all of them into one file/database/whatever,
in which case log/ directories should not be created.

It is allowable for a distro to directly use /etc/rc/ as the only
location of its service directories. (For example,
/var/service may be a symlink to /etc/rc).
However, it poses some problems:

(1) Supervision tools will need to write to subdirectories:
the control of running daemons is implemented via some files and fifos
in automatically created supervise/ subdirectory in each /etc/rc/DIR.

(2) Creation of a new service can race with the rescanning of /etc/rc/
by service monitor: service monitor may see a directory with only some files
present. If it attempts to start the service in this state, all sorts
of bad things may happen. This may be worked around by various
heuristics in service monitor which give new service a few seconds
of "grace time" to be fully populated; but this is not yet
implemented in any of the three packages.
This also may be worked around by creating a .dotdir (a directory
whose name starts with a dot), populating it, and then renaming;
but packaging tools usually do not have an option to do this
automatically - additional install scripting in packages will be needed.
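
The .dotdir workaround can be sketched as below. The scratch directory
stands in for the scanned service directory, and the "ntpd" name and its
command line are illustrative only; the point is the populate-then-rename
order.

```shell
#!/bin/sh
# Sketch: populate a service directory under a dot-name, then rename it.
# "live" is a scratch stand-in for the directory the service monitor scans.
live=$(mktemp -d)
tmp="$live/.ntpd.new"          # dot-name: not picked up as a service yet
mkdir "$tmp"

cat >"$tmp/run" <<'EOF'
#!/bin/sh
exec 2>&1
exec ntpd -n
EOF
chmod +x "$tmp/run"

# A rename within one filesystem is atomic: a scan sees either no "ntpd"
# directory at all, or a fully populated one.
mv "$tmp" "$live/ntpd"
```

The temporary directory must live on the same filesystem as its final
location, otherwise mv falls back to copy-and-delete and the race returns.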

Daemons' output file descriptors are handled somewhat awkwardly
by various daemontools implementations. For example, for runit tools,
daemons' stdout goes to wherever runsvdir's stdout was directed;
stderr goes to runsvdir, which in turn "rotates" it on its command line
(which is visible in ps output).

Hopefully this gets changed/standardized; while it is not, the "run" file
should start with a

    exec 2>&1

command, making stderr equivalent to stdout.
An especially primitive service which does not want its output to be logged
with standard tools can do

    exec >LOGFILE 2>&1

or even

    exec >/dev/null 2>&1
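
The effect of "exec 2>&1" can be checked without any supervisor at all.
In this small sketch the two inline scripts stand in for "run" files, and
the pipe read by $(...) plays the role of the pipe to the log service;
that substitution is my assumption for the sake of a self-contained demo.

```shell
#!/bin/sh
# Sketch: show that "exec 2>&1" sends stderr down the same pipe as stdout.
# Only what arrives on the captured stdout pipe ends up in the variables.
without=$(sh -c 'echo out; echo err >&2' 2>/dev/null)
with=$(sh -c 'exec 2>&1; echo out; echo err >&2' 2>/dev/null)

# Unquoted expansion joins the captured lines with spaces for display.
echo "without exec 2>&1: [$(echo $without)]"   # prints [out]
echo "with exec 2>&1:    [$(echo $with)]"      # prints [out err]
```

Without the redirection the "err" line never reaches the pipe, which is
exactly what happens to a daemon's stderr when a logger is attached.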

To prevent creation of distro-specific log/ directory, a service directory
in /etc/rc can contain an empty "log" file.
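
Put together, the boot-time step proposed above might look roughly like
this. It is a sketch under several assumptions: populate_live is a made-up
helper, scratch directories stand in for /etc/rc and the live location,
"svlogd -tt /var/log/service/NAME" is just one possible distro logger
choice, and dropping the "log" marker file from the live copy is my own
design decision.

```shell
#!/bin/sh
# Sketch: copy installed service dirs to a writable "live" location and
# add log/run to each, unless the package opted out with a file named "log".
populate_live() {
    src=$1
    dst=$2
    for d in "$src"/*/; do
        [ -x "${d}run" ] || continue          # not a service directory
        name=$(basename "$d")
        new="$dst/.$name.new"                 # dot-name until complete
        rm -rf "$new"
        mkdir -p "$new"
        cp -R "${d}." "$new/"
        if [ -f "$new/log" ]; then
            rm -f "$new/log"                  # opt-out marker: no logger
        elif [ ! -d "$new/log" ]; then
            mkdir "$new/log"
            printf '#!/bin/sh\nexec svlogd -tt /var/log/service/%s\n' \
                "$name" >"$new/log/run"
            chmod +x "$new/log/run"
        fi
        if [ -e "$dst/$name" ]; then
            rm -rf "$new"                     # already live: leave it alone
        else
            mv "$new" "$dst/$name"            # atomic activation
        fi
    done
}

# Demo with scratch directories instead of /etc/rc and /var/service.
etc_rc=$(mktemp -d)
live=$(mktemp -d)
mkdir "$etc_rc/ntpd" "$etc_rc/quiet"
printf '#!/bin/sh\nexec 2>&1\nexec ntpd -n\n' >"$etc_rc/ntpd/run"
printf '#!/bin/sh\nexec >/dev/null 2>&1\nexec quietd\n' >"$etc_rc/quiet/run"
: >"$etc_rc/quiet/log"                        # empty file: no log service
chmod +x "$etc_rc/ntpd/run" "$etc_rc/quiet/run"
populate_live "$etc_rc" "$live"
```

After the call, the "ntpd" copy has a generated log/run while the "quiet"
copy has none. The "quietd" daemon is, of course, hypothetical.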


        Controlling daemons

The "svc" tool is available for admins and scripts to control services.
In particular, often one service needs to control another:
e.g. ifplugd can detect that the network cable was just plugged in,
and it needs to (re)start DHCP service for this network device.

The name of this tool is not standard either, which is an obvious problem.
I propose to fix this by implementing a tool with fixed name and API by all
daemontools clones. Let's use original daemontools name and API. Thus:

The following form must work:

    svc -udopchaitkx DIR

Options map to up/down/once/STOP/CONT/HUP/ALRM/INT/TERM/KILL/exit
commands to the daemon being controlled.
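
For reference, the letter-to-command mapping listed above, spelled out as
a lookup. The helper svc_letter_meaning is purely illustrative (it is not
part of any daemontools clone) and does not talk to a supervisor.

```shell
#!/bin/sh
# Sketch: the svc option letters and the commands they stand for.
# svc_letter_meaning is a hypothetical helper for illustration only.
svc_letter_meaning() {
    case $1 in
        u) echo up ;;
        d) echo down ;;
        o) echo once ;;
        p) echo STOP ;;
        c) echo CONT ;;
        h) echo HUP ;;
        a) echo ALRM ;;
        i) echo INT ;;
        t) echo TERM ;;
        k) echo KILL ;;
        x) echo exit ;;
        *) echo "unknown option: $1" >&2; return 1 ;;
    esac
}

for letter in u d o p c h a i t k x; do
    printf 'svc -%s  %s\n' "$letter" "$(svc_letter_meaning "$letter")"
done
```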

The form with one option letter must work. If multiple-option form
is supported, there is no guarantee in which order they take effect:
svc -it DIR can deliver TERM and INT in any order.

If more than one DIR can be specified (which is not a requirement),
there is no guarantee in which order commands are sent to them.

If DIR has no slash and is not "." or "..", it is assumed to be
relative to the system-wide service directory.
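
The DIR lookup rule is simple enough to sketch in a few lines. The names
SVDIR_ROOT and resolve_service_dir are assumptions made for this example,
and /var/service is only the recommended location, not a mandatory one.

```shell
#!/bin/sh
# Sketch of the DIR lookup rule. SVDIR_ROOT and resolve_service_dir are
# hypothetical names; a real tool would know its distro's location.
SVDIR_ROOT=${SVDIR_ROOT:-/var/service}

resolve_service_dir() {
    case $1 in
        */*|.|..) printf '%s\n' "$1" ;;              # use the path as given
        *)        printf '%s\n' "$SVDIR_ROOT/$1" ;;  # bare name: system-wide
    esac
}

resolve_service_dir ntpd       # bare name, looked up under SVDIR_ROOT
resolve_service_dir ./ntpd     # has a slash, used as is
resolve_service_dir .          # "." always means the current directory
```

This is what lets a run script say "sv o ." or "svc -o ." about itself
while other scripts refer to it simply as "fw".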

[Currently, "svc" exists only in daemontools and in busybox.
This proposal asks developers of other daemontools implementations
to add "svc" command to their projects]

The "svok DIR" tool exits 0 if service supervisor is running
(with service itself either running or stopped), and nonzero if not.

Other tools with different names and APIs may exist; however
for portability scripts should use the above tools.

Creation of a new service on a running system should be done atomically.
To this end, first create and populate a new /etc/rc/DIR.

Then "activate" it by running ??????? - this copies (or symlinks,
depending on the distro) its files to the "live" service directory,
wherever it is located on this distro.

Removal of the service should be done as follows:
svc -d DIR [DIR/log], then remove the service directory:
this makes service monitor SIGTERM per-directory supervisors
(if they exist in the implementation).


        Implementation details

Top-level service monitor program name is not standardized
[svscan, runsvdir, s6-svscan ...] - it does not need to be,
as far as daemon packagers are concerned.

It may run one per-directory supervisor, or two supervisors
(one for DIR/ and one for DIR/log/); for memory-constrained systems
an implementation is possible which itself controls all services, without
intermediate supervisors.
[runsvdir runs one "runsv DIR" per DIR, runsv handles DIR/log/ if that exists]
[svscan runs a pair of "supervise DIR" and "supervise DIR/log"]

Directories are remembered by device+inode numbers, not names. Renaming a directory
does not affect the running service (unless it is renamed to a .dotdir).

Removal (or .dotdiring) of a directory sends SIGTERM to any running services.

Standard output of non-logged services goes to standard output of service monitor.
Standard output of logger services goes to standard output of service monitor.
Standard error of them always goes to standard error of service monitor.

If you want to log standard error of your logged service along with its stdout, use
"exec 2>&1" in the beginning of your "run" script.

Whether stdout/stderr of service monitor is discarded (>/dev/null)
or logged in some way is system-dependent.


        Containers

[What do containers need?]
