        Mandatory File Locking For The Linux Operating System

                Andy Walker <andy@lysaker.kvaerner.no>

                           15 April 1996


1. What is mandatory locking?
-----------------------------

Mandatory locking is kernel enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine which is a wrapper around fcntl().) It is
normally a process' responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl().
The granularity of fcntl() is such that it allows the locking of byte ranges in
files, in addition to entire files, so the mandatory locking rules also have
byte level granularity.

Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(
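
Appendix: an illustrative sketch
--------------------------------

The marking rule of section 2 and the locking interface named in section 4
can be sketched in C roughly as follows. This is a minimal sketch and not
part of the kernel implementation: the helper names
is_mandatory_candidate(), mark_mandatory_candidate() and
lock_region_exclusive() are hypothetical, invented for this illustration.
Only fstat(), fchmod() and fcntl() with F_SETLK are real interfaces. Whether
the kernel then enforces the lock as mandatory depends on kernel and
filesystem support, which this sketch does not check.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Section 2 rule: group-id bit set, group-execute bit clear.
 * Returns 1 if the mode marks a mandatory lock candidate, else 0. */
int is_mandatory_candidate(mode_t mode)
{
    return (mode & S_ISGID) != 0 && (mode & S_IXGRP) == 0;
}

/* Hypothetical helper: mark an open file as a mandatory lock candidate
 * by setting the group-id bit and removing the group-execute bit.
 * Returns 0 on success, -1 on error (errno set by fstat()/fchmod()). */
int mark_mandatory_candidate(int fd)
{
    struct stat st;
    mode_t mode;

    assert(fd >= 0);
    if (fstat(fd, &st) == -1)
        return -1;

    mode = st.st_mode & 07777;      /* keep only the permission bits */
    mode |= S_ISGID;                /* set the group-id bit */
    mode &= ~(mode_t)S_IXGRP;       /* remove the group-execute bit */
    return fchmod(fd, mode);
}

/* Hypothetical helper: request a write ("exclusive") lock on the byte
 * range starting at 'start' and covering 'len' bytes, using the same
 * fcntl() call as for an advisory lock. A 'len' of 0 means "to the end
 * of the file". F_SETLK does not wait: it returns -1 at once if a
 * conflicting lock is held. Returns 0 on success, -1 on error. */
int lock_region_exclusive(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

A caller would typically open the file, call mark_mandatory_candidate() once,
and then take locks with lock_region_exclusive(). Under the semantics of
section 4, another process that opened the file with O_NONBLOCK should be
prepared to see EAGAIN from read() or write() on the locked region.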