1@node I/O Overview, I/O on Streams, Pattern Matching, Top
2@c %MENU% Introduction to the I/O facilities
3@chapter Input/Output Overview
4
5Most programs need to do either input (reading data) or output (writing
6data), or most frequently both, in order to do anything useful.  @Theglibc{}
7provides such a large selection of input and output functions
8that the hardest part is often deciding which function is most
9appropriate!
10
11This chapter introduces concepts and terminology relating to input
12and output.  Other chapters relating to the GNU I/O facilities are:
13
14@itemize @bullet
15@item
16@ref{I/O on Streams}, which covers the high-level functions
17that operate on streams, including formatted input and output.
18
19@item
20@ref{Low-Level I/O}, which covers the basic I/O and control
21functions on file descriptors.
22
23@item
24@ref{File System Interface}, which covers functions for operating on
25directories and for manipulating file attributes such as access modes
26and ownership.
27
28@item
29@ref{Pipes and FIFOs}, which includes information on the basic interprocess
30communication facilities.
31
32@item
33@ref{Sockets}, which covers a more complicated interprocess communication
34facility with support for networking.
35
36@item
37@ref{Low-Level Terminal Interface}, which covers functions for changing
38how input and output to terminals or other serial devices are processed.
39@end itemize
40
41
42@menu
43* I/O Concepts::       Some basic information and terminology.
44* File Names::         How to refer to a file.
45@end menu
46
47@node I/O Concepts, File Names,  , I/O Overview
48@section Input/Output Concepts
49
50Before you can read or write the contents of a file, you must establish
51a connection or communications channel to the file.  This process is
52called @dfn{opening} the file.  You can open a file for reading, writing,
53or both.
54@cindex opening a file
55
56The connection to an open file is represented either as a stream or as a
57file descriptor.  You pass this as an argument to the functions that do
58the actual read or write operations, to tell them which file to operate
59on.  Certain functions expect streams, and others are designed to
60operate on file descriptors.
61
When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file.  Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.
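
As a minimal sketch of this life cycle, the following opens a file once
through a stream and once through a file descriptor, transfers some
data, and closes each connection.  The two function names are
hypothetical helpers invented for this illustration; only
@code{fopen}, @code{fputs}, @code{fclose}, @code{open}, @code{read}
and @code{close} are library functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open NAME for writing through a stream, write
   TEXT, and close the stream.  Returns 0 on success, -1 on failure.  */
int
write_text_via_stream (const char *name, const char *text)
{
  FILE *stream = fopen (name, "w");
  if (stream == NULL)
    return -1;
  int failed = fputs (text, stream) == EOF;
  /* fclose flushes buffered output; the stream is unusable afterwards.  */
  if (fclose (stream) == EOF)
    failed = 1;
  return failed ? -1 : 0;
}

/* Hypothetical helper: open NAME for reading through a file descriptor
   and return its first byte, or -1 if it is empty or cannot be read.  */
int
first_byte_via_descriptor (const char *name)
{
  int fd = open (name, O_RDONLY);
  if (fd < 0)
    return -1;
  unsigned char byte;
  ssize_t n = read (fd, &byte, 1);
  close (fd);                 /* no further I/O is possible on fd */
  return n == 1 ? byte : -1;
}
```

Writing a file with the first helper and then reading it with the
second should return the first character that was written, since both
connections refer to the same file.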
66
67@menu
68* Streams and File Descriptors::    The GNU C Library provides two ways
69			             to access the contents of files.
70* File Position::                   The number of bytes from the
71                                     beginning of the file.
72@end menu
73
74@node Streams and File Descriptors, File Position,  , I/O Concepts
75@subsection Streams and File Descriptors
76
77When you want to do input or output to a file, you have a choice of two
78basic mechanisms for representing the connection between your program
79and the file: file descriptors and streams.  File descriptors are
80represented as objects of type @code{int}, while streams are represented
81as @code{FILE *} objects.
82
83File descriptors provide a primitive, low-level interface to input and
84output operations.  Both file descriptors and streams can represent a
85connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file.  However,
if you want to do control operations that are specific to a particular
kind of device, you must use a file descriptor; there are no facilities
to use streams in this way.  You must also use file descriptors if your
90program needs to do input or output in special modes, such as
91nonblocking (or polled) input (@pxref{File Status Flags}).
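
The following sketch shows one such special mode.  It assumes a pipe as
the input source and uses @code{fcntl} with @code{F_GETFL} and
@code{F_SETFL} to turn on @code{O_NONBLOCK}; the helper names
@code{set_nonblocking} and @code{read_empty_pipe_nonblocking} are
invented for this example and are not library functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn on O_NONBLOCK for FD.
   Returns 0 on success, -1 on error.  */
int
set_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical demonstration: create a pipe, make its read end
   nonblocking, and read from it while it is still empty.  Returns the
   errno value set by read, 0 if read unexpectedly succeeded, or -1 if
   the setup failed.  */
int
read_empty_pipe_nonblocking (void)
{
  int fds[2];
  if (pipe (fds) == -1)
    return -1;
  int result = -1;
  if (set_nonblocking (fds[0]) == 0)
    {
      char byte;
      ssize_t n = read (fds[0], &byte, 1);
      result = n == -1 ? errno : 0;
    }
  close (fds[0]);
  close (fds[1]);
  return result;
}
```

Instead of waiting for data, the read should fail immediately with
@code{EAGAIN}, which lets the program poll the descriptor and do other
work in the meantime.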
92
93Streams provide a higher-level interface, layered on top of the
94primitive file descriptor facilities.  The stream interface treats all
95kinds of files pretty much alike---the sole exception being the three
96styles of buffering that you can choose (@pxref{Stream Buffering}).
97
98The main advantage of using the stream interface is that the set of
99functions for performing actual input and output operations (as opposed
100to control operations) on streams is much richer and more powerful than
101the corresponding facilities for file descriptors.  The file descriptor
102interface provides only simple functions for transferring blocks of
103characters, but the stream interface also provides powerful formatted
104input and output functions (@code{printf} and @code{scanf}) as well as
105functions for character- and line-oriented input and output.
106@c !!! glibc has dprintf, which lets you do printf on an fd.
107
108Since streams are implemented in terms of file descriptors, you can
109extract the file descriptor from a stream and perform low-level
110operations directly on the file descriptor.  You can also initially open
111a connection as a file descriptor and then make a stream associated with
112that file descriptor.
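
A short sketch of both directions, under the assumption that a
temporary file created with @code{mkstemp} is an acceptable target:
@code{fdopen} builds a stream on an existing descriptor, and
@code{fileno} recovers the descriptor from the stream.  The function
name and the file name template are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demonstration.  Returns 0 if the descriptor survives the
   round trip and the formatted output reached the file, -1 otherwise.  */
int
descriptor_to_stream_and_back (void)
{
  char name[] = "/tmp/io-demo-XXXXXX";
  int fd = mkstemp (name);            /* start with a descriptor */
  if (fd == -1)
    return -1;
  unlink (name);                      /* the file goes away when closed */

  FILE *stream = fdopen (fd, "w+");   /* make a stream on top of it */
  if (stream == NULL)
    {
      close (fd);
      return -1;
    }

  fprintf (stream, "%d apples\n", 3); /* formatted output via the stream */
  fflush (stream);                    /* push buffered data to the file */

  /* Extract the descriptor again and use a descriptor-level function.  */
  int same_fd = fileno (stream);
  off_t size = lseek (same_fd, 0, SEEK_END);

  fclose (stream);                    /* also closes the descriptor */
  return (same_fd == fd && size == 9) ? 0 : -1;
}
```

The call to @code{fflush} matters here: the stream buffers its output,
so descriptor-level functions see the data only after it has been
flushed.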
113
114In general, you should stick with using streams rather than file
115descriptors, unless there is some specific operation you want to do that
116can only be done on a file descriptor.  If you are a beginning
117programmer and aren't sure what functions to use, we suggest that you
118concentrate on the formatted input functions (@pxref{Formatted Input})
119and formatted output functions (@pxref{Formatted Output}).
120
If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams.  You can expect any system that supports @w{ISO C}
to provide streams, but @nongnusystems{} may not support file descriptors
at all, or may implement only a subset of the GNU functions that operate
on file descriptors.  Most of the file descriptor functions in
@theglibc{} are included in the POSIX.1 standard, however.
128
129@node File Position,  , Streams and File Descriptors, I/O Concepts
130@subsection File Position
131
One of the attributes of an open file is its @dfn{file position}, which
keeps track of where in the file the next character is to be read or
written.  On @gnusystems{}, and all POSIX.1 systems, the file position
is simply an integer representing the number of bytes from the beginning
of the file.
137
138The file position is normally set to the beginning of the file when it
139is opened, and each time a character is read or written, the file
140position is incremented.  In other words, access to the file is normally
141@dfn{sequential}.
142@cindex file position
143@cindex sequential-access files
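
Sequential access can be observed with @code{ftell}, which reports the
file position of a stream.  The sketch below assumes a temporary file
from @code{tmpfile}; the function name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical demonstration: write TEXT to a newly opened temporary
   stream and return the resulting file position, or -1 on error.  */
long
position_after_writing (const char *text)
{
  FILE *stream = tmpfile ();
  if (stream == NULL)
    return -1;
  long position = -1;
  /* A freshly opened file starts at position 0.  */
  if (ftell (stream) == 0 && fputs (text, stream) != EOF)
    position = ftell (stream);  /* advanced by one per character */
  fclose (stream);
  return position;
}
```

For a five-character string the function should return 5, because each
character written advances the position by one byte.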
144
145Ordinary files permit read or write operations at any position within
146the file.  Some other kinds of files may also permit this.  Files which
147do permit this are sometimes referred to as @dfn{random-access} files.
148You can change the file position using the @code{fseek} function on a
149stream (@pxref{File Positioning}) or the @code{lseek} function on a file
150descriptor (@pxref{I/O Primitives}).  If you try to change the file
151position on a file that doesn't support random access, you get the
152@code{ESPIPE} error.
153@cindex random-access files
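
The difference can be sketched as follows.  The first function
repositions a stream on an ordinary file with @code{fseek}; the second
tries @code{lseek} on a pipe, which does not support random access.
Both function names are invented for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return the byte at OFFSET in STREAM,
   or EOF if the position cannot be set or read.  */
int
byte_at (FILE *stream, long offset)
{
  if (fseek (stream, offset, SEEK_SET) != 0)
    return EOF;
  return getc (stream);
}

/* Hypothetical demonstration: try to set the file position of a pipe.
   Returns the errno value from lseek, 0 if lseek succeeded, or -1 if
   the pipe could not be created.  */
int
seek_on_pipe (void)
{
  int fds[2];
  if (pipe (fds) == -1)
    return -1;
  int result = lseek (fds[0], 0, SEEK_SET) == (off_t) -1 ? errno : 0;
  close (fds[0]);
  close (fds[1]);
  return result;
}
```

On an ordinary file, @code{byte_at} can be called with offsets in any
order, while @code{seek_on_pipe} should report @code{ESPIPE}.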
154
155Streams and descriptors that are opened for @dfn{append access} are
156treated specially for output: output to such files is @emph{always}
157appended sequentially to the @emph{end} of the file, regardless of the
158file position.  However, the file position is still used to control where in
159the file reading is done.
160@cindex append-access files
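
A minimal sketch of this behavior, assuming a temporary file from
@code{mkstemp} and a second descriptor opened with @code{O_APPEND}
(the function name is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demonstration.  Returns 1 if output on an append-mode
   descriptor went to the end of the file even though its file position
   had been set to 0, 0 if it did not, and -1 if the setup failed.  */
int
append_ignores_position (void)
{
  char name[] = "/tmp/io-demo-XXXXXX";
  int fd = mkstemp (name);                    /* read/write descriptor */
  if (fd == -1)
    return -1;
  int afd = open (name, O_WRONLY | O_APPEND); /* append access */
  unlink (name);

  int result = -1;
  char buf[8];
  if (afd != -1
      && write (fd, "abc", 3) == 3
      && lseek (afd, 0, SEEK_SET) == 0        /* position 0 on afd */
      && write (afd, "XY", 2) == 2            /* still goes to the end */
      && lseek (fd, 0, SEEK_SET) == 0)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      result = n == 5 && memcmp (buf, "abcXY", 5) == 0;
    }

  if (afd != -1)
    close (afd);
  close (fd);
  return result;
}
```

If the append-mode write had honored the file position, the file would
contain @samp{XYc} rather than @samp{abcXY}.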
161
Several programs can read a given file at the same time.  In order for
each program to be able to read the file at its own pace, each program
must have its own file position, which is not affected by anything the
other programs do.
166
167In fact, each opening of a file creates a separate file position.
168Thus, if you open a file twice even in the same program, you get two
169streams or descriptors with independent file positions.
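
As a sketch, assuming a temporary file from @code{mkstemp} (the
function name is made up for this example), the following opens one
file twice and checks that advancing one descriptor leaves the other
untouched:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demonstration: open one file twice, advance the first
   descriptor by three bytes, and return the file position seen through
   the second descriptor, or -1 on error.  */
long
position_of_second_opening (void)
{
  char name[] = "/tmp/io-demo-XXXXXX";
  int fd1 = mkstemp (name);               /* first opening */
  if (fd1 == -1)
    return -1;
  int fd2 = open (name, O_RDONLY);        /* second, separate opening */
  unlink (name);

  long result = -1;
  if (fd2 != -1 && write (fd1, "abc", 3) == 3)
    result = (long) lseek (fd2, 0, SEEK_CUR);

  if (fd2 != -1)
    close (fd2);
  close (fd1);
  return result;
}
```

The function should return 0: writing through the first descriptor
moves only that descriptor's file position.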
170
171By contrast, if you open a descriptor and then duplicate it to get
172another descriptor, these two descriptors share the same file position:
173changing the file position of one descriptor will affect the other.
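
A sketch of the shared position, assuming a temporary file from
@code{mkstemp} and a duplicate made with @code{dup} (the function name
is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demonstration: duplicate a descriptor, advance the
   original by three bytes, and return the file position seen through
   the duplicate, or -1 on error.  */
long
position_of_duplicate (void)
{
  char name[] = "/tmp/io-demo-XXXXXX";
  int fd1 = mkstemp (name);
  if (fd1 == -1)
    return -1;
  unlink (name);
  int fd2 = dup (fd1);                    /* same opening, new descriptor */

  long result = -1;
  if (fd2 != -1 && write (fd1, "abc", 3) == 3)
    result = (long) lseek (fd2, 0, SEEK_CUR);

  if (fd2 != -1)
    close (fd2);
  close (fd1);
  return result;
}
```

Here the function should return 3, because the write through the
original descriptor also moved the position used by the duplicate.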
174
175@node File Names,  , I/O Concepts, I/O Overview
176@section File Names
177
178In order to open a connection to a file, or to perform other operations
179such as deleting a file, you need some way to refer to the file.  Nearly
180all files have names that are strings---even files which are actually
181devices such as tape drives or terminals.  These strings are called
182@dfn{file names}.  You specify the file name to say which file you want
183to open or operate on.
184
185This section describes the conventions for file names and how the
186operating system works with them.
187@cindex file name
188
189@menu
190* Directories::                 Directories contain entries for files.
191* File Name Resolution::        A file name specifies how to look up a file.
192* File Name Errors::            Error conditions relating to file names.
193* File Name Portability::       File name portability and syntax issues.
194@end menu
195
196
197@node Directories, File Name Resolution,  , File Names
198@subsection Directories
199
200In order to understand the syntax of file names, you need to understand
201how the file system is organized into a hierarchy of directories.
202
203@cindex directory
204@cindex link
205@cindex directory entry
206A @dfn{directory} is a file that contains information to associate other
207files with names; these associations are called @dfn{links} or
208@dfn{directory entries}.  Sometimes, people speak of ``files in a
209directory'', but in reality, a directory only contains pointers to
210files, not the files themselves.
211
212@cindex file name component
213The name of a file contained in a directory entry is called a @dfn{file
214name component}.  In general, a file name consists of a sequence of one
215or more such components, separated by the slash character (@samp{/}).  A
216file name which is just one component names a file with respect to its
217directory.  A file name with multiple components names a directory, and
218then a file in that directory, and so on.
219
220Some other documents, such as the POSIX standard, use the term
221@dfn{pathname} for what we call a file name, and either @dfn{filename}
222or @dfn{pathname component} for what this manual calls a file name
223component.  We don't use this terminology because a ``path'' is
224something completely different (a list of directories to search), and we
225think that ``pathname'' used for something else will confuse users.  We
226always use ``file name'' and ``file name component'' (or sometimes just
227``component'', where the context is obvious) in GNU documentation.  Some
228macros use the POSIX terminology in their names, such as
229@code{PATH_MAX}.  These macros are defined by the POSIX standard, so we
230cannot change their names.
231
232You can find more detailed information about operations on directories
233in @ref{File System Interface}.
234
235@node File Name Resolution, File Name Errors, Directories, File Names
236@subsection File Name Resolution
237
238A file name consists of file name components separated by slash
239(@samp{/}) characters.  On the systems that @theglibc{} supports,
240multiple successive @samp{/} characters are equivalent to a single
241@samp{/} character.
242
243@cindex file name resolution
244The process of determining what file a file name refers to is called
245@dfn{file name resolution}.  This is performed by examining the
246components that make up a file name in left-to-right order, and locating
247each successive component in the directory named by the previous
248component.  Of course, each of the files that are referenced as
249directories must actually exist, be directories instead of regular
250files, and have the appropriate permissions to be accessible by the
251process; otherwise the file name resolution fails.
252
253@cindex root directory
254@cindex absolute file name
255If a file name begins with a @samp{/}, the first component in the file
256name is located in the @dfn{root directory} of the process (usually all
257processes on the system have the same root directory).  Such a file name
258is called an @dfn{absolute file name}.
259@c !!! xref here to chroot, if we ever document chroot. -rm
260
261@cindex relative file name
262Otherwise, the first component in the file name is located in the
263current working directory (@pxref{Working Directory}).  This kind of
264file name is called a @dfn{relative file name}.
265
266@cindex parent directory
267The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
268have special meanings.  Every directory has entries for these file name
269components.  The file name component @file{.} refers to the directory
270itself, while the file name component @file{..} refers to its
271@dfn{parent directory} (the directory that contains the link for the
272directory in question).  As a special case, @file{..} in the root
273directory refers to the root directory itself, since it has no parent;
274thus @file{/..} is the same as @file{/}.
275
276Here are some examples of file names:
277
278@table @file
279@item /a
280The file named @file{a}, in the root directory.
281
282@item /a/b
283The file named @file{b}, in the directory named @file{a} in the root directory.
284
285@item a
286The file named @file{a}, in the current working directory.
287
288@item /a/./b
289This is the same as @file{/a/b}.
290
291@item ./a
292The file named @file{a}, in the current working directory.
293
294@item ../a
295The file named @file{a}, in the parent directory of the current working
296directory.
297@end table
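
Equivalences like these can be checked by resolving two file names with
@code{stat} and comparing the device and file serial numbers of the
results.  The helper @code{same_file} is a hypothetical name used only
for this sketch.

```c
#include <sys/stat.h>

/* Hypothetical helper: return 1 if NAME1 and NAME2 resolve to the same
   file, 0 if they resolve to different files, and -1 if either name
   cannot be resolved.  */
int
same_file (const char *name1, const char *name2)
{
  struct stat st1, st2;
  if (stat (name1, &st1) == -1 || stat (name2, &st2) == -1)
    return -1;
  /* Two names refer to the same file exactly when both the device and
     the file serial number match.  */
  return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

For instance, @code{same_file ("/..", "/")} and
@code{same_file ("./.", ".")} should both return 1.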
298
299@c An empty string may ``work'', but I think it's confusing to
300@c try to describe it.  It's not a useful thing for users to use--rms.
301A file name that names a directory may optionally end in a @samp{/}.
302You can specify a file name of @file{/} to refer to the root directory,
303but the empty string is not a meaningful file name.  If you want to
304refer to the current working directory, use a file name of @file{.} or
305@file{./}.
306
Unlike some other operating systems, @gnusystems{} don't have any
built-in support for file types (or extensions) or file versions as part
of their file name syntax.  Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.
313
314@node File Name Errors, File Name Portability, File Name Resolution, File Names
315@subsection File Name Errors
316
317@cindex file name errors
318@cindex usual file name errors
319
320Functions that accept file name arguments usually detect these
321@code{errno} error conditions relating to the file name syntax or
322trouble finding the named file.  These errors are referred to throughout
323this manual as the @dfn{usual file name errors}.
324
325@table @code
326@item EACCES
327The process does not have search permission for a directory component
328of the file name.
329
330@item ENAMETOOLONG
This error is used either when the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}.  @xref{Limits for Files}.
334
335On @gnuhurdsystems{}, there is no imposed limit on overall file name
336length, but some file systems may place limits on the length of a
337component.
338
339@item ENOENT
340This error is reported when a file referenced as a directory component
341in the file name doesn't exist, or when a component is a symbolic link
342whose target file does not exist.  @xref{Symbolic Links}.
343
344@item ENOTDIR
345A file that is referenced as a directory component in the file name
346exists, but it isn't a directory.
347
348@item ELOOP
349Too many symbolic links were resolved while trying to look up the file
350name.  The system has an arbitrary limit on the number of symbolic links
351that may be resolved in looking up a single file name, as a primitive
352way to detect loops.  @xref{Symbolic Links}.
353@end table
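
The sketch below provokes three of these errors with @code{open} and
returns the resulting @code{errno} value.  All function names are
hypothetical, and the file names are assumptions chosen so that the
lookups fail; which error a system reports for an overlong name depends
on its limits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open NAME read-only.  Returns 0 on
   success, otherwise the errno value that open reported.  */
int
open_error (const char *name)
{
  int fd = open (name, O_RDONLY);
  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}

/* Hypothetical demonstration: use a regular file as a directory
   component.  Returns the errno value, or -1 if the setup failed.  */
int
error_for_file_used_as_directory (void)
{
  char name[64] = "/tmp/io-demo-XXXXXX";
  int fd = mkstemp (name);
  if (fd == -1)
    return -1;
  close (fd);
  char child[80];
  snprintf (child, sizeof child, "%s/x", name);
  int err = open_error (child);
  unlink (name);
  return err;
}

/* Hypothetical demonstration: open a relative file name consisting of
   one component of LENGTH characters.  */
int
error_for_long_component (size_t length)
{
  char *name = malloc (length + 1);
  if (name == NULL)
    return -1;
  memset (name, 'a', length);
  name[length] = '\0';
  int err = open_error (name);
  free (name);
  return err;
}
```

On a GNU/Linux system one would expect @code{ENOTDIR} from the first
demonstration, @code{ENAMETOOLONG} from the second when the length
exceeds the system's limits, and @code{ENOENT} from @code{open_error}
when a directory component of the name does not exist.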
354
355
356@node File Name Portability,  , File Name Errors, File Names
357@subsection Portability of File Names
358
359The rules for the syntax of file names discussed in @ref{File Names},
360are the rules normally used by @gnusystems{} and by other POSIX
361systems.  However, other operating systems may use other conventions.
362
363There are two reasons why it can be important for you to be aware of
364file name portability issues:
365
366@itemize @bullet
367@item
368If your program makes assumptions about file name syntax, or contains
369embedded literal file name strings, it is more difficult to get it to
370run under other operating systems that use different syntax conventions.
371
372@item
373Even if you are not concerned about running your program on machines
374that run other operating systems, it may still be possible to access
375files that use different naming conventions.  For example, you may be
376able to access file systems on another computer running a different
377operating system over a network, or read and write disks in formats used
378by other operating systems.
379@end itemize
380
381The @w{ISO C} standard says very little about file name syntax, only that
382file names are strings.  In addition to varying restrictions on the
383length of file names and what characters can validly appear in a file
384name, different operating systems use different conventions and syntax
385for concepts such as structured directories and file types or
extensions.  Some concepts, such as file versions, might be supported by
some operating systems and not by others.
388
The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning both the characters that
are permitted in file names and the length of file name and file name
component strings.  However, on @gnusystems{}, any character except
the null character is permitted in a file name string, and
on @gnuhurdsystems{} there are no limits on the length of file name
strings.
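
Because length limits vary between systems and even between file
systems, a portable program can ask for them at run time instead of
assuming fixed values.  The following sketch uses @code{pathconf} with
@code{_PC_NAME_MAX} or @code{_PC_PATH_MAX}; the wrapper name and its
convention of returning 0 for ``no fixed limit'' are assumptions made
for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: query the limit WHICH (_PC_NAME_MAX or
   _PC_PATH_MAX) that applies to the directory DIR.  Returns the limit,
   0 if the system imposes no fixed limit, or -1 on error.  */
long
file_name_limit (const char *dir, int which)
{
  errno = 0;
  long limit = pathconf (dir, which);
  if (limit == -1)
    /* pathconf returns -1 both for errors and for "no limit"; only an
       error changes errno.  */
    return errno == 0 ? 0 : -1;
  return limit;
}
```

A call such as @code{file_name_limit ("/", _PC_NAME_MAX)} then reports
the longest file name component the root file system accepts, if it
defines such a limit.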
396