@node Pattern Matching, I/O Overview, Searching and Sorting, Top
@c %MENU% Matching shell ``globs'' and regular expressions
@chapter Pattern Matching

@Theglibc{} provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards. The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.

@menu
* Wildcard Matching::    Matching a wildcard pattern against a single string.
* Globbing::             Finding the files that match a wildcard pattern.
* Regular Expressions::  Matching regular expressions against strings.
* Word Expansion::       Expanding shell variables, nested commands,
                            arithmetic, and wildcards.
                            This is what the shell does with shell commands.
@end menu

@node Wildcard Matching
@section Wildcard Matching

@pindex fnmatch.h
This section describes how to match a wildcard pattern against a
particular string. The result is a yes or no answer: does the
string fit the pattern or not. The symbols described here are all
declared in @file{fnmatch.h}.
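
As a rough illustration of such a yes-or-no test, the sketch below wraps
@code{fnmatch} in a small helper. The helper name @code{name_matches}
is hypothetical, and passing @code{0} for the flags argument is only one
possible choice; it treats any nonzero result as ``no''.

@smallexample
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical helper: return true if NAME fits PATTERN.
   Any nonzero result from fnmatch counts as "no match".  */
static bool
name_matches (const char *pattern, const char *name)
@{
  return fnmatch (pattern, name, 0) == 0;
@}
@end smallexample

With this helper, @code{name_matches ("*.c", "glob.c")} is true and
@code{name_matches ("*.c", "glob.h")} is false.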

@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags})
@standards{POSIX.2, fnmatch.h}
@safety{@prelim{}@mtsafe{@mtsenv{} @mtslocale{}}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}}
@c fnmatch @mtsenv @mtslocale @ascuheap @acsmem
@c strnlen dup ok
@c mbsrtowcs
@c memset dup ok
@c malloc dup @ascuheap @acsmem
@c mbsinit dup ok
@c free dup @ascuheap @acsmem
@c FCT = internal_fnwmatch @mtsenv @mtslocale @ascuheap @acsmem
@c FOLD @mtslocale
@c towlower @mtslocale
@c EXT @mtsenv @mtslocale @ascuheap @acsmem
@c STRLEN = wcslen dup ok
@c getenv @mtsenv
@c malloc dup @ascuheap @acsmem
@c MEMPCPY = wmempcpy dup ok
@c FCT dup @mtsenv @mtslocale @ascuheap @acsmem
@c STRCAT = wcscat dup ok
@c free dup @ascuheap @acsmem
@c END @mtsenv
@c getenv @mtsenv
@c MEMCHR = wmemchr dup ok
@c getenv @mtsenv
@c IS_CHAR_CLASS = is_char_class @mtslocale
@c wctype @mtslocale
@c BTOWC ok
@c ISWCTYPE ok
@c auto findidx dup ok
@c elem_hash dup ok
@c memcmp dup ok
@c collseq_table_lookup dup ok
@c NO_LEADING_PERIOD ok
This function tests whether the string @var{string} matches the pattern
@var{pattern}. It returns @code{0} if they do match; otherwise, it
returns the nonzero value @code{FNM_NOMATCH}. The arguments
@var{pattern} and @var{string} are both strings.

The argument @var{flags} is a combination of flag bits that alter the
details of matching. See below for a list of the defined flags.

In @theglibc{}, @code{fnmatch} might sometimes report ``errors'' by
returning nonzero values that are not equal to @code{FNM_NOMATCH}.
@end deftypefun

These are the available flags for the @var{flags} argument:

@vtable @code
@item FNM_FILE_NAME
@standards{GNU, fnmatch.h}
Treat the @samp{/} character specially, for matching file names. If
this flag is set, wildcard constructs in @var{pattern} cannot match
@samp{/} in @var{string}.
Thus, the only way to match @samp{/} is with
an explicit @samp{/} in @var{pattern}.

@item FNM_PATHNAME
@standards{POSIX.2, fnmatch.h}
This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We
don't recommend this name because we don't use the term ``pathname'' for
file names.

@item FNM_PERIOD
@standards{POSIX.2, fnmatch.h}
Treat the @samp{.} character specially if it appears at the beginning of
@var{string}. If this flag is set, wildcard constructs in @var{pattern}
cannot match @samp{.} as the first character of @var{string}.

If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the
special treatment applies to @samp{.} following @samp{/} as well as to
@samp{.} at the beginning of @var{string}. (The shell uses the
@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching
file names.)

@item FNM_NOESCAPE
@standards{POSIX.2, fnmatch.h}
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character.

@item FNM_LEADING_DIR
@standards{GNU, fnmatch.h}
Ignore a trailing sequence of characters starting with a @samp{/} in
@var{string}; that is to say, test whether @var{string} starts with a
directory name that @var{pattern} matches.

If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern
would match the string @samp{foobar/frobozz}.

@item FNM_CASEFOLD
@standards{GNU, fnmatch.h}
Ignore case in comparing @var{string} to @var{pattern}.

@item FNM_EXTMATCH
@standards{GNU, fnmatch.h}
@cindex Korn Shell
@pindex ksh
Besides the normal patterns, also recognize the extended patterns
introduced in @file{ksh}. The patterns are written in the form
explained in the following table where @var{pattern-list} is a @code{|}
separated list of patterns.

@table @code
@item ?(@var{pattern-list})
The pattern matches if zero or one occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item *(@var{pattern-list})
The pattern matches if zero or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item +(@var{pattern-list})
The pattern matches if one or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item @@(@var{pattern-list})
The pattern matches if exactly one occurrence of any of the patterns in
the @var{pattern-list} allows matching the input string.

@item !(@var{pattern-list})
The pattern matches if the input string cannot be matched with any of
the patterns in the @var{pattern-list}.
@end table
@end vtable

@node Globbing
@section Globbing

@cindex globbing
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches. This is called
@dfn{globbing}.

You could do this using @code{fnmatch}, by reading the directory entries
one by one and testing each one with @code{fnmatch}. But that would be
slow (and complex, since you would have to handle subdirectories by
hand).

The library provides a function @code{glob} to make this particular use
of wildcards convenient. @code{glob} and the other symbols in this
section are declared in @file{glob.h}.

@menu
* Calling Glob::             Basic use of @code{glob}.
* Flags for Globbing::       Flags that enable various options in @code{glob}.
* More Flags for Globbing::  GNU specific extensions to @code{glob}.
@end menu

@node Calling Glob
@subsection Calling @code{glob}

The result of globbing is a vector of file names (strings). To return
this vector, @code{glob} uses a special data type, @code{glob_t}, which
is a structure. You pass @code{glob} the address of the structure, and
it fills in the structure's fields to tell you about the results.

@deftp {Data Type} glob_t
@standards{POSIX.2, glob.h}
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector. This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent *(*) (void *)}}.

An implementation of @code{gl_readdir} needs to initialize the following
members of the @code{struct dirent} object:

@table @code
@item d_type
This member should be set to the file type of the entry if it is known.
Otherwise, the value @code{DT_UNKNOWN} can be used. The @code{glob}
function may use the specified file type to avoid callbacks in cases
where the file type indicates that the data is not required.

@item d_ino
This member needs to be non-zero, otherwise @code{glob} may skip the
current entry and call the @code{gl_readdir} callback function again to
retrieve another entry.

@item d_name
This member must be set to the name of the entry. It must be
null-terminated.
@end table

The example below shows how to allocate a @code{struct dirent} object
containing a given name.

@smallexample
@include mkdirent.c.texi
@end smallexample

The @code{glob} function reads the @code{struct dirent} members listed
above and makes a copy of the file name in the @code{d_name} member
immediately after the @code{gl_readdir} callback function returns.
Future invocations of any of the callback functions may deallocate or
reuse the buffer. It is the responsibility of the caller of the
@code{glob} function to allocate and deallocate the buffer, around the
call to @code{glob} or using the callback functions. For example, an
application could allocate the buffer in the @code{gl_readdir} callback
function, and deallocate it in the @code{gl_closedir} callback function.

The @code{gl_readdir} member is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat *)}}.

This is a GNU extension.

@item gl_flags
The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR}
might be set. See @ref{Flags for Globbing} for more details.

This is a GNU extension.
@end table
@end deftp

For use in the @code{glob64} function @file{glob.h} contains another
definition for a very similar type. @code{glob64_t} differs from
@code{glob_t} only in the types of the members @code{gl_readdir},
@code{gl_stat}, and @code{gl_lstat}.

@deftp {Data Type} glob64_t
@standards{GNU, glob.h}
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector.
This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir64}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent64 *(*) (void *)}}.

This is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat64} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat64 *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat64}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat64 *)}}.

This is a GNU extension.

@item gl_flags
The flags used when @code{glob} was called. In addition, @code{GLOB_MAGCHAR}
might be set. See @ref{Flags for Globbing} for more details.

This is a GNU extension.
@end table
@end deftp

@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr})
@standards{POSIX.2, glob.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @asucorrupt @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c strlen dup ok
@c strchr dup ok
@c malloc dup @ascuheap @acsmem
@c mempcpy dup ok
@c next_brace_sub ok
@c free dup @ascuheap @acsmem
@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c glob_pattern_p ok
@c glob_pattern_type dup ok
@c getenv dup @mtsenv
@c GET_LOGIN_NAME_MAX ok
@c getlogin_r dup @mtasurace:utent @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c GETPW_R_SIZE_MAX ok
@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem
@c realloc dup @ascuheap @acsmem
@c memcpy dup ok
@c memchr dup ok
@c *pglob->gl_stat user-supplied
@c stat64 dup ok
@c S_ISDIR dup ok
@c strdup dup @ascuheap @acsmem
@c glob_pattern_type ok
@c glob_in_dir @mtsenv @mtslocale @asucorrupt @ascuheap @acucorrupt @acsfd @acsmem
@c strlen dup ok
@c glob_pattern_type dup ok
@c malloc dup @ascuheap @acsmem
@c mempcpy dup ok
@c *pglob->gl_stat user-supplied
@c stat64 dup ok
@c free dup @ascuheap @acsmem
@c *pglob->gl_opendir user-supplied
@c opendir dup @ascuheap @acsmem @acsfd
@c dirfd dup ok
@c *pglob->gl_readdir user-supplied
@c CONVERT_DIRENT_DIRENT64 ok
@c readdir64 ok [protected by exclusive use of the stream]
@c REAL_DIR_ENTRY ok
@c DIRENT_MIGHT_BE_DIR ok
@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem
@c DIRENT_MIGHT_BE_SYMLINK ok
@c link_exists_p ok
@c link_exists2_p ok
@c strlen dup ok
@c mempcpy dup ok
@c *pglob->gl_stat user-supplied
@c fxstatat64 dup ok
@c realloc dup @ascuheap @acsmem
@c pglob->gl_closedir user-supplied
@c closedir @ascuheap @acsmem @acsfd
@c prefix_array dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c strlen dup ok
@c malloc dup @ascuheap @acsmem
@c free dup @ascuheap @acsmem
@c mempcpy dup ok
@c strcpy dup ok
The function @code{glob} does globbing using the pattern @var{pattern}
in the current directory. It puts the result in a newly allocated
vector, and stores the size and address of this vector into
@code{*@var{vector-ptr}}. The argument @var{flags} is a combination of
bit flags; see @ref{Flags for Globbing}, for details of the flags.

The result of globbing is a sequence of file names. The function
@code{glob} allocates a string for each resulting word, then
allocates a vector of type @code{char **} to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the @dfn{word vector}.
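
The sketch below shows one way to walk the word vector up to its
terminating null pointer. The helper name @code{count_matches} is
hypothetical; the structure is zero-initialized here as a precaution so
that the cleanup call is harmless even when @code{glob} fails early.

@smallexample
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: count the names matching PATTERN by walking
   the word vector.  Returns -1 if glob reports any error, including
   GLOB_NOMATCH.  */
static long
count_matches (const char *pattern)
@{
  glob_t result = @{ 0 @};
  long count = 0;

  if (glob (pattern, 0, NULL, &result) != 0)
    @{
      /* Release whatever glob may have stored before failing.  */
      globfree (&result);
      return -1;
    @}

  /* The vector ends with a null pointer.  */
  for (char **p = result.gl_pathv; *p != NULL; p++)
    count++;

  globfree (&result);
  return count;
@}
@end smallexample

When no offset is in use, the value counted this way is the same as the
@code{gl_pathc} field.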

To return this vector, @code{glob} stores both its address and its
length (number of elements, not counting the terminating null pointer)
into @code{*@var{vector-ptr}}.

Normally, @code{glob} sorts the file names alphabetically before
returning them. You can turn this off with the flag @code{GLOB_NOSORT}
if you want to get the information as fast as possible. Usually it's
a good idea to let @code{glob} sort them---if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.

If @code{glob} succeeds, it returns 0. Otherwise, it returns one
of these error codes:

@vtable @code
@item GLOB_ABORTED
@standards{POSIX.2, glob.h}
There was an error opening a directory, and you used the flag
@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero
value.
@iftex
See below
@end iftex
@ifinfo
@xref{Flags for Globbing},
@end ifinfo
for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}.

@item GLOB_NOMATCH
@standards{POSIX.2, glob.h}
The pattern didn't match any existing files. If you use the
@code{GLOB_NOCHECK} flag, then you never get this error code, because
that flag tells @code{glob} to @emph{pretend} that the pattern matched
at least one file.

@item GLOB_NOSPACE
@standards{POSIX.2, glob.h}
It was impossible to allocate memory to hold the result.
@end vtable

In the event of an error, @code{glob} stores information in
@code{*@var{vector-ptr}} about all the matches it has found so far.

It is important to notice that the @code{glob} function will not fail if
it encounters directories or files which cannot be handled without the
LFS interfaces. The implementation of @code{glob} is supposed to use
these functions internally. This at least is the assumption made by
the Unix standard.
The GNU extension of allowing the user to provide their
own directory handling and @code{stat} functions complicates things a
bit. If these callback functions are used and a large file or directory
is encountered, @code{glob} @emph{can} fail.
@end deftypefun

@deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr})
@standards{GNU, glob.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @asucorrupt{} @ascuheap{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
@c Same code as glob, but with glob64_t #defined as glob_t.
The @code{glob64} function was added as part of the Large File Summit
extensions but is not part of the original LFS proposal. The reason for
this is simple: it is not necessary. The need for a @code{glob64}
function arises from the extensions of the GNU @code{glob}
implementation, which allow the user to provide their own directory handling
and @code{stat} functions. The @code{readdir} and @code{stat} functions
do depend on the choice of @code{_FILE_OFFSET_BITS} since the definition
of the types @code{struct dirent} and @code{struct stat} will change
depending on the choice.

Besides this difference, @code{glob64} works just like @code{glob} in
all aspects.

This function is a GNU extension.
@end deftypefun

@node Flags for Globbing
@subsection Flags for Globbing

This section describes the standard flags that you can specify in the
@var{flags} argument to @code{glob}. Choose the flags you want,
and combine them with the C bitwise OR operator @code{|}.

Note that there are @ref{More Flags for Globbing} available as GNU extensions.
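
As a sketch of combining flags, the hypothetical helper
@code{expand_with_slot} below ORs together @code{GLOB_DOOFFS} and
@code{GLOB_NOCHECK}, both described in the table that follows. It
reserves one leading slot in the vector and falls back to the pattern
itself when nothing matches. The helper name and the choice of one slot
are assumptions made for illustration.

@smallexample
#include <glob.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand PATTERN into *RESULT with one reserved
   leading slot.  Returns the value of glob; on success the caller is
   expected to call globfree on RESULT.  */
static int
expand_with_slot (const char *pattern, glob_t *result)
@{
  memset (result, 0, sizeof *result);
  /* gl_offs is an input field; it is honored because of GLOB_DOOFFS.  */
  result->gl_offs = 1;
  return glob (pattern, GLOB_DOOFFS | GLOB_NOCHECK, NULL, result);
@}
@end smallexample

After a successful call, @code{gl_pathv[0]} is a null pointer left for
the caller to fill in, and the expanded names start at
@code{gl_pathv[1]}.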

@vtable @code
@item GLOB_APPEND
@standards{POSIX.2, glob.h}
Append the words from this expansion to the vector of words produced by
previous calls to @code{glob}. This way you can effectively expand
several words as if they were concatenated with spaces between them.

In order for appending to work, you must not modify the contents of the
word vector structure between calls to @code{glob}. And, if you set
@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also
set it when you append to the results.

Note that the pointer stored in @code{gl_pathv} may no longer be valid
after you call @code{glob} the second time, because @code{glob} might
have relocated the vector. So always fetch @code{gl_pathv} from the
@code{glob_t} structure after each @code{glob} call; @strong{never} save
the pointer across calls.

@item GLOB_DOOFFS
@standards{POSIX.2, glob.h}
Leave blank slots at the beginning of the vector of words.
The @code{gl_offs} field says how many slots to leave.
The blank slots contain null pointers.

@item GLOB_ERR
@standards{POSIX.2, glob.h}
Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand @var{pattern}
fully. Such difficulties might include a directory in which you don't
have the requisite access. Normally, @code{glob} tries its best to keep
on going despite any errors, reading whatever directories it can.

You can exercise even more control than this by specifying an
error-handler function @var{errfunc} when you call @code{glob}.
If
@var{errfunc} is not a null pointer, then @code{glob} doesn't give up
right away when it can't read a directory; instead, it calls
@var{errfunc} with two arguments, like this:

@smallexample
(*@var{errfunc}) (@var{filename}, @var{error-code})
@end smallexample

@noindent
The argument @var{filename} is the name of the directory that
@code{glob} couldn't open or couldn't read, and @var{error-code} is the
@code{errno} value that was reported to @code{glob}.

If the error handler function returns nonzero, then @code{glob} gives up
right away. Otherwise, it continues.

@item GLOB_MARK
@standards{POSIX.2, glob.h}
If the pattern matches the name of a directory, append @samp{/} to the
directory's name when returning it.

@item GLOB_NOCHECK
@standards{POSIX.2, glob.h}
If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched. (Normally, when the
pattern doesn't match anything, @code{glob} returns that there were no
matches.)

@item GLOB_NOESCAPE
@standards{POSIX.2, glob.h}
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character.

@code{glob} does its work by calling the function @code{fnmatch}
repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the
@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}.

@item GLOB_NOSORT
@standards{POSIX.2, glob.h}
Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.)
The only reason @emph{not} to sort is to save time.
@end vtable

@node More Flags for Globbing
@subsection More Flags for Globbing

Besides the flags described in the last section, the GNU implementation of
@code{glob} allows a few more flags which are also defined in the
@file{glob.h} file. Some of the extensions implement functionality
which is available in modern shell implementations.

@vtable @code
@item GLOB_PERIOD
@standards{GNU, glob.h}
The @code{.} character (period) is treated specially. It cannot be
matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}.

@item GLOB_MAGCHAR
@standards{GNU, glob.h}
The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the
@var{flags} parameter. Instead, @code{glob} sets this bit in the
@code{gl_flags} element of the @code{glob_t} structure provided as the
result if the pattern used for matching contains any wildcard character.

@item GLOB_ALTDIRFUNC
@standards{GNU, glob.h}
Instead of using the normal functions for accessing the
filesystem, the @code{glob} implementation uses the user-supplied
functions specified in the structure pointed to by the @var{vector-ptr}
parameter. For more information about these functions, see the
sections about directory handling, @ref{Accessing Directories}, and
@ref{Reading Attributes}.

@item GLOB_BRACE
@standards{GNU, glob.h}
If this flag is given, the handling of braces in the pattern is changed.
It is now required that braces appear correctly grouped. I.e., for each
opening brace there must be a closing one. Braces can be used
recursively. So it is possible to define one brace expression in
another one. It is important to note that the range of each brace
expression is completely contained in the outer brace expression (if
there is one).

The string between the matching braces is separated into single
expressions by splitting at @code{,} (comma) characters. The commas
themselves are discarded. Please note what we said above about recursive
brace expressions. The commas used to separate the subexpressions must
be at the same level. Commas in brace subexpressions are not matched.
They are used during expansion of the brace expression of the deeper
level. For example, the call

@smallexample
glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result)
@end smallexample

@noindent
is equivalent to the sequence

@smallexample
glob ("foo/", GLOB_BRACE, NULL, &result)
glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
@end smallexample

@noindent
if we leave aside error handling.

@item GLOB_NOMAGIC
@standards{GNU, glob.h}
If the pattern contains no wildcard constructs (it is a literal file name),
return it as the sole ``matching'' word, even if no file exists by that name.

@item GLOB_TILDE
@standards{GNU, glob.h}
If this flag is used, the character @code{~} (tilde) is handled specially
if it appears at the beginning of the pattern. Instead of being taken
verbatim, it is used to represent the home directory of a known user.

If @code{~} is the only character in the pattern or it is followed by a
@code{/} (slash), the home directory of the process owner is
substituted. Using @code{getlogin} and @code{getpwnam} the information
is read from the system databases. As an example take user @code{bart}
with his home directory at @file{/home/bart}. For him a call like

@smallexample
glob ("~/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

@noindent
would return the contents of the directory @file{/home/bart/bin}.
Instead of referring to one's own home directory, it is also possible to
name the home directory of other users. To do so, one has to append the
user name after the tilde character. So the contents of user
@code{homer}'s @file{bin} directory can be retrieved by

@smallexample
glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

If the user name is not valid or the home directory cannot be determined
for some reason, the pattern is left untouched and itself used as the
result. I.e., if in the last example @code{homer} is not available, the
tilde expansion yields @code{"~homer/bin/*"} and @code{glob} is not
looking for a directory named @code{~homer}.

This functionality is equivalent to what is available in C-shells if the
@code{nonomatch} flag is set.

@item GLOB_TILDE_CHECK
@standards{GNU, glob.h}
If this flag is used, @code{glob} behaves as if @code{GLOB_TILDE} is
given. The only difference is that if the user name is not available or
the home directory cannot be determined for other reasons, this leads to
an error. @code{glob} will return @code{GLOB_NOMATCH} instead of using
the pattern itself as the name.

This functionality is equivalent to what is available in C-shells if
the @code{nonomatch} flag is not set.

@item GLOB_ONLYDIR
@standards{GNU, glob.h}
If this flag is used, the globbing function takes this as a
@strong{hint} that the caller is only interested in directories
matching the pattern. If the information about the type of the file
is easily available, non-directories will be rejected, but no extra
work will be done to determine the information for each file. I.e.,
the caller must still be able to filter directories out.

This functionality is only available with the GNU @code{glob}
implementation.
It is mainly used internally to increase the
performance but might be useful for a user as well and therefore is
documented here.
@end vtable

Calling @code{glob} will in most cases allocate resources which are used
to represent the result of the function call. If the same object of
type @code{glob_t} is used in multiple calls to @code{glob}, the resources
are freed or reused so that no leaks appear. But this does not cover
the time after all @code{glob} calls are done.

@deftypefun void globfree (glob_t *@var{pglob})
@standards{POSIX.2, glob.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c globfree dup @asucorrupt @ascuheap @acucorrupt @acsmem
@c free dup @ascuheap @acsmem
The @code{globfree} function frees all resources allocated by previous
calls to @code{glob} associated with the object pointed to by
@var{pglob}. This function should be called whenever the
@code{glob_t} object is no longer needed.
@end deftypefun

@deftypefun void globfree64 (glob64_t *@var{pglob})
@standards{GNU, glob.h}
@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}}
This function is equivalent to @code{globfree} but it frees records of
type @code{glob64_t} which were allocated by @code{glob64}.
@end deftypefun


@node Regular Expressions
@section Regular Expression Matching

@Theglibc{} supports two interfaces for matching regular
expressions. One is the standard POSIX.2 interface, and the other is
what @theglibc{} has had for many years.

Both interfaces are declared in the header file @file{regex.h}.
If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2
functions, structures, and constants are declared.
@c !!! we only document the POSIX.2 interface here!!

@menu
* POSIX Regexp Compilation::    Using @code{regcomp} to prepare to match.
* Flags for POSIX Regexps::     Syntax variations for @code{regcomp}.
* Matching POSIX Regexps::      Using @code{regexec} to match the compiled
                                   pattern that you get from @code{regcomp}.
* Regexp Subexpressions::       Finding which parts of the string were matched.
* Subexpression Complications:: Fine points of which parts were matched.
* Regexp Cleanup::              Freeing storage; reporting errors.
@end menu

@node POSIX Regexp Compilation
@subsection POSIX Regular Expression Compilation

Before you can actually match a regular expression, you must
@dfn{compile} it.  This is not true compilation---it produces a special
data structure, not machine instructions.  But it is like ordinary
compilation in that its purpose is to enable you to ``execute'' the
pattern fast.  (@xref{Matching POSIX Regexps}, for how to use the
compiled regular expression for matching.)

There is a special data type for compiled regular expressions:

@deftp {Data Type} regex_t
@standards{POSIX.2, regex.h}
This type of object holds a compiled regular expression.
It is actually a structure.  It has just one field that your programs
should look at:

@table @code
@item re_nsub
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
@end table

There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
@end deftp

After you create a @code{regex_t} object, you can compile a regular
expression into it by calling @code{regcomp}.
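
As a rough sketch of this step, the following helper compiles a pattern
and reports how many parenthetical subexpressions it contains.  The name
@code{count_subexpressions} is made up for this illustration and is not
part of the library, and the use of @code{REG_EXTENDED} is only an
assumption about which syntax the caller wants.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: compile PATTERN as an
   extended regular expression and return the number of parenthetical
   subexpressions it contains, or -1 if regcomp rejects the pattern.  */
long
count_subexpressions (const char *pattern)
{
  regex_t compiled;     /* The caller provides the regex_t object.  */
  long nsub;

  /* regcomp returns 0 on success and a nonzero error code otherwise.  */
  if (regcomp (&compiled, pattern, REG_EXTENDED) != 0)
    return -1;

  /* re_nsub is the one field that programs are meant to read.  */
  nsub = (long) compiled.re_nsub;

  /* Release the storage that regcomp allocated inside COMPILED.  */
  regfree (&compiled);
  return nsub;
}
```

With this helper, @code{count_subexpressions ("(a)(b)")} yields
@code{2}, while an unbalanced pattern such as @code{"(a"} yields
@code{-1}.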
831 832@deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags}) 833@standards{POSIX.2, regex.h} 834@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} 835@c All of the issues have to do with memory allocation and multi-byte 836@c character handling present in the input string, or implied by ranges 837@c or inverted character classes. 838@c (re_)malloc @ascuheap @acsmem 839@c re_compile_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 840@c (re_)realloc @ascuheap @acsmem [no @asucorrupt @acucorrupt for we zero the buffer] 841@c init_dfa @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 842@c (re_)malloc @ascuheap @acsmem 843@c calloc @ascuheap @acsmem 844@c _NL_CURRENT ok 845@c _NL_CURRENT_WORD ok 846@c btowc @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 847@c libc_lock_init ok 848@c re_string_construct @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 849@c re_string_construct_common ok 850@c re_string_realloc_buffers @ascuheap @acsmem 851@c (re_)realloc dup @ascuheap @acsmem 852@c build_wcs_upper_buffer @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 853@c isascii ok 854@c mbsinit ok 855@c toupper ok 856@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 857@c iswlower @mtslocale 858@c towupper @mtslocale 859@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 860@c (re_)malloc dup @ascuheap @acsmem 861@c build_upper_buffer ok (@mtslocale but optimized) 862@c islower ok 863@c toupper ok 864@c build_wcs_buffer @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 865@c mbrtowc dup @asucorrupt @ascuheap 
@asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 866@c re_string_translate_buffer ok 867@c parse @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 868@c fetch_token @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 869@c peek_token @mtslocale 870@c re_string_eoi ok 871@c re_string_peek_byte ok 872@c re_string_cur_idx ok 873@c re_string_length ok 874@c re_string_peek_byte_case @mtslocale 875@c re_string_peek_byte dup ok 876@c re_string_is_single_byte_char ok 877@c isascii ok 878@c re_string_peek_byte dup ok 879@c re_string_wchar_at ok 880@c re_string_skip_bytes ok 881@c re_string_skip_bytes dup ok 882@c parse_reg_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 883@c parse_branch @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 884@c parse_expression @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 885@c create_token_tree dup @ascuheap @acsmem 886@c re_string_eoi dup ok 887@c re_string_first_byte ok 888@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 889@c create_tree dup @ascuheap @acsmem 890@c parse_sub_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 891@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 892@c parse_reg_exp dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 893@c postorder() @ascuheap @acsmem 894@c free_tree @ascuheap @acsmem 895@c free_token dup @ascuheap @acsmem 896@c create_tree dup @ascuheap @acsmem 897@c parse_bracket_exp @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 898@c _NL_CURRENT dup ok 899@c _NL_CURRENT_WORD dup ok 900@c calloc dup @ascuheap @acsmem 901@c (re_)free dup @ascuheap 
@acsmem 902@c peek_token_bracket ok 903@c re_string_eoi dup ok 904@c re_string_peek_byte dup ok 905@c re_string_first_byte dup ok 906@c re_string_cur_idx dup ok 907@c re_string_length dup ok 908@c re_string_skip_bytes dup ok 909@c bitset_set ok 910@c re_string_skip_bytes ok 911@c parse_bracket_element @mtslocale 912@c re_string_char_size_at ok 913@c re_string_wchar_at dup ok 914@c re_string_skip_bytes dup ok 915@c parse_bracket_symbol @mtslocale 916@c re_string_eoi dup ok 917@c re_string_fetch_byte_case @mtslocale 918@c re_string_fetch_byte ok 919@c re_string_first_byte dup ok 920@c isascii ok 921@c re_string_char_size_at dup ok 922@c re_string_skip_bytes dup ok 923@c re_string_fetch_byte dup ok 924@c re_string_peek_byte dup ok 925@c re_string_skip_bytes dup ok 926@c peek_token_bracket dup ok 927@c auto build_range_exp @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 928@c auto lookup_collation_sequence_value @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 929@c btowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 930@c collseq_table_lookup ok 931@c auto seek_collating_symbol_entry dup ok 932@c (re_)realloc dup @ascuheap @acsmem 933@c collseq_table_lookup dup ok 934@c bitset_set dup ok 935@c (re_)realloc dup @ascuheap @acsmem 936@c build_equiv_class @mtslocale @ascuheap @acsmem 937@c _NL_CURRENT ok 938@c auto findidx ok 939@c bitset_set dup ok 940@c (re_)realloc dup @ascuheap @acsmem 941@c auto build_collating_symbol @ascuheap @acsmem 942@c auto seek_collating_symbol_entry ok 943@c bitset_set dup ok 944@c (re_)realloc dup @ascuheap @acsmem 945@c build_charclass @mtslocale @ascuheap @acsmem 946@c (re_)realloc dup @ascuheap @acsmem 947@c bitset_set dup ok 948@c isalnum ok 949@c iscntrl ok 950@c isspace ok 951@c isalpha ok 952@c isdigit ok 953@c isprint ok 954@c isupper ok 955@c isblank ok 956@c isgraph ok 957@c ispunct ok 958@c isxdigit ok 959@c bitset_not ok 960@c 
bitset_mask ok 961@c create_token_tree dup @ascuheap @acsmem 962@c create_tree dup @ascuheap @acsmem 963@c free_charset dup @ascuheap @acsmem 964@c init_word_char @mtslocale 965@c isalnum ok 966@c build_charclass_op @mtslocale @ascuheap @acsmem 967@c calloc dup @ascuheap @acsmem 968@c build_charclass dup @mtslocale @ascuheap @acsmem 969@c (re_)free dup @ascuheap @acsmem 970@c free_charset dup @ascuheap @acsmem 971@c bitset_set dup ok 972@c bitset_not dup ok 973@c bitset_mask dup ok 974@c create_token_tree dup @ascuheap @acsmem 975@c create_tree dup @ascuheap @acsmem 976@c parse_dup_op @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 977@c re_string_cur_idx dup ok 978@c fetch_number @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 979@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 980@c re_string_set_index ok 981@c postorder() @ascuheap @acsmem 982@c free_tree dup @ascuheap @acsmem 983@c mark_opt_subexp ok 984@c duplicate_tree @ascuheap @acsmem 985@c create_token_tree dup @ascuheap @acsmem 986@c create_tree dup @ascuheap @acsmem 987@c postorder() @ascuheap @acsmem 988@c free_tree dup @ascuheap @acsmem 989@c fetch_token dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 990@c parse_branch dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 991@c create_tree dup @ascuheap @acsmem 992@c create_tree @ascuheap @acsmem 993@c create_token_tree @ascuheap @acsmem 994@c (re_)malloc dup @ascuheap @acsmem 995@c analyze @ascuheap @acsmem 996@c (re_)malloc dup @ascuheap @acsmem 997@c preorder() @ascuheap @acsmem 998@c optimize_subexps ok 999@c calc_next ok 1000@c link_nfa_nodes @ascuheap @acsmem 1001@c re_node_set_init_1 @ascuheap @acsmem 1002@c (re_)malloc dup @ascuheap @acsmem 1003@c re_node_set_init_2 @ascuheap @acsmem 1004@c (re_)malloc dup @ascuheap 
@acsmem 1005@c postorder() @ascuheap @acsmem 1006@c lower_subexps @ascuheap @acsmem 1007@c lower_subexp @ascuheap @acsmem 1008@c create_tree dup @ascuheap @acsmem 1009@c calc_first @ascuheap @acsmem 1010@c re_dfa_add_node @ascuheap @acsmem 1011@c (re_)realloc dup @ascuheap @acsmem 1012@c re_node_set_init_empty ok 1013@c calc_eclosure @ascuheap @acsmem 1014@c calc_eclosure_iter @ascuheap @acsmem 1015@c re_node_set_alloc @ascuheap @acsmem 1016@c (re_)malloc dup @ascuheap @acsmem 1017@c duplicate_node_closure @ascuheap @acsmem 1018@c re_node_set_empty ok 1019@c duplicate_node @ascuheap @acsmem 1020@c re_dfa_add_node dup @ascuheap @acsmem 1021@c re_node_set_insert @ascuheap @acsmem 1022@c (re_)realloc dup @ascuheap @acsmem 1023@c search_duplicated_node ok 1024@c re_node_set_merge @ascuheap @acsmem 1025@c (re_)realloc dup @ascuheap @acsmem 1026@c re_node_set_free @ascuheap @acsmem 1027@c (re_)free dup @ascuheap @acsmem 1028@c re_node_set_insert dup @ascuheap @acsmem 1029@c re_node_set_free dup @ascuheap @acsmem 1030@c calc_inveclosure @ascuheap @acsmem 1031@c re_node_set_init_empty dup ok 1032@c re_node_set_insert_last @ascuheap @acsmem 1033@c (re_)realloc dup @ascuheap @acsmem 1034@c optimize_utf8 ok 1035@c create_initial_state @ascuheap @acsmem 1036@c re_node_set_init_copy @ascuheap @acsmem 1037@c (re_)malloc dup @ascuheap @acsmem 1038@c re_node_set_init_empty dup ok 1039@c re_node_set_contains ok 1040@c re_node_set_merge dup @ascuheap @acsmem 1041@c re_acquire_state_context @ascuheap @acsmem 1042@c calc_state_hash ok 1043@c re_node_set_compare ok 1044@c create_cd_newstate @ascuheap @acsmem 1045@c calloc dup @ascuheap @acsmem 1046@c re_node_set_init_copy dup @ascuheap @acsmem 1047@c (re_)free dup @ascuheap @acsmem 1048@c free_state @ascuheap @acsmem 1049@c re_node_set_free dup @ascuheap @acsmem 1050@c (re_)free dup @ascuheap @acsmem 1051@c NOT_SATISFY_PREV_CONSTRAINT ok 1052@c re_node_set_remove_at ok 1053@c register_state @ascuheap @acsmem 1054@c re_node_set_alloc 
dup @ascuheap @acsmem 1055@c re_node_set_insert_last dup @ascuheap @acsmem 1056@c (re_)realloc dup @ascuheap @acsmem 1057@c re_node_set_free dup @ascuheap @acsmem 1058@c free_workarea_compile @ascuheap @acsmem 1059@c (re_)free dup @ascuheap @acsmem 1060@c re_string_destruct @ascuheap @acsmem 1061@c (re_)free dup @ascuheap @acsmem 1062@c free_dfa_content @ascuheap @acsmem 1063@c free_token @ascuheap @acsmem 1064@c free_charset @ascuheap @acsmem 1065@c (re_)free dup @ascuheap @acsmem 1066@c (re_)free dup @ascuheap @acsmem 1067@c (re_)free dup @ascuheap @acsmem 1068@c re_node_set_free dup @ascuheap @acsmem 1069@c re_compile_fastmap @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1070@c re_compile_fastmap_iter @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1071@c re_set_fastmap ok 1072@c tolower ok 1073@c mbrtowc dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1074@c wcrtomb dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1075@c towlower @mtslocale 1076@c _NL_CURRENT ok 1077@c (re_)free @ascuheap @acsmem 1078The function @code{regcomp} ``compiles'' a regular expression into a 1079data structure that you can use with @code{regexec} to match against a 1080string. The compiled regular expression format is designed for 1081efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}. 1082 1083It's up to you to allocate an object of type @code{regex_t} and pass its 1084address to @code{regcomp}. 1085 1086The argument @var{cflags} lets you specify various options that control 1087the syntax and semantics of regular expressions. @xref{Flags for POSIX 1088Regexps}. 1089 1090If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from 1091the compiled regular expression the information necessary to record 1092how subexpressions actually match. 
In this case, you might as well
pass @code{NULL} for the @var{matchptr} argument and @code{0} for the
@var{nmatch} argument when you call @code{regexec}.

If you don't use @code{REG_NOSUB}, then the compiled regular expression
does have the capacity to record how subexpressions match.  Also,
@code{regcomp} tells you how many subexpressions @var{pattern} has, by
storing the number in @code{@var{compiled}->re_nsub}.  You can use that
value to decide how long an array to allocate to hold information about
subexpression matches; an array of @code{re_nsub + 1} elements holds the
whole match plus every subexpression.

@code{regcomp} returns @code{0} if it succeeds in compiling the regular
expression; otherwise, it returns a nonzero error code (see the table
below).  You can use @code{regerror} to produce an error message string
describing the reason for a nonzero value; see @ref{Regexp Cleanup}.

@end deftypefun

Here are the possible nonzero values that @code{regcomp} can return:

@vtable @code
@item REG_BADBR
@standards{POSIX.2, regex.h}
There was an invalid @samp{\@{@dots{}\@}} construct in the regular
expression.  A valid @samp{\@{@dots{}\@}} construct must contain either
a single number, or two numbers in increasing order separated by a
comma.

@item REG_BADPAT
@standards{POSIX.2, regex.h}
There was a syntax error in the regular expression.

@item REG_BADRPT
@standards{POSIX.2, regex.h}
A repetition operator such as @samp{?} or @samp{*} appeared in a bad
position (with no preceding subexpression to act on).

@item REG_ECOLLATE
@standards{POSIX.2, regex.h}
The regular expression referred to an invalid collating element (one not
defined in the current locale for string collation).  @xref{Locale
Categories}.

@item REG_ECTYPE
@standards{POSIX.2, regex.h}
The regular expression referred to an invalid character class name.

@item REG_EESCAPE
@standards{POSIX.2, regex.h}
The regular expression ended with @samp{\}.

@item REG_ESUBREG
@standards{POSIX.2, regex.h}
There was an invalid number in the @samp{\@var{digit}} construct.

@item REG_EBRACK
@standards{POSIX.2, regex.h}
There were unbalanced square brackets in the regular expression.

@item REG_EPAREN
@standards{POSIX.2, regex.h}
An extended regular expression had unbalanced parentheses,
or a basic regular expression had unbalanced @samp{\(} and @samp{\)}.

@item REG_EBRACE
@standards{POSIX.2, regex.h}
The regular expression had unbalanced @samp{\@{} and @samp{\@}}.

@item REG_ERANGE
@standards{POSIX.2, regex.h}
One of the endpoints in a range expression was invalid.

@item REG_ESPACE
@standards{POSIX.2, regex.h}
@code{regcomp} ran out of memory.
@end vtable

@node Flags for POSIX Regexps
@subsection Flags for POSIX Regular Expressions

These are the bit flags that you can use in the @var{cflags} operand when
compiling a regular expression with @code{regcomp}.

@vtable @code
@item REG_EXTENDED
@standards{POSIX.2, regex.h}
Treat the pattern as an extended regular expression, rather than as a
basic regular expression.

@item REG_ICASE
@standards{POSIX.2, regex.h}
Ignore case when matching letters.

@item REG_NOSUB
@standards{POSIX.2, regex.h}
Don't bother storing the contents of the @var{matchptr} array.

@item REG_NEWLINE
@standards{POSIX.2, regex.h}
Treat a newline in @var{string} as dividing @var{string} into multiple
lines, so that @samp{$} can match before the newline and @samp{^} can
match after.  Also, don't permit @samp{.} to match a newline, and don't
permit @samp{[^@dots{}]} to match a newline.

Otherwise, newline acts like any other ordinary character.
1197@end vtable 1198 1199@node Matching POSIX Regexps 1200@subsection Matching a Compiled POSIX Regular Expression 1201 1202Once you have compiled a regular expression, as described in @ref{POSIX 1203Regexp Compilation}, you can match it against strings using 1204@code{regexec}. A match anywhere inside the string counts as success, 1205unless the regular expression contains anchor characters (@samp{^} or 1206@samp{$}). 1207 1208@deftypefun int regexec (const regex_t *restrict @var{compiled}, const char *restrict @var{string}, size_t @var{nmatch}, regmatch_t @var{matchptr}[restrict], int @var{eflags}) 1209@standards{POSIX.2, regex.h} 1210@safety{@prelim{}@mtsafe{@mtslocale{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsmem{} @acsfd{}}} 1211@c libc_lock_lock @asulock @aculock 1212@c re_search_internal @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1213@c re_string_allocate @ascuheap @acsmem 1214@c re_string_construct_common dup ok 1215@c re_string_realloc_buffers dup @ascuheap @acsmem 1216@c match_ctx_init @ascuheap @acsmem 1217@c (re_)malloc dup @ascuheap @acsmem 1218@c re_string_byte_at ok 1219@c re_string_first_byte dup ok 1220@c check_matching @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1221@c re_string_cur_idx dup ok 1222@c acquire_init_state_context dup @ascuheap @acsmem 1223@c re_string_context_at ok 1224@c re_string_byte_at dup ok 1225@c bitset_contain ok 1226@c re_acquire_state_context dup @ascuheap @acsmem 1227@c check_subexp_matching_top @ascuheap @acsmem 1228@c match_ctx_add_subtop @ascuheap @acsmem 1229@c (re_)realloc dup @ascuheap @acsmem 1230@c calloc dup @ascuheap @acsmem 1231@c transit_state_bkref @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1232@c re_string_cur_idx dup ok 1233@c re_string_context_at dup ok 1234@c NOT_SATISFY_NEXT_CONSTRAINT ok 1235@c get_subexp 
@mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1236@c re_string_get_buffer ok 1237@c search_cur_bkref_entry ok 1238@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1239@c extend_buffers @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1240@c re_string_realloc_buffers dup @ascuheap @acsmem 1241@c (re_)realloc dup @ascuheap @acsmem 1242@c build_wcs_upper_buffer dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1243@c build_upper_buffer dup ok (@mtslocale but optimized) 1244@c build_wcs_buffer dup @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1245@c re_string_translate_buffer dup ok 1246@c get_subexp_sub @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1247@c check_arrival @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1248@c (re_)realloc dup @ascuheap @acsmem 1249@c re_string_context_at dup ok 1250@c re_node_set_init_1 dup @ascuheap @acsmem 1251@c check_arrival_expand_ecl @ascuheap @acsmem 1252@c re_node_set_alloc dup @ascuheap @acsmem 1253@c find_subexp_node ok 1254@c re_node_set_merge dup @ascuheap @acsmem 1255@c re_node_set_free dup @ascuheap @acsmem 1256@c check_arrival_expand_ecl_sub @ascuheap @acsmem 1257@c re_node_set_contains dup ok 1258@c re_node_set_insert dup @ascuheap @acsmem 1259@c re_node_set_free dup @ascuheap @acsmem 1260@c re_node_set_init_copy dup @ascuheap @acsmem 1261@c re_node_set_init_empty dup ok 1262@c expand_bkref_cache @ascuheap @acsmem 1263@c search_cur_bkref_entry dup ok 1264@c re_node_set_contains dup ok 1265@c re_node_set_init_1 dup @ascuheap @acsmem 1266@c check_arrival_expand_ecl dup @ascuheap @acsmem 1267@c re_node_set_merge dup @ascuheap @acsmem 1268@c re_node_set_init_copy dup @ascuheap @acsmem 1269@c re_node_set_insert dup 
@ascuheap @acsmem 1270@c re_node_set_free dup @ascuheap @acsmem 1271@c re_acquire_state @ascuheap @acsmem 1272@c calc_state_hash dup ok 1273@c re_node_set_compare dup ok 1274@c create_ci_newstate @ascuheap @acsmem 1275@c calloc dup @ascuheap @acsmem 1276@c re_node_set_init_copy dup @ascuheap @acsmem 1277@c (re_)free dup @ascuheap @acsmem 1278@c register_state dup @ascuheap @acsmem 1279@c free_state dup @ascuheap @acsmem 1280@c re_acquire_state_context dup @ascuheap @acsmem 1281@c re_node_set_merge dup @ascuheap @acsmem 1282@c check_arrival_add_next_nodes @mtslocale @ascuheap @acsmem 1283@c re_node_set_init_empty dup ok 1284@c check_node_accept_bytes @mtslocale @ascuheap @acsmem 1285@c re_string_byte_at dup ok 1286@c re_string_char_size_at dup ok 1287@c re_string_elem_size_at @mtslocale 1288@c _NL_CURRENT_WORD dup ok 1289@c _NL_CURRENT dup ok 1290@c auto findidx dup ok 1291@c _NL_CURRENT_WORD dup ok 1292@c _NL_CURRENT dup ok 1293@c collseq_table_lookup dup ok 1294@c find_collation_sequence_value @mtslocale 1295@c _NL_CURRENT_WORD dup ok 1296@c _NL_CURRENT dup ok 1297@c auto findidx dup ok 1298@c wcscoll @mtslocale @ascuheap @acsmem 1299@c re_node_set_empty dup ok 1300@c re_node_set_merge dup @ascuheap @acsmem 1301@c re_node_set_free dup @ascuheap @acsmem 1302@c re_node_set_insert dup @ascuheap @acsmem 1303@c re_acquire_state dup @ascuheap @acsmem 1304@c check_node_accept ok 1305@c re_string_byte_at dup ok 1306@c bitset_contain dup ok 1307@c re_string_context_at dup ok 1308@c NOT_SATISFY_NEXT_CONSTRAINT dup ok 1309@c match_ctx_add_entry @ascuheap @acsmem 1310@c (re_)realloc dup @ascuheap @acsmem 1311@c (re_)free dup @ascuheap @acsmem 1312@c clean_state_log_if_needed dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1313@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1314@c find_subexp_node dup ok 1315@c calloc dup @ascuheap @acsmem 1316@c check_arrival dup *** 
1317@c match_ctx_add_sublast @ascuheap @acsmem 1318@c (re_)realloc dup @ascuheap @acsmem 1319@c re_acquire_state_context dup @ascuheap @acsmem 1320@c re_node_set_init_union @ascuheap @acsmem 1321@c (re_)malloc dup @ascuheap @acsmem 1322@c re_node_set_init_copy dup @ascuheap @acsmem 1323@c re_node_set_init_empty dup ok 1324@c re_node_set_free dup @ascuheap @acsmem 1325@c check_subexp_matching_top dup @ascuheap @acsmem 1326@c check_halt_state_context ok 1327@c re_string_context_at dup ok 1328@c check_halt_node_context ok 1329@c NOT_SATISFY_NEXT_CONSTRAINT dup ok 1330@c re_string_eoi dup ok 1331@c extend_buffers dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1332@c transit_state @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1333@c transit_state_mb @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1334@c re_string_context_at dup ok 1335@c NOT_SATISFY_NEXT_CONSTRAINT dup ok 1336@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem 1337@c re_string_cur_idx dup ok 1338@c clean_state_log_if_needed @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1339@c re_node_set_init_union dup @ascuheap @acsmem 1340@c re_acquire_state_context dup @ascuheap @acsmem 1341@c re_string_fetch_byte dup ok 1342@c re_string_context_at dup ok 1343@c build_trtable @ascuheap @acsmem 1344@c (re_)malloc dup @ascuheap @acsmem 1345@c group_nodes_into_DFAstates @ascuheap @acsmem 1346@c bitset_empty dup ok 1347@c bitset_set dup ok 1348@c bitset_merge dup ok 1349@c bitset_set_all ok 1350@c bitset_clear ok 1351@c bitset_contain dup ok 1352@c bitset_copy ok 1353@c re_node_set_init_copy dup @ascuheap @acsmem 1354@c re_node_set_insert dup @ascuheap @acsmem 1355@c re_node_set_init_1 dup @ascuheap @acsmem 1356@c re_node_set_free dup @ascuheap @acsmem 1357@c re_node_set_alloc dup @ascuheap @acsmem 1358@c malloc dup @ascuheap @acsmem 
1359@c free dup @ascuheap @acsmem 1360@c re_node_set_free dup @ascuheap @acsmem 1361@c bitset_empty ok 1362@c re_node_set_empty dup ok 1363@c re_node_set_merge dup @ascuheap @acsmem 1364@c re_acquire_state_context dup @ascuheap @acsmem 1365@c bitset_merge ok 1366@c calloc dup @ascuheap @acsmem 1367@c bitset_contain dup ok 1368@c merge_state_with_log @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1369@c re_string_cur_idx dup ok 1370@c re_node_set_init_union dup @ascuheap @acsmem 1371@c re_string_context_at dup ok 1372@c re_node_set_free dup @ascuheap @acsmem 1373@c check_subexp_matching_top @ascuheap @acsmem 1374@c match_ctx_add_subtop dup @ascuheap @acsmem 1375@c transit_state_bkref dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1376@c find_recover_state 1377@c re_string_cur_idx dup ok 1378@c re_string_skip_bytes dup ok 1379@c merge_state_with_log dup @mtslocale @asucorrupt @ascuheap @asulock @ascudlopen @acucorrupt @aculock @acsmem @acsfd 1380@c check_halt_state_context dup ok 1381@c prune_impossible_nodes @mtslocale @ascuheap @acsmem 1382@c (re_)malloc dup @ascuheap @acsmem 1383@c sift_ctx_init ok 1384@c re_node_set_init_empty dup ok 1385@c sift_states_backward @mtslocale @ascuheap @acsmem 1386@c re_node_set_init_1 dup @ascuheap @acsmem 1387@c update_cur_sifted_state @mtslocale @ascuheap @acsmem 1388@c add_epsilon_src_nodes @ascuheap @acsmem 1389@c re_acquire_state dup @ascuheap @acsmem 1390@c re_node_set_alloc dup @ascuheap @acsmem 1391@c re_node_set_merge dup @ascuheap @acsmem 1392@c re_node_set_add_intersect @ascuheap @acsmem 1393@c (re_)realloc dup @ascuheap @acsmem 1394@c check_subexp_limits @ascuheap @acsmem 1395@c sub_epsilon_src_nodes @ascuheap @acsmem 1396@c re_node_set_init_empty dup ok 1397@c re_node_set_contains dup ok 1398@c re_node_set_add_intersect dup @ascuheap @acsmem 1399@c re_node_set_free dup @ascuheap @acsmem 1400@c re_node_set_remove_at dup ok 1401@c 
re_node_set_contains dup ok 1402@c re_acquire_state dup @ascuheap @acsmem 1403@c sift_states_bkref @mtslocale @ascuheap @acsmem 1404@c search_cur_bkref_entry dup ok 1405@c check_dst_limits ok 1406@c search_cur_bkref_entry dup ok 1407@c check_dst_limits_calc_pos ok 1408@c check_dst_limits_calc_pos_1 ok 1409@c re_node_set_init_copy dup @ascuheap @acsmem 1410@c re_node_set_insert dup @ascuheap @acsmem 1411@c sift_states_backward dup @mtslocale @ascuheap @acsmem 1412@c merge_state_array dup @ascuheap @acsmem 1413@c re_node_set_remove ok 1414@c re_node_set_contains dup ok 1415@c re_node_set_remove_at dup ok 1416@c re_node_set_free dup @ascuheap @acsmem 1417@c re_node_set_free dup @ascuheap @acsmem 1418@c re_node_set_empty dup ok 1419@c build_sifted_states @mtslocale @ascuheap @acsmem 1420@c sift_states_iter_mb @mtslocale @ascuheap @acsmem 1421@c check_node_accept_bytes dup @mtslocale @ascuheap @acsmem 1422@c check_node_accept dup ok 1423@c check_dst_limits dup ok 1424@c re_node_set_insert dup @ascuheap @acsmem 1425@c re_node_set_free dup @ascuheap @acsmem 1426@c check_halt_state_context dup ok 1427@c merge_state_array @ascuheap @acsmem 1428@c re_node_set_init_union dup @ascuheap @acsmem 1429@c re_acquire_state dup @ascuheap @acsmem 1430@c re_node_set_free dup @ascuheap @acsmem 1431@c (re_)free dup @ascuheap @acsmem 1432@c set_regs @ascuheap @acsmem 1433@c (re_)malloc dup @ascuheap @acsmem 1434@c re_node_set_init_empty dup ok 1435@c free_fail_stack_return @ascuheap @acsmem 1436@c re_node_set_free dup @ascuheap @acsmem 1437@c (re_)free dup @ascuheap @acsmem 1438@c update_regs ok 1439@c re_node_set_free dup @ascuheap @acsmem 1440@c pop_fail_stack @ascuheap @acsmem 1441@c re_node_set_free dup @ascuheap @acsmem 1442@c (re_)free dup @ascuheap @acsmem 1443@c (re_)free dup @ascuheap @acsmem 1444@c (re_)free dup @ascuheap @acsmem 1445@c match_ctx_free @ascuheap @acsmem 1446@c match_ctx_clean @ascuheap @acsmem 1447@c (re_)free dup @ascuheap @acsmem 1448@c (re_)free dup @ascuheap 
@acsmem
@c re_string_destruct dup @ascuheap @acsmem
@c libc_lock_unlock @aculock
This function tries to match the compiled regular expression
@code{*@var{compiled}} against @var{string}.

@code{regexec} returns @code{0} if the regular expression matches;
otherwise, it returns a nonzero value.  See the table below for
what nonzero values mean.  You can use @code{regerror} to produce an
error message string describing the reason for a nonzero value;
see @ref{Regexp Cleanup}.

The argument @var{eflags} is a word of bit flags that enable various
options.

If you want to get information about what part of @var{string} actually
matched the regular expression or its subexpressions, use the arguments
@var{matchptr} and @var{nmatch}.  Otherwise, pass @code{0} for
@var{nmatch}, and @code{NULL} for @var{matchptr}.  @xref{Regexp
Subexpressions}.
@end deftypefun

You must match the regular expression with the same set of current
locales that were in effect when you compiled the regular expression.

The function @code{regexec} accepts the following flags in the
@var{eflags} argument:

@vtable @code
@item REG_NOTBOL
@standards{POSIX.2, regex.h}
Do not regard the beginning of the specified string as the beginning of
a line; more generally, don't make any assumptions about what text might
precede it.

@item REG_NOTEOL
@standards{POSIX.2, regex.h}
Do not regard the end of the specified string as the end of a line; more
generally, don't make any assumptions about what text might follow it.
@end vtable

Here are the possible nonzero values that @code{regexec} can return:

@vtable @code
@item REG_NOMATCH
@standards{POSIX.2, regex.h}
The pattern didn't match the string.  This isn't really an error.

@item REG_ESPACE
@standards{POSIX.2, regex.h}
@code{regexec} ran out of memory.
@end vtable

@node Regexp Subexpressions
@subsection Match Results with Subexpressions

When @code{regexec} matches parenthetical subexpressions of
@var{pattern}, it records which parts of @var{string} they match.  It
returns that information by storing the offsets into an array whose
elements are structures of type @code{regmatch_t}.  The first element of
the array (index @code{0}) records the part of the string that matched
the entire regular expression.  Each other element of the array records
the beginning and end of the part that matched a single parenthetical
subexpression.

@deftp {Data Type} regmatch_t
@standards{POSIX.2, regex.h}
This is the data type of the @var{matchptr} array that you pass to
@code{regexec}.  It contains two structure fields, as follows:

@table @code
@item rm_so
The offset in @var{string} of the beginning of a substring.  Add this
value to @var{string} to get the address of that part.

@item rm_eo
The offset in @var{string} of the end of the substring.
@end table
@end deftp

@deftp {Data Type} regoff_t
@standards{POSIX.2, regex.h}
@code{regoff_t} is an alias for another signed integer type.
The fields of @code{regmatch_t} have type @code{regoff_t}.
@end deftp

The @code{regmatch_t} elements correspond to subexpressions
positionally; the element at index @code{1} records where the first
subexpression matched, the element at index @code{2} records the second
subexpression, and so on.  The order of the subexpressions is the order
in which they begin.

When you call @code{regexec}, you specify how long the @var{matchptr}
array is, with the @var{nmatch} argument.  This tells @code{regexec} how
many elements to store.  Because element @code{0} describes the whole
match, an array of @var{nmatch} elements covers at most
@code{@var{nmatch} - 1} subexpressions.  If the regular expression has
more subexpressions than that, then you won't get offset information
about the rest of them.
But this doesn't alter whether the pattern matches a 1545particular string or not. 1546 1547If you don't want @code{regexec} to return any information about where 1548the subexpressions matched, you can either supply @code{0} for 1549@var{nmatch}, or use the flag @code{REG_NOSUB} when you compile the 1550pattern with @code{regcomp}. 1551 1552@node Subexpression Complications 1553@subsection Complications in Subexpression Matching 1554 1555Sometimes a subexpression matches a substring of no characters. This 1556happens when @samp{f\(o*\)} matches the string @samp{fum}. (It really 1557matches just the @samp{f}.) In this case, both of the offsets identify 1558the point in the string where the null substring was found. In this 1559example, the offsets are both @code{1}. 1560 1561Sometimes the entire regular expression can match without using some of 1562its subexpressions at all---for example, when @samp{ba\(na\)*} matches the 1563string @samp{ba}, the parenthetical subexpression is not used. When 1564this happens, @code{regexec} stores @code{-1} in both fields of the 1565element for that subexpression. 1566 1567Sometimes matching the entire regular expression can match a particular 1568subexpression more than once---for example, when @samp{ba\(na\)*} 1569matches the string @samp{bananana}, the parenthetical subexpression 1570matches three times. When this happens, @code{regexec} usually stores 1571the offsets of the last part of the string that matched the 1572subexpression. In the case of @samp{bananana}, these offsets are 1573@code{6} and @code{8}. 1574 1575But the last match is not always the one that is chosen. It's more 1576accurate to say that the last @emph{opportunity} to match is the one 1577that takes precedence. What this means is that when one subexpression 1578appears within another, then the results reported for the inner 1579subexpression reflect whatever happened on the last match of the outer 1580subexpression. 
For an example, consider @samp{\(ba\(na\)*s \)*} matching 1581the string @samp{bananas bas }. The last time the inner expression 1582actually matches is near the end of the first word. But it is 1583@emph{considered} again in the second word, and fails to match there. 1584@code{regexec} reports nonuse of the ``na'' subexpression. 1585 1586Another place where this rule applies is when the regular expression 1587@smallexample 1588\(ba\(na\)*s \|nefer\(ti\)* \)* 1589@end smallexample 1590@noindent 1591matches @samp{bananas nefertiti}. The ``na'' subexpression does match 1592in the first word, but it doesn't match in the second word because the 1593other alternative is used there. Once again, the second repetition of 1594the outer subexpression overrides the first, and within that second 1595repetition, the ``na'' subexpression is not used. So @code{regexec} 1596reports nonuse of the ``na'' subexpression. 1597 1598@node Regexp Cleanup 1599@subsection POSIX Regexp Matching Cleanup 1600 1601When you are finished using a compiled regular expression, you can 1602free the storage it uses by calling @code{regfree}. 1603 1604@deftypefun void regfree (regex_t *@var{compiled}) 1605@standards{POSIX.2, regex.h} 1606@safety{@prelim{}@mtsafe{}@asunsafe{@ascuheap{}}@acunsafe{@acsmem{}}} 1607@c (re_)free dup @ascuheap @acsmem 1608@c free_dfa_content dup @ascuheap @acsmem 1609Calling @code{regfree} frees all the storage that @code{*@var{compiled}} 1610points to. This includes various internal fields of the @code{regex_t} 1611structure that aren't documented in this manual. 1612 1613@code{regfree} does not free the object @code{*@var{compiled}} itself. 1614@end deftypefun 1615 1616You should always free the space in a @code{regex_t} structure with 1617@code{regfree} before using the structure to compile another regular 1618expression. 1619 1620When @code{regcomp} or @code{regexec} reports an error, you can use 1621the function @code{regerror} to turn it into an error message string. 
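
As a rough sketch of how compiling, matching, reading subexpression
offsets, and cleanup fit together, the following helper compiles a
basic regular expression, matches it once, and releases the compiled
pattern with @code{regfree}. The name @code{match_offsets} and its
calling convention are invented for this illustration; they are not
part of the library.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.  Compile PATTERN as a
   basic regular expression, match it against STRING, and store up to
   NMATCH offset pairs in MATCHES.  Return 0 on a match; otherwise
   return the nonzero code from regcomp or regexec.  */
int
match_offsets (const char *pattern, const char *string,
               size_t nmatch, regmatch_t *matches)
{
  regex_t compiled;
  int result = regcomp (&compiled, pattern, 0);

  if (result != 0)
    return result;      /* Compilation failed; pass the code back.  */

  result = regexec (&compiled, string, nmatch, matches, 0);

  /* Free the storage that `compiled' points to.  The structure itself
     is an automatic variable and needs no separate release.  */
  regfree (&compiled);
  return result;
}
```

With the pattern @samp{ba\(na\)*} and the string @samp{bananana}, a
call with @var{nmatch} set to 2 should leave the offsets @code{6} and
@code{8} in @code{matches[1]}; with the string @samp{ba}, both fields
of @code{matches[1]} should be @code{-1}, as described above.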
1622 1623@deftypefun size_t regerror (int @var{errcode}, const regex_t *restrict @var{compiled}, char *restrict @var{buffer}, size_t @var{length}) 1624@standards{POSIX.2, regex.h} 1625@safety{@prelim{}@mtsafe{@mtsenv{}}@asunsafe{@asucorrupt{} @ascuheap{} @asulock{} @ascudlopen{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} 1626@c regerror calls gettext, strcmp and mempcpy or memcpy. 1627This function produces an error message string for the error code 1628@var{errcode}, and stores the string in @var{length} bytes of memory 1629starting at @var{buffer}. For the @var{compiled} argument, supply the 1630same compiled regular expression structure that @code{regcomp} or 1631@code{regexec} was working with when it got the error. Alternatively, 1632you can supply @code{NULL} for @var{compiled}; you will still get a 1633meaningful error message, but it might not be as detailed. 1634 1635If the error message can't fit in @var{length} bytes (including a 1636terminating null character), then @code{regerror} truncates it. 1637The string that @code{regerror} stores is always null-terminated 1638even if it has been truncated. 1639 1640The return value of @code{regerror} is the minimum length needed to 1641store the entire error message. If this is less than @var{length}, then 1642the error message was not truncated, and you can use it. Otherwise, you 1643should call @code{regerror} again with a larger buffer. 
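
The truncation check can be sketched with a caller-supplied buffer.
The helper name @code{regerror_fits} is an assumption made for this
illustration. Because the return value of @code{regerror} counts the
terminating null character, this sketch also treats a return value
equal to @var{length} as a message that fit.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.  Store the message for
   ERRCODE in BUFFER, which is LENGTH bytes long.  Return 1 if the
   whole message fit, and 0 if regerror had to truncate it.  */
int
regerror_fits (int errcode, const regex_t *compiled,
               char *buffer, size_t length)
{
  size_t needed = regerror (errcode, compiled, buffer, length);

  /* `needed' includes the terminating null character, so the message
     is complete when it is no larger than the buffer.  */
  return needed <= length;
}
```

When this returns 0, the buffer still holds a null-terminated prefix
of the message, which may be good enough for a log line.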
1644 1645Here is a function which uses @code{regerror}, but always dynamically 1646allocates a buffer for the error message: 1647 1648@smallexample 1649char *get_regerror (int errcode, regex_t *compiled) 1650@{ 1651 size_t length = regerror (errcode, compiled, NULL, 0); 1652 char *buffer = xmalloc (length); 1653 (void) regerror (errcode, compiled, buffer, length); 1654 return buffer; 1655@} 1656@end smallexample 1657@end deftypefun 1658 1659@node Word Expansion 1660@section Shell-Style Word Expansion 1661@cindex word expansion 1662@cindex expansion of shell words 1663 1664@dfn{Word expansion} means the process of splitting a string into 1665@dfn{words} and substituting for variables, commands, and wildcards 1666just as the shell does. 1667 1668For example, when you write @samp{ls -l foo.c}, this string is split 1669into three separate words---@samp{ls}, @samp{-l} and @samp{foo.c}. 1670This is the most basic function of word expansion. 1671 1672When you write @samp{ls *.c}, this can become many words, because 1673the word @samp{*.c} can be replaced with any number of file names. 1674This is called @dfn{wildcard expansion}, and it is also a part of 1675word expansion. 1676 1677When you use @samp{echo $PATH} to print your path, you are taking 1678advantage of @dfn{variable substitution}, which is also part of word 1679expansion. 1680 1681Ordinary programs can perform word expansion just like the shell by 1682calling the library function @code{wordexp}. 1683 1684@menu 1685* Expansion Stages:: What word expansion does to a string. 1686* Calling Wordexp:: How to call @code{wordexp}. 1687* Flags for Wordexp:: Options you can enable in @code{wordexp}. 1688* Wordexp Example:: A sample program that does word expansion. 1689* Tilde Expansion:: Details of how tilde expansion works. 1690* Variable Substitution:: Different types of variable substitution. 
1691@end menu 1692 1693@node Expansion Stages 1694@subsection The Stages of Word Expansion 1695 1696When word expansion is applied to a sequence of words, it performs the 1697following transformations in the order shown here: 1698 1699@enumerate 1700@item 1701@cindex tilde expansion 1702@dfn{Tilde expansion}: Replacement of @samp{~foo} with the name of 1703the home directory of @samp{foo}. 1704 1705@item 1706Next, three different transformations are applied in the same step, 1707from left to right: 1708 1709@itemize @bullet 1710@item 1711@cindex variable substitution 1712@cindex substitution of variables and commands 1713@dfn{Variable substitution}: Environment variables are substituted for 1714references such as @samp{$foo}. 1715 1716@item 1717@cindex command substitution 1718@dfn{Command substitution}: Constructs such as @w{@samp{`cat foo`}} and 1719the equivalent @w{@samp{$(cat foo)}} are replaced with the output from 1720the inner command. 1721 1722@item 1723@cindex arithmetic expansion 1724@dfn{Arithmetic expansion}: Constructs such as @samp{$(($x-1))} are 1725replaced with the result of the arithmetic computation. 1726@end itemize 1727 1728@item 1729@cindex field splitting 1730@dfn{Field splitting}: subdivision of the text into @dfn{words}. 1731 1732@item 1733@cindex wildcard expansion 1734@dfn{Wildcard expansion}: The replacement of a construct such as @samp{*.c} 1735with a list of @samp{.c} file names. Wildcard expansion applies to an 1736entire word at a time, and replaces that word with 0 or more file names 1737that are themselves words. 1738 1739@item 1740@cindex quote removal 1741@cindex removal of quotes 1742@dfn{Quote removal}: The deletion of string-quotes, now that they have 1743done their job by inhibiting the above transformations when appropriate. 1744@end enumerate 1745 1746For the details of these transformations, and how to write the constructs 1747that use them, see @w{@cite{The BASH Manual}} (to appear). 
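
One rough way to watch these stages at work is to pass a string to
@code{wordexp} and print the words that come back. The helper below is
only a sketch: the name @code{show_expansion} and its return convention
are invented for this illustration, and it passes @code{WRDE_NOCMD} so
that no command is ever run.

```c
#include <stdio.h>
#include <wordexp.h>

/* Hypothetical helper, not part of the library.  Expand WORDS, print
   each resulting word in brackets on its own line, and return the
   number of words, or -1 if the expansion failed.  */
int
show_expansion (const char *words)
{
  wordexp_t result;
  int status = wordexp (words, &result, WRDE_NOCMD);
  int count;

  if (status != 0)
    {
      /* After WRDE_NOSPACE, part of the result may have been
         allocated, so release it.  */
      if (status == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  for (size_t i = 0; i < result.we_wordc; i++)
    printf ("[%s]\n", result.we_wordv[i]);

  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}
```

For instance, @code{show_expansion ("one 'two three' $((2+3))")} should
report three words: @samp{one}, then @samp{two three} (the quotes
prevent field splitting and are then removed), then @samp{5} from
arithmetic expansion.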
1748 1749@node Calling Wordexp 1750@subsection Calling @code{wordexp} 1751 1752All the functions, constants and data types for word expansion are 1753declared in the header file @file{wordexp.h}. 1754 1755Word expansion produces a vector of words (strings). To return this 1756vector, @code{wordexp} uses a special data type, @code{wordexp_t}, which 1757is a structure. You pass @code{wordexp} the address of the structure, 1758and it fills in the structure's fields to tell you about the results. 1759 1760@deftp {Data Type} {wordexp_t} 1761@standards{POSIX.2, wordexp.h} 1762This data type holds a pointer to a word vector. More precisely, it 1763records both the address of the word vector and its size. 1764 1765@table @code 1766@item we_wordc 1767The number of elements in the vector. 1768 1769@item we_wordv 1770The address of the vector. This field has type @w{@code{char **}}. 1771 1772@item we_offs 1773The offset of the first real element of the vector, from its nominal 1774address in the @code{we_wordv} field. Unlike the other fields, this 1775is always an input to @code{wordexp}, rather than an output from it. 1776 1777If you use a nonzero offset, then that many elements at the beginning of 1778the vector are left empty. (The @code{wordexp} function fills them with 1779null pointers.) 1780 1781The @code{we_offs} field is meaningful only if you use the 1782@code{WRDE_DOOFFS} flag. Otherwise, the offset is always zero 1783regardless of what is in this field, and the first real element comes at 1784the beginning of the vector. 
1785@end table 1786@end deftp 1787 1788@deftypefun int wordexp (const char *@var{words}, wordexp_t *@var{word-vector-ptr}, int @var{flags}) 1789@standards{POSIX.2, wordexp.h} 1790@safety{@prelim{}@mtunsafe{@mtasurace{:utent} @mtasuconst{:@mtsenv{}} @mtsenv{} @mtascusig{:ALRM} @mtascutimer{} @mtslocale{}}@asunsafe{@ascudlopen{} @ascuplugin{} @ascuintl{} @ascuheap{} @asucorrupt{} @asulock{}}@acunsafe{@acucorrupt{} @aculock{} @acsfd{} @acsmem{}}} 1791@c wordexp @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap @asucorrupt @asulock @acucorrupt @aculock @acsfd @acsmem 1792@c w_newword ok 1793@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem 1794@c calloc dup @ascuheap @acsmem 1795@c getenv dup @mtsenv 1796@c strcpy dup ok 1797@c parse_backslash @ascuheap @acsmem 1798@c w_addchar dup @ascuheap @acsmem 1799@c parse_dollars @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1800@c w_addchar dup @ascuheap @acsmem 1801@c parse_arith @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1802@c w_newword dup ok 1803@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1804@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem 1805@c parse_qtd_backslash dup @ascuheap @acsmem 1806@c eval_expr @mtslocale 1807@c eval_expr_multidiv @mtslocale 1808@c eval_expr_val @mtslocale 1809@c isspace dup @mtslocale 1810@c eval_expr dup @mtslocale 1811@c isspace dup @mtslocale 1812@c isspace dup @mtslocale 1813@c free dup @ascuheap @acsmem 1814@c w_addchar dup @ascuheap @acsmem 1815@c w_addstr dup @ascuheap @acsmem 1816@c itoa_word dup ok 1817@c parse_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem 1818@c w_newword dup ok 1819@c pthread_setcancelstate 
@ascuplugin @ascuheap @acsmem 1820@c (disable cancellation around exec_comm; it may do_cancel the 1821@c second time, if async cancel is enabled) 1822@c THREAD_ATOMIC_CMPXCHG_VAL dup ok 1823@c do_cancel @ascuplugin @ascuheap @acsmem 1824@c THREAD_ATOMIC_BIT_SET dup ok 1825@c pthread_unwind @ascuplugin @ascuheap @acsmem 1826@c Unwind_ForcedUnwind if available @ascuplugin @ascuheap @acsmem 1827@c libc_unwind_longjmp otherwise 1828@c cleanups 1829@c exec_comm @ascuplugin @ascuheap @aculock @acsfd @acsmem 1830@c pipe2 dup ok 1831@c pipe dup ok 1832@c fork dup @ascuplugin @aculock 1833@c close dup @acsfd 1834@c on child: exec_comm_child -> exec or abort 1835@c waitpid dup ok 1836@c read dup ok 1837@c w_addmem dup @ascuheap @acsmem 1838@c strchr dup ok 1839@c w_addword dup @ascuheap @acsmem 1840@c w_newword dup ok 1841@c w_addchar dup @ascuheap @acsmem 1842@c free dup @ascuheap @acsmem 1843@c kill dup ok 1844@c free dup @ascuheap @acsmem 1845@c parse_param @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1846@c reads from __libc_argc and __libc_argv without guards 1847@c w_newword dup ok 1848@c isalpha dup @mtslocale^^ 1849@c w_addchar dup @ascuheap @acsmem 1850@c isalnum dup @mtslocale^^ 1851@c isdigit dup @mtslocale^^ 1852@c strchr dup ok 1853@c itoa_word dup ok 1854@c atoi dup @mtslocale 1855@c getpid dup ok 1856@c w_addstr dup @ascuheap @acsmem 1857@c free dup @ascuheap @acsmem 1858@c strlen dup ok 1859@c malloc dup @ascuheap @acsmem 1860@c stpcpy dup ok 1861@c w_addword dup @ascuheap @acsmem 1862@c strdup dup @ascuheap @acsmem 1863@c getenv dup @mtsenv 1864@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1865@c parse_tilde dup @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1866@c fnmatch dup @mtsenv @mtslocale @ascuheap @acsmem 1867@c 
mempcpy dup ok 1868@c _ dup @ascuintl 1869@c fxprintf dup @aculock 1870@c setenv dup @mtasuconst:@mtsenv @ascuheap @asulock @acucorrupt @aculock @acsmem 1871@c strspn dup ok 1872@c strcspn dup ok 1873@c parse_backtick @ascuplugin @ascuheap @aculock @acsfd @acsmem 1874@c w_newword dup ok 1875@c exec_comm dup @ascuplugin @ascuheap @aculock @acsfd @acsmem 1876@c free dup @ascuheap @acsmem 1877@c parse_qtd_backslash dup @ascuheap @acsmem 1878@c parse_backslash dup @ascuheap @acsmem 1879@c w_addchar dup @ascuheap @acsmem 1880@c parse_dquote @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1881@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1882@c parse_backtick dup @ascuplugin @ascuheap @aculock @acsfd @acsmem 1883@c parse_qtd_backslash dup @ascuheap @acsmem 1884@c w_addchar dup @ascuheap @acsmem 1885@c w_addword dup @ascuheap @acsmem 1886@c strdup dup @ascuheap @acsmem 1887@c realloc dup @ascuheap @acsmem 1888@c free dup @ascuheap @acsmem 1889@c parse_squote dup @ascuheap @acsmem 1890@c w_addchar dup @ascuheap @acsmem 1891@c parse_tilde @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1892@c strchr dup ok 1893@c w_addchar dup @ascuheap @acsmem 1894@c getenv dup @mtsenv 1895@c w_addstr dup @ascuheap @acsmem 1896@c strlen dup ok 1897@c w_addmem dup @ascuheap @acsmem 1898@c realloc dup @ascuheap @acsmem 1899@c free dup @ascuheap @acsmem 1900@c mempcpy dup ok 1901@c getuid dup ok 1902@c getpwuid_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1903@c getpwnam_r dup @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1904@c parse_glob @mtasurace:utent @mtasuconst:@mtsenv @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuintl @ascuheap 
@asulock @acucorrupt @aculock @acsfd @acsmem 1905@c strchr dup ok 1906@c parse_dollars dup @mtasuconst:@mtsenv @mtslocale @mtsenv @ascudlopen @ascuplugin @ascuintl @ascuheap @asulock @acucorrupt @aculock @acsfd @acsmem 1907@c parse_qtd_backslash @ascuheap @acsmem 1908@c w_addchar dup @ascuheap @acsmem 1909@c parse_backslash dup @ascuheap @acsmem 1910@c w_addchar dup @ascuheap @acsmem 1911@c w_addword dup @ascuheap @acsmem 1912@c w_newword dup ok 1913@c do_parse_glob @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem 1914@c glob dup @mtasurace:utent @mtsenv @mtascusig:ALRM @mtascutimer @mtslocale @ascudlopen @ascuplugin @ascuheap @asulock @aculock @acsfd @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] 1915@c w_addstr dup @ascuheap @acsmem 1916@c w_addchar dup @ascuheap @acsmem 1917@c globfree dup @ascuheap @acsmem [auto glob_t avoids @asucorrupt @acucorrupt] 1918@c free dup @ascuheap @acsmem 1919@c w_newword dup ok 1920@c strdup dup @ascuheap @acsmem 1921@c w_addword dup @ascuheap @acsmem 1922@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem 1923@c strchr dup ok 1924@c w_addchar dup @ascuheap @acsmem 1925@c realloc dup @ascuheap @acsmem 1926@c free dup @ascuheap @acsmem 1927@c free dup @ascuheap @acsmem 1928Perform word expansion on the string @var{words}, putting the result in 1929a newly allocated vector, and store the size and address of this vector 1930into @code{*@var{word-vector-ptr}}. The argument @var{flags} is a 1931combination of bit flags; see @ref{Flags for Wordexp}, for details of 1932the flags. 1933 1934You shouldn't use any of the characters @samp{|&;<>} in the string 1935@var{words} unless they are quoted; likewise for newline. If you use 1936these characters unquoted, you will get the @code{WRDE_BADCHAR} error 1937code. Don't use parentheses or braces unless they are quoted or part of 1938a word expansion construct. 
If you use quotation characters @samp{'"`}, 1939they should come in pairs that balance. 1940 1941The results of word expansion are a sequence of words. The function 1942@code{wordexp} allocates a string for each resulting word, then 1943allocates a vector of type @code{char **} to store the addresses of 1944these strings. The last element of the vector is a null pointer. 1945This vector is called the @dfn{word vector}. 1946 1947To return this vector, @code{wordexp} stores both its address and its 1948length (number of elements, not counting the terminating null pointer) 1949into @code{*@var{word-vector-ptr}}. 1950 1951If @code{wordexp} succeeds, it returns 0. Otherwise, it returns one 1952of these error codes: 1953 1954@vtable @code 1955@item WRDE_BADCHAR 1956@standards{POSIX.2, wordexp.h} 1957The input string @var{words} contains an unquoted invalid character such 1958as @samp{|}. 1959 1960@item WRDE_BADVAL 1961@standards{POSIX.2, wordexp.h} 1962The input string refers to an undefined shell variable, and you used the flag 1963@code{WRDE_UNDEF} to forbid such references. 1964 1965@item WRDE_CMDSUB 1966@standards{POSIX.2, wordexp.h} 1967The input string uses command substitution, and you used the flag 1968@code{WRDE_NOCMD} to forbid command substitution. 1969 1970@item WRDE_NOSPACE 1971@standards{POSIX.2, wordexp.h} 1972It was impossible to allocate memory to hold the result. In this case, 1973@code{wordexp} can store part of the results---as much as it could 1974allocate room for. 1975 1976@item WRDE_SYNTAX 1977@standards{POSIX.2, wordexp.h} 1978There was a syntax error in the input string. For example, an unmatched 1979quoting character is a syntax error. This error code is also used to 1980signal division by zero and overflow in arithmetic expansion. 
1981@end vtable 1982@end deftypefun 1983 1984@deftypefun void wordfree (wordexp_t *@var{word-vector-ptr}) 1985@standards{POSIX.2, wordexp.h} 1986@safety{@prelim{}@mtsafe{}@asunsafe{@asucorrupt{} @ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}} 1987@c wordfree dup @asucorrupt @ascuheap @acucorrupt @acsmem 1988@c free dup @ascuheap @acsmem 1989Free the storage used for the word-strings and vector that 1990@code{*@var{word-vector-ptr}} points to. This does not free the 1991structure @code{*@var{word-vector-ptr}} itself---only the other 1992data it points to. 1993@end deftypefun 1994 1995@node Flags for Wordexp 1996@subsection Flags for Word Expansion 1997 1998This section describes the flags that you can specify in the 1999@var{flags} argument to @code{wordexp}. Choose the flags you want, 2000and combine them with the C operator @code{|}. 2001 2002@vtable @code 2003@item WRDE_APPEND 2004@standards{POSIX.2, wordexp.h} 2005Append the words from this expansion to the vector of words produced by 2006previous calls to @code{wordexp}. This way you can effectively expand 2007several words as if they were concatenated with spaces between them. 2008 2009In order for appending to work, you must not modify the contents of the 2010word vector structure between calls to @code{wordexp}. And, if you set 2011@code{WRDE_DOOFFS} in the first call to @code{wordexp}, you must also 2012set it when you append to the results. 2013 2014@item WRDE_DOOFFS 2015@standards{POSIX.2, wordexp.h} 2016Leave blank slots at the beginning of the vector of words. 2017The @code{we_offs} field says how many slots to leave. 2018The blank slots contain null pointers. 2019 2020@item WRDE_NOCMD 2021@standards{POSIX.2, wordexp.h} 2022Don't do command substitution; if the input requests command substitution, 2023report an error. 2024 2025@item WRDE_REUSE 2026@standards{POSIX.2, wordexp.h} 2027Reuse a word vector made by a previous call to @code{wordexp}. 
Instead of allocating a new vector of words, this call to @code{wordexp}
will use the vector that already exists (making it larger if necessary).

Note that the vector may move, so it is not safe to save an old pointer
and use it again after calling @code{wordexp}. You must fetch
@code{we_wordv} anew after each call.

@item WRDE_SHOWERR
@standards{POSIX.2, wordexp.h}
Show any error messages printed by commands run by command substitution.
More precisely, allow these commands to inherit the standard error output
stream of the current process. By default, @code{wordexp} gives these
commands a standard error stream that discards all output.

@item WRDE_UNDEF
@standards{POSIX.2, wordexp.h}
If the input refers to a shell variable that is not defined, report an
error.
@end vtable

@node Wordexp Example
@subsection @code{wordexp} Example

Here is an example of using @code{wordexp} to expand several strings
and use the results to run a program. It also shows the use of
@code{WRDE_APPEND} to concatenate the expansions and of @code{wordfree}
to free the space allocated by @code{wordexp}.

@smallexample
int
expand_and_execute (const char *program, const char **options)
@{
  wordexp_t result;
  pid_t pid;
  int status, i;

  /* @r{Expand the string for the program to run.} */
  switch (wordexp (program, &result, 0))
    @{
    case 0:  /* @r{Successful}.
             */
      break;
    case WRDE_NOSPACE:
      /* @r{If the error was @code{WRDE_NOSPACE},}
         @r{then perhaps part of the result was allocated.} */
      wordfree (&result);
    default:                    /* @r{Some other error.} */
      return -1;
    @}

  /* @r{Expand the strings specified for the arguments.} */
  for (i = 0; options[i] != NULL; i++)
    @{
      if (wordexp (options[i], &result, WRDE_APPEND))
        @{
          wordfree (&result);
          return -1;
        @}
    @}

  pid = fork ();
  if (pid == 0)
    @{
      /* @r{This is the child process. Execute the command.} */
      execv (result.we_wordv[0], result.we_wordv);
      exit (EXIT_FAILURE);
    @}
  else if (pid < 0)
    /* @r{The fork failed. Report failure.} */
    status = -1;
  else
    /* @r{This is the parent process. Wait for the child to complete.} */
    if (waitpid (pid, &status, 0) != pid)
      status = -1;

  wordfree (&result);
  return status;
@}
@end smallexample

@node Tilde Expansion
@subsection Details of Tilde Expansion

It's a standard part of shell syntax that you can use @samp{~} at the
beginning of a file name to stand for your own home directory. You
can use @samp{~@var{user}} to stand for @var{user}'s home directory.

@dfn{Tilde expansion} is the process of converting these abbreviations
to the directory names that they stand for.

Tilde expansion applies to the @samp{~} plus all following characters up
to whitespace or a slash. It takes place only at the beginning of a
word, and only if none of the characters to be transformed is quoted in
any way.

Plain @samp{~} uses the value of the environment variable @code{HOME}
as the proper home directory name. @samp{~} followed by a user name
uses @code{getpwnam} to look up that user in the user database, and
uses whatever directory is recorded there.
Thus, @samp{~} followed
by your own name can give different results from plain @samp{~}, if
the value of @code{HOME} is not really your home directory.

@node Variable Substitution
@subsection Details of Variable Substitution

Part of ordinary shell syntax is the use of @samp{$@var{variable}} to
substitute the value of a shell variable into a command. This is called
@dfn{variable substitution}, and it is one part of doing word expansion.

The examples in this section assume that the variable @code{foo} has
the value @samp{tractor}.

There are two basic ways you can write a variable reference for
substitution:

@table @code
@item $@{@var{variable}@}
If you write braces around the variable name, then it is completely
unambiguous where the variable name ends. You can concatenate
additional letters onto the end of the variable value by writing them
immediately after the close brace. For example, @samp{$@{foo@}s}
expands into @samp{tractors}.

@item $@var{variable}
If you do not put braces around the variable name, then the variable
name consists of all the alphanumeric characters and underscores that
follow the @samp{$}. The next punctuation character ends the variable
name. Thus, @samp{$foo-bar} refers to the variable @code{foo} and expands
into @samp{tractor-bar}.
@end table

When you use braces, you can also use various constructs to modify the
value that is substituted, or test it in various ways.

@table @code
@item $@{@var{variable}:-@var{default}@}
Substitute the value of @var{variable}, but if that is empty or
undefined, use @var{default} instead.

@item $@{@var{variable}:=@var{default}@}
Substitute the value of @var{variable}, but if that is empty or
undefined, use @var{default} instead and set the variable to
@var{default}.

@item $@{@var{variable}:?@var{message}@}
If @var{variable} is defined and not empty, substitute its value.
2170 2171Otherwise, print @var{message} as an error message on the standard error 2172stream, and consider word expansion a failure. 2173 2174@c ??? How does wordexp report such an error? 2175@c WRDE_BADVAL is returned. 2176 2177@item $@{@var{variable}:+@var{replacement}@} 2178Substitute @var{replacement}, but only if @var{variable} is defined and 2179nonempty. Otherwise, substitute nothing for this construct. 2180@end table 2181 2182@table @code 2183@item $@{#@var{variable}@} 2184Substitute a numeral which expresses in base ten the number of 2185characters in the value of @var{variable}. @samp{$@{#foo@}} stands for 2186@samp{7}, because @samp{tractor} is seven characters. 2187@end table 2188 2189These variants of variable substitution let you remove part of the 2190variable's value before substituting it. The @var{prefix} and 2191@var{suffix} are not mere strings; they are wildcard patterns, just 2192like the patterns that you use to match multiple file names. But 2193in this context, they match against parts of the variable value 2194rather than against file names. 2195 2196@table @code 2197@item $@{@var{variable}%%@var{suffix}@} 2198Substitute the value of @var{variable}, but first discard from that 2199variable any portion at the end that matches the pattern @var{suffix}. 2200 2201If there is more than one alternative for how to match against 2202@var{suffix}, this construct uses the longest possible match. 2203 2204Thus, @samp{$@{foo%%r*@}} substitutes @samp{t}, because the largest 2205match for @samp{r*} at the end of @samp{tractor} is @samp{ractor}. 2206 2207@item $@{@var{variable}%@var{suffix}@} 2208Substitute the value of @var{variable}, but first discard from that 2209variable any portion at the end that matches the pattern @var{suffix}. 2210 2211If there is more than one alternative for how to match against 2212@var{suffix}, this construct uses the shortest possible alternative. 
2213 2214Thus, @samp{$@{foo%r*@}} substitutes @samp{tracto}, because the shortest 2215match for @samp{r*} at the end of @samp{tractor} is just @samp{r}. 2216 2217@item $@{@var{variable}##@var{prefix}@} 2218Substitute the value of @var{variable}, but first discard from that 2219variable any portion at the beginning that matches the pattern @var{prefix}. 2220 2221If there is more than one alternative for how to match against 2222@var{prefix}, this construct uses the longest possible match. 2223 2224Thus, @samp{$@{foo##*t@}} substitutes @samp{or}, because the largest 2225match for @samp{*t} at the beginning of @samp{tractor} is @samp{tract}. 2226 2227@item $@{@var{variable}#@var{prefix}@} 2228Substitute the value of @var{variable}, but first discard from that 2229variable any portion at the beginning that matches the pattern @var{prefix}. 2230 2231If there is more than one alternative for how to match against 2232@var{prefix}, this construct uses the shortest possible alternative. 2233 2234Thus, @samp{$@{foo#*t@}} substitutes @samp{ractor}, because the shortest 2235match for @samp{*t} at the beginning of @samp{tractor} is just @samp{t}. 2236 2237@end table 2238
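
The substitutions described in this section can be tried out through
@code{wordexp} itself. The following is a minimal sketch, assuming the
environment variable @code{foo} has been set to @samp{tractor} (for
example with @code{setenv}); the helper name @code{expand_one} and its
calling convention are invented for this illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper, not part of the library.  Expand WORDS, which
   is expected to produce exactly one word, and copy that word into
   BUFFER, which is LENGTH bytes long.  Return 0 on success and -1 if
   the expansion failed, produced some other number of words, or did
   not fit in BUFFER.  */
int
expand_one (const char *words, char *buffer, size_t length)
{
  wordexp_t result;
  int status = -1;
  int err = wordexp (words, &result, WRDE_NOCMD);

  if (err != 0)
    {
      /* After WRDE_NOSPACE, part of the result may have been
         allocated, so release it.  */
      if (err == WRDE_NOSPACE)
        wordfree (&result);
      return -1;
    }

  if (result.we_wordc == 1 && strlen (result.we_wordv[0]) < length)
    {
      strcpy (buffer, result.we_wordv[0]);
      status = 0;
    }

  wordfree (&result);
  return status;
}
```

After @code{setenv ("foo", "tractor", 1)}, expanding
@samp{$@{foo%%r*@}} this way should yield @samp{t}, and
@samp{$@{foo#*t@}} should yield @samp{ractor}, matching the examples
above.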