path: root/src/regex/glob.c
Age | Commit message | Author | Lines
2019-08-07 | fix failure of glob to match broken symlinks under some conditions | Rich Felker | -5/+12
when the pattern ended with one or more literal path components, or when the GLOB_MARK flag was passed to request that glob flag directory results and the type obtained by readdir was unknown or inconclusive (symlink), the stat function was called to evaluate existence and/or determine type. however, stat fails with ENOENT for broken symlinks, and this caused the match to be omitted from the results. instead, use stat only for the unknown/inconclusive cases with GLOB_MARK, and otherwise, or if stat fails, use lstat if existence still needs to be determined. this minimizes the number of costly syscalls, performing both only in the case where GLOB_MARK is in use and there is a final literal path component which is a broken symlink. based on/simplified from patch by James Y Knight.
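The stat-then-lstat fallback described in this message can be sketched roughly as follows. This is a minimal illustration, not musl's code: `entry_exists`, `demo_broken_symlink`, the `want_type` parameter, and the `/tmp` template are all hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from musl): returns 1 if `path` names an
 * existing directory entry, 0 otherwise. When want_type is nonzero, stat
 * is tried first so a symlink to a directory is reported as a directory;
 * lstat is the fallback so a broken symlink still counts as existing. */
static int entry_exists(const char *path, int want_type, int *is_dir)
{
    struct stat st;

    assert(path != NULL && is_dir != NULL);
    *is_dir = 0;
    if (want_type && stat(path, &st) == 0) {
        *is_dir = S_ISDIR(st.st_mode);
        return 1;
    }
    /* stat was skipped, or failed (e.g. ENOENT on a broken symlink) */
    return lstat(path, &st) == 0;
}

/* Creates a dangling symlink in a fresh temporary directory and probes it.
 * Returns 1 if the probe says "exists, not a directory", -1 on setup error. */
static int demo_broken_symlink(void)
{
    char dir[] = "/tmp/globdemoXXXXXX";
    char path[64];
    int is_dir = -1, found;

    if (!mkdtemp(dir)) return -1;
    snprintf(path, sizeof path, "%s/dangling", dir);
    if (symlink("no-such-target", path) != 0) { rmdir(dir); return -1; }

    found = entry_exists(path, 1, &is_dir);

    unlink(path);
    rmdir(dir);
    return found == 1 && is_dir == 0;
}
```

With `want_type` set to 0 the helper makes a single lstat call, which corresponds to the "minimize costly syscalls" point in the message.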
2019-08-06 | glob: implement GLOB_TILDE and GLOB_TILDE_CHECK | Ismael Luceno | -1/+41
2018-10-13 | allow escaped path-separator slashes in glob | Rich Felker | -11/+22
previously (before and after rewrite), spurious escaping of path separators as \/ was not treated the same as /, but rather got split as an unpaired \ at the end of the fnmatch pattern and an unescaped /, resulting in a mismatch/error. for the case of \/ as part of the maximal literal prefix, remove the explicit rejection of it and move the handling of / below escape processing. for the case of \/ after a proper glob pattern, it's hard to parse the pattern, so don't. instead cheat and count repetitions of \ prior to the already-found / character. if there are an odd number, the last is escaping the /, so back up the split position by one. now the char clobbered by null termination is variable, so save it and restore as needed.
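The backslash-counting trick from this message can be illustrated with a small sketch. `split_before_slash` is a hypothetical helper written for this example; it assumes the caller has already located the `/` at index `slash`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from musl): given a pattern and the index of a
 * '/' in it, return the index at which to split the pattern. An odd number
 * of backslashes directly before the slash means the last one escapes the
 * slash, so the split position backs up by one to keep the pair together. */
static size_t split_before_slash(const char *pat, size_t slash)
{
    size_t n = 0;

    assert(pat[slash] == '/');
    while (n < slash && pat[slash - 1 - n] == '\\')
        n++;
    return (n % 2) ? slash - 1 : slash;
}
```

For the C string `"a*\\/b"` (one backslash before the slash at index 3) the split lands at index 2; with two backslashes the slash is unescaped and the split stays at the slash.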
2018-10-12 | rewrite core of the glob implementation for correctness & optimization | Rich Felker | -105/+112
this code has been long overdue for a rewrite, but the immediate cause that necessitated it was total failure to see past unreadable path components. for example, A/B/* would fail to match anything, even though it should succeed, when both A and A/B are searchable but only A/B is readable. this problem both was caught in conformance testing, and impacted users.
the old glob implementation insisted on searching the listing of each path component for a match, even if the next component was a literal. it also used considerable stack space, up to length of the pattern, per recursion level, and relied on an artificial bound of the pattern length by PATH_MAX, which was incorrect because a pattern can be much longer than PATH_MAX while having matches shorter (for example, with necessarily long bracket expressions, or with redundancy).
in the new implementation, each level of recursion starts by consuming the maximal literal (possibly escaped-literal) path prefix remaining in the pattern, and only opening a directory to read when there is a proper glob pattern in the next path component. it then recurses into each matching entry. the top-level glob function provides automatic storage (up to PATH_MAX) for construction of candidate/result strings, and allocates a duplicate of the pattern that can be modified in-place with temporary null-termination to pass to fnmatch. this allocation is not a big deal since glob already has to perform allocation, and has to link free to clean up if it experiences an allocation failure or other error after some results have already been allocated.
care is taken to use the d_type field from iterated dirents when possible; stat is called only when there are literal path components past the last proper-glob component, or when needed to disambiguate symlinks for the purpose of GLOB_MARK.
one peculiarity with the new implementation is the manner in which the error handling callback will be called. if attempting to match */B/C/D where a directory A exists that is inaccessible, the error reported will be a stat error for A/B/C/D rather than (previous and wrong implementation) an opendir error for A, or (likely on other implementations) a stat error for A/B. such behavior does not seem to be non-conforming, but if it turns out to be undesirable for any reason, backtracking could be done on error to report the first component producing it.
also, redundant slashes are no longer normalized, but preserved as they appear in the pattern; this is probably more correct, and falls out naturally from the algorithm used. since trailing slashes (which force all matches to be directories) are preserved as well, the behavior of GLOB_MARK has been adjusted not to append an additional slash to results that already end in slash.
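The "consume the maximal literal prefix" step described above might look roughly like the following. This is a simplified sketch, not the code from the commit: `literal_prefix_len` is a hypothetical name, and treating every unescaped `*`, `?`, or `[` as a glob metacharacter is a simplifying assumption (a real matcher would also check that a `[` opens a valid bracket expression).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from musl): return the length of the longest
 * prefix of `pat` made only of whole literal path components, i.e. the
 * offset of the first component that contains an unescaped metacharacter,
 * or strlen(pat) if there is none. An escaped slash still separates
 * components. */
static size_t literal_prefix_len(const char *pat)
{
    size_t i, comp = 0;            /* comp: start of current component */

    assert(pat != NULL);
    for (i = 0; pat[i]; i++) {
        int escaped = (pat[i] == '\\' && pat[i + 1] != '\0');

        if (escaped)
            i++;                   /* look at the escaped character */
        if (pat[i] == '/')
            comp = i + 1;          /* next component starts after '/' */
        else if (!escaped && strchr("*?[", pat[i]))
            return comp;           /* this component needs a directory scan */
    }
    return i;                      /* whole pattern is literal */
}
```

For `A/B/*` the literal prefix is `A/B/` (length 4), so under this sketch only `A/B` would need to be opened and read, which is the case the message says the old implementation got wrong.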
2018-10-11 | fix redundant computations of strlen in glob append function | Rich Felker | -2/+5
len was already passed as an argument, so don't use strcat, and use memcpy instead of strcpy.
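The idea of reusing a known length instead of letting strcpy and strcat rescan the string can be shown with a small sketch. `copy_name` and its `mark` parameter are hypothetical and only loosely modeled on an "append a result, optionally with a trailing slash" step; this is not the function changed by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first `len` bytes of `name` into `dst`,
 * append '/' when `mark` is nonzero, and NUL-terminate. The caller must
 * provide room for len + 2 bytes. Because len is already known, nothing
 * here has to walk the string looking for its terminator. */
static size_t copy_name(char *dst, const char *name, size_t len, int mark)
{
    assert(dst != NULL && name != NULL);
    memcpy(dst, name, len);        /* one pass, no hidden strlen */
    if (mark)
        dst[len++] = '/';
    dst[len] = '\0';
    return len;                    /* final length, for the caller to reuse */
}
```

Returning the final length lets later steps keep avoiding strlen as well.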
2018-10-11 | fix invalid substitute of [1] for flexible array member in glob | Rich Felker | -2/+2
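For readers unfamiliar with the distinction in this commit title: declaring a trailing `char name[1]` and then indexing past element 0 goes outside the declared array bounds, whereas a C99 flexible array member is the form the language provides for variable-length trailing storage. A minimal sketch, with a hypothetical `struct match` and `match_new` that are illustrative rather than copied from the source:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node for one glob result. */
struct match {
    struct match *next;
    char name[];               /* flexible array member, not name[1] */
};

/* Allocate a node with exactly enough trailing storage for `s`. */
static struct match *match_new(const char *s)
{
    size_t len;
    struct match *m;

    assert(s != NULL);
    len = strlen(s);
    m = malloc(sizeof *m + len + 1);   /* header plus string plus NUL */
    if (!m)
        return NULL;
    m->next = NULL;
    memcpy(m->name, s, len + 1);
    return m;
}
```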
2018-09-12 | remove spurious inclusion of libc.h for LFS64 ABI aliases | Rich Felker | -3/+2
the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
2017-10-21 | fix regression in glob with literal . or .. path component | Rich Felker | -3/+5
commit 8c4be3e2209d2a1d3874b8bc2b474668fcbbbac6 was written to preclude the GLOB_PERIOD extension from matching these directory entries, but also precluded literal matches. adjust the check that excludes . and .. to check whether the GLOB_PERIOD flag is in effect, so that it cannot alter behavior in cases governed by the standard, and also don't exclude . or .. in any case where normal glob behavior (fnmatch's FNM_PERIOD flag) would have included one or both of them (patterns such as ".*"). it's still not clear whether this is the preferred behavior for GLOB_PERIOD, but at least it's clear that it can no longer break applications which are not relying on quirks of a nonstandard feature.
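The adjusted check described here can be approximated as follows. This is a sketch under assumptions: `skip_special_entry` is a hypothetical name, and the `glob_period` parameter stands in for "the GLOB_PERIOD flag is in effect" so that the example does not depend on the GNU-only flag constant.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper: decide whether a directory entry should be skipped.
 * Only "." and ".." are ever skipped, and only when the GLOB_PERIOD-style
 * behavior is on AND ordinary matching with FNM_PERIOD would not have
 * matched them anyway. Patterns such as ".*" therefore still match. */
static int skip_special_entry(const char *pat, const char *name,
                              int glob_period)
{
    int special;

    assert(pat != NULL && name != NULL);
    special = !strcmp(name, ".") || !strcmp(name, "..");
    if (!special || !glob_period)
        return 0;              /* standard-governed cases are untouched */
    return fnmatch(pat, name, FNM_PERIOD) != 0;
}
```

With `glob_period` set to 0 the helper never skips anything, matching the message's point that behavior governed by the standard must not change.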
2017-09-06 | fix glob descent into . and .. with GLOB_PERIOD | Rich Felker | -0/+4
GLOB_PERIOD is a gnu extension, and GNU glob does not seem to honor it except in the last path component. it's not clear whether this is a bug or intentional, but it seems reasonable that it should exclude the special entries . and .. when walking. changes based on report and analysis by Julien Ramseier.
2017-06-08 | fix glob failure to match plain "/" to root directory | Rich Felker | -1/+1
the check to prevent matching empty string wrongly blocked matching of "/" due to checking emptiness after stripping leading slashes rather than checking the full original argument string. simplified from patch by Julien Ramseier.
2017-01-02 | make globfree safe after failed glob from over-length argument | Rich Felker | -2/+2
commit 0dc99ac413d8bc054a2e95578475c7122455eee8 added input length checking to avoid unsafe VLA allocation, but put it in the wrong place, before the glob_t structure was zeroed out. while POSIX isn't clear on whether it's permitted to call globfree after glob failed with GLOB_NOSPACE, making it safe is clearly better than letting uninitialized pointers get passed to free in non-conforming callers. while we're fixing this, change strlen check to the idiomatic strnlen version to avoid unbounded input scanning before returning an error.
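The ordering issue and the strnlen idiom from this message can be sketched as below. `glob_precheck` is a hypothetical function invented for illustration; it ignores flags such as GLOB_APPEND that a real implementation would have to honor before resetting the structure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry check: reset the result fields FIRST, so that a
 * later globfree sees no uninitialized pointers whichever way this
 * returns, then reject over-length patterns. strnlen stops scanning at
 * PATH_MAX + 1 bytes instead of walking an arbitrarily long input. */
static int glob_precheck(const char *pat, glob_t *g)
{
    assert(pat != NULL && g != NULL);
    g->gl_pathc = 0;
    g->gl_pathv = NULL;
    if (strnlen(pat, PATH_MAX + 1) > PATH_MAX)
        return GLOB_NOSPACE;
    return 0;
}
```

Swapping the two steps would reproduce the problem the commit describes: an early return with the pointer field still holding whatever the caller's memory contained.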
2013-12-12 | include cleanups: remove unused headers and add feature test macros | Szabolcs Nagy | -2/+0
2012-09-06 | use restrict everywhere it's required by c99 and/or posix 2008 | Rich Felker | -1/+1
to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
2012-01-23 | make glob mark symlinks-to-directories with the GLOB_MARK flag | Rich Felker | -1/+1
POSIX is unclear on whether it should, but all historical implementations seem to behave this way, and it seems more useful to applications.
2012-01-22 | support GLOB_PERIOD flag (GNU extension) to glob function | Rich Felker | -1/+2
patch by sh4rm4
2011-06-06 | fix handling of d_name in struct dirent | Rich Felker | -3/+2
basically there are 3 choices for how to implement this variable-size string member:
1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
2. old way: length-1 string: generates array bounds warnings in caller.
3. new way: length-NAME_MAX string. no problems, simplifies all code.
of course the usable part in the pointer returned by readdir might be shorter than NAME_MAX+1 bytes, but that is allowed by the standard and doesn't hurt anything.
2011-06-05 | safety fix for glob's vla usage: disallow patterns longer than PATH_MAX | Rich Felker | -0/+2
this actually inadvertently disallows some valid patterns with redundant / or * characters, but it's better than allowing unbounded vla allocation. eventually i'll write code to move the pattern to the stack and eliminate redundancy to ensure that it fits in PATH_MAX at the beginning of glob. this would also allow it to be modified in place for passing to fnmatch rather than copied at each level of recursion.
2011-02-12 | initial check-in, version 0.5.0 (tag: v0.5.0) | Rich Felker | -0/+238