|Age||Commit message (Collapse)||Author||Lines|
when the pattern ended with one or more literal path components, or
when the GLOB_MARK flag was passed to request that glob flag directory
results and the type obtained by readdir was unknown or inconclusive
(symlink), the stat function was called to evaluate existence and/or
determine type. however, stat fails with ENOENT for broken symlinks,
and this caused the match to be omitted from the results.
instead, use stat only for the unknown/inconclusive cases with
GLOB_MARK, and otherwise, or if stat fails, use lstat existence still
needs to be determined. this minimizes the number of costly syscalls,
performing both only in the case where GLOB_MARK is in use and there
is a final literal path component which is a broken symlink.
based on/simplified from patch by James Y Knight.
previously (before and after rewrite), spurious escaping of path
separators as \/ was not treated the same as /, but rather got split
as an unpaired \ at the end of the fnmatch pattern and an unescaped /,
resulting in a mismatch/error.
for the case of \/ as part of the maximal literal prefix, remove the
explicit rejection of it and move the handling of / below escape
for the case of \/ after a proper glob pattern, it's hard to parse the
pattern, so don't. instead cheat and count repetitions of \ prior to
the already-found / character. if there are an odd number, the last is
escaping the /, so back up the split position by one. now the
char clobbered by null termination is variable, so save it and restore
this code has been long overdue for a rewrite, but the immediate cause
that necessitated it was total failure to see past unreadable path
components. for example, A/B/* would fail to match anything, even
though it should succeed, when both A and A/B are searchable but only
A/B is readable. this problem both was caught in conformance testing,
and impacted users.
the old glob implementation insisted on searching the listing of each
path component for a match, even if the next component was a literal.
it also used considerable stack space, up to length of the pattern,
per recursion level, and relied on an artificial bound of the pattern
length by PATH_MAX, which was incorrect because a pattern can be much
longer than PATH_MAX while having matches shorter (for example, with
necessarily long bracket expressions, or with redundancy).
in the new implementation, each level of recursion starts by consuming
the maximal literal (possibly escaped-literal) path prefix remaining
in the pattern, and only opening a directory to read when there is a
proper glob pattern in the next path component. it then recurses into
each matching entry. the top-level glob function provided automatic
storage (up to PATH_MAX) for construction of candidate/result strings,
and allocates a duplicate of the pattern that can be modified in-place
with temporary null-termination to pass to fnmatch. this allocation is
not a big deal since glob already has to perform allocation, and has
to link free to clean up if it experiences an allocation failure or
other error after some results have already been allocated.
care is taken to use the d_type field from iterated dirents when
possible; stat is called only when there are literal path components
past the last proper-glob component, or when needed to disambiguate
symlinks for the purpose of GLOB_MARK.
one peculiarity with the new implementation is the manner in which the
error handling callback will be called. if attempting to match */B/C/D
where a directory A exists that is inaccessible, the error reported
will be a stat error for A/B/C/D rather than (previous and wrong
implementation) an opendir error for A, or (likely on other
implementations) a stat error for A/B. such behavior does not seem to
be non-conforming, but if it turns out to be undesirable for any
reason, backtracking could be done on error to report the first
component producing it.
also, redundant slashes are no longer normalized, but preserved as
they appear in the pattern; this is probably more correct, and falls
out naturally from the algorithm used. since trailing slashes (which
force all matches to be directories) are preserved as well, the
behavior of GLOB_MARK has been adjusted not to append an additional
slash to results that already end in slash.
len was already passed as an argument, so don't use strcat, and use
memcpy instead of strcpy.
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
commit 8c4be3e2209d2a1d3874b8bc2b474668fcbbbac6 was written to
preclude the GLOB_PERIOD extension from matching these directory
entries, but also precluded literal matches.
adjust the check that excludes . and .. to check whether the
GLOB_PERIOD flag is in effect, so that it cannot alter behavior in
cases governed by the standard, and also don't exclude . or .. in any
case where normal glob behavior (fnmatch's FNM_PERIOD flag) would have
included one or both of them (patterns such as ".*").
it's still not clear whether this is the preferred behavior for
GLOB_PERIOD, but at least it's clear that it can no longer break
applications which are not relying on quirks of a nonstandard feature.
GLOB_PERIOD is a gnu extension, and GNU glob does not seem to honor it
except in the last path component. it's not clear whether this a bug
or intentional, but it seems reasonable that it should exclude the
special entries . and .. when walking.
changes based on report and analysis by Julien Ramseier.
the check to prevent matching empty string wrongly blocked matching
of "/" due to checking emptiness after stripping leading slashes
rather than checking the full original argument string.
simplified from patch by Julien Ramseier.
commit 0dc99ac413d8bc054a2e95578475c7122455eee8 added input length
checking to avoid unsafe VLA allocation, but put it in the wrong
place, before the glob_t structure was zeroed out. while POSIX isn't
clear on whether it's permitted to call globfree after glob failed
with GLOB_NOSPACE, making it safe is clearly better than letting
uninitialized pointers get passed to free in non-conforming callers.
while we're fixing this, change strlen check to the idiomatic strnlen
version to avoid unbounded input scanning before returning an error.
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
POSIX is unclear on whether it should, but all historical
implementations seem to behave this way, and it seems more useful to
patch by sh4rm4
basically there are 3 choices for how to implement this variable-size
1. C99 flexible array member: breaks using dirent.h with pre-C99 compiler.
2. old way: length-1 string: generates array bounds warnings in caller.
3. new way: length-NAME_MAX string. no problems, simplifies all code.
of course the usable part in the pointer returned by readdir might be
shorter than NAME_MAX+1 bytes, but that is allowed by the standard and
doesn't hurt anything.
this actually inadvertently disallows some valid patterns with
redundant / or * characters, but it's better than allowing unbounded
eventually i'll write code to move the pattern to the stack and
eliminate redundancy to ensure that it fits in PATH_MAX at the
beginning of glob. this would also allow it to be modified in place
for passing to fnmatch rather than copied at each level of recursion.