originally the namespace-infringing "large file support" interfaces
were included as part of glibc-ABI-compat, with the intent that they
not be used for linking, since our off_t is and always has been
unconditionally 64-bit and since we usually do not aim to support
nonstandard interfaces when there is an equivalent standard interface.
unfortunately, having the symbols present and available for linking
caused configure scripts to detect them and attempt to use them
without declarations, producing all the expected ill effects that this entails.
as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made
to prevent this, using macros to redirect the LFS64 names to the
standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE.
however, this has turned out to be a source of further problems,
especially since g++ defines _GNU_SOURCE by default. in particular,
the presence of these names as macros breaks a lot of valid code.
this commit removes all the LFS64 symbols and replaces them with a
mechanism in the dynamic linker symbol lookup failure path to retry
with the spurious "64" removed from the symbol name. in the future,
if/when the rest of glibc-ABI-compat is moved out of libc, this can be
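The retry described above only needs a name-rewriting step on the lookup-failure path. The sketch below shows that step alone, under stated assumptions: `lfs64_fallback_name` and its short allowlist are hypothetical (the real list of LFS64 names is longer and is not reproduced here), and the actual symbol lookup is left to the caller. A name such as readdir64_r, where "64" is not a suffix, needs its own case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated allowlist of stems whose "<stem>64" LFS64
 * names should be retried under the standard name. */
static const char *const lfs64_stems[] = {
	"stat", "fstat", "lstat", "fstatat", "open", "openat",
	"lseek", "fopen", "readdir", "truncate", "mmap", NULL
};

/* If name is "<stem>64" with an allowlisted stem (or the special case
 * "readdir64_r"), write the standard name to out and return 1.
 * Otherwise return 0, and the lookup failure is reported as usual. */
static int lfs64_fallback_name(const char *name, char *out, size_t outsz)
{
	if (!strcmp(name, "readdir64_r")) {
		if (outsz < sizeof "readdir_r") return 0;
		strcpy(out, "readdir_r");
		return 1;
	}
	size_t len = strlen(name);
	if (len < 3 || strcmp(name + len - 2, "64")) return 0;
	size_t stem = len - 2;
	if (stem >= outsz) return 0;
	for (size_t i = 0; lfs64_stems[i]; i++) {
		if (strlen(lfs64_stems[i]) == stem &&
		    !strncmp(name, lfs64_stems[i], stem)) {
			memcpy(out, name, stem);   /* drop the spurious "64" */
			out[stem] = '\0';
			return 1;
		}
	}
	return 0;
}
```

Restricting the retry to a fixed list keeps arbitrary names that happen to end in "64" from being silently remapped.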
this avoids the need for implementation-internal callers to depend on
the nonstandard AT_EMPTY_PATH extension to use __fstatat and isolates
knowledge of that extension to the implementation of __fstat.
riscv32 and future architectures only provide statx.
this makes it so we can drop direct stat syscall use in interfaces
that can't use the POSIX namespace.
instead, use the fstatat/stat functions, so that the logic for which
syscalls are present and usable is all in fstatat.
this results in a slight increase in cost for old kernels on 32-bit
archs: now statx will be attempted first rather than just using the
legacy time32 syscalls, despite us not caring about timestamps.
however, it's not even clear that the legacy syscalls *should* succeed
if the timestamps are out of range; arguably they should fail with
EOVERFLOW. as such, paying a small cost here on old kernels seems reasonable.
with this change, fchmodat itself is no longer blocking ports to new
archs that lack the legacy syscalls.
because struct stat is no longer assumed to correspond to the
structure used by the stat-family syscalls, it's not valid to make any
of these syscalls directly using a buffer of type struct stat.
commit 9493892021eac4edf1776d945bcdd3f7a96f6978 moved all logic around
this change for stat-family functions into fstatat.c, making the
others wrappers for it. but a few other direct uses of the syscall
were overlooked. the ones in tmpnam/tempnam are harmless since the
syscalls are just used to test for file existence. however, the uses
in fchmodat and __map_file depend on getting accurate file properties,
and these functions may actually have been broken on one or more mips
variants due to removal of conversion hacks from syscall_arch.h.
as a low-risk fix, simply use struct kstat in place of struct stat in
the affected places.
here _REDIR_TIME64 is used as an indication that there's an old ABI,
and thereby the old time32 timespec fields of struct stat.
keeping struct stat compatible and providing both versions of the
timespec fields is done so that ftw/nftw does not need painful compat
shims, and (more importantly) so that similar interfaces between pairs
of libc consumers (applications/libraries) will be less likely to
break when one has been rebuilt for time64 but the other has not.
these functions cannot provide the glibc lfs64-ABI-compatible symbols
when time_t differs from what it was in that ABI. instead, the aliases
need to be provided by the time32 compat shims or through some other
time64 syscall is used only if it's the only one defined for the arch,
or if either of the requested times does not fit in 32 bits. care is
taken to normalize the inputs to account for UTIME_NOW or UTIME_OMIT
in tv_nsec, in which case tv_sec should be ignored. this is needed not
only to avoid spurious time64 syscalls that might waste time failing
with ENOSYS, but also to accurately decide whether fallback is possible.
if the requested time cannot be represented, the function fails with
ENOTSUP, defined in general as "The implementation does not support
the requested feature or value". neither the time64 syscall, nor this
error, can happen on current 32-bit archs where time_t is a 32-bit
type, and both are statically unreachable.
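The selection rule above (use time64 only if some requested time does not fit in 32 bits, ignoring tv_sec whenever tv_nsec holds UTIME_NOW or UTIME_OMIT) can be sketched as a pure function. `needs_time64` is a hypothetical name; the sketch makes no syscalls and does not model the "only syscall defined for the arch" case.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>   /* UTIME_NOW, UTIME_OMIT */
#include <time.h>

/* Decide whether a utimensat-style call would need the time64 syscall.
 * A NULL times pointer means "set both to now" and never does. */
static bool needs_time64(const struct timespec *times)
{
	if (!times) return false;
	for (int i = 0; i < 2; i++) {
		/* With UTIME_NOW or UTIME_OMIT in tv_nsec, tv_sec is ignored,
		 * so whatever it holds must not trigger a time64 call. */
		if (times[i].tv_nsec == UTIME_NOW || times[i].tv_nsec == UTIME_OMIT)
			continue;
		long long s = times[i].tv_sec;
		if (s < INT32_MIN || s > INT32_MAX)
			return true;
	}
	return false;
}
```

When this returns false, the legacy 32-bit syscall is a safe fallback; when it returns true and only the legacy syscall exists, the call has to fail (ENOTSUP, per the text above).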
on 64-bit archs, there are only superficial changes to the
SYS_futimesat fallback path, which has been modified to pass long
instead of struct timeval to the kernel, making it suitable for use
on 32-bit archs even once time_t is changed to 64-bit. for 32-bit
archs, the call to SYS_utimensat has also been changed to copy the
timespecs through an array of long rather than passing the
previously the fallback wrongly failed with EINVAL rather than ENOSYS
when UTIME_NOW was used with one component but not both. commit
dd5f50da6f6c3df5647e922e47f8568a8896a752 introduced this behavior when
initially adding the fallback support.
instead, detect the case where both are UTIME_NOW early and replace
with a null times pointer; this may improve performance slightly (less
copy from user), and removes the complex logic from the fallback case.
it also makes things slightly simpler for adding time64 code paths.
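The early check described above can be sketched as follows. `collapse_utime_now` and `set_times` are hypothetical helpers. They rely on the documented utimensat(2) behavior that a null times pointer sets both timestamps to the current time.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/stat.h>   /* utimensat, UTIME_NOW */
#include <time.h>

/* If both entries request UTIME_NOW, return NULL (equivalent meaning,
 * and nothing to copy from user space); otherwise pass times through. */
static const struct timespec *collapse_utime_now(const struct timespec *times)
{
	if (times && times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW)
		return NULL;
	return times;
}

/* Hypothetical caller: the fallback path then never has to special-case
 * the "both now" combination. */
static int set_times(int dirfd, const char *path,
                     const struct timespec times[2], int flags)
{
	return utimensat(dirfd, path, collapse_utime_now(times), flags);
}
```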
commit 01ae3fc6d48f4a45535189b7a6db286535af08ca modified fstatat to
translate the kernel's struct stat ("kstat") into the libc struct stat.
To do this, it created a local kstat object, and copied its contents
into the user-provided object.
However, the commit neglected to update the fstat compatibility path and
its fallbacks. They continued to pass the user-supplied object to the
kernel, later overwriting it with the uninitialized memory in the local kstat object.
commit dfc81828f7ab41da08f744c44117a1bb20a05749 accidentally defined
an instance of struct statx along with the struct declaration.
this commit adds a new backend for fstatat (and thereby the whole stat
family) using the SYS_statx syscall, but conditions the new code on
the kernel stat structure's time fields being smaller than time_t. in
principle that should make it all dead code at present, but mips64 has
a broken stat structure with 32-bit time fields despite having 64-bit
time_t elsewhere, so on mips64 it is a functional change that makes
post-Y2038 filesystem timestamps accessible.
whenever the 32-bit archs end up getting 64-bit time_t, regardless of
how that happens, the changes in this commit will automatically take
effect for them too.
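A rough user-space analogue of a statx-backed fstatat is sketched below. It fills a struct stat from statx(2) and falls back to fstatat() on ENOSYS. Several assumptions apply: the libc must declare the statx wrapper and struct statx (glibc 2.28 or later, or an equivalent declaration elsewhere), `stat_via_statx` is a hypothetical name, and the sketch does not model the "kernel time fields smaller than time_t" condition that gates the real backend.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Fill *st from statx(2); fall back to fstatat() if statx is unavailable. */
static int stat_via_statx(int dirfd, const char *path, struct stat *st, int flag)
{
	struct statx stx;
	if (statx(dirfd, path, flag, STATX_BASIC_STATS, &stx) != 0) {
		if (errno == ENOSYS) return fstatat(dirfd, path, st, flag);
		return -1;
	}
	memset(st, 0, sizeof *st);
	st->st_dev = makedev(stx.stx_dev_major, stx.stx_dev_minor);
	st->st_ino = stx.stx_ino;
	st->st_mode = stx.stx_mode;
	st->st_nlink = stx.stx_nlink;
	st->st_uid = stx.stx_uid;
	st->st_gid = stx.stx_gid;
	st->st_rdev = makedev(stx.stx_rdev_major, stx.stx_rdev_minor);
	st->st_size = stx.stx_size;
	st->st_blksize = stx.stx_blksize;
	st->st_blocks = stx.stx_blocks;
	/* statx timestamps carry 64-bit seconds regardless of the arch's
	 * legacy stat layout, which is what makes post-2038 times visible. */
	st->st_atim.tv_sec = stx.stx_atime.tv_sec;
	st->st_atim.tv_nsec = stx.stx_atime.tv_nsec;
	st->st_mtim.tv_sec = stx.stx_mtime.tv_sec;
	st->st_mtim.tv_nsec = stx.stx_mtime.tv_nsec;
	st->st_ctim.tv_sec = stx.stx_ctime.tv_sec;
	st->st_ctim.tv_nsec = stx.stx_ctime.tv_nsec;
	return 0;
}
```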
AT_FDCWD is not a valid file descriptor, so POSIX requires fstat to
fail with EBADF. if passed to fstatat, the call would spuriously
succeed and return results for the working directory.
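A sketch of fstat layered on fstatat with AT_EMPTY_PATH follows. It includes the negative-descriptor check explained above. `fstat_sketch` is a hypothetical name. Without the check, passing AT_FDCWD (a negative value) would make the call report on the working directory instead of failing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD, AT_EMPTY_PATH */
#include <sys/stat.h>

/* fstat in terms of fstatat: an empty path plus AT_EMPTY_PATH means
 * "the file fd refers to". Negative fds, including AT_FDCWD, must fail
 * with EBADF as POSIX requires for fstat. */
static int fstat_sketch(int fd, struct stat *st)
{
	if (fd < 0) {
		errno = EBADF;
		return -1;
	}
	return fstatat(fd, "", st, AT_EMPTY_PATH);
}
```

Keeping AT_EMPTY_PATH inside this one function lets other internal callers use plain fstat without knowing about the extension.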
presently, all archs/ABIs have struct stat matching the kernel
stat type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
equivalent logic for fstat+O_PATH fallback and direct use of
stat/lstat syscalls where appropriate is kept, now in the fstatat
function. this change both improves functionality (now, fstatat forms
equivalent to fstat/lstat/stat will work even on kernels too old to
have the at functions) and localizes direct interfacing with the
kernel stat structure to one file.
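The dispatch described above can be sketched at the library level. `fstatat_sketch` is hypothetical, and it calls the libc fstat/lstat/stat functions only to make the conditions readable. The description above concerns the underlying syscalls, and in a libc built this way fstat and friends are themselves wrappers for fstatat.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD, AT_EMPTY_PATH, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>

/* Route fstatat forms that are equivalent to fstat/lstat/stat to those
 * operations, so they keep working where the *at syscall is missing. */
static int fstatat_sketch(int fd, const char *path, struct stat *st, int flag)
{
	if (flag == AT_EMPTY_PATH && fd >= 0 && !*path)
		return fstat(fd, st);                   /* same as fstat(fd) */
	if ((fd == AT_FDCWD || *path == '/') && flag == AT_SYMLINK_NOFOLLOW)
		return lstat(path, st);                 /* fd does not matter */
	if ((fd == AT_FDCWD || *path == '/') && !flag)
		return stat(path, st);
	return fstatat(fd, path, st, flag);         /* genuinely needs *at */
}
```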
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
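For reference, below is a weak_alias definition of the kind described above, followed by a direct use in place of a wrapper macro. The function and alias names are hypothetical. `__typeof` and the weak/alias attributes are GCC/Clang extensions, and alias requires an ELF-style target with the aliased function defined in the same translation unit.

```c
/* Declare `new` as a weak symbol that refers to the definition of `old`. */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* Hypothetical function; the alias is spelled out at the use site
 * instead of being hidden behind a name-pasting macro. */
int example_size(void)
{
	return 64;
}
weak_alias(example_size, example_size64);
```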
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up being included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for the hidden macro. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
syscall.h was chosen as the header to declare it, since its intended
usage is alongside syscalls as a fallback for operations the direct
syscall does not support.
in the case where a non-symlink file was replaced by a symlink during
the fchmodat operation with AT_SYMLINK_NOFOLLOW, mode change on the
new symlink target was successfully suppressed, but the error was not
reported. instead, fchmodat simply returned 0.
this code path is used only on archs without the plain, non-at
syscalls, and only when the fstat syscall fails with EBADF on a valid
file descriptor. this in turn can happen only for O_PATH file
descriptors, and may not happen at all on the newer kernels needed for
supporting such archs.
with the flags argument omitted, spurious fstat failures may happen
when the argument register happens to have the AT_SYMLINK_NOFOLLOW bit
these are put alongside the similar functions for __xstat, etc. in
__xstat.c to avoid bloating the number of source files.
these are mostly intended for use with dynamic linking (although they
can also be used statically with object files compiled against glibc
headers), so having them broken down into separate source files to
optimize for static linking is unlikely to be worth the cost of having
more files in the source tree (which contributes to libc.a overhead,
compile time, link time, ar/linker command line size exhaustion, and
such archs are expected to omit definitions of the SYS_* macros for
syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the
preprocessor is then able to select an appropriate implementation
for affected functions. two basic strategies are used on a
where the old syscalls correspond to deprecated library-level
functions, the deprecated functions have been converted to wrappers
for the modern function, and the modern function has fallback code
(omitted at the preprocessor level on new archs) to make use of the
old syscalls if the new syscall fails with ENOSYS. this also improves
functionality on older kernels and eliminates the incentive to program
with deprecated library-level functions for the sake of compatibility
with older kernels.
in other situations where the old syscalls correspond to library-level
functions which are not deprecated but merely lack some new features,
such as the *at functions, the old syscalls are still used on archs
which support them. this may change at some point in the future if or
when fallback code is added to the new functions to make them usable
(possibly with reduced functionality) on old kernels.
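Both strategies rely on the arch header omitting SYS_* macros for missing syscalls. The sketch below uses access/faccessat purely as a stand-in pair, and `access_sketch` and `faccessat_sketch` are hypothetical names. The first function shows plain compile-time selection. The second shows the "try the new syscall, fall back on ENOSYS" shape, which compiles away on archs that do not define the legacy syscall.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD */
#include <sys/syscall.h>    /* SYS_* numbers for the build arch */
#include <unistd.h>         /* syscall, F_OK */

/* Compile-time selection: archs without the legacy syscall don't
 * define SYS_access, so the *at form is used instead. */
static int access_sketch(const char *path, int mode)
{
#ifdef SYS_access
	return (int)syscall(SYS_access, path, mode);
#else
	return (int)syscall(SYS_faccessat, AT_FDCWD, path, mode);
#endif
}

/* Runtime fallback: use the modern syscall, and only if the kernel
 * reports ENOSYS (and the arch has it) retry with the legacy one. */
static int faccessat_sketch(int fd, const char *path, int mode)
{
	long r = syscall(SYS_faccessat, fd, path, mode);
#ifdef SYS_access
	if (r < 0 && errno == ENOSYS && fd == AT_FDCWD)
		r = syscall(SYS_access, path, mode);
#endif
	return (int)r;
}
```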
the workaround/fallback code for supporting O_PATH file descriptors
when the kernel lacks support for performing these operations on them
caused EBADF to get replaced by ENOENT (due to missing entry in
/proc/self/fd). this is unlikely to affect real-world code (calls that
might yield EBADF are generally unsafe, especially in library code)
but it was breaking some test cases.
the fix I've applied is something of a tradeoff: it adds one syscall
to these operations on kernels where the workaround is needed. the
alternative would be to catch ENOENT from the /proc lookup and
translate it to EBADF, but I want to avoid doing that in the interest
of not touching/depending on /proc at all in these functions as long
as the kernel correctly supports the operations. this is following the
general principle of isolating hacks to code paths that are taken on
broken systems, and keeping the code for correct systems completely
on newer kernels, fchdir and fstat work anyway. this same fix should
be applied to any other syscalls that are similarly affected.
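The tradeoff above (one extra syscall, and /proc is touched only on the broken path) looks roughly like the sketch below for fstat. `fstat_opath_sketch` is hypothetical, and it uses snprintf for brevity. The F_GETFD probe separates a genuinely invalid descriptor, which keeps EBADF, from an O_PATH descriptor that the kernel merely refuses to fstat.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

static int fstat_opath_sketch(int fd, struct stat *st)
{
	if (fstat(fd, st) == 0) return 0;
	if (errno != EBADF) return -1;
	/* Invalid fd: fcntl fails with EBADF too, so that error is kept
	 * rather than an ENOENT from the /proc lookup below. */
	if (fcntl(fd, F_GETFD) < 0) return -1;
	char buf[32];
	snprintf(buf, sizeof buf, "/proc/self/fd/%d", fd);
	return stat(buf, st);
}
```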
with this change, the current definitions of O_SEARCH and O_EXEC as
O_PATH are mostly conforming to POSIX requirements. the main remaining
issue is that O_NOFOLLOW has different semantics.
I intend to add more Linux workarounds that depend on using these
pathnames, and some of them will be in "syscall" functions that, from
an anti-bloat standpoint, should not depend on the whole snprintf
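A snprintf-free formatter for these pathnames can be sketched as below. `procfdname_sketch` is a hypothetical standalone version, not the internal function itself. The buffer needs room for the 14-character prefix, up to 10 decimal digits for a 32-bit unsigned value, and the terminating null, so a size of 15+3*sizeof(int) is comfortably enough.

```c
#include <stddef.h>

static void procfdname_sketch(char *buf, unsigned fd)
{
	static const char prefix[] = "/proc/self/fd/";
	size_t i, j;
	/* copy the prefix; i ends at its terminating null */
	for (i = 0; (buf[i] = prefix[i]); i++);
	if (!fd) {
		buf[i] = '0';
		buf[i + 1] = 0;
		return;
	}
	/* count the digits, terminate, then fill them in right to left */
	for (j = fd; j; j /= 10, i++);
	buf[i] = 0;
	for (; fd; fd /= 10)
		buf[--i] = '0' + fd % 10;
}
```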
previously, the AT_SYMLINK_NOFOLLOW flag was ignored, giving
dangerously incorrect behavior -- the target of the symlink had its
modes changed to the modes (usually 0777) intended for the symlink.
this issue was amplified by the fact that musl provides lchmod, as a
wrapper for fchmodat, which some archival programs take as a sign that
symlink modes are supported and thus attempt to use.
emulating AT_SYMLINK_NOFOLLOW was a difficult problem, and I
originally believed it could not be solved, at least not without
depending on kernels newer than 3.5.x or so where O_PATH works halfway
well. however, it turns out that accessing O_PATH file descriptors via
their pseudo-symlink entries in /proc/self/fd works much better than
trying to use the fd directly, and works even on older kernels.
moreover, the kernel has permanently pegged these references to the
inode obtained by the O_PATH open, so there should not be race
conditions with the file being moved, deleted, replaced, etc.
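The emulation described above can be sketched as a standalone function. `chmod_nofollow_sketch` is hypothetical, it needs a mounted /proc, and it uses snprintf for brevity. It also re-checks the file type after the O_PATH open, so a file swapped for a symlink in between is reported as an error rather than silently returning 0.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* chmod with AT_SYMLINK_NOFOLLOW semantics; symlinks get EOPNOTSUPP,
 * since Linux cannot change their modes. */
static int chmod_nofollow_sketch(int dirfd, const char *path, mode_t mode)
{
	struct stat st;
	char proc[32];
	int fd2, ret;

	if (fstatat(dirfd, path, &st, AT_SYMLINK_NOFOLLOW)) return -1;
	if (S_ISLNK(st.st_mode)) {
		errno = EOPNOTSUPP;
		return -1;
	}
	/* O_PATH|O_NOFOLLOW yields an fd for a symlink itself; on kernels
	 * that ignore O_PATH, O_NOFOLLOW fails with ELOOP instead. */
	fd2 = openat(dirfd, path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
	if (fd2 < 0) {
		if (errno == ELOOP) errno = EOPNOTSUPP;
		return -1;
	}
	/* The /proc entry is pinned to the inode opened above. */
	snprintf(proc, sizeof proc, "/proc/self/fd/%d", fd2);
	ret = stat(proc, &st);
	if (!ret) {
		if (S_ISLNK(st.st_mode)) {
			errno = EOPNOTSUPP;    /* replaced by a symlink: report it */
			ret = -1;
		} else {
			ret = chmod(proc, mode);
		}
	}
	close(fd2);
	return ret;
}
```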
the main aim of this patch is to ensure that if not all fields are
filled in, they contain zeros, so as not to confuse applications.
reportedly some older kernels, including commonly used openvz kernels,
lack the f_flags field, resulting in applications reading random junk
as the mount flags; the common symptom seems to be wrongly considering
the filesystem to be mounted read-only and refusing to operate. glibc
has some amazingly ugly fallback code to get the mount flags for old
kernels, but having them really is not that important anyway; what
matters most is not presenting incorrect flags to the application.
I have also aimed to fill in some fields of statvfs that were
previously missing, and added code to explicitly zero the reserved
space at the end of the structure, which will make things easier in
the future if this space someday needs to be used.
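The zero-fill approach can be sketched as a conversion from a kernel-style result into struct statvfs. `struct kstatfs_example` and `statvfs_from_kernel` are hypothetical stand-ins, not real kernel or libc types. The frsize fallback assumes that a zero value means the kernel did not supply the field.

```c
#include <string.h>
#include <sys/statvfs.h>

/* Hypothetical stand-in for what the statfs syscall returns. */
struct kstatfs_example {
	unsigned long bsize, frsize, namelen, flags, fsid;
	unsigned long long blocks, bfree, bavail, files, ffree;
};

static void statvfs_from_kernel(struct statvfs *out, const struct kstatfs_example *in)
{
	/* Zero everything first, including reserved space at the end, so
	 * fields the kernel did not provide read as 0 rather than junk. */
	memset(out, 0, sizeof *out);
	out->f_bsize = in->bsize;
	out->f_frsize = in->frsize ? in->frsize : in->bsize;
	out->f_blocks = in->blocks;
	out->f_bfree = in->bfree;
	out->f_bavail = in->bavail;
	out->f_files = in->files;
	out->f_ffree = in->ffree;
	out->f_favail = in->ffree;   /* no separate figure from the kernel */
	out->f_fsid = in->fsid;
	out->f_flag = in->flags;     /* stays 0 if the kernel left it 0 */
	out->f_namemax = in->namelen;
}
```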
support for these was recently added to sysmacros.h. note that the
syscall argument is a long, despite dev_t being 64-bit, so on 32-bit
archs the high bits will be lost. it appears the high bits are just
glibc silliness and not part of the kernel api, anyway, but it's nice
that we have them there for future expansion if needed.
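Whether a given device number survives being passed through a 32-bit long can be checked directly. This sketch assumes the makedev encoding used by glibc and musl, where major bits above 11 and minor bits above 19 land in the high 32 bits of dev_t. `dev_fits_32` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/sysmacros.h>   /* makedev, major, minor */
#include <sys/types.h>

/* True if dev uses only the low 32 bits, i.e. nothing is lost when it
 * is truncated to a 32-bit syscall argument. */
static int dev_fits_32(dev_t dev)
{
	return (uint64_t)dev == (uint32_t)dev;
}
```

Typical numbers such as makedev(8, 1) fit, while a major of 4096 or a minor of 1<<20 would not.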
this function is obsolete, however it's available as a syscall
and as such qemu userspace emulation tries to forward it to the
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
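The same technique is shown below under a hypothetical macro name, MY_RESTRICT, which keeps it out of the implementation's reserved namespace. The `copy_bytes` function is also hypothetical. It is written in C89-compatible style and uses the *restrict form rather than [restrict].

```c
#include <stddef.h>

/* C99 and later: the real keyword. Older GCC-compatible compilers:
 * their __restrict extension. Anything else: nothing. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define MY_RESTRICT restrict
#elif defined(__GNUC__)
#define MY_RESTRICT __restrict
#else
#define MY_RESTRICT
#endif

/* "*restrict" qualifies the pointer parameters themselves. */
static void copy_bytes(char *MY_RESTRICT dst, const char *MY_RESTRICT src, size_t n)
{
	size_t i;
	for (i = 0; i < n; i++)
		dst[i] = src[i];
}
```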
at the same time, make struct statfs match the traditional definition
and make it more useful, especially the fsid_t stuff.