|Age||Commit message (Collapse)||Author||Lines|
Resource usage data is filled by the kernel only when wait4 returns
a pid, i.e. a positive value.
Commit 5850546e9669f793aab61dfc7c4f2c1ff35c4b29 introduced this bug,
possibly because of copy-pasting from getrusage.
originally the namespace-infringing "large file support" interfaces
were included as part of glibc-ABI-compat, with the intent that they
not be used for linking, since our off_t is and always has been
unconditionally 64-bit and since we usually do not aim to support
nonstandard interfaces when there is an equivalent standard interface.
unfortunately, having the symbols present and available for linking
caused configure scripts to detect them and attempt to use them
without declarations, producing all the expected ill effects that
as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made
to prevent this, using macros to redirect the LFS64 names to the
standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE.
however, this has turned out to be a source of further problems,
especially since g++ defines _GNU_SOURCE by default. in particular,
the presence of these names as macros breaks a lot of valid code.
this commit removes all the LFS64 symbols and replaces them with a
mechanism in the dynamic linker symbol lookup failure path to retry
with the spurious "64" removed from the symbol name. in the future,
if/when the rest of glibc-ABI-compat is moved out of libc, this can be
This is a part of the interface contract defined in the Linux man
page (official for a Linux-specific interface) and asserted by test
cases in the Linux Test Project (LTP).
a request for this behavior has been open for a long time. the
motivation is that application code, particularly under some language
runtimes designed around very-low-footprint coroutine type constructs,
may be operating with extremely small stack sizes unsuitable for
receiving signals, using a separate signal stack for any signals it
progress on this was blocked at one point trying to determine whether
the implementation is actually entitled to clobber the alt stack, but
the phrasing "available to the implementation" in the POSIX spec for
sigaltstack seems to make it clear that the application cannot rely on
the contents of this memory to be preserved in the absence of signal
delivery (on the abstract machine, excluding implementation-internal
signals) and that we can therefore use it for delivery of signals that
"don't exist" on the abstract machine.
no change is made for SIGTIMER since it is always blocked when used,
and accepted via sigwaitinfo rather than execution of the signal
this is a Linux-specific function and not covered by POSIX's
requirements for which interfaces are cancellation points, but glibc
makes it one and existing software relies on it being one.
at some point a review for similar functions that should be made
cancellation points should be done.
this function is outside the scope of the standards, but logically
should behave like the set*id functions whose effects are
this is a prerequisite for addition of other interfaces that use
kernel tids, including futex and SIGEV_THREAD_ID.
there is some ambiguity as to whether the semantic return type should
be int or pid_t. either way, futex API imposes a contract that the
values fit in int (excluding some upper reserved bits). glibc used
pid_t, so in the interest of not having gratuitous mismatch (the
underlying types are the same anyway), pid_t is used here as well.
while conceptually this is a syscall, the copy stored in the thread
structure is always valid in all contexts where it's valid to call
libc functions, so it's used to avoid the syscall.
clock_adjtime always returns the current clock setting in struct
timex, so it's always possible that the time64 version is needed.
the 64-bit time code path used the wrong (time32) syscall. fortunately
this code path is not yet taken unless attempting to set a post-Y2038
commit 2b4fd6f75b4fa66d28cddcf165ad48e8fda486d1 added time64 for this
function, but did so with a hidden assumption that the new time64
version of struct timex will be layout-compatible with the old one.
however, there is little benefit to doing it that way, and the cost is
permanent special-casing of 32-bit archs with 64-bit time_t in the
public interface definitions.
instead, do a full translation of the structure going in and out. this
commit is actually a revision to an earlier uncommited version of the
presently the kernel does not actually define time64 versions of these
syscalls, and they're not really needed except to represent extreme
cpu time usage. however, x32's versions of the syscalls already behave
as time64 ones, meaning the functions were broken on x32 if the caller
used any part of the rusage result other than ru_utime and ru_stime.
commit 7e8171143124f7f510db555dc6f6327a965a3e84 made it possible to
fix this by treating x32's syscalls as time64 versions.
in the non-time64-syscall case, make the syscall with the rusage
destination pointer adjusted so that all members but the timevals line
up between the libc and kernel structures. on 64-bit archs, or present
32-bit archs with 32-bit time_t, the timevals will line up too and no
further work is needed. for future 32-bit archs with 64-bit time_t,
the timevals are copied into place, contingent on time_t being larger
the 64-bit/time64 version of the syscall is not API-compatible with
the userspace timex structure definition; fields specified as long
have type long long. so when using the time64 syscall, we have to
convert the entire structure. this was always the case for x32 as
well, but went unnoticed, meaning that clock_adjtime just passed junk
to the kernel on x32. it should be fixed now.
for the fallback case, we avoid encoding any assumptions about the new
location of the time member or naming of the legacy slots by accessing
them through a union of the kernel type and the new userspace type.
the only assumption is that the non-time members live at the same
offsets as in the (non-time64, long-based) kernel timex struct. this
property saves us from having to convert the whole thing, and avoids a
lot of additional work in compat shims.
the new code is statically unreachable for now except on x32, where it
fixes major brokenness. it is permanently unreachable on 64-bit.
the changes here are semantically and structurally identical to those
made to timer_settime and timer_gettime for time64 support.
time64 syscall is used only if it's the only one defined for the arch,
or if the requested timeout length does not fit in 32 bits. on current
32-bit archs where time_t is a 32-bit type, this makes it statically
on 64-bit archs, there are only superficial changes to the code after
preprocessing. both before and after these changes, these functions
copied their timeout arguments to avoid letting the kernel clobber the
caller's copies. now, the copying also serves to change the type from
userspace timespec to a pair of longs, which makes a difference only
in the 32-bit fallback case, not on 64-bit.
this is yet another place where special handling of time syscalls can
and should be avoided by implementing legacy functions in terms of
their modern replacements. in theory a fallback to SYS_settimeofday
could be added to clock_settime, but SYS_clock_settime has been
available since Linux 2.6.0 or earlier, i.e. all the way back to the
minimum supported version.
this removes the assumption that userspace struct timex matches the
syscall type and sets the stage for 64-bit time_t on 32-bit archs.
this sets the stage for having the conversion logic for 64-bit time_t
all in one file, and as a bonus makes clock_adjtime for CLOCK_REALTIME
work even on kernels too old to have the clock_adjtime syscall.
the linux syscall treats this argument as having type int, so passing
extremely long buffer sizes would be misinterpreted by the kernel.
since "short reads" are always acceptable, just cap it down.
patch based on report and suggested change by Florian Weimer.
Author: Alex Suykov <email@example.com>
Author: Aric Belsito <firstname.lastname@example.org>
Author: Drew DeVault <email@example.com>
Author: Michael Clark <firstname.lastname@example.org>
Author: Michael Forney <email@example.com>
Author: Stefan O'Rear <firstname.lastname@example.org>
This port has involved the work of many people over several years. I
have tried to ensure that everyone with substantial contributions has
been credited above; if any omissions are found they will be noted
later in an update to the authors/contributors list in the COPYRIGHT
The version committed here comes from the riscv/riscv-musl repo's
commit 3fe7e2c75df78eef42dcdc352a55757729f451e2, with minor changes by
me for issues found during final review:
- a_ll/a_sc atomics are removed (according to the ISA spec, lr/sc
are not safe to use in separate inline asm fragments)
- a_cas[_p] is fixed to be a memory barrier
- the call from the _start assembly into the C part of crt1/ldso is
changed to allow for the possibility that the linker does not place
them nearby each other.
- DTP_OFFSET is defined correctly so that local-dynamic TLS works
- reloc.h LDSO_ARCH logic is simplified and made explicit.
- unused, non-functional crti/n asm files are removed.
- an empty .sdata section is added to crt1 so that the
__global_pointer reference is resolvable.
- indentation style errors in some asm files are fixed.
this is a workaround to avoid a crashing regression on qemu-user when
dynamic TLS is installed at dlopen time. the sigaction syscall should
not be able to fail, but it does fail for implementation-internal
signals under qemu user-level emulation if the host libc qemu is
running under reserves the same signals for implementation-internal
use, since qemu makes no provision to redirect/emulate them. after
sigaction fails, the subsequent tkill would terminate the process
abnormally as the default action.
no provision to account for membarrier failing is made in the dynamic
linker code that installs new TLS. at the formal level, the missing
barrier in this case is incorrect, and perhaps we should fail the
dlopen operation, but in practice all the archs we support (and
probably all real-world archs except alpha, which isn't yet supported)
should give the right behavior with no barrier at all as a consequence
of consume-order properties.
in the long term, this workaround should be supplemented or replaced
by something better -- a different fallback approach to ensuring
memory consistency, or dynamic allocation of implementation-internal
signals. the latter is appealing in that it would allow cancellation
to work under qemu-user too, and would even allow many levels of
the motivation for this change is twofold. first, it gets the fallback
logic out of the dynamic linker, improving code readability and
organization. second, it provides application code that wants to use
the membarrier syscall, which depends on preregistration of intent
before the process becomes multithreaded unless unbounded latency is
acceptable, with a symbol that, when linked, ensures that this
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
the __-prefixed filename does not make sense when the only purpose of
this file is implementing a public function that's not used as a
backend for implementing the standard dirent functions.
commits leading up to this one have moved the vast majority of
libc-internal interface declarations to appropriate internal headers,
allowing them to be type-checked and setting the stage to limit their
visibility. the ones that have not yet been moved are mostly
namespace-protected aliases for standard/public interfaces, which
exist to facilitate implementing plain C functions in terms of POSIX
functionality, or C or POSIX functionality in terms of extensions that
are not standardized. some don't quite fit this description, but are
"internally public" interfacs between subsystems of libc.
rather than create a number of newly-named headers to declare these
functions, and having to add explicit include directives for them to
every source file where they're needed, I have introduced a method of
wrapping the corresponding public headers.
parallel to the public headers in $(srcdir)/include, we now have
wrappers in $(srcdir)/src/include that come earlier in the include
path order. they include the public header they're wrapping, then add
declarations for namespace-protected versions of the same interfaces
and any "internally public" interfaces for the subsystem they
along these lines, the wrapper for features.h is now responsible for
the definition of the hidden, weak, and weak_alias macros. this means
source files will no longer need to include any special headers to
access these features.
over time, it is my expectation that the scope of what is "internally
public" will expand, reducing the number of source files which need to
include *_impl.h and related headers down to those which are actually
implementing the corresponding subsystems, not just using them.
policy is that all public functions which have a public declaration
should be defined in a context where that public declaration is
visible, to avoid preventable type mismatches.
an audit performed using GCC's -Wmissing-declarations turned up the
violations corrected here. in some cases the public header had not
been included; in others, a feature test macro needed to make the
declaration visible had been omitted.
in the case of gethostent and getnetent, the omission seems to have
been intentional, as a hack to admit a single stub definition for both
functions. this kind of hack is no longer acceptable; it's UB and
would not fly with LTO or advanced toolchains. the hack is undone to
make exposure of the declarations possible.
memfd_create was added in linux v3.17 and glibc has api for it.
mlock2 syscall was added in linux v4.4 and glibc has api for it.
It falls back to mlock in case of flags==0, so that case works
even on older kernels.
MLOCK_ONFAULT is moved under _GNU_SOURCE following glibc.
This syscall is available since Linux 3.17 and was also implemented in
glibc in version 2.25 using the same interfaces.
all such arch-specific translation units are being moved to
appropriate arch dirs under the main src tree.
being nonstandard, the closest thing to a specification for this
function is its man page, which documents it as returning int. it can
fail with EBADF if the file descriptor passed is invalid.
On 32 bit mips the kernel uses -1UL/2 to mark RLIM_INFINITY (and
this is the definition in the userspace api), but since it is in
the middle of the valid range of limits and limits are often
compared with relational operators, various kernel side logic is
broken if larger than -1UL/2 limits are used. So we truncate the
limits to -1UL/2 in get/setrlimit and prlimit.
Even if the kernel side logic consistently treated -1UL/2 as greater
than any other limit value, there wouldn't be any clean workaround
that allowed using large limits:
* using -1UL/2 as RLIM_INFINITY in userspace would mean different
infinity value for get/setrlimt and prlimit (where infinity is always
-1ULL) and userspace logic could break easily (just like the kernel
is broken now) and more special case code would be needed for mips.
* translating -1UL/2 kernel side value to -1ULL in userspace would
mean that -1UL/2 limit cannot be set (eg. -1UL/2+1 had to be passed
to the kernel instead).
such archs are expected to omit definitions of the SYS_* macros for
syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the
preprocessor is then able to select the an appropriate implementation
for affected functions. two basic strategies are used on a
where the old syscalls correspond to deprecated library-level
functions, the deprecated functions have been converted to wrappers
for the modern function, and the modern function has fallback code
(omitted at the preprocessor level on new archs) to make use of the
old syscalls if the new syscall fails with ENOSYS. this also improves
functionality on older kernels and eliminates the incentive to program
with deprecated library-level functions for the sake of compatibility
with older kernels.
in other situations where the old syscalls correspond to library-level
functions which are not deprecated but merely lack some new features,
such as the *at functions, the old syscalls are still used on archs
which support them. this may change at some point in the future if or
when fallback code is added to the new functions to make them usable
(possibly with reduced functionality) on old kernels.
it will be needed to implement some things in sysconf, and the syscall
can't easily be used directly because the x32 syscall uses the wrong
structure layout. the l (uncreative, for "linux") prefix is used since
the symbol name __sysinfo is already taken for AT_SYSINFO from the aux
the way the x32 override of this function works is also changed to be
simpler and avoid the useless jump instruction.
the kernel uses long longs in the struct, but the documentation
says they're long. so we need to fixup the mismatch between the
userspace and kernelspace structs.
since the struct offers a mem_unit member, we can avoid truncation
by adjusting that value.
The architecture-specific assembly versions of clone did not set errno on
failure, which is inconsistent with glibc. __clone still returns the error
via its return value, and clone is now a wrapper that sets errno as needed.
The public clone has also been moved to src/linux, as it's not directly
related to the pthreads API.
__clone is called by pthread_create, which does not report errors via
errno. Though not strictly necessary, it's nice to avoid clobbering errno
it's unclear what the historical signature for this function was, but
semantically, the argument should be a pointer to const, and this is
what glibc uses. correct programs should not be using this function
anyway, so it's unlikely to matter.
both the kernel and glibc agree that this argument is unsigned; the
incorrect type ssize_t came from erroneous man pages.
this was wrong since the original commit adding inotify, and I don't
see any explanation for it. not even the man pages have it wrong. it
was most likely a copy-and-paste error.
the header is included only as a guard to check that the declaration
and definition match, so the typo didn't cause any breakage aside
from omitting this check.
the reasons are the same as for sbrk. unlike sbrk, there is no safe
usage because brk does not return any useful information, so it should
just fail unconditionally.
use of sbrk is never safe; it conflicts with malloc, and malloc may be
used internally by the implementation basically anywhere. prior to
this change, applications attempting to use sbrk to do their own heap
management simply caused untrackable memory corruption; now, they will
fail with ENOMEM allowing the errors to be fixed.
sbrk(0) is still permitted as a way to get the current brk; some
misguided applications use this as a measurement of their memory
usage or for other related purposes, and such usage is harmless.
eventually sbrk may be re-added if/when malloc is changed to avoid
using the brk by using mmap for all allocations.