thanks to the original factorization using the __timedwait function,
there are no FUTEX_WAIT calls anywhere else, giving us a single point
of change to make nearly all the timed thread primitives time64-ready.
the one exception is the FUTEX_LOCK_PI command for PI mutex timedlock.
I haven't tried to make these two points share code, since they have
different fallbacks (no non-private fallback needed for PI since PI
was added later) and FUTEX_LOCK_PI isn't a cancellation point (thus
allowing the whole code path to inline into pthread_mutex_timedlock).
as for other changes in this series, the time64 syscall is used only
if it's the only one defined for the arch, or if the requested timeout
does not fit in 32 bits. on current 32-bit archs where time_t is a
32-bit type, this makes it statically unreachable.
on 64-bit archs, there are only superficial changes to the code after
preprocessing. on current 32-bit archs, the time is passed via an
intermediate copy to remove the assumption that time_t is a 32-bit type.
prior to linux 2.6.22, futex wait could fail with EINTR even for
non-interrupting (SA_RESTART) signals. this was no problem provided
the caller simply restarted the wait, but sem_[timed]wait is required
by POSIX to return when interrupted by a signal. commit
a113434cd68ce30642c4995b1caadcd084be6f09 introduced this behavior, and
commit c0ed5a201b2bdb6d1896064bec0020c9973db0a1 reverted it based on a
mistaken belief that it was not required. this belief stems from a bug
in the specification: the description requires the function to return
when interrupted, but the errors section marks EINTR as a "may fail"
condition rather than a "shall fail" one.
since there does seem to be significant value in the change made in
commit c0ed5a201b2bdb6d1896064bec0020c9973db0a1, making it so that
programs that call sem_wait without checking for EINTR don't silently
make forward progress without obtaining the semaphore or treat it as a
fatal error and abort, add a behind-the-scenes mechanism in the
__timedwait backend to suppress EINTR in programs that have never
installed interrupting signal handlers, and have sigaction track and
report this state. this way the semaphore code is not cluttered by
workarounds and can be updated (to be done in next commit) to reflect
the high-level logic for conforming behavior.
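a minimal sketch of the mechanism, assuming a single global flag. the names eintr_valid_flag, note_handler_installed and filter_eintr are hypothetical, and the restarts_syscalls argument stands in for testing SA_RESTART in the handler's flags.

```c
#include <errno.h>
#include <stdatomic.h>

/* hypothetical flag: set once any interrupting handler has been installed */
static atomic_int eintr_valid_flag;

/* would be called from sigaction when a handler is installed;
 * restarts_syscalls is nonzero if the handler uses SA_RESTART */
static void note_handler_installed(int restarts_syscalls)
{
	if (!restarts_syscalls)
		atomic_store(&eintr_valid_flag, 1);
}

/* would be applied to the futex wait result in the wait backend:
 * EINTR is reported only if an interrupting handler was ever installed */
static int filter_eintr(int r)
{
	if (r == EINTR && !atomic_load(&eintr_valid_flag))
		return 0;
	return r;
}
```

the flag only ever goes from 0 to 1, so a plain atomic store is enough and the wait path needs just one load.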
these changes are based loosely on a patch by Markus Wichmann, with
the main changes being atomic update to flag object and moving the
workaround from sem_timedwait to the __timedwait futex backend.
commits leading up to this one have moved the vast majority of
libc-internal interface declarations to appropriate internal headers,
allowing them to be type-checked and setting the stage to limit their
visibility. the ones that have not yet been moved are mostly
namespace-protected aliases for standard/public interfaces, which
exist to facilitate implementing plain C functions in terms of POSIX
functionality, or C or POSIX functionality in terms of extensions that
are not standardized. some don't quite fit this description, but are
"internally public" interfaces between subsystems of libc.
rather than create a number of newly-named headers to declare these
functions, and having to add explicit include directives for them to
every source file where they're needed, I have introduced a method of
wrapping the corresponding public headers.
parallel to the public headers in $(srcdir)/include, we now have
wrappers in $(srcdir)/src/include that come earlier in the include
path order. they include the public header they're wrapping, then add
declarations for namespace-protected versions of the same interfaces
and any "internally public" interfaces for the subsystem they
along these lines, the wrapper for features.h is now responsible for
the definition of the hidden, weak, and weak_alias macros. this means
source files will no longer need to include any special headers to
access these features.
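for illustration, macros of this kind can be written with GCC/Clang attributes as below. this is a sketch under that assumption and the real definitions may differ; demo_answer_impl and demo_answer are hypothetical names, and the alias attribute needs an ELF-style target.

```c
/* GCC/Clang-specific sketch of the three macros */
#define weak __attribute__((__weak__))
#define hidden __attribute__((__visibility__("hidden")))
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* hypothetical internal function, not exported from the library */
hidden int demo_answer_impl(void)
{
	return 42;
}

/* hypothetical public name, provided as a weak alias for the internal one */
weak_alias(demo_answer_impl, demo_answer);
```

code inside the library can call demo_answer_impl directly, while the weak public name can still be overridden by another definition at link time.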
over time, it is my expectation that the scope of what is "internally
public" will expand, reducing the number of source files which need to
include *_impl.h and related headers down to those which are actually
implementing the corresponding subsystems, not just using them.
The flag 1<<7 is used in several places for different purposes that are
not always easy to distinguish. Mark those usages that correspond to the
flag that is used by the kernel for futexes.
previously, the __timedwait function was optionally a cancellation
point depending on whether it was passed a pointer to a cleanup
function and context to register. as of now, only one caller actually
used such a cleanup function (and it may face removal soon); most
callers either passed a null pointer to disable cancellation or a
dummy cleanup function.
now, __timedwait is never a cancellation point, and __timedwait_cp is
the cancellable version. this makes the intent of the calling code
more obvious and avoids ugly dummy functions and long argument lists.
as part of abstracting the futex wait, this function suppresses all
futex error values which callers should not see using a whitelist
approach. when the masked cancellation mode was added, the new
ECANCELED error was not whitelisted. this omission caused the new
pthread_cond_wait code using masked cancellation to exhibit a spurious
wake (rather than acting on cancellation) when the request arrived
after blocking on the cond var.
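the whitelist approach can be sketched as follows. filter_wait_error is a hypothetical name, and the exact set of errors passed through (EINTR, ETIMEDOUT, ECANCELED) is an assumption for illustration.

```c
#include <errno.h>

/* hypothetical filter: pass through only the errors callers are meant
 * to see, and report everything else as a plain wakeup (0) */
static int filter_wait_error(int r)
{
	switch (r) {
	case EINTR:
	case ETIMEDOUT:
	case ECANCELED:
		return r;
	default:
		return 0;
	}
}
```

without the ECANCELED case, a cancellation request would be turned into 0 and look like a spurious wake to the caller.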
The intent of this is to avoid name space pollution of the C threads
implementation.
This has two sides to it. First we have to provide symbols that wouldn't
pollute the name space for the C threads implementation. Second we have
to clean up some internal uses of POSIX functions such that they don't
implicitly drag in such symbols.
for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.
private-futex uses the virtual address of the futex int directly as
the hash key rather than requiring the kernel to resolve the address
to an underlying backing for the mapping in which it lies. for certain
usage patterns it improves performance significantly.
in many places, the code using futex __wake and __wait operations was
already passing a correct fixed zero or nonzero flag for the priv
argument, so no change was needed at the site of the call, only in the
__wake and __wait functions themselves. in other places, especially
where the process-shared attribute for a synchronization object was
not previously tracked, additional new code is needed. for mutexes,
the only place to store the flag is in the type field, so additional
bit masking logic is needed for accessing the type.
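a sketch of the kind of masking this implies, assuming the low 4 bits of the field hold the mutex type and bit 7 marks a process-shared mutex. the mask values and the demo_ names are hypothetical.

```c
/* hypothetical layout of the type field */
#define DEMO_TYPE_MASK 15   /* low bits: normal, recursive, errorcheck, ... */
#define DEMO_PSHARED   128  /* bit 7: mutex is process-shared */

/* extract the mutex type, ignoring the process-shared bit */
static int demo_mutex_type(int type_field)
{
	return type_field & DEMO_TYPE_MASK;
}

/* nonzero if private futex operations may be used for this mutex */
static int demo_mutex_priv(int type_field)
{
	return !(type_field & DEMO_PSHARED);
}
```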
for non-process-shared condition variable broadcasts, the futex
requeue operation is unable to requeue from a private futex to a
process-shared one in the mutex structure, so requeue is simply
disabled in this case by waking all waiters.
for robust mutexes, the kernel always performs a non-private wake when
the owner dies. in order not to introduce a behavioral regression in
non-process-shared robust mutexes (when the owning thread dies), they
are simply forced to be treated as process-shared for now, giving
correct behavior at the expense of performance. this can be fixed by
adding explicit code to pthread_exit to do the right thing for
non-shared robust mutexes in userspace rather than relying on the
kernel to do it, and will be fixed in this way later.
since not all supported kernels have private futex support, the new
code detects EINVAL from the futex syscall and falls back to making
the call without the private flag. no attempt to cache the result is
made; caching it and using the cached value efficiently is somewhat
difficult, and not worth the complexity when the benefits would be
seen only on ancient kernels which have numerous other limitations and
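the uncached fallback described in this message can be sketched as below. the raw futex call is passed in as a function pointer so the logic can be shown without a real kernel; demo_wait and demo_old_kernel are hypothetical, and demo_old_kernel is a mock that rejects the private flag with EINVAL as described here.

```c
#include <errno.h>

#define DEMO_FUTEX_WAIT    0
#define DEMO_FUTEX_PRIVATE 128  /* the 1<<7 flag */

/* stand-in for the raw syscall; returns 0 or a negated errno value */
typedef long (*demo_futex_fn)(volatile int *addr, int op, int val);

/* try the private op first, then retry without the flag on EINVAL */
static long demo_wait(demo_futex_fn raw, volatile int *addr, int val, int priv)
{
	long r = raw(addr, DEMO_FUTEX_WAIT | (priv ? DEMO_FUTEX_PRIVATE : 0), val);
	if (priv && r == -EINVAL)
		r = raw(addr, DEMO_FUTEX_WAIT, val);
	return r;
}

/* mock of a kernel without private futex support, for illustration only */
static int demo_old_kernel_calls;
static long demo_old_kernel(volatile int *addr, int op, int val)
{
	(void)addr;
	(void)val;
	demo_old_kernel_calls++;
	return (op & DEMO_FUTEX_PRIVATE) ? -EINVAL : 0;
}
```

on such a kernel every private wait costs two calls, which is the price of not caching the result.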
the new absolute-time-based wait kernelside was hard to get right and
basically just code duplication. it could only improve "performance"
when waiting, and even then, the improvement was just a slight drop in
cpu usage during a wait.
actually, with vdso clock_gettime, the "old" way will be even faster
than the "new" way if the time has already expired, since it will not
invoke any syscalls. it can determine entirely in userspace that it
needs to return ETIMEDOUT.
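a minimal sketch of the userspace check, assuming the current time has already been read (for example with clock_gettime). demo_abs_to_rel is a hypothetical helper and both inputs are assumed to be normalized timespecs.

```c
#include <errno.h>
#include <time.h>

/* hypothetical helper: convert the absolute deadline `at` into a relative
 * timeout `to`, given the current time `now`. returns ETIMEDOUT if the
 * deadline has already passed, 0 otherwise. */
static int demo_abs_to_rel(const struct timespec *at,
                           const struct timespec *now,
                           struct timespec *to)
{
	to->tv_sec = at->tv_sec - now->tv_sec;
	to->tv_nsec = at->tv_nsec - now->tv_nsec;
	if (to->tv_nsec < 0) {
		/* borrow one second to keep tv_nsec non-negative */
		to->tv_sec--;
		to->tv_nsec += 1000000000L;
	}
	if (to->tv_sec < 0)
		return ETIMEDOUT;
	return 0;
}
```

when this returns ETIMEDOUT the caller can return immediately without entering the kernel at all.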
it's unclear whether EINVAL or ENOSYS is used when the operation is
not supported, so check for both...
futex returns EINVAL, not ENOSYS, when op is not supported.
unfortunately this looks just like EINVAL from other causes, and we
end up running the fallback code and getting EINVAL again. fortunately
this case should be rare since correct code should not generate EINVAL
- FUTEX_WAIT_BITSET op will be used for timed waits if available. this
saves a call to clock_gettime.
- error checking for the timespec struct is now inside __timedwait so
it doesn't need to be duplicated everywhere. cond_timedwait still
needs to duplicate it to avoid unlocking the mutex, though.
- pushing and popping the cancellation handler is delegated to
__timedwait, and cancellable/non-cancellable waits are unified.
this patch improves the correctness, simplicity, and size of
cancellation-related code. modulo any small errors, it should now be
completely conformant, safe, and resource-leak free.
the notion of entering and exiting cancellation-point context has been
completely eliminated and replaced with alternative syscall assembly
code for cancellable syscalls. the assembly is responsible for setting
up execution context information (stack pointer and address of the
syscall instruction) which the cancellation signal handler can use to
determine whether the interrupted code was in a cancellable state.
these changes eliminate race conditions in the previous generation of
cancellation handling code (whereby a cancellation request received
just prior to the syscall would not be processed, leaving the syscall
to block, potentially indefinitely), and remedy an issue where
non-cancellable syscalls made from signal handlers became cancellable
if the signal handler interrupted a cancellation point.
x86_64 asm is untested and may need a second try to get it right.
1. make sem_[timed]wait interruptible by signals, per POSIX
2. keep a waiter count in order to avoid unnecessary futex wake syscalls
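the second point can be sketched with C11 atomics. the struct layout and the demo_ names are hypothetical, and the actual futex wait and wake calls are left out; the sketch only shows how the count decides whether a wake is needed.

```c
#include <stdatomic.h>

/* hypothetical semaphore layout for illustration */
struct demo_sem {
	atomic_int value;
	atomic_int waiters;
};

/* a thread about to block registers itself as a waiter */
static void demo_sem_wait_begin(struct demo_sem *s)
{
	atomic_fetch_add(&s->waiters, 1);
}

/* and removes itself again once it stops waiting */
static void demo_sem_wait_end(struct demo_sem *s)
{
	atomic_fetch_sub(&s->waiters, 1);
}

/* increment the value; returns nonzero only if a futex wake is needed */
static int demo_sem_post(struct demo_sem *s)
{
	atomic_fetch_add(&s->value, 1);
	return atomic_load(&s->waiters) > 0;
}
```

in the uncontended case demo_sem_post returns 0 and the post completes without any syscall.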
with this patch, the syscallN() functions are no longer needed; a
variadic syscall() macro allows syscalls with anywhere from 0 to 6
arguments to be made with a single macro name. also, manually casting
each non-integer argument with (long) is no longer necessary; the
casts are hidden in the macros.
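the argument-counting technique behind such a macro can be sketched as follows. to keep it short this version handles only 0 to 3 arguments and dispatches to hypothetical demo_raw functions that just add their arguments up, where a real implementation would enter the kernel.

```c
/* hypothetical stand-ins for the per-arity syscall backends */
static long demo_raw0(long n) { return n; }
static long demo_raw1(long n, long a) { return n + a; }
static long demo_raw2(long n, long a, long b) { return n + a + b; }
static long demo_raw3(long n, long a, long b, long c) { return n + a + b + c; }

/* the cast lives here, so callers need not write (long) themselves */
#define demo_scc(x) ((long)(x))
#define demo_call0(n) demo_raw0(n)
#define demo_call1(n,a) demo_raw1(n, demo_scc(a))
#define demo_call2(n,a,b) demo_raw2(n, demo_scc(a), demo_scc(b))
#define demo_call3(n,a,b,c) demo_raw3(n, demo_scc(a), demo_scc(b), demo_scc(c))

/* count the arguments after the syscall number */
#define DEMO_NARGS_X(a,b,c,d,n,...) n
#define DEMO_NARGS(...) DEMO_NARGS_X(__VA_ARGS__,3,2,1,0,)

/* paste the count onto the base name and call the result */
#define DEMO_CONCAT_X(a,b) a##b
#define DEMO_CONCAT(a,b) DEMO_CONCAT_X(a,b)
#define demo_call(...) DEMO_CONCAT(demo_call, DEMO_NARGS(__VA_ARGS__))(__VA_ARGS__)
```

with this, demo_call(n) and demo_call(n, ptr, len) both go through one macro name and pick the matching backend at preprocessing time.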
some source files which depended on being able to define the old macro
SYSCALL_RETURNS_ERRNO have been modified to directly use __syscall()
instead of syscall(). references to SYSCALL_SIGSET_SIZE and SYSCALL_LL
have also been changed.
x86_64 has not been tested, and may need a follow-up commit to fix any