musl/src/thread, branch v1.2.4

fix pthread_detach inadvertently acting as cancellation point in race case

2023-02-11T18:00:22+00:00

disabling cancellation around the pthread_join call seems to be the
safest and logically simplest fix. i believe it would also be possible
to just perform the unmap directly here after __tl_sync, removing the
dependency on pthread_join, but such an approach duplicately encodes a
lot more implementation assumptions.

use libc-internal malloc for pthread_atfork

2022-12-17T21:00:19+00:00

while no lock is held here making it a lock-order issue, replacement
malloc is likely to want to use pthread_atfork, possibly making the
call to malloc infinitely recursive.

even if not, there is no reason to prefer an application-provided
malloc here.

semaphores: fix missed wakes from ABA bug in waiter count logic

2022-12-13T23:39:44+00:00

because the has-waiters state in the semaphore value futex word is
only representable when the value is zero (the special value -1
represents "0 with potential new waiters"), it's lost if intervening
operations make the semaphore value positive again. this creates an
ABA issue in sem_post, whereby the post uses a stale waiters count
rather than re-evaluating it, skipping the futex wake if the stale
count was zero.

the fix here is based on a proposal by Alexey Izbyshev, with minor
changes to eliminate costly new spurious wake syscalls.

the basic idea is to replace the special value -1 with a sticky
waiters bit (repurposing the sign bit) preserved under both wait and
post. any post that takes place with the waiters bit set will perform
a futex wake.

to be useful, the waiters bit needs to be removable, and to remove it
safely, we perform a broadcast wake instead of a normal single-task
wake whenever removing the bit. this lets any un-accounted-for waiters
wake and re-add the waiters bit if they still need it.

there are multiple possible choices for when to perform this
broadcast, but the optimal choice seems to be doing it whenever the
observed waiters count is less than two (semantically, this means
exactly one, but we might see a stale count of zero). in this case,
the expected number of threads to be woken is one, with exactly the
same cost as a non-broadcast wake.

pthread_atfork: fix return value on malloc failure

2022-11-12T17:22:38+00:00

POSIX requires pthread_atfork to report errors via its return value,
not via errno. The only specified error is ENOMEM.

fix async thread cancellation stack alignment

2022-11-05T22:59:53+00:00

if async cancellation is enabled and acted upon, the stack pointer is
not necessarily pointing to a __syscall_cp_asm stack frame. the
contents of the stack being wrong don't really matter, but if the
stack pointer is not suitably aligned, the procedure call ABI is
violated when calling back into C code via __cancel, and pthread_exit,
cancellation cleanup handlers, TSD destructors, etc. may malfunction
or crash.

for the async cancel case, just call __cancel directly like we did
prior to commit 102f6a01e249ce4495f1119ae6d963a2a4a53ce5. restore the
signal mask prior to doing this since the cancellation handler runs
with all signals blocked.

fix missing synchronization of pthread TSD keys with MT-fork

2022-10-19T18:01:32+00:00

commit 167390f05564e0a4d3fcb4329377fd7743267560 seems to have
overlooked the presence of a lock here, probably because it was one of
the exceptions not using LOCK() but a rwlock.

as such, it can't be added to the generic table of locks to take, so
add an explicit atfork function for the pthread keys table. the order
it is called does not particularly matter since nothing else in libc
but pthread_exit interacts with keys.

fix potential unsynchronized access to killlock state at thread exit

2022-10-19T18:01:32+00:00

as reported by Alexey Izbyshev, when the second-to-last thread exits
causing a return to single-threaded (no locks needed) state, it
creates a situation where the last remaining thread may obtain the
killlock that's already held by the exiting thread. this means it may
erroneously use the tid of the exiting thread, and may corrupt the
lock state due to double-unlock.

commit 8d81ba8c0bc6fe31136cb15c9c82ef4c24965040, which (re)introduced
the switch back to single-threaded state, documents the intent that
the first lock after switching back should provide the necessary
synchronization. this is correct, but only works if the switch back is
made after there is no further need for synchronization with locks
(other than the thread list lock, which can't be bypassed) held by the
exiting thread.

in order to hit the bug, the remaining thread must first take a
different lock, causing it to perform an actual lock one last time,
consume the need_locks==-1 state, and transition to need_locks==0.
after that, the next attempt to lock the exiting thread's killlock
will bypass locking.

fix this by reordering the unlocking of killlock at thread exit time,
along with changes to the state protected by it, to occur earlier,
before the switch to single-threaded state. there are really no
constraints on where it's done, except that it occur after there is no
longer any possibility of application code executing in the exiting
thread, so do it as early as possible.

use alt signal stack when present for implementation-internal signals

2022-08-20T16:24:49+00:00

a request for this behavior has been open for a long time. the
motivation is that application code, particularly under some language
runtimes designed around very-low-footprint coroutine type constructs,
may be operating with extremely small stack sizes unsuitable for
receiving signals, using a separate signal stack for any signals it
might handle.

progress on this was blocked at one point trying to determine whether
the implementation is actually entitled to clobber the alt stack, but
the phrasing "available to the implementation" in the POSIX spec for
sigaltstack seems to make it clear that the application cannot rely on
the contents of this memory to be preserved in the absence of signal
delivery (on the abstract machine, excluding implementation-internal
signals) and that we can therefore use it for delivery of signals that
"don't exist" on the abstract machine.

no change is made for SIGTIMER since it is always blocked when used,
and accepted via sigwaitinfo rather than execution of the signal
handler.

fix error checking in pthread_getname_np

2021-08-06T15:26:15+00:00

len is unsigned and can never be smaller than 0. though unlikely, an
error in read() would have lead to an out of bounds write to name.

Reported-by: Michael Forney

add pthread_getname_np function

2021-04-20T19:34:30+00:00

based on the pthread_setname_np implementation