musl - musl - an implementation of the standard library for Linux-based systems

Age	Commit message (Collapse)	Author	Lines
2020-05-22	restore lock-skipping for processes that return to single-threaded state	Rich Felker	-1/+3
	the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
2020-05-22	don't use libc.threads_minus_1 as relaxed atomic for skipping locks	Rich Felker	-1/+1
	after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races. on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state. some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list. instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
2018-01-09	new lock algorithm with state and congestion count in one atomic int	Jens Gustedt	-5/+50
	A variant of this new lock algorithm has been presented at SAC'16, see https://hal.inria.fr/hal-01304108. A full version of that paper is available at https://hal.inria.fr/hal-01236734. The main motivation of this is to improve on the safety of the basic lock implementation in musl. This is achieved by squeezing a lock flag and a congestion count (= threads inside the critical section) into a single int. Thereby an unlock operation does exactly one memory transfer (a_fetch_add) and never touches the value again, but still detects if a waiter has to be woken up. This is a fix of a use-after-free bug in pthread_detach that had temporarily been patched. Therefore this patch also reverts c1e27367a9b26b9baac0f37a12349fc36567c8b6 This is also the only place where internal knowledge of the lock algorithm is used. The main price for the improved safety is a little bit larger code. Under high congestion, the scheduling behavior will be different compared to the previous algorithm. In that case, a successful put-to-sleep may appear out of order compared to the arrival in the critical section.
2013-09-20	fix potential deadlock bug in libc-internal locking logic	Rich Felker	-3/+6
	if a multithreaded program became non-multithreaded (i.e. all other threads exited) while one thread held an internal lock, the remaining thread would fail to release the lock. the the program then became multithreaded again at a later time, any further attempts to obtain the lock would deadlock permanently. the underlying cause is that the value of libc.threads_minus_1 at unlock time might not match the value at lock time. one solution would be returning a flag to the caller indicating whether the lock was taken and needs to be unlocked, but there is a simpler solution: using the lock itself as such a flag. note that this flag is not needed anyway for correctness; if the lock is not held, the unlock code is harmless. however, the memory synchronization properties associated with a_store are costly on some archs, so it's best to avoid executing the unlock code when it is unnecessary.
2012-04-24	ditch the priority inheritance locks; use malloc's version of lock	Rich Felker	-23/+3
	i did some testing trying to switch malloc to use the new internal lock with priority inheritance, and my malloc contention test got 20-100 times slower. if priority inheritance futexes are this slow, it's simply too high a price to pay for avoiding priority inversion. maybe we can consider them somewhere down the road once the kernel folks get their act together on this (and perferably don't link it to glibc's inefficient lock API)... as such, i've switch __lock to use malloc's implementation of lightweight locks, and updated all the users of the code to use an array with a waiter count for their locks. this should give optimal performance in the vast majority of cases, and it's simple. malloc is still using its own internal copy of the lock code because it seems to yield measurably better performance with -O3 when it's inlined (20% or more difference in the contention stress test).
2012-04-24	internal locks: new owner of contended lock must set waiters flag	Rich Felker	-1/+1
	this bug probably would have gone unnoticed since it's only used in the fallback code for systems where priority-inheritance locking fails. unfortunately this approach results in one spurious wake syscall on the final unlock, when there are no waiters remaining. the alternative (possibly better) would be to use broadcast wakes instead of reflagging the waiter unconditionally, and let each waiter reflag itself; this saves one syscall at the expense of invoking the "thundering herd" effect (worse performance degredation) when there are many waiters. ideally we would be able to update all of our locks to use an array of two ints rather than a single int, and use a separate counter system like proper mutexes use; then we could avoid all spurious wake calls without resorting to broadcasts. however, it's not clear to me that priority inheritance futexes support this usage. the kernel sets the waiters flag for them (just like we're doing now) and i can't tell if it's safe to bypass the kernel when unlocking just because we know (from private data, the waiter count) that there are no waiters. this is something that could be explored in the future.
2012-04-24	new internal locking primitive; drop spinlocks	Rich Felker	-6/+27
	we use priority inheritance futexes if possible so that the library cannot hit internal priority inversion deadlocks in the presence of realtime priority scheduling (full support to be added later).
2011-09-16	use a_swap rather than old name a_xchg	Rich Felker	-1/+1

2011-06-14	minor locking optimizations	Rich Felker	-1/+1

2011-04-06	consistency: change all remaining syscalls to use SYS_ rather than __NR_ prefix	Rich Felker	-1/+1

2011-03-19	syscall overhaul part two - unify public and internal syscall interface	Rich Felker	-2/+1
	with this patch, the syscallN() functions are no longer needed; a variadic syscall() macro allows syscalls with anywhere from 0 to 6 arguments to be made with a single macro name. also, manually casting each non-integer argument with (long) is no longer necessary; the casts are hidden in the macros. some source files which depended on being able to define the old macro SYSCALL_RETURNS_ERRNO have been modified to directly use __syscall() instead of syscall(). references to SYSCALL_SIGSET_SIZE and SYSCALL_LL have also been changed. x86_64 has not been tested, and may need a follow-up commit to fix any minor bugs/oversights.
2011-02-12	initial check-in, version 0.5.0v0.5.0	Rich Felker	-0/+12