summaryrefslogtreecommitdiff
path: root/src/thread/pthread_barrier_wait.c
AgeCommit message (Collapse)AuthorLines
2014-08-25sanitize number of spins in userspace before futex waitRich Felker-1/+1
the previous spin limit of 10000 was utterly unreasonable. empirically, it could consume up to 200000 cycles, whereas a failed futex wait (EAGAIN) typically takes 1000 cycles or less, and even a true wait/wake round seems much less expensive. the new counts (100 for general wait, 200 in barrier) were simply chosen to be in the range of what's reasonable without having adverse effects on casual micro-benchmark tests I have been running. they may still be too high, from a standpoint of not wasting cpu cycles, but at least they're a lot better than before. rigorous testing across different archs and cpu models should be performed at some point to determine whether further adjustments should be made.
2014-08-22fix fallback checks for kernels without private futex supportRich Felker-1/+1
for unknown syscall commands, the kernel produces ENOSYS, not EINVAL.
2014-08-15make futex operations use private-futex mode when possibleRich Felker-1/+2
private-futex uses the virtual address of the futex int directly as the hash key rather than requiring the kernel to resolve the address to an underlying backing for the mapping in which it lies. for certain usage patterns it improves performance significantly. in many places, the code using futex __wake and __wait operations was already passing a correct fixed zero or nonzero flag for the priv argument, so no change was needed at the site of the call, only in the __wake and __wait functions themselves. in other places, especially where the process-shared attribute for a synchronization object was not previously tracked, additional new code is needed. for mutexes, the only place to store the flag is in the type field, so additional bit masking logic is needed for accessing the type. for non-process-shared condition variable broadcasts, the futex requeue operation is unable to requeue from a private futex to a process-shared one in the mutex structure, so requeue is simply disabled in this case by waking all waiters. for robust mutexes, the kernel always performs a non-private wake when the owner dies. in order not to introduce a behavioral regression in non-process-shared robust mutexes (when the owning thread dies), they are simply forced to be treated as process-shared for now, giving correct behavior at the expense of performance. this can be fixed by adding explicit code to pthread_exit to do the right thing for non-shared robust mutexes in userspace rather than relying on the kernel to do it, and will be fixed in this way later. since not all supported kernels have private futex support, the new code detects EINVAL from the futex syscall and falls back to making the call without the private flag. no attempt to cache the result is made; caching it and using the cached value efficiently is somewhat difficult, and not worth the complexity when the benefits would be seen only on ancient kernels which have numerous other limitations and bugs anyway.
2012-08-17fix extremely rare but dangerous race condition in robust mutexesRich Felker-19/+4
if new shared mappings of files/devices/shared memory can be made between the time a robust mutex is unlocked and its subsequent removal from the pending slot in the robustlist header, the kernel can inadvertently corrupt data in the newly-mapped pages when the process terminates. i am fixing the bug by using the same global vm lock mechanism that was used to fix the race condition with unmapping barriers after pthread_barrier_wait returns.
2011-09-28fix excessive/insufficient wakes in __vm_unlockRich Felker-3/+3
there is no need to send a wake when the lock count does not hit zero, but when it does, all waiters must be woken (since all with the same sign are eligible to obtain the lock).
2011-09-28improve pshared barriersRich Felker-11/+13
eliminate the sequence number field and instead use the counter as the futex because of the way the lock is held, sequence numbers are completely useless, and this frees up a field in the barrier structure to be used as a waiter count for the count futex, which lets us avoid some syscalls in the best case. as of now, self-synchronized destruction and unmapping should be fully safe. before any thread can return from the barrier, all threads in the barrier have obtained the vm lock, and each holds a shared lock on the barrier. the barrier memory is not inspected after the shared lock count reaches 0, nor after the vm lock is released.
2011-09-28next step making barrier self-sync'd destruction safeRich Felker-4/+12
i think this works, but it can be simplified. (next step)
2011-09-27correctly handle the degenerate barrier in the pshared caseRich Felker-1/+1
2011-09-27fix pshared barrier wrong return value.Rich Felker-1/+1
i set the return value but then never used it... oops!
2011-09-27process-shared barrier support, based on discussion with bdonlanRich Felker-7/+67
this implementation is rather heavy-weight, but it's the first solution i've found that's actually correct. all waiters actually wait twice at the barrier so that they can synchronize exit, and they hold a "vm lock" that prevents changes to virtual memory mappings (and blocks pthread_barrier_destroy) until all waiters are finished inspecting the barrier. thus, it is safe for any thread to destroy and/or unmap the barrier's memory as soon as pthread_barrier_wait returns, without further synchronization.
2011-05-06remove debug code that was missed in barrier commitRich Felker-1/+0
2011-05-06completely new barrier implementation, addressing major correctness issuesRich Felker-16/+44
the previous implementation had at least 2 problems: 1. the case where additional threads reached the barrier before the first wave was finished leaving the barrier was untested and seemed not to be working. 2. threads leaving the barrier continued to access memory within the barrier object after other threads had successfully returned from pthread_barrier_wait. this could lead to memory corruption or crashes if the barrier object had automatic storage in one of the waiting threads and went out of scope before all threads finished returning, or if one thread unmapped the memory in which the barrier object lived. the new implementation avoids both problems by making the barrier state essentially local to the first thread which enters the barrier wait, and forces that thread to be the last to return.
2011-02-17reorganize pthread data structures and move the definitions to alltypes.hRich Felker-11/+11
this allows sys/types.h to provide the pthread types, as required by POSIX. this design also facilitates forcing ABI-compatible sizes in the arch-specific alltypes.h, while eliminating the need for developers changing the internals of the pthread types to poke around with arch-specific headers they may not be able to test.
2011-02-12initial check-in, version 0.5.0v0.5.0Rich Felker-0/+31