path: root/src/thread
Age | Commit message | Author | Lines
2018-10-12 | combine arch ABI's DTP_OFFSET into DTV pointers | Rich Felker | -2/+2
as explained in commit 6ba5517a460c6c438f64d69464fdfc3269a4c91a, some archs use an offset (typically -0x8000) with their DTPOFF relocations, which __tls_get_addr needs to invert. on affected archs, which lack direct support for large immediates, this can cost multiple extra instructions in the hot path. instead, incorporate the DTP_OFFSET into the DTV entries. this means they are no longer valid pointers, so store them as an array of uintptr_t rather than void *; this also makes it easier to access slot 0 as a valid slot count. commit e75b16cf93ebbc1ce758d3ea6b2923e8b2457c68 left behind cruft in two places, __reset_tls and __tls_get_new, from back when it was possible to have uninitialized gap slots indicated by a null pointer in the DTV. since the concept of null pointer is no longer meaningful with an offset applied, remove this cruft. presently there are no archs with both TLSDESC and nonzero DTP_OFFSET, but the dynamic TLSDESC relocation code is also updated to apply an inverted offset to its offset field, so that the offset DTV would not impose a runtime cost in TLSDESC resolver functions.
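The effect of folding the offset into the DTV can be sketched in a few lines of C. This is only an illustration under stated assumptions: `DTP_OFFSET` is given the typical 0x8000 magnitude, and `tls_index_sketch`, `get_addr_old`, `get_addr_new`, and `dtv_install` are hypothetical names, not the libc's actual internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: affected archs bias DTPOFF relocations by -0x8000,
 * so the resolver has to add 0x8000 back somewhere. */
#define DTP_OFFSET 0x8000

/* Hypothetical mirror of the (module, offset) pair given to the resolver. */
typedef struct { size_t module; size_t offset; } tls_index_sketch;

/* Before: the DTV holds real pointers; the hot path adds DTP_OFFSET. */
static void *get_addr_old(void *const *dtv, const tls_index_sketch *ti)
{
    return (void *)((uintptr_t)dtv[ti->module] + ti->offset + DTP_OFFSET);
}

/* After: the offset was folded in when the entry was stored, so the
 * hot path is a single load and add. */
static void *get_addr_new(const uintptr_t *dtv, const tls_index_sketch *ti)
{
    return (void *)(dtv[ti->module] + ti->offset);
}

/* Entries are no longer valid pointers, hence uintptr_t storage. */
static void dtv_install(uintptr_t *dtv, size_t module, void *block)
{
    assert(module != 0); /* slot 0 is reserved for the slot count */
    dtv[module] = (uintptr_t)block + DTP_OFFSET;
}
```

Both resolvers return the same address for the same biased offset; the new form simply pays the addition once at install time instead of on every access.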
2018-09-18 | limit the configurable default stack/guard size for threads | Rich Felker | -6/+10
limit to 8MB/1MB, respectively. since the defaults cannot be reduced once increased, excessively large settings would lead to an unrecoverably broken state. this change is in preparation to allow defaults to be increased via program headers at the linker level. creation of threads that really need larger sizes needs to be done with an explicit attribute.
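A minimal sketch of the clamping described above, assuming the defaults are two plain `size_t` globals. The starting values and the names `default_stacksize`, `default_guardsize`, and `raise_defaults_sketch` are illustrative, and a real implementation would also need locking around the update.

```c
#include <assert.h>
#include <stddef.h>

/* Caps taken from the commit message: 8MB stack, 1MB guard. */
#define DEFAULT_STACK_MAX ((size_t)8 << 20)
#define DEFAULT_GUARD_MAX ((size_t)1 << 20)

/* Hypothetical starting defaults; the real values live in libc state. */
static size_t default_stacksize = 80 * 1024;
static size_t default_guardsize = 4096;

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Defaults only ever grow, so cap the request before applying it. */
static void raise_defaults_sketch(size_t stack, size_t guard)
{
    stack = min_size(stack, DEFAULT_STACK_MAX);
    guard = min_size(guard, DEFAULT_GUARD_MAX);
    if (stack > default_stacksize) default_stacksize = stack;
    if (guard > default_guardsize) default_guardsize = guard;
    assert(default_stacksize <= DEFAULT_STACK_MAX);
    assert(default_guardsize <= DEFAULT_GUARD_MAX);
}
```

Because a later, smaller request cannot lower the defaults, the cap is the only protection against a single oversized request becoming permanent.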
2018-09-18 | remove redundant declarations of __default_stacksize, __default_guardsize | Rich Felker | -8/+0
these are now declared in pthread_impl.h.
2018-09-18 | fix benign data race in pthread_attr_init | Rich Felker | -0/+2
access to defaults should be protected against concurrent changes.
2018-09-18 | fix deletion of pthread tsd keys that still have non-null values stored | Rich Felker | -18/+101
per POSIX, deletion of a key for which some threads still have values stored is permitted, and newly created keys must initially hold the null value in all threads. these properties were not met by our implementation; if a key was deleted with values left and a new key was created in the same slot, the old values were still visible. moreover, due to lack of any synchronization in pthread_key_delete, there was a TOCTOU race whereby a concurrent pthread_exit could attempt to call a null destructor pointer for the newly orphaned value. this commit introduces a solution based on __synccall, stopping the world to zero out the values for deleted keys, but only does so lazily when all key slots have been exhausted. pthread_key_delete is split off into a separate translation unit so that static-linked programs which only create keys but never delete them will not pull in the __synccall machinery. a global rwlock is added to synchronize creation and deletion of keys with dtor execution. since the dtor execution loop now has to release and retake the lock around its call to each dtor, checks are made not to call the nodtor dummy function for keys which lack a dtor.
2018-09-15 | check for kernel support before allowing robust mutex creation | Rich Felker | -1/+17
on some archs, linux support for futex operations (including robust_list processing) that depend on kernelspace CAS is conditional on a runtime check. as of linux 4.18, this check fails unconditionally on nommu archs that perform it, and spurious failure on powerpc64 was observed but not explained. it's also possible that futex support is omitted entirely, or that the kernel is older than 2.6.17. for most futex ops, ENOSYS does not yield hard breakage; userspace will just spin at 100% cpu load. but for robust mutexes, correct behavior depends on the kernel functionality. use the get_robust_list syscall to probe for support at the first call to pthread_mutexattr_setrobust, and block creation of robust mutexes with a reportable error if they can't be supported.
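The probe can be sketched with the public `syscall()` wrapper on Linux. This is an assumption-laden illustration: `probe_robust_support` and `set_robust_sketch` are hypothetical stand-ins, the cache is not thread-safe as written, and the libc itself would use its internal syscall and atomic primitives instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Cached probe result: -1 = not probed yet, 0 = supported,
 * otherwise the errno value reported by the kernel. */
static int robust_probe_result = -1;

static int probe_robust_support(void)
{
    if (robust_probe_result < 0) {
        void *head;
        size_t len;
        /* pid 0 asks for the calling thread's own robust list. */
        long r = syscall(SYS_get_robust_list, 0, &head, &len);
        robust_probe_result = r ? errno : 0;
    }
    return robust_probe_result;
}

/* Hypothetical stand-in for pthread_mutexattr_setrobust: refuse the
 * robust setting with a reportable error when the kernel can't do it. */
static int set_robust_sketch(int *robust_flag, int robust)
{
    assert(robust_flag != NULL);
    if (robust > 1U) return EINVAL;
    if (robust) {
        int r = probe_robust_support();
        if (r) return r;
    }
    *robust_flag = robust;
    return 0;
}
```

Probing at attribute-setting time means the failure surfaces before any robust mutex exists, rather than as silent misbehavior when an owner dies.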
2018-09-12 | split internal lock API out of libc.h, creating lock.h | Rich Felker | -1/+8
this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
2018-09-12 | reduce spurious inclusion of libc.h | Rich Felker | -8/+1
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12 | remove unused __futex function and source file | Rich Felker | -7/+0
the direct syscall or various thin and mostly-inline wrappers around it are used instead internally. at some point a public futex function should be added, but it's not yet clear what the signature should be, and in the meantime this file is not useful.
2018-09-12 | hide __pthread_once_full symbol | Rich Felker | -1/+1
this is a special case that does not need a declaration, because it's not even a libc-internal interface between translation units. instead it's a poor hack around compilers' inability to shrink-wrap critical code paths. after vis.h was disabled, it became more of a pessimization on many archs due to the extra layer of machinery to support a call through the PLT, but now it should be efficient again.
2018-09-12 | overhaul internally-public declarations using wrapper headers | Rich Felker | -53/+6
commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfaces between subsystems of libc. rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to. along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features. over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
2018-09-12 | use hidden visibility for sh __unmapself backends | Rich Felker | -2/+3
2018-09-12 | make arch __set_thread_area backends hidden | Rich Felker | -0/+9
this is not a public interface, and does not even necessarily match the syscall on all archs that have a syscall by that name. on archs where it's implemented in C, no action on the source file is needed; the hidden declaration in pthread_arch.h suffices.
2018-09-12 | make arch __clone backends hidden | Rich Felker | -0/+15
these are not a public interface and are not intended to be callable from anywhere but the public clone function or other places in libc.
2018-09-12 | move declarations of tls setup/access functions to pthread_impl.h | Rich Felker | -4/+0
it's already included in all places where these are needed, and aside from __tls_get_addr, they're all implementation internals.
2018-09-12 | for c11 mtx and cnd functions, use externally consistent type names | Rich Felker | -12/+17
despite looking like undefined behavior, the affected code is correct both before and after this patch. the pairs mtx_t and pthread_mutex_t, and cnd_t and pthread_cond_t, are not mutually compatible within a single translation unit (because they are distinct untagged aggregate instances), but they are compatible with an object of either type from another translation unit (6.2.7 ¶1), and therefore a given translation unit can choose which one it wants to use. in the interest of being able to move declarations out of source files to headers that facilitate checking, use the pthread type names in declaring the namespace-safe versions of the pthread functions and cast the argument pointer types when calling them.
2018-09-12 | make inadvertently exposed __pthread_{timed,try}join_np functions static | Rich Felker | -2/+2
these exist for the sake of defining the corresponding weak public aliases (for C11 and POSIX namespace conformance reasons). they are not referenced by anything else in libc, so make them static.
2018-09-12 | fix issues from public functions defined without declaration visible | Rich Felker | -0/+1
policy is that all public functions which have a public declaration should be defined in a context where that public declaration is visible, to avoid preventable type mismatches. an audit performed using GCC's -Wmissing-declarations turned up the violations corrected here. in some cases the public header had not been included; in others, a feature test macro needed to make the declaration visible had been omitted. in the case of gethostent and getnetent, the omission seems to have been intentional, as a hack to admit a single stub definition for both functions. this kind of hack is no longer acceptable; it's UB and would not fly with LTO or advanced toolchains. the hack is undone to make exposure of the declarations possible.
2018-09-05 | define and use internal macros for hidden visibility, weak refs | Rich Felker | -26/+20
this cleans up what had become widespread direct inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful to recovering some of the efficiency lost when the protected visibility hack was dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially on archs where the PLT ABI is costly.
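For illustration, such macros might look like the following. The macro bodies are an assumption based on common GNU C attribute usage (the log itself does not show them), and `sketch_impl_clamp` and `sketch_clamp` are hypothetical names. The alias attribute requires an ELF target with GCC or Clang.

```c
#include <assert.h>

/* Assumed definitions: wrap the GNU C attributes once so source files
 * do not spell them out inline. */
#define hidden __attribute__((__visibility__("hidden")))
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* A hidden definition is not exported from the shared library, so
 * calls to it from inside the library need not go through the PLT. */
hidden int sketch_impl_clamp(int v, int lo, int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : v > hi ? hi : v;
}

/* The public name is a weak alias for the internal implementation. */
weak_alias(sketch_impl_clamp, sketch_clamp);
```

Internal callers would use `sketch_impl_clamp` directly, while external code sees only `sketch_clamp`.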
2018-09-04 | fix namespace violation for c11 mutex functions | Rich Felker | -1/+3
__pthread_mutex_timedlock is used to implement c11 mutex functions, and therefore cannot call pthread_mutex_trylock by name.
2018-09-04 | in pthread_mutex_timedlock, avoid repeatedly reading mutex type field | Rich Felker | -3/+4
compiler cannot cache immutable fields of the mutex object across external calls it can't see, much less across atomics.
2018-09-04 | in pthread_mutex_trylock, EBUSY out more directly when possible | Rich Felker | -2/+2
avoid gratuitously setting up and tearing down the robust list pending slot.
2018-08-29 | fix async thread cancellation on sh-fdpic | Rich Felker | -0/+3
if __cp_cancel was reached via __syscall_cp, r12 will necessarily still contain a GOT pointer (for libc.so or for the static-linked main program) valid for entering __cancel. however, in the case of async cancellation, r12 may contain any scratch value; it's not necessarily even a valid GOT pointer for the code that was interrupted. unlike in commit 0ec49dab6794166d67fae4764ce7fdea42ea6103 where the corresponding issue was fixed for powerpc64, there is fundamentally no way for fdpic code to recompute its GOT pointer. so a new mechanism is introduced for cancel_handler to write a GOT register value into the interrupted context on archs where it is needed.
2018-08-29 | fix async thread cancellation on powerpc64 | Rich Felker | -0/+7
entering the local entry point for __cancel from __cp_cancel is valid if __cp_cancel was reached from __syscall_cp, since both are in libc and share the same TOC pointer, but it is not valid if __cp_cancel was reached when cancel_handler rewrote the program counter for asynchronous cancellation of code outside libc. to ensure __cancel is entered with a valid TOC pointer, recompute the correct value in a PC-relative manner before jumping.
2018-08-28 | reject invalid arguments to pthread_barrierattr_setpshared | Rich Felker | -0/+1
this is a POSIX requirement.
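A sketch of the validation, using a hypothetical attribute struct in place of the opaque pthread_barrierattr_t. Only the two standard constants are accepted and anything else yields EINVAL; the test value 42 assumes neither constant has that value, as is the case on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque attribute object. */
struct barrierattr_sketch { int pshared; };

static int barrierattr_setpshared_sketch(struct barrierattr_sketch *a,
                                         int pshared)
{
    assert(a != NULL);
    /* Reject anything that is not one of the two POSIX values. */
    if (pshared != PTHREAD_PROCESS_PRIVATE &&
        pshared != PTHREAD_PROCESS_SHARED)
        return EINVAL;
    a->pshared = pshared;
    return 0;
}
```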
2018-08-28 | rewrite __aeabi_read_tp in asm | Szabolcs Nagy | -12/+6
__aeabi_read_tp used to call c code, but that was incorrect as the arm runtime abi specifies special pcs for this function: it is only allowed to clobber r0, ip, lr and cpsr. maintainer's note: the old code explicitly saved and restored all general-purpose registers which are call-clobbered in the normal calling convention, so it's unlikely that any real-world compilers produced code that could break. however theoretically they could have chosen to use floating point registers, in which case the caller's values of those registers would be clobbered.
2018-08-28 | fix deadlock in async thread self-cancellation | Rich Felker | -1/+5
with async cancellation enabled, pthread_cancel(pthread_self()) deadlocked due to pthread_kill holding killlock which is needed by pthread_exit. this could be solved by making pthread_kill block signals around the critical section, at least when the target thread is itself, but the issue only arises for cancellation, and otherwise would just be imposing unnecessary cost. instead just have pthread_cancel explicitly check for async self-cancellation and call pthread_exit(PTHREAD_CANCELED) directly rather than going through the signal machinery.
2018-08-23 | fix tls access on arm targets before armv6k | Szabolcs Nagy | -1/+1
commit 610c5a8524c3d6cd3ac5a5f1231422e7648a3791 changed the thread pointer setup so tp points at the end of the pthread struct on arm, but failed to update __aeabi_read_tp so it was off by 8. this broke tls access in code that is compiled with -mtp=soft, which is the default when target arch is pre armv6k or thumb1. maintainer's note: no release versions are affected.
2018-08-18 | mips archs: fix runaway execution if start fn passed to clone returns | Segev Finer | -3/+12
Call SYS_exit on return from fn in __clone. This is the expected behavior of this function. Without this the child task will crash on return from fn, since it will return to nowhere.
2018-08-16 | fix pthread_create return value with PTHREAD_EXPLICIT_SCHED | Rich Felker | -0/+1
due to moved code, commit b8742f32602add243ee2ce74d804015463726899 inadvertently used the return value of __clone, rather than the return value of SYS_sched_setscheduler in the new thread, to check whether it needed to report failure. since a successful __clone returns the tid of the new thread, which is never zero, this caused pthread_create always to return with an invalid error number in the code path for PTHREAD_EXPLICIT_SCHED. this regression was not present in any releases.
2018-07-27 | make pthread_attr_init honor defaults set by pthread_setattr_default_np | Rich Felker | -4/+11
this fixes a major gap in the intended functionality of pthread_setattr_default_np. if application/library code creating a thread does not pass a null attribute pointer to pthread_create, but sets up an attribute object to change other properties while leaving the stack alone, the created thread will get a stack with size DEFAULT_STACK_SIZE. this makes pthread_setattr_default_np useless for working around stack overflow issues in such applications, and leaves a major risk of regression if previously-working code switches from using a null attribute pointer to an attribute object. this change aligns the behavior more closely with the glibc pthread_setattr_default_np functionality too, albeit via a different mechanism. glibc encodes "default" specially in the attribute object and reads the actual default at thread creation time. with this commit, we now copy the current default into the attribute object at pthread_attr_init time, so that applications that query the properties of the attribute object will see the right values.
2018-06-19 | add m68k port | Rich Felker | -0/+58
three ABIs are supported: the default with 68881 80-bit fpu format and results returned in floating point registers, softfloat-only with the same format, and coldfire fpu with IEEE single/double only. only the first is tested at all, and only under qemu which has fpu emulation bugs. basic functionality smoke tests have been performed for the most common arch-specific breakage via libc-test and qemu user-level emulation. some sysvipc failures remain, but are shared with other big endian archs and will be fixed separately.
2018-05-09 | make linking of thread-start with explicit scheduling conditional | Rich Felker | -28/+28
the wrapper start function that performs scheduling operations is unreachable if pthread_attr_setinheritsched is never called, so move it there rather than the pthread_create source file, saving some code size for static-linked programs.
2018-05-09 | improve design of thread-start with explicit scheduling attributes | Rich Felker | -21/+39
eliminate the awkward startlock mechanism and corresponding fields of the pthread structure that were only used at startup. instead of having pthread_create perform the scheduling operations and having the new thread wait for them to be completed, start the new thread with a wrapper start function that performs its own scheduling, sending the result code back via a futex. this way the new thread can use storage from the calling thread's stack rather than permanent fields in the pthread structure.
2018-05-05 | improve joinable/detached thread state handling | Rich Felker | -19/+22
previously, some accesses to the detached state (from pthread_join and pthread_getattr_np) were unsynchronized; they were harmless in programs with well-defined behavior, but ugly. other accesses (in pthread_exit and pthread_detach) were synchronized by a poorly named "exitlock", with an ad-hoc trylock operation on it open-coded in pthread_detach, whose only purpose was establishing protocol for which thread is responsible for deallocation of detached-thread resources. instead, use an atomic detach_state and unify it with the futex used to wait for thread exit. this eliminates 2 members from the pthread structure, gets rid of the hackish lock usage, and makes rigorous the trap added in commit 80bf5952551c002cf12d96deb145629765272db0 for catching attempts to join detached threads. it should also make attempts to detach an already-detached thread reliably trap.
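The ownership protocol can be sketched with C11 atomics. The state names and the functions `detach_sketch` and `exit_sketch` are hypothetical, and the sketch returns status codes where the libc would join or trap. The point is only that a single compare-and-swap decides which side frees the thread's resources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state encoding for the atomic detach_state. */
enum { DT_EXITED, DT_EXITING, DT_JOINABLE, DT_DETACHED };

struct thread_sketch { atomic_int detach_state; };

/* Returns 1 if the thread is now detached, 0 if it had already begun
 * exiting (the caller should join it to reclaim resources), and -1 if
 * it was already detached (a case the libc would trap on). */
static int detach_sketch(struct thread_sketch *t)
{
    int expected = DT_JOINABLE;
    assert(t != NULL);
    if (atomic_compare_exchange_strong(&t->detach_state, &expected,
                                       DT_DETACHED))
        return 1;
    return expected == DT_DETACHED ? -1 : 0;
}

/* Exit side: returns 1 if the exiting thread must free its own
 * resources (it was detached), 0 if a joiner will do so later. */
static int exit_sketch(struct thread_sketch *t)
{
    int expected = DT_JOINABLE;
    assert(t != NULL);
    if (atomic_compare_exchange_strong(&t->detach_state, &expected,
                                       DT_EXITING))
        return 0;
    return expected == DT_DETACHED;
}
```

Whichever compare-and-swap succeeds first settles the question, so no separate lock or trylock is needed.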
2018-05-05 | improve pthread_exit synchronization with functions targeting tid | Rich Felker | -16/+18
if the last thread exited via pthread_exit, the logic that marked it dead did not account for the possibility of it targeting itself via atexit handlers. for example, an atexit handler calling pthread_kill(pthread_self(), SIGKILL) would return success (previously, ESRCH) rather than causing termination via the signal. move the release of killlock after the determination is made whether the exiting thread is the last thread. in the case where it's not, move the release all the way to the end of the function. this way we can clear the tid rather than spending storage on a dedicated dead-flag. clearing the tid is also preferable in that it hardens against inadvertent use of the value after the thread has terminated but before it is joined.
2018-05-04 | remove incorrect ESRCH error from pthread_kill | Rich Felker | -1/+2
posix documents in the rationale and future directions for pthread_kill that, since the lifetime of the thread id for a joinable thread lasts until it is joined, ESRCH is not a correct error for pthread_kill to produce when the target thread has exited but not yet been joined, and that conforming applications cannot attempt to detect this state. future versions of the standard may explicitly require that ESRCH not be returned for this case.
2018-05-02 | use a dedicated futex object for pthread_join instead of tid field | Rich Felker | -4/+5
the tid field in the pthread structure is not volatile, and really shouldn't be, so as not to limit the compiler's ability to reorder, merge, or split loads in code paths that may be relevant to performance (like controlling lock ownership). however, use of objects which are not volatile or atomic with futex wait is inherently broken, since the compiler is free to transform a single load into multiple loads, thereby using a different value for the controlling expression of the loop and the value passed to the futex syscall, leading the syscall to block instead of returning. reportedly glibc's pthread_join was actually affected by an equivalent issue in glibc on s390. add a separate, dedicated join_futex object for pthread_join to use.
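The hazard and its remedy can be sketched as follows, assuming Linux, an arch that provides `SYS_futex`, and a convention where the futex word is nonzero while the thread runs and is cleared to zero on exit. `joinable_sketch` and `wait_for_exit_sketch` are hypothetical names. The key property is that one atomic load supplies both the loop test and the value handed to the kernel.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define FUTEX_WAIT_SKETCH 0 /* numeric value of the kernel's FUTEX_WAIT */

/* Hypothetical layout: tid stays an ordinary field, while waiting uses
 * a separate object that is only ever accessed atomically. */
struct joinable_sketch {
    int tid;
    atomic_int join_futex;
};

static void wait_for_exit_sketch(struct joinable_sketch *t)
{
    int v;
    assert(t != NULL);
    /* v is loaded exactly once per iteration, so the loop condition
     * and the futex comparison value cannot disagree. */
    while ((v = atomic_load(&t->join_futex)) != 0)
        syscall(SYS_futex, &t->join_futex, FUTEX_WAIT_SKETCH, v, NULL);
}
```

With a plain non-atomic field, the compiler could legally reload the value between the test and the syscall, which is the failure mode the commit message describes.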
2018-02-03 | store pthread stack guard sizes for pthread_getattr_np | William Pitcock | -1/+3
2018-01-09 | revise the definition of multiple basic locks in the code | Jens Gustedt | -3/+3
In all cases this is just a change from two volatile int to one.
2018-01-09 | consistently use the LOCK and UNLOCK macros | Jens Gustedt | -12/+12
In some places there has been a direct usage of the functions. Use the macros consistently everywhere, such that it might be easier later on to capture the fast path directly inside the macro and only have the call overhead on the slow path.
2018-01-09 | new lock algorithm with state and congestion count in one atomic int | Jens Gustedt | -8/+52
A variant of this new lock algorithm has been presented at SAC'16, see https://hal.inria.fr/hal-01304108. A full version of that paper is available at https://hal.inria.fr/hal-01236734. The main motivation of this is to improve on the safety of the basic lock implementation in musl. This is achieved by squeezing a lock flag and a congestion count (= threads inside the critical section) into a single int. Thereby an unlock operation does exactly one memory transfer (a_fetch_add) and never touches the value again, but still detects if a waiter has to be woken up. This is a fix of a use-after-free bug in pthread_detach that had temporarily been patched. Therefore this patch also reverts c1e27367a9b26b9baac0f37a12349fc36567c8b6 This is also the only place where internal knowledge of the lock algorithm is used. The main price for the improved safety is a little bit larger code. Under high congestion, the scheduling behavior will be different compared to the previous algorithm. In that case, a successful put-to-sleep may appear out of order compared to the arrival in the critical section.
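The encoding can be sketched with C11 atomics. This is a simplified illustration that omits the spinning and futex sleeping of a real lock, and all function names are hypothetical. A negative value means the lock is held and the remaining bits carry the congestion count, so unlocking is a single fetch-and-add whose return value says whether a waiter needs waking.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock flag (sign bit) plus a congestion count of one: the holder. */
#define LOCKED_ONE (INT_MIN + 1)

/* Uncontended fast path: 0 -> locked with a count of one. */
static bool trylock_sketch(atomic_int *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(l, &expected, LOCKED_ONE);
}

/* A contender announces itself by bumping the count; returns the
 * value of the lock word after the increment. */
static int announce_sketch(atomic_int *l)
{
    return atomic_fetch_add(l, 1) + 1;
}

/* An announced contender that observed an unlocked word with the given
 * positive count tries to set the lock flag, keeping the count. */
static bool take_sketch(atomic_int *l, int observed)
{
    assert(observed > 0);
    return atomic_compare_exchange_strong(l, &observed,
                                          INT_MIN + observed);
}

/* Unlock with exactly one atomic operation: clear the flag and drop
 * our own count. Returns true if someone else is waiting and should
 * be woken. The lock word is never touched again afterwards. */
static bool unlock_sketch(atomic_int *l)
{
    return atomic_fetch_add(l, -LOCKED_ONE) != LOCKED_ONE;
}
```

Because the unlocker never reads the word again after its single update, the memory holding the lock may be freed by another thread immediately afterwards, which is the safety property the message emphasizes.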
2017-10-13 | fix read-after-free type error in pthread_detach | Rich Felker | -1/+2
calling __unlock on t->exitlock is not valid because __unlock reads the waiters count after making the atomic store that could allow pthread_exit to continue and unmap the thread's stack and the object t points to. for now, inline the __unlock logic with an unconditional futex wake operation so that the waiters count is not needed. once __lock/__unlock have been made safe for self-synchronized destruction, we could switch back to using them.
2017-09-06 | fix signal masking race in pthread_create with priority attributes | Rich Felker | -2/+7
if the parent thread was able to set the new thread's priority before it reached the check for 'startlock', the new thread failed to restore its signal mask and thus ran with all signals blocked. concept for patch by Sergei, who reported the issue; unnecessary changes were removed and comments added since the whole 'startlock' thing is non-idiomatic and confusing. eventually it should be replaced with use of idiomatic synchronization primitives.
2017-08-11 | trap UB from attempts to join a detached thread | Rich Felker | -0/+1
passing to pthread_join the id of a thread which is not joinable results in undefined behavior. in principle the check to trap does not necessarily work if pthread_detach was called after thread creation, since no effort is made here to synchronize access to t->detached, but the check is well-defined and harmless for callers which did not invoke UB, and likely to help catch erroneous code that would otherwise mysteriously hang. patch by William Pitcock.
2017-07-04 | unify the use of FUTEX_PRIVATE | Jens Gustedt | -3/+3
The flag 1<<7 is used in several places for different purposes that are not always easy to distinguish. Mark those usages that correspond to the flag that is used by the kernel for futexes.
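As a small illustration, naming the flag makes its kernel meaning explicit at each use. The `SK_`-prefixed constants below are hypothetical names carrying the kernel's numeric values for the wait and wake operations and the private flag, and `futex_op_sketch` is an illustrative helper.

```c
#include <assert.h>

/* Numeric values of the kernel's futex operations and private flag. */
#define SK_FUTEX_WAIT    0
#define SK_FUTEX_WAKE    1
#define SK_FUTEX_PRIVATE 128

_Static_assert(SK_FUTEX_PRIVATE == 1 << 7,
               "the private flag is the same bit previously written as 1<<7");

/* Build the op argument for the futex syscall; a named flag shows that
 * this use of bit 7 is the kernel's process-private marker. */
static int futex_op_sketch(int op, int is_private)
{
    assert(op == SK_FUTEX_WAIT || op == SK_FUTEX_WAKE);
    return is_private ? (op | SK_FUTEX_PRIVATE) : op;
}
```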
2017-06-08 | use hard-coded sh4a atomic opcodes to avoid linker errors on sh | Rich Felker | -4/+4
when using the sh4a opcodes, the assembler tags the resulting object file as requiring sh4a. the linker then refuses to (static) link it with object files marked as requiring j2, since there is no isa level that includes both sh4a and j2 instructions.
2017-02-15 | fix build regression in arm atomics asm with new binutils | Rich Felker | -1/+1
binutils commit bada43421274615d0d5f629a61a60b7daa71bc15 tightened immediate fixup handling in gas in such a way that the final .arch of an object file must be compatible with the fixups used when the instruction was assembled; this in turn broke assembling of atomics.s, at least in thumb mode. it's not clear whether this should be considered a bug in gas, but .object_arch is preferable anyway for our purpose here of controlling the ISA level tag on the object file being produced, and it's the intended directive for use in object files with runtime code selection. research by Szabolcs Nagy confirmed that .object_arch is supported in all relevant versions of binutils and clang's integrated assembler. patch by Reiner Herrmann.
2017-01-19 | fix spurious EINTR errors from multithreaded set*id, etc. | Rich Felker | -1/+1
commit 78a8ef47c4d92b7680c52a85f80a81e29da86bb9 inadvertently removed the SA_RESTART flag from the sigaction for the internal signal handler used by __synccall for broadcasting. as a result, programs which did not use interrupting signals but which used set*id() in a multithreaded context could wrongly observe EINTR errors they're not prepared to handle.
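The role of the flag can be shown with the public sigaction interface. This is a generic sketch, not the libc's internal handler setup: `noop_handler` and `install_restarting_handler` are hypothetical names. With SA_RESTART set, restartable system calls interrupted by the handler resume automatically instead of failing with EINTR.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int sig) { (void)sig; }

/* Install a handler whose delivery does not make restartable syscalls
 * in the interrupted thread fail with EINTR. */
static int install_restarting_handler(int sig)
{
    struct sigaction sa;
    assert(sig > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sa.sa_flags = SA_RESTART;   /* the flag the regression dropped */
    sigfillset(&sa.sa_mask);    /* block other signals in the handler */
    return sigaction(sig, &sa, NULL);
}
```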
2017-01-13 | fix crashes in x32 __tls_get_addr | rofl0r | -2/+2
x32 has another gratuitous difference to all other archs: it passes an array of 64-bit values to __tls_get_addr(). usually it is an array of size_t.
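One way to express the difference is an arch-selected element type for the argument array. The sketch below is an assumption about how such a fix could look: `tls_mod_off_sketch_t` and `tls_get_addr_sketch` are hypothetical names, and x32 is detected via the compiler's predefined `__x86_64__` and `__ILP32__` macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On x32, size_t is 32 bits but the ABI passes 64-bit elements, so
 * indexing the array as size_t would read the wrong words. */
#if defined(__x86_64__) && defined(__ILP32__)
typedef unsigned long long tls_mod_off_sketch_t;
#else
typedef size_t tls_mod_off_sketch_t;
#endif

/* v[0] is the module ID and v[1] the offset within that module's TLS
 * block; dtv[] holds the per-module base addresses. */
static void *tls_get_addr_sketch(const uintptr_t *dtv,
                                 const tls_mod_off_sketch_t *v)
{
    assert(v[0] != 0); /* slot 0 is not a module */
    return (void *)(dtv[v[0]] + (uintptr_t)v[1]);
}
```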