summaryrefslogtreecommitdiff
path: root/src/thread/__tls_get_addr.c
AgeCommit message (Collapse)AuthorLines
2019-02-18install dynamic tls synchronously at dlopen, streamline accessRich Felker-6/+1
previously, dynamic loading of new libraries with thread-local storage allocated the storage needed for all existing threads at load-time, precluding late failure that can't be handled, but left installation in existing threads to take place lazily on first access. this imposed an additional memory access and branch on every dynamic tls access, and imposed a requirement, which was not actually met, that the dynamic tlsdesc asm functions preserve all call-clobbered registers before calling C code to to install new dynamic tls on first access. the x86[_64] versions of this code wrongly omitted saving and restoring of fpu/vector registers, assuming the compiler would not generate anything using them in the called C code. the arm and aarch64 versions saved known existing registers, but failed to be future-proof against expansion of the register file. now that we track live threads in a list, it's possible to install the new dynamic tls for each thread at dlopen time. for the most part, synchronization is not needed, because if a thread has not synchronized with completion of the dlopen, there is no way it can meaningfully request access to a slot past the end of the old dtv, which remains valid for accessing slots which already existed. however, it is necessary to ensure that, if a thread sees its new dtv pointer, it sees correct pointers in each of the slots that existed prior to the dlopen. my understanding is that, on most real-world coherency architectures including all the ones we presently support, a built-in consume order guarantees this; however, don't rely on that. instead, the SYS_membarrier syscall is used to ensure that all threads see the stores to the slots of their new dtv prior to the installation of the new dtv. if it is not supported, the same is implemented in userspace via signals, using the same mechanism as __synccall. the __tls_get_addr function, variants, and dynamic tlsdesc asm functions are all updated to remove the fallback paths for claiming new dynamic tls, and are now all branch-free.
2018-10-12combine arch ABI's DTP_OFFSET into DTV pointersRich Felker-2/+2
as explained in commit 6ba5517a460c6c438f64d69464fdfc3269a4c91a, some archs use an offset (typicaly -0x8000) with their DTPOFF relocations, which __tls_get_addr needs to invert. on affected archs, which lack direct support for large immediates, this can cost multiple extra instructions in the hot path. instead, incorporate the DTP_OFFSET into the DTV entries. this means they are no longer valid pointers, so store them as an array of uintptr_t rather than void *; this also makes it easier to access slot 0 as a valid slot count. commit e75b16cf93ebbc1ce758d3ea6b2923e8b2457c68 left behind cruft in two places, __reset_tls and __tls_get_new, from back when it was possible to have uninitialized gap slots indicated by a null pointer in the DTV. since the concept of null pointer is no longer meaningful with an offset applied, remove this cruft. presently there are no archs with both TLSDESC and nonzero DTP_OFFSET, but the dynamic TLSDESC relocation code is also updated to apply an inverted offset to its offset field, so that the offset DTV would not impose a runtime cost in TLSDESC resolver functions.
2018-09-12reduce spurious inclusion of libc.hRich Felker-1/+0
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12move declarations of tls setup/access functions to pthread_impl.hRich Felker-2/+0
it's already included in all places where these are needed, and aside from __tls_get_addr, they're all implementation internals.
2018-09-05define and use internal macros for hidden visibility, weak refsRich Felker-2/+1
this cleans up what had become widespread direct inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful to recovering some of the efficiency lost when the protected visibility hack was dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially on archs where the PLT ABI is costly.
2017-01-13fix crashes in x32 __tls_get_addrrofl0r-2/+2
x32 has another gratuitous difference to all other archs: it passes an array of 64bit values to __tls_get_addr(). usually it is an array of size_t.
2015-11-11eliminate use of SHARED macro in __tls_get_addrRich Felker-6/+6
this was only a tiny optimization, and static-linked binaries should not be calling __tls_get_addr anyway since the linker is supposed to perform relaxation, resulting in use of the local-exec TLS model.
2015-06-25fix local-dynamic model TLS on mips and powerpcRich Felker-2/+2
the TLS ABI spec for mips, powerpc, and some other (presently unsupported) RISC archs has the return value of __tls_get_addr offset by +0x8000 and the result of DTPOFF relocations offset by -0x8000. I had previously assumed this part of the ABI was actually just an implementation detail, since the adjustments cancel out. however, when the local dynamic model is used for accessing TLS that's known to be in the same DSO, either of the following may happen: 1. the -0x8000 offset may already be applied to the argument structure passed to __tls_get_addr at ld time, without any opportunity for runtime relocations. 2. __tls_get_addr may be used with a zero offset argument to obtain a base address for the module's TLS, to which the caller then applies immediate offsets for individual objects accessed using the local dynamic model. since the immediate offsets have the -0x8000 adjustment applied to them, the base address they use needs to include the +0x8000 offset. it would be possible, but more complex, to store the pointers in the dtv[] array with the +0x8000 offset pre-applied, to avoid the runtime cost of adding 0x8000 on each call to __tls_get_addr. this change could be made later if measurements show that it would help.
2015-04-14fix inconsistent visibility for internal __tls_get_new functionRich Felker-3/+2
at the point of call it was declared hidden, but the definition was not hidden. for some toolchains this inconsistency produced textrels without ld-time binding.
2014-06-19separate __tls_get_addr implementation from dynamic linker/init_tlsRich Felker-0/+17
such separation serves multiple purposes: - by having the common path for __tls_get_addr alone in its own function with a tail call to the slow case, code generation is greatly improved. - by having __tls_get_addr in it own file, it can be replaced on a per-arch basis as needed, for optimization or ABI-specific purposes. - by removing __tls_get_addr from __init_tls.c, a few bytes of code are shaved off of static binaries (which are unlikely to use this function unless the linker messed up).