Age | Commit message (Collapse) | Author | Lines |
|
this mirrors the stdio_impl.h cleanup. one header which is not
strictly needed, errno.h, is left in pthread_impl.h, because since
pthread functions return their error codes rather than using errno,
nearly every single pthread function needs the errno constants.
in a few places, rather than bringing in string.h to use memset, the
memset was replaced by direct assignment. this seems to generate much
better code anyway, and makes many functions which were previously
non-leaf functions into leaf functions (possibly eliminating a great
deal of bloat on some platforms where non-leaf functions require ugly
prologue and/or epilogue).
|
|
|
|
|
|
despite documentation that makes it sound a lot different, the only
ABI-constraint difference between TLS variants II and I seems to be
that variant II stores the initial TLS segment immediately below the
thread pointer (i.e. the thread pointer points to the end of it) and
variant I stores the initial TLS segment above the thread pointer,
requiring the thread descriptor to be stored below. the actual value
stored in the thread pointer register also tends to have per-arch
random offsets applied to it for silly micro-optimization purposes.
with these changes applied, TLS should be basically working on all
supported archs except microblaze. I'm still working on getting the
necessary information and a working toolchain that can build TLS
binaries for microblaze, but in theory, static-linked programs with
TLS and dynamic-linked programs where only the main executable uses
TLS should already work on microblaze.
alignment constraints have not yet been heavily tested, so it's
possible that this code does not always align TLS segments correctly
on archs that need TLS variant I.
|
|
the code in __libc_start_main is now responsible for parsing auxv,
rather than duplicating the parsing all over the place. this should
shave off a few cycles and some code size. __init_libc is left as an
external-linkage function despite the fact that it could be static, to
prevent it from being inlined and permanently wasting stack space when
main is called.
a few other minor changes are included, like eliminating per-thread
ssp canaries (they were likely broken when combined with certain
dlopen usages, and completely unnecessary) and some other unnecessary
checks. since this code gets linked into every program, it should be
as small and simple as possible.
|
|
|
|
unlike other implementations, this one reserves memory for new TLS in
all pre-existing threads at dlopen-time, and dlopen will fail with no
resources consumed and no new libraries loaded if memory is not
available. memory is not immediately distributed to running threads;
that would be too complex and too costly. instead, assurances are made
that threads needing the new TLS can obtain it in an async-signal-safe
way from a buffer belonging to the dynamic linker/new module (via
atomic fetch-and-add based allocator).
I've re-appropriated the lock that was previously used for __synccall
(synchronizing set*id() syscalls between threads) as a general
pthread_create lock. it's a "backwards" rwlock where the "read"
operation is safe atomic modification of the live thread count, which
multiple threads can perform at the same time, and the "write"
operation is making sure the count does not increase during an
operation that depends on it remaining bounded (__synccall or dlopen).
in static-linked programs that don't use __synccall, this lock is a
no-op and has no cost.
|
|
only TLS in the main program is supported so far; TLS defined in
shared libraries will not work yet.
|
|
the design for TLS in dynamic-linked programs is mostly complete too,
but I have not yet implemented it. cost is nonzero but still low for
programs which do not use TLS and/or do not use threads (a few hundred
bytes of new code, plus dependency on memcpy). i believe it can be
made smaller at some point by merging __init_tls and __init_security
into __libc_start_main and avoiding duplicate auxv-parsing code.
at the same time, I've also slightly changed the logic pthread_create
uses to allocate guard pages to ensure that guard pages are not
counted towards commit charge.
|