<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/thread, branch v1.1.6</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>fix __aeabi_read_tp oversight in arm atomics/tls overhaul</title>
<updated>2014-11-22T17:26:38+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-11-22T17:26:38+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=8cd0b11eafeaaec3df5113cb39094e5456ca6b22'/>
<id>8cd0b11eafeaaec3df5113cb39094e5456ca6b22</id>
<content type='text'>
calls to __aeabi_read_tp may be generated by the compiler to access
TLS on pre-v6 targets. previously, this function was hard-coded to
call the kuser helper, which would crash on kernels with kuser helper
removed.

to fix the problem most efficiently, the definition of __aeabi_read_tp
is moved so that it's an alias for the new __a_gettp. however, on v7+
targets, code to initialize the runtime choice of thread-pointer
loading code is not even compiled, meaning that defining
__aeabi_read_tp would have caused an immediate crash due to using the
default implementation of __a_gettp with a HCF instruction.

fortunately there is an elegant solution which reduces overall code
size: putting the native thread-pointer loading instruction in the
default code path for __a_gettp, so that separate default/native code
paths are not needed. this function should never be called before
__set_thread_area anyway, and if it is called early on pre-v6
hardware, the old behavior (crashing) is maintained.

ideally __aeabi_read_tp would not be called at all on v7+ targets
anyway -- in fact, prior to the overhaul, the same problem existed,
but it was never caught by users building for v7+ with kuser disabled.
however, it's possible for calls to __aeabi_read_tp to end up in a v7+
binary if some of the object files were built for pre-v7 targets, e.g.
in the case of static libraries that were built separately, so this
case needs to be handled.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
calls to __aeabi_read_tp may be generated by the compiler to access
TLS on pre-v6 targets. previously, this function was hard-coded to
call the kuser helper, which would crash on kernels with kuser helper
removed.

to fix the problem most efficiently, the definition of __aeabi_read_tp
is moved so that it's an alias for the new __a_gettp. however, on v7+
targets, code to initialize the runtime choice of thread-pointer
loading code is not even compiled, meaning that defining
__aeabi_read_tp would have caused an immediate crash due to using the
default implementation of __a_gettp with a HCF instruction.

fortunately there is an elegant solution which reduces overall code
size: putting the native thread-pointer loading instruction in the
default code path for __a_gettp, so that separate default/native code
paths are not needed. this function should never be called before
__set_thread_area anyway, and if it is called early on pre-v6
hardware, the old behavior (crashing) is maintained.

ideally __aeabi_read_tp would not be called at all on v7+ targets
anyway -- in fact, prior to the overhaul, the same problem existed,
but it was never caught by users building for v7+ with kuser disabled.
however, it's possible for calls to __aeabi_read_tp to end up in a v7+
binary if some of the object files were built for pre-v7 targets, e.g.
in the case of static libraries that were built separately, so this
case needs to be handled.
</pre>
</div>
</content>
</entry>
<entry>
<title>overhaul ARM atomics/tls for performance and compatibility</title>
<updated>2014-11-19T06:02:01+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-11-19T05:40:32+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=4a241f14a6bea81b9b50edda09f8184e35a75860'/>
<id>4a241f14a6bea81b9b50edda09f8184e35a75860</id>
<content type='text'>
previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.

this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.

for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.

this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.

for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
</pre>
</div>
</content>
</entry>
<entry>
<title>manually "shrink wrap" fast path in pthread_once</title>
<updated>2014-10-20T04:22:51+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-10-20T04:22:51+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=dc95322e18615392eea69de355edd735a15a8f36'/>
<id>dc95322e18615392eea69de355edd735a15a8f36</id>
<content type='text'>
this change is a workaround for the inability of current compilers to
perform "shrink wrapping" optimizations. in casual testing, it roughly
doubled the performance of pthread_once when called on an
already-finished once control object.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this change is a workaround for the inability of current compilers to
perform "shrink wrapping" optimizations. in casual testing, it roughly
doubled the performance of pthread_once when called on an
already-finished once control object.
</pre>
</div>
</content>
</entry>
<entry>
<title>eliminate global waiters count in pthread_once</title>
<updated>2014-10-13T22:26:28+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-10-13T22:26:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=00548408398ced546c540dab773ea66cea4fe1c2'/>
<id>00548408398ced546c540dab773ea66cea4fe1c2</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>fix missing barrier in pthread_once/call_once shortcut path</title>
<updated>2014-10-10T22:21:31+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-10-10T22:20:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=df37d3960abec482e17fad2274a99b790f6cc08b'/>
<id>df37d3960abec482e17fad2274a99b790f6cc08b</id>
<content type='text'>
these functions need to be fast when the init routine has already run,
since they may be called very often from code which depends on global
initialization having taken place. as such, a fast path bypassing
atomic cas on the once control object was used to avoid heavy memory
contention. however, on archs with weakly ordered memory, the fast
path failed to ensure that the caller actually observes the side
effects of the init routine.

preliminary performance testing showed that simply removing the fast
path was not practical; a performance drop of roughly 85x was observed
with 20 threads hammering the same once control on a 24-core machine.
so the new explicit barrier operation from atomic.h is used to retain
the fast path while ensuring memory visibility.

performance may be reduced on some archs where the barrier actually
makes a difference, but the previous behavior was unsafe and incorrect
on these archs. future improvements to the implementation of a_barrier
should reduce the impact.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
these functions need to be fast when the init routine has already run,
since they may be called very often from code which depends on global
initialization having taken place. as such, a fast path bypassing
atomic cas on the once control object was used to avoid heavy memory
contention. however, on archs with weakly ordered memory, the fast
path failed to ensure that the caller actually observes the side
effects of the init routine.

preliminary performance testing showed that simply removing the fast
path was not practical; a performance drop of roughly 85x was observed
with 20 threads hammering the same once control on a 24-core machine.
so the new explicit barrier operation from atomic.h is used to retain
the fast path while ensuring memory visibility.

performance may be reduced on some archs where the barrier actually
makes a difference, but the previous behavior was unsafe and incorrect
on these archs. future improvements to the implementation of a_barrier
should reduce the impact.
</pre>
</div>
</content>
</entry>
<entry>
<title>add C11 thread creation and related thread functions</title>
<updated>2014-09-07T14:28:08+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-09-07T14:28:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=23614b0fcb4cd4d7b2e4148d3b1887b642169765'/>
<id>23614b0fcb4cd4d7b2e4148d3b1887b642169765</id>
<content type='text'>
based on patch by Jens Gustedt.

the main difficulty here is handling the difference between start
function signatures and thread return types for C11 threads versus
POSIX threads. pointers to void are assumed to be able to represent
faithfully all values of int. the function pointer for the thread
start function is cast to an incorrect type for passing through
pthread_create, but is cast back to its correct type before calling so
that the behavior of the call is well-defined.

changes to the existing threads implementation were kept minimal to
reduce the risk of regressions, and duplication of code that carries
implementation-specific assumptions was avoided for ease and safety of
future maintenance.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
based on patch by Jens Gustedt.

the main difficulty here is handling the difference between start
function signatures and thread return types for C11 threads versus
POSIX threads. pointers to void are assumed to be able to represent
faithfully all values of int. the function pointer for the thread
start function is cast to an incorrect type for passing through
pthread_create, but is cast back to its correct type before calling so
that the behavior of the call is well-defined.

changes to the existing threads implementation were kept minimal to
reduce the risk of regressions, and duplication of code that carries
implementation-specific assumptions was avoided for ease and safety of
future maintenance.
</pre>
</div>
</content>
</entry>
<entry>
<title>add C11 condition variable functions</title>
<updated>2014-09-07T02:27:45+00:00</updated>
<author>
<name>Jens Gustedt</name>
<email>Jens.Gustedt@inria.fr</email>
</author>
<published>2014-09-07T02:27:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=14397cec2c8429b504b17aaf92509b48da3681b9'/>
<id>14397cec2c8429b504b17aaf92509b48da3681b9</id>
<content type='text'>
Because of the clear separation for private pthread_cond_t these
interfaces are quite simple and direct.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Because of the clear separation for private pthread_cond_t these
interfaces are quite simple and direct.
</pre>
</div>
</content>
</entry>
<entry>
<title>add C11 mutex functions</title>
<updated>2014-09-07T02:07:22+00:00</updated>
<author>
<name>Jens Gustedt</name>
<email>Jens.Gustedt@inria.fr</email>
</author>
<published>2014-09-07T02:07:22+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=8b0472932c1cb8cb2cc46322b21c0c4e21848522'/>
<id>8b0472932c1cb8cb2cc46322b21c0c4e21848522</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>add C11 thread functions operating on tss_t and once_flag</title>
<updated>2014-09-07T01:38:04+00:00</updated>
<author>
<name>Jens Gustedt</name>
<email>Jens.Gustedt@inria.fr</email>
</author>
<published>2014-09-07T01:32:53+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=e16f70f45210294321a88f23c85ac45046577adc'/>
<id>e16f70f45210294321a88f23c85ac45046577adc</id>
<content type='text'>
These all have POSIX equivalents, but aside from tss_get, they all
have minor changes to the signature or return value and thus need to
exist as separate functions.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These all have POSIX equivalents, but aside from tss_get, they all
have minor changes to the signature or return value and thus need to
exist as separate functions.
</pre>
</div>
</content>
</entry>
<entry>
<title>use weak symbols for the POSIX functions that will be used by C threads</title>
<updated>2014-09-06T22:11:24+00:00</updated>
<author>
<name>Jens Gustedt</name>
<email>Jens.Gustedt@inria.fr</email>
</author>
<published>2014-08-31T22:46:23+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=df7d0dfb9c686df31149d09008ba92834bed9803'/>
<id>df7d0dfb9c686df31149d09008ba92834bed9803</id>
<content type='text'>
The intent of this is to avoid name space pollution of the C threads
implementation.

This has two sides to it. First we have to provide symbols that wouldn't
pollute the name space for the C threads implementation. Second we have
to clean up some internal uses of POSIX functions such that they don't
implicitly drag in such symbols.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The intent of this is to avoid name space pollution of the C threads
implementation.

This has two sides to it. First we have to provide symbols that wouldn't
pollute the name space for the C threads implementation. Second we have
to clean up some internal uses of POSIX functions such that they don't
implicitly drag in such symbols.
</pre>
</div>
</content>
</entry>
</feed>
