<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/thread/i386, branch master</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>fix i386 __set_thread_area fallback</title>
<updated>2020-08-31T01:37:12+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2020-08-31T01:37:12+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=0b0640219338b80cf47026d1970b5503414ed7f3'/>
<id>0b0640219338b80cf47026d1970b5503414ed7f3</id>
<content type='text'>
this code is only needed for pre-2.6 kernels, which are not actually
supported anyway, and was never tested. the fallback path using
SYS_modify_ldt failed to clear the upper bits of %eax (all ones due to
SYS_set_thread_area's return value being an error) before modifying
%al to attempt a new syscall.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this code is only needed for pre-2.6 kernels, which are not actually
supported anyway, and was never tested. the fallback path using
SYS_modify_ldt failed to clear the upper bits of %eax (all ones due to
SYS_set_thread_area's return value being an error) before modifying
%al to attempt a new syscall.
</pre>
</div>
</content>
</entry>
<entry>
<title>install dynamic tls synchronously at dlopen, streamline access</title>
<updated>2019-02-19T02:01:16+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2019-02-18T04:22:27+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=9d44b6460ab603487dab4d916342d9ba4467e6b9'/>
<id>9d44b6460ab603487dab4d916342d9ba4467e6b9</id>
<content type='text'>
previously, dynamic loading of new libraries with thread-local storage
allocated the storage needed for all existing threads at load-time,
precluding late failure that can't be handled, but left installation
in existing threads to take place lazily on first access. this imposed
an additional memory access and branch on every dynamic tls access,
and imposed a requirement, which was not actually met, that the
dynamic tlsdesc asm functions preserve all call-clobbered registers
before calling C code to to install new dynamic tls on first access.
the x86[_64] versions of this code wrongly omitted saving and
restoring of fpu/vector registers, assuming the compiler would not
generate anything using them in the called C code. the arm and aarch64
versions saved known existing registers, but failed to be future-proof
against expansion of the register file.

now that we track live threads in a list, it's possible to install the
new dynamic tls for each thread at dlopen time. for the most part,
synchronization is not needed, because if a thread has not
synchronized with completion of the dlopen, there is no way it can
meaningfully request access to a slot past the end of the old dtv,
which remains valid for accessing slots which already existed.
however, it is necessary to ensure that, if a thread sees its new dtv
pointer, it sees correct pointers in each of the slots that existed
prior to the dlopen. my understanding is that, on most real-world
coherency architectures including all the ones we presently support, a
built-in consume order guarantees this; however, don't rely on that.
instead, the SYS_membarrier syscall is used to ensure that all threads
see the stores to the slots of their new dtv prior to the installation
of the new dtv. if it is not supported, the same is implemented in
userspace via signals, using the same mechanism as __synccall.

the __tls_get_addr function, variants, and dynamic tlsdesc asm
functions are all updated to remove the fallback paths for claiming
new dynamic tls, and are now all branch-free.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
previously, dynamic loading of new libraries with thread-local storage
allocated the storage needed for all existing threads at load-time,
precluding late failure that can't be handled, but left installation
in existing threads to take place lazily on first access. this imposed
an additional memory access and branch on every dynamic tls access,
and imposed a requirement, which was not actually met, that the
dynamic tlsdesc asm functions preserve all call-clobbered registers
before calling C code to to install new dynamic tls on first access.
the x86[_64] versions of this code wrongly omitted saving and
restoring of fpu/vector registers, assuming the compiler would not
generate anything using them in the called C code. the arm and aarch64
versions saved known existing registers, but failed to be future-proof
against expansion of the register file.

now that we track live threads in a list, it's possible to install the
new dynamic tls for each thread at dlopen time. for the most part,
synchronization is not needed, because if a thread has not
synchronized with completion of the dlopen, there is no way it can
meaningfully request access to a slot past the end of the old dtv,
which remains valid for accessing slots which already existed.
however, it is necessary to ensure that, if a thread sees its new dtv
pointer, it sees correct pointers in each of the slots that existed
prior to the dlopen. my understanding is that, on most real-world
coherency architectures including all the ones we presently support, a
built-in consume order guarantees this; however, don't rely on that.
instead, the SYS_membarrier syscall is used to ensure that all threads
see the stores to the slots of their new dtv prior to the installation
of the new dtv. if it is not supported, the same is implemented in
userspace via signals, using the same mechanism as __synccall.

the __tls_get_addr function, variants, and dynamic tlsdesc asm
functions are all updated to remove the fallback paths for claiming
new dynamic tls, and are now all branch-free.
</pre>
</div>
</content>
</entry>
<entry>
<title>make arch __set_thread_area backends hidden</title>
<updated>2018-09-12T18:34:32+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2018-09-10T19:42:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=5e1019b01c968707accc85c99e63a18af665cf27'/>
<id>5e1019b01c968707accc85c99e63a18af665cf27</id>
<content type='text'>
this is not a public interface, and does not even necessarily match
the syscall on all archs that have a syscall by that name.

on archs where it's implemented in C, no action on the source file is
needed; the hidden declaration in pthread_arch.h suffices.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this is not a public interface, and does not even necessarily match
the syscall on all archs that have a syscall by that name.

on archs where it's implemented in C, no action on the source file is
needed; the hidden declaration in pthread_arch.h suffices.
</pre>
</div>
</content>
</entry>
<entry>
<title>make arch __clone backends hidden</title>
<updated>2018-09-12T18:34:31+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2018-09-10T19:36:33+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=f5f7673d71f843b423e60bbdd7de49fd1bbcc8c1'/>
<id>f5f7673d71f843b423e60bbdd7de49fd1bbcc8c1</id>
<content type='text'>
these are not a public interface and are not intended to be callable
from anywhere but the public clone function or other places in libc.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
these are not a public interface and are not intended to be callable
from anywhere but the public clone function or other places in libc.
</pre>
</div>
</content>
</entry>
<entry>
<title>in i386 __set_thread_area, don't assume %gs register is initially zero</title>
<updated>2015-05-16T05:15:40+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-05-16T05:15:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=707d7c30f3379441de9b320536ddfd354f4c2143'/>
<id>707d7c30f3379441de9b320536ddfd354f4c2143</id>
<content type='text'>
commit f630df09b1fd954eda16e2f779da0b5ecc9d80d3 added logic to handle
the case where __set_thread_area is called more than once by reusing
the GDT slot already in the %gs register, and only setting up a new
GDT slot when %gs is zero. this created a hidden assumption that %gs
is zero when a new process image starts, which is true in practice on
Linux, but does not seem to be documented ABI, and fails to hold under
qemu app-level emulation.

while it would in theory be possible to zero %gs in the entry point
code, this code is shared between static and dynamic binaries, and
dynamic binaries must not clobber the value of %gs already setup by
the dynamic linker.

the alternative solution implemented in this commit simply uses global
data to store the GDT index that's selected. __set_thread_area should
only be called in the initial thread anyway (subsequent threads get
their thread pointer setup by __clone), but even if it were called by
another thread, it would simply read and write back the same GDT index
that was already assigned to the initial thread, and thus (in the x86
memory model) there is no data race.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f630df09b1fd954eda16e2f779da0b5ecc9d80d3 added logic to handle
the case where __set_thread_area is called more than once by reusing
the GDT slot already in the %gs register, and only setting up a new
GDT slot when %gs is zero. this created a hidden assumption that %gs
is zero when a new process image starts, which is true in practice on
Linux, but does not seem to be documented ABI, and fails to hold under
qemu app-level emulation.

while it would in theory be possible to zero %gs in the entry point
code, this code is shared between static and dynamic binaries, and
dynamic binaries must not clobber the value of %gs already setup by
the dynamic linker.

the alternative solution implemented in this commit simply uses global
data to store the GDT index that's selected. __set_thread_area should
only be called in the initial thread anyway (subsequent threads get
their thread pointer setup by __clone), but even if it were called by
another thread, it would simply read and write back the same GDT index
that was already assigned to the initial thread, and thus (in the x86
memory model) there is no data race.
</pre>
</div>
</content>
</entry>
<entry>
<title>use hidden __tls_get_new for tls/tlsdesc lookup fallback cases</title>
<updated>2015-04-15T03:45:08+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-04-15T03:45:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=81e18eb3cd61f7b68a8f99b3e6c728f2a4d79221'/>
<id>81e18eb3cd61f7b68a8f99b3e6c728f2a4d79221</id>
<content type='text'>
previously, the dynamic tlsdesc lookup functions and the i386
special-ABI ___tls_get_addr (3 underscores) function called
__tls_get_addr when the slot they wanted was not already setup;
__tls_get_addr would then in turn also see that it's not setup and
call __tls_get_new.

calling __tls_get_new directly is both more efficient and avoids the
issue of calling a non-hidden (public API/ABI) function from asm.

for the special i386 function, a weak reference to __tls_get_new is
used since this function is not defined when static linking (the code
path that needs it is unreachable in static-linked programs).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
previously, the dynamic tlsdesc lookup functions and the i386
special-ABI ___tls_get_addr (3 underscores) function called
__tls_get_addr when the slot they wanted was not already setup;
__tls_get_addr would then in turn also see that it's not setup and
call __tls_get_new.

calling __tls_get_new directly is both more efficient and avoids the
issue of calling a non-hidden (public API/ABI) function from asm.

for the special i386 function, a weak reference to __tls_get_new is
used since this function is not defined when static linking (the code
path that needs it is unreachable in static-linked programs).
</pre>
</div>
</content>
</entry>
<entry>
<title>consistently use hidden visibility for cancellable syscall internals</title>
<updated>2015-04-14T15:18:59+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-04-14T15:18:59+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=cbc02ba23cec16d7a821648ea8424546bc7f02dc'/>
<id>cbc02ba23cec16d7a821648ea8424546bc7f02dc</id>
<content type='text'>
in a few places, non-hidden symbols were referenced from asm in ways
that assumed ld-time binding. while these is no semantic reason these
symbols need to be hidden, fixing the references without making them
hidden was going to be ugly, and hidden reduces some bloat anyway.

in the asm files, .global/.hidden directives have been moved to the
top to unclutter the actual code.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
in a few places, non-hidden symbols were referenced from asm in ways
that assumed ld-time binding. while these is no semantic reason these
symbols need to be hidden, fixing the references without making them
hidden was going to be ugly, and hidden reduces some bloat anyway.

in the asm files, .global/.hidden directives have been moved to the
top to unclutter the actual code.
</pre>
</div>
</content>
</entry>
<entry>
<title>allow i386 __set_thread_area to be called more than once</title>
<updated>2015-04-13T21:26:08+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-04-13T21:26:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=f630df09b1fd954eda16e2f779da0b5ecc9d80d3'/>
<id>f630df09b1fd954eda16e2f779da0b5ecc9d80d3</id>
<content type='text'>
previously a new GDT slot was requested, even if one had already been
obtained by a previous call. instead extract the old slot number from
GS and reuse it if it was already set. the formula (GS-3)/8 for the
slot number automatically yields -1 (request for new slot) if GS is
zero (unset).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
previously a new GDT slot was requested, even if one had already been
obtained by a previous call. instead extract the old slot number from
GS and reuse it if it was already set. the formula (GS-3)/8 for the
slot number automatically yields -1 (request for new slot) if GS is
zero (unset).
</pre>
</div>
</content>
</entry>
<entry>
<title>prepare cancellation syscall asm for possibility of __cancel returning</title>
<updated>2015-02-21T01:25:35+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-02-21T01:25:35+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=f409338a9e808a09001669377c608fd2803d808d'/>
<id>f409338a9e808a09001669377c608fd2803d808d</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>optimize i386 ___tls_get_addr asm</title>
<updated>2014-06-19T06:48:45+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-06-19T06:48:45+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=880c479f0e3a798d826c1c91b38d33e2b3e36580'/>
<id>880c479f0e3a798d826c1c91b38d33e2b3e36580</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
</feed>
