Some declarations of __tls_get_new were left in the code, even
though the definition was removed in commit
"install dynamic tls synchronously at dlopen, streamline access".
this can make the build fail with
ld: lib/libc.so: hidden symbol `__tls_get_new' isn't defined
when libc.so is linked without --gc-sections, because a .hidden
declaration in asm code creates a reference even if the symbol
is not actually used.
we don't actually support building asm source files as thumb1, but
it's possible that the condition __ARM_ARCH>=5 would be false on old
compilers that did not define __ARM_ARCH at all. avoiding that would
require enumerating all of the possible __ARM_ARCH_*__ macros for
as noted in commit 05870abeaac0588fb9115cfd11f96880a0af2108, mov lr,pc
is not valid for saving a return address when in thumb mode. since
this code is a hot path (dynamic TLS access), don't do the out-of-line
bl->bx chaining to save the return address; instead, use the fact that
this file is preprocessed asm to add the missing thumb bit with an add
in place of the mov.
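a toy C model of the interworking rule this fix relies on may help. it is not musl or cpu code, and the struct and function names are made up for illustration: the target of bx selects thumb state when bit 0 is set, so a return address copied straight from pc, which always has bit 0 clear, would send the return into arm state. the model ignores whatever pc-relative offset the real add instruction needs and tracks only the thumb bit.

```c
#include <stdint.h>

/* toy model of bx (hypothetical, not real cpu or musl code):
   bit 0 of the target selects thumb (1) or arm (0) state,
   the remaining bits give the instruction address */
struct bx_target {
	uint32_t address;
	int thumb;
};

static struct bx_target model_bx(uint32_t target)
{
	struct bx_target r;
	r.address = target & ~(uint32_t)1;
	r.thumb = (int)(target & 1);
	return r;
}

/* like mov lr,pc: the pc value is at least halfword-aligned,
   so bit 0 is clear and bx would switch to arm state */
static uint32_t return_addr_via_mov(uint32_t pc)
{
	return pc;
}

/* like the add used in its place: adding 1 sets the missing thumb bit
   (any extra offset the real instruction needs is ignored here) */
static uint32_t return_addr_via_add(uint32_t pc)
{
	return pc + 1;
}
```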
the change here does not affect builds for ISA levels new enough to
have a thread pointer read instruction, or for armv5 and later as long
as the compiler properly defines __ARM_ARCH, or for any build as arm
(not thumb) code. it's likely that it makes no difference whatsoever
to any present-day practical build environments, but nonetheless now
as an alternative, we could just assume __thumb__ implies availability
of blx since we don't support building asm source files as thumb1. I
didn't do that in order to avoid having a wrong assumption here if
that ever changes.
previously, dynamic loading of new libraries with thread-local storage
allocated the storage needed for all existing threads at load-time,
precluding late failure that can't be handled, but left installation
in existing threads to take place lazily on first access. this imposed
an additional memory access and branch on every dynamic tls access,
and imposed a requirement, which was not actually met, that the
dynamic tlsdesc asm functions preserve all call-clobbered registers
before calling C code to install new dynamic tls on first access.
the x86[_64] versions of this code wrongly omitted saving and
restoring of fpu/vector registers, assuming the compiler would not
generate anything using them in the called C code. the arm and aarch64
versions saved known existing registers, but failed to be future-proof
against expansion of the register file.
now that we track live threads in a list, it's possible to install the
new dynamic tls for each thread at dlopen time. for the most part,
synchronization is not needed, because if a thread has not
synchronized with completion of the dlopen, there is no way it can
meaningfully request access to a slot past the end of the old dtv,
which remains valid for accessing slots which already existed.
however, it is necessary to ensure that, if a thread sees its new dtv
pointer, it sees correct pointers in each of the slots that existed
prior to the dlopen. my understanding is that, on most real-world
coherency architectures including all the ones we presently support, a
built-in consume order guarantees this; however, don't rely on that.
instead, the SYS_membarrier syscall is used to ensure that all threads
see the stores to the slots of their new dtv prior to the installation
of the new dtv. if it is not supported, the same is implemented in
userspace via signals, using the same mechanism as __synccall.
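the install step can be sketched in C under simplifying assumptions. everything below is hypothetical: struct thread, its new_dtv/new_blocks fields, install_new_dtv, install_all and the thread-list lock mentioned in a comment are illustrative names, not musl's. a C11 release store also stands in for the SYS_membarrier step described above, and it is a weaker substitute: it only orders the slot stores for readers that load the dtv pointer with acquire semantics, whereas the approach above lets readers use ordinary loads. what the sketch does show is the order that matters: fill every slot of the new dtv, including copies of the existing ones, publish the pointer only afterwards, and leave the old dtv intact.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* hypothetical, simplified model; names are illustrative, not musl's.
   dtv[0] holds the number of module slots, dtv[1..n] hold the address
   of each module's tls block for this thread. */
struct thread {
	_Atomic(uintptr_t *) dtv;
	uintptr_t *new_dtv;     /* storage reserved earlier, at load time */
	uintptr_t *new_blocks;  /* this thread's tls blocks for new modules */
	struct thread *next;
};

static void install_new_dtv(struct thread *t, size_t n_new)
{
	uintptr_t *old = atomic_load_explicit(&t->dtv, memory_order_relaxed);
	size_t n_old = old[0];
	uintptr_t *dtv = t->new_dtv;

	/* fill every slot first; existing modules keep their addresses */
	memcpy(dtv + 1, old + 1, n_old * sizeof *dtv);
	memcpy(dtv + 1 + n_old, t->new_blocks, n_new * sizeof *dtv);
	dtv[0] = n_old + n_new;

	/* then publish. the old dtv is not freed: a thread that has not
	   synchronized with dlopen may still read its existing slots.
	   a release store is used here in place of SYS_membarrier; it is
	   only sufficient if readers load the pointer with acquire. */
	atomic_store_explicit(&t->dtv, dtv, memory_order_release);
}

/* run during dlopen with the (hypothetical) thread-list lock held */
static void install_all(struct thread *head, size_t n_new)
{
	for (struct thread *t = head; t; t = t->next)
		install_new_dtv(t, n_new);
}
```

because the storage was reserved when the library was loaded, nothing in the install path can fail, matching the "no late failure" property kept from the previous design.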
the __tls_get_addr function, variants, and dynamic tlsdesc asm
functions are all updated to remove the fallback paths for claiming
new dynamic tls, and are now all branch-free.
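the difference in the access path can be modeled in C. this is a hypothetical sketch, not musl's __tls_get_addr: struct thread_model, get_addr_lazy, get_addr and the claim_new_tls stub (standing in for the removed __tls_get_new slow path) are made-up names, and v is assumed to be the usual {module id, offset} pair. the lazy version pays a compare and branch on every call; once dlopen has installed the slot in every thread, the lookup needs no branch at all.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical model: dtv[0] = slot count, dtv[1..n] = block addresses */
struct thread_model {
	uintptr_t *dtv;
};

/* hypothetical stand-in for the removed slow path (__tls_get_new) */
static void *claim_new_tls(struct thread_model *self, const size_t *v)
{
	(void)self;
	(void)v;
	return NULL;
}

/* old shape: every access checks whether dlopen has added modules
   since this thread's dtv was last extended */
static void *get_addr_lazy(struct thread_model *self, const size_t *v)
{
	if (v[0] > self->dtv[0])
		return claim_new_tls(self, v);
	return (char *)self->dtv[v[0]] + v[1];
}

/* new shape: the slot is guaranteed to exist, so the access is
   just loads and an add, with no compare or branch */
static void *get_addr(struct thread_model *self, const size_t *v)
{
	return (char *)self->dtv[v[0]] + v[1];
}
```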
when invoking the assembler, arm gcc does not always pass the right
flags to enable use of vfp instruction mnemonics. for C code it
produces, it emits the .fpu directive, but this does not help when
building asm source files, which tlsdesc needs to be. to fix, use an
explicit directive here.
commit 0beb9dfbecad38af9759b1e83eeb007e28b70abb introduced this
regression. it has not appeared in any release.
the indirect function call is a significant portion of the code path
for the dynamic case, and most users are probably building for ISA
levels where it can be omitted.
we could drop at least one register save/restore (lr) with this
change, and possibly another (ip) with some clever shuffling, but it's
not clear whether there's a way to do it that's not more expensive, or
whether avoiding the save/restore would have any practical effect, so
in the interest of avoiding complexity it's omitted for now.
unlike other asm where the baseline ISA is used, these functions are
hot paths and use ISA-level specializations.
call-clobbered vfp registers are saved before calling __tls_get_new,
since there is no guarantee it won't use them. while setjmp/longjmp
have to use hwcap to decide whether the fpu is in use, since
application code could be using vfp registers even if libc was
compiled as pure softfloat, __tls_get_new is part of libc and can be
assumed not to have access to vfp registers if tlsdesc.S does not.
thus it suffices just to check the predefined preprocessor macros. the
check for __ARM_PCS_VFP is redundant; !__SOFTFP__ must always be true
if the target ISA level includes fpu instructions/registers.
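the compile-time test can be written out in C to show why predefined macros suffice. this is a sketch, not the actual tlsdesc.S code, and must_save_vfp_regs is a made-up name. gcc predefines __SOFTFP__ when generating code that does not use floating-point instructions (for example with -mfloat-abi=soft). an identifier that is not a macro evaluates to 0 in #if, so wherever __SOFTFP__ is undefined the test says the registers must be saved; on a non-arm host the function just reports 1. a hardfloat build (__ARM_PCS_VFP) never defines __SOFTFP__, which is why checking __ARM_PCS_VFP separately adds nothing.

```c
/* hypothetical C rendering of the preprocessor check described above */
static int must_save_vfp_regs(void)
{
#if !__SOFTFP__
	return 1;   /* libc may have been compiled to use vfp registers */
#else
	return 0;   /* pure softfloat libc: its C code cannot touch vfp registers */
#endif
}
```

unlike setjmp/longjmp, no runtime hwcap query is involved, because the only code that runs between the save and restore is libc's own.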
this is possible with the new build system that allows src/*/$(ARCH)/*
files which do not shadow a file in the parent directory, and yields a
more logical organization. eventually it will be possible to remove
arch/*/src from the build system.
these files are all accepted as legacy arm syntax when producing arm
code, but legacy syntax cannot be used for producing thumb2 with
access to the full ISA. even after switching to UAL, some asm source
files contain instructions which are not valid in thumb mode, so these
will need to be addressed separately.
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
reduction in arch-specific code is achieved through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
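to illustrate stage 1 and the mapping of relocation types through macros, here is a hypothetical C sketch for x86_64 RELA relocations only. do_relative_relocs is a made-up name, and REL_RELATIVE is written in the style the bullets describe rather than copied from musl; locating the relocation table (via the dynamic section) is not shown. the function calls nothing and touches no global data, and for each R_X86_64_RELATIVE entry it stores load base + addend at base + r_offset, leaving every other type for the later stages.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical per-arch mapping: generic code uses REL_RELATIVE and
   each arch defines it directly, with no translation function */
#define REL_RELATIVE R_X86_64_RELATIVE

/* stage-1 style pass: a flat loop with no external calls or data,
   applying only relative relocations (value = base + addend) */
static void do_relative_relocs(unsigned char *base, const Elf64_Rela *rel,
                               size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (ELF64_R_TYPE(rel[i].r_info) != REL_RELATIVE)
			continue;  /* symbolic relocs are left for later stages */
		uint64_t *where = (uint64_t *)(base + rel[i].r_offset);
		*where = (uint64_t)(uintptr_t)base + (uint64_t)rel[i].r_addend;
	}
}
```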
the main motivation for this change is to aid in debugging. since the
main program's entry point is also named _start, it was difficult to
set breakpoints or quickly identify which _start execution stopped in.
two actual issues: one is that __dynlink no longer wants/needs a GOT
pointer argument, so the code to generate that argument can be
removed. the other issue was that in the i386 code, argc/argv were
being loaded into registers that would be call-clobbered, then copied
to preserved registers, rather than just being loaded into the proper
call-preserved registers to begin with.
this cleanup is in preparation for adding new dynamic linker
functionality (ability to explicitly invoke the dynamic linker to run
mildly tested, seems to work