|Age||Commit message (Collapse)||Author||Lines|
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
on linux/nommu, non-writable private mappings of files may actually
use memory shared with other processes or the fs cache. the old nommu
loader code (used when mmap with MAP_FIXED fails) simply wrote over
top of the original file mapping, possibly clobbering this shared
memory. no such breakage was observed in practice, but it should have
the new code starts by mapping anonymous writable memory on archs that
might support nommu, then maps load segments over top of it, falling
back to read if MAP_FIXED fails. we use an anonymous map rather than a
writable file map to avoid reading more data from disk than needed.
since pages cannot be loaded lazily on fault, in case of large
data/bss, mapping the full file may read a lot of data that will
subsequently be thrown away when processing additional LOAD segments.
as a result, we cannot skip the first LOAD segment when operating in
these changes affect only non-FDPIC nommu support.
previously, the normal ELF library loading code was used even for
fdpic, so only the kernel-loaded dynamic linker and main app could
benefit from separate placement of segments and shared text.
at this point not all functionality is complete. the dynamic linker
itself, and main app if it is also loaded by the kernel, take
advantage of fdpic and do not need constant displacement between
segments, but additional libraries loaded by the dynamic linker follow
normal ELF semantics for mapping still. this fully works, but does not
admit shared text on nommu.
in terms of actual functional correctness, dlsym's results are
presently incorrect for function symbols, RTLD_NEXT fails to identify
the caller correctly, and dladdr fails almost entirely.
with the dynamic linker entry point working, support for static pie is
automatically included, but linking the main application as ET_DYN
(pie) probably does not make sense for fdpic anyway. ET_EXEC is
equally relocatable but more efficient at representing relocations.
with this commit it should be possible to produce a working
static-linked fdpic libc and application binaries for sh.
the changes in reloc.h are largely unused at this point since dynamic
linking is not supported, but the CRTJMP macro is used one place
outside of dynamic linking, in __unmapself.
previously it was using the same name as the default ABI with hard
float (floating point args and return value in registers).
the test __SH_FPU_ANY__ || __SH4__ matches what's used in the
configure script already, and seems correct under casual review
against gcc's config/sh.h, but may need tweaks. the logic for
predefined macros for sh, and what they all mean, is very complex.
eventually this should be documented in comments here.
configure already rejects "half-hard" configurations on sh where
double=float since these do not conform to Annex F and are not
suitable for musl, so these do not need to be considered here.
versions of reloc.h that rely on endian macros much include endian.h
to ensure they are available.
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
reduction in arch-specific code is achived through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
this was one of the main instances of ugly code duplication: all archs
use basically the same types of relocations, but roughly equivalent
logic was duplicated for each arch to account for the different naming
and numbering of relocation types and variation in whether REL or RELA
records are used.
as an added bonus, both REL and RELA are now supported on all archs,
regardless of which is used by the standard toolchain.
the following issues are fixed:
- R_SH_REL32 was adding the load address of the module being relocated
to the result. this seems to have been a mistake in the original
port, since it does not match other dynamic linker implementations
and since adding a difference between two addresses (the symbol
value and the relocation address) to a load address does not make
- R_SH_TLS_DTPMOD32 was wrongly accepting an inline addend (i.e. using
+= rather than = on *reloc_addr) which makes no sense; addition is
not an operation that's defined on module ids.
- R_SH_TLS_DTPOFF32 and R_SH_TLS_TPOFF32 were wrongly using inline
addends rather than the RELA-provided addends.
in addition, handling of R_SH_GLOB_DAT, R_SH_JMP_SLOT, and R_SH_DIR32
are merged to all honor the addend. the first two should not need it
for correct usage generated by toolchains, but other dynamic linkers
allow addends here, and it simplifies the code anyway.
these issues were spotted while reviewing the code for the purpose of
refactoring this part of the dynamic linker. no testing was performed.
the immediate motivation is supporting TLSDESC relocations which
require allocation and thus may fail (unless we pre-allocate), but
this mechanism should also be used for throwing an error on
unsupported or invalid relocation types, and perhaps in certain cases,
for reporting when a relocation is not satisfiable.
default endianness for sh on linux is little, and while conventions
vary, "eb" seems to be the most widely used suffix for big endian.
linux, gcc, etc. all use "sh" as the name for the superh arch. there
was already some inconsistency internally in musl: the dynamic linker
was searching for "ld-musl-sh.path" as its path file despite its own
name being "ld-musl-superh.so.1". there was some sentiment in both
directions as to how to resolve the inconsistency, but overall "sh"