Age | Commit message (Collapse) | Author | Lines |
|
the arm atomics/TLS runtime selection code is called from
__set_thread_area and depends on having libc.auxv and __hwcap
available. commit 71f099cb7db821c51d8f39dfac622c61e54d794c moved the
first call to __set_thread_area to the top of dynamic linking stage 3,
before this data is made available, causing the runtime detection code
to always see __hwcap as zero and thereby select the atomics/TLS
implementations based on kuser helper.
upcoming work on superh will use similar runtime detection.
ideally this early-init code should be cleanly refactored and shared
between the dynamic linker and static-linked startup.
|
|
commit f3ddd173806fd5c60b3f034528ca24542aecc5b9 inadvertently removed
the early check for "none" type relocations, causing the address
dso->base+0 to be dereferenced to obtain an addend. shared libraries,
(including libc.so) and PIE executables were unaffected, since their
base addresses are the actual address of their mappings and are
readable. non-PIE main executables, however, have a base address of 0
because their load addresses are absolute and not offset at load time.
in practice none-type relocations do not arise with toolchains that
are in use except on mips, and on mips it's moderately rare for a
non-PIE executable to have a relocation table, since the mips-specific
got processing serves in its place for most purposes.
|
|
commit f3ddd173806fd5c60b3f034528ca24542aecc5b9 introduced early
relocations and subsequent reprocessing as part of the dynamic linker
bootstrap overhaul, to allow use of arbitrary libc functions before
the main application and libraries are loaded, but only reprocessed
GOT/PLT relocation types.
commit c093e2e8201524db0d638920e76bcb6b1d925f3a added reprocessing of
non-GOT/PLT relocations to fix an actual regression that was observed
on powerpc, but only for RELA format tables with out-of-line addends.
REL table (inline addends at the relocation address) reprocessing is
trickier because the first relocation pass clobbers the addends.
this patch extends symbolic relocation reprocessing for libc/ldso to
support all relocation types, whether REL or RELA format tables are
used. it is believed not to alter behavior on any existing archs for
the current dynamic linker and libc code. the motivations for this
change are consistency and future-proofing. it ensures that behavior
does not differ depending on whether REL or RELA tables are used,
which could lead to undetected arch-specific bugs. it also ensures
that, if in the future code depending on additional relocation types
is added to libc.so, either at the source level or as part of the
compiler runtime that gets pulled in (for example, soft-float with TLS
for fenv), the new code will work properly.
the implementation concept is simple: stage 2 of the dynamic linker
counts the number of symbolic relocations in the libc/ldso REL table
and allocates a VLA to save their addends into; stage 3 then uses the
saved addends in place of the inline ones which were clobbered. for
stack safety, a hard limit (currently 4k) is imposed on the number of
such addends; this should be a couple orders of magnitude larger than
the actual need. this number is not a runtime variable that could
break fail-safety; it is constant for a given libc.so build.
|
|
this move eliminates a duplicate "by-hand" symbol lookup loop from the
stage-1 code and replaces it with a call to find_sym, which can be
used once we're in stage 2. it reduces the size of the stage 1 code,
which is helpful because stage 1 will become the crt start file for
static-PIE executables, and it will allow stage 3 to access stage 2's
automatic storage, which will be important in an upcoming commit.
|
|
the outer-loop approach made sense when we were also processing
DT_JMPREL, which might be in REL or RELA form, to avoid major code
duplication. commit 09db855b35709aa627d7055c57a98e1e471920ab removed
processing of DT_JMPREL, and in the remaining two tables, the format
(REL or RELA) is known by the name of the table. simply writing two
versions of the loop results in smaller and simpler code.
|
|
the DT_JMPREL relocation table necessarily consists entirely of
JMP_SLOT (REL_PLT in internal nomenclature) relocations, which are
symbolic; they cannot be resolved in stage 1, so there is no point in
processing them.
|
|
this fixes a regression on powerpc that was introduced in commit
f3ddd173806fd5c60b3f034528ca24542aecc5b9. global data accesses on
powerpc seem to be using a translation-unit-local GOT filled via
R_PPC_ADDR32 relocations rather than R_PPC_GLOB_DAT. being a non-GOT
relocation type, these were not reprocessed after adding the main
application and its libraries to the chain, causing libc code not to
see copy relocations in the main program, and therefore to use the
pre-copy-relocation addresses for global data objects (like environ).
the motivation for the dynamic linker only reprocessing GOT/PLT
relocation types in stage 3 is that these types always have a zero
addend, making them safe to process again even if the storage for the
addend has been clobbered. other relocation types which can be used
for address constants in initialized data objects may have non-zero
addends which will be clobbered during the first pass of relocation
processing if they're stored inline (REL form) rather than out-of-line
(RELA form).
powerpc generally uses only RELA, so this patch is sufficient to fix
the regression in practice, but is not fully general, and would not
suffice if an alternate toolchain generated REL for powerpc.
|
|
the allocating path which can fail is for dynamic TLS, which can only
occur at runtime, and the check for runtime was already made in the
outer conditional.
|
|
commit 637dd2d383cc1f63bf02a732f03786857b22c7bd introduced the checks
for RTLD_DEFAULT and RTLD_NEXT here, claiming they fixed a regression,
but the above conditional block clearly already covered these cases,
and removing the checks produces no difference in the generated code.
|
|
this fixes truncation of error messages containing long pathnames or
symbol names.
the dlerror state was previously required by POSIX to be global. the
resolution of bug 97 relaxed the requirements to allow thread-safe
implementations of dlerror with thread-local state and message buffer.
|
|
these functions are never called directly; only their addresses are
used, so PLT indirections should never happen unless a broken
application tries to redefine them, but it's still best to make them
hidden.
|
|
|
|
this change is made in preparation to support linking without
-Bsymbolic-functions.
|
|
the braf instruction's destination register is an offset from the
address of the braf instruction plus 4 (or equivalently, the address
of the next instruction after the delay slot). the code for dlsym was
incorrectly computing the offset to pass using the address of the
delay slot itself. in other places, a label was placed after the delay
slot, but I find this confusing. putting the label on the branch
instruction itself, and manually adding 4, makes it more clear which
branch the offset in the constant pool goes with.
|
|
even hidden functions need @PLT symbol references; otherwise an
absolute address is produced instead of a PC-relative one.
|
|
previously, the dynamic tlsdesc lookup functions and the i386
special-ABI ___tls_get_addr (3 underscores) function called
__tls_get_addr when the slot they wanted was not already setup;
__tls_get_addr would then in turn also see that it's not setup and
call __tls_get_new.
calling __tls_get_new directly is both more efficient and avoids the
issue of calling a non-hidden (public API/ABI) function from asm.
for the special i386 function, a weak reference to __tls_get_new is
used since this function is not defined when static linking (the code
path that needs it is unreachable in static-linked programs).
|
|
|
|
at the point of call it was declared hidden, but the definition was
not hidden. for some toolchains this inconsistency produced textrels
without ld-time binding.
|
|
otherwise the call/jump from the crt_arch.h asm may not resolve
correctly without -Bsymbolic-functions.
|
|
the zero initialization is redundant since decode_vec does its own
clearing, and it increases the risk that buggy compilers will generate
calls to memset. as long as symbols are bound at ld time, such a call
will not break anything, but it may be desirable to turn off ld-time
binding in the future.
|
|
since 1.1.0, musl has nominally required a thread pointer to be setup.
most of the remaining code that was checking for its availability was
doing so for the sake of being usable by the dynamic linker. as of
commit 71f099cb7db821c51d8f39dfac622c61e54d794c, this is no longer
necessary; the thread pointer is now valid before any libc code
(outside of dynamic linker bootstrap functions) runs.
this commit essentially concludes "phase 3" of the "transition path
for removing lazy init of thread pointer" project that began during
the 1.1.0 release cycle.
|
|
this allows the dynamic linker itself to run with a valid thread
pointer, which is a prerequisite for stack protector on archs where
the ssp canary is stored in TLS. it will also allow us to remove some
remaining runtime checks for whether the thread pointer is valid.
as long as the application and its libraries do not require additional
size or alignment, this early thread pointer will be kept and reused
at runtime. otherwise, a new static TLS block is allocated after
library loading has finished and the thread pointer is switched over.
|
|
previously, the layout of the static TLS block was perturbed by the
size of the dtv; dtv size increasing from 0 to 1 perturbed both TLS
arch types, and the TLS-above-TP type's layout was perturbed by the
specific number of dtv slots (libraries with TLS). this behavior made
it virtually impossible to setup a tentative thread pointer address
before loading libraries and keep it unchanged as long as the
libraries' TLS size/alignment requirements fit.
the new code fixes the location of the dtv and pthread structure at
opposite ends of the static TLS block so that they will not move
unless size or alignment changes.
|
|
this overhaul further reduces the amount of arch-specific code needed
by the dynamic linker and removes a number of assumptions, including:
- that symbolic function references inside libc are bound at link time
via the linker option -Bsymbolic-functions.
- that libc functions used by the dynamic linker do not require
access to data symbols.
- that static/internal function calls and data accesses can be made
without performing any relocations, or that arch-specific startup
code handled any such relocations needed.
removing these assumptions paves the way for allowing libc.so itself
to be built with stack protector (among other things), and is achieved
by a three-stage bootstrap process:
1. relative relocations are processed with a flat function.
2. symbolic relocations are processed with no external calls/data.
3. main program and dependency libs are processed with a
fully-functional libc/ldso.
reduction in arch-specific code is achived through the following:
- crt_arch.h, used for generating crt1.o, now provides the entry point
for the dynamic linker too.
- asm is no longer responsible for skipping the beginning of argv[]
when ldso is invoked as a command.
- the functionality previously provided by __reloc_self for heavily
GOT-dependent RISC archs is now the arch-agnostic stage-1.
- arch-specific relocation type codes are mapped directly as macros
rather than via an inline translation function/switch statement.
|
|
when dlopen fails, all partially-loaded libraries need to be unmapped
and freed. any of these libraries using an rpath with $ORIGIN
expansion may have an allocated string for the expanded rpath;
previously, this string was not freed when freeing the library data
structures.
|
|
this change hardens the dynamic linker against the possibility of
loading the wrong library due to inability to expand $ORIGIN in rpath.
hard failures such as excessively long paths or absence of /proc (when
resolving /proc/self/exe for the main executable's origin) do not stop
the path search, but memory allocation failures and any other
potentially transient failures do.
to implement this change, the meaning of the return value of
fixup_rpath function is changed. returning zero no longer indicates
that the dso's rpath string pointer is non-null; instead, the caller
needs to check. a return value of -1 indicates a failure that should
stop further path search.
|
|
transient errors during the path search should not allow the search to
continue and possibly open the wrong file. this patch eliminates most
conditions where that could happen, but there is still a possibility
that $ORIGIN-based rpath processing will have an allocation failure,
causing the search to skip such a path. fixing this is left as a
separate task.
a small bug where overly-long path components caused an infinite loop
rather than being skipped/ignored is also fixed.
|
|
This adds complete aarch64 target support including bigendian subarch.
Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
port experimental.
Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
|
|
There are two main abi variants for thread local storage layout:
(1) TLS is above the thread pointer at a fixed offset and the pthread
struct is below that. So the end of the struct is at known offset.
(2) the thread pointer points to the pthread struct and TLS starts
below it. So the start of the struct is at known (zero) offset.
Assembly code for the dynamic TLSDESC callback needs to access the
dynamic thread vector (dtv) pointer which is currently at the front
of the pthread struct. So in case of (1) the asm code needs to hard
code the offset from the end of the struct which can easily break if
the struct changes.
This commit adds a copy of the dtv at the end of the struct. New members
must not be added after dtv_copy, only before it. The size of the struct
is increased a bit, but there is opportunity for size optimizations.
|
|
a conservative estimate of 4*sizeof(size_t) was used as the minimum
alignment for thread-local storage, despite the only requirements
being alignment suitable for struct pthread and void* (which struct
pthread already contains). additional alignment required by the
application or libraries is encoded in their headers and is already
applied.
over-alignment prevented the builtin_tls array from ever being used in
dynamic-linked programs on 64-bit archs, thereby requiring allocation
at startup even in programs with no TLS of their own.
|
|
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
|
|
the new DT_RUNPATH semantics for search order are always used, and
since binutils had always set both DT_RPATH and DT_RUNPATH when the
latter was used, processing only DT_RPATH worked fine. however, recent
binutils has stopped generating DT_RPATH when DT_RUNPATH is used,
which broke support for this feature completely.
|
|
this allows most code to assume it has already been saved, and is a
prerequisite for upcoming changes for arm atomic/tls operations.
|
|
|
|
when the dynamic loader is disabled, dlopen fails correctly but dlerror
did not return a human readable error string like it should have.
|
|
With the exception of a fenv implementation, the port is fully featured.
The port has been tested in or1ksim, the golden reference functional
simulator for OpenRISC 1000.
It passes all libc-test tests (except the math tests that
requires a fenv implementation).
The port assumes an or1k implementation that has support for
atomic instructions (l.lwa/l.swa).
Although it passes all the libc-test tests, the port is still
in an experimental state, and has yet experienced very little
'real-world' use.
|
|
previously passing an empty string for name resulted in failure, as
expected, but only after spurious syscalls, and it produced confusing
errno values (and thus dlerror strings).
in addition to dlopen calls, this issue affected use of LD_PRELOAD
with trailing whitespace or colon characters.
|
|
|
|
this issue caused the address of functions in shared libraries to
resolve to their PLT thunks in the main program rather than their
correct addresses. it was observed causing crashes, though the
mechanism of the crash was not thoroughly investigated. since the
issue is very subtle, it calls for some explanation:
on all well-behaved archs, GOT entries that belong to the PLT use a
special relocation type, typically called JMP_SLOT, so that the
dynamic linker can avoid having the jump destinations for the PLT
resolve to PLT thunks themselves (they also provide a definition for
the symbol, which must be used whenever the address of the function is
taken so that all DSOs see the same address).
however, the traditional mips PIC ABI lacked such a JMP_SLOT
relocation type, presumably because, due to the way PIC works, the
address of the PLT thunk was never needed and could always be ignored.
prior to commit adf94c19666e687a728bbf398f9a88ea4ea19996, the mips
version of reloc.h contained a hack that caused all symbol lookups to
be treated like JMP_SLOT, inhibiting undefined symbols from ever being
used to resolve symbolic relocations. this hack goes all the way back
to commit babf820180368f00742ec65b2050a82380d7c542, when the mips
dynamic linker was first made usable.
during the recent refactoring to eliminate arch-specific relocation
processing (commit adf94c19666e687a728bbf398f9a88ea4ea19996), this
hack was overlooked and no equivalent functionality was provided in
the new code.
fixing the problem is not as simple as adding back an equivalent hack,
since there is now also a "non-PIC ABI" that can be used for the main
executable, which actually does use a PLT. the closest thing to
official documentation I could find for this ABI is nonpic.txt,
attached to Message-ID: 20080701202236.GA1534@caradoc.them.org, which
can be found in the gcc mailing list archives and elsewhere. per this
document, undefined symbols corresponding to PLT thunks have the
STO_MIPS_PLT bit set in the symbol's st_other field. thus, I have
added an arch-specific rule for mips, applied at the find_sym level
rather than the relocation level, to reject undefined symbols with the
STO_MIPS_PLT bit clear.
the previous hack of treating all mips relocations as JMP_SLOT-like,
rather than rejecting the unwanted symbols in find_sym, probably also
caused dlsym to wrongly return PLT thunks in place of the correct
address of a function under at least some conditions. this should now
be fixed, at least for global-scope symbol lookups.
|
|
due to a mistake when refactoring the error printing for the dynamic
linker (commit 7c73cacd09a51a87484db5689864743e4984a84d), all messages
were suppressed and replaced by blank lines.
|
|
the renaming was previously applied to all real versions of the
function in commit 3fa2eb2aba8d6b54dec53e7ad4c37e17392b166f.
|
|
the main motivation for this change is to aid in debugging. since the
main program's entry point is also named _start, it was difficult to
set breakpoints or quickly identify which _start execution stopped in.
|
|
|
|
such separation serves multiple purposes:
- by having the common path for __tls_get_addr alone in its own
function with a tail call to the slow case, code generation is
greatly improved.
- by having __tls_get_addr in it own file, it can be replaced on a
per-arch basis as needed, for optimization or ABI-specific purposes.
- by removing __tls_get_addr from __init_tls.c, a few bytes of code
are shaved off of static binaries (which are unlikely to use this
function unless the linker messed up).
|
|
|
|
previously, accesses to dynamic TLS had to check two conditions before
being able to use a dtv slot: (1) that the module index was within the
bounds of the current dtv size, and (2) that the dynamic tls for the
requested module index was already installed in the dtv.
this commit changes the installation strategy so that, whenever an
attempt is made to access dynamic TLS that's not yet installed in the
dtv, the dynamic TLS for all lower-index modules is also installed.
thus it provides a new invariant: if a given module index is within
the bounds of the current dtv size, we automatically know that its TLS
is installed and directly available. the requirement that the second
condition (above) be checked is eliminated.
|
|
this code is non-functional without further changes to link up the
arch-specific reloc types for tlsdesc and add asm implementations of
__tlsdesc_static and __tlsdesc_dynamic.
|
|
eventually this should help making dlerror thread-safe too.
|
|
this was one of the main instances of ugly code duplication: all archs
use basically the same types of relocations, but roughly equivalent
logic was duplicated for each arch to account for the different naming
and numbering of relocation types and variation in whether REL or RELA
records are used.
as an added bonus, both REL and RELA are now supported on all archs,
regardless of which is used by the standard toolchain.
|
|
so far the options are --library-path and --preload which override the
corresponding environment variables, and --list which forces the
behavior of ldd even if the invocation name is not ldd. both the
two-arg form and the one-arg form using an equals sign are supported.
based loosely on a patch proposed by Rune.
|