musl/src/env, branch master

protect stack canary from leak via read-as-string by zeroing second byte

2022-03-08T21:52:25+00:00

This reduces entropy of the canary from 64-bit to 56-bit in exchange
for mitigating non-terminated C string overflows by setting the second
byte of the canary to nul, so that off-by-one write overflow with a
nul byte can still be detected.

Idea from GrapheneOS bionic commit 7024d880b51f03a796ff8832f1298f2f1531fd7b

fix inconsistent signature of __libc_start_main

2021-01-30T21:42:26+00:00

commit 7586360badcae6e73f04eb1b8189ce630281c4b2 removed the unused
arguments from the definition of __libc_start_main, making it
incompatible with the declaration at the point of call, which still
passed 6 arguments. calls with mismatched function type have undefined
behavior, breaking LTO and any other tooling that checks for function
signature mismatch.

removing the extra arguments from the point of call (crt1) is not an
option for fixing this, since that would be a change in ABI surface
between application and libc.

adding back the extra arguments requires some care. on archs that pass
arguments on the stack or that reserve argument spill space for the
callee on the stack, it imposes an ABI requirement on the caller to
provide such space. the modern crt1.c entry point provides such space,
but originally there was arch-specific asm for the call to
__libc_start_main. the last of this asm was removed in commit
6fef8cafbd0f6f185897bc87feb1ff66e2e204e1, and manual review of the
code removed and its prior history was performed to check that all
archs/variants passed the legacy init/fini/ldso_fini arguments.

remove redundant pthread struct members repeated for layout purposes

2020-08-27T22:36:45+00:00

dtv_copy, canary2, and canary_at_end existed solely to match multiple
ABI and asm-accessed layouts simultaneously. now that pthread_arch.h
can be included before struct __pthread is defined, the struct layout
can depend on macros defined by pthread_arch.h.

add secure_getenv function

2019-08-08T15:33:18+00:00

This function is a GNU extension introduced in glibc 2.17.

fix tls offsets when p_vaddr%p_align != 0 on TLS_ABOVE_TP targets

2019-05-17T01:48:39+00:00

currently the bfd linker does not seem to create tls segments where
p_vaddr%p_align != 0, but this is valid in ELF and then the runtime
computed tls offset must satisfy

  offset%p_align == (base+p_vaddr)%p_align

and in case of local exec tls (main executable) the smallest such
offset must be used (otherwise it is incompatible with the offset
computed by the static linker). the !TLS_ABOVE_TP case is handled
correctly (the offset is negative then in the formula).

the ldso code for TLS_ABOVE_TP is changed so the static tls offset
of each module satisfies the formula.

overhaul i386 syscall mechanism not to depend on external asm source

2019-04-10T21:10:36+00:00

this is the first part of a series of patches intended to make
__syscall fully self-contained in the object file produced using
syscall.h, which will make it possible for crt1 code to perform
syscalls.

the (confusingly named) i386 __vsyscall mechanism, which this commit
removes, was introduced before the presence of a valid thread pointer
was mandatory; back then the thread pointer was setup lazily only if
threads were used. the intent was to be able to perform syscalls using
the kernel's fast entry point in the VDSO, which can use the sysenter
(Intel) or syscall (AMD) instruction instead of int $128, but without
inlining an access to the __syscall global at the point of each
syscall, which would incur a significant size cost from PIC setup
everywhere. the mechanism also shuffled registers/calling convention
around to avoid spills of call-saved registers, and to avoid
allocating ebx or ebp via asm constraints, since there are plenty of
broken-but-supported compiler versions which are incapable of
allocating ebx with -fPIC or ebp with -fno-omit-frame-pointer.

the new mechanism preserves the properties of avoiding spills and
avoiding allocation of ebx/ebp in constraints, but does it inline,
using some fairly simple register shuffling, and uses a field of the
thread structure rather than global data for the vdso-provided syscall
code address.

for now, the external __syscall function is refactored not to use the
old __vsyscall so it can be kept, but the intent is to remove it too.

track all live threads in an AS-safe, fully-consistent linked list

2019-02-16T03:29:01+00:00

the hard problem here is unlinking threads from a list when they exit
without creating a window of inconsistency where the kernel task for a
thread still exists and is still executing instructions in userspace,
but is not reflected in the list. the magic solution here is getting
rid of per-thread exit futex addresses (set_tid_address), and instead
using the exit futex to unlock the global thread list.

since pthread_join can no longer see the thread enter a detach_state
of EXITED (which depended on the exit futex address pointing to the
detach_state), it must now observe the unlocking of the thread list
lock before it can unmap the joined thread and return. it doesn't
actually have to take the lock. for this, a __tl_sync primitive is
offered, with a signature that will allow it to be enhanced for quick
return even under contention on the lock, if needed. for now, the
exiting thread always performs a futex wake on its detach_state. a
future change could optimize this out except when there is already a
joiner waiting.

initial/dynamic variants of detached state no longer need to be
tracked separately, since the futex address is always set to the
global list lock, not a thread-local address that could become invalid
on detached thread exit. all detached threads, however, must perform a
second sigprocmask syscall to block implementation-internal signals,
since locking the thread list with them already blocked is not
permissible.

the arch-independent C version of __unmapself no longer needs to take
a lock or setup its own futex address to release the lock, since it
must necessarily be called with the thread list lock already held,
guaranteeing exclusive access to the temporary stack.

changes to libc.threads_minus_1 no longer need to be atomic, since
they are guarded by the thread list lock. it is largely vestigial at
this point, and can be replaced with a cheaper boolean indicating
whether the process is multithreaded at some point in the future.

__libc_start_main: slightly simplify stage2 pointer setup

2018-11-02T16:13:53+00:00

Use "+r" in the asm instead of implementing a non-transparent copy by
applying "0" constraint to the source value. Introduce a typedef for
the function type to avoid spelling it out twice.

use prototype for function pointer in static link libc init barrier

2018-10-18T15:44:49+00:00

this is not needed for correctness, but doesn't hurt, and in some
cases the compiler may pessimize the call assuming the callee might be
variadic when it lacks a prototype.

fix error in constraints for static link libc init barrier

2018-10-18T15:41:47+00:00

commit 4390383b32250a941ec616e8bff6f568a801b1c0 inadvertently used "r"
instead of "0" for the input constraint, which only happened to work
for the configuration I tested it on because it usually makes sense
for the compiler to choose the same input and output register.