|Age||Commit message (Collapse)||Author||Lines|
the previous values (2k min and 8k default) were too small for some
archs. aarch64 reserves 4k in the signal context for future extensions
and requires about 4.5k total, and powerpc reportedly uses over 2k.
the new minimums are chosen to fit the saved context and also allow a
minimal signal handler to run.
since the default (SIGSTKSZ) has always been 6k larger than the
minimum, it is also increased to maintain the 6k usable by the signal
handler. this happens to be able to store one pathname buffer and
should be sufficient for calling any function in libc that doesn't
involve conversion between floating point and decimal representations.
x86 (both 32-bit and 64-bit variants) may also need a larger minimum
(around 2.5k) in the future to support avx-512, but the values on
these archs are left alone for now pending further analysis.
the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ
at this time. this is so as not to preclude applications from using
extremely small thread stacks when they know they will not be handling
signals. unfortunately cancellation and multi-threaded set*id() use
signals as an implementation detail and therefore require a stack
large enough for a signal context, so applications which use extremely
small thread stacks may still need to avoid using these features.
The unwind code in libgcc uses this type for unwinding across signal
handlers. On aarch64 the kernel may place a sequence of structs on the
signal stack on top of the ucontext to provide additional information.
The unwinder only needs the header, but added all the types the kernel
currently defines for this mechanism because they are part of the uapi.
previously, commit e7b9887e8b65253087ab0b209dc8dd85c9f09614 aligned
the sizes with the glibc ABI. subsequent discussion during the merge
of the aarch64 port reached a conclusion that we should reject larger
arch-specific sizes, which have significant cost and no benefit, and
stick with the existing common 32-bit sizes for all 32-bit/ILP32 archs
and the x86_64 sizes for 64-bit archs.
one peculiarity of this change is that x32 pthread_attr_t is now
larger in musl than in the glibc x32 ABI, making it unsafe to call
pthread_attr_init from x32 code that was compiled against glibc. with
all the ABI issues of x32, it's not clear that ABI compatibility will
ever work, but if it's needed, pthread_attr_init and related functions
could be modified not to write to the last slot of the object.
this is not a regression versus previous releases, since on previous
releases the x32 pthread type sizes were all severely oversized
already (due to incorrectly using the x86_64 LP64 definitions).
moreover, x32 is still considered experimental and not ABI-stable.
This adds complete aarch64 target support including bigendian subarch.
Some of the long double math functions are known to be broken otherwise
interfaces should be fully functional, but at this point consider this
Initial work on this port was done by Sireesh Tripurari and Kevin Bortis.
Implemented as a wrapper around fegetround introducing a new function
to the ABI: __flt_rounds. (fegetround cannot be used directly from float.h)
these macros have the same distinct definition on blackfin, frv, m68k,
mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally
differ on sparc.
the previous definitions were copied from x86_64. not only did they
fail to match the ABI sizes; they also wrongly encoded an assumption
that long/pointer types are twice as large as int.
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
this syscall allows fexecve to be implemented without /proc, it is new
in linux v3.19, added in commit 51f39a1f0cea1cacf8c787f652f26dfee9611874
(sh and microblaze do not have allocated syscall numbers yet)
added a x32 fix as well: the io_setup and io_submit syscalls are no
longer common with x86_64, so use the x32 specific numbers.
x86_64 syscall.h defined some musl internal syscall names and made
them public. These defines were already moved to src/internal/syscall.h
(except for SYS_fadvise which is added now) so the cruft in x86_64
syscall.h is not needed.
mxcs_mask should be mxcr_mask
the definitions are generic for all kernel archs. exposure of these
macros now only occurs on the same feature test as for the function
accepting them, which is believed to be more correct.
the errno values are unused by the kernel and the macro definitions were
never exposed by glibc.
these syscalls are new in linux v3.18, bpf is present on all
supported archs except sh, kexec_file_load is only allocted for
x86_64 and x32 yet.
bpf was added in linux commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60
kexec_file_load syscall number was allocated in commit
it is part of kernel uapi, and some programs (e.g. nodejs) do use them
except powerpc, which still lacks inline syscalls simply because
nobody has written the code, these are all fallbacks used to work
around a clang bug that probably does not exist in versions of clang
that can compile musl. however, it's useful to have the generic
non-inline code anyway, as it eases the task of porting to new archs:
writing inline syscall code is now optional. this approach could also
help support compilers which don't understand inline asm or lack
support for the needed register constraints.
mips could not be unified because it has special fixup code for broken
layout of the kernel's struct stat.
the register constraints in the non-clang case were tested to work on
clang back to 3.2, and earlier versions of clang have known bugs that
preclude building musl.
there may be other reasons to prefer not to use inline syscalls, but
if so the function-call-based implementations should be added back in
a unified way for all archs.
calls to __aeabi_read_tp may be generated by the compiler to access
TLS on pre-v6 targets. previously, this function was hard-coded to
call the kuser helper, which would crash on kernels with kuser helper
to fix the problem most efficiently, the definition of __aeabi_read_tp
is moved so that it's an alias for the new __a_gettp. however, on v7+
targets, code to initialize the runtime choice of thread-pointer
loading code is not even compiled, meaning that defining
__aeabi_read_tp would have caused an immediate crash due to using the
default implementation of __a_gettp with a HCF instruction.
fortunately there is an elegant solution which reduces overall code
size: putting the native thread-pointer loading instruction in the
default code path for __a_gettp, so that separate default/native code
paths are not needed. this function should never be called before
__set_thread_area anyway, and if it is called early on pre-v6
hardware, the old behavior (crashing) is maintained.
ideally __aeabi_read_tp would not be called at all on v7+ targets
anyway -- in fact, prior to the overhaul, the same problem existed,
but it was never caught by users building for v7+ with kuser disabled.
however, it's possible for calls to __aeabi_read_tp to end up in a v7+
binary if some of the object files were built for pre-v7 targets, e.g.
in the case of static libraries that were built separately, so this
case needs to be handled.
previously, builds for pre-armv6 targets hard-coded use of the "kuser
helper" system for atomics and thread-pointer access, resulting in
binaries that fail to run (crash) on systems where this functionality
has been disabled (as a security/hardening measure) in the kernel.
additionally, builds for armv6 hard-coded an outdated/deprecated
memory barrier instruction which may require emulation (extremely
slow) on future models.
this overhaul replaces the behavior for all pre-armv7 builds (both of
the above cases) to perform runtime detection of the appropriate
mechanisms for barrier, atomic compare-and-swap, and thread pointer
access. detection is based on information provided by the kernel in
auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture
version encoded in AT_PLATFORM. direct use of the instructions is
preferred when possible, since probing for the existence of the kuser
helper page would be difficult and would incur runtime cost.
for builds targeting armv7 or later, the runtime detection code is not
compiled at all, and much more efficient versions of the non-cas
atomic operations are provided by using ldrex/strex directly rather
than wrapping cas.
the kernel syscall interface for or1k does not expect 64-bit arguments
to be aligned to "even" register boundaries. this incorrect alignment
broke truncate/ftruncate and as well as a few less-common syscalls.
these syscalls are new in linux v3.17 and present on all supported
archs except sh.
seccomp was added in commit 48dc92b9fc3926844257316e75ba11eb5c742b2c
it has operation, flags and pointer arguments (if flags==0 then it is
the same as prctl(PR_SET_SECCOMP,...)), the uapi header for flag
definitions is linux/seccomp.h
getrandom was added in commit c6e9d6f38894798696f23c8084ca7edbf16ee895
it provides an entropy source when open("/dev/urandom",..) would fail,
the uapi header for flags is linux/random.h
memfd_create was added in commit 9183df25fe7b194563db3fec6dc3202a5855839c
it allows anon mmap to have an fd, that can be shared, sealed and needs no
mount point, the uapi header for flags is linux/memfd.h
the C11 _Alignas keyword is not present in C++, and despite it being
in the reserved namespace and thus reasonable to support even in
non-C11 modes, compilers seem to fail to support it.
based on patch by Jens Gustedt.
mtx_t and cnd_t are defined in such a way that they are formally
"compatible types" with pthread_mutex_t and pthread_cond_t,
respectively, when accessed from a different translation unit. this
makes it possible to implement the C11 functions using the pthread
functions (which will dereference them with the pthread types) without
having to use the same types, which would necessitate either namespace
violations (exposing pthread type names in threads.h) or incompatible
changes to the C++ name mangling ABI for the pthread types.
for the rest of the types, things are much simpler; using identical
types is possible without any namespace considerations.
this was broken by commit ea818ea8340c13742a4f41e6077f732291aea4bc.
conceptually, a_spin needs to be at least a compiler barrier, so the
compiler will not optimize out loops (and the load on each iteration)
while spinning. it should also be a memory barrier, or the spinning
thread might keep spinning without noticing stores from other threads,
thus delaying for longer than it should.
ideally, an optimal a_spin implementation that avoids unnecessary
cache/memory contention should be chosen for each arch, but for now,
the easiest thing is to perform a useless a_cas on the calling
unfortunately this needs to be able to vary by arch, because of a huge
mess GCC made: the GCC definition, which became the ABI, depends on
quirks in GCC's definition of __alignof__, which does not match the
formal alignment of the type.
GCC's __alignof__ unexpectedly exposes the an implementation detail,
its "preferred alignment" for the type, rather than the formal/ABI
alignment of the type, which it only actually uses in structures. on
most archs the two values are the same, but on some (at least i386)
the preferred alignment is greater than the ABI alignment.
I considered using _Alignas(8) unconditionally, but on at least one
arch (or1k), the alignment of max_align_t with GCC's definition is
only 4 (even the "preferred alignment" for these types is only 4).
when manipulating the robust list, the order of stores matters,
because the code may be asynchronously interrupted by a fatal signal
and the kernel will then access the robust list in what is essentially
an async-signal context.
previously, aliasing considerations made it seem unlikely that a
compiler could reorder the stores, but proving that they could not be
reordered incorrectly would have been extremely difficult. instead
I've opted to make all the pointers used as part of the robust list,
including those in the robust list head and in the individual mutexes,
in addition, the format of the robust list has been changed to point
back to the head at the end, rather than ending with a null pointer.
this is to match the documented kernel robust list ABI. the null
pointer, which was previously used, only worked because faults during
access terminate the robust list processing.
for or1k, the kernel expects the offset passed to mmap2 in units of
the 8k page size, not the standard unit of 4k used on most other
according to Stefan Kristiansson, or1k page size is not actually
variable and the value of 8192 is part of the ABI.
this commit changes the names to match the kernel names, exposing
under the normal names the "old" versions which work with a smaller
termios structure compatible with the userspace structure, and
renaming the "new" versions with "2" on the end like the kernel has.
this fixes spurious warnings "Unsupported ioctl: cmd=0x802c542a" from
qemu-sh4 and should be more correct anyway, since our userspace
termios structure does not have meaningful information in the part
which the kernel would be interpreting as speeds with the new ioctl.
this follows the same logic as in the previous commit for other archs.
the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were
probably used a long time ago when only i386 and x86_64 were
supported. as other archs were added, support for them was
inconsistent, and they are obviously not in use at present. having
them around potentially confuses readers working on new ports, and the
type-punning hacks and inconsistent use of types in their definitions
is not a style I wish to perpetuate in the source tree, so removing
them seems appropriate.
while other usage I've seen only has the synco instruction after the
atomic operation, I cannot find any documentation indicating that this
is correct. certainly all stores before the atomic need to have been
synchronized before the atomic operation takes place.
it's like rename but with flags eg. to allow atomic exchange of two files,
introduced in linux 3.15 commit 520c8b16505236fc82daa352e6c5e73cd9870cff
due to what was essentially a copy and paste error, the changes made
in commit f61be1f875a2758509d6e9e2cf6f1d9603b28b65 caused syscalls
with 5 or 6 arguments (and syscalls with 2, 3, or 4 arguments when
compiled with clang compatibility) to negate the returned error code a
second time, breaking errno reporting.
the mips version of this structure on the kernel side wrongly has
32-bit type rather than 64-bit type. fortunately there is adjacent
padding to bring it up to 64 bits, and on little-endian, this allows
us to treat the adjacent kernel st_dev and st_pad0 as as single
64-bit dev_t. however, on big endian, such treatment results in the
upper and lower 32-bit parts of the dev_t value being swapped. for the
purpose of just comparing st_dev values this did not break anything,
but it precluded actually processing the device numbers as major/minor
since the broken kernel behavior that needs to be worked around is
isolated to one arch, I put the workarounds in syscall_arch.h rather
than adding a stat fixup path in the common code. on little endian
mips, the added code optimizes out completely.
the changes necessary were incompatible with the way the __asm_syscall
macro was factored so I just removed it and flattened the individual
__syscallN functions. this arguably makes the code easier to read and
at the very least, a compiler barrier is required no matter what, and
that was missing. current or1k implementations have strong ordering,
but this is not guaranteed as part of the ISA, so some sort of
synchronizing operation is necessary.
in principle we should use l.msync, but due to misinterpretation of
the spec, it was wrongly treated as an optional instruction and is not
supported by some implementations. if future kernels trap it and treat
it as a nop (rather than illegal instruction) when the
hardware/emulator does not support it, we could consider using it.
in the absence of l.msync support, the l.lwa/l.swa instructions, which
are specified to have a built-in l.msync, need to be used. the easiest
way to use them to implement atomic store is to perform an atomic swap
and throw away the result. using compare-and-swap would be lighter,
and would probably be sufficient for all actual usage cases, but
checking this is difficult and error-prone:
with store implemented in terms of swap, it's guaranteed that, when
another atomic operation is performed at the same time as the store,
either the result of the store followed by the other operation, or
just the store (clobbering the other operation's result) is seen. if
store were implemented in terms of cas, there are cases where this
invariant would fail to hold, and we would need detailed rules for the
situations in which the store operation is well-defined.
as far as I can tell, microblaze is strongly ordered, but this does
not seem to be well-documented and the assumption may need revisiting.
even with strong ordering, however, a volatile C assignment is not
sufficient to implement atomic store, since it does not preclude
reordering by the compiler with respect to non-volatile stores and
simply flanking a C store with empty volatile asm blocks with memory
clobbers would achieve the desired result, but is likely to result in
worse code generation, since the address and value for the store may
need to be spilled. actually writing the store in asm, so that there's
only one asm block, should give optimal code generation while
satisfying the requirement for having a compiler barrier.
previously I had wrongly assumed the ll/sc instructions also provided
memory synchronization; apparently they do not. this commit adds sync
instructions before and after each atomic operation and changes the
atomic store to simply use sync before and after a plain store, rather
than a useless compare-and-swap.
despite lacking the semantic content that the asm accesses the
pointed-to object rather than just using its address as a value, the
mips asm was not actually broken. the asm blocks were declared
volatile, meaning that the compiler must treat them as having unknown
however changing the asm to use memory constraints is desirable not
just from a semantic correctness and consistency standpoint, but also
produces better code. the compiler is able to use base/offset
addressing expressions for the atomic object's address rather than
having to load the address into a single register. this improves
access to global locks in static libc, and access to non-zero-offset
atomic fields in synchronization primitives, etc.
due to a mistake in my testing procedure, the changes in the previous
commit were not correctly tested and wrongly assumed to be valid. the
lwarx and stwcx. instructions do not accept general ppc memory address
expressions and thus the argument associated with the memory
constraint cannot be used directly.
instead, the memory constraint can be left as an argument that the asm
does not actually use, and the address can be provided in a separate
the register constraint for the address to be accessed did not convey
that the asm can access the pointed-to object. as far as the compiler
could tell, the result of the asm was just a pure function of the
address and the values passed in, and thus the asm could be hoisted
out of loops or omitted entirely if the result was not used.
the erroneous definition was missed because with works with qemu
user-level emulation, which also has the wrong definition. the actual
kernel uses the asm-generic generic definition.