|Age||Commit message (Collapse)||Author||Lines|
linux commit a49f4f81cb48925e8d7cbd9e59068f516e984144
arch: Wire up Landlock syscalls
Merge tag 'landlock_v34' of ... jmorris/linux-security
Landlock provides for unprivileged application sandboxing. The goal of
Landlock is to enable to restrict ambient rights (e.g. global filesystem
access) for a set of processes. Landlock is inspired by seccomp-bpf but
instead of filtering syscalls and their raw arguments, a Landlock rule
can restrict the use of kernel objects like file hierarchies, according
to the kernel semantic.
new syscall to change the properties of a mount or a mount tree using
file descriptors which the new mount api is based on, see
linux commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
fs: add mount_setattr()
linux commit b0a0c2615f6f199a656ed8549d7dce625d77aa77
epoll: wire up syscall epoll_pwait2
linux commit 58169a52ebc9a733aeb5bea857bc5daa71a301bb
epoll: add syscall epoll_pwait2
epoll_wait with struct timespec timeout instead of int. no time32 variant.
When the soft-float ABI for PowerPC was added in commit
5a92dd95c77cee81755f1a441ae0b71e3ae2bcdb, with Freescale cpus using
the alternative SPE FPU as the main use case, it was noted that we
could probably support hard float on them, but that it would involve
determining some difficult ABI constraints. This commit is the
completion of that work.
The Power-Arch-32 ABI supplement defines the ABI profiles, and indeed
ATR-SPE is built on ATR-SOFT-FLOAT. But setjmp/longjmp compatibility
are problematic for the same reason they're problematic on ARM, where
optional float-related parts of the register file are "call-saved if
present". This requires testing __hwcap, which is now done.
In keeping with the existing powerpc-sf subarch definition, which did
not have fenv, the fenv macros are not defined for SPE and the SPEFSCR
control register is left (and assumed to start in) the default mode.
the kernel structure has padding of the shm_segsz member up to 64
bits, as well as 2 unused longs at the end. somehow that was
overlooked when the powerpc port was added, and it has been broken
ever since; applications compiled with the wrong definition do not
correctly see the shm_segsz, shm_cpid, and shm_lpid members.
fixing the definition just by adding the missing padding would break
the ABI size of the structure as well as the position of the time64
shm_atime and shm_dtime members we added at the end. instead, just
move one of the unused padding members from the original end (before
time64) of the structure to the position of the missing padding. this
preserves size and preserves correct behavior of any compiled code
that was already working. programs affected by the wrong definition
need to be recompiled with the correct one.
mainly added to linux to allow a central process management service in
android to give MADV_COLD|PAGEOUT hints for other processes, see
linux commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
mm/madvise: introduce process_madvise() syscall: an external memory
linux commit 9b4feb630e8e9801603f3cab3a36369e3c1cf88d
arch: wire-up close_range()
linux commit 278a5fbaed89dacd04e9d052f4594ffd0e0585de
open: add close_range()
the linux faccessat syscall lacks a flag argument that is necessary
to implement the posix api, see
linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe
vfs: add faccessat2 syscall
also added clone3 on sh and m68k, on sh it's still missing (not
yet wired up), but reserved so safe to add.
linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
open: introduce openat2(2) syscall
linux commit 9a2cef09c801de54feecd912303ace5c27237f12
arch: wire up pidfd_getfd syscall
linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09
pid: Implement pidfd_getfd syscall
linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139
m68k: Wire up clone3() syscall
dtv_copy, canary2, and canary_at_end existed solely to match multiple
ABI and asm-accessed layouts simultaneously. now that pthread_arch.h
can be included before struct __pthread is defined, the struct layout
can depend on macros defined by pthread_arch.h.
the adjustment made is entirely a function of TLS_ABOVE_TP and
TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
arithmetic, this change makes pthread_arch.h independent of the
definition of struct __pthread from pthread_impl.h. this in turn will
allow inclusion of pthread_arch.h to be moved to the top of
pthread_impl.h so that it can influence the definition of the
previously, arch files were very inconsistent about the type used for
the thread pointer. this change unifies the new __get_tp interface to
always use uintptr_t, which is the most correct when performing
arithmetic that may involve addresses outside the actual pointed-to
object (due to TP_OFFSET).
the only part of TP_ADJ that was not uniquely determined by
TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc
Linux defines MAP_SYNC on powerpc and powerpc64 as of commit
22fcea6f85f2 ("mm: move MAP_SYNC to asm-generic/mman-common.h"),
so we can stop undefining it on those architectures.
this extends commit 5a105f19b5aae79dd302899e634b6b18b3dcd0d6, removing
timer[fd]_settime and timer[fd]_gettime. the timerfd ones are likely
to have been used in software that started using them before it could
rely on libc exposing functions.
this extends commit 5a105f19b5aae79dd302899e634b6b18b3dcd0d6, removing
clock_settime, clock_getres, clock_nanosleep, and settimeofday.
some nontrivial number of applications have historically performed
direct syscalls for these operations rather than using the public
functions. such usage is invalid now that time_t is 64-bit and these
syscalls no longer match the types they are used with, and it was
already harmful before (by suppressing use of vdso).
since syscall() has no type safety, incorrect usage of these syscalls
can't be caught at compile-time. so, without manually inspecting or
running additional tools to check sources, the risk of such errors
slipping through is high.
this patch renames the syscalls on 32-bit archs to clock_gettime32 and
gettimeofday_time32, so that applications using the original names
will fail to build without being fixed.
note that there are a number of other syscalls that may also be unsafe
to use directly after the time64 switchover, but (1) these are the
main two that seem to be in widespread use, and (2) most of the others
continue to have valid usage with a null timeval/timespec argument, as
the argument is an optional timeout or similar.
the syscall number is reserved on all targets, but it is not wired up
on all targets, see
linux commit 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
Merge tag 'clone3-v5.3' of ... brauner/linux
linux commit 8f3220a806545442f6f26195bc491520f5276e7c
arch: wire-up clone3() syscall
linux commit 7f192e3cd316ba58c88dfa26796cf77789dd9872
fork: add clone3
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
now that all 32-bit archs have 64-bit time_t (and suseconds_t), the
arch-provided _Int64 macro (long or long long, as appropriate) can be
used to define them, and arch-specific definitions are no longer
now that all 32-bit archs have 64-bit time types, the values for the
time-related socket option macros can be treated as universal for
32-bit archs. the sys/socket.h mechanism for this predates
arch/generic and is instead in the top-level header.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
this commit preserves ABI fully for existing interface boundaries
between libc and libc consumers (applications or libraries), by
retaining existing symbol names for the legacy 32-bit interfaces and
redirecting sources compiled against the new headers to alternate
symbol names. this does not necessarily, however, preserve the
pairwise ABI of libc consumers with one another; where they use
time_t-derived types in their interfaces with one another, it may be
necessary to synchronize updates with each other.
the intent is that ABI resulting from this commit already be stable
and permanent, but it will not be officially so until a release is
made. changes to some header-defined types that do not play any role
in the ABI between libc and its consumers may still be subject to
mechanically, the changes made by this commit for each 32-bit arch are
- _REDIR_TIME64 is defined to activate the symbol redirections in
- COMPAT_SRC_DIRS is defined in arch.mak to activate build of ABI
compat shims to serve as definitions for the original symbol names
- time_t and suseconds_t definitions are changed to long long (64-bit)
- IPC_STAT definition is changed to add the IPC_TIME64 bit (0x100),
triggering conversion of semid_ds, shmid_ds, and msqid_ds split
low/high time bits into new time_t members
- structs semid_ds, shmid_ds, msqid_ds, and stat are modified to add
new 64-bit time_t/timespec members at the end, maintaining existing
layout of other members.
- socket options (SO_*) and ioctl (sockios) command macros are
redefined to use the kernel's "_NEW" values.
in addition, on archs where vdso clock_gettime is used, the
VDSO_CGT_SYM macro definition in syscall_arch.h is changed to use a
new time64 vdso function if available, and a new VDSO_CGT32_SYM macro
is added for use as fallback on kernels lacking time64.
these structures can now be defined generically in terms of endianness
and long size. previously, the 32-bit archs all shared a common
definition from the generic bits header, and each 64-bit arch had to
repeat the 64-bit version, with endian conditionals if the arch had
variants of each endianness.
I would prefer getting rid of the preprocessor conditionals for
padding and instead using unnamed bitfield members, like commit
9b2921bea1d5017832e1b45d1fd64220047a9802 did for struct timespec.
however, at present sendmsg, recvmsg, and recvmmsg need access to the
padding members by name to zero them. this could perhaps be cleaned up
in the future.
this is to match the kernel and glibc interfaces. here, struct pt_regs
is an incomplete type, but that's harmless, and if it's completed by
inclusion of another header then members of the struct pointed to by
the regs member can be accessed directly without going through a cast
or intermediate pointer object.
policy has long been that these definitions are purely a function of
whether long/pointer is 32- or 64-bit, and that they are not allowed
to vary per-arch. move the definition to the shared alltypes.h.in
fragment, using integer constant expressions in terms of sizeof to
vary the array dimensions appropriately. I'm not sure whether this is
more or less ugly than using preprocessor conditionals and two sets of
definitions here, but either way is a lot less ugly than repeating the
same thing for every arch.
LLONG_MAX is uniform for all archs we support and plenty of header and
code level logic assumes it is, so it does not make sense for limits.h
bits mechanism to pretend it's variable.
LONG_BIT can be defined in terms of LONG_MAX; there's no reason to put
it in bits.
by moving LONG_MAX definition to __LONG_MAX in alltypes.h and moving
LLONG_MAX out of bits, there are now no plain-C limits that are
defined in the bits header, so the bits header only needs to be
included in the POSIX or extended profiles. this allows the feature
test macro logic to be removed from the bits header, facilitating a
long-term goal of getting such logic out of bits.
having __LONG_MAX in alltypes.h will allow further generalization of
archs without a constant PAGESIZE no longer need bits/limits.h at all.
this change is motivated by the intersection of several factors.
presently, despite being a nonstandard header, endian.h is exposing
the unprefixed byte order macros and functions only if _BSD_SOURCE or
_GNU_SOURCE is defined. this is to accommodate use of endian.h from
other headers, including bits headers, which need to define structure
layout in terms of endianness. with time64 switch-over, even more
headers will need to do this.
at the same time, the resolution of Austin Group issue 162 makes
endian.h a standard header for POSIX-future, requiring that it expose
the unprefixed macros and the functions even in standards-conforming
profiles. changes to meet this new requirement would break existing
internal usage of endian.h by causing it to violate namespace where
instead, have the arch's alltypes.h define __BYTE_ORDER, either as a
fixed constant or depending on the right arch-specific predefined
macros for determining endianness. explicit literals 1234 and 4321 are
used instead of __LITTLE_ENDIAN and __BIG_ENDIAN so that there's no
danger of getting the wrong result if a macro is undefined and
implicitly evaluates to 0 at the preprocessor level.
the powerpc (32-bit) bits/endian.h being removed had logic for varying
endianness, but our powerpc arch has never supported that and has
always been big-endian-only. this logic is not carried over to the new
__BYTE_ORDER definition in alltypes.h.
now that commit f7f1079796abc6f97c69521d2334e9c7d3945dd8 removed the
legacy i386 conditional definition, va_list is in no way
arch-specific, and has no reason to be in the future. move it to the
shared part of alltypes.h.in
new mount api syscalls were added, same numers on all targets, see
linux commit a07b20004793d8926f78d63eb5980559f7813404
vfs: syscall: Add open_tree(2) to reference or clone a mount
linux commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae
vfs: syscall: Add move_mount(2) to move mounts around
linux commit 24dcb3d90a1f67fe08c68a004af37df059d74005
vfs: syscall: Add fsopen() to prepare for superblock creation
linux commit ecdab150fddb42fe6a739335257949220033b782
vfs: syscall: Add fsconfig() for configuring and managing a context
linux commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d
vfs: syscall: Add fsmount() to create a mount for a superblock
linux commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
vfs: syscall: Add fspick() to select a superblock for reconfiguration
linux commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
linux commit d8076bdb56af5e5918376cd1573a6b0007fc1a89
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
historically, a number of 32-bit archs used long rather than int for
wchar_t, for no good reason. GCC still uses the historical types, but
clang replaced them all with int, and it seems PCC uses int too.
mismatching the compiler's type for wchar_t is not an option due to
wide string literals.
note that the mismatch does not affect C++ ABI since wchar_t is its
own builtin type/keyword in C++, distinct from both int and long, not
i386 already worked around this by honoring __WCHAR_TYPE__ if defined
by the compiler, and only using the official legacy ABI type if not.
add the same to the other affected archs.
it might make sense at some point to switch to using int as the
default if __WCHAR_TYPE__ is not defined, if the expectations is that
new compilers will treat int as the correct choice, but it's unlikely
that the case where __WCHAR_TYPE__ is undefined will ever be used
anyway. I actually wanted to move the definition of wchar_t to the
top-level shared alltypes.h.in, using __WCHAR_TYPE__ and falling back
to int if not defined, but that can't be done without assuming all
compilers define __WCHAR_TYPE__ thanks to some pathological archs
where the ABI has wchar_t as an unsigned type.
due to historical accident/sloppiness in glibc, the powerpc,
powerpc64, and sh versions of struct user, defined by sys/user.h, used
struct pt_regs from the kernel asm/ptrace.h for their regs member.
this made it impossible to define the type in an API-compatible manner
without either including asm/ptrace.h like glibc does (contrary to our
policy of not depending on kernel headers), or clashing with
asm/ptrace.h's definition of struct pt_regs if both headers are
included (which is almost always the case in software using
for a long time I viewed this problem as having no reasonable fix. I
even explored the possibility of having the powerpc and sh
versions of user.h just include the kernel header (breaking with
policy), but that looked like it might introduce new clashes with
sys/ptrace.h. and it would also bring in a lot of additional cruft
that makes no sense for sys/user.h to expose. glibc goes out of its
way to suppress some of that with #undef, possibly leading to
different problems. this is a rabbit-hole that should be explored no
as it turns out, however, nothing actually uses struct user
sufficiently to care about the type of the regs member; most software
including sys/user.h does not even use struct user at all. so, the
problem can be fixed just by doing away with the insistence on strict
glibc API compatibility for the struct tag of the regs member.
rather than renaming the tag, which might lead to the new name
entering use as API, simply use an untagged structure inside struct
user with the same members/layout as struct pt_regs.
for sh, struct pt_dspregs is just removed entirely since it was not
R_PPC_UADDR32 (R_PPC64_UADDR64) has the same meaning as R_PPC_ADDR32
(R_PPC64_ADDR64), except that its address need not be aligned. For
powerpc64, BFD ld(1) will automatically convert between ADDR<->UADDR
relocations when the address is/isn't at its native alignment. This
will happen if, for example, there is a pointer in a packed struct.
gold and lld do not currently generate R_PPC64_UADDR64, but pass
through misaligned R_PPC64_ADDR64 relocations from object files,
possibly relaxing them to misaligned R_PPC64_RELATIVE. In both cases
(relaxed or not) this violates the PSABI, which defines the relevant
field type as "a 64-bit field occupying 8 bytes, the alignment of
which is 8 bytes unless otherwise specified."
All three linkers violate the PSABI on 32-bit powerpc, where the only
difference is that the field is 32 bits wide, aligned to 4 bytes.
Currently musl fails to load executables linked by BFD ld containing
R_PPC64_UADDR64, with the error "unsupported relocation type 43".
This change provides compatibility with BFD ld on powerpc64, and any
static linker on either architecture that starts following the PSABI
otherwise, 32-bit archs that could otherwise share the generic
bits/ipc.h would need to duplicate the struct ipc_perm definition,
obscuring the fact that it's the same. sysvipc is not widely used and
these headers are not commonly included, so there is no performance
gain to be had by limiting the number of indirectly included files
files with the existing time32 definition of IPC_STAT are added to all
current 32-bit archs now, so that when it's changed the change will
show up as a change rather than addition of a new file where it's less
obvious that the value is changing vs the generic one that was used
without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
the definition of the IPC_64 macro controls the interface between libc
and the kernel through syscalls; it's not a public API. the meaning is
rather obscure. long ago, Linux's sysvipc *id_ds structures used
16-bit uids/gids and wrong types for a few other fields. this was in
the libc5 era, before glibc. the IPC_64 flag (64 is a misnomer; it's
more like 32) tells the kernel to use the modern[-ish] versions of the
the definition of IPC_64 has nothing to do with whether the arch is
32- or 64-bit. rather, due to either historical accident or
intentional obnoxiousness, the kernel only accepts and masks off the
0x100 IPC_64 flag conditional on CONFIG_ARCH_WANT_IPC_PARSE_VERSION,
i.e. for archs that want to provide, or that accidentally provided,
both. for archs which don't define this option, no masking is
performed and commands with the 0x100 bit set will fail as invalid. so
ultimately, the definition is just a matter of matching an arbitrary
switch defined per-arch in the kernel.
presently, all archs/ABIs have struct stat matching the kernel
stat type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
Commit 3517d74a5e04a377192d1f4882ad6c8dc22ce69a changed the token in
sys/ioctl.h from 0x01 to 1, so bits/termios.h no longer matches. Revert
the bits/termios.h change to keep the headers in sync.
This reverts commit 9eda4dc69c33852c97c6f69176bf45ffc80b522f.
syscall numbers are now synced up across targets (starting from 403 the
numbers are the same on all targets other than an arch specific offset)
IPC syscalls sem*, shm*, msg* got added where they were missing (except
for semop: only semtimedop got added), the new semctl, shmctl, msgctl
imply IPC_64, see
linux commit 0d6040d4681735dfc47565de288525de405a5c99
arch: add split IPC system calls where needed
new 64bit time_t syscall variants got added on 32bit targets, see
linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a
y2038: add 64-bit time_t syscalls to all 32-bit architectures
new async io syscalls got added, see
linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e
Add io_uring IO interface
linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb
io_uring: add support for pre-mapped user IO buffers
a new syscall got added that uses the fd of /proc/<pid> as a stable
handle for processes: allows sending signals without pid reuse issues,
intended to eventually replace rt_sigqueueinfo, kill, tgkill and
linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
signal: add pidfd_send_signal() syscall
on some targets (arm, m68k, s390x, sh) some previously missing syscall
numbers got added as well.
the inline syscall code is copied directly from powerpc64. the extent
of register clobber specifiers may be excessive on both; if that turns
out to be the case it can be fixed later.
added in linux commit 5521eb4bca2db733952f068c37bdf3cd656ad23c
io_pgetevents is new in linux commit
rseq is new in linux commit
this will allow the compiler to cache and reuse the result, meaning we
no longer have to take care not to load it more than once for the sake
of archs where the load may be expensive.
depends on commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for
correctness, since otherwise the compiler could hoist loads during
stage 3 of dynamic linking before the initial thread-pointer setup.
versions of clang all the way back to 3.1 lack the bug this was
purportedly working around.
in our memory model, all atomics are supposed to be full barriers;
stores are not release-only. this is important because store is used
as an unlock operation in places where it needs to acquire the waiter
count to determine if a futex wake is needed. at least in the
malloc-internal locks, but possibly elsewhere, soft deadlocks from
missing futex wake (breakable by poking the threads to restart the
syscall, e.g. by attaching a tracer) were reported to occur.
once the malloc lock is replaced with Jens Gustedt's new lock
implementation (see commit 47d0bcd4762f223364e5b58d5a381aaa0cbd7c38),
malloc will not be affected by the issue, but it's not clear that
other uses won't be. reducing the strength of the ordering properties
required from a_store would require a thorough analysis of how it's
to fix the problem, I'm removing the powerpc-specific a_store
definition; now, the top-level atomic.h will implement a_store using
a_barrier on both sides of the store.
it's not clear to me yet whether there might be issues with the other
atomics. it's possible that a_post_llsc needs to be replaced with a
full barrier to guarantee the formal semanics we want, but either way
I think the difference is unlikely to impact the way we use them.
sys/ptrace.h is target specific, use bits/ptrace.h to add target
specific macro definitions.
these macros are kept in the generic sys/ptrace.h even though some
targets don't support them:
so no macro definition got removed in this patch on any target. only
s390x has a numerically conflicting macro definition (PTRACE_SINGLEBLOCK).
the PT_ aliases follow glibc headers, otherwise the definitions come
from linux uapi headers except ones that are skipped in glibc and
there is no real kernel support (s390x PTRACE_*_AREA) or need special
type definitions (mips PTRACE_*_WATCH_*) or only relevant for linux
2.4 compatibility (PTRACE_OLDSETOPTIONS).
commit 587f5a53bc3a68d80b239ba515d583df690a96df moved the definition
of SO_PEERSEC to bits/socket.h for archs where the SO_* macros differ
from their standard values, but failed to add copies of the generic
definition for powerpc and powerpc64.
add pkey_mprotect, pkey_alloc, pkey_free syscall numbers,
new in linux commits 3350eb2ea127978319ced883523d828046af4045
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
the output delay features (NL*, CR*, TAB*, BS*, and VT*) are
XSI-shaded. VT* is in the V* namespace reservation but the rest need
to be suppressed in base POSIX namespace.
unfortunately this change introduces feature test macro checks into
another bits header. at some point these checks should be simplified
by having features.h handle the "FTM X implies Y" relationships.