|Age||Commit message (Collapse)||Author||Lines|
linux commit a49f4f81cb48925e8d7cbd9e59068f516e984144
arch: Wire up Landlock syscalls
Merge tag 'landlock_v34' of ... jmorris/linux-security
Landlock provides for unprivileged application sandboxing. The goal of
Landlock is to enable to restrict ambient rights (e.g. global filesystem
access) for a set of processes. Landlock is inspired by seccomp-bpf but
instead of filtering syscalls and their raw arguments, a Landlock rule
can restrict the use of kernel objects like file hierarchies, according
to the kernel semantic.
new syscall to change the properties of a mount or a mount tree using
file descriptors which the new mount api is based on, see
linux commit 2a1867219c7b27f928e2545782b86daaf9ad50bd
fs: add mount_setattr()
linux commit b0a0c2615f6f199a656ed8549d7dce625d77aa77
epoll: wire up syscall epoll_pwait2
linux commit 58169a52ebc9a733aeb5bea857bc5daa71a301bb
epoll: add syscall epoll_pwait2
epoll_wait with struct timespec timeout instead of int. no time32 variant.
mainly added to linux to allow a central process management service in
android to give MADV_COLD|PAGEOUT hints for other processes, see
linux commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
mm/madvise: introduce process_madvise() syscall: an external memory
linux commit 9b4feb630e8e9801603f3cab3a36369e3c1cf88d
arch: wire-up close_range()
linux commit 278a5fbaed89dacd04e9d052f4594ffd0e0585de
open: add close_range()
the linux faccessat syscall lacks a flag argument that is necessary
to implement the posix api, see
linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe
vfs: add faccessat2 syscall
also added clone3 on sh and m68k, on sh it's still missing (not
yet wired up), but reserved so safe to add.
linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179
open: introduce openat2(2) syscall
linux commit 9a2cef09c801de54feecd912303ace5c27237f12
arch: wire up pidfd_getfd syscall
linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09
pid: Implement pidfd_getfd syscall
linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139
m68k: Wire up clone3() syscall
the adjustment made is entirely a function of TLS_ABOVE_TP and
TP_OFFSET. aside from avoiding repetition of the TP_OFFSET value and
arithmetic, this change makes pthread_arch.h independent of the
definition of struct __pthread from pthread_impl.h. this in turn will
allow inclusion of pthread_arch.h to be moved to the top of
pthread_impl.h so that it can influence the definition of the
previously, arch files were very inconsistent about the type used for
the thread pointer. this change unifies the new __get_tp interface to
always use uintptr_t, which is the most correct when performing
arithmetic that may involve addresses outside the actual pointed-to
object (due to TP_OFFSET).
the only part of TP_ADJ that was not uniquely determined by
TLS_ABOVE_TP was the 0x7000 adjustment used mainly on mips and powerpc
signal 7 is SIGEMT on Linux mips* ABI according to the man pages and
kernel. it's not clear where the wrong name came from but it dates
back to original mips commit.
on all mips variants, Linux did (and maybe still does) have some
syscall return paths that wrongly return both the error flag in r7 and
a negated error code in r2. in particular this happened for at least
some causes of ENOSYS.
add an extra check to only negate the error code if it's positive to
bug report and concept for patch by Andreas Dröscher.
effectivly revert commit ddc7c4f936c7a90781072f10dbaa122007e939d0
which was wrong; it caused a major regression on Linux versions prior
to 2.6.36. old kernels did not properly preserve r2 across syscall
restart, and instead restarted with the instruction right before
syscall, imposing a contract that the previous instruction must load
r2 from an immediate or a register (or memory) not clobbered by the
since other changes were made since, including removal of the struct
stat conversion that was replaced by separate struct kstat, this is
not a direct revert, only a functional one.
the "0"(r2) input constraint added back seems useless/erroneous, but
without it most gcc versions (seems to be all prior to 9.x) fail to
honor the output register binding for r2. this seems to be a variant
of gcc bug #87733. further changes should be made later if a better
workaround is found, but this one has been working since 2012. it
seems this issue was encountered but misidentified then, when it
inspired commit 4221f154ff29ab0d6be1e7beaa5ea2d1731bc58e.
this extends commit 5a105f19b5aae79dd302899e634b6b18b3dcd0d6, removing
timer[fd]_settime and timer[fd]_gettime. the timerfd ones are likely
to have been used in software that started using them before it could
rely on libc exposing functions.
this extends commit 5a105f19b5aae79dd302899e634b6b18b3dcd0d6, removing
clock_settime, clock_getres, clock_nanosleep, and settimeofday.
some nontrivial number of applications have historically performed
direct syscalls for these operations rather than using the public
functions. such usage is invalid now that time_t is 64-bit and these
syscalls no longer match the types they are used with, and it was
already harmful before (by suppressing use of vdso).
since syscall() has no type safety, incorrect usage of these syscalls
can't be caught at compile-time. so, without manually inspecting or
running additional tools to check sources, the risk of such errors
slipping through is high.
this patch renames the syscalls on 32-bit archs to clock_gettime32 and
gettimeofday_time32, so that applications using the original names
will fail to build without being fixed.
note that there are a number of other syscalls that may also be unsafe
to use directly after the time64 switchover, but (1) these are the
main two that seem to be in widespread use, and (2) most of the others
continue to have valid usage with a null timeval/timespec argument, as
the argument is an optional timeout or similar.
the syscall numbers were reserved in v5.3 but not wired up on mips, see
linux commit 0671c5b84e9e0a6d42d22da9b5d093787ac1c5f3
MIPS: Wire up clone3 syscall
linux commit 7615d9e1780e26e0178c93c55b73309a5dc093d7
arch: wire-up pidfd_open()
linux commit 32fcb426ec001cb6d5a4a195091a8486ea77e2df
pid: add pidfd_open()
now that all 32-bit archs have 64-bit time_t (and suseconds_t), the
arch-provided _Int64 macro (long or long long, as appropriate) can be
used to define them, and arch-specific definitions are no longer
now that all 32-bit archs have 64-bit time types, the values for the
time-related socket option macros can be treated as universal for
32-bit archs. the sys/socket.h mechanism for this predates
arch/generic and is instead in the top-level header.
x32, which does not use the new time64 values of the macros, already
has its own overrides, so this commit does not affect it.
this commit preserves ABI fully for existing interface boundaries
between libc and libc consumers (applications or libraries), by
retaining existing symbol names for the legacy 32-bit interfaces and
redirecting sources compiled against the new headers to alternate
symbol names. this does not necessarily, however, preserve the
pairwise ABI of libc consumers with one another; where they use
time_t-derived types in their interfaces with one another, it may be
necessary to synchronize updates with each other.
the intent is that ABI resulting from this commit already be stable
and permanent, but it will not be officially so until a release is
made. changes to some header-defined types that do not play any role
in the ABI between libc and its consumers may still be subject to
mechanically, the changes made by this commit for each 32-bit arch are
- _REDIR_TIME64 is defined to activate the symbol redirections in
- COMPAT_SRC_DIRS is defined in arch.mak to activate build of ABI
compat shims to serve as definitions for the original symbol names
- time_t and suseconds_t definitions are changed to long long (64-bit)
- IPC_STAT definition is changed to add the IPC_TIME64 bit (0x100),
triggering conversion of semid_ds, shmid_ds, and msqid_ds split
low/high time bits into new time_t members
- structs semid_ds, shmid_ds, msqid_ds, and stat are modified to add
new 64-bit time_t/timespec members at the end, maintaining existing
layout of other members.
- socket options (SO_*) and ioctl (sockios) command macros are
redefined to use the kernel's "_NEW" values.
in addition, on archs where vdso clock_gettime is used, the
VDSO_CGT_SYM macro definition in syscall_arch.h is changed to use a
new time64 vdso function if available, and a new VDSO_CGT32_SYM macro
is added for use as fallback on kernels lacking time64.
these structures can now be defined generically in terms of endianness
and long size. previously, the 32-bit archs all shared a common
definition from the generic bits header, and each 64-bit arch had to
repeat the 64-bit version, with endian conditionals if the arch had
variants of each endianness.
I would prefer getting rid of the preprocessor conditionals for
padding and instead using unnamed bitfield members, like commit
9b2921bea1d5017832e1b45d1fd64220047a9802 did for struct timespec.
however, at present sendmsg, recvmsg, and recvmmsg need access to the
padding members by name to zero them. this could perhaps be cleaned up
in the future.
policy has long been that these definitions are purely a function of
whether long/pointer is 32- or 64-bit, and that they are not allowed
to vary per-arch. move the definition to the shared alltypes.h.in
fragment, using integer constant expressions in terms of sizeof to
vary the array dimensions appropriately. I'm not sure whether this is
more or less ugly than using preprocessor conditionals and two sets of
definitions here, but either way is a lot less ugly than repeating the
same thing for every arch.
LLONG_MAX is uniform for all archs we support and plenty of header and
code level logic assumes it is, so it does not make sense for limits.h
bits mechanism to pretend it's variable.
LONG_BIT can be defined in terms of LONG_MAX; there's no reason to put
it in bits.
by moving LONG_MAX definition to __LONG_MAX in alltypes.h and moving
LLONG_MAX out of bits, there are now no plain-C limits that are
defined in the bits header, so the bits header only needs to be
included in the POSIX or extended profiles. this allows the feature
test macro logic to be removed from the bits header, facilitating a
long-term goal of getting such logic out of bits.
having __LONG_MAX in alltypes.h will allow further generalization of
archs without a constant PAGESIZE no longer need bits/limits.h at all.
building on commit 97d35a552ec5b6ddf7923dd2f9a8eb973526acea,
__BYTE_ORDER is now available wherever alltypes.h is included. since
reloc.h is only used from src/internal/dynlink.h, it can be assumed
that __BYTE_ORDER is exposed. reloc.h is not permitted to be included
in other contexts, and generally, like most arch headers, lacks
inclusion guards that would allow such usage. the mips64 version
mistakenly included such guards; they are removed for consistency.
this change is motivated by the intersection of several factors.
presently, despite being a nonstandard header, endian.h is exposing
the unprefixed byte order macros and functions only if _BSD_SOURCE or
_GNU_SOURCE is defined. this is to accommodate use of endian.h from
other headers, including bits headers, which need to define structure
layout in terms of endianness. with time64 switch-over, even more
headers will need to do this.
at the same time, the resolution of Austin Group issue 162 makes
endian.h a standard header for POSIX-future, requiring that it expose
the unprefixed macros and the functions even in standards-conforming
profiles. changes to meet this new requirement would break existing
internal usage of endian.h by causing it to violate namespace where
instead, have the arch's alltypes.h define __BYTE_ORDER, either as a
fixed constant or depending on the right arch-specific predefined
macros for determining endianness. explicit literals 1234 and 4321 are
used instead of __LITTLE_ENDIAN and __BIG_ENDIAN so that there's no
danger of getting the wrong result if a macro is undefined and
implicitly evaluates to 0 at the preprocessor level.
the powerpc (32-bit) bits/endian.h being removed had logic for varying
endianness, but our powerpc arch has never supported that and has
always been big-endian-only. this logic is not carried over to the new
__BYTE_ORDER definition in alltypes.h.
now that commit f7f1079796abc6f97c69521d2334e9c7d3945dd8 removed the
legacy i386 conditional definition, va_list is in no way
arch-specific, and has no reason to be in the future. move it to the
shared part of alltypes.h.in
mips r6 (an incompatible isa from traditional mips) removes the hi and
lo registers used for mul/div results. older gcc versions accepted
them in the clobber list for asm, but their presence is incorrect and
breaks on later versions.
in the process of fixing this, the clobber list for 32-bit mips
syscalls has been deduplicated via a macro like on mips64 and n32.
new mount api syscalls were added, same numers on all targets, see
linux commit a07b20004793d8926f78d63eb5980559f7813404
vfs: syscall: Add open_tree(2) to reference or clone a mount
linux commit 2db154b3ea8e14b04fee23e3fdfd5e9d17fbc6ae
vfs: syscall: Add move_mount(2) to move mounts around
linux commit 24dcb3d90a1f67fe08c68a004af37df059d74005
vfs: syscall: Add fsopen() to prepare for superblock creation
linux commit ecdab150fddb42fe6a739335257949220033b782
vfs: syscall: Add fsconfig() for configuring and managing a context
linux commit 93766fbd2696c2c4453dd8e1070977e9cd4e6b6d
vfs: syscall: Add fsmount() to create a mount for a superblock
linux commit cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb
vfs: syscall: Add fspick() to select a superblock for reconfiguration
linux commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]
linux commit d8076bdb56af5e5918376cd1573a6b0007fc1a89
uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
otherwise, 32-bit archs that could otherwise share the generic
bits/ipc.h would need to duplicate the struct ipc_perm definition,
obscuring the fact that it's the same. sysvipc is not widely used and
these headers are not commonly included, so there is no performance
gain to be had by limiting the number of indirectly included files
files with the existing time32 definition of IPC_STAT are added to all
current 32-bit archs now, so that when it's changed the change will
show up as a change rather than addition of a new file where it's less
obvious that the value is changing vs the generic one that was used
without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
various padding fields in the generic bits/sem.h were defined in terms
of time_t as a cheap hack standing in for "kernel long", to allow x32
to use the generic version of the file. this was a really bad idea, as
it ended up getting copied into lots of arch-specific versions of the
bits file, and is a blocker to changing time_t to 64-bit on 32-bit
this commit adds an x32-specific version of the header, and changes
padding type back from time_t to long (currently the same type on all
archs but x32) in the generic header and all the others the hack got
now that we have a kstat structure decoupled from the public struct
stat, we can just use the broken kernel structures directly and let
the code in fstatat do the translation.
presently, all archs/ABIs have struct stat matching the kernel
stat type, except mips/mipsn32/mips64 which do conversion hacks in
syscall_arch.h to work around bugs in the kernel type. this patch
completely decouples them and adds a translation step to the success
path of fstatat. at present, this is just a gratuitous copying, but it
opens up multiple possibilities for future support for 64-bit time_t
on 32-bit archs and for cleaned-up/unified ABIs.
for clarity, the mips hacks are not yet removed in this commit, so the
mips kstat structs still correspond to the output of the hacks in
their syscall_arch.h files, not the raw kernel type. a subsequent
commit will fix this.
these were overlooked during review. bits headers are not allowed to
pull in additional headers (note: that rule is currently broken in
other places but just for endian.h). string.h has no place here
anyway, and including bits/alltypes.h without defining macros to
request types from it is a nop.
ever since inline syscalls were added for (o32) mips in commit
328810d32524e4928fec50b57e37e1bf330b2e40, the asm has nonsensically
loaded the syscall number, rather than taking $2 as an input
constraint to let the compiler load it. commit
cfc09b1ecf0c6981494fd73dffe234416f66af10 improved on this somewhat by
allowing a constant syscall number to propagate into an immediate, but
missed that the whole operation made no sense.
now, only $4, $5, $6, $8, and $9 are potential input-only registers.
$2 is always input and output, and $7 is both when it's an argument,
otherwise output-only. previously, $7 was treated as an input (with a
"1" constraint matching its output position) even when it was not an
input, which was arguably undefined behavior (asm input from
indeterminate value). this is corrected.
this patch is not purely non-functional changes, since before, $8 and
$9 were wrongly in the clobberlist for syscalls with fewer than 5 or 6
arguments. of course it's impossible for syscalls to have different
clobbers depending on their number of arguments. the clobberlist for
the recently-added 5- and 6-argument forms was correct, and for the 0-
to 4-argument forms was erroneously copied from the mips o32 ABI where
the additional arguments had to be passed on the stack.
in making this change, I reviewed the kernel sources, and $8 and $9
are always saved for 64-bit kernels since they're part of the syscall
argument list for n32 and n64 ABIs.
Commit 3517d74a5e04a377192d1f4882ad6c8dc22ce69a changed the token in
sys/ioctl.h from 0x01 to 1, so bits/termios.h no longer matches. Revert
the bits/termios.h change to keep the headers in sync.
This reverts commit 9eda4dc69c33852c97c6f69176bf45ffc80b522f.
syscall numbers are now synced up across targets (starting from 403 the
numbers are the same on all targets other than an arch specific offset)
IPC syscalls sem*, shm*, msg* got added where they were missing (except
for semop: only semtimedop got added), the new semctl, shmctl, msgctl
imply IPC_64, see
linux commit 0d6040d4681735dfc47565de288525de405a5c99
arch: add split IPC system calls where needed
new 64bit time_t syscall variants got added on 32bit targets, see
linux commit 48166e6ea47d23984f0b481ca199250e1ce0730a
y2038: add 64-bit time_t syscalls to all 32-bit architectures
new async io syscalls got added, see
linux commit 2b188cc1bb857a9d4701ae59aa7768b5124e262e
Add io_uring IO interface
linux commit edafccee56ff31678a091ddb7219aba9b28bc3cb
io_uring: add support for pre-mapped user IO buffers
a new syscall got added that uses the fd of /proc/<pid> as a stable
handle for processes: allows sending signals without pid reuse issues,
intended to eventually replace rt_sigqueueinfo, kill, tgkill and
linux commit 3eb39f47934f9d5a3027fe00d906a45fe3a15fad
signal: add pidfd_send_signal() syscall
on some targets (arm, m68k, s390x, sh) some previously missing syscall
numbers got added as well.
commit 1bcdaeee6e659f1d856717c9aa562a068f2f3bd4 introduced the
n32 and n64 ABIs add new argument registers vs o32, so that passing on
the stack is not necessary, so it's not clear why the 5- and
6-argument versions were special-cased to begin with; it seems to have
been pattern-copying from arch/mips (o32).
i've treated the new argument registers like the first 4 in terms of
clobber status (non-clobbered). hopefully this is correct.
io_pgetevents is new in linux commit
rseq is new in linux commit
this will allow the compiler to cache and reuse the result, meaning we
no longer have to take care not to load it more than once for the sake
of archs where the load may be expensive.
depends on commit 1c84c99913bf1cd47b866ed31e665848a0da84a2 for
correctness, since otherwise the compiler could hoist loads during
stage 3 of dynamic linking before the initial thread-pointer setup.
these were overlooked in the declarations overhaul work because they
are not properly declared, and the current framework even allows their
declared types to vary by arch. at some point this should be cleaned
up, but I'm not sure what the right way would be.
this cleans up what had become widespread direct inline use of "GNU C"
style attributes directly in the source, and lowers the barrier to
increased use of hidden visibility, which will be useful to recovering
some of the efficiency lost when the protected visibility hack was
dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially
on archs where the PLT ABI is costly.
sys/ptrace.h is target specific, use bits/ptrace.h to add target
specific macro definitions.
these macros are kept in the generic sys/ptrace.h even though some
targets don't support them:
so no macro definition got removed in this patch on any target. only
s390x has a numerically conflicting macro definition (PTRACE_SINGLEBLOCK).
the PT_ aliases follow glibc headers, otherwise the definitions come
from linux uapi headers except ones that are skipped in glibc and
there is no real kernel support (s390x PTRACE_*_AREA) or need special
type definitions (mips PTRACE_*_WATCH_*) or only relevant for linux
2.4 compatibility (PTRACE_OLDSETOPTIONS).
adapted from patch by Matthias Schiffer.
new in linux commit 256211f2b0b251e532d1899b115e374feb16fa7a
In TLS variant I the TLS is above TP (or above a fixed offset from TP)
but on some targets there is a reserved gap above TP before TLS starts.
This matters for the local-exec tls access model when the offsets of
TLS variables from the TP are hard coded by the linker into the
executable, so the libc must compute these offsets the same way as the
linker. The tls offset of the main module has to be
If there is no TLS in the main module then the gap can be ignored
since musl does not use it and the tls access models of shared
libraries are not affected.
The previous setup only worked if (tls_align & -GAP_ABOVE_TP) == 0
(i.e. TLS did not require large alignment) because the gap was
treated as a fixed offset from TP. Now the TP points at the end
of the pthread struct (which is aligned) and there is a gap above
it (which may also need alignment).
The fix required changing TP_ADJ and __pthread_self on affected
targets (aarch64, arm and sh) and in the tlsdesc asm the offset to
access the dtv changed too.
the output delay features (NL*, CR*, TAB*, BS*, and VT*) are
XSI-shaded. VT* is in the V* namespace reservation but the rest need
to be suppressed in base POSIX namespace.
unfortunately this change introduces feature test macro checks into
another bits header. at some point these checks should be simplified
by having features.h handle the "FTM X implies Y" relationships.