path: root/src/unistd
Age | Commit message | Author | Lines
2024-03-14 | fix pwrite/pwritev handling of O_APPEND files | Rich Felker | -1/+20
POSIX requires pwrite to honor the explicit file offset where the write should take place even if the file was opened as O_APPEND. however, linux historically defined the pwrite syscall family as honoring O_APPEND. this cannot be changed on the kernel side due to stability policy, but the addition of the pwritev2 syscall with a flags argument opened the door to fixing it, and linux commit 73fa7547c70b32cc69685f79be31135797734eb6 adds the RWF_NOAPPEND flag that lets us request a write honoring the file offset argument. this patch changes the pwrite function to first attempt using the pwritev2 syscall with RWF_NOAPPEND, falling back to using the old pwrite syscall only after checking that O_APPEND is not set for the open file. if O_APPEND is set, the operation fails with EOPNOTSUPP, reflecting that the kernel does not support the correct behavior. this is an extended error case needed to avoid the wrong behavior that happened before (writing the data at the wrong location), and is aligned with the spirit of the POSIX requirement that "An attempt to perform a pwrite() on a file that is incapable of seeking shall result in an error." since the pwritev2 syscall interprets the offset of -1 as a request to write at the current file offset, it is mapped to a different negative value that will produce the expected error. pwritev, though not governed by POSIX at this time, is adjusted to match pwrite in honoring the offset.
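The fallback half of this change can be sketched in user space. This is a minimal illustration under stated assumptions, not the patched function: the name `pwrite_fallback_sketch` is hypothetical, the first attempt with pwritev2 and RWF_NOAPPEND is omitted, and the ordinary libc `pwrite` stands in for the old syscall.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of the fallback path only: the pwritev2 attempt
 * with RWF_NOAPPEND that would come first is not shown here. */
ssize_t pwrite_fallback_sketch(int fd, const void *buf, size_t size, off_t ofs)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0) return -1;            /* errno set by fcntl, e.g. EBADF */
    if (fl & O_APPEND) {
        /* The old syscall would ignore ofs and append, so refuse. */
        errno = EOPNOTSUPP;
        return -1;
    }
    return pwrite(fd, buf, size, ofs);
}
```

Refusing with an error is the conservative choice here: a caller sees a failure it can handle instead of data silently landing at the end of the file.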
2024-02-22 | add framework to support archs without a native wait4 syscall | Rich Felker | -1/+1
this commit should make no codegen change for existing archs, but is a prerequisite for new archs including riscv32. the wait4 emulation backend provides both cancellable and non-cancellable variants because waitpid is required to be a cancellation point, but all of our other uses are not, and most of them cannot be. based on patch by Stefan O'Rear.
2023-11-06 | ensure valid setxid return value in an unexpected error case | Markus Wichmann | -1/+1
If __synccall() fails to capture all threads because tkill fails for some reason other than EAGAIN, then the callback given will never be executed, so nothing will ever overwrite the initial value. So that is the value that will be returned from the function. The previous setting of 1 is not a valid value for setuid() et al. to return. I chose -EAGAIN since I don't know the reason the synccall failed ahead of time, but EAGAIN is a specified error code for a possibly temporary failure in setuid().
2023-02-28 | dup3: don't set FD_CLOEXEC on failure on kernels without dup3 syscall | Rich Felker | -1/+2
this is the best-effort fallback path for kernels that can't actually support the dup3 functionality. it was setting the FD_CLOEXEC flag on the target fd (new) even if the dup2 operation failed. normally that shouldn't happen under correct usage, but it's possible if the source fd is not open or intentionally invalid (e.g. -1).
2023-02-28 | fix dup3 ignoring all flags but O_CLOEXEC on archs with SYS_dup2 syscall | Rich Felker | -1/+2
our dup3 code wrongly skipped directly to making the SYS_dup2 syscall whenever the O_CLOEXEC bit of flags was not set. this is incorrect if any new flags are ever added, as it would silently ignore them rather than failing with an error. archs which lack SYS_dup2 were unaffected. adjust the logic so that SYS_dup3 is attempted whenever flags is nonzero, and explicitly fail with EINVAL if SYS_dup3 is unavailable and there are any unknown flags.
2023-02-28 | fix pipe2 silently ignoring unknown flags on old kernels | Rich Felker | -0/+1
kernels using the fallback have an inherent close-on-exec race condition and as such support for them is only best-effort anyway. however, ignoring potential new flags is still very bad behavior. instead, fail with EINVAL.
2022-10-19 | remove LFS64 symbol aliases; replace with dynamic linker remapping | Rich Felker | -15/+0
originally the namespace-infringing "large file support" interfaces were included as part of glibc-ABI-compat, with the intent that they not be used for linking, since our off_t is and always has been unconditionally 64-bit and since we usually do not aim to support nonstandard interfaces when there is an equivalent standard interface. unfortunately, having the symbols present and available for linking caused configure scripts to detect them and attempt to use them without declarations, producing all the expected ill effects that entails. as a result, commit 2dd8d5e1b8ba1118ff1782e96545cb8a2318592c was made to prevent this, using macros to redirect the LFS64 names to the standard names, conditional on _GNU_SOURCE or _LARGEFILE64_SOURCE. however, this has turned out to be a source of further problems, especially since g++ defines _GNU_SOURCE by default. in particular, the presence of these names as macros breaks a lot of valid code. this commit removes all the LFS64 symbols and replaces them with a mechanism in the dynamic linker symbol lookup failure path to retry with the spurious "64" removed from the symbol name. in the future, if/when the rest of glibc-ABI-compat is moved out of libc, this can be removed.
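To make the renaming step concrete, here is a simplified sketch of removing the spurious "64" from a symbol name. The helper `strip_lfs64` is hypothetical and is not the dynamic linker's code: it only handles a "64" at the end of the name or directly before a final "_r", and the real lookup path may well restrict the retry to known LFS64 names.

```c
#include <string.h>

/* Hypothetical helper: write name into out with a trailing "64" (or a
 * "64" directly before a final "_r") removed. Returns 1 if a "64" was
 * removed, 0 if the name does not match or out is too small. */
int strip_lfs64(const char *name, char *out, size_t outsize)
{
    size_t len = strlen(name);
    size_t tail = 0;                  /* length of suffix kept after "64" */
    if (len >= 2 && !strcmp(name + len - 2, "_r")) tail = 2;
    /* Need at least one character before "64", and room in out. */
    if (len < tail + 3 || len >= outsize) return 0;
    const char *p = name + len - tail - 2;
    if (p[0] != '6' || p[1] != '4') return 0;
    size_t head = len - tail - 2;
    memcpy(out, name, head);
    memcpy(out + head, name + len - tail, tail + 1);  /* copies the NUL too */
    return 1;
}
```

Under this sketch, a failed lookup of `open64` would be retried as `open`, and `readdir64_r` as `readdir_r`.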
2022-03-08 | nice: return EPERM instead of EACCES | Alexey Kodanev | -1/+8
To comply with POSIX, change errno from EACCES to EPERM when the caller did not have the required privilege.
2020-11-23 | work around linux bug in readlink syscall with zero buffer size | Rich Felker | -3/+17
linux fails with EINVAL when a zero buffer size is passed to the syscall. this is non-conforming because POSIX already defines EINVAL with a significantly different meaning: the target is not a symlink. since the request is semantically valid, patch it up by using a dummy buffer of length one, and truncating the return value to zero if it succeeds.
2020-10-27 | refactor setxid return path to use __syscall_ret | Rich Felker | -14/+9
this avoids some spurious negation and duplicated errno logic, and brings the code in line with the newly-added multithreaded setgroups.
2020-10-14 | move aio implementation details to a proper internal header | Rich Felker | -0/+1
also fix the lack of declaration (and thus hidden visibility) in __stdio_close's use of __aio_close.
2020-09-09 | use new SYS_faccessat2 syscall to implement faccessat with flags | Rich Felker | -3/+8
commit 0a05eace163cee9b08571d2ff9d90f5e82d9c228 implemented AT_EACCESS for faccessat with a horrible hack, creating a child process to switch uid/gid and perform the access probe without making potentially irreversible changes to the caller's credentials. this was due to the syscall lacking a flags argument. linux 5.8 introduced a new syscall, SYS_faccessat2, fixing this deficiency. use it if any flags are passed, and fall back to the old strategy on ENOSYS. continue using the old syscall when there are no flags.
2020-08-30 | clean up overinclusion in files using TIOCGWINSZ | Rich Felker | -1/+0
now that struct winsize is available via sys/ioctl.h once again, including termios.h is not needed.
2020-08-24 | add tcgetwinsize and tcsetwinsize functions, move struct winsize | Rich Felker | -0/+1
these have been adopted for a future issue of POSIX as the outcome of Austin Group issue 1151, and are simply functions performing the roles of the historical ioctls. since struct winsize is being standardized along with them, its definition is moved to the appropriate header. there is some chance this will break source files that expect struct winsize to be defined by sys/ioctl.h without including termios.h. if this happens, further changes will be needed to have sys/ioctl.h expose it too.
2019-08-05 | use setitimer function rather than syscall to implement alarm | Rich Felker | -3/+3
otherwise alarm will break on 32-bit archs when time_t is changed to 64-bit. a second itimerval object is introduced for retrieving the old value, since the setitimer function has restrict-qualified arguments.
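A minimal sketch of alarm built on the setitimer function, with the separate object for the old value that the restrict qualifiers call for. The name `alarm_sketch` is hypothetical.

```c
#include <sys/time.h>

/* Hypothetical sketch: alarm via setitimer, with distinct new/old objects
 * because setitimer's pointer arguments are restrict-qualified. */
unsigned alarm_sketch(unsigned seconds)
{
    struct itimerval it = { .it_value.tv_sec = seconds };
    struct itimerval old = { 0 };
    setitimer(ITIMER_REAL, &it, &old);
    /* Round a partially elapsed second up, so a pending alarm never
     * reports zero seconds remaining. */
    return old.it_value.tv_sec + !!old.it_value.tv_usec;
}
```

Calling `alarm_sketch(0)` cancels a pending timer and returns the time that was left on it, matching the usual alarm contract.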
2019-07-16 | fix broken lseek on x32 (x86_64/ILP32) with offsets larger than LONG_MAX | Rich Felker | -0/+15
this is analogous to commit 918c5fa0fc656e49b1ab9ce47183a23e3a36bc00 which fixed the corresponding issue for mips n32.
2019-07-16 | fix broken lseek on mipsn32 with offsets larger than LONG_MAX | Rich Felker | -0/+20
mips n32 has 32-bit long, and generally uses long syscall arguments and return values, but provides only SYS_lseek, not SYS_llseek. we have some framework (syscall_arg_t, added for x32) to make syscall arguments 64-bit in such a setting, but it's not clear whether this could match the sign-extension semantics needed for 32-bit args to all the other syscalls, and we don't have any existing mechanism to allow the return value of syscalls to be something other than long. instead, just provide a custom mipsn32 version of the lseek function doing its own syscall asm with 64-bit arguments. as a result of commit 03919b26ed41c31876db41f7cee076ced4513fad, stdio will also get the new code, fixing fseeko/ftello too.
2019-07-16 | use namespace-safe __lseek for __stdio_seek instead of direct syscall | Rich Felker | -2/+3
this probably saves a few bytes, avoids duplicating the clunky lseek/_llseek syscall convention in two places, and sets the stage for fixing broken seeks on x32 and mipsn32.
2019-07-10 | fix restrict violations in internal use of several functions | Samuel Holland | -3/+3
The old/new parameters to pthread_sigmask, sigprocmask, and setitimer are marked restrict, so passing the same address to both is prohibited. Modify callers of these functions to use a separate object for each argument.
2019-03-21 | support archs with no renameat syscall, only renameat2 | Drew DeVault | -0/+4
2018-09-15 | improve error handling of ttyname_r and isatty | Benjamin Peterson | -2/+6
POSIX allows ttyname(_r) and isatty to return EBADF if the passed file descriptor is invalid. maintainer's note: these are optional ("may fail") errors, but it's non-conforming for ttyname_r to return ENOTTY when it failed for a different reason.
2018-09-12 | remove spurious inclusion of libc.h for LFS64 ABI aliases | Rich Felker | -14/+7
the LFS64 macro was not self-documenting and barely saved any characters. simply use weak_alias directly so that it's clear what's being done, and doesn't depend on a header to provide a strange macro.
2018-09-12 | reduce spurious inclusion of libc.h | Rich Felker | -9/+0
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12 | move and deduplicate declarations of __procfdname to make it checkable | Rich Felker | -6/+1
syscall.h was chosen as the header to declare it, since its intended usage is alongside syscalls as a fallback for operations the direct syscall does not support.
2018-05-01 | avoid excessive stack usage in getcwd | Rich Felker | -2/+2
to support the GNU extension of allocating a buffer for getcwd's result when a null pointer is passed without incurring a link dependency on free, we use a PATH_MAX-sized buffer on the stack and only duplicate it to allocated storage after the operation succeeds. unfortunately this imposed excessive stack usage on all callers, including those not making use of the GNU extension. instead, use a VLA to make stack allocation conditional.
2018-04-19 | fix out of bounds write for zero length buffer in gethostname | Marc André Tanner | -1/+1
2018-04-17 | fix return value of nice function | Rich Felker | -5/+9
the Linux SYS_nice syscall is unusable because it does not return the newly set priority. always use SYS_setpriority. also avoid overflows in addition of inc by handling large inc values directly without examining the old nice value.
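A sketch of nice on top of getpriority and setpriority, with the overflow-safe handling of large inc values split into a pure helper. The names `nice_target` and `nice_sketch` are hypothetical, NZERO is defined locally as 20 in case the headers do not expose it, and the EACCES-to-EPERM translation reflects the 2022-03-08 entry in this log.

```c
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>

#ifndef NZERO
#define NZERO 20   /* assumption: the usual Linux value */
#endif

/* Hypothetical helper: compute the clamped target priority. The old
 * value is added only when inc is small enough that it can matter,
 * which also rules out signed overflow in the addition. */
int nice_target(int inc, int old)
{
    int prio = inc;
    if (inc > -2*NZERO && inc < 2*NZERO) prio += old;
    if (prio > NZERO-1) prio = NZERO-1;
    if (prio < -NZERO) prio = -NZERO;
    return prio;
}

int nice_sketch(int inc)
{
    int old = 0;
    if (inc > -2*NZERO && inc < 2*NZERO) old = getpriority(PRIO_PROCESS, 0);
    int prio = nice_target(inc, old);
    if (setpriority(PRIO_PROCESS, 0, prio)) {
        if (errno == EACCES) errno = EPERM;   /* POSIX-specified error */
        return -1;
    }
    return prio;   /* the newly set nice value */
}
```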
2018-02-07 | make getcwd fail if it cannot obtain an absolute path | Dmitry V. Levin | -1/+7
Currently getcwd(3) can succeed without returning an absolute path because the underlying getcwd syscall, starting with linux commit v2.6.36-rc1~96^2~2, may succeed without returning an absolute path. This is a conformance issue because "The getcwd() function shall place an absolute pathname of the current working directory in the array pointed to by buf, and return buf". Fix this by checking the path returned by syscall and failing with ENOENT if the path is not absolute. The error code is chosen for consistency with the case when the current directory is unlinked. A similar issue was fixed in glibc recently, see https://sourceware.org/bugzilla/show_bug.cgi?id=22679
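The check itself is small. In this sketch, `is_absolute_path` and `getcwd_checked` are hypothetical names, the libc `getcwd` stands in for the raw syscall, and the caller is assumed to pass a non-null buffer.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: an absolute pathname starts with a slash. */
int is_absolute_path(const char *path)
{
    return path[0] == '/';
}

/* Hypothetical sketch: reject a successful but non-absolute result.
 * Assumes buf is non-null. */
char *getcwd_checked(char *buf, size_t size)
{
    char *r = getcwd(buf, size);
    if (r && !is_absolute_path(r)) {
        errno = ENOENT;   /* same error as an unlinked working directory */
        return 0;
    }
    return r;
}
```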
2018-02-05 | revert regression in faccessat AT_EACCESS robustness | Rich Felker | -21/+14
commit f9fb20b42da0e755d93de229a5a737d79a0e8f60 switched from using a pipe for the result to conveying it via the child process exit status. Alexander Monakov pointed out that the latter could fail if the application is not expecting faccessat to produce a child and performs a wait operation with __WCLONE or __WALL, and that it is not clear whether it's guaranteed to work when SIGCHLD's disposition has been set to SIG_IGN. in addition, that commit introduced a bug that caused EACCES to be produced instead of EBUSY due to an exit path that was overlooked when the error channel was changed, and introduced a spurious retry loop around the wait operation.
2017-05-27 | fix fchown fallback on arches without chown(2) | Samuel Holland | -1/+1
The flags argument was missing, causing uninitialized data to be passed to fchownat(2). The correct value of flags should match the fallback for chown(3).
2017-04-21 | make ttyname[_r] return ENODEV rather than ENOENT | Rich Felker | -1/+1
commit 0a950dcf15bb9f7274c804dca490e9e20e475f3e added checking that the pathname a tty device was opened with actually matches the device, which can fail to hold when a container inherits a tty from outside the container. the error code added at the time was ENOENT; however, discussions between affected applications and glibc developers resulted in glibc adopting ENODEV as the error for this condition, and this has now been documented in the man pages project as well. adopt the same error code for consistency. patch by Christian Brauner.
2016-08-30 | verify that ttyname refers to the same file as the fd | Szabolcs Nagy | -4/+11
linux containers use a separate mount namespace, so the /proc symlink might not point to the right device if the fd was opened in the parent namespace. in this case, return ENOENT.
2016-08-11 | fix pread/pwrite syscall calling convention on sh | Rich Felker | -2/+2
despite sh not generally using register-pair alignment for 64-bit syscall arguments, there are arch-specific versions of the syscall entry points for pread and pwrite which include a dummy argument for alignment before the 64-bit offset argument.
2016-04-18 | add mips n32 port (ILP32 ABI for mips64) | Rich Felker | -0/+19
based on patch submitted by Jaydeep Patil, with minor changes.
2016-03-06 | add mips64 port | Rich Felker | -0/+19
patch by Mahesh Bodapati and Jaydeep Patil of Imagination Technologies.
2015-06-16 | switch to using trap number 31 for syscalls on sh | Rich Felker | -1/+1
nominally the low bits of the trap number on sh are the number of syscall arguments, but they have never been used by the kernel, and some code making syscalls does not even know the number of arguments and needs to pass an arbitrary high number anyway. sh3/sh4 traditionally used the trap range 16-31 for syscalls, but part of this range overlapped with hardware exceptions/interrupts on sh2 hardware, so an incompatible range 32-47 was chosen for sh2. using trap number 31 everywhere, since it's in the existing sh3/sh4 range and does not conflict with sh2 hardware, is a proposed unification of the kernel syscall convention that will allow binaries to be shared between sh2 and sh3/sh4. if this is not accepted into the kernel, we can refit the sh2 target with runtime selection mechanisms for the trap number, but doing so would be invasive and would entail non-trivial overhead.
2015-02-23 | fix possible isatty false positives and unwanted device state changes | Rich Felker | -3/+4
the equivalent checks for newly opened stdio output streams, used to determine buffering mode, are also fixed. on most archs, the TCGETS ioctl command shares a value with SNDCTL_TMR_TIMEBASE, part of the OSS sound API which was apparently used with certain MIDI and timer devices. for file descriptors referring to such a device, TCGETS will not fail with ENOTTY as expected; it may produce a different error, or may succeed, and if it succeeds it changes the mode of the device. while it's unlikely that such devices are in use, this is in principle very harmful behavior for an operation which is supposed to do nothing but query whether the fd refers to a tty. TIOCGWINSZ, used to query logical window size for a terminal, was chosen as an alternate ioctl to perform the isatty check. it does not share a value with any other ioctl commands, and it succeeds on any tty device. this change also cleans up strace output to be less ugly and misleading.
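A sketch of an isatty check built on TIOCGWINSZ instead of TCGETS. The name `isatty_sketch` is hypothetical, and the errno handling (keep EBADF, report anything else as ENOTTY) follows the 2018-09-15 entry in this log rather than this commit.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical sketch: query the window size purely as a side-effect-free
 * probe; the result in wsz is not used. */
int isatty_sketch(int fd)
{
    struct winsize wsz;
    if (ioctl(fd, TIOCGWINSZ, &wsz) == 0) return 1;
    /* An invalid descriptor keeps EBADF; everything else is "not a tty". */
    if (errno != EBADF) errno = ENOTTY;
    return 0;
}
```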
2015-02-20 | map interruption of close by signal to success rather than EINPROGRESS | Rich Felker | -1/+1
commit 82dc1e2e783815e00a90cd3f681436a80d54a314 addressed the resolution of Austin Group issue 529, which requires close to leave the fd open when failing with EINTR, by returning the newly defined error code EINPROGRESS. this turns out to be a bad idea, though, since legacy applications not aware of the new specification are likely to interpret any error from close except EINTR as a hard failure.
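The resulting behavior can be sketched as a thin wrapper. `close_sketch` is a hypothetical name, and the libc `close` stands in for the raw syscall.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch: report an interrupted close as success, so
 * callers neither retry on a descriptor that is already gone nor
 * treat the interruption as a hard failure. */
int close_sketch(int fd)
{
    int r = close(fd);
    if (r < 0 && errno == EINTR) r = 0;
    return r;
}
```

Other errors, such as EBADF for a descriptor that was never open, pass through unchanged.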
2015-02-13 | overhaul aio implementation for correctness | Rich Felker | -0/+8
previously, aio operations were not tracked by file descriptor; each operation was completely independent. this resulted in non-conforming behavior for non-seekable/append-mode writes (which are required to be ordered) and made it impossible to implement aio_cancel, which in turn made closing file descriptors with outstanding aio operations unsafe. the new implementation is significantly heavier (roughly twice the size, and seems to be slightly slower) and presently aims mainly at correctness, not performance. most of the public interfaces have been moved into a single file, aio.c, because there is little benefit to be had from splitting them. whenever any aio functions are used, aio_cancel and the internal queue lifetime management and fd-to-queue mapping code must be linked, and these functions make up the bulk of the code size. the close function's interaction with aio is implemented with weak alias magic, to avoid pulling in heavy aio cancellation code in programs that don't use aio, and the expensive cancellation path (which includes signal blocking) is optimized out when there are no active aio queues.
2015-01-30 | make fsync, fdatasync, and msync cancellation points | Trutz Behn | -2/+2
these are mandatory cancellation points per POSIX, so their omission was a conformance bug.
2015-01-15 | for multithreaded set*id/setrlimit, handle case where callback does not run | Rich Felker | -1/+1
in the current version of __synccall, the callback is always run, so failure to handle this case did not matter. however, the upcoming overhaul of __synccall will have failure cases, in which case the callback does not run and errno is already set. the changes being committed now are in preparation for that.
2015-01-12 | remove rlimit hacks from multi-threaded set*id() code | Rich Felker | -23/+15
the code being removed was introduced to work around "partial failure" of multi-threaded set*id() operations, where some threads would succeed in changing their ids but an RLIMIT_NPROC setting would prevent the rest from succeeding, leaving the process in an inconsistent and dangerous state. however, the workaround code did not handle important usage cases like swapping real and effective uids then restoring their original values, and the wrongful kernel enforcement of RLIMIT_NPROC at setuid time was removed in Linux 3.1, making the workaround obsolete. since the partial failure still is dangerous on old kernels, and could in principle happen on post-fix kernels as well if set*id() syscalls fail for another spurious reason such as resource-related failures, new code is added to detect and forcibly kill the process if/when such a situation arises. future documentation releases should be updated to reflect that setting RLIMIT_NPROC to RLIM_INFINITY is necessary to avoid this forced-kill on old kernels. ideally, at some point the kernel will get proper multi-threaded set*id() syscalls capable of performing their actions atomically, and all of the userspace code to emulate them can be treated as a fallback for outdated kernels.
2015-01-12 | simplify ctermid | Rich Felker | -14/+2
opening /dev/tty then using ttyname_r on it does not produce a canonical terminal name; it simply yields "/dev/tty". it would be possible to make ctermid determine the actual controlling terminal device via field 7 of /proc/self/stat, but doing so would introduce a buffer overflow into applications built with L_ctermid==9, which glibc defines, adversely affecting the quality of ABI compat.
2014-05-29 | support linux kernel apis (new archs) with old syscalls removed | Rich Felker | -1/+69
such archs are expected to omit definitions of the SYS_* macros for syscalls their kernels lack from arch/$ARCH/bits/syscall.h. the preprocessor is then able to select an appropriate implementation for affected functions. two basic strategies are used on a case-by-case basis: where the old syscalls correspond to deprecated library-level functions, the deprecated functions have been converted to wrappers for the modern function, and the modern function has fallback code (omitted at the preprocessor level on new archs) to make use of the old syscalls if the new syscall fails with ENOSYS. this also improves functionality on older kernels and eliminates the incentive to program with deprecated library-level functions for the sake of compatibility with older kernels. in other situations where the old syscalls correspond to library-level functions which are not deprecated but merely lack some new features, such as the *at functions, the old syscalls are still used on archs which support them. this may change at some point in the future if or when fallback code is added to the new functions to make them usable (possibly with reduced functionality) on old kernels.
2014-02-27 | rename superh port to "sh" for consistency | Rich Felker | -0/+0
linux, gcc, etc. all use "sh" as the name for the superh arch. there was already some inconsistency internally in musl: the dynamic linker was searching for "ld-musl-sh.path" as its path file despite its own name being "ld-musl-superh.so.1". there was some sentiment in both directions as to how to resolve the inconsistency, but overall "sh" was favored.
2014-02-23 | superh port | Bobby Bingham | -0/+27
2013-12-19 | fix failure of fchmod, fstat, fchdir, and fchown to produce EBADF | Rich Felker | -2/+6
the workaround/fallback code for supporting O_PATH file descriptors when the kernel lacks support for performing these operations on them caused EBADF to get replaced by ENOENT (due to missing entry in /proc/self/fd). this is unlikely to affect real-world code (calls that might yield EBADF are generally unsafe, especially in library code) but it was breaking some test cases. the fix I've applied is something of a tradeoff: it adds one syscall to these operations on kernels where the workaround is needed. the alternative would be to catch ENOENT from the /proc lookup and translate it to EBADF, but I want to avoid doing that in the interest of not touching/depending on /proc at all in these functions as long as the kernel correctly supports the operations. this is following the general principle of isolating hacks to code paths that are taken on broken systems, and keeping the code for correct systems completely hack-free.
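The tradeoff described here (one extra syscall, and /proc touched only when the workaround is really needed) can be sketched for fchmod. `fchmod_sketch` is a hypothetical name, and libc wrappers stand in for the raw syscalls.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: try the direct operation first, and go through
 * /proc only if EBADF came from a descriptor that is actually open
 * (as with O_PATH on kernels lacking support). */
int fchmod_sketch(int fd, mode_t mode)
{
    if (fchmod(fd, mode) == 0) return 0;
    /* Any error other than EBADF is final. If F_GETFD fails as well,
     * the descriptor is genuinely invalid and errno is EBADF. */
    if (errno != EBADF || fcntl(fd, F_GETFD) < 0) return -1;
    char path[32];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return chmod(path, mode);
}
```

On a kernel that supports the operation directly, the first call succeeds or fails definitively and the /proc path is never reached.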
2013-12-12 | include cleanups: remove unused headers and add feature test macros | Szabolcs Nagy | -6/+1
2013-12-06 | add posix_close, accepted for inclusion in the next issue of POSIX | Rich Felker | -0/+6
this is purely a wrapper for close since Linux does not support EINTR semantics for the close syscall.
2013-11-01 | simplify faccessat AT_EACCESS path and eliminate resource dependence | Rich Felker | -14/+21
now that we're waiting for the exit status of the child process, the result can be conveyed in the exit status rather than via a pipe. since the error value might not fit in 7 bits, a table is used to translate possible meaningful error values to small integers.