summaryrefslogtreecommitdiff
path: root/arch/x86_64
AgeCommit message (Collapse)AuthorLines
2016-03-29fix regression disabling use of pause instruction for x86 a_spinRich Felker-1/+1
commits e24984efd5c6ac5ea8e6cb6cd914fa8435d458bc and 16b55298dc4b6a54d287d7494e04542667ef8861 inadvertently disabled the a_spin implementations for i386, x86_64, and x32 by defining a macro named a_pause instead of a_spin. this should not have caused any functional regression, but it inhibited cpu relaxation while spinning for locks. bug reported by George Kulakowski.
2016-03-19add copy_file_range syscall numbers from linux v4.5Szabolcs Nagy-0/+2
it was introduced for offloading copying between regular files in linux commit 29732938a6289a15e907da234d6692a2ead71855 (microblaze and sh does not yet have the syscall number.)
2016-03-18deduplicate bits/mman.hSzabolcs Nagy-59/+0
currently five targets use the same mman.h constants and the rest share most constants too, so move them to sys/mman.h before the bits/mman.h include where the differences can be corrected by redefinition of the macros. this fixes two minor bugs: POSIX_MADV_DONTNEED was wrong on most targets (it should be the same as MADV_DONTNEED), and sh defined the x86-only MAP_32BIT mmap flag.
2016-03-02add sched_getcpu vDSO supportNathan Zadoks-0/+2
This brings the call to an actually usable speed. Quick unscientific benchmark: 14ns : 102ns :: vDSO : syscall
2016-01-27deduplicate the bulk of the arch bits headersRich Felker-321/+0
all bits headers that were identical for a number of 'clean' archs are moved to the new arch/generic tree. in addition, a few headers that differed only cosmetically from the new generic version are removed. additional deduplication may be possible in mman.h and in several headers (limits.h, posix.h, stdint.h) that mostly depend on whether the arch is 32- or 64-bit, but they are left alone for now because greater gains are likely possible with more invasive changes to header logic, which is beyond the scope of this commit.
2016-01-26add MCL_ONFAULT and MLOCK_ONFAULT mlockall and mlock2 flagsSzabolcs Nagy-0/+1
they lock faulted pages into memory (useful when a small part of a large mapped file needs efficient access), new in linux v4.4, commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1 MLOCK_* is not in the POSIX reserved namespace for sys/mman.h
2016-01-26add mlock2 syscall number from linux v4.4Szabolcs Nagy-0/+2
this is mlock with a flags argument, new in linux commit a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e as usual microblaze and sh don't have allocated syscall number yet.
2016-01-26add new membarrier, userfaultfd and switch_endian syscallsSzabolcs Nagy-0/+4
new in linux v4.3 added for aarch64, arm, i386, mips, or1k, powerpc, x32 and x86_64. membarrier is a system wide memory barrier, moves most of the synchronization cost to one side, new in kernel commit 5b25b13ab08f616efd566347d809b4ece54570d1 userfaultfd is useful for qemu and is new in kernel commit 8d2afd96c20316d112e04d935d9e09150e988397 switch_endian is powerpc only for switching endianness, new in commit 529d235a0e190ded1d21ccc80a73e625ebcad09b
2016-01-22clean up x86_64 (and x32) atomics for new atomics frameworkRich Felker-55/+64
this commit mostly makes consistent things like spacing, function ordering in atomic_arch.h, argument names, use of volatile, etc. a_ctz_l was also removed from x86_64 since atomic.h provides it automatically using a_ctz_64.
2016-01-21refactor internal atomic.hRich Felker-14/+16
rather than having each arch provide its own atomic.h, there is a new shared atomic.h in src/internal which pulls arch-specific definitions from arc/$(ARCH)/atomic_arch.h. the latter can be extremely minimal, defining only a_cas or new ll/sc type primitives which the shared atomic.h will use to construct everything else. this commit avoids making heavy changes to the individual archs' atomic implementations. definitions which are identical or near-identical to what the new shared atomic.h would produce have been removed, but otherwise the changes made are just hooking up the arch-specific files to the new infrastructure. major changes to take advantage of the new system will come in subsequent commits.
2015-11-02properly access mcontext_t program counter in cancellation handlerRich Felker-1/+1
using the actual mcontext_t definition rather than an overlaid pointer array both improves correctness/readability and eliminates some ugly hacks for archs with 64-bit registers bit 32-bit program counter. also fix UB due to comparison of pointers not in a common array object.
2015-09-17new dlstart stage-2 chaining for x86_64 and x32Rich Felker-0/+5
2015-08-16mitigate performance regression in libc-internal locks on x86_64Rich Felker-1/+1
commit 3c43c0761e1725fd5f89a9c028cbf43250abb913 fixed missing synchronization in the atomic store operation for i386 and x86_64, but opted to use mfence for the barrier on x86_64 where it's always available. however, in practice mfence is significantly slower than the barrier approach used on i386 (a nop-like lock orl operation). this commit changes x86_64 (and x32) to use the faster barrier.
2015-07-28fix missing synchronization in atomic store on i386 and x86_64Rich Felker-1/+1
despite being strongly ordered, the x86 memory model does not preclude reordering of loads across earlier stores. while a plain store suffices as a release barrier, we actually need a full barrier, since users of a_store subsequently load a waiter count to determine whether to issue a futex wait, and using a stale count will result in soft (fail-to-wake) deadlocks. these deadlocks were observed in malloc and possible with stdio locks and other libc-internal locking. on i386, an atomic operation on the caller's stack is used as the barrier rather than performing the store itself using xchg; this avoids the need to read the cache line on which the store is being performed. mfence is used on x86_64 where it's always available, and could be used on i386 with the appropriate cpu model checks if it's shown to perform better.
2015-05-20fix inconsistency in a_and and a_or argument types on x86[_64]Rich Felker-4/+4
conceptually, and on other archs, these functions take a pointer to int, but in the i386, x86_64, and x32 versions of atomic.h, they took a pointer to void instead.
2015-04-13dynamic linker bootstrap overhaulRich Felker-39/+25
this overhaul further reduces the amount of arch-specific code needed by the dynamic linker and removes a number of assumptions, including: - that symbolic function references inside libc are bound at link time via the linker option -Bsymbolic-functions. - that libc functions used by the dynamic linker do not require access to data symbols. - that static/internal function calls and data accesses can be made without performing any relocations, or that arch-specific startup code handled any such relocations needed. removing these assumptions paves the way for allowing libc.so itself to be built with stack protector (among other things), and is achieved by a three-stage bootstrap process: 1. relative relocations are processed with a flat function. 2. symbolic relocations are processed with no external calls/data. 3. main program and dependency libs are processed with a fully-functional libc/ldso. reduction in arch-specific code is achived through the following: - crt_arch.h, used for generating crt1.o, now provides the entry point for the dynamic linker too. - asm is no longer responsible for skipping the beginning of argv[] when ldso is invoked as a command. - the functionality previously provided by __reloc_self for heavily GOT-dependent RISC archs is now the arch-agnostic stage-1. - arch-specific relocation type codes are mapped directly as macros rather than via an inline translation function/switch statement.
2015-04-01move O_PATH definition back to arch bitsRich Felker-0/+1
while it's the same for all presently supported archs, it differs at least on sparc, and conceptually it's no less arch-specific than the other O_* macros. O_SEARCH and O_EXEC are still defined in terms of O_PATH in the main fcntl.h.
2015-03-18fix MINSIGSTKSZ values for archs with large signal contextsRich Felker-0/+5
the previous values (2k min and 8k default) were too small for some archs. aarch64 reserves 4k in the signal context for future extensions and requires about 4.5k total, and powerpc reportedly uses over 2k. the new minimums are chosen to fit the saved context and also allow a minimal signal handler to run. since the default (SIGSTKSZ) has always been 6k larger than the minimum, it is also increased to maintain the 6k usable by the signal handler. this happens to be able to store one pathname buffer and should be sufficient for calling any function in libc that doesn't involve conversion between floating point and decimal representations. x86 (both 32-bit and 64-bit variants) may also need a larger minimum (around 2.5k) in the future to support avx-512, but the values on these archs are left alone for now pending further analysis. the value for PTHREAD_STACK_MIN is not increased to match MINSIGSTKSZ at this time. this is so as not to preclude applications from using extremely small thread stacks when they know they will not be handling signals. unfortunately cancellation and multi-threaded set*id() use signals as an implementation detail and therefore require a stack large enough for a signal context, so applications which use extremely small thread stacks may still need to avoid using these features.
2015-03-07fix FLT_ROUNDS to reflect the current rounding modeSzabolcs Nagy-1/+0
Implemented as a wrapper around fegetround introducing a new function to the ABI: __flt_rounds. (fegetround cannot be used directly from float.h)
2015-03-04fix POLLWRNORM and POLLWRBAND on mipsTrutz Behn-0/+0
these macros have the same distinct definition on blackfin, frv, m68k, mips, sparc and xtensa kernels. POLLMSG and POLLRDHUP additionally differ on sparc.
2015-03-03make all objects used with atomic operations volatileRich Felker-7/+7
the memory model we use internally for atomics permits plain loads of values which may be subject to concurrent modification without requiring that a special load function be used. since a compiler is free to make transformations that alter the number of loads or the way in which loads are performed, the compiler is theoretically free to break this usage. the most obvious concern is with atomic cas constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be transformed to a_cas(p,*p,f(*p)); where the latter is intended to show multiple loads of *p whose resulting values might fail to be equal; this would break the atomicity of the whole operation. but even more fundamental breakage is possible. with the changes being made now, objects that may be modified by atomics are modeled as volatile, and the atomic operations performed on them by other threads are modeled as asynchronous stores by hardware which happens to be acting on the request of another thread. such modeling of course does not itself address memory synchronization between cores/cpus, but that aspect was already handled. this all seems less than ideal, but it's the best we can do without mandating a C11 compiler and using the C11 model for atomics. in the case of pthread_once_t, the ABI type of the underlying object is not volatile-qualified. so we are assuming that accessing the object through a volatile-qualified lvalue via casts yields volatile access semantics. the language of the C standard is somewhat unclear on this matter, but this is an assumption the linux kernel also makes, and seems to be the correct interpretation of the standard.
2015-02-09add syscall numbers for the new execveat syscallSzabolcs Nagy-0/+2
this syscall allows fexecve to be implemented without /proc, it is new in linux v3.19, added in commit 51f39a1f0cea1cacf8c787f652f26dfee9611874 (sh and microblaze do not have allocated syscall numbers yet) added a x32 fix as well: the io_setup and io_submit syscalls are no longer common with x86_64, so use the x32 specific numbers.
2015-02-07remove cruft from x86_64 syscall.hSzabolcs Nagy-23/+0
x86_64 syscall.h defined some musl internal syscall names and made them public. These defines were already moved to src/internal/syscall.h (except for SYS_fadvise which is added now) so the cruft in x86_64 syscall.h is not needed.
2015-02-01fix typo in x86_64/x32 user_fpregs_structFelix Janda-1/+1
mxcs_mask should be mxcr_mask
2015-01-30move MREMAP_MAYMOVE and MREMAP_FIXED out of bitsTrutz Behn-3/+0
the definitions are generic for all kernel archs. exposure of these macros now only occurs on the same feature test as for the function accepting them, which is believed to be more correct.
2014-12-23add new syscall numbers for bpf and kexec_file_loadSzabolcs Nagy-0/+4
these syscalls are new in linux v3.18, bpf is present on all supported archs except sh, kexec_file_load is only allocted for x86_64 and x32 yet. bpf was added in linux commit 99c55f7d47c0dc6fc64729f37bf435abf43f4c60 kexec_file_load syscall number was allocated in commit f0895685c7fd8c938c91a9d8a6f7c11f22df58d2
2014-12-21move wint_t definition to the shared part of alltypes.h.inRich Felker-1/+0
2014-10-10add explicit barrier operation to internal atomic.h APIRich Felker-0/+5
2014-10-08add new syscall numbers for seccomp, getrandom, memfd_createSzabolcs Nagy-0/+6
these syscalls are new in linux v3.17 and present on all supported archs except sh. seccomp was added in commit 48dc92b9fc3926844257316e75ba11eb5c742b2c it has operation, flags and pointer arguments (if flags==0 then it is the same as prctl(PR_SET_SECCOMP,...)), the uapi header for flag definitions is linux/seccomp.h getrandom was added in commit c6e9d6f38894798696f23c8084ca7edbf16ee895 it provides an entropy source when open("/dev/urandom",..) would fail, the uapi header for flags is linux/random.h memfd_create was added in commit 9183df25fe7b194563db3fec6dc3202a5855839c it allows anon mmap to have an fd, that can be shared, sealed and needs no mount point, the uapi header for flags is linux/memfd.h
2014-09-06add threads.h and needed per-arch types for mtx_t and cnd_tRich Felker-0/+2
based on patch by Jens Gustedt. mtx_t and cnd_t are defined in such a way that they are formally "compatible types" with pthread_mutex_t and pthread_cond_t, respectively, when accessed from a different translation unit. this makes it possible to implement the C11 functions using the pthread functions (which will dereference them with the pthread types) without having to use the same types, which would necessitate either namespace violations (exposing pthread type names in threads.h) or incompatible changes to the C++ name mangling ABI for the pthread types. for the rest of the types, things are much simpler; using identical types is possible without any namespace considerations.
2014-08-20add max_align_t definition for C11 and C++11Rich Felker-0/+2
unfortunately this needs to be able to vary by arch, because of a huge mess GCC made: the GCC definition, which became the ABI, depends on quirks in GCC's definition of __alignof__, which does not match the formal alignment of the type. GCC's __alignof__ unexpectedly exposes the an implementation detail, its "preferred alignment" for the type, rather than the formal/ABI alignment of the type, which it only actually uses in structures. on most archs the two values are the same, but on some (at least i386) the preferred alignment is greater than the ABI alignment. I considered using _Alignas(8) unconditionally, but on at least one arch (or1k), the alignment of max_align_t with GCC's definition is only 4 (even the "preferred alignment" for these types is only 4).
2014-08-17make pointers used in robust list volatileRich Felker-1/+1
when manipulating the robust list, the order of stores matters, because the code may be asynchronously interrupted by a fatal signal and the kernel will then access the robust list in what is essentially an async-signal context. previously, aliasing considerations made it seem unlikely that a compiler could reorder the stores, but proving that they could not be reordered incorrectly would have been extremely difficult. instead I've opted to make all the pointers used as part of the robust list, including those in the robust list head and in the individual mutexes, volatile. in addition, the format of the robust list has been changed to point back to the head at the end, rather than ending with a null pointer. this is to match the documented kernel robust list ABI. the null pointer, which was previously used, only worked because faults during access terminate the robust list processing.
2014-07-27clean up unused and inconsistent atomics in arch dirsRich Felker-25/+0
the a_cas_l, a_swap_l, a_swap_p, and a_store_l operations were probably used a long time ago when only i386 and x86_64 were supported. as other archs were added, support for them was inconsistent, and they are obviously not in use at present. having them around potentially confuses readers working on new ports, and the type-punning hacks and inconsistent use of types in their definitions is not a style I wish to perpetuate in the source tree, so removing them seems appropriate.
2014-07-20add syscall numbers for the new renameat2 syscallSzabolcs Nagy-0/+3
it's like rename but with flags eg. to allow atomic exchange of two files, introduced in linux 3.15 commit 520c8b16505236fc82daa352e6c5e73cd9870cff
2014-06-19add tlsdesc support for x86_64Rich Felker-0/+2
2014-06-18refactor to remove arch-specific relocation code from dynamic linkerRich Felker-29/+12
this was one of the main instances of ugly code duplication: all archs use basically the same types of relocations, but roughly equivalent logic was duplicated for each arch to account for the different naming and numbering of relocation types and variation in whether REL or RELA records are used. as an added bonus, both REL and RELA are now supported on all archs, regardless of which is used by the standard toolchain.
2014-06-16dynamic linker: permit error returns from arch-specific reloc functionRich Felker-1/+2
the immediate motivation is supporting TLSDESC relocations which require allocation and thus may fail (unless we pre-allocate), but this mechanism should also be used for throwing an error on unsupported or invalid relocation types, and perhaps in certain cases, for reporting when a relocation is not satisfiable.
2014-05-30add sched_{get,set}attr syscall numbers and SCHED_DEADLINE macroSzabolcs Nagy-0/+4
linux 3.14 introduced sched_getattr and sched_setattr syscalls in commit d50dde5a10f305253cbc3855307f608f8a3c5f73 and the related SCHED_DEADLINE scheduling policy in commit aab03e05e8f7e26f51dee792beddcb5cca9215a5 but struct sched_attr "extended scheduling parameters data structure" is not yet exported to userspace (necessary for using the syscalls) so related uapi definitions are not added yet.
2014-04-16add working vdso clock_gettime support, including static linkingRich Felker-0/+4
the vdso symbol lookup code is based on the original 2011 patch by Nicholas J. Kain, with some streamlining, pointer arithmetic fixes, and one symbol version matching fix. on the consumer side (clock_gettime), per-arch macros for the particular symbol name and version to lookup are added in syscall_arch.h, and no vdso code is pulled in on archs which do not define these macros. at this time, vdso is enabled only on x86_64. the vdso support at the dynamic linker level is no longer useful to libc, but is left in place for the sake of debuggers (which may need the vdso in the link map to find its functions) and possibly use with dlsym.
2014-04-15fix RLIMIT_ constants for mipsSzabolcs Nagy-0/+0
The mips arch is special in that it uses different RLIMIT_ numbers than other archs, so allow bits/resource.h to override the default RLIMIT_ numbers (empty on all archs except mips). Reported by orc.
2014-03-18fix signal.h breakage from moving stack_t to arch-specific bitsRich Felker-6/+6
in the previous changes, I missed the fact that both the prototype of the sigaltstack function and the definition of ucontext_t depend on stack_t.
2014-03-18move signal.h definition of stack_t to arch-specific bitsRich Felker-0/+6
it's different at least on mips. mips version will be fixed in a separate commit to show the change.
2014-03-11move struct semid_ds to from shared sys/sem.h to bitsRich Felker-0/+16
the definition was found to be incorrect at least for powerpc, and fixing this cleanly requires making the definition arch-specific. this will allow cleaning up the definition for other archs to make it more specific, and reversing some of the ugliness (time_t hacks) introduced with the x32 port. this first commit simply copies the existing definition to each arch without any changes. this is intentional, to make it easier to review changes made on a per-arch basis.
2014-02-23sys/shm.h: move arch specific structs to bits/rofl0r-0/+11
2014-01-15remove more unnecessary operand-size suffixes from x86_64 atomic.hRich Felker-3/+3
2014-01-11remove gratuitous temp vars, casts, and suffixes in x86_64 atomic.hRich Felker-13/+11
aside from general cleanup, this should allow the identical atomic.h file to be used for the upcoming x32 port.
2014-01-11remove size suffix in x86_64 __pthread_self asmRich Felker-1/+1
the operand size is unnecessary, since the assembler knows it from the destination register size. removing the suffix makes it so the same code should work for x32.
2014-01-11make type of st_dev explicitly dev_t in x86_64 stat.hRich Felker-1/+1
otherwise it's unclear that it's correct. aside from that, it makes for a gratuitous difference between the x86_64 header and the upcoming x32 header.
2014-01-08add IUTF8 to termios.h on archs that were missing itRich Felker-0/+1
2014-01-08fix namespace violations in termios.h, at least mostlyRich Felker-8/+7
the fix should be complete on archs that use the generic definitions (i386, arm, x86_64, microblaze), but mips and powerpc have not been checked thoroughly and may need more fixes.