2019-09-11 | fix arm __a_barrier_oldkuser when built as thumb | Rich Felker | -2/+2
as noted in commit 05870abeaac0588fb9115cfd11f96880a0af2108, mov lr,pc is not a valid method for saving the return address in code that might be built as thumb. this one is unlikely to matter, since any ISA level that has thumb2 should also have native implementations of atomics that don't involve kuser_helper, and the affected code is only used on very old kernels to begin with.
2019-09-11 | fix code path where child function returns in arm __clone built as thumb | Rich Felker | -7/+3
mov lr,pc is not a valid way to save the return address in thumb mode since it omits the thumb bit. use a chain of bl and bx to emulate blx. this could be avoided by converting to a .S file with preprocessor conditions to use blx if available, but the time cost here is dominated by the syscall anyway. while making this change, also remove the remnants of support for pre-bx ISA levels. commit 9f290a49bf9ee247d540d3c83875288a7991699c removed the hack from the parent code paths, but left the unnecessary code in the child. keeping it would require rewriting two code paths rather than one, and is useless for reasons described in that commit.
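the thumb-bit problem described above can be modeled in a few lines of C. this is only an illustrative sketch, not code from the tree: the function names are hypothetical, and addresses are treated as plain 32-bit integers. it relies on two architectural facts: bx takes the target instruction set from bit 0 of the address, and a bl executed in thumb state leaves bit 0 set in lr, whereas a value copied from pc has bit 0 clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* hypothetical model: bx selects the instruction set from bit 0 of the
 * target address (1 = thumb, 0 = arm) */
static bool bx_enters_thumb(uint32_t target)
{
	return (target & 1u) != 0;
}

/* return address as a bl executed in thumb state saves it: bit 0 set */
static uint32_t lr_after_thumb_bl(uint32_t return_addr)
{
	return return_addr | 1u;
}

/* return address as "mov lr,pc" saves it: a plain code address with
 * bit 0 clear, so the thumb bit is lost */
static uint32_t lr_after_mov_lr_pc(uint32_t return_addr)
{
	return return_addr & ~1u;
}
```

with these definitions, `bx_enters_thumb(lr_after_mov_lr_pc(addr))` is false for any `addr`, which is the failure mode: the callee's `bx lr` would resume the thumb caller in arm state.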
2019-08-06 | in arm cancellation point asm, don't unnecessarily preserve link register | Patrick Oppenlander | -4/+4
The only reason we needed to preserve the link register was because we were using a branch-link instruction to branch to __cp_cancel. Replacing this with a branch means we can avoid the save/restore as the link register is no longer modified.
2018-09-12 | make arch __clone backends hidden | Rich Felker | -0/+1
these are not a public interface and are not intended to be callable from anywhere but the public clone function or other places in libc.
2018-09-05 | define and use internal macros for hidden visibility, weak refs | Rich Felker | -3/+2
this cleans up what had become widespread direct inline use of "GNU C" style attributes directly in the source, and lowers the barrier to increased use of hidden visibility, which will be useful to recovering some of the efficiency lost when the protected visibility hack was dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially on archs where the PLT ABI is costly.
2018-08-28 | rewrite __aeabi_read_tp in asm | Szabolcs Nagy | -12/+6
__aeabi_read_tp used to call c code, but that was incorrect as the arm runtime abi specifies special pcs for this function: it is only allowed to clobber r0, ip, lr and cpsr. maintainer's note: the old code explicitly saved and restored all general-purpose registers which are call-clobbered in the normal calling convention, so it's unlikely that any real-world compilers produced code that could break. however theoretically they could have chosen to use floating point registers, in which case the caller's values of those registers would be clobbered.
2018-08-23 | fix tls access on arm targets before armv6k | Szabolcs Nagy | -1/+1
commit 610c5a8524c3d6cd3ac5a5f1231422e7648a3791 changed the thread pointer setup so tp points at the end of the pthread struct on arm, but failed to update __aeabi_read_tp so it was off by 8. this broke tls access in code that is compiled with -mtp=soft, which is the default when target arch is pre armv6k or thumb1. maintainer's note: no release versions are affected.
2017-02-15 | fix build regression in arm atomics asm with new binutils | Rich Felker | -1/+1
binutils commit bada43421274615d0d5f629a61a60b7daa71bc15 tightened immediate fixup handling in gas in such a way that the final .arch of an object file must be compatible with the fixups used when the instruction was assembled; this in turn broke assembling of atomics.s, at least in thumb mode. it's not clear whether this should be considered a bug in gas, but .object_arch is preferable anyway for our purpose here of controlling the ISA level tag on the object file being produced, and it's the intended directive for use in object files with runtime code selection. research by Szabolcs Nagy confirmed that .object_arch is supported in all relevant versions of binutils and clang's integrated assembler. patch by Reiner Herrmann.
2016-12-19 | rework arm atomic/tp backends to be thumb-compatible and fdpic-ready | Rich Felker | -56/+69
three problems are addressed:

- use of pc arithmetic, which was difficult if not impossible to make correct in thumb mode on all models, so that relative rather than absolute pointers to the backends could be used. this was designed back when there was no coherent model for the early stages of the dynamic linker before relocations, and is no longer necessary.

- assumption that data (the relative pointers to the backends) can be accessed at a constant displacement from the code. this will not be possible on future fdpic subarchs (for cortex-m), so move responsibility for loading the backend code address to the caller.

- hard-coded arm opcodes using the .word directive. instead, use the .arch directive to work around the assembler's refusal to assemble instructions not available (or in some cases, available but just considered deprecated) in the target isa level. the obscure v6t2 arch is used for v6 code so as to (1) allow generation of thumb2 output if -mthumb is active, and (2) avoid warnings/errors for mcr barriers that clang would produce if we just set arch to v7-a.

in addition, the __aeabi_read_tp function is moved out of the inner workings and implemented as an asm wrapper around a C function, so that asm code does not need to read global data. the asm wrapper serves to satisfy the ABI calling convention requirements for this function.
2016-01-30 | fix misaligned pointer-like objects in arm atomics asm source file | Rich Felker | -0/+2
this file's .data section was not aligned, and just happened to get the correct alignment with past builds. it's likely that the move of atomic.s from arch/arm/src to src/thread/arm caused the change in alignment, which broke the atomic and thread-pointer access fragments on actual armv5 hardware.
2016-01-22 | move arm-specific translation units out of arch/arm/src, to src/*/arm | Rich Felker | -1/+160
this is possible with the new build system that allows src/*/$(ARCH)/* files which do not shadow a file in the parent directory, and yields a more logical organization. eventually it will be possible to remove arch/*/src from the build system.
2015-11-10 | explicitly assemble all arm asm sources as UAL | Rich Felker | -0/+3
these files are all accepted as legacy arm syntax when producing arm code, but legacy syntax cannot be used for producing thumb2 with access to the full ISA. even after switching to UAL, some asm source files contain instructions which are not valid in thumb mode, so these will need to be addressed separately.
2015-11-09 | remove non-working pre-armv4t support from arm asm | Rich Felker | -4/+0
the idea of the three-instruction sequence being removed was to be able to return to thumb code when used on armv4t+ from a thumb caller, but also to be able to run on armv4 without the bx instruction available (in which case the low bit of lr would always be 0). however, without compiler support for generating such a sequence from C code, which does not exist and which there is unlikely to be interest in implementing, there is little point in having it in the asm, and it would likely be easier to add pre-armv4t support via enhanced linker handling of R_ARM_V4BX than at the compiler level. removing this code simplifies adding support for building libc in thumb2-only form (for cortex-m).
2015-04-14 | consistently use hidden visibility for cancellable syscall internals | Rich Felker | -3/+8
in a few places, non-hidden symbols were referenced from asm in ways that assumed ld-time binding. while there is no semantic reason these symbols need to be hidden, fixing the references without making them hidden was going to be ugly, and hidden reduces some bloat anyway. in the asm files, .global/.hidden directives have been moved to the top to unclutter the actual code.
2015-02-20 | prepare cancellation syscall asm for possibility of __cancel returning | Rich Felker | -1/+5
2014-11-22 | fix __aeabi_read_tp oversight in arm atomics/tls overhaul | Rich Felker | -4/+0
calls to __aeabi_read_tp may be generated by the compiler to access TLS on pre-v6 targets. previously, this function was hard-coded to call the kuser helper, which would crash on kernels with kuser helper removed. to fix the problem most efficiently, the definition of __aeabi_read_tp is moved so that it's an alias for the new __a_gettp. however, on v7+ targets, code to initialize the runtime choice of thread-pointer loading code is not even compiled, meaning that defining __aeabi_read_tp would have caused an immediate crash due to using the default implementation of __a_gettp with a HCF instruction. fortunately there is an elegant solution which reduces overall code size: putting the native thread-pointer loading instruction in the default code path for __a_gettp, so that separate default/native code paths are not needed. this function should never be called before __set_thread_area anyway, and if it is called early on pre-v6 hardware, the old behavior (crashing) is maintained. ideally __aeabi_read_tp would not be called at all on v7+ targets anyway -- in fact, prior to the overhaul, the same problem existed, but it was never caught by users building for v7+ with kuser disabled. however, it's possible for calls to __aeabi_read_tp to end up in a v7+ binary if some of the object files were built for pre-v7 targets, e.g. in the case of static libraries that were built separately, so this case needs to be handled.
2014-11-19 | overhaul ARM atomics/tls for performance and compatibility | Rich Felker | -12/+1
previously, builds for pre-armv6 targets hard-coded use of the "kuser helper" system for atomics and thread-pointer access, resulting in binaries that fail to run (crash) on systems where this functionality has been disabled (as a security/hardening measure) in the kernel. additionally, builds for armv6 hard-coded an outdated/deprecated memory barrier instruction which may require emulation (extremely slow) on future models. this overhaul replaces the behavior for all pre-armv7 builds (both of the above cases) to perform runtime detection of the appropriate mechanisms for barrier, atomic compare-and-swap, and thread pointer access. detection is based on information provided by the kernel in auxv: presence of the HWCAP_TLS bit for AT_HWCAP and the architecture version encoded in AT_PLATFORM. direct use of the instructions is preferred when possible, since probing for the existence of the kuser helper page would be difficult and would incur runtime cost. for builds targeting armv7 or later, the runtime detection code is not compiled at all, and much more efficient versions of the non-cas atomic operations are provided by using ldrex/strex directly rather than wrapping cas.
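the auxv-based detection described above might look roughly like the following. this is a sketch under stated assumptions, not the code from this commit: the helper names and the enum are hypothetical, the HWCAP_TLS value (bit 15) is assumed to match the linux arm hwcap header, and AT_PLATFORM is assumed to be a string of the form "v6l" or "v7l".

```c
#include <stddef.h>

/* assumed value of the arm HWCAP_TLS bit, taken to match the linux
 * arm hwcap header */
#define SKETCH_HWCAP_TLS (1UL << 15)

/* hypothetical helper: extract the architecture version from an
 * AT_PLATFORM string such as "v5l", "v6l" or "v7l"; returns 0 if the
 * string is missing or does not have the expected shape */
static int arch_version_from_platform(const char *platform)
{
	if (!platform || platform[0] != 'v') return 0;
	int ver = 0;
	for (const char *p = platform + 1; *p >= '0' && *p <= '9'; p++)
		ver = ver * 10 + (*p - '0');
	return ver;
}

/* hypothetical names for the two ways of reading the thread pointer */
enum tp_method { TP_KUSER_HELPER, TP_NATIVE };

/* prefer the native instruction when the kernel reports HWCAP_TLS,
 * otherwise fall back to the kuser helper */
static enum tp_method choose_tp_method(unsigned long hwcap)
{
	return (hwcap & SKETCH_HWCAP_TLS) ? TP_NATIVE : TP_KUSER_HELPER;
}
```

in a standalone program the inputs could be obtained with `getauxval(AT_HWCAP)` and `getauxval(AT_PLATFORM)` from `<sys/auxv.h>`; the parsed version would then select the barrier and compare-and-swap variants in the same way the hwcap bit selects the thread-pointer method here.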
2014-02-09 | clone: make clone a wrapper around __clone | Bobby Bingham | -3/+0
The architecture-specific assembly versions of clone did not set errno on failure, which is inconsistent with glibc. __clone still returns the error via its return value, and clone is now a wrapper that sets errno as needed. The public clone has also been moved to src/linux, as it's not directly related to the pthreads API. __clone is called by pthread_create, which does not report errors via errno. Though not strictly necessary, it's nice to avoid clobbering errno here.
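The wrapper's error handling can be sketched as a small helper that converts a raw return value into the errno convention. This is an illustration only: the helper name is hypothetical, and it assumes the raw backend reports failure as a negated errno value, which the message above does not spell out.

```c
#include <errno.h>

/* Hypothetical helper: translate a raw result into the public
 * convention. Assumes failures arrive as negated errno values. */
static long raw_result_to_errno(long ret)
{
	if (ret < 0) {
		errno = (int)-ret; /* report the error through errno */
		return -1;
	}
	return ret; /* success: pass the value through, errno untouched */
}
```

A public wrapper would pass the backend's result through such a helper, while an internal caller like pthread_create would inspect the raw value directly and leave errno alone.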
2012-10-15 | add support for TLS variant I, presently needed for arm and mips | Rich Felker | -0/+4
despite documentation that makes it sound a lot different, the only ABI-constraint difference between TLS variants II and I seems to be that variant II stores the initial TLS segment immediately below the thread pointer (i.e. the thread pointer points to the end of it) and variant I stores the initial TLS segment above the thread pointer, requiring the thread descriptor to be stored below. the actual value stored in the thread pointer register also tends to have per-arch random offsets applied to it for silly micro-optimization purposes. with these changes applied, TLS should be basically working on all supported archs except microblaze. I'm still working on getting the necessary information and a working toolchain that can build TLS binaries for microblaze, but in theory, static-linked programs with TLS and dynamic-linked programs where only the main executable uses TLS should already work on microblaze. alignment constraints have not yet been heavily tested, so it's possible that this code does not always align TLS segments correctly on archs that need TLS variant I.
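the layout difference described above reduces to simple address arithmetic. the following is a simplified sketch with hypothetical function names; it ignores alignment and folds the per-arch offsets mentioned above into a single `gap` parameter, so it should not be read as the actual allocation code.

```c
#include <stddef.h>
#include <stdint.h>

/* variant II: the initial TLS segment sits immediately below the
 * thread pointer, so it starts tls_size bytes before tp */
static uintptr_t tls_start_variant_ii(uintptr_t tp, size_t tls_size)
{
	return tp - tls_size;
}

/* variant I: the initial TLS segment sits above the thread pointer;
 * gap stands in for any per-arch offset (an assumption of this sketch) */
static uintptr_t tls_start_variant_i(uintptr_t tp, size_t gap)
{
	return tp + gap;
}

/* variant I: the thread descriptor has to go below the thread pointer */
static uintptr_t td_start_variant_i(uintptr_t tp, size_t td_size)
{
	return tp - td_size;
}
```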
2012-09-27 | fix arm clone syscall bug (no effect unless app uses clone) | Rich Felker | -2/+1
the code to exit the new thread/process after the start function returns was mixed up in its syscall convention.
2012-07-14 | avoid blx instruction which does not exist on armv4t or armv4 | Rich Felker | -1/+2
2012-05-23 | fix bad opcode in arm syscall_cp_asm | Rich Felker | -1/+1
2012-05-23 | fix issue with longjmp out of signal handlers and cancellation | Rich Felker | -10/+8
stale state information indicating that a thread was possibly blocked at a cancellation point could get left behind if longjmp was used to exit a signal handler that interrupted a cancellation point. to fix the issue, we throw away the state information entirely and simply compare the saved instruction pointer to a range of code addresses in the __syscall_cp_asm function. all the ugly PIC work (which becomes minimal anyway with this approach) is deferred to cancellation time instead of happening at every syscall, which should improve performance too. this commit also fixes cancellation on arm, which was mildly broken (race condition, not checking cancellation flag once inside the cancellation point zone). apparently i forgot to implement that. the new arm code is untested, but appears correct; i'll test and fix it later if there are problems.
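the instruction-pointer comparison mentioned above amounts to a range check. a minimal sketch follows, with a hypothetical function name and the assumption that the zone is a half-open range [begin, end) of code addresses; the real bounds would come from labels in the asm.

```c
#include <stdbool.h>
#include <stdint.h>

/* hypothetical check: was the interrupted code inside the cancellation
 * point zone? the zone is assumed to be the half-open range
 * [zone_begin, zone_end) */
static bool pc_in_cancel_zone(uintptr_t pc, uintptr_t zone_begin,
                              uintptr_t zone_end)
{
	return pc >= zone_begin && pc < zone_end;
}
```

because the check uses only the saved instruction pointer, there is no per-thread flag that a longjmp out of a signal handler could leave stale.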
2011-10-09 | fix typo in arm clone() asm | Rich Felker | -1/+1
2011-09-18 | initial commit of the arm port | Rich Felker | -0/+78
this port assumes eabi calling conventions, eabi linux syscall convention, and presence of the kernel helpers at 0xffff0f?0 needed for threads support. otherwise it makes very few assumptions, and the code should work even on armv4 without thumb support, as well as on systems with thumb interworking. the bits headers declare this a little endian system, but as far as i can tell the code should work equally well on big endian. some small details are probably broken; so far, testing has been limited to qemu/aboriginal linux.