musl/src/string, branch v1.1.9

remove potentially PIC-incompatible relocations from x86_64 and x32 asm

2015-04-19T01:18:23+00:00

analogous to commit 8ed66ecbcba1dd0f899f22b534aac92a282f42d5 for i386.

remove the last of possible-textrels from i386 asm

2015-04-19T00:45:39+00:00

none of these are actual textrels because of ld-time binding performed
by -Bsymbolic-functions, but I'm changing them with the goal of making
ld-time binding purely an optimization rather than relying on it for
semantic purposes.

in the case of memmove's call to memcpy, making it explicit that the
memmove asm is assuming the forward-copying behavior of the memcpy asm
is desirable anyway; in case memcpy is ever changed, the semantic
mismatch would be apparent while editing memcpy.s.
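
as an illustration of the forward-copying assumption, here is a minimal C sketch (not the actual asm). the names copy_forward and memmove_sketch are hypothetical; the point is only that the forward case relies on a copy that is known to proceed from low to high addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies strictly from low to high addresses. */
void *copy_forward(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	while (n--) *d++ = *s++;
	return dest;
}

/* Sketch of a memmove that states its forward-copy dependency explicitly. */
void *memmove_sketch(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Unsigned wraparound makes this true when dest is below src
	 * or when the two ranges do not overlap at all. */
	if ((uintptr_t)d - (uintptr_t)s >= n)
		return copy_forward(dest, src, n);

	/* dest lies inside [src, src+n): copy from the end backward. */
	while (n--) d[n] = s[n];
	return dest;
}
```

if the forward case called a generic memcpy instead, an implementation that copied backward would silently corrupt overlapping moves where dest is below src.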

overhaul optimized x86_64 memset asm

2015-02-26T07:07:08+00:00

on most cpu models, "rep stosq" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size
126, and shrink-wraps this code path. in addition, "rep stosq" is
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 times slower when the
destination address is not aligned mod 16. the new code thus ensures
alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes proposed by Denys Vlasenko.
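
the minimal-branch idea for short sizes can be sketched in C as follows. this is an illustration under assumptions, not a transcription of the asm: it only goes up to size 30, and the names put and memset_short_sketch are hypothetical. each tier stores one block at the head and one at the tail, so a single comparison per tier decides whether the buffer is fully covered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store w bytes of the pattern at p; all bytes of pat are equal,
 * so byte order does not matter. */
static void put(unsigned char *p, uint64_t pat, size_t w)
{
	memcpy(p, &pat, w);
}

/* Fills n <= 30 bytes with paired head/tail stores.
 * Returns 0 on success, -1 if n is too large for this sketch
 * (a real implementation would fall through to the long path). */
int memset_short_sketch(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	uint64_t pat = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

	if (n > 30) return -1;
	if (n == 0) return 0;

	put(s, pat, 1);      put(s + n - 1, pat, 1);
	if (n <= 2) return 0;
	put(s + 1, pat, 2);  put(s + n - 3, pat, 2);
	if (n <= 6) return 0;
	put(s + 3, pat, 4);  put(s + n - 7, pat, 4);
	if (n <= 14) return 0;
	put(s + 7, pat, 8);  put(s + n - 15, pat, 8);
	return 0;
}
```

the head and tail stores overlap for most sizes, which is harmless because every store writes the same byte value.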

overhaul optimized i386 memset asm

2015-02-26T06:51:39+00:00

on most cpu models, "rep stosl" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size 62,
and shrink-wraps this code path. in addition, "rep stosl" is very
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 to 4 times slower when
the destination address is not aligned mod 16. the new code thus
ensures alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes to the x86_64 memset asm
proposed by Denys Vlasenko.
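
the alignment step can be sketched in C roughly as below. this is a hedged illustration rather than the asm itself: memset_aligned_sketch is a hypothetical name, the plain loop stands in for "rep stosl", and the sketch assumes n >= 16 so that the unconditional head and tail stores stay in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *memset_aligned_sketch(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	uint32_t pat = (uint32_t)(unsigned char)c * 0x01010101u;
	size_t i, k;

	assert(n >= 16); /* assumption of this sketch */

	/* Possibly misaligned stores: last 4 bytes and first 16 bytes. */
	memcpy(s + n - 4, &pat, 4);
	for (i = 0; i < 16; i += 4) memcpy(s + i, &pat, 4);

	/* Bytes to skip to reach the next multiple of 16. k is 0 when
	 * dest is already aligned, so any larger alignment is preserved. */
	k = (0 - (uintptr_t)s) & 15;
	s += k;
	n -= k;

	/* Word fill from the aligned address; the fewer-than-4 leftover
	 * bytes were already written by the tail store. */
	for (n /= 4; n; n--, s += 4) memcpy(s, &pat, 4);
	return dest;
}
```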

x86_64/memset: avoid performing final store twice

2015-02-10T23:54:27+00:00

The code does a potentially misaligned 8-byte store to fill the tail
of the buffer. Then it fills the initial part of the buffer, which is
a multiple of 8 bytes. Therefore, if the size was divisible by 8, the
last word was stored twice.

This patch decrements the byte count before dividing it by 8, making
one less store in the "size is divisible by 8" case and changing
nothing in all other cases, at the cost of replacing one MOV insn
with an LEA insn.

Signed-off-by: Denys Vlasenko
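
The arithmetic can be checked with a small C sketch. It is an illustration only, not the asm: fill_words_sketch is a hypothetical name, the loop stands in for the word fill, and the sketch assumes n >= 8. It returns the number of 8-byte stores performed so the saving is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

size_t fill_words_sketch(void *dest, uint64_t pat, size_t n)
{
	unsigned char *s = dest;
	size_t words, stores = 0;

	assert(n >= 8); /* assumption of this sketch */

	/* Potentially misaligned store covering the last 8 bytes. */
	memcpy(s + n - 8, &pat, 8);
	stores++;

	/* Decrement before dividing: with n / 8, a size divisible by 8
	 * would write the final word a second time. */
	words = (n - 1) / 8;
	for (; words; words--, s += 8) {
		memcpy(s, &pat, 8);
		stores++;
	}
	return stores;
}
```

For n = 16 this performs 2 stores where n / 8 would give 3; for n = 17 both formulas give 3, since (n - 1) / 8 and n / 8 differ only when n is a multiple of 8.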

x86_64/memset: simple optimizations

2015-02-10T23:53:31+00:00

"and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.

64-bit imul is slow, move it as far up as possible so that the result
(rax) has more time to be ready by the time we start using it
in mem stores.

There is no need to shuffle registers in preparation for "rep stos"
if we are not going to take that code path. Thus, the patch moves the
"jump if len < 16" instructions up, and changes the alternate code path
to use rdx and rdi instead of rcx and r8.

Signed-off-by: Denys Vlasenko
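
For reference, the value being computed by the zero-extension and the multiply is the fill byte replicated into every byte of a 64-bit word. A minimal C equivalent follows; broadcast_byte_sketch is a hypothetical name, and the multiplier constant is the usual byte-replication constant, assumed here rather than quoted from the patch.

```c
#include <stdint.h>

/* The cast to unsigned char keeps only the low byte, like
 * "movzbl %sil,%esi" or "and $0xff,%esi"; the multiply then copies
 * that byte into all eight byte positions, like the imul. */
uint64_t broadcast_byte_sketch(int c)
{
	return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}
```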

fix tabs/spaces in memcpy.s

2014-11-23T19:33:01+00:00

this file had been a mess that went unnoticed ever since it was
imported. some lines used spaces for indentation while others used
tabs, and tabs were used for alignment.

fix build regression in arm asm for memcpy

2014-11-23T19:12:14+00:00

commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility
with clang's internal assembler, but broke compatibility with gas and
the traditional arm asm syntax by switching to the arm "unified
assembler language" (UAL). recent versions of gas also support UAL,
but require the .syntax directive to be used to switch to it. clang,
on the other hand, defaults to UAL, and old versions of gas (still
relevant) don't support UAL at all.

for the conditional ldm/stm instructions, "ia" is default and can just
be omitted, resulting in a mnemonic that's compatible with both
traditional and UAL syntax. but for byte/halfword loads and stores,
there seems to be no mnemonic compatible with both, and thus .word is
used to produce the desired opcode explicitly. the .inst directive is
not used because it is not compatible with older assemblers.

arm assembly changes for clang compatibility

2014-11-23T17:03:34+00:00

fix handling of odd lengths in swab function

2014-10-04T15:14:01+00:00

this function is specified to leave the last byte with "unspecified
disposition" when the length is odd, so for the most part correct
programs should not be calling swab with odd lengths. however, doing
so is permitted, and should not write past the end of the destination
buffer.
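
a minimal sketch of a loop with this property, assuming nothing about the actual fix beyond what is described above. swab_sketch is a hypothetical name, and ptrdiff_t stands in for the ssize_t that POSIX specifies for the length so that the example needs only standard C headers.

```c
#include <stddef.h>

void swab_sketch(const void *restrict src_, void *restrict dest_, ptrdiff_t n)
{
	const char *src = src_;
	char *dest = dest_;

	/* Only complete pairs are swapped. With an odd n the final
	 * source byte is never read and nothing is written for it,
	 * so the destination is not touched past n - 1 bytes. */
	for (; n > 1; n -= 2) {
		dest[0] = src[1];
		dest[1] = src[0];
		dest += 2;
		src += 2;
	}
}
```

a loop condition of n > 0 with the same body would, for odd n, write one byte past the end of an n-byte destination.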