path: root/src/string
Age | Commit message | Author | Lines
2015-02-26 | overhaul optimized x86_64 memset asm | Rich Felker | -26/+55
on most cpu models, "rep stosq" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 126, and shrink-wraps this code path. in addition, "rep stosq" is sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial. this version is based in part on changes proposed by Denys Vlasenko.
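As a rough illustration of the alignment step described above, the following C sketch computes how many leading bytes must be written before a destination pointer reaches a 16-byte boundary. The helper name `head_bytes_to_align16` is hypothetical and the arithmetic is only one plausible way to express the idea; the asm itself is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the commit: number of bytes from p
 * up to the next address that is a multiple of 16 (0 if already aligned). */
size_t head_bytes_to_align16(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t head = (size_t)((16 - (addr & 15)) & 15);

    assert(head < 16);  /* sanity check: at most 15 bytes are needed */
    return head;
}
```

A destination that is already aligned mod 32 or mod 64 yields 0 here, so any existing additional alignment is left undisturbed.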
2015-02-26 | overhaul optimized i386 memset asm | Rich Felker | -32/+61
on most cpu models, "rep stosl" has high overhead that makes it undesirable for small memset sizes. the new code extends the minimal-branch fast path for short memsets from size 15 up to size 62, and shrink-wraps this code path. in addition, "rep stosl" is very sensitive to misalignment. the cost varies with size and with cpu model, but it has been observed performing 1.5 to 4 times slower when the destination address is not aligned mod 16. the new code thus ensures alignment mod 16, but also preserves any existing additional alignment, in case there are cpu models where it is beneficial. this version is based in part on changes to the x86_64 memset asm proposed by Denys Vlasenko.
2015-02-10 | x86_64/memset: avoid performing final store twice | Denys Vlasenko | -1/+1
The code does a potentially misaligned 8-byte store to fill the tail of the buffer. Then it fills the initial part of the buffer which is a multiple of 8 bytes. Therefore, if size is divisible by 8, we were storing last word twice. This patch decrements byte count before dividing it by 8, making one less store in "size is divisible by 8" case, and not changing anything in all other cases. All at the cost of replacing one MOV insn with LEA insn. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
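The store-count arithmetic in this message can be checked with a small C model. This is a sketch under stated assumptions, not the asm: `fill_words` and its `decrement_first` flag are hypothetical, and `memcpy` stands in for the 8-byte stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: fill n >= 8 bytes at dst with byte c using 8-byte
 * stores and return how many stores were issued. decrement_first selects
 * the patched behaviour (count = (n-1)/8) over the old one (count = n/8). */
size_t fill_words(unsigned char *dst, int c, size_t n, int decrement_first)
{
    uint64_t word = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
    size_t stores = 0;
    size_t count, i;

    assert(n >= 8);

    /* Potentially misaligned store covering the last 8 bytes. */
    memcpy(dst + n - 8, &word, 8);
    stores++;

    /* Head stores, 8 bytes each, starting at the beginning of the buffer. */
    count = (decrement_first ? n - 1 : n) / 8;
    for (i = 0; i < count; i++) {
        memcpy(dst + 8 * i, &word, 8);
        stores++;
    }
    return stores;
}
```

For n = 16 the old count issues three stores and the patched count issues two. For n = 17 both variants issue three, which matches the claim that nothing changes when the size is not divisible by 8.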
2015-02-10 | x86_64/memset: simple optimizations | Denys Vlasenko | -14/+16
"and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead. 64-bit imul is slow, move it as far up as possible so that the result (rax) has more time to be ready by the time we start using it in mem stores. There is no need to shuffle registers in preparation for "rep stos" if we are not going to take that code path. Thus, patch moves "jump if len < 16" instructions up, and changes alternate code path to use rdx and rdi instead of rcx and r8. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
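The zero-extend and multiply steps mentioned above can be expressed in C as follows. This is an illustrative sketch: the function name `broadcast_byte` is hypothetical, and the multiplier 0x0101010101010101 is assumed to be the constant the imul uses, since the message does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: replicate the low byte of c into all eight bytes
 * of a 64-bit word, ready to be used by word-sized stores. */
uint64_t broadcast_byte(int c)
{
    /* Zero-extend the low byte, the job of "movzbl %sil,%esi". */
    uint64_t b = (unsigned char)c;
    /* One multiply copies the byte into every byte lane. */
    uint64_t w = b * 0x0101010101010101ULL;

    assert((w >> 56) == b);  /* top lane holds the same byte */
    return w;
}
```

Because the product is needed only when the first store executes, computing it early gives a slow multiplier time to finish, which is the scheduling point the message makes.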
2014-11-23 | fix tabs/spaces in memcpy.s | Rich Felker | -279/+279
this file had been a mess that went unnoticed ever since it was imported. some lines used spaces for indentation while others used tabs, and tabs were used for alignment.
2014-11-23 | fix build regression in arm asm for memcpy | Rich Felker | -30/+30
commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility with clang's internal assembler, but broke compatibility with gas and the traditional arm asm syntax by switching to the arm "unified assembler language" (UAL). recent versions of gas also support UAL, but require the .syntax directive to be used to switch to it. clang on the other hand defaults to UAL. and old versions of gas (still relevant) don't support UAL at all. for the conditional ldm/stm instructions, "ia" is default and can just be omitted, resulting in a mnemonic that's compatible with both traditional and UAL syntax. but for byte/halfword loads and stores, there seems to be no mnemonic compatible with both, and thus .word is used to produce the desired opcode explicitly. the .inst directive is not used because it is not compatible with older assemblers.
2014-11-23 | arm assembly changes for clang compatibility | Joakim Sindholt | -30/+30
2014-10-04 | fix handling of odd lengths in swab function | Rich Felker | -1/+1
this function is specified to leave the last byte with "unspecified disposition" when the length is odd, so for the most part correct programs should not be calling swab with odd lengths. however, doing so is permitted, and should not write past the end of the destination buffer.
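A minimal C sketch of a swab loop that stays inside the destination buffer for odd lengths is shown below. The name `sketch_swab` is hypothetical, and the choice to leave the final odd byte untouched is one permitted reading of "unspecified disposition", not necessarily what the committed code does.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical sketch: swap adjacent byte pairs from src into dest.
 * Looping while n > 1 means only complete pairs are processed, so an odd
 * trailing byte is never read or written. */
void sketch_swab(const void *src_, void *dest_, ssize_t n)
{
    const char *src = src_;
    char *dest = dest_;

    assert(n <= 0 || (src_ && dest_));

    for (; n > 1; n -= 2) {
        dest[0] = src[1];
        dest[1] = src[0];
        dest += 2;
        src += 2;
    }
}
```

With a loop condition of n > 0 instead, an odd length would run one extra iteration and touch a byte one past the end of both buffers.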
2014-07-26 | add support for LC_TIME and LC_MESSAGES translations | Rich Felker | -2/+3
for LC_MESSAGES, translation of strerror and similar literal message functions is supported. for messages in other places (particularly the dynamic linker) that use format strings, translation is not yet supported. in order to make it possible and safe, such messages will need to be refactored to separate the textual content from the format. for LC_TIME, the day and month names and strftime-style format strings provided by nl_langinfo are supported for translation. however there may be limitations, as some of the original C-locale nl_langinfo strings are non-unique and thus perhaps non-suitable as keys. overall, the locale support activated by this commit should not be seen as complete and polished but as a basis for beginning to test locale functionality and implement locales.
2014-07-02 | consolidate str[n]casecmp_l into str[n]casecmp source files | Rich Felker | -0/+16
this is mainly done for consistency with the ctype functions and to declutter the src/locale directory.
2014-06-19 | fix incorrect comparison loop condition in memmem | Rich Felker | -2/+2
the logic for this loop was copied from null-terminated-string logic in strstr without properly adapting it to work with explicit lengths. presumably this error could result in false negatives (wrongly comparing past the end of the needle/haystack), false positives (stopping comparison early when the needle contains null bytes), and crashes (from runaway reads past the end of mapped memory).
2014-04-18 | fix false negatives with periodic needles in strstr, wcsstr, and memmem | Rich Felker | -3/+3
in cases where the memorized match range from the right factor exceeded the length of the left factor, it was wrongly treated as a mismatch rather than a match. issue reported by Yves Bastide.
2014-04-09 | fix search past the end of haystack in memmem | Timo Teräs | -0/+1
to optimize the search, memchr is used to find the first occurrence of the first character of the needle in the haystack before switching to a search for the full needle. however, the number of characters skipped by this first step was not subtracted from the haystack length, causing memmem to search past the end of the haystack.
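The length adjustment described here can be sketched in C as follows. `sketch_memmem` is a hypothetical name, and the naive scan at the end merely stands in for the real full-needle search, which this log does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a memmem-style search over k haystack bytes
 * for an l-byte needle. */
void *sketch_memmem(const void *h0, size_t k, const void *n0, size_t l)
{
    const unsigned char *h = h0, *n = n0, *p;

    assert(h0 && n0);

    if (!l) return (void *)h;   /* empty needle matches at the start */
    if (k < l) return 0;        /* needle longer than haystack */

    /* Skip ahead to the first occurrence of the needle's first byte. */
    p = memchr(h, *n, k);
    if (!p) return 0;

    /* The fix: bytes skipped by memchr must be subtracted from the length. */
    k -= (size_t)(p - h);
    h = p;
    if (k < l) return 0;

    /* Naive scan standing in for the full-needle search. */
    for (; k >= l; h++, k--)
        if (!memcmp(h, n, l)) return (void *)h;
    return 0;
}
```

Without the subtraction, the remaining search would still believe k bytes are available from the new starting point and could read past the end of the haystack.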
2013-12-12 | include cleanups: remove unused headers and add feature test macros | Szabolcs Nagy | -18/+7
2013-11-23 | strcmp: Remove unnecessary check for *r | Michael Forney | -1/+1
If *l == *r && *l, then by transitivity, *r.
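The argument above corresponds to a comparison loop of roughly the following shape. This is an illustrative sketch with the hypothetical name `sketch_strcmp`, not a copy of the committed source.

```c
#include <assert.h>

/* Hypothetical sketch: the loop condition tests only *l == *r and *l.
 * When both hold, *r is nonzero too, so a separate *r test adds nothing. */
int sketch_strcmp(const char *l, const char *r)
{
    assert(l && r);

    for (; *l == *r && *l; l++, r++)
        ;
    return *(const unsigned char *)l - *(const unsigned char *)r;
}
```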
2013-08-28 | optimized C memcpy | Rich Felker | -16/+111
unlike the old C memcpy, this version handles word-at-a-time reads and writes even for misaligned copies. it does not require that the cpu support misaligned accesses; instead, it performs bit shifts to realign the bytes for the destination. essentially, this is the C version of the ARM assembly language memcpy. the ideas are all the same, and it should perform well on any arch with a decent number of general-purpose registers that has a barrel shift operation. since the barrel shifter is an optional cpu feature on microblaze, it may be desirable to provide an alternate asm implementation on microblaze, but otherwise the C code provides a competitive implementation for "generic risc-y" cpu archs that should alleviate the urgent need for arch-specific memcpy asm.
2013-08-27 | optimized C memset | Rich Felker | -12/+77
this version of memset is optimized both for small and large values of n, and makes no misaligned writes, so it is usable (and near-optimal) on all archs. it is capable of filling up to 52 or 56 bytes without entering a loop and with at most 7 branches, all of which can be fully predicted if memset is called multiple times with the same size. it also uses the attribute extension to inform the compiler that it is violating the aliasing rules, unlike the previous code which simply assumed it was safe to violate the aliasing rules since translation unit boundaries hide the violations from the compiler. for non-GNUC compilers, 100% portable fallback code in the form of a naive loop is provided. I intend to eventually apply this approach to all of the string/memory functions which are doing word-at-a-time accesses.
2013-08-14 | add arm-optimized memcpy implementation from bionic libc | Rich Felker | -0/+383
the approach of this implementation was heavily investigated prior to adopting it. attempts to obtain similar performance with pure C code were capping out at about 75% of the performance of the asm, with considerably larger code size, and were fragile in that the compiler would sometimes compile part of memcpy into a call to itself. therefore, just using the asm seems to be the best option. this commit is the first to make use of the new subarch-specific asm framework. the new armel directory is the location for arm asm that should not be used for all arm subarchs, only the default one. armhf is the name of the little-endian hardfloat-ABI subarch, which can use the exact same asm. in both cases, the build system finds the asm by following a memcpy.sub file. the other two subarchs, armeb and armebhf, would need a big-endian variant of this code. it would not be hard to adapt the code to big endian, but I will hold off on doing so until there is demand for it.
2013-08-01 | optimized memset asm for i386 and x86_64 | Rich Felker | -0/+88
the concept of both versions is the same; they differ only in details. for long runs, they use "rep stosl" or "rep stosq", and for small runs, they use a trick, writing from both ends towards the middle, that reduces the number of branches needed. in addition, if memset is called multiple times with the same length, all branches will be predicted; there are no loops. for larger runs, there are likely faster approaches than "rep", at least on some cpu models. for 32-bit, it's unlikely that there is any faster approach that does not require non-baseline instructions; doing anything fancier would require inspecting cpu capabilities. for 64-bit, there may very well be faster versions that work on all models; further optimization could be explored in the future. with these changes, memset is anywhere between 50% faster and 6 times faster, depending on the cpu model and the length and alignment of the destination buffer.
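The "both ends towards the middle" trick can be sketched in C for very short lengths. This is an assumption-laden illustration: `sketch_memset` is a hypothetical name, the size thresholds are chosen for the sketch, and a plain byte loop stands in for the "rep" path used on long runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: fill n bytes with c. For n <= 8 the stores from
 * the front and from the back overlap in the middle, so a few range
 * checks replace a per-byte loop. */
void *sketch_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    size_t i;

    assert(dest || !n);

    if (!n) return dest;
    s[0] = s[n-1] = c;
    if (n <= 2) return dest;
    s[1] = s[n-2] = c;
    s[2] = s[n-3] = c;
    if (n <= 6) return dest;
    s[3] = s[n-4] = c;
    if (n <= 8) return dest;

    /* Longer runs: a simple loop over the middle, standing in for "rep". */
    for (i = 4; i < n - 4; i++) s[i] = c;
    return dest;
}
```

Each early return depends only on n, so repeated calls with the same length follow the same branch pattern, which is the predictability property the message points out.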
2013-07-09 | fix a couple misleading/wrong signal descriptions in strsignal | Rich Felker | -2/+2
there are still several more that are misleading, but SIGFPE (integer division error misdescribed as floating point) and SIGCHLD (possibly non-exit status change events described as exiting) were the worst offenders.
2013-07-09 | add realtime signals to strsignal | Rich Felker | -3/+19
the name format RTnn/RTnnn was chosen to minimize bloat while uniquely identifying the signal.
2013-07-09 | fix off-by-one array bound in strsignal | Rich Felker | -1/+1
2013-04-05 | Add ABI compatibility aliases. | Isaac Dunham | -0/+3
GNU used several extensions that were incompatible with C99 and POSIX, so they used alternate names for the standard functions. The result is that we need these to run standards-conformant programs that were linked with glibc.
2013-02-26 | fix integer type issue in strverscmp | Rich Felker | -1/+3
lenl-lenr is not a valid expression for a signed int return value from strverscmp, since after implicit conversion from size_t to int this difference could have the wrong sign or might even be zero. using the difference for char values works since they're bounded well within the range of differences representable by int, but it does not work for size_t values.
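The conversion problem can be made concrete with two small C functions. Both names are hypothetical, and the safe variant shows one common way to derive a sign from two size_t values; the log does not say which form the committed fix uses.

```c
#include <assert.h>
#include <stddef.h>

/* Problematic form: the size_t difference is converted to int. On common
 * 64-bit ABIs a difference of 2^32 becomes 0, and SIZE_MAX - 0 becomes -1
 * even though the left length is larger. */
int broken_len_cmp(size_t lenl, size_t lenr)
{
    return (int)(lenl - lenr);
}

/* Safe form: compare first, then produce -1, 0 or 1. */
int safe_len_cmp(size_t lenl, size_t lenr)
{
    int r = (lenl > lenr) - (lenl < lenr);

    assert(r >= -1 && r <= 1);
    return r;
}
```

Subtracting char values does not have this problem because their difference always fits in an int.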
2013-02-26 | implement non-stub strverscmp | Rich Felker | -2/+35
patch by Isaac Dunham.
2013-02-21 | replace stub with working strcasestr | Rich Felker | -2/+4
2013-02-21 | fix wrong return value from wmemmove on forward copies | Rich Felker | -1/+2
2012-12-26 | fix alignment logic in strlcpy | Rich Felker | -1/+1
2012-10-22 | simplify logic in stpcpy; avoid copying first aligned byte twice | Rich Felker | -4/+4
gcc seems to be generating identical or near-identical code for both versions, but the newer code is more expressive of what it's doing.
2012-10-15 | add memmem function (gnu extension) | Rich Felker | -0/+148
based on strstr. passes gnulib tests and a few quick checks of my own.
2012-09-27 | optimize strchrnul/strcspn not to scan string twice on no-match | Rich Felker | -25/+29
when strchr fails, an important piece of information already computed, the string length, is thrown away. have strchrnul (with namespace protection) be the underlying function so this information can be kept, and let strchr be a wrapper for it. this also allows strcspn to be considerably faster in the case where the match set has a single element that's not matched.
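The layering described in this message can be sketched in C as below. All three names are hypothetical, and the byte-at-a-time loop is a simplification chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: return a pointer to the first occurrence of c, or
 * to the terminating null byte if c does not occur. */
char *sketch_strchrnul(const char *s, int c)
{
    assert(s);

    for (; *s && *(const unsigned char *)s != (unsigned char)c; s++)
        ;
    return (char *)s;
}

/* strchr as a thin wrapper: a miss is detected by checking the byte
 * that strchrnul stopped on. */
char *sketch_strchr(const char *s, int c)
{
    char *r = sketch_strchrnul(s, c);
    return *(unsigned char *)r == (unsigned char)c ? r : 0;
}

/* Single-element reject set: one scan yields the span length directly. */
size_t sketch_strcspn_single(const char *s, char c)
{
    return (size_t)(sketch_strchrnul(s, c) - s);
}
```

On a miss, the pointer returned by the underlying function already marks the end of the string, so no second pass is needed to compute the length.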
2012-09-27 | slightly cleaner strlen, also seems to compile to better code | Rich Felker | -6/+4
testing with gcc 4.6.3 on x86, -Os, the old version does a duplicate null byte check after the first loop. this is purely the compiler being stupid, but the old code was also stupid and unintuitive in how it expressed the check.
2012-09-10 | asm for memmove on i386 and x86_64 | Rich Felker | -0/+36
for the sake of simplicity, I've only used rep movsb rather than breaking up the copy for using rep movsd/q. on all modern cpus, this seems to be fine, but if there are performance problems, there might be a need to go back and add support for rep movsd/q.
2012-09-10 | reenable word-at-a-time copying in memmove | Rich Felker | -4/+27
before restrict was added, memmove called memcpy for forward copies and used a byte-at-a-time loop for reverse copies. this was changed to avoid invoking UB now that memcpy has an undefined copying order, making memmove considerably slower. performance is still rather bad, so I'll be adding asm soon.
2012-09-06 | use restrict everywhere it's required by c99 and/or posix 2008 | Rich Felker | -20/+20
to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
2012-09-06 | remove dependency of wmemmove on wmemcpy direction | Rich Felker | -4/+4
unlike the memmove commit, this one should be fine to leave in place. wmemmove is not performance-critical, and even if it were, it's already copying whole 32-bit words at a time instead of bytes.
2012-09-06 | remove dependency of memmove on memcpy direction | Rich Felker | -5/+4
this commit introduces a performance regression in many uses of memmove, which will need to be addressed before the next release. i'm making it as a temporary measure so that the restrict patch can be committed without invoking undefined behavior when memmove calls memcpy with overlapping regions.
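A memmove that does not rely on memcpy's copy order has to pick the direction itself. The C sketch below shows the idea with byte-at-a-time loops; `sketch_memmove` is a hypothetical name and the code is a simplification, not the committed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copy n bytes between possibly overlapping regions
 * without calling memcpy. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert((dest && src) || !n);

    if (d == s) return dest;

    /* Addresses are compared as integers so the test is meaningful even
     * when the two pointers do not point into the same object. */
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination is lower: copy forward. */
        for (; n; n--) *d++ = *s++;
    } else {
        /* Destination is higher: copy backward so that source bytes are
         * read before they are overwritten. */
        while (n) {
            n--;
            d[n] = s[n];
        }
    }
    return dest;
}
```

The per-byte loops make the cost of the change visible: both directions now move one byte per iteration, which is the kind of slowdown the message warns about.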
2012-08-11 | memcpy asm for i386 and x86_64 | Rich Felker | -0/+51
2012-08-11 | remove unused but buggy code from strstr.c | Rich Felker | -10/+0
2012-08-11 | remove buggy short-string wcsstr implementation; always use twoway | Rich Felker | -9/+0
since this interface is rarely used, it's probably best to lean towards keeping code size down anyway. one-character needles will still be found immediately by the initial wcschr call anyway.
2012-07-31 | optimize mempcpy to minimize need for data saved across the call | Rich Felker | -2/+1
2012-06-20 | make strerror_r behave nicer on failure | Rich Felker | -2/+8
if the buffer is too short, at least return a partial string. this is helpful if the caller is lazy and does not check for failure. care is taken to avoid writing anything if the buffer length is zero, and to always null-terminate when the buffer length is non-zero.
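The truncation behaviour described here can be sketched as a small C helper. `copy_message` is a hypothetical name and the message string is passed in directly, so the sketch covers only the buffer handling, not the error-number lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy msg into buf of size buflen. Returns 0 on
 * success, or ERANGE after storing as much of the message as fits. */
int copy_message(const char *msg, char *buf, size_t buflen)
{
    size_t l;

    assert(msg && (buf || !buflen));

    l = strlen(msg);
    if (l >= buflen) {
        if (buflen) {
            /* Partial string, always null-terminated. */
            memcpy(buf, msg, buflen - 1);
            buf[buflen - 1] = 0;
        }
        /* With buflen == 0 nothing at all is written. */
        return ERANGE;
    }
    memcpy(buf, msg, l + 1);
    return 0;
}
```

A caller that ignores the return value still ends up with a valid, terminated string whenever the buffer has room for at least one byte.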
2012-05-26 | fix overrun (n essentially ignored) in wcsncmp | Rich Felker | -1/+1
bug report and solution by Richard Pennington
2012-05-26 | fix failure of strrchr(str, 0) | Rich Felker | -1/+1
bug report and solution by Richard Pennington
2012-03-01 | add all missing wchar functions except floating point parsers | Rich Felker | -0/+71
these are mostly untested and adapted directly from corresponding byte string functions and similar.
2011-09-11 | add dummied strverscmp (obnoxious GNU function) | Rich Felker | -0/+7
programs that use this tend to horribly botch international text support, so it's questionable whether we want to support it even in the long term... for now, it's just a dummy that calls strcmp.
2011-06-13 | fix wrong type for wcsrchr argument 2 | Rich Felker | -1/+1
2011-05-22 | fix strncat and wcsncat (double null termination) | Rich Felker | -3/+3
also modify wcsncpy to use the same loop logic
2011-05-22 | fix wcsncpy writing past end of buffer | Rich Felker | -1/+1
2011-04-26 | function signature fix: add const qualifier to mempcpy src arg | Rich Felker | -1/+1