<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/string, branch rs-1.0</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>fix handling of odd lengths in swab function</title>
<updated>2015-03-30T05:15:45+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-10-04T15:14:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=9882dc933dd7e3cc6bf645153778c7ef48032018'/>
<id>9882dc933dd7e3cc6bf645153778c7ef48032018</id>
<content type='text'>
this function is specified to leave the last byte with "unspecified
disposition" when the length is odd, so for the most part correct
programs should not be calling swab with odd lengths. however, doing
so is permitted, and should not write past the end of the destination
buffer.

(cherry picked from commit dccbf4c809efc311aa37da71de70d04dfd8b0db3)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this function is specified to leave the last byte with "unspecified
disposition" when the length is odd, so for the most part correct
programs should not be calling swab with odd lengths. however, doing
so is permitted, and should not write past the end of the destination
buffer.

(cherry picked from commit dccbf4c809efc311aa37da71de70d04dfd8b0db3)
</pre>
</div>
</content>
</entry>
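As an illustration of the fixed contract, here is a minimal sketch (not musl's code; the function name is invented) of a swab-style pair swap that rounds an odd length down, so the odd trailing byte is left alone and nothing is written past the end of the destination:

#include <stddef.h>

/* swab_sketch: hypothetical illustration, not musl's implementation.
 * Adjacent byte pairs are swapped; an odd length is rounded down, so the
 * last byte has "unspecified disposition" (here: untouched) and no store
 * ever lands past dest[n-1]. */
static void swab_sketch(const void *restrict src, void *restrict dest, size_t n)
{
    const unsigned char *s = src;
    unsigned char *d = dest;
    size_t i;

    n &= ~(size_t)1;            /* drop the odd trailing byte, if any */
    for (i = 0; i < n; i += 2) {
        d[i]   = s[i+1];
        d[i+1] = s[i];
    }
}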
<entry>
<title>fix incorrect comparison loop condition in memmem</title>
<updated>2014-07-28T04:27:58+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-06-19T04:42:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=c65dbec736419d706a2d3a0070ab3bedb0151f4a'/>
<id>c65dbec736419d706a2d3a0070ab3bedb0151f4a</id>
<content type='text'>
the logic for this loop was copied from null-terminated-string logic
in strstr without properly adapting it to work with explicit lengths.

presumably this error could result in false negatives (wrongly
comparing past the end of the needle/haystack), false positives
(stopping comparison early when the needle contains null bytes), and
crashes (from runaway reads past the end of mapped memory).

(cherry picked from commit cef0f289f666b6c963bfd11537a6d80916ff889e)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the logic for this loop was copied from null-terminated-string logic
in strstr without properly adapting it to work with explicit lengths.

presumably this error could result in false negatives (wrongly
comparing past the end of the needle/haystack), false positives
(stopping comparison early when the needle contains null bytes), and
crashes (from runaway reads past the end of mapped memory).

(cherry picked from commit cef0f289f666b6c963bfd11537a6d80916ff889e)
</pre>
</div>
</content>
</entry>
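The distinction at issue, illustrated with two invented helpers (this is not the code the commit fixes, which is a comparison loop inside musl's memmem): a loop written for null-terminated strings stops on a '\0' byte, while an explicit-length loop may stop only when the byte count runs out:

#include <stddef.h>

/* strstr-style prefix check: the needle's end is signalled by its
 * terminating null byte, and a '\0' in the haystack stops the scan. */
static int starts_with_str(const char *hay, const char *needle)
{
    for (; *needle; hay++, needle++)
        if (*hay != *needle) return 0;
    return 1;
}

/* memmem-style prefix check: only the explicit length k may end the loop;
 * embedded null bytes are ordinary data.  The caller must guarantee that
 * at least k bytes of haystack remain. */
static int starts_with_mem(const unsigned char *hay,
                           const unsigned char *needle, size_t k)
{
    size_t i;
    for (i = 0; i < k; i++)
        if (hay[i] != needle[i]) return 0;
    return 1;
}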
<entry>
<title>fix false negatives with periodic needles in strstr, wcsstr, and memmem</title>
<updated>2014-05-20T21:58:24+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-04-18T21:38:35+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=e65bb40b30d4422f2d6ac2e1ae542e7879ddc13f'/>
<id>e65bb40b30d4422f2d6ac2e1ae542e7879ddc13f</id>
<content type='text'>
in cases where the memorized match range from the right factor
exceeded the length of the left factor, it was wrongly treated as a
mismatch rather than a match.

issue reported by Yves Bastide.

(cherry picked from commit 476cd1d96560aaf7f210319597556e7fbcd60469)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
in cases where the memorized match range from the right factor
exceeded the length of the left factor, it was wrongly treated as a
mismatch rather than a match.

issue reported by Yves Bastide.

(cherry picked from commit 476cd1d96560aaf7f210319597556e7fbcd60469)
</pre>
</div>
</content>
</entry>
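A regression-style check of the fixed behaviour, using an invented periodic needle rather than the reported test case (memmem is a nonstandard extension, hence _GNU_SOURCE):

#define _GNU_SOURCE             /* memmem() is a nonstandard extension */
#include <string.h>
#include <assert.h>

int main(void)
{
    /* "ababab" has period 2; highly periodic needles are the inputs for
     * which the memorized match range from the right factor matters. */
    const char hay[]    = "xxabababyy";
    const char needle[] = "ababab";

    assert(strstr(hay, needle) == hay + 2);
    assert(memmem(hay, sizeof hay - 1, needle, sizeof needle - 1) == hay + 2);
    return 0;
}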
<entry>
<title>fix search past the end of haystack in memmem</title>
<updated>2014-04-16T06:46:05+00:00</updated>
<author>
<name>Timo Teräs</name>
<email>timo.teras@iki.fi</email>
</author>
<published>2014-04-10T01:06:17+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=043865cadf1b26ee3a9ccdac3b0d0ca9d380cdad'/>
<id>043865cadf1b26ee3a9ccdac3b0d0ca9d380cdad</id>
<content type='text'>
to optimize the search, memchr is used to find the first occurrence of
the first character of the needle in the haystack before switching to
a search for the full needle. however, the number of characters
skipped by this first step was not subtracted from the haystack
length, causing memmem to search past the end of the haystack.

(cherry picked from commit 6fbdeff0e51f6afc38fbb1476a4db81322779da4)
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
to optimize the search, memchr is used to find the first occurrence of
the first character of the needle in the haystack before switching to
a search for the full needle. however, the number of characters
skipped by this first step was not subtracted from the haystack
length, causing memmem to search past the end of the haystack.

(cherry picked from commit 6fbdeff0e51f6afc38fbb1476a4db81322779da4)
</pre>
</div>
</content>
</entry>
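The bookkeeping the fix restores, shown as a hypothetical helper (not musl's code): after memchr skips ahead, the skipped distance has to come off the remaining haystack length as well:

#include <stddef.h>
#include <string.h>

/* find_first_byte: invented helper illustrating the fix.  Skips to the
 * first possible match position and shrinks the remaining length by the
 * same amount, so the caller never searches past the end of the haystack. */
static const unsigned char *find_first_byte(const unsigned char *hay,
                                            size_t *n, unsigned char c)
{
    const unsigned char *p = memchr(hay, c, *n);

    if (!p) return NULL;
    *n -= (size_t)(p - hay);    /* the missing subtraction in the old code */
    return p;
}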
<entry>
<title>include cleanups: remove unused headers and add feature test macros</title>
<updated>2013-12-12T05:09:18+00:00</updated>
<author>
<name>Szabolcs Nagy</name>
<email>nsz@port70.net</email>
</author>
<published>2013-12-12T05:09:18+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=571744447c23f91feb6439948f3a619aca850dfb'/>
<id>571744447c23f91feb6439948f3a619aca850dfb</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>strcmp: Remove unnecessary check for *r</title>
<updated>2013-11-23T21:17:38+00:00</updated>
<author>
<name>Michael Forney</name>
<email>mforney@mforney.org</email>
</author>
<published>2013-11-05T05:48:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=b300d5b7bd74070982da50d996773a2dd8156a01'/>
<id>b300d5b7bd74070982da50d996773a2dd8156a01</id>
<content type='text'>
If *l == *r &amp;&amp; *l, then by transitivity, *r.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If *l == *r &amp;&amp; *l, then by transitivity, *r.
</pre>
</div>
</content>
</entry>
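The shape of the simplified loop, as a sketch rather than a verbatim diff: once *l == *r has been checked, testing *l is enough, so a separate "&& *r" condition is redundant.

/* strcmp_sketch: illustrative only.  If *l == *r and *l is nonzero, then
 * *r is nonzero too, so no separate test of *r is needed. */
static int strcmp_sketch(const char *l, const char *r)
{
    for (; *l == *r && *l; l++, r++);
    return *(unsigned char *)l - *(unsigned char *)r;
}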
<entry>
<title>optimized C memcpy</title>
<updated>2013-08-28T07:34:57+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-28T07:34:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=90edf1cc15cec685c18ec2485ddce5b655963464'/>
<id>90edf1cc15cec685c18ec2485ddce5b655963464</id>
<content type='text'>
unlike the old C memcpy, this version handles word-at-a-time reads and
writes even for misaligned copies. it does not require that the cpu
support misaligned accesses; instead, it performs bit shifts to
realign the bytes for the destination.

essentially, this is the C version of the ARM assembly language
memcpy. the ideas are all the same, and it should perform well on any
arch with a decent number of general-purpose registers that has a
barrel shift operation. since the barrel shifter is an optional cpu
feature on microblaze, it may be desirable to provide an alternate asm
implementation on microblaze, but otherwise the C code provides a
competitive implementation for "generic risc-y" cpu archs that should
alleviate the urgent need for arch-specific memcpy asm.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
unlike the old C memcpy, this version handles word-at-a-time reads and
writes even for misaligned copies. it does not require that the cpu
support misaligned accesses; instead, it performs bit shifts to
realign the bytes for the destination.

essentially, this is the C version of the ARM assembly language
memcpy. the ideas are all the same, and it should perform well on any
arch with a decent number of general-purpose registers that has a
barrel shift operation. since the barrel shifter is an optional cpu
feature on microblaze, it may be desirable to provide an alternate asm
implementation on microblaze, but otherwise the C code provides a
competitive implementation for "generic risc-y" cpu archs that should
alleviate the urgent need for arch-specific memcpy asm.
</pre>
</div>
</content>
</entry>
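The realignment idea in isolation, as a little-endian sketch with invented names (not musl's code): with both loads and stores word-aligned, the fixed byte offset between source and destination is absorbed entirely by shifts.

#include <stdint.h>

/* combine_le: given two consecutive aligned source words lo and hi, build
 * the 32-bit value whose bytes begin `off' bytes (1..3) into lo.  On a
 * big-endian machine the two shift directions swap. */
static inline uint32_t combine_le(uint32_t lo, uint32_t hi, unsigned off)
{
    return (lo >> (8 * off)) | (hi << (32 - 8 * off));
}

/* The copy loop slides a window of aligned loads over the source:
 *
 *     lo = next aligned source word;
 *     store combine_le(prev, lo, off) to the aligned destination;
 *     prev = lo;
 *
 * so every load and store is aligned and only the shifter does extra work. */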
<entry>
<title>optimized C memset</title>
<updated>2013-08-27T22:08:29+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-27T22:08:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=a543369e3b06a51eacd392c738fc10c5267a195f'/>
<id>a543369e3b06a51eacd392c738fc10c5267a195f</id>
<content type='text'>
this version of memset is optimized both for small and large values of
n, and makes no misaligned writes, so it is usable (and near-optimal)
on all archs. it is capable of filling up to 52 or 56 bytes without
entering a loop and with at most 7 branches, all of which can be fully
predicted if memset is called multiple times with the same size.

it also uses the may_alias attribute extension to inform the compiler that it is
violating the aliasing rules, unlike the previous code which simply
assumed it was safe to violate the aliasing rules since translation
unit boundaries hide the violations from the compiler. for non-GNUC
compilers, 100% portable fallback code in the form of a naive loop is
provided. I intend to eventually apply this approach to all of the
string/memory functions which are doing word-at-a-time accesses.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this version of memset is optimized both for small and large values of
n, and makes no misaligned writes, so it is usable (and near-optimal)
on all archs. it is capable of filling up to 52 or 56 bytes without
entering a loop and with at most 7 branches, all of which can be fully
predicted if memset is called multiple times with the same size.

it also uses the may_alias attribute extension to inform the compiler that it is
violating the aliasing rules, unlike the previous code which simply
assumed it was safe to violate the aliasing rules since translation
unit boundaries hide the violations from the compiler. for non-GNUC
compilers, 100% portable fallback code in the form of a naive loop is
provided. I intend to eventually apply this approach to all of the
string/memory functions which are doing word-at-a-time accesses.
</pre>
</div>
</content>
</entry>
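Two of the techniques the message describes, sketched with invented names (not musl's code): overlapping stores from both ends handle a whole range of small sizes without a loop, and a may_alias typedef is how the aliasing violation is declared to GNU C compilers.

#include <stddef.h>
#include <stdint.h>

/* GNU C extension: a word type that is allowed to alias anything, so
 * word-sized stores into a char buffer are well defined. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* memset_head_tail_sketch: illustrative only.  Writing from both ends
 * toward the middle lets one store sequence cover every length in a size
 * class, so small fills need a few well-predicted branches and no loop. */
static void memset_head_tail_sketch(unsigned char *s, unsigned char c, size_t n)
{
    size_t i;

    if (!n) return;
    s[0] = s[n-1] = c;          /* covers n == 1 and n == 2 */
    if (n <= 2) return;
    s[1] = s[n-2] = c;
    s[2] = s[n-3] = c;          /* covers n up to 6 */
    if (n <= 6) return;
    s[3] = s[n-4] = c;          /* covers n up to 8 */
    if (n <= 8) return;
    /* the real code continues with aligned word stores through a type
     * like u32_alias; a byte loop keeps this sketch short and portable */
    for (i = 4; i + 4 < n; i++) s[i] = c;
}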
<entry>
<title>add arm-optimized memcpy implementation from bionic libc</title>
<updated>2013-08-14T07:06:21+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-14T07:06:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=cccc1844be95549e5b6c91ffc1f2c2ba3d3aab16'/>
<id>cccc1844be95549e5b6c91ffc1f2c2ba3d3aab16</id>
<content type='text'>
the approach of this implementation was heavily investigated prior to
adopting it. attempts to obtain similar performance with pure C code
were capping out at about 75% of the performance of the asm, with
considerably larger code size, and were fragile in that the compiler
would sometimes compile part of memcpy into a call to itself.
therefore, just using the asm seems to be the best option.

this commit is the first to make use of the new subarch-specific asm
framework. the new armel directory is the location for arm asm that
should not be used for all arm subarchs, only the default one. armhf
is the name of the little-endian hardfloat-ABI subarch, which can use
the exact same asm. in both cases, the build system finds the asm by
following a memcpy.sub file.

the other two subarchs, armeb and armebhf, would need a big-endian
variant of this code. it would not be hard to adapt the code to big
endian, but I will hold off on doing so until there is demand for it.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the approach of this implementation was heavily investigated prior to
adopting it. attempts to obtain similar performance with pure C code
were capping out at about 75% of the performance of the asm, with
considerably larger code size, and were fragile in that the compiler
would sometimes compile part of memcpy into a call to itself.
therefore, just using the asm seems to be the best option.

this commit is the first to make use of the new subarch-specific asm
framework. the new armel directory is the location for arm asm that
should not be used for all arm subarchs, only the default one. armhf
is the name of the little-endian hardfloat-ABI subarch, which can use
the exact same asm. in both cases, the build system finds the asm by
following a memcpy.sub file.

the other two subarchs, armeb and armebhf, would need a big-endian
variant of this code. it would not be hard to adapt the code to big
endian, but I will hold off on doing so until there is demand for it.
</pre>
</div>
</content>
</entry>
<entry>
<title>optimized memset asm for i386 and x86_64</title>
<updated>2013-08-02T01:44:43+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-02T01:44:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=926272ddffa293ee68ffeb01422fc8c792acf428'/>
<id>926272ddffa293ee68ffeb01422fc8c792acf428</id>
<content type='text'>
the concept of both versions is the same; they differ only in details.
for long runs, they use "rep stosl" or "rep stosq", and for small
runs, they use a trick, writing from both ends towards the middle,
that reduces the number of branches needed. in addition, if memset is
called multiple times with the same length, all branches will be
predicted; there are no loops.

for larger runs, there are likely faster approaches than "rep", at
least on some cpu models. for 32-bit, it's unlikely that there is any
faster approach that does not require non-baseline instructions; doing
anything fancier would require inspecting cpu capabilities. for
64-bit, there may very well be faster versions that work on all
models; further optimization could be explored in the future.

with these changes, memset is anywhere between 50% faster and 6 times
faster, depending on the cpu model and the length and alignment of the
destination buffer.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the concept of both versions is the same; they differ only in details.
for long runs, they use "rep stosl" or "rep stosq", and for small
runs, they use a trick, writing from both ends towards the middle,
that reduces the number of branches needed. in addition, if memset is
called multiple times with the same length, all branches will be
predicted; there are no loops.

for larger runs, there are likely faster approaches than "rep", at
least on some cpu models. for 32-bit, it's unlikely that there is any
faster approach that does not require non-baseline instructions; doing
anything fancier would require inspecting cpu capabilities. for
64-bit, there may very well be faster versions that work on all
models; further optimization could be explored in the future.

with these changes, memset is anywhere between 50% faster and 6 times
faster, depending on the cpu model and the length and alignment of the
destination buffer.
</pre>
</div>
</content>
</entry>
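For reference, the long-run primitive expressed as GNU C inline assembly rather than the .s files the commit adds (x86_64 only, invented wrapper name): "rep stosq" stores RAX at [RDI], RCX times.

#include <stddef.h>
#include <stdint.h>

/* fill_words_rep_stosq: hedged illustration, not musl's asm, and x86_64
 * only.  The small-run path in the real asm uses the same
 * write-from-both-ends trick shown for the C memset above. */
static void fill_words_rep_stosq(void *dest, uint64_t pattern, size_t nwords)
{
    __asm__ volatile ("rep stosq"
                      : "+D"(dest), "+c"(nwords)
                      : "a"(pattern)
                      : "memory");
}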
</feed>
