<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/string, branch v0.9.13</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>optimized C memcpy</title>
<updated>2013-08-28T07:34:57+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-28T07:34:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=90edf1cc15cec685c18ec2485ddce5b655963464'/>
<id>90edf1cc15cec685c18ec2485ddce5b655963464</id>
<content type='text'>
unlike the old C memcpy, this version handles word-at-a-time reads and
writes even for misaligned copies. it does not require that the cpu
support misaligned accesses; instead, it performs bit shifts to
realign the bytes for the destination.

essentially, this is the C version of the ARM assembly language
memcpy. the ideas are all the same, and it should perform well on any
arch with a decent number of general-purpose registers that has a
barrel shift operation. since the barrel shifter is an optional cpu
feature on microblaze, it may be desirable to provide an alternate asm
implementation on microblaze, but otherwise the C code provides a
competitive implementation for "generic risc-y" cpu archs that should
alleviate the urgent need for arch-specific memcpy asm.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
unlike the old C memcpy, this version handles word-at-a-time reads and
writes even for misaligned copies. it does not require that the cpu
support misaligned accesses; instead, it performs bit shifts to
realign the bytes for the destination.

essentially, this is the C version of the ARM assembly language
memcpy. the ideas are all the same, and it should perform well on any
arch with a decent number of general-purpose registers that has a
barrel shift operation. since the barrel shifter is an optional cpu
feature on microblaze, it may be desirable to provide an alternate asm
implementation on microblaze, but otherwise the C code provides a
competitive implementation for "generic risc-y" cpu archs that should
alleviate the urgent need for arch-specific memcpy asm.
</pre>
</div>
</content>
</entry>
<entry>
<title>optimized C memset</title>
<updated>2013-08-27T22:08:29+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-27T22:08:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=a543369e3b06a51eacd392c738fc10c5267a195f'/>
<id>a543369e3b06a51eacd392c738fc10c5267a195f</id>
<content type='text'>
this version of memset is optimized both for small and large values of
n, and makes no misaligned writes, so it is usable (and near-optimal)
on all archs. it is capable of filling up to 52 or 56 bytes without
entering a loop and with at most 7 branches, all of which can be fully
predicted if memset is called multiple times with the same size.

it also uses the attribute extension to inform the compiler that it is
violating the aliasing rules, unlike the previous code which simply
assumed it was safe to violate the aliasing rules since translation
unit boundaries hide the violations from the compiler. for non-GNUC
compilers, 100% portable fallback code in the form of a naive loop is
provided. I intend to eventually apply this approach to all of the
string/memory functions which are doing word-at-a-time accesses.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this version of memset is optimized both for small and large values of
n, and makes no misaligned writes, so it is usable (and near-optimal)
on all archs. it is capable of filling up to 52 or 56 bytes without
entering a loop and with at most 7 branches, all of which can be fully
predicted if memset is called multiple times with the same size.

it also uses the attribute extension to inform the compiler that it is
violating the aliasing rules, unlike the previous code which simply
assumed it was safe to violate the aliasing rules since translation
unit boundaries hide the violations from the compiler. for non-GNUC
compilers, 100% portable fallback code in the form of a naive loop is
provided. I intend to eventually apply this approach to all of the
string/memory functions which are doing word-at-a-time accesses.
</pre>
</div>
</content>
</entry>
<entry>
<title>add arm-optimized memcpy implementation from bionic libc</title>
<updated>2013-08-14T07:06:21+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-14T07:06:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=cccc1844be95549e5b6c91ffc1f2c2ba3d3aab16'/>
<id>cccc1844be95549e5b6c91ffc1f2c2ba3d3aab16</id>
<content type='text'>
the approach of this implementation was heavily investigated prior to
adopting it. attempts to obtain similar performance with pure C code
were capping out at about 75% of the performance of the asm, with
considerably larger code size, and were fragile in that the compiler
would sometimes compile part of memcpy into a call to itself.
therefore, just using the asm seems to be the best option.

this commit is the first to make use of the new subarch-specific asm
framework. the new armel directory is the location for arm asm that
should not be used for all arm subarchs, only the default one. armhf
is the name of the little-endian hardfloat-ABI subarch, which can use
the exact same asm. in both cases, the build system finds the asm by
following a memcpy.sub file.

the other two subarchs, armeb and armebhf, would need a big-endian
variant of this code. it would not be hard to adapt the code to big
endian, but I will hold off on doing so until there is demand for it.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the approach of this implementation was heavily investigated prior to
adopting it. attempts to obtain similar performance with pure C code
were capping out at about 75% of the performance of the asm, with
considerably larger code size, and were fragile in that the compiler
would sometimes compile part of memcpy into a call to itself.
therefore, just using the asm seems to be the best option.

this commit is the first to make use of the new subarch-specific asm
framework. the new armel directory is the location for arm asm that
should not be used for all arm subarchs, only the default one. armhf
is the name of the little-endian hardfloat-ABI subarch, which can use
the exact same asm. in both cases, the build system finds the asm by
following a memcpy.sub file.

the other two subarchs, armeb and armebhf, would need a big-endian
variant of this code. it would not be hard to adapt the code to big
endian, but I will hold off on doing so until there is demand for it.
</pre>
</div>
</content>
</entry>
<entry>
<title>optimized memset asm for i386 and x86_64</title>
<updated>2013-08-02T01:44:43+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-08-02T01:44:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=926272ddffa293ee68ffeb01422fc8c792acf428'/>
<id>926272ddffa293ee68ffeb01422fc8c792acf428</id>
<content type='text'>
the concept of both versions is the same; they differ only in details.
for long runs, they use "rep movsl" or "rep movsq", and for small
runs, they use a trick, writing from both ends towards the middle,
that reduces the number of branches needed. in addition, if memset is
called multiple times with the same length, all branches will be
predicted; there are no loops.

for larger runs, there are likely faster approaches than "rep", at
least on some cpu models. for 32-bit, it's unlikely that there is any
faster approach that does not require non-baseline instructions; doing
anything fancier would require inspecting cpu capabilities. for
64-bit, there may very well be faster versions that work on all
models; further optimization could be explored in the future.

with these changes, memset is anywhere between 50% faster and 6 times
faster, depending on the cpu model and the length and alignment of the
destination buffer.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the concept of both versions is the same; they differ only in details.
for long runs, they use "rep movsl" or "rep movsq", and for small
runs, they use a trick, writing from both ends towards the middle,
that reduces the number of branches needed. in addition, if memset is
called multiple times with the same length, all branches will be
predicted; there are no loops.

for larger runs, there are likely faster approaches than "rep", at
least on some cpu models. for 32-bit, it's unlikely that there is any
faster approach that does not require non-baseline instructions; doing
anything fancier would require inspecting cpu capabilities. for
64-bit, there may very well be faster versions that work on all
models; further optimization could be explored in the future.

with these changes, memset is anywhere between 50% faster and 6 times
faster, depending on the cpu model and the length and alignment of the
destination buffer.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix a couple misleading/wrong signal descriptions in strsignal</title>
<updated>2013-07-09T06:30:21+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-07-09T06:30:21+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=c713d8797804903b54203a645e023e2077c7556d'/>
<id>c713d8797804903b54203a645e023e2077c7556d</id>
<content type='text'>
there are still several more that are misleading, but SIGFPE (integer
division error misdescribed as floating point) and and SIGCHLD
(possibly non-exit status change events described as exiting) were the
worst offenders.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
there are still several more that are misleading, but SIGFPE (integer
division error misdescribed as floating point) and and SIGCHLD
(possibly non-exit status change events described as exiting) were the
worst offenders.
</pre>
</div>
</content>
</entry>
<entry>
<title>add realtime signals to strsignal</title>
<updated>2013-07-09T06:23:16+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-07-09T06:23:16+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=c90fa2ace73851d6af4946c239c90db91cad8abe'/>
<id>c90fa2ace73851d6af4946c239c90db91cad8abe</id>
<content type='text'>
the name format RTnn/RTnnn was chosen to minimized bloat while
uniquely identifying the signal.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the name format RTnn/RTnnn was chosen to minimized bloat while
uniquely identifying the signal.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix off-by-one array bound in strsignal</title>
<updated>2013-07-09T06:11:52+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-07-09T06:11:52+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=8599822ee1d70a1e42e1d4b4962bc1d0bdf7e5ab'/>
<id>8599822ee1d70a1e42e1d4b4962bc1d0bdf7e5ab</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Add ABI compatability aliases.</title>
<updated>2013-04-06T06:20:28+00:00</updated>
<author>
<name>Isaac Dunham</name>
<email>idunham@lavabit.com</email>
</author>
<published>2013-04-06T06:20:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=14f0272ea1775c35801b2bc17e67ef8bb7e9742d'/>
<id>14f0272ea1775c35801b2bc17e67ef8bb7e9742d</id>
<content type='text'>
GNU used several extensions that were incompatible with C99 and POSIX,
so they used alternate names for the standard functions.

The result is that we need these to run standards-conformant programs
that were linked with glibc.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GNU used several extensions that were incompatible with C99 and POSIX,
so they used alternate names for the standard functions.

The result is that we need these to run standards-conformant programs
that were linked with glibc.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix integer type issue in strverscmp</title>
<updated>2013-02-26T06:42:11+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-02-26T06:42:11+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=5afc74fbaa2371f30df0dc9fb7bc3afe6bd96137'/>
<id>5afc74fbaa2371f30df0dc9fb7bc3afe6bd96137</id>
<content type='text'>
lenl-lenr is not a valid expression for a signed int return value from
strverscmp, since after implicit conversion from size_t to int this
difference could have the wrong sign or might even be zero. using the
difference for char values works since they're bounded well within the
range of differences representable by int, but it does not work for
size_t values.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
lenl-lenr is not a valid expression for a signed int return value from
strverscmp, since after implicit conversion from size_t to int this
difference could have the wrong sign or might even be zero. using the
difference for char values works since they're bounded well within the
range of differences representable by int, but it does not work for
size_t values.
</pre>
</div>
</content>
</entry>
<entry>
<title>implement non-stub strverscmp</title>
<updated>2013-02-26T06:36:47+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2013-02-26T06:36:47+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=4853c1f7f7b5023aa6a409abc1e759f5f92c9c4e'/>
<id>4853c1f7f7b5023aa6a409abc1e759f5f92c9c4e</id>
<content type='text'>
patch by Isaac Dunham.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
patch by Isaac Dunham.
</pre>
</div>
</content>
</entry>
</feed>
