<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/string, branch v1.1.8</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>overhaul optimized x86_64 memset asm</title>
<updated>2015-02-26T07:07:08+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-02-26T07:07:08+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=e346ff86c8faee901a7c2d502b5beb983b99f972'/>
<id>e346ff86c8faee901a7c2d502b5beb983b99f972</id>
<content type='text'>
on most cpu models, "rep stosq" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size
126, and shrink-wraps this code path. in addition, "rep stosq" is
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 times slower when the
destination address is not aligned mod 16. the new code thus ensures
alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes proposed by Denys Vlasenko.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
on most cpu models, "rep stosq" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size
126, and shrink-wraps this code path. in addition, "rep stosq" is
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 times slower when the
destination address is not aligned mod 16. the new code thus ensures
alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes proposed by Denys Vlasenko.
</pre>
</div>
</content>
</entry>
<entry>
<title>overhaul optimized i386 memset asm</title>
<updated>2015-02-26T06:51:39+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-02-26T06:51:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=69858fa93107aa7485b143c54137e745a7b7ad72'/>
<id>69858fa93107aa7485b143c54137e745a7b7ad72</id>
<content type='text'>
on most cpu models, "rep stosl" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size 62,
and shrink-wraps this code path. in addition, "rep stosl" is very
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 to 4 times slower when
the destination address is not aligned mod 16. the new code thus
ensures alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes to the x86_64 memset asm
proposed by Denys Vlasenko.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
on most cpu models, "rep stosl" has high overhead that makes it
undesirable for small memset sizes. the new code extends the
minimal-branch fast path for short memsets from size 15 up to size 62,
and shrink-wraps this code path. in addition, "rep stosl" is very
sensitive to misalignment. the cost varies with size and with cpu
model, but it has been observed performing 1.5 to 4 times slower when
the destination address is not aligned mod 16. the new code thus
ensures alignment mod 16, but also preserves any existing additional
alignment, in case there are cpu models where it is beneficial.

this version is based in part on changes to the x86_64 memset asm
proposed by Denys Vlasenko.
</pre>
</div>
</content>
</entry>
<entry>
<title>x86_64/memset: avoid performing final store twice</title>
<updated>2015-02-10T23:54:27+00:00</updated>
<author>
<name>Denys Vlasenko</name>
<email>vda.linux@googlemail.com</email>
</author>
<published>2015-02-10T17:30:57+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=74e334dcd177b585c64ddafa732a3dc9e3f6b5ec'/>
<id>74e334dcd177b585c64ddafa732a3dc9e3f6b5ec</id>
<content type='text'>
The code does a potentially misaligned 8-byte store to fill the tail
of the buffer. Then it fills the initial part of the buffer
which is a multiple of 8 bytes.
Therefore, if size is divisible by 8, we were storing last word twice.

This patch decrements byte count before dividing it by 8,
making one less store in "size is divisible by 8" case,
and not changing anything in all other cases.
All at the cost of replacing one MOV insn with LEA insn.

Signed-off-by: Denys Vlasenko &lt;vda.linux@googlemail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The code does a potentially misaligned 8-byte store to fill the tail
of the buffer. Then it fills the initial part of the buffer
which is a multiple of 8 bytes.
Therefore, if size is divisible by 8, we were storing last word twice.

This patch decrements byte count before dividing it by 8,
making one less store in "size is divisible by 8" case,
and not changing anything in all other cases.
All at the cost of replacing one MOV insn with LEA insn.

Signed-off-by: Denys Vlasenko &lt;vda.linux@googlemail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86_64/memset: simple optimizations</title>
<updated>2015-02-10T23:53:31+00:00</updated>
<author>
<name>Denys Vlasenko</name>
<email>vda.linux@googlemail.com</email>
</author>
<published>2015-02-10T17:30:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=bf2071eda32528ee8b0bb89544152646684a2cf3'/>
<id>bf2071eda32528ee8b0bb89544152646684a2cf3</id>
<content type='text'>
"and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.

64-bit imul is slow, move it as far up as possible so that the result
(rax) has more time to be ready by the time we start using it
in mem stores.

There is no need to shuffle registers in preparation to "rep movs"
if we are not going to take that code path. Thus, patch moves
"jump if len &lt; 16" instructions up, and changes alternate code path
to use rdx and rdi instead of rcx and r8.

Signed-off-by: Denys Vlasenko &lt;vda.linux@googlemail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.

64-bit imul is slow, move it as far up as possible so that the result
(rax) has more time to be ready by the time we start using it
in mem stores.

There is no need to shuffle registers in preparation to "rep movs"
if we are not going to take that code path. Thus, patch moves
"jump if len &lt; 16" instructions up, and changes alternate code path
to use rdx and rdi instead of rcx and r8.

Signed-off-by: Denys Vlasenko &lt;vda.linux@googlemail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fix tabs/spaces in memcpy.s</title>
<updated>2014-11-23T19:33:01+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-11-23T19:33:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=9911754b198aaf1e2b0e98951766a9d83c277c67'/>
<id>9911754b198aaf1e2b0e98951766a9d83c277c67</id>
<content type='text'>
this file had been a mess that went unnoticed ever since it was
imported. some lines used spaces for indention while others used tabs,
and tabs were used for alignment.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this file had been a mess that went unnoticed ever since it was
imported. some lines used spaces for indention while others used tabs,
and tabs were used for alignment.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix build regression in arm asm for memcpy</title>
<updated>2014-11-23T19:12:14+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-11-23T19:12:14+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=9367fe926196f407705bb07cd29c6e40eb1774dd'/>
<id>9367fe926196f407705bb07cd29c6e40eb1774dd</id>
<content type='text'>
commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility
with clang's internal assembler, but broke compatibility with gas and
the traditional arm asm syntax by switching to the arm "unified
assembler language" (UAL). recent versions of gas also support UAL,
but require the .syntax directive to be used to switch to it. clang on
the other hand defaults to UAL. and old versions of gas (still
relevant) don't support UAL at all.

for the conditional ldm/stm instructions, "ia" is default and can just
be omitted, resulting in a mnemonic that's compatible with both
traditional and UAL syntax. but for byte/halfword loads and stores,
there seems to be no mnemonic compatible with both, and thus .word is
used to produce the desired opcode explicitly. the .inst directive is
not used because it is not compatible with older assemblers.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e fixed compatibility
with clang's internal assembler, but broke compatibility with gas and
the traditional arm asm syntax by switching to the arm "unified
assembler language" (UAL). recent versions of gas also support UAL,
but require the .syntax directive to be used to switch to it. clang on
the other hand defaults to UAL. and old versions of gas (still
relevant) don't support UAL at all.

for the conditional ldm/stm instructions, "ia" is default and can just
be omitted, resulting in a mnemonic that's compatible with both
traditional and UAL syntax. but for byte/halfword loads and stores,
there seems to be no mnemonic compatible with both, and thus .word is
used to produce the desired opcode explicitly. the .inst directive is
not used because it is not compatible with older assemblers.
</pre>
</div>
</content>
</entry>
<entry>
<title>arm assembly changes for clang compatibility</title>
<updated>2014-11-23T17:03:34+00:00</updated>
<author>
<name>Joakim Sindholt</name>
<email>opensource@zhasha.com</email>
</author>
<published>2014-11-06T17:57:56+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e'/>
<id>27828f7e9adb6b4f93ca56f6f98ef4c44bb5ed4e</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>fix handling of odd lengths in swab function</title>
<updated>2014-10-04T15:14:01+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-10-04T15:14:01+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=dccbf4c809efc311aa37da71de70d04dfd8b0db3'/>
<id>dccbf4c809efc311aa37da71de70d04dfd8b0db3</id>
<content type='text'>
this function is specified to leave the last byte with "unspecified
disposition" when the length is odd, so for the most part correct
programs should not be calling swab with odd lengths. however, doing
so is permitted, and should not write past the end of the destination
buffer.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this function is specified to leave the last byte with "unspecified
disposition" when the length is odd, so for the most part correct
programs should not be calling swab with odd lengths. however, doing
so is permitted, and should not write past the end of the destination
buffer.
</pre>
</div>
</content>
</entry>
<entry>
<title>add support for LC_TIME and LC_MESSAGES translations</title>
<updated>2014-07-26T09:36:25+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-07-26T09:36:25+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=c5b8f1930512d206a7c1cf1093a4a47e1722a414'/>
<id>c5b8f1930512d206a7c1cf1093a4a47e1722a414</id>
<content type='text'>
for LC_MESSAGES, translation of strerror and similar literal message
functions is supported. for messages in other places (particularly the
dynamic linker) that use format strings, translation is not yet
supported. in order to make it possible and safe, such messages will
need to be refactored to separate the textual content from the format.

for LC_TIME, the day and month names and strftime-style format strings
provided by nl_langinfo are supported for translation. however there
may be limitations, as some of the original C-locale nl_langinfo
strings are non-unique and thus perhaps non-suitable as keys.

overall, the locale support activated by this commit should not be
seen as complete and polished but as a basis for beginning to test
locale functionality and implement locales.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
for LC_MESSAGES, translation of strerror and similar literal message
functions is supported. for messages in other places (particularly the
dynamic linker) that use format strings, translation is not yet
supported. in order to make it possible and safe, such messages will
need to be refactored to separate the textual content from the format.

for LC_TIME, the day and month names and strftime-style format strings
provided by nl_langinfo are supported for translation. however there
may be limitations, as some of the original C-locale nl_langinfo
strings are non-unique and thus perhaps non-suitable as keys.

overall, the locale support activated by this commit should not be
seen as complete and polished but as a basis for beginning to test
locale functionality and implement locales.
</pre>
</div>
</content>
</entry>
<entry>
<title>consolidate str[n]casecmp_l into str[n]casecmp source files</title>
<updated>2014-07-03T01:38:54+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2014-07-03T01:38:54+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=7424ac58b1f47adb03de55de5998c530aee91551'/>
<id>7424ac58b1f47adb03de55de5998c530aee91551</id>
<content type='text'>
this is mainly done for consistency with the ctype functions and to
declutter the src/locale directory.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this is mainly done for consistency with the ctype functions and to
declutter the src/locale directory.
</pre>
</div>
</content>
</entry>
</feed>
