<feed xmlns='http://www.w3.org/2005/Atom'>
<title>musl/src/malloc, branch v1.1.16</title>
<subtitle>musl - an implementation of the standard library for Linux-based systems</subtitle>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/'/>
<entry>
<title>use lookup table for malloc bin index instead of float conversion</title>
<updated>2016-12-17T23:16:43+00:00</updated>
<author>
<name>Szabolcs Nagy</name>
<email>nsz@port70.net</email>
</author>
<published>2016-12-17T14:03:24+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=61ff1af76f4887bb7c555e4d0b8a7eeb73b05086'/>
<id>61ff1af76f4887bb7c555e4d0b8a7eeb73b05086</id>
<content type='text'>
float conversion is slow and big on soft-float targets.

The lookup table increases code size a bit on most hard float targets
(and adds 60byte rodata), performance can be a bit slower because of
position independent data access and cpu internal state dependence
(cache, extra branches), but the overall effect should be minimal
(common, small size allocations should be unaffected).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
float conversion is slow and big on soft-float targets.

The lookup table increases code size a bit on most hard float targets
(and adds 60byte rodata), performance can be a bit slower because of
position independent data access and cpu internal state dependence
(cache, extra branches), but the overall effect should be minimal
(common, small size allocations should be unaffected).
</pre>
</div>
</content>
</entry>
<entry>
<title>fix malloc_usable_size for NULL input</title>
<updated>2016-01-31T22:34:45+00:00</updated>
<author>
<name>Szabolcs Nagy</name>
<email>nsz@port70.net</email>
</author>
<published>2016-01-31T16:31:03+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=d1507646975cbf6c3e511ba07b193f27f032d108'/>
<id>d1507646975cbf6c3e511ba07b193f27f032d108</id>
<content type='text'>
the linux man page specifies malloc_usable_size(0) to return 0 and
this is the semantics other implementations follow (jemalloc).
reported by Alexander Monakov.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the linux man page specifies malloc_usable_size(0) to return 0 and
this is the semantics other implementations follow (jemalloc).
reported by Alexander Monakov.
</pre>
</div>
</content>
</entry>
<entry>
<title>remove external linkage from __simple_malloc definition</title>
<updated>2015-11-05T02:41:29+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-11-05T02:41:29+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=918b1c1d177b5e3cf22a8aae4a01776495fdc3bc'/>
<id>918b1c1d177b5e3cf22a8aae4a01776495fdc3bc</id>
<content type='text'>
this function is used only as a weak definition for malloc, for static
linking in programs which do not call realloc or free. since it had
external linkage and was thereby exported in libc.so's dynamic symbol
table, --gc-sections was unable to drop it. this was merely an
oversight; there's no reason for it to be external, so make it static.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this function is used only as a weak definition for malloc, for static
linking in programs which do not call realloc or free. since it had
external linkage and was thereby exported in libc.so's dynamic symbol
table, --gc-sections was unable to drop it. this was merely an
oversight; there's no reason for it to be external, so make it static.
</pre>
</div>
</content>
</entry>
<entry>
<title>mitigate blow-up of heap size under malloc/free contention</title>
<updated>2015-08-07T19:19:49+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-08-07T19:19:49+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=c3761622e8168b0c6453637ac82e70b09af3e8e9'/>
<id>c3761622e8168b0c6453637ac82e70b09af3e8e9</id>
<content type='text'>
during calls to free, any free chunks adjacent to the chunk being
freed are momentarily held in allocated state for the purpose of
merging, possibly leaving little or no available free memory for other
threads to allocate. under this condition, other threads will attempt
to expand the heap rather than waiting to use memory that will soon be
available. the race window where this happens is normally very small,
but became huge when free chooses to use madvise to release unused
physical memory, causing unbounded heap size growth.

this patch drastically shrinks the race window for unwanted heap
expansion by performing madvise with the bin lock held and marking the
bin non-empty in the binmask before making the expensive madvise
syscall. testing by Timo Teräs has shown this approach to be a
suitable mitigation.

more invasive changes to the synchronization between malloc and free
would be needed to completely eliminate the problem. it's not clear
whether such changes would improve or worsen typical-case performance,
or whether this would be a worthwhile direction to take malloc
development.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
during calls to free, any free chunks adjacent to the chunk being
freed are momentarily held in allocated state for the purpose of
merging, possibly leaving little or no available free memory for other
threads to allocate. under this condition, other threads will attempt
to expand the heap rather than waiting to use memory that will soon be
available. the race window where this happens is normally very small,
but became huge when free chooses to use madvise to release unused
physical memory, causing unbounded heap size growth.

this patch drastically shrinks the race window for unwanted heap
expansion by performing madvise with the bin lock held and marking the
bin non-empty in the binmask before making the expensive madvise
syscall. testing by Timo Teräs has shown this approach to be a
suitable mitigation.

more invasive changes to the synchronization between malloc and free
would be needed to completely eliminate the problem. it's not clear
whether such changes would improve or worsen typical-case performance,
or whether this would be a worthwhile direction to take malloc
development.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix regression/typo that disabled __simple_malloc when calloc is used</title>
<updated>2015-06-22T20:33:28+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-06-22T20:33:28+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=153e952e1a688859d7095345b17e6c1df74a295c'/>
<id>153e952e1a688859d7095345b17e6c1df74a295c</id>
<content type='text'>
commit ba819787ee93ceae94efd274f7849e317c1bff58 introduced this
regression. since the __malloc0 weak alias was not properly provided
by __simple_malloc, use of calloc forced the full malloc to be linked.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ba819787ee93ceae94efd274f7849e317c1bff58 introduced this
regression. since the __malloc0 weak alias was not properly provided
by __simple_malloc, use of calloc forced the full malloc to be linked.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix calloc when __simple_malloc implementation is used</title>
<updated>2015-06-22T18:50:09+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-06-22T18:50:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=ba819787ee93ceae94efd274f7849e317c1bff58'/>
<id>ba819787ee93ceae94efd274f7849e317c1bff58</id>
<content type='text'>
previously, calloc's implementation encoded assumptions about the
implementation of malloc, accessing a size_t word just prior to the
allocated memory to determine if it was obtained by mmap to optimize
out the zero-filling. when __simple_malloc is used (static linking a
program with no realloc/free), it doesn't matter if the result of this
check is wrong, since all allocations are zero-initialized anyway. but
the access could be invalid if it crosses a page boundary or if the
pointer is not sufficiently aligned, which can happen for very small
allocations.

this patch fixes the issue by moving the zero-fill logic into malloc.c
with the full malloc, as a new function named __malloc0, which is
provided by a weak alias to __simple_malloc (which always gives
zero-filled memory) when the full malloc is not in use.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
previously, calloc's implementation encoded assumptions about the
implementation of malloc, accessing a size_t word just prior to the
allocated memory to determine if it was obtained by mmap to optimize
out the zero-filling. when __simple_malloc is used (static linking a
program with no realloc/free), it doesn't matter if the result of this
check is wrong, since all allocations are zero-initialized anyway. but
the access could be invalid if it crosses a page boundary or if the
pointer is not sufficiently aligned, which can happen for very small
allocations.

this patch fixes the issue by moving the zero-fill logic into malloc.c
with the full malloc, as a new function named __malloc0, which is
provided by a weak alias to __simple_malloc (which always gives
zero-filled memory) when the full malloc is not in use.
</pre>
</div>
</content>
</entry>
<entry>
<title>refactor malloc's expand_heap to share with __simple_malloc</title>
<updated>2015-06-14T01:59:02+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-06-14T01:59:02+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=e3bc22f1eff87b8f029a6ab31f1a269d69e4b053'/>
<id>e3bc22f1eff87b8f029a6ab31f1a269d69e4b053</id>
<content type='text'>
this extends the brk/stack collision protection added to full malloc
in commit 276904c2f6bde3a31a24ebfa201482601d18b4f9 to also protect the
__simple_malloc function used in static-linked programs that don't
reference the free function.

it also extends support for using mmap when brk fails, which full
malloc got in commit 5446303328adf4b4e36d9fba21848e6feb55fab4, to
__simple_malloc.

since __simple_malloc may expand the heap by arbitrarily large
increments, the stack collision detection is enhanced to detect
interval overlap rather than just proximity of a single address to the
stack. code size is increased a bit, but this is partly offset by the
sharing of code between the two malloc implementations, which due to
linking semantics, both get linked in a program that needs the full
malloc with realloc/free support.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this extends the brk/stack collision protection added to full malloc
in commit 276904c2f6bde3a31a24ebfa201482601d18b4f9 to also protect the
__simple_malloc function used in static-linked programs that don't
reference the free function.

it also extends support for using mmap when brk fails, which full
malloc got in commit 5446303328adf4b4e36d9fba21848e6feb55fab4, to
__simple_malloc.

since __simple_malloc may expand the heap by arbitrarily large
increments, the stack collision detection is enhanced to detect
interval overlap rather than just proximity of a single address to the
stack. code size is increased a bit, but this is partly offset by the
sharing of code between the two malloc implementations, which due to
linking semantics, both get linked in a program that needs the full
malloc with realloc/free support.
</pre>
</div>
</content>
</entry>
<entry>
<title>in malloc, refuse to use brk if it grows into stack</title>
<updated>2015-06-09T21:31:55+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-06-09T20:30:35+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=276904c2f6bde3a31a24ebfa201482601d18b4f9'/>
<id>276904c2f6bde3a31a24ebfa201482601d18b4f9</id>
<content type='text'>
the linux/nommu fdpic ELF loader sets up the brk range to overlap
entirely with the main thread's stack (but growing from opposite
ends), so that the resulting failure mode for malloc is not to return
a null pointer but to start returning pointers to memory that overlaps
with the caller's stack. needless to say this extremely dangerous and
makes brk unusable.

since it's non-trivial to detect execution environments that might be
affected by this kernel bug, and since the severity of the bug makes
any sort of detection that might yield false-negatives unsafe, we
instead check the proximity of the brk to the stack pointer each time
the brk is to be expanded. both the main thread's stack (where the
real known risk lies) and the calling thread's stack are checked. an
arbitrary gap distance of 8 MB is imposed, chosen to be larger than
linux default main-thread stack reservation sizes and larger than any
reasonable stack configuration on nommu.

the effeciveness of this patch relies on an assumption that the amount
by which the brk is being grown is smaller than the gap limit, which
is always true for malloc's use of brk. reliance on this assumption is
why the check is being done in malloc-specific code and not in __brk.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the linux/nommu fdpic ELF loader sets up the brk range to overlap
entirely with the main thread's stack (but growing from opposite
ends), so that the resulting failure mode for malloc is not to return
a null pointer but to start returning pointers to memory that overlaps
with the caller's stack. needless to say this extremely dangerous and
makes brk unusable.

since it's non-trivial to detect execution environments that might be
affected by this kernel bug, and since the severity of the bug makes
any sort of detection that might yield false-negatives unsafe, we
instead check the proximity of the brk to the stack pointer each time
the brk is to be expanded. both the main thread's stack (where the
real known risk lies) and the calling thread's stack are checked. an
arbitrary gap distance of 8 MB is imposed, chosen to be larger than
linux default main-thread stack reservation sizes and larger than any
reasonable stack configuration on nommu.

the effeciveness of this patch relies on an assumption that the amount
by which the brk is being grown is smaller than the gap limit, which
is always true for malloc's use of brk. reliance on this assumption is
why the check is being done in malloc-specific code and not in __brk.
</pre>
</div>
</content>
</entry>
<entry>
<title>remove useless check of bin match in malloc</title>
<updated>2015-03-04T15:48:00+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-03-04T15:48:00+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=064898cfe2233526e7639c21e780695be5ece257'/>
<id>064898cfe2233526e7639c21e780695be5ece257</id>
<content type='text'>
this re-check idiom seems to have been copied from the alloc_fwd and
alloc_rev functions, which guess a bin based on non-synchronized
memory access to adjacent chunk headers then need to confirm, after
locking the bin, that the chunk is actually in the bin they locked.

the check being removed, however, was being performed on a chunk
obtained from the already-locked bin. there is no race to account for
here; the check could only fail in the event of corrupt free lists,
and even then it would not catch them but simply continue running.

since the bin_index function is mildly expensive, it seems preferable
to remove the check rather than trying to convert it into a useful
consistency check. casual testing shows a 1-5% reduction in run time.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this re-check idiom seems to have been copied from the alloc_fwd and
alloc_rev functions, which guess a bin based on non-synchronized
memory access to adjacent chunk headers then need to confirm, after
locking the bin, that the chunk is actually in the bin they locked.

the check being removed, however, was being performed on a chunk
obtained from the already-locked bin. there is no race to account for
here; the check could only fail in the event of corrupt free lists,
and even then it would not catch them but simply continue running.

since the bin_index function is mildly expensive, it seems preferable
to remove the check rather than trying to convert it into a useful
consistency check. casual testing shows a 1-5% reduction in run time.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix init race that could lead to deadlock in malloc init code</title>
<updated>2015-03-04T14:29:39+00:00</updated>
<author>
<name>Rich Felker</name>
<email>dalias@aerifal.cx</email>
</author>
<published>2015-03-04T14:29:39+00:00</published>
<link rel='alternate' type='text/html' href='http://git.musl-libc.org/cgit/musl/commit/?id=7a81fe3710be0128d29071e76c5acbea3d84277b'/>
<id>7a81fe3710be0128d29071e76c5acbea3d84277b</id>
<content type='text'>
the malloc init code provided its own version of pthread_once type
logic, including the exact same bug that was fixed in pthread_once in
commit 0d0c2f40344640a2a6942dda156509593f51db5d.

since this code is called adjacent to expand_heap, which takes a lock,
there is no reason to have pthread_once-type initialization. simply
moving the init code into the interval where expand_heap already holds
its lock on the brk achieves the same result with much less
synchronization logic, and allows the buggy code to be eliminated
rather than just fixed.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the malloc init code provided its own version of pthread_once type
logic, including the exact same bug that was fixed in pthread_once in
commit 0d0c2f40344640a2a6942dda156509593f51db5d.

since this code is called adjacent to expand_heap, which takes a lock,
there is no reason to have pthread_once-type initialization. simply
moving the init code into the interval where expand_heap already holds
its lock on the brk achieves the same result with much less
synchronization logic, and allows the buggy code to be eliminated
rather than just fixed.
</pre>
</div>
</content>
</entry>
</feed>
