musl/src/malloc, branch v1.1.7

remove useless check of bin match in malloc

2015-03-04T15:48:00+00:00

this re-check idiom seems to have been copied from the alloc_fwd and
alloc_rev functions, which guess a bin based on non-synchronized
memory access to adjacent chunk headers then need to confirm, after
locking the bin, that the chunk is actually in the bin they locked.

the check being removed, however, was being performed on a chunk
obtained from the already-locked bin. there is no race to account for
here; the check could only fail in the event of corrupt free lists,
and even then it would not catch them but simply continue running.

since the bin_index function is mildly expensive, it seems preferable
to remove the check rather than trying to convert it into a useful
consistency check. casual testing shows a 1-5% reduction in run time.

fix init race that could lead to deadlock in malloc init code

2015-03-04T14:29:39+00:00

the malloc init code provided its own version of pthread_once type
logic, including the exact same bug that was fixed in pthread_once in
commit 0d0c2f40344640a2a6942dda156509593f51db5d.

since this code is called adjacent to expand_heap, which takes a lock,
there is no reason to have pthread_once-type initialization. simply
moving the init code into the interval where expand_heap already holds
its lock on the brk achieves the same result with much less
synchronization logic, and allows the buggy code to be eliminated
rather than just fixed.

make all objects used with atomic operations volatile

2015-03-04T03:50:02+00:00

the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.

with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.

in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.

add malloc_usable_size function and non-stub malloc.h

2014-08-26T02:47:27+00:00

this function is needed for some important practical applications of
ABI compatibility, and may be useful for supporting some non-portable
software at the source level too.

I was hesitant to add a function which imposes any constraints on
malloc internals; however, it turns out that any malloc implementation
which has realloc must already have an efficient way to determine the
size of existing allocations, so no additional constraint is imposed.

for now, some internal malloc definitions are duplicated in the new
source file. if/when malloc is refactored to put them in a shared
internal header file, these could be removed.

since malloc_usable_size is conventionally declared in malloc.h, the
empty stub version of this file was no longer suitable. it's updated
to provide the standard allocator functions, nonstandard ones (even if
stdlib.h would not expose them based on the feature test macros in
effect), and any malloc-extension functions provided (currently, only
malloc_usable_size).

avoid malloc failure for small requests when brk can't be extended

2014-04-02T21:57:15+00:00

this issue mainly affects PIE binaries and execution of programs via
direct invocation of the dynamic linker binary: depending on kernel
behavior, in these cases the initial brk may be placed at at location
where it cannot be extended, due to conflicting adjacent maps.

when brk fails, mmap is used instead to expand the heap. in order to
avoid expensive bookkeeping for managing fragmentation by merging
these new heap regions, the minimum size for new heap regions
increases exponentially in the number of regions. this limits the
number of regions, and thereby the number of fixed fragmentation
points, to a quantity which is logarithmic with respect to the size of
virtual address space and thus negligible. the exponential growth is
tuned so as to avoid expanding the heap by more than approximately 50%
of its current total size.

include cleanups: remove unused headers and add feature test macros

2013-12-12T05:09:18+00:00

slightly optimize __brk for size

2013-10-05T16:00:55+00:00

there is no reason to check the return value for setting errno, since
brk never returns errors, only the new value of the brk (which may be
the same as the old, or otherwise differ from the requested brk, on
failure).

it may be beneficial to eventually just eliminate this file and make
the syscalls inline in malloc.c.

fix failure of malloc to set errno on heap (brk) exhaustion

2013-10-05T15:59:21+00:00

I wrongly assumed the brk syscall would set errno, but on failure it
returns the old value of the brk rather than an error code.

fix potential deadlock bug in libc-internal locking logic

2013-09-20T06:00:27+00:00

if a multithreaded program became non-multithreaded (i.e. all other
threads exited) while one thread held an internal lock, the remaining
thread would fail to release the lock. the the program then became
multithreaded again at a later time, any further attempts to obtain
the lock would deadlock permanently.

the underlying cause is that the value of libc.threads_minus_1 at
unlock time might not match the value at lock time. one solution would
be returning a flag to the caller indicating whether the lock was
taken and needs to be unlocked, but there is a simpler solution: using
the lock itself as such a flag.

note that this flag is not needed anyway for correctness; if the lock
is not held, the unlock code is harmless. however, the memory
synchronization properties associated with a_store are costly on some
archs, so it's best to avoid executing the unlock code when it is
unnecessary.

remove redundant check in memalign

2013-07-24T03:40:26+00:00

the case where mem was already aligned is handled earlier in the
function now.