|Age||Commit message (Collapse)||Author||Lines|
commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 included checks to
make calloc fallback to memset if used with a replaced malloc that
didn't also replace calloc, and the memalign family fail if free has
been replaced. however, the checks gave false positives for
replacement whenever malloc or free resolved to a PLT entry in the
for now, disable the checks so as not to leave libc in a broken state.
this means that the properties documented in the above commit are no
longer satisfied; failure to replace calloc and the memalign family
along with malloc is unsafe if they are ever called.
the calloc checks were correct but useless for static linking. in both
cases (simple or full malloc), calloc and malloc are in a source file
together, so replacement of one but not the other would give linking
errors. the memalign-family check was useful for static linking, but
broken for dynamic as described above, and can be replaced with a
better link-time check.
replacement is subject to conditions on the replacement functions.
they may only call functions which are async-signal-safe, as specified
either by POSIX or as an implementation-defined extension. if any
allocator functions are replaced, at least malloc, realloc, and free
must be provided. if calloc is not provided, it will behave as
malloc+memset. any of the memalign-family functions not provided will
fail with ENOMEM.
in order to implement the above properties, calloc and __memalign
check that they are using their own malloc or free, respectively.
choice to check malloc or free is based on considerations of
supporting __simple_malloc. in order to make this work, calloc is
split into separate versions for __simple_malloc and full malloc;
commit ba819787ee93ceae94efd274f7849e317c1bff58 already did most of
the split anyway, and completing it saves an extra call frame.
previously, use of -Bsymbolic-functions made dynamic interposition
impossible. now, we are using an explicit dynamic-list, so add
allocator functions to the list. most are not referenced anyway, but
all are added for completeness.
In all cases this is just a change from two volatile int to one.
this function is used only as a weak definition for malloc, for static
linking in programs which do not call realloc or free. since it had
external linkage and was thereby exported in libc.so's dynamic symbol
table, --gc-sections was unable to drop it. this was merely an
oversight; there's no reason for it to be external, so make it static.
commit ba819787ee93ceae94efd274f7849e317c1bff58 introduced this
regression. since the __malloc0 weak alias was not properly provided
by __simple_malloc, use of calloc forced the full malloc to be linked.
previously, calloc's implementation encoded assumptions about the
implementation of malloc, accessing a size_t word just prior to the
allocated memory to determine if it was obtained by mmap to optimize
out the zero-filling. when __simple_malloc is used (static linking a
program with no realloc/free), it doesn't matter if the result of this
check is wrong, since all allocations are zero-initialized anyway. but
the access could be invalid if it crosses a page boundary or if the
pointer is not sufficiently aligned, which can happen for very small
this patch fixes the issue by moving the zero-fill logic into malloc.c
with the full malloc, as a new function named __malloc0, which is
provided by a weak alias to __simple_malloc (which always gives
zero-filled memory) when the full malloc is not in use.
this extends the brk/stack collision protection added to full malloc
in commit 276904c2f6bde3a31a24ebfa201482601d18b4f9 to also protect the
__simple_malloc function used in static-linked programs that don't
reference the free function.
it also extends support for using mmap when brk fails, which full
malloc got in commit 5446303328adf4b4e36d9fba21848e6feb55fab4, to
since __simple_malloc may expand the heap by arbitrarily large
increments, the stack collision detection is enhanced to detect
interval overlap rather than just proximity of a single address to the
stack. code size is increased a bit, but this is partly offset by the
sharing of code between the two malloc implementations, which due to
linking semantics, both get linked in a program that needs the full
malloc with realloc/free support.
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
i did some testing trying to switch malloc to use the new internal
lock with priority inheritance, and my malloc contention test got
20-100 times slower. if priority inheritance futexes are this slow,
it's simply too high a price to pay for avoiding priority inversion.
maybe we can consider them somewhere down the road once the kernel
folks get their act together on this (and perferably don't link it to
glibc's inefficient lock API)...
as such, i've switch __lock to use malloc's implementation of
lightweight locks, and updated all the users of the code to use an
array with a waiter count for their locks. this should give optimal
performance in the vast majority of cases, and it's simple.
malloc is still using its own internal copy of the lock code because
it seems to yield measurably better performance with -O3 when it's
inlined (20% or more difference in the contention stress test).
why does this affect behavior? well, the linker seems to traverse
archive files starting from its current position when resolving
symbols. since calloc.c comes alphabetically (and thus in sequence in
the archive file) between __simple_malloc.c and malloc.c, attempts to
resolve the "malloc" symbol for use by calloc.c were pulling in the
full malloc.c implementation rather than the __simple_malloc.c
as of now, lite_malloc.c and malloc.c are adjacent in the archive and
in the correct order, so malloc.c should never be used to resolve
"malloc" unless it's already needed to resolve another symbol ("free"