|Age||Commit message (Collapse)||Author||Lines|
as an outcome of Austin Group issue #385, future versions of the
standard will require free not to alter the value of errno. save and
restore it individually around the calls to madvise and munmap so that
the cost is not imposed on calls to free that do not result in any
commit 8d37958d58cf36f53d5fcc7a8aa6d633da6071b2 inadvertently broke
oldmalloc by having it implement __libc_malloc rather than
as the outcome of Austin Group tracker issue #62, future editions of
POSIX have dropped the requirement that fork be AS-safe. this allows
but does not require implementations to synchronize fork with internal
locks and give forked children of multithreaded parents a partly or
fully unrestricted execution environment where they can continue to
use the standard library (per POSIX, they can only portably use
up until recently, taking this allowance did not seem desirable.
however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the
extent to which applications and libraries are depending on the
ability to use malloc and other non-AS-safe interfaces in MT-forked
children, by converting latent very-low-probability catastrophic state
corruption into predictable deadlock. dealing with the fallout has
been a huge burden for users/distros.
while it looks like most of the non-portable usage in applications
could be fixed given sufficient effort, at least some of it seems to
occur in language runtimes which are exposing the ability to run
unrestricted code in the child as part of the contract with the
programmer. any attempt at fixing such contracts is not just a
technical problem but a social one, and is probably not tractable.
this patch extends the fork function to take locks for all libc
singletons in the parent, and release or reset those locks in the
child, so that when the underlying fork operation takes place, the
state protected by these locks is consistent and ready for the child
to use. locking is skipped in the case where the parent is
single-threaded so as not to interfere with legacy AS-safety property
of fork in single-threaded programs. lock order is mostly arbitrary,
but the malloc locks (including bump allocator in case it's used) must
be taken after the locks on any subsystems that might use malloc, and
non-AS-safe locks cannot be taken while the thread list lock is held,
imposing a requirement that it be taken last.
allowing the application to replace malloc (since commit
c9f415d7ea2dace5bf77f6518b6afc36bb7a5732) has brought multiple
headaches where it's used from various critical sections in libc
components. for example:
- the thread-local message buffers allocated for dlerror can't be
freed at thread exit time because application code would then run in
the context of a non-existant thread. this was handled in commit
aa5a9d15e09851f7b4a1668e9dbde0f6234abada by queuing them for free
- the dynamic linker has to be careful not to pass memory allocated at
early startup time (necessarily using its own malloc) to realloc or
free after redoing relocations with the application and all
libraries present. bugs in this area were fixed several times, at
least in commits 0c5c8f5da6e36fe4ab704bee0cd981837859e23f and
2f1f51ae7b2d78247568e7fdb8462f3c19e469a4 and possibly others.
- by calling the allocator from contexts where libc-internal locks are
held, we impose undocumented requirements on alternate malloc
implementations not to call into any libc function that might
attempt to take these locks; if they do, deadlock results.
- work to make fork of a multithreaded parent give the child an
unrestricted execution environment is blocked by lock order issues
as long as the application-provided allocator can be called with
libc-internal locks held.
these problems are all fixed by giving libc internals access to the
original, non-replaced allocator, for use where needed. it can't be
used everywhere, as some interfaces like str[n]dup, open_[w]memstream,
getline/getdelim, etc. are required to provide the called memory
obtained as if by (the public) malloc. and there are a number of libc
interfaces that are "pure library" code, not part of some internal
singleton, and where using the application's choice of malloc
implementation is preferable -- things like glob, regex, etc.
one might expect there to be significant cost to static-linked
programs, pulling in two malloc implementations, one of them
mostly-unused, if malloc is replaced. however, in almost all of the
places where malloc is used internally, care has been taken already
not to pull in realloc/free (i.e. to link with just the bump
allocator). this size optimization carries over automatically.
the newly-exposed internal allocator functions are obtained by
renaming the actual definitions, then adding new wrappers around them
with the public names. technically __libc_realloc and __libc_free
could be aliases rather than needing a layer of wrapper, but this
would almost surely break certain instrumentation (valgrind) and the
size and performance difference is negligible. __libc_calloc needs to
be handled specially since calloc is designed to work with either the
internal or the replaced malloc.
as a bonus, this change also eliminates the longstanding ugly
dependency of the static bump allocator on order of object files in
libc.a, by making it so there's only one definition of the malloc
function and having it in the same source file as the bump allocator.
otherwise, shrink in-place. as explained in the description of commit
3e16313f8fe2ed143ae0267fd79d63014c24779f, the split here is valid
without holding split_merge_lock because all chunks involved are in
the in-use state.
commit 3e16313f8fe2ed143ae0267fd79d63014c24779f introduced this bug by
making the copy case reachable with n (new size) smaller than n0
(original size). this was left as the only way of shrinking an
allocation because it reduces fragmentation if a free chunk of the
appropriate size is available. when that's not the case, another
approach may be better, but any such improvement would be independent
of fixing this bug.
a new weak predicate function replacable by the malloc implementation,
__malloc_allzerop, is introduced. by default it's always false; the
default version will be used when static linking if the bump allocator
was used (in which case performance doesn't matter) or if malloc was
replaced by the application. only if the real internal malloc is
linked (always the case with dynamic linking) does the real version
if malloc was replaced dynamically, as indicated by __malloc_replaced,
the predicate function is ignored and conditional-memset is always
it's not part of the malloc implementation but glue with musl dynamic
abstractly, calloc is completely malloc-implementation-independent;
it's malloc followed by memset, or as we do it, a "conditional memset"
that avoids touching fresh zero pages.
previously, calloc was kept separate for the bump allocator, which can
always skip memset, and the version of calloc provided with the full
malloc conditionally skipped the clearing for large direct-mmapped
allocations. the latter is a moderately attractive optimization, and
can be added back if needed. however, further consideration to make it
correct under malloc replacement would be needed.
commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the
contract for malloc replacement as allowing omission of calloc, and
indeed that worked for dynamic linking, but for static linking it was
possible to get the non-clearing definition from the bump allocator;
if not for that, it would have been a link error trying to pull in
the conditional-clearing code for the new common calloc is taken from
mal0_clear in oldmalloc, but drops the need to access actual page size
and just uses a fixed value of 4096. this avoids potentially needing
access to global data for the sake of an optimization that at best
marginally helps archs with offensively-large page sizes.
this sets the stage for replacement, and makes it practical to keep
oldmalloc around as a build option for a while if that ends up being
only the files which are actually part of the implementation are
moved. memalign and posix_memalign are entirely generic. in theory
calloc could be pulled out too, but it's useful to have it tied to the
implementation so as to optimize out unnecessary memset when
implementation details make it possible to know the memory is already