path: root/src/malloc
Age | Commit message | Author | Lines
2022-10-19 | disable MADV_FREE usage in mallocng | Rich Felker | -1/+3
the entire intent of using madvise/MADV_FREE on freed slots is to improve system performance by avoiding evicting cache of useful data, or swapping useless data to disk, by marking any whole pages in the freed slot as discardable by the kernel. in particular, unlike unmapping the memory or replacing it with a PROT_NONE region, use of MADV_FREE does not make any difference to memory accounting for commit charge purposes, and so does not increase the memory available to other processes in a non-overcommitted environment.

however, various measurements have shown that inordinate amounts of time are spent performing madvise syscalls in processes which frequently allocate and free medium-sized objects in the size range roughly between PAGESIZE and MMAP_THRESHOLD, to the point that the net effect is almost surely significant performance degradation. so, turn it off.

the code, which has some nontrivial logic for efficiently determining whether there is a whole-page range to apply madvise to, is left in place so that it can easily be re-enabled if desired, or later tuned to only apply to certain sizes or to use additional heuristics.
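The "whole-page range" test mentioned above can be illustrated with a small sketch. This is not the mallocng code; `demo_whole_pages` and its parameters are hypothetical names chosen for illustration, and the page size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from mallocng: find the largest run of
 * whole pages inside the byte range [start, start+len). Returns the
 * length of that run in bytes (0 if no whole page fits) and stores its
 * first address in *page_start. pagesize must be a power of two. */
size_t demo_whole_pages(uintptr_t start, size_t len,
                        size_t pagesize, uintptr_t *page_start)
{
    uintptr_t mask = (uintptr_t)pagesize - 1;
    uintptr_t first = (start + mask) & ~mask;  /* round start up to a page */
    uintptr_t last = (start + len) & ~mask;    /* round end down to a page */
    if (last <= first) return 0;               /* no complete page inside  */
    *page_start = first;
    return (size_t)(last - first);
}
```

A caller that wanted the behavior described in the commit would pass the resulting range to madvise with MADV_FREE only when the returned length is nonzero.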
2021-04-27 | remove return with expression in void function | Michael Forney | -1/+1
2021-04-16 | mallocng/aligned_alloc: check for malloc failure | Dominic Chen | -0/+3
With mallocng, calling posix_memalign() or aligned_alloc() will SIGSEGV if the internal malloc() call returns NULL. This does not occur with oldmalloc, which explicitly checks for allocation failure.
2021-01-30 | oldmalloc: preserve errno across free | Rich Felker | -0/+4
as an outcome of Austin Group issue #385, future versions of the standard will require free not to alter the value of errno. save and restore it individually around the calls to madvise and munmap so that the cost is not imposed on calls to free that do not result in any syscall.
2021-01-30 | fix build regression in oldmalloc | Rich Felker | -1/+1
commit 8d37958d58cf36f53d5fcc7a8aa6d633da6071b2 inadvertently broke oldmalloc by having it implement __libc_malloc rather than __libc_malloc_impl.
2021-01-30 | preserve errno across free | Rich Felker | -2/+10
as an outcome of Austin Group issue #385, future versions of the standard will require free not to alter the value of errno. save and restore it individually around the calls to madvise and munmap so that the cost is not imposed on calls to free that do not result in any syscall.
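A minimal sketch of the save/restore pattern described here, assuming a hypothetical release path named `demo_release` (the real code lives inside free and is not reproduced). Only the syscall is bracketed, so paths that make no syscall pay nothing.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical release path: unmap a region without letting a failing
 * munmap change the errno value the caller observes. */
void demo_release(void *base, size_t len)
{
    int saved = errno;   /* remember the caller-visible value      */
    munmap(base, len);   /* may fail and overwrite errno           */
    errno = saved;       /* restore it regardless of the outcome   */
}
```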
2020-11-30 | implement reallocarray | Ariadne Conill | -0/+13
reallocarray is an extension introduced by OpenBSD, which introduces calloc overflow checking to realloc. glibc 2.28 introduced support for this function behind _GNU_SOURCE, while glibc 2.29 allows its usage in _DEFAULT_SOURCE.
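The overflow check that reallocarray adds on top of realloc can be sketched as follows. The name `demo_reallocarray` is hypothetical so that it does not collide with a libc-provided reallocarray; the logic is the usual "would m*n exceed SIZE_MAX" test.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for reallocarray: refuse requests whose total
 * size m*n cannot be represented in size_t, otherwise defer to realloc. */
void *demo_reallocarray(void *ptr, size_t m, size_t n)
{
    if (n && m > SIZE_MAX / n) {
        errno = ENOMEM;
        return NULL;
    }
    return realloc(ptr, m * n);
}
```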
2020-11-29 | fix mallocng regression in malloc_usable_size with null argument | Dominic Chen | -0/+1
commit d1507646975cbf6c3e511ba07b193f27f032d108 added support for null argument in oldmalloc and was overlooked when switching to mallocng.
2020-11-11 | lift child restrictions after multi-threaded fork | Rich Felker | -2/+36
as the outcome of Austin Group tracker issue #62, future editions of POSIX have dropped the requirement that fork be AS-safe. this allows but does not require implementations to synchronize fork with internal locks and give forked children of multithreaded parents a partly or fully unrestricted execution environment where they can continue to use the standard library (per POSIX, they can only portably use AS-safe functions).

up until recently, taking this allowance did not seem desirable. however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the extent to which applications and libraries are depending on the ability to use malloc and other non-AS-safe interfaces in MT-forked children, by converting latent very-low-probability catastrophic state corruption into predictable deadlock. dealing with the fallout has been a huge burden for users/distros.

while it looks like most of the non-portable usage in applications could be fixed given sufficient effort, at least some of it seems to occur in language runtimes which are exposing the ability to run unrestricted code in the child as part of the contract with the programmer. any attempt at fixing such contracts is not just a technical problem but a social one, and is probably not tractable.

this patch extends the fork function to take locks for all libc singletons in the parent, and release or reset those locks in the child, so that when the underlying fork operation takes place, the state protected by these locks is consistent and ready for the child to use. locking is skipped in the case where the parent is single-threaded so as not to interfere with legacy AS-safety property of fork in single-threaded programs.

lock order is mostly arbitrary, but the malloc locks (including bump allocator in case it's used) must be taken after the locks on any subsystems that might use malloc, and non-AS-safe locks cannot be taken while the thread list lock is held, imposing a requirement that it be taken last.
2020-11-11 | give libc access to its own malloc even if public malloc is interposed | Rich Felker | -1/+37
allowing the application to replace malloc (since commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732) has brought multiple headaches where it's used from various critical sections in libc components. for example:

- the thread-local message buffers allocated for dlerror can't be freed at thread exit time because application code would then run in the context of a nonexistent thread. this was handled in commit aa5a9d15e09851f7b4a1668e9dbde0f6234abada by queuing them for free later.
- the dynamic linker has to be careful not to pass memory allocated at early startup time (necessarily using its own malloc) to realloc or free after redoing relocations with the application and all libraries present. bugs in this area were fixed several times, at least in commits 0c5c8f5da6e36fe4ab704bee0cd981837859e23f and 2f1f51ae7b2d78247568e7fdb8462f3c19e469a4 and possibly others.
- by calling the allocator from contexts where libc-internal locks are held, we impose undocumented requirements on alternate malloc implementations not to call into any libc function that might attempt to take these locks; if they do, deadlock results.
- work to make fork of a multithreaded parent give the child an unrestricted execution environment is blocked by lock order issues as long as the application-provided allocator can be called with libc-internal locks held.

these problems are all fixed by giving libc internals access to the original, non-replaced allocator, for use where needed. it can't be used everywhere, as some interfaces like str[n]dup, open_[w]memstream, getline/getdelim, etc. are required to provide the caller memory obtained as if by (the public) malloc. and there are a number of libc interfaces that are "pure library" code, not part of some internal singleton, and where using the application's choice of malloc implementation is preferable -- things like glob, regex, etc.

one might expect there to be significant cost to static-linked programs, pulling in two malloc implementations, one of them mostly-unused, if malloc is replaced. however, in almost all of the places where malloc is used internally, care has been taken already not to pull in realloc/free (i.e. to link with just the bump allocator). this size optimization carries over automatically.

the newly-exposed internal allocator functions are obtained by renaming the actual definitions, then adding new wrappers around them with the public names. technically __libc_realloc and __libc_free could be aliases rather than needing a layer of wrapper, but this would almost surely break certain instrumentation (valgrind) and the size and performance difference is negligible. __libc_calloc needs to be handled specially since calloc is designed to work with either the internal or the replaced malloc.

as a bonus, this change also eliminates the longstanding ugly dependency of the static bump allocator on order of object files in libc.a, by making it so there's only one definition of the malloc function and having it in the same source file as the bump allocator.
2020-06-30 | import mallocng | Rich Felker | -13/+938
the files added come from the mallocng development repo, commit 2ed58817cca5bc055974e5a0e43c280d106e696b. they comprise a new malloc implementation, developed over the past 9 months, to replace the old allocator (since dubbed "oldmalloc") with one that retains low code size and minimal baseline memory overhead while avoiding fundamental flaws in oldmalloc and making significant enhancements. these include highly controlled fragmentation, fine-grained ability to return memory to the system when freed, and strong hardening against dynamic memory usage errors by the caller.

internally, mallocng derives most of these properties from tightly structuring memory, creating space for allocations as uniform-sized slots within individually mmapped (and individually freeable) allocation groups. smaller-than-pagesize groups are created within slots of larger ones. minimal group size is very small, and larger sizes (in geometric progression) only come into play when usage is high.

all data necessary for maintaining consistency of the allocator state is tracked in out-of-band metadata, reachable via a validated path from minimal in-band metadata. all pointers passed (to free, etc.) are validated before any stores to memory take place. early reuse of freed slots is avoided via approximate LRU order of freed slots. further hardening against use-after-free and double-free, even in the case where the freed slot has been reused, is made by cycling the offset within the slot at which the allocation is placed; this is possible whenever the slot size is larger than the requested allocation.
2020-06-29 | add glue code for mallocng merge | Rich Felker | -0/+129
this includes both an implementation of reclaimed-gap donation from ldso and a version of mallocng's glue.h with namespace-safe linkage to underlying syscalls, integration with AT_RANDOM initialization, and internal locking that's optimized out when the process is single-threaded.
2020-06-16 | only use memcpy realloc to shrink if an exact-sized free chunk exists | Rich Felker | -0/+12
otherwise, shrink in-place. as explained in the description of commit 3e16313f8fe2ed143ae0267fd79d63014c24779f, the split here is valid without holding split_merge_lock because all chunks involved are in the in-use state.
2020-06-16 | fix memset overflow in oldmalloc race fix overhaul | Rich Felker | -1/+1
commit 3e16313f8fe2ed143ae0267fd79d63014c24779f introduced this bug by making the copy case reachable with n (new size) smaller than n0 (original size). this was left as the only way of shrinking an allocation because it reduces fragmentation if a free chunk of the appropriate size is available. when that's not the case, another approach may be better, but any such improvement would be independent of fixing this bug.
2020-06-10 | only disable aligned_alloc if malloc was replaced but it wasn't | Rich Felker | -1/+2
if both malloc and aligned_alloc have been replaced but the internal aligned_alloc still gets called, the replacement is a wrapper of some sort. it's not clear if this usage should be officially supported, but it's at least a plausibly interesting debugging usage, and easy to do. it should not be relied upon unless it's documented as supported at some later time.
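The condition in the commit title can be sketched as a guard at the top of the internal aligned_alloc. The two flags and the function name below are hypothetical stand-ins for state the dynamic linker would record; the fallback to the system aligned_alloc is only there to make the sketch self-contained.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flags standing in for what the dynamic linker records. */
int demo_malloc_replaced;
int demo_aligned_alloc_replaced;

/* Hypothetical internal aligned_alloc: refuse to run only when malloc
 * was replaced and aligned_alloc was not, since memory it produced could
 * then end up in a foreign free. */
void *demo_internal_aligned_alloc(size_t align, size_t len)
{
    if (demo_malloc_replaced && !demo_aligned_alloc_replaced) {
        errno = ENOMEM;
        return NULL;
    }
    return aligned_alloc(align, len);
}
```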
2020-06-10 | have ldso track replacement of aligned_alloc | Rich Felker | -0/+1
this is in preparation for improving behavior of malloc interposition.
2020-06-10 | reintroduce calloc elision of memset for direct-mmapped allocations | Rich Felker | -1/+14
a new weak predicate function replaceable by the malloc implementation, __malloc_allzerop, is introduced. by default it's always false; the default version will be used when static linking if the bump allocator was used (in which case performance doesn't matter) or if malloc was replaced by the application. only if the real internal malloc is linked (always the case with dynamic linking) does the real version get used. if malloc was replaced dynamically, as indicated by __malloc_replaced, the predicate function is ignored and conditional-memset is always performed.
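A rough sketch of how a calloc might consult such a predicate. `demo_allzerop`, `demo_malloc_replaced`, and `demo_calloc` are hypothetical names, the predicate here is the "always false" default, and a plain memset stands in for the conditional memset.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical default predicate: "not known to be all zero". */
int demo_allzerop(void *p) { (void)p; return 0; }

/* Hypothetical flag standing in for __malloc_replaced. */
int demo_malloc_replaced;

void *demo_calloc(size_t m, size_t n)
{
    if (n && m > SIZE_MAX / n) {          /* m*n would overflow */
        errno = ENOMEM;
        return NULL;
    }
    n *= m;
    void *p = malloc(n);
    /* Skip clearing only when our own malloc vouches for the memory. */
    if (!p || (!demo_malloc_replaced && demo_allzerop(p)))
        return p;
    return memset(p, 0, n);
}
```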
2020-06-10 | move __malloc_replaced to a top-level malloc file | Rich Felker | -2/+3
it's not part of the malloc implementation but glue with musl dynamic linker.
2020-06-10 | switch to a common calloc implementation | Rich Felker | -47/+37
abstractly, calloc is completely malloc-implementation-independent; it's malloc followed by memset, or as we do it, a "conditional memset" that avoids touching fresh zero pages.

previously, calloc was kept separate for the bump allocator, which can always skip memset, and the version of calloc provided with the full malloc conditionally skipped the clearing for large direct-mmapped allocations. the latter is a moderately attractive optimization, and can be added back if needed. however, further consideration to make it correct under malloc replacement would be needed.

commit b4b1e10364c8737a632be61582e05a8d3acf5690 documented the contract for malloc replacement as allowing omission of calloc, and indeed that worked for dynamic linking, but for static linking it was possible to get the non-clearing definition from the bump allocator; if not for that, it would have been a link error trying to pull in malloc.o.

the conditional-clearing code for the new common calloc is taken from mal0_clear in oldmalloc, but drops the need to access actual page size and just uses a fixed value of 4096. this avoids potentially needing access to global data for the sake of an optimization that at best marginally helps archs with offensively-large page sizes.
2020-06-03 | move oldmalloc to its own directory under src/malloc | Rich Felker | -0/+0
this sets the stage for replacement, and makes it practical to keep oldmalloc around as a build option for a while if that ends up being useful. only the files which are actually part of the implementation are moved. memalign and posix_memalign are entirely generic. in theory calloc could be pulled out too, but it's useful to have it tied to the implementation so as to optimize out unnecessary memset when implementation details make it possible to know the memory is already clear.
2020-06-03 | move __expand_heap into malloc.c | Rich Felker | -73/+64
this function is no longer used elsewhere, and moving it reduces the number of source files specific to the malloc implementation.
2020-06-03 | rename memalign source file back to its proper name | Rich Felker | -0/+0
2020-06-03 | rename aligned_alloc source file back to its proper name | Rich Felker | -0/+0
2020-06-03 | reverse dependency order of memalign and aligned_alloc | Rich Felker | -10/+5
this change eliminates the internal __memalign function and makes the memalign and posix_memalign functions completely independent of the malloc implementation, written portably in terms of aligned_alloc.
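Writing posix_memalign "portably in terms of aligned_alloc" can look roughly like the sketch below. `demo_posix_memalign` is a hypothetical name used to avoid clashing with the libc function, and the sketch assumes a C11 environment where aligned_alloc is declared in stdlib.h.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: posix_memalign semantics layered on top of
 * aligned_alloc, with no knowledge of the allocator's internals. */
int demo_posix_memalign(void **res, size_t align, size_t len)
{
    if (align < sizeof(void *)) return EINVAL;  /* POSIX minimum alignment */
    void *mem = aligned_alloc(align, len);
    if (!mem) return errno;                     /* report failure by value */
    *res = mem;
    return 0;
}
```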
2020-06-03 | rename aligned_alloc source file | Rich Felker | -0/+0
this is the first step of swapping the name of the actual implementation to aligned_alloc while preserving history follow.
2020-06-03 | remove stale document from malloc src directory | Rich Felker | -22/+0
this was an unfinished draft document present since the initial check-in, that was never intended to ship in its current form. remove it as part of reorganizing for replacement of the allocator.
2020-06-03 | rewrite bump allocator to fix corner cases, decouple from expand_heap | Rich Felker | -17/+72
this affects the bump allocator used when static linking in programs that don't need allocation metadata due to not using realloc, free, etc.

commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 refactored the bump allocator to share code with __expand_heap, used by malloc, for the purpose of fixing the case (mainly nommu) where brk doesn't work. however, the geometric growth behavior of __expand_heap is not actually well-suited to the bump allocator, and can produce significant excessive memory usage. in particular, by repeatedly requesting just over the remaining free space in the current mmap-allocated area, the total mapped memory will be roughly double the nominal usage. and since the main user of the no-brk mmap fallback in the bump allocator is nommu, this excessive usage is not just virtual address space but physical memory.

in addition, even on systems with brk, having a unified size request to __expand_heap without knowing whether the brk or mmap backend would get used made it so the brk could be expanded twice as far as needed. for example, with malloc(n) and n-1 bytes available before the current brk, the brk would be expanded by n bytes rounded up to page size, when expansion by just one page would have sufficed.

the new implementation computes request size separately for the cases where brk expansion is being attempted vs using mmap, and also performs individual mmap of large allocations without moving to a new bump area and throwing away the rest of the old one. this greatly reduces the need for geometric area size growth and limits the extent to which free space at the end of one bump area might be unusable for future allocations.

as a bonus, the resulting code size is somewhat smaller than the combined old version plus __expand_heap.
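The brk example in this message (malloc(n) with n-1 bytes still available) can be worked through numerically. The functions below are hypothetical and model only the size arithmetic, assuming a 4096-byte page; they are not the bump allocator itself.

```c
#include <stddef.h>

#define DEMO_PAGE 4096u  /* assumed page size for the illustration */

/* Round x up to a multiple of the page size. */
size_t demo_page_round(size_t x)
{
    return (x + DEMO_PAGE - 1) & ~(size_t)(DEMO_PAGE - 1);
}

/* Old behavior as described: the request ignores the space already
 * available below the current brk. */
size_t demo_old_growth(size_t n, size_t avail)
{
    (void)avail;
    return demo_page_round(n);
}

/* New behavior as described: only the shortfall is requested. */
size_t demo_new_growth(size_t n, size_t avail)
{
    return n > avail ? demo_page_round(n - avail) : 0;
}
```

With n = 10000 and 9999 bytes available, the old computation grows the brk by 12288 bytes, while the new one grows it by a single 4096-byte page.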
2020-06-02 | move malloc_impl.h from src/internal to src/malloc | Rich Felker | -0/+43
this reflects that it is no longer intended for consumption outside of the malloc implementation.
2020-06-02 | fix unbounded heap expansion race in malloc | Rich Felker | -152/+87
this has been a longstanding issue reported many times over the years, with it becoming increasingly clear that it could be hit in practice. under concurrent malloc and free from multiple threads, it's possible to hit usage patterns where unbounded amounts of new memory are obtained via brk/mmap despite the total nominal usage being small and bounded.

the underlying cause is that, as a fundamental consequence of keeping locking as fine-grained as possible, the state where free has unbinned an already-free chunk to merge it with a newly-freed one, but has not yet re-binned the combined chunk, is exposed to other threads. this is bad even with small chunks, and leads to suboptimal use of memory, but where it really blows up is where the already-freed chunk in question is the large free region "at the top of the heap". in this situation, other threads momentarily see a state of having almost no free memory, and conclude that they need to obtain more.

as far as I can tell there is no fix for this that does not harm performance. the fix made here forces all split/merge of free chunks to take place under a single lock, which also takes the place of the old free_lock, being held at least momentarily at the time of free to determine whether there are neighboring free chunks that need merging.

as a consequence, the pretrim, alloc_fwd, and alloc_rev operations no longer make sense and are deleted. simplified merging now takes place inline in free (__bin_chunk) and realloc.

as commented in the source, holding the split_merge_lock precludes any chunk transition from in-use to free state. for the most part, it also precludes change to chunk header sizes. however, __memalign may still modify the sizes of an in-use chunk to split it into two in-use chunks. arguably this should require holding the split_merge_lock, but that would necessitate refactoring to expose it externally, which is a mess. and it turns out not to be necessary, at least assuming the existing sloppy memory model malloc has been using, because if free (__bin_chunk) or realloc sees any unsynchronized change to the size, it will also see the in-use bit being set, and thereby can't do anything with the neighboring chunk that changed size.
2020-05-22 | restore lock-skipping for processes that return to single-threaded state | Rich Felker | -1/+4
the design used here relies on the barrier provided by the first lock operation after the process returns to single-threaded state to synchronize with actions by the last thread that exited. by storing the intent to change modes in the same object used to detect whether locking is needed, it's possible to avoid an extra (possibly costly) memory load after the lock is taken.
2020-05-22 | don't use libc.threads_minus_1 as relaxed atomic for skipping locks | Rich Felker | -1/+1
after all but the last thread exits, the next thread to observe libc.threads_minus_1==0 and conclude that it can skip locking fails to synchronize with any changes to memory that were made by the last-exiting thread. this can produce data races. on some archs, at least x86, memory synchronization is unlikely to be a problem; however, with the inline locks in malloc, skipping the lock also eliminated the compiler barrier, and caused code that needed to re-check chunk in-use bits after obtaining the lock to reuse a stale value, possibly from before the process became single-threaded. this in turn produced corruption of the heap state.

some uses of libc.threads_minus_1 remain, especially for allocation of new TLS in the dynamic linker; otherwise, it could be removed entirely. it's made non-volatile to reflect that the remaining accesses are only made under lock on the thread list.

instead of libc.threads_minus_1, libc.threaded is now used for skipping locks. the difference is that libc.threaded is permanently true once an additional thread has been created. this will produce some performance regression in processes that are mostly single-threaded but occasionally creating threads. in the future it may be possible to bring back the full lock-skipping, but more care needs to be taken to produce a safe design.
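The "permanently true once threaded" rule can be sketched with an ordinary mutex. Everything here is hypothetical (`demo_threaded`, `demo_mark_threaded`, `demo_increment`), and the sketch assumes the flag is set before the first additional thread is created, so thread creation itself orders the store; it does not model musl's internal locks.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_threaded;   /* set once, never cleared again */
static long demo_counter;   /* state protected by demo_lock  */

/* Assumed to be called just before the first extra thread is created. */
void demo_mark_threaded(void) { demo_threaded = 1; }

/* Take the lock only if the process has ever been multi-threaded. */
long demo_increment(void)
{
    int locked = demo_threaded;
    if (locked) pthread_mutex_lock(&demo_lock);
    long v = ++demo_counter;
    if (locked) pthread_mutex_unlock(&demo_lock);
    return v;
}
```

Because the flag never returns to zero, a process that briefly used threads keeps paying for the lock afterward, which is the performance regression the message acknowledges.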
2018-09-12 | split internal lock API out of libc.h, creating lock.h | Rich Felker | -1/+1
this further reduces the number of source files which need to include libc.h and thereby be potentially exposed to libc global state and internals. this will also facilitate further improvements like adding an inline fast-path, if we want to do so later.
2018-09-12 | reduce spurious inclusion of libc.h | Rich Felker | -1/+0
libc.h was intended to be a header for access to global libc state and related interfaces, but ended up included all over the place because it was the way to get the weak_alias macro. most of the inclusions removed here are places where weak_alias was needed. a few were recently introduced for hidden. some go all the way back to when libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented) cancellation points had to include it. remaining spurious users are mostly callers of the LOCK/UNLOCK macros and files that use the LFS64 macro to define the awful *64 aliases. in a few places, new inclusion of libc.h is added because several internal headers no longer implicitly include libc.h. declarations for __lockfile and __unlockfile are moved from libc.h to stdio_impl.h so that the latter does not need libc.h. putting them in libc.h made no sense at all, since the macros in stdio_impl.h are needed to use them correctly anyway.
2018-09-12 | hide dependency-triggering pointer object in malloc_usable_size.c | Rich Felker | -2/+2
2018-09-12 | rework malloc_usable_size to use malloc_impl.h | Rich Felker | -9/+1
2018-09-12 | move __memalign declaration to malloc_impl.h | Rich Felker | -4/+2
the malloc-implementation-private header is the only right place for this, because, being in the reserved namespace, __memalign is not interposable and thus not valid to use anywhere else. anything outside of the malloc implementation must call an appropriate-namespace public function (aligned_alloc or posix_memalign).
2018-09-12 | move declarations for malloc internals to malloc_impl.h | Rich Felker | -6/+2
2018-04-19 | reintroduce hardening against partially-replaced allocator | Rich Felker | -5/+10
commit 618b18c78e33acfe54a4434e91aa57b8e171df89 removed the previous detection and hardening since it was incorrect. commit 72141795d4edd17f88da192447395a48444afa10 already handled all that remained for hardening the static-linked case. in the dynamic-linked case, have the dynamic linker check whether malloc was replaced and make that information available. with these changes, the properties documented in commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 are restored: if calloc is not provided, it will behave as malloc+memset, and any of the memalign-family functions not provided will fail with ENOMEM.
2018-04-19 | return chunks split off by memalign using __bin_chunk instead of free | Rich Felker | -7/+5
this change serves multiple purposes:

1. it ensures that static linking of memalign-family functions will pull in the system malloc implementation, thereby causing link errors if an attempt is made to link the system memalign functions with a replacement malloc (incomplete allocator replacement).
2. it eliminates calls to free that are unpaired with allocations, which are confusing when setting breakpoints or tracing execution.

as a bonus, making __bin_chunk external may discourage aggressive and unnecessary inlining of it.
2018-04-19 | using malloc implementation types/macros/idioms for memalign | Rich Felker | -20/+22
the generated code should be mostly unchanged, except for explicit use of C_INUSE in place of copying the low bits from existing chunk headers/footers. these changes also remove mild UB due to dubious arithmetic on pointers into imaginary size_t[] arrays.
2018-04-19 | move malloc implementation types and macros to an internal header | Rich Felker | -37/+1
2018-04-19 | revert detection of partially-replaced allocator | Rich Felker | -15/+6
commit c9f415d7ea2dace5bf77f6518b6afc36bb7a5732 included checks to make calloc fall back to memset if used with a replaced malloc that didn't also replace calloc, and the memalign family fail if free has been replaced. however, the checks gave false positives for replacement whenever malloc or free resolved to a PLT entry in the main program.

for now, disable the checks so as not to leave libc in a broken state. this means that the properties documented in the above commit are no longer satisfied; failure to replace calloc and the memalign family along with malloc is unsafe if they are ever called.

the calloc checks were correct but useless for static linking. in both cases (simple or full malloc), calloc and malloc are in a source file together, so replacement of one but not the other would give linking errors. the memalign-family check was useful for static linking, but broken for dynamic as described above, and can be replaced with a better link-time check.
2018-04-18 | allow interposition/replacement of allocator (malloc) | Rich Felker | -23/+30
replacement is subject to conditions on the replacement functions. they may only call functions which are async-signal-safe, as specified either by POSIX or as an implementation-defined extension. if any allocator functions are replaced, at least malloc, realloc, and free must be provided. if calloc is not provided, it will behave as malloc+memset. any of the memalign-family functions not provided will fail with ENOMEM.

in order to implement the above properties, calloc and __memalign check that they are using their own malloc or free, respectively. choice to check malloc or free is based on considerations of supporting __simple_malloc. in order to make this work, calloc is split into separate versions for __simple_malloc and full malloc; commit ba819787ee93ceae94efd274f7849e317c1bff58 already did most of the split anyway, and completing it saves an extra call frame.

previously, use of -Bsymbolic-functions made dynamic interposition impossible. now, we are using an explicit dynamic-list, so add allocator functions to the list. most are not referenced anyway, but all are added for completeness.
2018-04-17 | remove unused __brk function/source file | Rich Felker | -7/+0
commit e3bc22f1eff87b8f029a6ab31f1a269d69e4b053 removed all references to __brk.
2018-04-17 | comment __malloc_donate overflow logic | Rich Felker | -0/+3
2018-04-17 | ldso, malloc: implement reclaim_gaps via __malloc_donate | Alexander Monakov | -18/+43
Split 'free' into unmap_chunk and bin_chunk, use the latter to introduce __malloc_donate and use it in reclaim_gaps instead of calling 'free'.
2018-04-17 | malloc: fix an over-allocation bug | Alexander Monakov | -4/+4
Fix an instance where realloc code would over-allocate by OVERHEAD bytes. Manually arrange for reuse of memcpy-free-return exit sequence.
2018-04-11 | optimize malloc0 | Alexander Monakov | -6/+23
Implementation of __malloc0 in malloc.c takes care to preserve zero pages by overwriting only non-zero data. However, malloc must have already modified auxiliary heap data just before and beyond the allocated region, so we know that edge pages need not be preserved. For allocations smaller than one page, pass them immediately to memset. Otherwise, use memset to handle partial pages at the head and tail of the allocation, and scan complete pages in the interior. Optimize the scanning loop by processing 16 bytes per iteration and handling rest of page via memset as soon as a non-zero byte is found.
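The strategy described here can be sketched in portable C. This is an illustration rather than the code from malloc.c: `demo_clear` is a hypothetical name, the page size is fixed at 4096 for simplicity, and memcpy into a two-word buffer stands in for the 16-byte reads.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE 4096u  /* assumed page size for the sketch */

/* Zero n bytes at p, writing to interior pages only from the first
 * non-zero 16-byte group onward so untouched zero pages stay untouched. */
void demo_clear(unsigned char *p, size_t n)
{
    if (n < DEMO_PAGE) {            /* small: just clear everything */
        memset(p, 0, n);
        return;
    }

    /* Partial page at the head: clear unconditionally. */
    size_t head = (size_t)(-(uintptr_t)p & (DEMO_PAGE - 1));
    memset(p, 0, head);
    p += head;
    n -= head;

    /* Whole pages in the interior: scan 16 bytes per iteration. */
    while (n >= DEMO_PAGE) {
        for (size_t i = 0; i < DEMO_PAGE; i += 16) {
            uint64_t w[2];
            memcpy(w, p + i, sizeof w);
            if (w[0] | w[1]) {      /* dirty data: clear rest of page */
                memset(p + i, 0, DEMO_PAGE - i);
                break;
            }
        }
        p += DEMO_PAGE;
        n -= DEMO_PAGE;
    }

    /* Partial page at the tail: clear unconditionally. */
    memset(p, 0, n);
}
```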
2018-01-09 | revise the definition of multiple basic locks in the code | Jens Gustedt | -1/+1
In all cases this is just a change from two volatile int to one.
2017-07-04 | fix undefined behavior in free | Alexander Monakov | -2/+3