musl/src/multibyte, branch master

align mbsnrtowcs behavior on partial character with new requirements

2025-05-05T13:27:29+00:00

POSIX 2024 added a requirement that mbsnrtowcs, like mbrtowc, consume
any final partial character and store it in the mbstate_t object
before returning. this was previously unspecified but documented as a
potential future change.

an internal mbstate_t object is added for the case where the argument
is a null pointer. previously this was not needed since no operations
could modify the internal object and not processing it at all gave the
same behavior "as if" there were an internal object.

mbrtowc: Fix wrong return value when n > UINT_MAX

2023-05-26T20:12:29+00:00

mbrtowc truncates n to unsigned int when storing its copy.
If n > UINT_MAX and the locale is not POSIX, the function will
return a wrong value greater than UINT_MAX on the success path.

rewrite wcsnrtombs to fix buffer overflow and other bugs

2020-11-19T22:12:43+00:00

the original wcsnrtombs implementation, which has been largely
untouched since 0.5.0, attempted to build input-length-limiting
conversion on top of wcsrtombs, which only limits output length. as
best I recall, this choice was made out of a mix of disdain over
having yet another variant function to implement (added in POSIX 2008;
not standard C) and preference not to switch things around and
implement the wcsrtombs in terms of the more general new function,
probably over namespace issues. the strategy employed was to impose
output limits that would ensure the input limit wasn't exceeded, then
finish up the tail character-at-a-time. unfortunately, none of that
worked correctly.

first, the logic in the wcsrtombs loop was wrong in that it could
easily get stuck making no forward progress, by imposing an output
limit too small to convert even one character.

the character-at-a-time loop that followed was even worse. it made no
effort to ensure that the converted multibyte character would fit in
the remaining output space, only that there was a nonzero amount of
output space remaining. it also employed an incorrect interpretation
of wcrtomb's interface contract for converting the null character,
thereby failing to act on end of input, and remaining space accounting
was subject to unsigned wrap-around. together these errors allow
unbounded overflow of the destination buffer, controlled by input
length limit and input wchar_t string contents.

given the extent to which this function was broken, it's plausible
that most applications that would have been rendered exploitable were
sufficiently broken not to be usable in the first place. however, it's
also plausible that common (especially ASCII-only) inputs succeeded in
the wcsrtombs loop, which mostly worked, while leaving the wildly
erroneous code in the second loop exposed to particular non-ASCII
inputs.

CVE-2020-28928 has been assigned for this issue.

fix aliasing-based undefined behavior in mbsrtowcs

2019-10-13T21:21:36+00:00

mbsrtowcs contains "vectorized" loops to quickly step over bytes
without the high bit set; these have undefined behavior by virtue of
aliasing uint32_t over top of char data for the accesses.

commit 4d0a82170a25464c39522d7190b9fe302045ddb2 fixed the
corresponding usage in string functions by using the may_alias
attribute conditional on __GNUC__ and disabled the vectorized code in
its absence. do the same for mbsrtowcs.

reduce spurious inclusion of libc.h

2018-09-12T18:34:37+00:00

libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.

remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.

in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.

declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.

define and use internal macros for hidden visibility, weak refs

2018-09-05T18:05:14+00:00

this cleans up what had become widespread direct inline use of "GNU C"
style attributes directly in the source, and lowers the barrier to
increased use of hidden visibility, which will be useful to recovering
some of the efficiency lost when the protected visibility hack was
dropped in commit dc2f368e565c37728b0d620380b849c3a1ddd78f, especially
on archs where the PLT ABI is costly.

fix erroneous acceptance of f4 9x xx xx code sequences by utf-8 decoder

2017-09-01T21:05:40+00:00

the DFA table controlling accepted ranges for the f4 prefix used an
incorrect upper bound of 0xa0 where it should have been 0x90, allowing
such sequences to be accepted and decoded as non-Unicode-scalar values
0x110000 through 0x11ffff.

fix erroneous stop before input limit in mbsnrtowcs and wcsnrtombs

2017-08-31T19:48:00+00:00

the value computed as an output limit that bounds the amount of input
consumed below the input limit was incorrectly being used as the
actual amount of input consumed. instead, compute the actual amount of
input consumed as a difference of pointers before and after the
conversion.

patch by Mikhail Kremnyov.

remove comments on copyright status from UTF-8 implementation files

2016-06-21T20:33:14+00:00

despite clarifications made to the COPYRIGHT file in commit
f0a61399330bae42beeb27d6ecd05570b3382a60, there continues to be
confusion about whether the permissions granted actually apply to all
files. I am the sole author of these files and clearly intend, and
have always intended, for the grant of permission to apply to them.

explicitly include stdio.h to get EOF definition needed by wctob

2016-03-02T05:44:56+00:00