summaryrefslogtreecommitdiff
path: root/src/multibyte/mbsrtowcs.c
AgeCommit message (Collapse)AuthorLines
2019-10-13fix aliasing-based undefined behavior in mbsrtowcsRich Felker-2/+8
mbsrtowcs contains "vectorized" loops to quickly step over bytes without the high bit set; these have undefined behavior by virtue of aliasing uint32_t over top of char data for the accesses. commit 4d0a82170a25464c39522d7190b9fe302045ddb2 fixed the corresponding usage in string functions by using the may_alias attribute conditional on __GNUC__ and disabled the vectorized code in its absence. do the same for mbsrtowcs.
2016-06-21remove comments on copyright status from UTF-8 implementation filesRich Felker-6/+0
despite clarifications made to the COPYRIGHT file in commit f0a61399330bae42beeb27d6ecd05570b3382a60, there continues to be confusion about whether the permissions granted actually apply to all files. I am the sole author of these files and clearly intend, and have always intended, for the grant of permission to apply to them.
2015-06-16byte-based C locale, phase 1: multibyte character handling functionsRich Felker-0/+19
this patch makes the functions which work directly on multibyte characters treat the high bytes as individual abstract code units rather than as multibyte sequences when MB_CUR_MAX is 1. since MB_CUR_MAX is presently defined as a constant 4, all of the new code added is dead code, and optimizing compilers' code generation should not be affected at all. a future commit will activate the new code. as abstract code units, bytes 0x80 to 0xff are represented by wchar_t values 0xdf80 to 0xdfff, at the end of the surrogates range. this ensures that they will never be misinterpreted as Unicode characters, and that all wctype functions return false for these "characters" without needing locale-specific logic. a high range outside of Unicode such as 0x7fffff80 to 0x7fffffff was also considered, but since C11's char16_t also needs to be able to represent conversions of these bytes, the surrogate range was the natural choice.
2013-12-12include cleanups: remove unused headers and add feature test macrosSzabolcs Nagy-3/+1
2013-09-27fix buffer overflow in mbsrtowcsRich Felker-1/+1
issue reported by Michael Forney: "If wn becomes 0 after processing a chunk of 4, mbsrtowcs currently continues on, wrapping wn around to -1, causing the rest of the string to be processed. This resulted in buffer overruns if there was only space in ws for wn wide characters." the original patch submitted added an additional check for !wn after the loop; to avoid extra branching, I instead just changed the wn>=4 check to wn>=5 to ensure that at least one slot remains after the word-at-a-time loop runs. this should not slow down the tail processing on real-world usage, since an extra slot that can't be processed in the word-at-a-time loop is needed for the null termination anyway.
2013-06-29fix failure of mbsrtowcs to record stop position when dest is fullRich Felker-1/+4
2013-04-04overhaul mbsrtowcsRich Felker-69/+64
these changes fix at least two bugs: - misaligned access to the input as uint32_t for vectorized ASCII test - incorrect src pointer after stopping on EILSEQ in addition, the text of the standard makes it unclear whether the mbstate_t object is to be modified when the destination pointer is null; previously it was cleared either way; now, it's only cleared when the destination is non-null. this change may need revisiting, but it should not affect most applications, since calling mbsrtowcs with non-zero state can only happen when the head of the string was already processed with mbrtowc. finally, these changes shave about 20% size off the function and seem to improve performance by 1-5%.
2012-09-06use restrict everywhere it's required by c99 and/or posix 2008Rich Felker-1/+1
to deal with the fact that the public headers may be used with pre-c99 compilers, __restrict is used in place of restrict, and defined appropriately for any supported compiler. we also avoid the form [restrict] since older versions of gcc rejected it due to a bug in the original c99 standard, and instead use the form *restrict.
2011-02-27cleanup utf-8 multibyte code, use visibility if possibleRich Felker-16/+0
this code was written independently of musl, with support for a the backwards, nonstandard "31-bit unicode" some libraries/apps might want. unfortunately the extra code (inside #ifdef) makes the source harder to read and makes code that should be simple look complex, so i'm removing it. anyone who wants to use the old code can find it in the history or from elsewhere. also, change the visibility of the __fsmu8 state machine table to hidden, if supported. this should improve performance slightly in shared-library builds.
2011-02-13cleanup multibyte stuff to remove ugly casts, sanitize the ptr align castsRich Felker-19/+19
2011-02-12initial check-in, version 0.5.0v0.5.0Rich Felker-0/+121