author | Rich Felker <dalias@aerifal.cx> | 2017-11-20 16:25:54 -0500 |
---|---|---|
committer | Rich Felker <dalias@aerifal.cx> | 2017-11-20 16:25:54 -0500 |
commit | 4000b0107ddd7fe733fa31d4f078c6fcd35851d6 (patch) | |
tree | ed33813b0dbc943915c1a6f0458075e1742b6fd3 /include/stdio.h | |
parent | a90d9da1d1b14d81c4f93e1a6d1a686c3312e4ba (diff) | |
make fgetwc handling of encoding errors consistent with/without buffer

previously, fgetwc left all but the first byte of an illegal sequence
unread (available for subsequent calls) when reading out of the FILE
buffer, but dropped all bytes contributing to the error when falling
back to reading a byte at a time. neither behavior was ideal. in the
buffered case, each malformed character produced one error per byte,
rather than one per character. in the unbuffered case, consuming the
last byte that caused the transition from "incomplete" to "invalid"
state potentially dropped (and produced additional spurious encoding
errors for) the next valid character.
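
to make the difference between "one error per byte" and "one error per character" observable, a caller-side loop along the following lines can count both. this is only a sketch: `scan_stream`, `struct wc_stats` and the `max_errors` cap are hypothetical names introduced here, not part of musl, and the counts for malformed input depend on the libc and the locale in effect.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* hypothetical helper type: totals gathered while draining a stream */
struct wc_stats { size_t chars, errors; };

/* read wide characters until end of file, counting how many calls
 * failed with EILSEQ. max_errors bounds the loop in case the libc
 * does not advance past a bad byte. */
static struct wc_stats scan_stream(FILE *f, size_t max_errors)
{
	struct wc_stats st = { 0, 0 };
	assert(f != NULL);
	for (;;) {
		errno = 0;
		wint_t wc = fgetwc(f);
		if (wc != WEOF) {
			st.chars++;
			continue;
		}
		if (errno == EILSEQ)
			st.errors++;
		/* stop at end of file, on a real I/O error, or at the cap */
		if (feof(f) || errno != EILSEQ || st.errors >= max_errors)
			break;
		clearerr(f); /* drop the error indicator and try again */
	}
	return st;
}
```

with per-byte reporting, a single malformed multibyte character would raise `errors` once for each of its bytes; with per-character reporting it would raise it once.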
to handle both cases uniformly without duplicate code, revise the
buffered case to only cover situations where a complete and valid
character is present in the buffer, and fall back to byte-at-a-time
for all other cases. this allows using mbtowc (stateless) instead of
mbrtowc, which may slightly improve performance too.
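
the shape of the revised buffered path can be sketched outside of libc as follows. `struct rbuf` and `take_from_buffer` are hypothetical stand-ins for the unread part of the FILE buffer and are not musl's internal names; the explicit state reset is an assumption added for portability to implementations whose mbtowc keeps internal state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* hypothetical stand-in for the unread region of a FILE buffer */
struct rbuf { const unsigned char *pos, *end; };

/* returns 1 and stores a character in *out if the buffer begins with
 * a complete, valid character. returns 0 and leaves the buffer
 * untouched otherwise, so the caller can fall back to reading a byte
 * at a time. */
static int take_from_buffer(struct rbuf *b, wchar_t *out)
{
	assert(b != NULL && out != NULL);
	if (b->pos == b->end)
		return 0;
	int l = mbtowc(out, (const char *)b->pos, (size_t)(b->end - b->pos));
	if (l < 0) {
		/* invalid or incomplete: reset any internal conversion
		 * state (a no-op for stateless encodings) and defer */
		(void)mbtowc(NULL, NULL, 0);
		return 0;
	}
	b->pos += l ? l : 1; /* l == 0 means a null byte, 1 byte long */
	return 1;
}
```

since mbtowc cannot tell an incomplete sequence from an invalid one, both cases simply report "not handled here", which is what lets the error logic live in one place.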
when an encoding error has been hit in the byte-at-a-time case, leave
the final byte that produced the error unread (via ungetc) except in
the case of single-byte errors (for UTF-8, bytes c0, c1, f5-ff, and
continuation bytes with no lead byte). single-byte errors are fully
consumed so as not to leave the caller in an infinite loop repeating
the same error.
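
a minimal sketch of this byte-at-a-time rule, written against the public stdio interface rather than musl's internals. `read_wc_bytewise` is a hypothetical name, and unlike a real fgetwc it cannot set the stream's error indicator, so it only reports failure through errno and the WEOF return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* decode one character by feeding single bytes to mbrtowc */
static wint_t read_wc_bytewise(FILE *f)
{
	mbstate_t st;
	wchar_t wc;
	int first = 1; /* true until one byte has been accepted */

	assert(f != NULL);
	memset(&st, 0, sizeof st); /* initial conversion state */
	for (;;) {
		int c = getc(f);
		if (c == EOF) {
			if (!first)
				errno = EILSEQ; /* input ended mid-character */
			return WEOF;
		}
		unsigned char b = (unsigned char)c;
		size_t l = mbrtowc(&wc, (const char *)&b, 1, &st);
		if (l == (size_t)-1) {
			/* mbrtowc has set errno to EILSEQ. a bad first
			 * byte stays consumed so the caller makes
			 * progress; a later byte is pushed back since
			 * it may begin the next valid character. */
			if (!first)
				ungetc(c, f);
			return WEOF;
		}
		if (l != (size_t)-2)
			return wc; /* complete character, possibly null */
		first = 0; /* incomplete so far, keep reading */
	}
}
```

the `first` flag is what separates single-byte errors from errors found partway through a sequence, so no table of invalid lead bytes is needed.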
none of these changes are distinguishable from a conformance standpoint,
since the file position is unspecified after encoding errors. they are
intended merely as QoI/consistency improvements.