|Age||Commit message (Collapse)||Author||Lines|
prior to commit e68c51ac46a9f273927aef8dcebc89912ab19ece, h_errno was
actually an external data object not a macro. bring back the symbol,
and use it as the storage for the main thread's h_errno.
technically this still doesn't provide full compatibility if the
application was multithreaded, but at the time there were no res_*
functions (and they did not set h_errno anyway), so any use of h_errno
would have been via thread-unsafe functions. thus a solution that just
fixes single-threaded applications seems acceptable.
while it's not clearly documented anywhere, this is the historical
behavior which some applications expect. applications which need to
see the response packet in these cases, for example to distinguish
between nonexistence in a secure vs insecure zone, must already use
res_mkquery with res_send in order to be portable, since most if not
all other implementations of res_query don't provide it.
the framework to do this always existed but it was deemed unnecessary
because the only [ex-]standard functions using h_errno were not
thread-safe anyway. however, some of the nonstandard res_* functions
are also supposed to set h_errno to indicate the cause of error, and
were unable to do so because it was not thread-safe. this change is a
prerequisite for fixing them.
prior to this change, the canonical name came from the first hosts
file line matching the requested family, so the canonical name for a
given hostname could differ depending on whether it was requested with
AF_UNSPEC or a particular family (AF_INET or AF_INET6). now, the
canonical name is deterministically the first one to appear with the
requested name as an alias.
the existing code clobbered the canonical name already discovered
every time another matching line was found, which will necessarily be
the case when a hostname has both IPv4 and v6 definitions.
patch by Wolf.
the internal __res_msend returns 0 on timeout without having obtained
any conclusive answer, but in this case has not filled in meaningful
anslen. res_send wrongly treated that as success, but returned a zero
answer length. any reasonable caller would eventually end up treating
that as an error when attempting to parse/validate it, but it should
just be reported as an error.
alternatively we could return the last-received inconclusive answer
(typically servfail), but doing so would require internal changes in
__res_msend. this may be considered later.
the old logic here likely dates back, at least in inspiration, to
before it was recognized that transient errors must not be allowed to
reflect the contents of successful results and must be reported to the
here, the dns backend for getaddrinfo, when performing a paired query
for v4 and v6 addresses, accepted results for one address family even
if the other timed out. (the __res_msend backend does not propagate
error rcodes back to the caller, but continues to retry until timeout,
so other error conditions were not actually possible.)
this patch moves the checks to take place before answer parsing, and
performs them for each answer rather than only the answer to the first
query. if nxdomain is seen it's assumed to apply to both queries since
that's how dns semantics work.
the AD (authenticated data) bit in outgoing dns queries is defined by
rfc3655 to request that the nameserver report (via the same bit in the
response) whether the result is authenticated by DNSSEC. while all
results returned by a DNSSEC conforming nameserver will be either
authenticated or cryptographically proven to lack DNSSEC protection,
for some applications it's necessary to be able to distinguish these
two cases. in particular, conforming and compatible handling of DANE
(TLSA) records requires enforcing them only in signed zones.
when the AD bit was first defined for queries, there were reports of
compatibility problems with broken firewalls and nameservers dropping
queries with it set. these problems are probably a thing of the past,
and broken nameservers are already unsupported. however, since there
is no use in the AD bit with the netdb.h interfaces, explicitly clear
it in the queries they make. this ensures that, even with broken
setups, the standard functions will work, and at most the res_*
commit 59324c8b0950ee94db846a50554183c845ede160 added __socketcall
analogous to __syscall, returning the negated error rather than
setting errno. use it to simplify the fallback path of socket(),
avoiding extern calls and access to errno.
Author: Rich Felker <firstname.lastname@example.org>
Date: Tue Jul 30 17:51:16 2019 -0400
make __socketcall analogous to __syscall, error-returning
always try the time64 syscall first since we can use its success to
conclude that no conversion is needed (any setsockopt for the
timestamp options would have succeeded without need for fallbacks).
otherwise, we have to remember the original controllen for each
msghdr, requiring O(vlen) space, so vlen must be bounded. linux clamps
it to IOV_MAX for sendmmsg only (not recvmmsg), but doing the same for
recvmmsg is not unreasonable, especially since the limitation will
only apply to old kernels.
we could optimize to avoid trying SYS_recvmmsg_time64 first if all
msghdrs have controllen zero, or support unlimited vlen by looping and
emulating the timeout logic, but I'm not inclined to do complex and
error-prone optimizations on a function that has so many underlying
problems it should really never be used.
the definitions of SO_TIMESTAMP* changed on 32-bit archs in commit
38143339646a4ccce8afe298c34467767c899f51 to the new versions that
provide 64-bit versions of timeval/timespec structure in control
message payload. socket options, being state attached to the socket
rather than function calls, are not trivial to implement as fallbacks
on ENOSYS, and support for them was initially omitted on the
assumption that the ioctl-based polling alternatives (SIOCGSTAMP*)
could be used instead by applications if setsockopt fails.
unfortunately, it turns out that SO_TIMESTAMP is sufficiently old and
widely supported that a number of applications assume it's available
and treat errors as fatal.
this patch introduces emulation of SO_TIMESTAMP[NS] on pre-time64
kernels by falling back to setting the "_OLD" (time32) versions of the
options if the time64 ones are not recognized, and performing
translation of the SCM_TIMESTAMP[NS] control messages in recvmsg.
since recvmsg does not know whether its caller is legacy time32 code
or time64, it performs translation for any SCM_TIMESTAMP[NS]_OLD
control messages it sees, leaving the original time32 timestamp as-is
(it can't be rewritten in-place anyway, and memmove would be mildly
expensive) and appending the converted time64 control message at the
end of the buffer. legacy time32 callers will see the converted one as
a spurious control message of unknown type; time64 callers running on
pre-time64 kernels will see the original one as a spurious control
message of unknown type. a time64 caller running on a kernel with
native time64 support will only see the time64 version of the control
emulation of SO_TIMESTAMPING is not included at this time since (1)
applications which use it seem to be prepared for the possibility that
it's not present or working, and (2) it can also be used in sendmsg
control messages, in a manner that looks complex to emulate
completely, and costly even when running on a time64-supporting
corresponding changes in recvmmsg are not made at this time; they will
be done separately.
somewhat analogous to commit d0b547dfb5f7678cab6bc39dd736ed6454357ca4,
but here the omission of the null timeout check was in the time64
syscall code path. this code is not yet used except on x32.
without this, the SO_RCVTIMEO and SO_SNDTIMEO socket options would
stop working on pre-5.1 kernels after time_t is switched to 64-bit and
their values are changed to the new time64 versions.
new code is written such that it's statically unreachable on 64-bit
archs, and on existing 32-bit archs until the macro values are changed
to activate 64-bit time_t.
the time64 syscall is used only if the timeout does not fit in 32
bits. after preprocessing, the code is unchanged on 64-bit archs. for
32-bit archs, the timeout now goes through an intermediate copy,
meaning that the caller does not get back the updated timeout. this is
based on my reading of the documentation, which does not document the
updating as a contract you can rely on, and mentions that the whole
recvmmsg timeout mechanism is buggy and unlikely to be useful. if it
turns out that there's interest in making the remaining time
officially available to callers, such functionality could be added
The original logic considered each byte until it either found a 0
value or a value >= 192. This means if a string segment contained any
byte >= 192 it was interepretted as a compressed segment marker even
if it wasn't in a position where it should be interpretted as such.
The fix is to adjust dn_skipname to increment by each segments size
rather than look at each character. This avoids misinterpretting
string segment characters by not considering those bytes.
addressing &out[k].sa was arguably undefined, despite &out[k] being
defined the slot one past the end of an array, since the member access
.sa is intervening between the  operator and the & operator.
the backindex stored by getaddrinfo to allow freeaddrinfo to perform
partial-free wrongly used the address result index, rather than the
output slot index, and thus was only valid when they were equal
patch based on report with proposed fix by Markus Wichmann.
the specification for freeaddrinfo allows it to be used to free
"arbitrary sublists" of the list returned by getaddrinfo. it's not
clearly stated how such sublists come into existence, but the
interpretation seems to be that the application can edit the ai_next
pointers to cut off a portion of the list and then free it.
actual freeing of individual list slots is contrary to the design of
our getaddrinfo implementation, which has no failure paths after
making a single allocation, so that light callers can avoid linking
realloc/free. freeing individual slots is also incompatible with
sharing the string for ai_canonname, which the current implementation
does despite no requirement that it be present except on the first
result. so, rather than actually freeing individual slots, provide a
way to find the start of the allocated array, and reference-count it,
freeing the memory all at once after the last slot has been freed.
since the language in the spec is "arbitrary sublists", no provision
for handling other constructs like multiple lists glued together,
circular links, etc. is made. presumably passing such a construct to
freeaddrinfo produces undefined behavior.
despite not being documented to do so in the standard or Linux
documentation, attempts to udp connect to 127.0.0.1 or ::1 generate
EADDRNOTAVAIL when the loopback device is not configured and there is
no default route for IPv6. this caused getaddrinfo with AI_ADDRCONFIG
to fail with EAI_SYSTEM and EADDRNOTAVAIL on some no-IPv6
configurations, rather than the intended behavior of detecting IPv6 as
unsuppported and producing IPv4-only results.
previously, only EAFNOSUPPORT was treated as unavailability of the
address family being probed. instead, treat all errors related to
inability to get an address or route as conclusive that the family
being probed is unsupported, and only fail with EAI_SYSTEM on other
further improvements may be desirable, such as reporting EAI_AGAIN
instead of EAI_SYSTEM for errors which are expected to be transient,
but this patch should suffice to fix the serious regression.
libc.h was intended to be a header for access to global libc state and
related interfaces, but ended up included all over the place because
it was the way to get the weak_alias macro. most of the inclusions
removed here are places where weak_alias was needed. a few were
recently introduced for hidden. some go all the way back to when
libc.h defined CANCELPT_BEGIN and _END, and all (wrongly implemented)
cancellation points had to include it.
remaining spurious users are mostly callers of the LOCK/UNLOCK macros
and files that use the LFS64 macro to define the awful *64 aliases.
in a few places, new inclusion of libc.h is added because several
internal headers no longer implicitly include libc.h.
declarations for __lockfile and __unlockfile are moved from libc.h to
stdio_impl.h so that the latter does not need libc.h. putting them in
libc.h made no sense at all, since the macros in stdio_impl.h are
needed to use them correctly anyway.
commits leading up to this one have moved the vast majority of
libc-internal interface declarations to appropriate internal headers,
allowing them to be type-checked and setting the stage to limit their
visibility. the ones that have not yet been moved are mostly
namespace-protected aliases for standard/public interfaces, which
exist to facilitate implementing plain C functions in terms of POSIX
functionality, or C or POSIX functionality in terms of extensions that
are not standardized. some don't quite fit this description, but are
"internally public" interfacs between subsystems of libc.
rather than create a number of newly-named headers to declare these
functions, and having to add explicit include directives for them to
every source file where they're needed, I have introduced a method of
wrapping the corresponding public headers.
parallel to the public headers in $(srcdir)/include, we now have
wrappers in $(srcdir)/src/include that come earlier in the include
path order. they include the public header they're wrapping, then add
declarations for namespace-protected versions of the same interfaces
and any "internally public" interfaces for the subsystem they
along these lines, the wrapper for features.h is now responsible for
the definition of the hidden, weak, and weak_alias macros. this means
source files will no longer need to include any special headers to
access these features.
over time, it is my expectation that the scope of what is "internally
public" will expand, reducing the number of source files which need to
include *_impl.h and related headers down to those which are actually
implementing the corresponding subsystems, not just using them.
unlike the other res/dn functions, this one is tied to struct
resolvconf which is not a public interface, so put it in the private
header for its subsystem.
the source file for this function is completely standalone, but it
doesn't seem worth adding a header just for it, so declare it in
lookup.h for now.
policy is that all public functions which have a public declaration
should be defined in a context where that public declaration is
visible, to avoid preventable type mismatches.
an audit performed using GCC's -Wmissing-declarations turned up the
violations corrected here. in some cases the public header had not
been included; in others, a feature test macro needed to make the
declaration visible had been omitted.
in the case of gethostent and getnetent, the omission seems to have
been intentional, as a hack to admit a single stub definition for both
functions. this kind of hack is no longer acceptable; it's UB and
would not fly with LTO or advanced toolchains. the hack is undone to
make exposure of the declarations possible.
commit 4f35eb7591031a1e5ef9828f9304361f282f28b9 introduced this bug.
it is not present in any released versions. inadvertent use of the &
operator on an array into which we're indexing produced arithmetic on
the wrong-type pointer, with undefined behavior.
this flag is notoriously under-/mis-specified, and in the past it was
implemented as a nop, essentially considering the absence of a
loopback interface with 127.0.0.1 and ::1 addresses an unsupported
configuration. however, common real-world container environments omit
IPv6 support (even for the network-namespaced loopback interface), and
some kernels omit IPv6 support entirely. future systems on the other
hand might omit IPv4 entirely.
treat these as supported configurations and suppress results of the
unconfigured/unsupported address families when AI_ADDRCONFIG is
requested. use routability of the loopback address to make the
determination; unlike other implementations, we do not exclude
loopback from the "an address is configured" condition, since there is
no basis in the specification for such exclusion. obtaining a result
with AI_ADDRCONFIG does not imply routability of the result, and
applications must still be able to cope with unroutable results even
if they pass AI_ADDRCONFIG.
to produce sorted results roughly corresponding to RFC 3484/6724,
__lookup_name computes routability and choice of source address via
dummy UDP connect operations (which do not produce any packets). since
at the logical level, the properties fed into the sort key are
computed on ipv6 addresses, the code was written to use the v4mapped
ipv6 form of ipv4 addresses and share a common code path for them all.
however, on kernels where ipv6 support has been completely omitted,
this causes ipv4 to appear equally unroutable as ipv6, thereby putting
unreachable ipv6 addresses before ipv4 addresses in the results.
instead, use only ipv4 sockets to compute routability for ipv4
addresses. some gratuitous conversion back and forth is left so that
the logic is not affected by these changes. it may be possible to
simplify the ipv4 case considerably, thereby reducing code size and
maintainer's note: this change is for conformance with RFC 5952,
4.2.2, which explicitly forbids use of :: to shorten a single 16-bit 0
field when producing the canonical text representation for an IPv6
address. fixes a test failure reported by Philip Homburg, who also
submitted a patch, but this fix is simpler and should produce smaller
if a final dot was included in the queried host name to anchor it to
the dns root/suppress search domains, and the result was not a CNAME,
the returned canonical name included the final dot. this was not
consistent with other implementations, confused some applications, and
does not seem desirable.
POSIX specifies returning a pointer to, or to a copy of, the input
nodename, when the canonical name is not available, but does not
attempt to specify what constitutes "not available". in the case of
search, we already have an implementation-defined "availability" of a
canonical name as the fully-qualified name resulting from search, so
defining it similarly in the no-search case seems reasonable in
addition to being consistent with other implementations.
as a bonus, fix the case where more than one trailing dot is included,
since otherwise the changes made here would wrongly cause lookups with
two trailing dots to succeed. previously this case resulted in
malformed dns queries and produced EAI_AGAIN after a timeout. now it
fails immediately with EAI_NONAME.
If AI_NUMERICSERV is specified and a numeric service was not provided,
POSIX mandates getaddrinfo return EAI_NONAME. EAI_SERVICE is only for
services that cannot be used on the specified socket type.
MAXADDRS was chosen not to need enforcement, but the logic used to
compute it assumes the answers received match the RR types of the
queries. specifically, it assumes that only one replu contains A
record answers. if the replies to both the A and the AAAA query have
their answer sections filled with A records, MAXADDRS can be exceeded
and clobber the stack of the calling function.
this bug was found and reported by Felix Wilhelm.
some applications use getservbyport to find port numbers that are not
assigned to a service; if getservbyport always succeeds with a numeric
string as the result, they fail to find any available ports.
POSIX doesn't seem to mandate the behavior one way or another. it
specifies an abstract service database, which an implementation could
define to include numeric port strings, but it makes more sense to
align behavior with traditional implementations.
based on patch by A. Wilcox. the original patch only changed
getservbyport[_r]. to maintain a consistent view of the "service
database", I have also modified getservbyname[_r] to exclude numeric
commit d6cb08bcaca4ff1f921375510ca72bccea969c75 moved the code and
introduced an incorrect string offset for the new parsing, probably
due to a copy-and-paste error.
patch by Stefan Sedich.
due to testing buf[i].family==AF_INET before checking i==cnt, it was
possible to read past the end of the array, or past the valid part. in
practice, without active bounds/indeterminate-value checking by the
compiler, the worst that happened was failure to return early and
optimize out the sorting that's unneeded for v4-only results.
returning on i==cnt-1 rather than i==cnt would be an alternate fix,
but the approach this patch takes is more idiomatic and less
patch by Timo Teräs.
this is a clone of the fix to the gethostby*_r functions in
commit fe82bb9b921be34370e6b71a1c6f062c20999ae0. the man pages
document that the getservby*_r functions set this pointer to
NULL if there was an error or if no record was found.
this case statement was accidently left behind when this function
was refactored in commit e8f39ca4898237cf71657500f0b11534c47a0521.
posix requires errno to be set to ENXIO if the interface does not exist.
linux returns ENODEV instead so we handle this.
this code was already under #if 0, but could be confusing if a reader
didn't notice that, and it's almost surely full of bugs and/or
inconsistencies with the current code that uses the gethostbyname2_r
loop over an address family / resource record mapping to avoid
don't send a query that may be malformed.
mistakenly ordering strings before addresses in the result buffer
broke the alignment that the preceding code had set up.
previously if you called getprotobyname("egp") you would get
NULL because \008 is invalid octal and so the protocol id was
interpreted as 0 and name as "8egp".
The variable nss is set to zero in following line.
name_from_hosts failed to account for the possibility of an address
family error from name_from_numeric, wrongly counting such a return as
success and using the uninitialized address data as part of the
results passed up to the caller.
non-matching address family entries cannot simply be ignored or
results would be inconsistent with respect to whether AF_UNSPEC or a
specific address family is queried. instead, record that a
non-matching entry was seen, and fail the lookup with EAI_NONAME of no
matching-family entries are found.