path: root/src/network/lookup_name.c
Age | Commit message | Author | Lines
2023-07-04 | dns stub resolver: increase buffer size to handle chained CNAMEs | Rich Felker | -1/+1
in the event of chained CNAMEs, the answer to a query will contain the entire CNAME chain, not just one CNAME record. previously, the answer buffer size had been chosen to admit a maximal-length CNAME, but only one. a moderate-length chain could fill the available 768 bytes leaving no room for an actual address answering the query. while the DNS RFCs do not specify any limit on the length of a CNAME chain, or any reasonable behavior if the chain exceeds the entire 64k possible message size, actual recursive servers have to impose a limit, and as such, for all practical purposes, chains longer than this limit are not usable. it turns out BIND has a hard-coded limit of 16, and Unbound has a default limit of 11. assuming the recursive server makes use of "compression" (pointers), each maximal-length CNAME record takes at most 268 bytes, and thus any chain up to length 16 fits in at most 4288 bytes. this patch increases the answer buffer size to preserve the original intent of having 512 bytes available for address answers, plus space needed for a maximal CNAME chain, for a total of 4800 bytes. the resulting size of 9600 bytes for two queries (A+AAAA) is still well within what is reasonable to place in automatic storage.
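The size arithmetic in this message can be checked with a short sketch. The constant and function names are illustrative and not taken from the musl source; the figures (268 bytes per record, a chain limit of 16, 512 bytes for addresses) are the ones quoted in the message.

```c
#include <assert.h>

/* Hypothetical names; the values are those quoted in the commit message. */
enum {
	CNAME_RR_MAX = 268, /* worst-case size of one compressed CNAME record */
	ADDR_SPACE   = 512  /* room kept free for the address answers */
};

/* Bytes needed for a CNAME chain of chain_len records plus address answers. */
static int answer_buf_size(int chain_len)
{
	assert(chain_len >= 0);
	return ADDR_SPACE + CNAME_RR_MAX * chain_len;
}
```

With a chain length of 16 this gives 4800 bytes per answer, and twice that for a paired A+AAAA lookup.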
2023-04-07 | dns: check length field in tcp response message | Alexey Kodanev | -0/+1
The received length field in the message may be greater than the size of the 'answer' buffer in which the message resides. Currently, ABUF_SIZE is 768. And if we get a larger 'alens[i]', it will result in an out-of-bounds reading in __dns_parse(). To fix this, limit the length to the size of the received buffer.
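A minimal sketch of the kind of clamp described here, assuming the reported length and the buffer size are both available as byte counts. The function name is hypothetical and this is not the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Limit a peer-supplied message length to the size of the buffer that
 * actually holds the message, so later parsing cannot read past it. */
static size_t clamp_answer_len(size_t reported, size_t buf_size)
{
	assert(buf_size > 0);
	return reported > buf_size ? buf_size : reported;
}
```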
2023-02-27 | prevent CNAME/PTR parsing from reading data past the response end | Alexey Izbyshev | -2/+2
DNS parsing callbacks pass the response buffer end instead of the actual response end to dn_expand, so a malformed DNS response can use message compression to make dn_expand jump past the response end and attempt to parse uninitialized parts of that buffer, which might succeed and return garbage.
2022-10-19 | clean up dns_parse_callback | Rich Felker | -13/+13
the only functional change here should be that MAXADDRS is only checked for RRs that provide address results, so that a CNAME which appears after an excessive number of address RRs does not get ignored. I'm not aware of any servers that order the RRs this way, and it may even be forbidden to do so, but I prefer having the callback logic not be order dependent. other than that, the motivation for this change is that the A and AAAA cases were mostly duplicate code that could be combined as a single code path.
2022-10-19 | dns response handling: don't treat too many addresses as an error | Rich Felker | -1/+1
returning -1 rather than 0 from the parse function causes __dns_parse to bail out and return an error. presently, name_from_dns does not check the return value anyway, so this does not matter, but if it ever started treating this as an error, lookups with large numbers of addresses would break. this is a consequence of adding TCP support and extending the buffer size used in name_from_dns.
2022-10-19 | dns response handling: ignore presence of wrong-type RRs | Rich Felker | -2/+8
reportedly there is nameserver software with question-rewriting "functionality" which gives A answers when AAAA is queried. since we made no effort to validate that the answer RR type actually corresponds to the question asked, it was possible (depending on flags, etc.) for these answers to leak through, which the caller might not be prepared for. indeed, our implementation of gethostbyname2_r makes an assumption that the resulting addresses are in the family requested, and will misinterpret the results if they don't. commit 45ca5d3fcb6f874bf5ba55d0e9651cef68515395 already noted in fixing CVE-2017-15650 that this could happen, but did nothing to validate that the RR type of the answer matches the question; it just enforced the limit on number of results to preclude overflow. presently, name_from_dns ignores the return value of __dns_parse, so it doesn't really matter whether we return 0 (ignoring the RR) or -1 (parse-ending error) upon encountering the mismatched RR. if that ever changes, though, ignoring irrelevant answer RRs sounds like the semantically correct thing to do, so for now let's return 0 from the callback when this happens.
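The callback behavior described in this entry and in the two 2022-10-19 entries listed above it can be sketched roughly as follows. The struct layout, the function name `on_rr`, and the omission of CNAME handling are assumptions made for illustration; only the return-value convention (0 to ignore or continue, -1 to end parsing) and the MAXADDRS value of 48 come from the messages.

```c
#include <assert.h>
#include <string.h>

#define MAXADDRS 48
#define RR_A     1
#define RR_CNAME 5
#define RR_AAAA  28

struct addr { int rr; unsigned char bytes[16]; };
struct parse_ctx {
	struct addr addrs[MAXADDRS];
	int cnt;     /* number of addresses stored so far */
	int rrtype;  /* RR type of the question: RR_A or RR_AAAA */
};

/* Returns 0 to continue parsing, -1 to abort on a malformed record. */
static int on_rr(struct parse_ctx *ctx, int rr, const unsigned char *data, int len)
{
	assert(ctx->rrtype == RR_A || ctx->rrtype == RR_AAAA);
	if (rr == RR_CNAME) return 0;       /* canonical-name handling omitted */
	if (rr != ctx->rrtype) return 0;    /* wrong-type RR: ignore it */
	if (ctx->cnt >= MAXADDRS) return 0; /* table full: ignore, not an error */
	if (len != (rr == RR_A ? 4 : 16)) return -1;
	ctx->addrs[ctx->cnt].rr = rr;
	memcpy(ctx->addrs[ctx->cnt].bytes, data, len);
	ctx->cnt++;
	return 0;
}
```

Checking the RR type before the address limit keeps the result independent of the order in which records arrive.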
2022-10-19 | getaddrinfo dns lookup: use larger answer buffer to handle long CNAMEs | Rich Felker | -3/+5
the size of 512 is not sufficient to get at least one address in the worst case where the name is at or near max length and resolves to a CNAME at or near max length. prior to tcp fallback, there was nothing we could do about this case anyway, but now it's fixable. the new limit 768 is chosen so as to admit roughly the number of addresses with a worst-case CNAME as could fit for a worst-case name that's not a CNAME in the old 512-byte limit. outside of this worst-case, the number of addresses that might be obtained is increased. MAXADDRS (48) was originally chosen as an upper bound on the combined number of A and AAAA records that could fit in 512-byte packets (31 and 17, respectively). it is not increased at this time. so as to prevent a situation where the A records consume almost all of these slots (at 768 bytes, a "best-case" name can fit almost 47 A records), the order of parsing is swapped to process AAAA first. this ensures roughly half of the slots are available to each address family.
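One way to reproduce the record counts quoted above is sketched below. It assumes every answer record names its owner with a 2-byte compression pointer and it ignores the space taken by the question section; both are assumptions of this sketch, not statements from the commit, so the results are upper bounds.

```c
#include <assert.h>

enum {
	DNS_HEADER = 12,                /* fixed DNS message header */
	RR_FIXED   = 2 + 2 + 2 + 4 + 2  /* name pointer, type, class, ttl, rdlength */
};

/* Upper bound on the number of records with rdata_len bytes of data
 * that fit in a message of packet_size bytes. */
static int max_records(int packet_size, int rdata_len)
{
	assert(packet_size >= DNS_HEADER && rdata_len > 0);
	return (packet_size - DNS_HEADER) / (RR_FIXED + rdata_len);
}
```

Under these assumptions a 512-byte message holds at most 31 A records or 17 AAAA records, which sum to the MAXADDRS value of 48.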
2022-09-20 | getaddrinfo: add EAI_NODATA error code to distinguish NODATA vs NxDomain | Rich Felker | -2/+2
this was apparently omitted long ago out of a lack of understanding of its importance and the fact that POSIX doesn't specify it. despite not being officially standardized, however, it turns out that at least AIX, glibc, NetBSD, OpenBSD, QNX, and Solaris document and support it. in certain usage cases, such as implementing a DNS gateway on top of the stub resolver interfaces, it's necessary to distinguish the case where a name does not exist (NxDomain) from one where it exists but has no addresses (or other records) of the requested type (NODATA). in fact, even the legacy gethostbyname API had this distinction, which we were previously unable to support correctly because the backend lacked it. apart from fixing an important functionality gap, adding this distinction helps clarify to users how search domain fallback works (falling back in cases corresponding to EAI_NONAME, not in ones corresponding to EAI_NODATA), a topic that has been a source of ongoing confusion and frustration. as a result of this change, EAI_NONAME is no longer a valid universal error code for getaddrinfo in the case where AI_ADDRCONFIG has suppressed use of all address families. in order to return an accurate result in this case, getaddrinfo is modified to still perform at least one lookup. this will almost surely fail (with a network error, since there is no v4 or v6 network to query DNS over) unless a result comes from the hosts file or from ip literal parsing, but in case it does succeed, the result is replaced by EAI_NODATA. glibc has a related error code, EAI_ADDRFAMILY, that could be used for the AI_ADDRCONFIG case and certain NODATA cases, but distinguishing them properly in full generality seems to require additional DNS queries that are otherwise not useful. on glibc, it is only used for ip literals with mismatching family, not for DNS or hosts file results where the name has addresses only in the opposite family. 
since this seems misleading and inconsistent, and since EAI_NODATA already covers the semantic case where the "name" exists but doesn't have any addresses in the requested family, we do not adopt EAI_ADDRFAMILY at this time. this could be changed at some point if desired, but the logic for getting all the corner cases with AI_ADDRCONFIG right is slightly nontrivial.
2022-09-19 | dns: treat names rejected by res_mkquery as nonexistent rather than error | Rich Felker | -1/+1
this distinction only affects search, but allows search to continue when concatenating one of the search domains onto the requested name produces a result that's not valid. this can happen when the concatenation is too long, or one of the search list entries is itself not valid. as a consequence of this change, having "." in the search domains list will now be ignored/skipped rather than making the lookup abort with no results (due to producing a concatenation ending in ".."). this behavior could be changed later if needed.
2022-06-03 | ensure distinct query id for parallel A and AAAA queries in resolver | Rich Felker | -0/+3
assuming a reasonable realtime clock, res_mkquery is highly unlikely to generate the same query id twice in a row, but it's possible with a very low-resolution system clock or under extreme delay of forward progress. when it happens, res_msend fails to wait for both answers, and instead stops listening after getting two answers to the same query (A or AAAA). to avoid this, increment one byte of the second query's id if it matches the first query's. don't bother checking if the second byte is also equal, since it doesn't matter; we just need to ensure that at least one byte is distinct.
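The id adjustment described here can be sketched as below, relying on the fact that the first two bytes of a DNS query are its id. The function name is hypothetical.

```c
#include <assert.h>

/* first and second point at two DNS query packets. Only the first id
 * byte is compared: bumping it when equal guarantees that the two
 * 16-bit ids differ, whatever the second byte holds. */
static void make_ids_distinct(const unsigned char *first, unsigned char *second)
{
	assert(first && second);
	if (second[0] == first[0]) second[0]++;
}
```

Because the byte is unsigned, a value of 0xff wraps to 0x00, which still differs from the first query's byte.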
2020-08-05 | in hosts file lookups, honor first canonical name regardless of family | Rich Felker | -1/+1
prior to this change, the canonical name came from the first hosts file line matching the requested family, so the canonical name for a given hostname could differ depending on whether it was requested with AF_UNSPEC or a particular family (AF_INET or AF_INET6). now, the canonical name is deterministically the first one to appear with the requested name as an alias.
2020-08-04 | in hosts file lookups, use only first match for canonical name | Rich Felker | -2/+7
the existing code clobbered the canonical name already discovered every time another matching line was found, which will necessarily be the case when a hostname has both IPv4 and v6 definitions. patch by Wolf.
2020-05-19 | fix handling of errors resolving one of paired A+AAAA query | Rich Felker | -4/+7
the old logic here likely dates back, at least in inspiration, to before it was recognized that transient errors must not be allowed to reflect the contents of successful results and must be reported to the application. here, the dns backend for getaddrinfo, when performing a paired query for v4 and v6 addresses, accepted results for one address family even if the other timed out. (the __res_msend backend does not propagate error rcodes back to the caller, but continues to retry until timeout, so other error conditions were not actually possible.) this patch moves the checks to take place before answer parsing, and performs them for each answer rather than only the answer to the first query. if nxdomain is seen it's assumed to apply to both queries since that's how dns semantics work.
2020-05-18 | set AD bit in dns queries, suppress for internal use | Rich Felker | -0/+1
the AD (authenticated data) bit in outgoing dns queries is defined by rfc3655 to request that the nameserver report (via the same bit in the response) whether the result is authenticated by DNSSEC. while all results returned by a DNSSEC conforming nameserver will be either authenticated or cryptographically proven to lack DNSSEC protection, for some applications it's necessary to be able to distinguish these two cases. in particular, conforming and compatible handling of DANE (TLSA) records requires enforcing them only in signed zones. when the AD bit was first defined for queries, there were reports of compatibility problems with broken firewalls and nameservers dropping queries with it set. these problems are probably a thing of the past, and broken nameservers are already unsupported. however, since there is no use in the AD bit with the netdb.h interfaces, explicitly clear it in the queries they make. this ensures that, even with broken setups, the standard functions will work, and at most the res_* functions break.
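As an illustration of suppressing the bit, the sketch below clears AD in an already-built query. It relies on the DNS header layout, where AD is the 0x20 bit of the fourth header byte; the function and macro names are hypothetical, and the actual change may clear the flag differently.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_FLAG_AD 0x20 /* AD bit within the fourth byte of the DNS header */

/* Clear the AD bit of a query, leaving the other flag bits untouched. */
static void clear_ad(unsigned char *query, size_t len)
{
	assert(query && len >= 12); /* a DNS header is 12 bytes */
	query[3] &= (unsigned char)~DNS_FLAG_AD;
}
```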
2018-09-12 | overhaul internally-public declarations using wrapper headers | Rich Felker | -3/+1
commits leading up to this one have moved the vast majority of libc-internal interface declarations to appropriate internal headers, allowing them to be type-checked and setting the stage to limit their visibility. the ones that have not yet been moved are mostly namespace-protected aliases for standard/public interfaces, which exist to facilitate implementing plain C functions in terms of POSIX functionality, or C or POSIX functionality in terms of extensions that are not standardized. some don't quite fit this description, but are "internally public" interfaces between subsystems of libc. rather than create a number of newly-named headers to declare these functions, and having to add explicit include directives for them to every source file where they're needed, I have introduced a method of wrapping the corresponding public headers. parallel to the public headers in $(srcdir)/include, we now have wrappers in $(srcdir)/src/include that come earlier in the include path order. they include the public header they're wrapping, then add declarations for namespace-protected versions of the same interfaces and any "internally public" interfaces for the subsystem they correspond to. along these lines, the wrapper for features.h is now responsible for the definition of the hidden, weak, and weak_alias macros. this means source files will no longer need to include any special headers to access these features. over time, it is my expectation that the scope of what is "internally public" will expand, reducing the number of source files which need to include *_impl.h and related headers down to those which are actually implementing the corresponding subsystems, not just using them.
2018-09-12 | move __res_msend_rc declaration to lookup.h | Rich Felker | -1/+0
unlike the other res/dn functions, this one is tied to struct resolvconf which is not a public interface, so put it in the private header for its subsystem.
2018-09-12 | move and deduplicate declarations of __dns_parse to make it checkable | Rich Felker | -1/+0
the source file for this function is completely standalone, but it doesn't seem worth adding a header just for it, so declare it in lookup.h for now.
2018-09-02 | fix stack-based oob memory clobber in resolver's result sorting | Rich Felker | -1/+1
commit 4f35eb7591031a1e5ef9828f9304361f282f28b9 introduced this bug. it is not present in any released versions. inadvertent use of the & operator on an array into which we're indexing produced arithmetic on the wrong-type pointer, with undefined behavior.
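The class of mistake described here can be illustrated without invoking undefined behavior by comparing the element sizes that pointer arithmetic would use. The 16-byte array and the function names are illustrative assumptions; this is not the code that was fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that "pointer + 1" advances when the pointer is &addr. */
static size_t step_of_array_address(void)
{
	unsigned char addr[16];
	return sizeof *(&addr + 0); /* &addr has type unsigned char (*)[16] */
}

/* Bytes that "pointer + 1" advances when the pointer is addr itself. */
static size_t step_of_decayed_array(void)
{
	unsigned char addr[16];
	return sizeof *(addr + 0);  /* addr decays to unsigned char * */
}
```

An offset of 12 applied to `&addr` would therefore land 192 bytes away rather than 12, well outside the array.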
2018-07-11 | resolver: don't depend on v4mapped ipv6 to probe routability of v4 addrs | Rich Felker | -15/+32
to produce sorted results roughly corresponding to RFC 3484/6724, __lookup_name computes routability and choice of source address via dummy UDP connect operations (which do not produce any packets). since at the logical level, the properties fed into the sort key are computed on ipv6 addresses, the code was written to use the v4mapped ipv6 form of ipv4 addresses and share a common code path for them all. however, on kernels where ipv6 support has been completely omitted, this causes ipv4 to appear equally unroutable as ipv6, thereby putting unreachable ipv6 addresses before ipv4 addresses in the results. instead, use only ipv4 sockets to compute routability for ipv4 addresses. some gratuitous conversion back and forth is left so that the logic is not affected by these changes. it may be possible to simplify the ipv4 case considerably, thereby reducing code size and complexity.
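A rough sketch of a dummy UDP connect probe for an IPv4 destination, using only standard socket calls. The function name, the port number, and the return convention are assumptions of the sketch; the sort-key computation that consumes the result is not shown.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 and stores the kernel-selected source address in *src if
 * dst appears routable, 0 otherwise. connect() on a UDP socket only
 * records the peer and selects a route; no datagram is transmitted. */
static int probe_v4(struct in_addr dst, struct in_addr *src)
{
	struct sockaddr_in da, sa;
	socklen_t salen = sizeof sa;
	int ok = 0;
	int fd;

	assert(src);
	fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
	if (fd < 0) return 0;
	memset(&da, 0, sizeof da);
	da.sin_family = AF_INET;
	da.sin_port = htons(65535); /* arbitrary port; nothing is sent */
	da.sin_addr = dst;
	if (!connect(fd, (struct sockaddr *)&da, sizeof da) &&
	    !getsockname(fd, (struct sockaddr *)&sa, &salen)) {
		*src = sa.sin_addr;
		ok = 1;
	}
	close(fd);
	return ok;
}
```

Using an AF_INET socket here means the probe still works on a kernel built without any ipv6 support.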
2018-06-26 | resolver: omit final dot (root/suppress-search) in canonical name | Rich Felker | -0/+4
if a final dot was included in the queried host name to anchor it to the dns root/suppress search domains, and the result was not a CNAME, the returned canonical name included the final dot. this was not consistent with other implementations, confused some applications, and does not seem desirable. POSIX specifies returning a pointer to, or to a copy of, the input nodename, when the canonical name is not available, but does not attempt to specify what constitutes "not available". in the case of search, we already have an implementation-defined "availability" of a canonical name as the fully-qualified name resulting from search, so defining it similarly in the no-search case seems reasonable in addition to being consistent with other implementations. as a bonus, fix the case where more than one trailing dot is included, since otherwise the changes made here would wrongly cause lookups with two trailing dots to succeed. previously this case resulted in malformed dns queries and produced EAI_AGAIN after a timeout. now it fails immediately with EAI_NONAME.
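The trailing-dot rule can be sketched as a length computation: one final dot is dropped, and a name that still ends in a dot afterwards is rejected. The function name and the -1 return value (standing in for EAI_NONAME) are hypothetical, and how a bare "." is treated is left as an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Returns the length of name without its final dot, or -1 if the name
 * ends in more than one dot. */
static long canon_len(const char *name)
{
	size_t l;

	assert(name);
	l = strlen(name);
	if (l && name[l-1] == '.') l--;       /* drop the anchoring dot */
	if (l && name[l-1] == '.') return -1; /* two trailing dots: invalid */
	return (long)l;
}
```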
2017-10-18 | in dns parsing callback, enforce MAXADDRS to preclude overflow | Rich Felker | -0/+1
MAXADDRS was chosen not to need enforcement, but the logic used to compute it assumes the answers received match the RR types of the queries. specifically, it assumes that only one reply contains A record answers. if the replies to both the A and the AAAA query have their answer sections filled with A records, MAXADDRS can be exceeded and clobber the stack of the calling function. this bug was found and reported by Felix Wilhelm.
2017-04-11 | fix read past end of buffer in getaddrinfo backend | Rich Felker | -2/+2
due to testing buf[i].family==AF_INET before checking i==cnt, it was possible to read past the end of the array, or past the valid part. in practice, without active bounds/indeterminate-value checking by the compiler, the worst that happened was failure to return early and optimize out the sorting that's unneeded for v4-only results. returning on i==cnt-1 rather than i==cnt would be an alternate fix, but the approach this patch takes is more idiomatic and less error-prone. patch by Timo Teräs.
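The ordering issue can be shown with a small sketch in which the index is checked before the element is read. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>

struct result { int family; };

/* Returns 1 if all cnt results are IPv4. The bound i < cnt is tested
 * first, so buf[i] is never read at or beyond cnt. */
static int only_v4(const struct result *buf, int cnt)
{
	int i;

	assert(cnt >= 0);
	for (i = 0; i < cnt && buf[i].family == AF_INET; i++);
	return i == cnt;
}
```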
2016-06-29 | refactor name_from_dns in hostname lookup backend | Natanael Copa | -14/+13
loop over an address family / resource record mapping to avoid repetitive code.
2016-06-29 | in performing dns lookups, check result from res_mkquery | Natanael Copa | -0/+4
don't send a query that may be malformed.
2016-03-02 | handle non-matching address family entries in hosts file | Rich Felker | -3/+11
name_from_hosts failed to account for the possibility of an address family error from name_from_numeric, wrongly counting such a return as success and using the uninitialized address data as part of the results passed up to the caller. non-matching address family entries cannot simply be ignored or results would be inconsistent with respect to whether AF_UNSPEC or a specific address family is queried. instead, record that a non-matching entry was seen, and fail the lookup with EAI_NONAME if no matching-family entries are found.
2016-01-28 | reuse parsed resolv.conf in dns core to avoid re-reading/re-parsing | Rich Felker | -5/+6
2016-01-28 | add support for search domains to dns resolver | Rich Felker | -1/+41
search is only performed if the search or domain keyword is used in resolv.conf and the queried name has fewer than ndots dots. there is no default domain and names with >=ndots dots are never subjected to search; failure in the root scope is final. the (non-POSIX) res_search API presently does not honor search. this may be added at some point in the future if needed. resolv.conf is now parsed twice, at two different layers of the code involved. this will be fixed in a subsequent patch.
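The decision of whether to apply search can be sketched as a dot count compared against ndots. The function name and parameters are hypothetical; the trailing-dot check reflects the search suppression mentioned in the 2018-06-26 entry above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the search list should be applied to name. */
static int should_search(const char *name, int ndots, int have_search_list)
{
	size_t l;
	int dots = 0;

	assert(name);
	for (l = 0; name[l]; l++)
		if (name[l] == '.') dots++;
	if (!have_search_list) return 0;     /* no search/domain keyword */
	if (dots >= ndots) return 0;         /* enough dots: root scope only */
	if (l && name[l-1] == '.') return 0; /* final dot suppresses search */
	return 1;
}
```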
2016-01-28 | fix handling of dns response codes | Rich Felker | -1/+2
rcode of 3 (NxDomain) was treated as a hard EAI_NONAME failure, but it should instead return 0 (no results) so the caller can continue searching. this will be important for adding search domain support. the top-level caller will automatically return EAI_NONAME if there are zero results at the end. also, the case where rcode is 0 (success) but there are no results was not handled. this happens when the domain exists but there are no A or AAAA records for it. in this case a hard EAI_NONAME should be imposed to inhibit further search, since the name was defined and just does not have any address associated with it. previously a misleading hard failure of EAI_FAIL was reported.
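The outcome mapping as of this commit can be summarized in a sketch. The enum and function names are hypothetical stand-ins for the EAI_* codes, and the treatment of a missing response and of rcode 2 as transient follows the 2014-06-03 entry further down.

```c
#include <assert.h>

enum lookup_status {
	LOOKUP_CONTINUE  =  0, /* no results; the caller may keep searching */
	LOOKUP_NONAME    = -1, /* name exists but has no addresses */
	LOOKUP_TRANSIENT = -2, /* no response or server failure */
	LOOKUP_FAIL      = -3  /* any other permanent failure */
};

/* naddrs: number of addresses parsed; rcode: DNS response code, or -1
 * if no usable response arrived. Returns naddrs when positive. */
static int classify(int naddrs, int rcode)
{
	assert(naddrs >= 0);
	if (naddrs > 0) return naddrs;
	if (rcode < 0 || rcode == 2) return LOOKUP_TRANSIENT;
	if (rcode == 0) return LOOKUP_NONAME;   /* hard: stops further search */
	if (rcode == 3) return LOOKUP_CONTINUE; /* NxDomain: try next domain */
	return LOOKUP_FAIL;
}
```

The 2022-09-20 entry above later reports the rcode 0 case as EAI_NODATA instead.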
2015-10-26 | safely handle failure to open hosts, services, resolv.conf files | Rich Felker | -1/+9
previously, transient failures like fd exhaustion or other resource-related errors were treated the same as non-existence of these files, leading to fallbacks or false-negative results. in particular:
- failure to open hosts resulted in fallback to dns, possibly yielding EAI_NONAME for a hostname that should be defined locally, or an unwanted result from dns that the hosts file was intended to replace.
- failure to open services resulted in EAI_SERVICE.
- failure to open resolv.conf resulted in querying localhost rather than the configured nameservers.
now, only permanent errors trigger the fallback behaviors above; all other errors are reportable to the caller as EAI_SYSTEM.
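A sketch of separating the two kinds of open failure is given below. Which errno values count as permanent is an assumption of this sketch (ENOENT, ENOTDIR and EACCES are used here); the message itself does not list them, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Returns 1 if a failed open should be treated like a missing file
 * (fallback behavior applies), 0 if it should be reported to the
 * caller as a system error. */
static int open_failure_is_absence(int err)
{
	assert(err != 0);
	switch (err) {
	case ENOENT:
	case ENOTDIR:
	case EACCES:
		return 1; /* assumed permanent: file is effectively absent */
	default:
		return 0; /* e.g. fd exhaustion: report EAI_SYSTEM */
	}
}
```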
2014-06-21 | implement result address sorting in the resolver (getaddrinfo, etc.) | Rich Felker | -0/+135
2014-06-06 | accept trailing . and empty domain names | Szabolcs Nagy | -3/+4
trailing . should be accepted in domain name strings by convention (RFC 1034), host name lookup accepts "." but rejects empty "", res_* interfaces also accept empty name following existing practice.
2014-06-05 | fix the domain name length limit checks | Szabolcs Nagy | -2/+2
A domain name is at most 255 bytes long (RFC 1035), but the string representation is two bytes smaller so the strlen maximum is 253.
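The two-byte difference follows from the wire encoding: each label is preceded by a length byte and the name ends with a zero-length root label, so "a.b" (3 characters) encodes as 01 'a' 01 'b' 00 (5 bytes). A sketch, with hypothetical function names, for names written without a trailing dot and without checking per-label limits:

```c
#include <assert.h>
#include <string.h>

/* Encoded size of a dotted name given without a trailing dot: the dots
 * become length bytes, plus one length byte for the first label and
 * one byte for the terminating root label. */
static size_t wire_len(const char *name)
{
	assert(name);
	return strlen(name) + 2;
}

/* A name fits if its encoded form is at most 255 bytes. */
static int name_fits(const char *name)
{
	return wire_len(name) <= 255;
}
```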
2014-06-04 | add support for reverse name lookups from hosts file to getnameinfo | Rich Felker | -38/+3
this also affects the legacy gethostbyaddr family, which uses getnameinfo as its backend. some other minor changes associated with the refactoring of source files are also made; in particular, the resolv.conf parser now uses the same code that's used elsewhere to handle ip literals, so as a side effect it can now accept a scope id for nameserver addresses with link-local scope.
2014-06-04 | add support for ipv6 scope_id to getaddrinfo and getnameinfo | Rich Felker | -4/+27
for all address types, a scope_id specified as a decimal value is accepted. for addresses with link-local scope, a string containing the interface name is also accepted. some changes are made to error handling to avoid unwanted fallbacks in the case where the scope_id is invalid: if an earlier name lookup backend fails with an error rather than simply "0 results", this failure now suppresses any later attempts with other backends. in getnameinfo, a light "itoa" type function is added for generating decimal scope_id results, and decimal port strings for services are also generated using this function now so as not to pull in the dependency on snprintf. in netdb.h, a definition for the NI_NUMERICSCOPE flag is added. this is required by POSIX (it was previously missing) and needed to allow callers to suppress interface-name lookups.
2014-06-03 | fix negative response and non-response handling for dns queries | Rich Felker | -1/+4
previously, all failures to obtain at least one address were treated as nonexistent names (EAI_NONAME). this failed to account for the possibility of transient failures (no response at all, or a response with rcode of 2, server failure) or permanent failures that do not indicate the nonexistence of the requested name. only an rcode of 3 should be treated as an indication of nonexistence.
2014-06-02 | remove cruft from old resolver and numeric ip parsing | Rich Felker | -1/+3
the old resolver code used a function __ipparse which contained the logic for inet_addr and inet_aton, which is needed in getaddrinfo. this was phased out in the resolver overhaul in favor of directly using inet_aton and inet_pton as appropriate. this commit cleans up some stuff that was left behind.
2014-06-02 | switch standard resolver functions to use the new dns backend | Rich Felker | -21/+61
this is the third phase of the "resolver overhaul" project. this commit removes all of the old dns code, and switches the __lookup_name backend (used by getaddrinfo, etc.) and the getnameinfo function to use the newly implemented __res_mkquery and __res_msend interfaces. for parsing the results, a new callback-based __dns_parse function, based on __dns_get_rr from the old dns code, is used.
2014-06-02 | fix off-by-one in checking hostname length in new resolver backend | Rich Felker | -2/+2
this bug was introduced in the recent resolver overhaul commits. it likely had visible symptoms. these were probably limited to wrongly accepting truncated versions of over-long names (vs rejecting them), as opposed to stack-based overflows or anything more severe, but no extensive checks were made. there have been no releases where this bug was present.
2014-05-31 | refactor getaddrinfo and add support for most remaining features | Rich Felker | -0/+168
this is the first phase of the "resolver overhaul" project. conceptually, the results of getaddrinfo are a direct product of a list of address results and a list of service results. the new code makes this explicit by computing these lists separately and combining the results. this adds support for services that have both tcp and udp versions, where the caller has not specified which it wants, and eliminates a number of duplicate code paths which were all producing the final output addrinfo structures, but in subtly different ways, making it difficult to implement any of the features which were missing. in addition to the above benefits, the refactoring allows for legacy functions like gethostbyname to be implemented without using the getaddrinfo function itself. such changes to the legacy functions have not yet been made, however. further improvements include matching of service alias names from /etc/services (previously only the primary name was supported), returning multiple results from /etc/hosts (previously only the first matching line was honored), and support for the AI_V4MAPPED and AI_ALL flags. features which remain unimplemented are IDN translations (encoding non-ASCII hostnames for DNS lookup) and the AI_ADDRCONFIG flag. at this point, the DNS-based name resolving code is still based on the old interfaces in __dns.c, albeit somewhat simpler in its use of them. there may be some dead code which could already be removed, but changes to this layer will be a later phase of the resolver overhaul.