path: root/src/network
Age | Commit message | Author | Lines
2024-02-29 | getnameinfo: fix calling __dns_parse with potentially too large rlen | Alexey Izbyshev | -1/+3
__res_send returns the full answer length even if it didn't fit the buffer, but __dns_parse expects the length of the filled part of the buffer. This is analogous to commit 77327ed064bd57b0e1865cd0e0364057ff4a53b4, which fixed the only other __dns_parse call site.
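The fix amounts to clamping the reported length to the buffer size before parsing. A minimal sketch, assuming a signed length where negative means error; `filled_len` is a hypothetical helper, not musl's code.

```c
#include <stddef.h>

/* Hypothetical helper, not musl's code: res_send-style calls report the
 * full answer length even when only part of it fit in the buffer, so
 * clamp it to the buffer size before handing it to a parser. */
static int filled_len(int rlen, size_t bufsize)
{
    if (rlen < 0)
        return rlen;                 /* propagate errors unchanged */
    if ((size_t)rlen > bufsize)
        return (int)bufsize;         /* only bufsize bytes are valid */
    return rlen;
}
```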
2023-11-06 | remove arbitrary limit from dns result parsing | Quentin Rameau | -1/+0
The name resolution would abort when getting more than 63 records per request, due to what seems to be a left-over from the original code. This check was non-breaking but spurious prior to TCP fallback support, since any 512-byte packet with more than 63 records was necessarily malformed. But now, it wrongly rejects valid results. Reported by Daniel Stefanik in Alpine Linux aports issue 15320.
2023-07-17 | fix rejection of dns responses with pointers past 512 byte offset | Rich Felker | -2/+2
the __dns_parse code used by the stub resolver traditionally included code to reject label pointers to offsets past a 512 byte limit, despite never processing the label contents, only stepping over them. when commit 51d4669fb97782f6a66606da852b5afd49a08001 added support for tcp fallback, this limit was overlooked, and as a result, it was at least theoretically possible for some valid large answers to be rejected on account of these offsets. since the limit was never serving any useful purpose, just remove it.
2023-07-04 | dns stub resolver: increase buffer size to handle chained CNAMEs | Rich Felker | -1/+1
in the event of chained CNAMEs, the answer to a query will contain the entire CNAME chain, not just one CNAME record. previously, the answer buffer size had been chosen to admit a maximal-length CNAME, but only one. a moderate-length chain could fill the available 768 bytes leaving no room for an actual address answering the query. while the DNS RFCs do not specify any limit on the length of a CNAME chain, or any reasonable behavior if the chain exceeds the entire 64k possible message size, actual recursive servers have to impose a limit, and as such, for all practical purposes, chains longer than this limit are not usable. it turns out BIND has a hard-coded limit of 16, and Unbound has a default limit of 11. assuming the recursive server makes use of "compression" (pointers), each maximal-length CNAME record takes at most 268 bytes, and thus any chain up to length 16 fits in at most 4288 bytes. this patch increases the answer buffer size to preserve the original intent of having 512 bytes available for address answers, plus space needed for a maximal CNAME chain, for a total of 4800 bytes. the resulting size of 9600 bytes for two queries (A+AAAA) is still well within what is reasonable to place in automatic storage.
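The buffer-size arithmetic above can be checked at compile time. The figures are the ones stated in the message; the macro and enum names are illustrative, not musl's.

```c
/* Figures taken from the commit message; the names are illustrative. */
#define MAX_CNAME_RR 268   /* bytes per maximal CNAME RR, with compression */
#define MAX_CHAIN    16    /* BIND's hard-coded chain limit */
#define ADDR_SPACE   512   /* room originally reserved for address answers */

enum {
    CHAIN_BYTES = MAX_CHAIN * MAX_CNAME_RR,   /* 4288 */
    ANSWER_BUF  = ADDR_SPACE + CHAIN_BYTES,   /* 4800 per query */
    BOTH_BUFS   = 2 * ANSWER_BUF              /* 9600 for A+AAAA */
};

_Static_assert(ANSWER_BUF == 4800, "matches the buffer size in the commit");
```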
2023-04-07 | dns: check length field in tcp response message | Alexey Kodanev | -0/+1
The received length field in the message may be greater than the size of the 'answer' buffer in which the message resides. Currently, ABUF_SIZE is 768. And if we get a larger 'alens[i]', it will result in an out-of-bounds reading in __dns_parse(). To fix this, limit the length to the size of the received buffer.
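A sketch of the clamp, assuming the received bytes start with the 2-byte big-endian length prefix used by DNS over TCP; `tcp_msg_len` is a hypothetical name, and musl's actual fix is a one-line comparison against the buffer size.

```c
#include <stddef.h>

/* Hypothetical sketch: a DNS message received over TCP starts with a
 * 2-byte big-endian length. Clamp that length to the bytes actually
 * present after the prefix so a parser never reads past the buffer. */
static size_t tcp_msg_len(const unsigned char *buf, size_t received)
{
    if (received < 2)
        return 0;                                   /* no complete prefix */
    size_t claimed = (size_t)buf[0] << 8 | buf[1];  /* length the peer claims */
    size_t avail = received - 2;                    /* bytes really present */
    return claimed < avail ? claimed : avail;
}
```

A peer claiming 4096 bytes when only 768 arrived yields 768.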
2023-02-28 | getservbyport_r: fix wrong result if getnameinfo fails with EAI_OVERFLOW | Alexey Izbyshev | -0/+2
EAI_OVERFLOW should be propagated as ERANGE to inform the caller about the need to expand the buffer.
2023-02-28 | getservbyport_r: fix out-of-bounds buffer read | Alexey Izbyshev | -1/+1
If the buffer passed to getservbyport_r is just enough to store two pointers after aligning it, getnameinfo is called with buflen == 0 (which means that service name is not needed) and trivially succeeds. Then, strtol is called on the address just past the buffer end, and if it doesn't happen to find the port number there, getservbyport_r spuriously succeeds and returns the same bad address to the caller. Fix this by ensuring that buflen is at least 1 when passed to getnameinfo.
2023-02-28 | getifaddrs: fix UB via taking address of null pointer union dereference | Alexey Izbyshev | -7/+7
getifaddrs computes &ctx->first->ifa even if ctx->first is NULL. While this shouldn't be possible on the success path because the loopback interface is hardcoded into the kernel, this is still possible on the error path (for example, if __rtnetlink_enumerate couldn't create a socket due to exceeding the fd limit).
2023-02-28 | accept4: don't fall back to accept if we got unknown flags | Alexey Izbyshev | -0/+4
accept4 emulation via accept ignores unknown flags, so it can spuriously succeed instead of failing (or succeed without doing the action implied by an unknown flag if it's added in a future kernel). Worse, unknown flags trigger the fallback code even on modern kernels if the real accept4 syscall returns EINVAL, because this is indistinguishable from socketcall returning EINVAL due to lack of accept4 support. Fix this by always failing with EINVAL if unknown flags are present and the syscall is missing or failed with EINVAL.
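The check described above can be sketched as a flag mask test performed before any accept-based emulation. This assumes the emulation only knows how to honor SOCK_CLOEXEC and SOCK_NONBLOCK; `check_accept4_flags` is a hypothetical helper.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical sketch: an accept + fcntl fallback can only emulate the
 * flags it knows about, so any other bit must fail with EINVAL instead
 * of being silently ignored. */
static int check_accept4_flags(int flags)
{
    if (flags & ~(SOCK_CLOEXEC | SOCK_NONBLOCK)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```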
2023-02-27 | fix potential read past end of buffer in getnameinfo host name lookup | Alexey Izbyshev | -0/+1
This is completely analogous to commit 633183b5d1c2. Similar code called from __lookup_name is not affected because it checks that the line contains the host name surrounded by blanks.
2023-02-27 | dns: fix workaround for systems defaulting to ipv6-only sockets | Alexey Izbyshev | -15/+16
When IPv6 nameservers are present, __res_msend_rc attempts to disable IPV6_V6ONLY socket option to ensure that it can communicate with IPv4 nameservers (if they are present too) via IPv4-mapped IPv6 addresses. However, this option can't be disabled on bound sockets, so setsockopt always fails.
2023-02-27 | dns: handle early eof in tcp fallback | Alexey Izbyshev | -1/+1
A zero returned from recvmsg is currently treated as if some data were received, so if a DNS server closes its TCP socket before sending the full answer, __res_msend_rc will spin until the timeout elapses because POLLIN event will be reported on each poll. Fix this by treating an early EOF as an error.
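One way to express the rule is to classify the recvmsg return value so that zero is never counted as progress. A sketch under that assumption; the enum and `classify_recv` are hypothetical, not musl's code.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical sketch: on a stream socket, 0 from recv/recvmsg means the
 * peer closed the connection. Counting it as "data arrived" makes a poll
 * loop spin, because POLLIN stays set after EOF. */
enum recv_status { RECV_DATA, RECV_AGAIN, RECV_FAILED };

static enum recv_status classify_recv(ssize_t r, int err)
{
    if (r > 0)
        return RECV_DATA;
    if (r < 0 && (err == EAGAIN || err == EINTR))
        return RECV_AGAIN;     /* transient: poll again */
    return RECV_FAILED;        /* early EOF (r == 0) or a hard error */
}
```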
2023-02-27 | prevent CNAME/PTR parsing from reading data past the response end | Alexey Izbyshev | -7/+7
DNS parsing callbacks pass the response buffer end instead of the actual response end to dn_expand, so a malformed DNS response can use message compression to make dn_expand jump past the response end and attempt to parse uninitialized parts of that buffer, which might succeed and return garbage.
2023-02-27 | fix out-of-bounds reads in __dns_parse | Alexey Izbyshev | -3/+3
There are several issues with range checks in this function:
* The question section parsing loop can read up to two out-of-bounds bytes before doing the range check and bailing out.
* The answer section parsing loop, in addition to the same issue as above, uses the wrong length in the range check that doesn't prevent OOB reads when computing len later.
* The len range check before calling the callback is off by 10. Also, p+len can overflow in a (probably theoretical) case when p is within 2^16 from UINTPTR_MAX.
Because __dns_parse is used only with stack-allocated buffers, such small overreads can't result in a segfault. The first two also don't affect the function result, but the last one may result in getaddrinfo incorrectly succeeding and returning up to 10 bytes past the response buffer as a part of the IP address, and in (canon) name returned by getaddrinfo/getnameinfo being affected by memory past the response buffer (because dn_expand might interpret it as a pointer).
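The safe pattern is to check each byte before reading it and to compare against the remaining space (end - p) instead of forming p + len, which also avoids the pointer-overflow case. A sketch of skipping one encoded name under that pattern; `skip_name` is hypothetical and is not the loop in __dns_parse.

```c
#include <stddef.h>

/* Hypothetical sketch: return the number of bytes the encoded DNS name at
 * p occupies, or -1 if it would run past end. Every byte is range-checked
 * before it is read, and lengths are compared with end - s rather than
 * computing s + len. */
static int skip_name(const unsigned char *p, const unsigned char *end)
{
    const unsigned char *s = p;
    while (s < end) {
        if (*s == 0)
            return (int)(s - p) + 1;       /* terminating root label */
        if (*s >= 0xc0) {                  /* 2-byte compression pointer */
            if (end - s < 2)
                return -1;
            return (int)(s - p) + 2;
        }
        if (*s > 63)
            return -1;                     /* reserved label types */
        if (end - s < *s + 1)
            return -1;                     /* label runs past end */
        s += *s + 1;
    }
    return -1;                             /* name never terminated */
}
```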
2023-02-12 | dns: prefer monotonic clock for timeouts | A. Wilcox | -1/+2
Before this commit, DNS timeouts always used CLOCK_REALTIME, which could produce spurious timeouts or delays if wall time changed for whatever reason. Now we try CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when it is unavailable.
2023-02-12 | inet_pton: fix uninitialized memory use for IPv4-mapped IPv6 addresses | Alexey Izbyshev | -0/+1
When a dot is encountered, the loop counter is incremented before exiting the loop, but the corresponding ip array element is left uninitialized, so the subsequent memmove (if "::" was seen) and the loop copying ip to the output buffer will operate on an uninitialized uint16_t. The uninitialized data never directly influences the control flow and is overwritten on successful return by the second half of the parsed IPv4 address. But it's better to fix this to avoid unexpected transformations by a sufficiently smart compiler and reports from UB-detection tools.
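The affected path can be exercised through the public API: a correct parse of an IPv4-mapped literal must yield 80 zero bits, 16 one bits, then the four IPv4 octets. This only checks results; catching the uninitialized read itself needs a tool such as MemorySanitizer or Valgrind. `mapped_ok` is a hypothetical test helper.

```c
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse s as IPv6 and check that it is exactly the
 * IPv4-mapped form of v4 (::ffff:a.b.c.d). */
static int mapped_ok(const char *s, const unsigned char v4[4])
{
    static const unsigned char prefix[12] = {0,0,0,0,0,0,0,0,0,0,0xff,0xff};
    unsigned char buf[16];
    if (inet_pton(AF_INET6, s, buf) != 1)
        return 0;
    return memcmp(buf, prefix, 12) == 0 && memcmp(buf + 12, v4, 4) == 0;
}
```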
2023-02-12 | increase sendmsg internal buffer to support SCM_MAX_FD | Colin Cross | -2/+5
The kernel defines a limit on the number of fds that can be passed through an SCM_RIGHTS ancillary message as SCM_MAX_FD. The value was 255 before kernel 2.6.38 (after that it is 253), and an SCM_RIGHTS ancillary message with 255 fds requires 1040 bytes, slightly more than the current 1024 byte internal buffer in sendmsg. 1024 is an arbitrary size, so increase it to match the arbitrary size limit in the kernel. This fixes tests that are verifying they support up to SCM_MAX_FD fds.
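The 1040-byte figure comes from CMSG_SPACE, which adds the aligned cmsghdr size to the aligned payload (an array of int fds). On common 64-bit Linux ABIs, where struct cmsghdr is 16 bytes with 8-byte alignment, CMSG_SPACE(255 * sizeof(int)) is 16 + 1024 = 1040; the exact value differs on other ABIs, so the sketch below only relies on it exceeding 1024. `rights_space` is a hypothetical name.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Control-buffer bytes needed for one SCM_RIGHTS message carrying nfds
 * descriptors, including header and alignment padding. */
static size_t rights_space(size_t nfds)
{
    return CMSG_SPACE(nfds * sizeof(int));
}
```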
2022-10-20 | fix return value of gethostby{name[2],addr} with no result but no error | Rich Felker | -2/+2
commit f081d5336a80b68d3e1bed789cc373c5c3d6699b fixed gethostbyname[2]_r to treat negative results as a non-error, leaving gethostbyname[2] wrongly returning a pointer to the unfilled result buffer rather than a null pointer. since, as documented with commit fe82bb9b921be34370e6b71a1c6f062c20999ae0, the caller of gethostby{name[2],addr}_r can always rely on the result pointer being set, use that consistently rather than trying to duplicate logic about whether we have a result or not in gethostby{name[2],addr}.
2022-10-19 | clean up dns_parse_callback | Rich Felker | -13/+13
the only functional change here should be that MAXADDRS is only checked for RRs that provide address results, so that a CNAME which appears after an excessive number of address RRs does not get ignored. I'm not aware of any servers that order the RRs this way, and it may even be forbidden to do so, but I prefer having the callback logic not be order dependent. other than that, the motivation for this change is that the A and AAAA cases were mostly duplicate code that could be combined as a single code path.
2022-10-19 | dns response handling: don't treat too many addresses as an error | Rich Felker | -1/+1
returning -1 rather than 0 from the parse function causes __dns_parse to bail out and return an error. presently, name_from_dns does not check the return value anyway, so this does not matter, but if it ever started treating this as an error, lookups with large numbers of addresses would break. this is a consequence of adding TCP support and extending the buffer size used in name_from_dns.
2022-10-19 | dns response handling: ignore presence of wrong-type RRs | Rich Felker | -2/+8
reportedly there is nameserver software with question-rewriting "functionality" which gives A answers when AAAA is queried. since we made no effort to validate that the answer RR type actually corresponds to the question asked, it was possible (depending on flags, etc.) for these answers to leak through, which the caller might not be prepared for. indeed, our implementation of gethostbyname2_r makes an assumption that the resulting addresses are in the family requested, and will misinterpret the results if they don't. commit 45ca5d3fcb6f874bf5ba55d0e9651cef68515395 already noted in fixing CVE-2017-15650 that this could happen, but did nothing to validate that the RR type of the answer matches the question; it just enforced the limit on number of results to preclude overflow. presently, name_from_dns ignores the return value of __dns_parse, so it doesn't really matter whether we return 0 (ignoring the RR) or -1 (parse-ending error) upon encountering the mismatched RR. if that ever changes, though, ignoring irrelevant answer RRs sounds like the semantically correct thing to do, so for now let's return 0 from the callback when this happens.
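A sketch of a callback that skips mismatched RRs by returning 0 rather than ending the parse with -1. The callback signature, the context struct, and the RDATA length check are assumptions for illustration, not musl's dns_parse_callback.

```c
/* Hypothetical sketch. Return 0 to skip/accept an RR, -1 to end parsing. */
#define RR_A    1
#define RR_AAAA 28

struct addr_ctx { int qtype; int naddrs; };

static int addr_cb(void *c, int rr, const void *data, int len)
{
    struct addr_ctx *ctx = c;
    (void)data;                        /* address bytes unused in this sketch */
    if (rr != ctx->qtype)
        return 0;                      /* e.g. an A answer to an AAAA query: ignore */
    if (len != (rr == RR_A ? 4 : 16))
        return -1;                     /* malformed RDATA (assumed policy) */
    ctx->naddrs++;
    return 0;
}
```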
2022-10-19 | dns query core: detect udp truncation at recv time | Rich Felker | -4/+13
we already attempt to preclude this case by having res_send use a sufficiently large temporary buffer even if the caller did not provide one as large as or larger than the udp dns max of 512 bytes. however, it's possible that the caller passed a custom-crafted query packet using EDNS0, e.g. to get detailed DNSSEC results, with a larger udp size allowance. I have also seen claims that there are some broken nameservers in the wild that do not honor the dns udp limit of 512 and send large answers without the TC bit set, when the query was not using EDNS. we generally don't aim to support broken nameservers, but in this case both problems, if the latter is even real, have a common solution: using recvmsg instead of recvfrom so we can examine the MSG_TRUNC flag.
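The recvmsg approach works because the kernel reports a datagram larger than the buffer through MSG_TRUNC in msg_flags, which recvfrom cannot return. A minimal sketch; `recv_dgram` is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram into buf and set *truncated if the kernel had to
 * cut it short because it was longer than len. */
static ssize_t recv_dgram(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr mh;
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    ssize_t r = recvmsg(fd, &mh, 0);
    *truncated = r >= 0 && (mh.msg_flags & MSG_TRUNC);
    return r;
}
```

With a 512-byte buffer, a 600-byte datagram returns 512 with *truncated set, which a resolver can treat like TC=1.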
2022-10-19 | getaddrinfo dns lookup: use larger answer buffer to handle long CNAMEs | Rich Felker | -3/+5
the size of 512 is not sufficient to get at least one address in the worst case where the name is at or near max length and resolves to a CNAME at or near max length. prior to tcp fallback, there was nothing we could do about this case anyway, but now it's fixable. the new limit 768 is chosen so as to admit roughly the number of addresses with a worst-case CNAME as could fit for a worst-case name that's not a CNAME in the old 512-byte limit. outside of this worst-case, the number of addresses that might be obtained is increased. MAXADDRS (48) was originally chosen as an upper bound on the combined number of A and AAAA records that could fit in 512-byte packets (31 and 17, respectively). it is not increased at this time. so as to prevent a situation where the A records consume almost all of these slots (at 768 bytes, a "best-case" name can fit almost 47 A records), the order of parsing is swapped to process AAAA first. this ensures roughly half of the slots are available to each address family.
2022-09-22 | dns: implement tcp fallback in __res_msend query core | Rich Felker | -2/+117
tcp fallback was originally deemed unwanted and unnecessary, since we aim to return a bounded-size result from getaddrinfo anyway and normally plenty of address records fit in the 512-byte udp dns limit. however, this turned out to have several problems:
- some recursive nameservers truncate by omitting all the answers, rather than sending as many as can fit.
- a pathological worst-case CNAME for a worst-case name can fill the entire 512-byte space with just the two names, leaving no room for any addresses.
- the res_* family of interfaces allow querying of non-address records such as TLSA (DANE), TXT, etc. which can be very large. for many of these, it's critical that the caller see the whole RRset. also, res_send/res_query are specified to return the complete, untruncated length so that the caller can retry with an appropriately-sized buffer. determining this is not possible without tcp.
so, it's time to add tcp fallback. the fallback strategy implemented here uses one tcp socket per question (1 or 2 questions), initiated via tcp fastopen when possible. the connection is made to the nameserver that issued the truncated answer. right now, fallback happens unconditionally when truncation is seen. this can, and may later be, relaxed for queries made by the getaddrinfo system, since it will only use a bounded number of results anyway. retry is not attempted again after failure over tcp. the logic could easily be adapted to do that, but it's of questionable value, since the tcp stack automatically handles retransmission and the success answer with TC=1 over udp strongly suggests that the nameserver has the full answer ready to give. further retry is likely just "take longer to fail".
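Over TCP, each DNS message is preceded by a 2-byte big-endian length (RFC 1035, section 4.2.2), so a query built for UDP has to be framed before it is written to the stream. A sketch of that framing; `frame_tcp_query` is a hypothetical helper, not part of musl.

```c
#include <stddef.h>
#include <string.h>

/* Copy query q into out behind a 2-byte big-endian length prefix.
 * Returns the framed size, or 0 if the query is too long or out is
 * too small. */
static size_t frame_tcp_query(unsigned char *out, size_t outsize,
                              const unsigned char *q, size_t qlen)
{
    if (qlen > 0xffff || outsize < qlen + 2)
        return 0;
    out[0] = (unsigned char)(qlen >> 8);
    out[1] = (unsigned char)(qlen & 0xff);
    memcpy(out + 2, q, qlen);
    return qlen + 2;
}
```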
2022-09-22 | res_send: use a temp buffer if caller's buffer is under 512 bytes | Rich Felker | -1/+9
for extremely small buffer sizes, the DNS query core in __res_msend may malfunction completely, being unable to get even the headers to determine the response code. but there is also a problem for reasonable sizes under 512 bytes: __res_msend is unable to determine if the udp answer was truncated at the recv layer, in which case it may be incomplete, and res_send is then unable to honor its contract to return the length of the full, non-truncated answer. at present, res_send does not honor that contract anyway when the full answer would exceed 512 bytes, since there is no tcp fallback, but this change at least makes it consistent in a context where this is the only "full answer" to be had.
2022-09-21 | adapt res_msend DNS query core for working with multiple sockets | Rich Felker | -6/+11
this is groundwork for TCP fallback support, but does not itself change behavior in any way.
2022-09-20 | getaddrinfo: add EAI_NODATA error code to distinguish NODATA vs NxDomain | Rich Felker | -6/+12
this was apparently omitted long ago out of a lack of understanding of its importance and the fact that POSIX doesn't specify it. despite not being officially standardized, however, it turns out that at least AIX, glibc, NetBSD, OpenBSD, QNX, and Solaris document and support it. in certain usage cases, such as implementing a DNS gateway on top of the stub resolver interfaces, it's necessary to distinguish the case where a name does not exist (NxDomain) from one where it exists but has no addresses (or other records) of the requested type (NODATA). in fact, even the legacy gethostbyname API had this distinction, which we were previously unable to support correctly because the backend lacked it. apart from fixing an important functionality gap, adding this distinction helps clarify to users how search domain fallback works (falling back in cases corresponding to EAI_NONAME, not in ones corresponding to EAI_NODATA), a topic that has been a source of ongoing confusion and frustration. as a result of this change, EAI_NONAME is no longer a valid universal error code for getaddrinfo in the case where AI_ADDRCONFIG has suppressed use of all address families. in order to return an accurate result in this case, getaddrinfo is modified to still perform at least one lookup. this will almost surely fail (with a network error, since there is no v4 or v6 network to query DNS over) unless a result comes from the hosts file or from ip literal parsing, but in case it does succeed, the result is replaced by EAI_NODATA. glibc has a related error code, EAI_ADDRFAMILY, that could be used for the AI_ADDRCONFIG case and certain NODATA cases, but distinguishing them properly in full generality seems to require additional DNS queries that are otherwise not useful. on glibc, it is only used for ip literals with mismatching family, not for DNS or hosts file results where the name has addresses only in the opposite family. 
since this seems misleading and inconsistent, and since EAI_NODATA already covers the semantic case where the "name" exists but doesn't have any addresses in the requested family, we do not adopt EAI_ADDRFAMILY at this time. this could be changed at some point if desired, but the logic for getting all the corner cases with AI_ADDRCONFIG right is slightly nontrivial.
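A caller that needs the NxDomain/NODATA distinction might classify getaddrinfo errors as below. EAI_NODATA is not in POSIX, so the sketch guards it with #ifdef; on glibc it is only exposed with _GNU_SOURCE defined before any system header. The enum and `classify_gai_error` are hypothetical.

```c
#define _GNU_SOURCE   /* glibc exposes EAI_NODATA only with this */
#include <netdb.h>

/* Hypothetical classification of getaddrinfo failures. */
enum lookup_result { NAME_EXISTS_NO_DATA, NAME_DOES_NOT_EXIST, LOOKUP_ERROR };

static enum lookup_result classify_gai_error(int err)
{
#ifdef EAI_NODATA
    if (err == EAI_NODATA)
        return NAME_EXISTS_NO_DATA;   /* name exists, no records of this type */
#endif
    if (err == EAI_NONAME)
        return NAME_DOES_NOT_EXIST;   /* NxDomain: search fallback applies */
    return LOOKUP_ERROR;              /* network or server failure, etc. */
}
```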
2022-09-19 | fix error cases in gethostbyaddr_r | Rich Felker | -2/+3
EAI_MEMORY is not possible (but would not provide errno if it were) and EAI_FAIL does not provide errno. treat the latter as EBADMSG to match how it's handled in gethostbyname2_r (it indicates erroneous or failure response from the nameserver).
2022-09-19 | remove impossible error case from gethostbyname2_r | Rich Felker | -1/+0
EAI_MEMORY is not possible because the resolver backend does not allocate. if it did, it would be necessary for us to explicitly return ENOMEM as the error, since errno is not guaranteed to reflect the error cause except in the case of EAI_SYSTEM, so the existing code was not correct anyway.
2022-09-19 | fix return value of gethostbyname[2]_r on result not found | Rich Felker | -1/+1
these functions are horribly underspecified, inconsistent between historical systems, and should never have been included. however, the signatures we have match the glibc ones, and the glibc behavior is to treat NxDomain and NODATA results as a success condition, not an ENOENT error.
2022-09-19 | dns: treat names rejected by res_mkquery as nonexistent rather than error | Rich Felker | -1/+1
this distinction only affects search, but allows search to continue when concatenating one of the search domains onto the requested name produces a result that's not valid. this can happen when the concatenation is too long, or one of the search list entries is itself not valid. as a consequence of this change, having "." in the search domains list will now be ignored/skipped rather than making the lookup abort with no results (due to producing a concatenation ending in ".."). this behavior could be changed later if needed.
2022-09-19 | res_mkquery: error out on consecutive final dots in name | Rich Felker | -0/+1
the main loop already errors out on zero-length labels within the name, but terminates before having a chance to check for an erroneous final zero-length label, instead producing a malformed query packet with a '.' byte instead of the terminating zero. rather than poke at the loop logic, simply detect this condition early and error out without doing anything. this also fixes behavior of getaddrinfo when "." appears in the search domain list, which produces a name ending in ".." after concatenation, at least in the sense of no longer emitting malformed packets on the network. however, due to other issues, the lookup will still fail.
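The early check can be sketched as: strip one final dot (a fully qualified name), then reject if another dot remains at the end. `bad_final_dots` is a hypothetical helper, not res_mkquery itself.

```c
#include <string.h>

/* Hypothetical pre-check: return nonzero if name ends in "..", which
 * would otherwise encode as a zero-length final label. */
static int bad_final_dots(const char *name)
{
    size_t l = strlen(name);
    if (l && name[l - 1] == '.')
        l--;                          /* one final dot is allowed (FQDN) */
    return l && name[l - 1] == '.';   /* a second final dot is an error */
}
```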
2022-08-26 | dns: fail if ipv6 is disabled and resolv.conf has only v6 nameservers | Rich Felker | -0/+5
if resolv.conf lists no nameservers at all, the default of 127.0.0.1 is used. however, another "no nameservers" case arises where the system has ipv6 support disabled/configured-out and resolv.conf only contains v6 nameservers. this caused the resolver to repeat socket operations that will necessarily fail (sending to one or more wrong-family addresses) while waiting for a timeout. it would be contrary to configured intent to query 127.0.0.1 in this case, but the current behavior is not conducive to diagnosing the configuration problem. instead, fail immediately with EAI_SYSTEM and errno==EAFNOSUPPORT so that the configuration error is reportable.
2022-08-24 | fix fallback when ipv6 is disabled but resolv.conf has v6 nameservers | Rich Felker | -1/+2
apparently this code path was never tested, as it's not usual to have v6 nameservers listed on a system without v6 networking support. but it was always intended to work. when reverting to binding a v4 address, also revert the family in the sockaddr structure and the socklen for it. otherwise bind will just fail due to mismatched family/sockaddr size. fix dns resolver fallback when v6 nameservers are listed by
2022-08-01 | fix mishandling of errno in getaddrinfo AI_ADDRCONFIG logic | Rich Felker | -0/+2
this code attempts to use the value of errno from failure of socket or connect to infer availability of the requested address family (v4 or v6). however, in the case where connect failed, there is an intervening call to close between connect and the use of errno. close is not required to preserve errno on success, and in fact the __aio_close code, which is called whenever aio is linked and thus always called in dynamic-linked programs, unconditionally clobbers errno. as a result, getaddrinfo fails with EAI_SYSTEM and errno=ENOENT rather than correctly determining that the address family was unavailable. this fix is based on report/patch by Jussi Nieminen, but simplified slightly to avoid breaking the case where socket, not connect, failed.
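The general hazard is that any call between the failing operation and the errno check may overwrite errno. One way to guard against it is to save errno around the intervening close, sketched here with a hypothetical helper.

```c
#include <errno.h>
#include <unistd.h>

/* Close fd without letting close() clobber the errno value the caller
 * is about to inspect (e.g. the error from a failed connect). */
static void close_keep_errno(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
}
```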
2022-06-03 | ensure distinct query id for parallel A and AAAA queries in resolver | Rich Felker | -0/+3
assuming a reasonable realtime clock, res_mkquery is highly unlikely to generate the same query id twice in a row, but it's possible with a very low-resolution system clock or under extreme delay of forward progress. when it happens, res_msend fails to wait for both answers, and instead stops listening after getting two answers to the same query (A or AAAA). to avoid this, increment one byte of the second query's id if it matches the first query's. don't bother checking if the second byte is also equal, since it doesn't matter; we just need to ensure that at least one byte is distinct.
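The rule reduces to a single-byte comparison: if the first id byte already differs, the ids differ; if it matches, incrementing it makes them differ (an unsigned wrap from 0xff to 0 still changes it). A sketch on raw query buffers, whose first two bytes hold the id; `make_ids_distinct` is hypothetical.

```c
/* Ensure the second query's id differs from the first's by looking at
 * one id byte only; the second byte never needs checking. */
static void make_ids_distinct(const unsigned char *q1, unsigned char *q2)
{
    if (q2[0] == q1[0])
        q2[0]++;
}
```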
2022-04-10 | fix incorrect parameter name in internal netlink.h RTA_OK macro | Ondrej Jirman | -1/+1
the wrong name works only by accident.
2020-09-03 | fix missing newline in herror output | Rich Felker | -1/+1
2020-08-30 | restore h_errno ABI compatibility with ancient binaries | Rich Felker | -0/+4
prior to commit e68c51ac46a9f273927aef8dcebc89912ab19ece, h_errno was actually an external data object not a macro. bring back the symbol, and use it as the storage for the main thread's h_errno. technically this still doesn't provide full compatibility if the application was multithreaded, but at the time there were no res_* functions (and they did not set h_errno anyway), so any use of h_errno would have been via thread-unsafe functions. thus a solution that just fixes single-threaded applications seems acceptable.
2020-08-24 | report res_query failures, including nxdomain/nodata, via h_errno | Rich Felker | -1/+15
while it's not clearly documented anywhere, this is the historical behavior which some applications expect. applications which need to see the response packet in these cases, for example to distinguish between nonexistence in a secure vs insecure zone, must already use res_mkquery with res_send in order to be portable, since most if not all other implementations of res_query don't provide it.
2020-08-24 | make h_errno thread-local | Rich Felker | -4/+2
the framework to do this always existed but it was deemed unnecessary because the only [ex-]standard functions using h_errno were not thread-safe anyway. however, some of the nonstandard res_* functions are also supposed to set h_errno to indicate the cause of error, and were unable to do so because it was not thread-safe. this change is a prerequisite for fixing them.
2020-08-05 | in hosts file lookups, honor first canonical name regardless of family | Rich Felker | -1/+1
prior to this change, the canonical name came from the first hosts file line matching the requested family, so the canonical name for a given hostname could differ depending on whether it was requested with AF_UNSPEC or a particular family (AF_INET or AF_INET6). now, the canonical name is deterministically the first one to appear with the requested name as an alias.
2020-08-04 | in hosts file lookups, use only first match for canonical name | Rich Felker | -2/+7
the existing code clobbered the canonical name already discovered every time another matching line was found, which will necessarily be the case when a hostname has both IPv4 and v6 definitions. patch by Wolf.
2020-05-19 | fix return value of res_send, res_query on errors from nameserver | Rich Felker | -1/+1
the internal __res_msend returns 0 on timeout without having obtained any conclusive answer, but in this case has not filled in meaningful anslen. res_send wrongly treated that as success, but returned a zero answer length. any reasonable caller would eventually end up treating that as an error when attempting to parse/validate it, but it should just be reported as an error. alternatively we could return the last-received inconclusive answer (typically servfail), but doing so would require internal changes in __res_msend. this may be considered later.
2020-05-19 | fix handling of errors resolving one of paired A+AAAA query | Rich Felker | -4/+7
the old logic here likely dates back, at least in inspiration, to before it was recognized that transient errors must not be allowed to reflect the contents of successful results and must be reported to the application. here, the dns backend for getaddrinfo, when performing a paired query for v4 and v6 addresses, accepted results for one address family even if the other timed out. (the __res_msend backend does not propagate error rcodes back to the caller, but continues to retry until timeout, so other error conditions were not actually possible.) this patch moves the checks to take place before answer parsing, and performs them for each answer rather than only the answer to the first query. if nxdomain is seen it's assumed to apply to both queries since that's how dns semantics work.
2020-05-18 | set AD bit in dns queries, suppress for internal use | Rich Felker | -0/+3
the AD (authenticated data) bit in outgoing dns queries is defined by rfc3655 to request that the nameserver report (via the same bit in the response) whether the result is authenticated by DNSSEC. while all results returned by a DNSSEC conforming nameserver will be either authenticated or cryptographically proven to lack DNSSEC protection, for some applications it's necessary to be able to distinguish these two cases. in particular, conforming and compatible handling of DANE (TLSA) records requires enforcing them only in signed zones. when the AD bit was first defined for queries, there were reports of compatibility problems with broken firewalls and nameservers dropping queries with it set. these problems are probably a thing of the past, and broken nameservers are already unsupported. however, since there is no use in the AD bit with the netdb.h interfaces, explicitly clear it in the queries they make. this ensures that, even with broken setups, the standard functions will work, and at most the res_* functions break.
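In the DNS header, the AD bit is 0x20 in the fourth byte (the RA/Z/AD/CD/RCODE octet). A sketch of setting it for res_* style queries and clearing it for internal netdb.h lookups; `set_ad` and `DNS_HDR_AD` are hypothetical names operating on a raw query buffer.

```c
/* AD bit position in byte 3 of the 12-byte DNS header. */
#define DNS_HDR_AD 0x20

/* Set (on != 0) or clear the AD bit in a raw DNS query, leaving the
 * other header bits untouched. */
static void set_ad(unsigned char *query, int on)
{
    if (on)
        query[3] |= DNS_HDR_AD;
    else
        query[3] &= (unsigned char)~DNS_HDR_AD;
}
```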
2020-02-22 | use __socketcall to simplify socket() | Rich Felker | -5/+5
commit 59324c8b0950ee94db846a50554183c845ede160 added __socketcall analogous to __syscall, returning the negated error rather than setting errno. use it to simplify the fallback path of socket(), avoiding extern calls and access to errno.
    Author: Rich Felker <dalias@aerifal.cx>
    Date:   Tue Jul 30 17:51:16 2019 -0400
    make __socketcall analogous to __syscall, error-returning
2019-12-17 | hook recvmmsg up to SO_TIMESTAMP[NS] fallback for pre-time64 kernels | Rich Felker | -6/+14
always try the time64 syscall first since we can use its success to conclude that no conversion is needed (any setsockopt for the timestamp options would have succeeded without need for fallbacks). otherwise, we have to remember the original controllen for each msghdr, requiring O(vlen) space, so vlen must be bounded. linux clamps it to IOV_MAX for sendmmsg only (not recvmmsg), but doing the same for recvmmsg is not unreasonable, especially since the limitation will only apply to old kernels. we could optimize to avoid trying SYS_recvmmsg_time64 first if all msghdrs have controllen zero, or support unlimited vlen by looping and emulating the timeout logic, but I'm not inclined to do complex and error-prone optimizations on a function that has so many underlying problems it should really never be used.
2019-12-17 | implement SO_TIMESTAMP[NS] fallback for kernels without time64 versions | Rich Felker | -0/+63
the definitions of SO_TIMESTAMP* changed on 32-bit archs in commit 38143339646a4ccce8afe298c34467767c899f51 to the new versions that provide 64-bit versions of timeval/timespec structure in control message payload. socket options, being state attached to the socket rather than function calls, are not trivial to implement as fallbacks on ENOSYS, and support for them was initially omitted on the assumption that the ioctl-based polling alternatives (SIOCGSTAMP*) could be used instead by applications if setsockopt fails. unfortunately, it turns out that SO_TIMESTAMP is sufficiently old and widely supported that a number of applications assume it's available and treat errors as fatal. this patch introduces emulation of SO_TIMESTAMP[NS] on pre-time64 kernels by falling back to setting the "_OLD" (time32) versions of the options if the time64 ones are not recognized, and performing translation of the SCM_TIMESTAMP[NS] control messages in recvmsg. since recvmsg does not know whether its caller is legacy time32 code or time64, it performs translation for any SCM_TIMESTAMP[NS]_OLD control messages it sees, leaving the original time32 timestamp as-is (it can't be rewritten in-place anyway, and memmove would be mildly expensive) and appending the converted time64 control message at the end of the buffer. legacy time32 callers will see the converted one as a spurious control message of unknown type; time64 callers running on pre-time64 kernels will see the original one as a spurious control message of unknown type. a time64 caller running on a kernel with native time64 support will only see the time64 version of the control message. emulation of SO_TIMESTAMPING is not included at this time since (1) applications which use it seem to be prepared for the possibility that it's not present or working, and (2) it can also be used in sendmsg control messages, in a manner that looks complex to emulate completely, and costly even when running on a time64-supporting kernel. 
corresponding changes in recvmmsg are not made at this time; they will be done separately.
2019-08-07 | fix regression in recvmmsg with no timeout | Rich Felker | -1/+1
somewhat analogous to commit d0b547dfb5f7678cab6bc39dd736ed6454357ca4, but here the omission of the null timeout check was in the time64 syscall code path. this code is not yet used except on x32.