Age | Commit message (Collapse) | Author | Lines |
|
as the outcome of Austin Group tracker issue #62, future editions of
POSIX have dropped the requirement that fork be AS-safe. this allows
but does not require implementations to synchronize fork with internal
locks and give forked children of multithreaded parents a partly or
fully unrestricted execution environment where they can continue to
use the standard library (per POSIX, they can only portably use
AS-safe functions).
up until recently, taking this allowance did not seem desirable.
however, commit 8ed2bd8bfcb4ea6448afb55a941f4b5b2b0398c0 exposed the
extent to which applications and libraries are depending on the
ability to use malloc and other non-AS-safe interfaces in MT-forked
children, by converting latent very-low-probability catastrophic state
corruption into predictable deadlock. dealing with the fallout has
been a huge burden for users/distros.
while it looks like most of the non-portable usage in applications
could be fixed given sufficient effort, at least some of it seems to
occur in language runtimes which are exposing the ability to run
unrestricted code in the child as part of the contract with the
programmer. any attempt at fixing such contracts is not just a
technical problem but a social one, and is probably not tractable.
this patch extends the fork function to take locks for all libc
singletons in the parent, and release or reset those locks in the
child, so that when the underlying fork operation takes place, the
state protected by these locks is consistent and ready for the child
to use. locking is skipped in the case where the parent is
single-threaded so as not to interfere with legacy AS-safety property
of fork in single-threaded programs. lock order is mostly arbitrary,
but the malloc locks (including bump allocator in case it's used) must
be taken after the locks on any subsystems that might use malloc, and
non-AS-safe locks cannot be taken while the thread list lock is held,
imposing a requirement that it be taken last.
|
|
this further reduces the number of source files which need to include
libc.h and thereby be potentially exposed to libc global state and
internals.
this will also facilitate further improvements like adding an inline
fast-path, if we want to do so later.
|
|
|
|
In all cases this is just a change from two volatile int to one.
|
|
when traditional syslogd implementations are restarted, the old server
socket ceases to exist and a new unix socket with the same pathname is
created. when this happens, the default destination address associated
with the client socket via connect is no longer valid, and attempts to
send produce errors. this happens despite the socket being datagram
type, and is in contrast to the behavior that would be seen with an IP
datagram (UDP) socket.
in order to avoid a situation where the application is unable to send
further syslog messages without calling closelog, this patch makes
syslog attempt to reconnect the socket when send returns an error
indicating a lost connection.
additionally, initial failure to connect the socket no longer results
in the socket being closed. this ensures that an application which
calls openlog to reserve the socket file descriptor will not run into
a situation where transient connection failure (e.g. due to syslogd
restart) prevents fd reservation. however, applications which may be
unable to connect the socket later (e.g. due to chroot, restricted
permissions, seccomp, etc.) will still fail to log if the syslog
socket cannot be connected at openlog time or if it has to be
reconnected later.
|
|
|
|
the memory model we use internally for atomics permits plain loads of
values which may be subject to concurrent modification without
requiring that a special load function be used. since a compiler is
free to make transformations that alter the number of loads or the way
in which loads are performed, the compiler is theoretically free to
break this usage. the most obvious concern is with atomic cas
constructs: something of the form tmp=*p;a_cas(p,tmp,f(tmp)); could be
transformed to a_cas(p,*p,f(*p)); where the latter is intended to show
multiple loads of *p whose resulting values might fail to be equal;
this would break the atomicity of the whole operation. but even more
fundamental breakage is possible.
with the changes being made now, objects that may be modified by
atomics are modeled as volatile, and the atomic operations performed
on them by other threads are modeled as asynchronous stores by
hardware which happens to be acting on the request of another thread.
such modeling of course does not itself address memory synchronization
between cores/cpus, but that aspect was already handled. this all
seems less than ideal, but it's the best we can do without mandating a
C11 compiler and using the C11 model for atomics.
in the case of pthread_once_t, the ABI type of the underlying object
is not volatile-qualified. so we are assuming that accessing the
object through a volatile-qualified lvalue via casts yields volatile
access semantics. the language of the C standard is somewhat unclear
on this matter, but this is an assumption the linux kernel also makes,
and seems to be the correct interpretation of the standard.
|
|
this addresses alpine linux issue #3692 and brings the syslog message
length limit in alignment with uclibc's implementation.
|
|
based on patch by Dima Krasner, with minor improvements for code size.
connect can fail if there is no listening syslogd, in which case a
useless socket was kept open, preventing subsequent syslog call from
attempting to connect again.
|
|
this was previously a no-op, somewhat intentionally, because I failed
to understand that it only has an effect when sending to the logging
facility fails and thus is not the nuisance that it would be if always
sent output to the console.
|
|
this behavior is no longer valid in general, and was never necessary.
if the LOG_PERROR option is set, output to stderr could still succeed.
also, when the LOG_CONS option is added, it will need syslog to
proceed even if opening the log socket fails.
|
|
this is a nonstandard feature, but easy and inexpensive to add. since
the corresponding macro has always been defined in our syslog.h, it
makes sense to actually support it. applications may reasonably be
using the presence of the macro to assume that the feature is
supported.
the behavior of omitting the 'header' part of the log message does not
seem to be well-documented, but matches other implementations (at
least glibc) which have this option.
based on a patch by Clément Vasseur, but simplified using %n.
|
|
errno must be saved upon vsyslog entry, otherwise its value could be
changed by some libc function before reaching the %m handler in
vsnprintf.
|
|
|
|
1. as reported by William Haddon, the value returned by snprintf was
wrongly used as a length passed to sendto, despite it possibly
exceeding the buffer length. this could lead to invalid reads and
leaking additional data to syslog.
2. openlog was storing a pointer to the ident string passed by the
caller, rather than copying it. this bug is shared with (and even
documented in) other implementations like glibc, but such behavior
does not seem to meet the requirements of the standard.
3. extremely long ident provided to openlog, or corrupt ident due to
the above issue, could possibly have resulted in buffer overflows.
despite having the potential for smashing the stack, i believe the
impact is low since ident points to a short string literal in typical
application usage (and per the above bug, other usages will break
horribly on other implementations).
4. when used with LOG_NDELAY, openlog was not connecting the
newly-opened socket; sendto was being used instead. this defeated the
main purpose of LOG_NDELAY: preparing for chroot.
5. the default facility was not being used at all, so all messages
without an explicit facility passed to syslog were getting logged at
the kernel facility.
6. setlogmask was not thread-safe; no synchronization was performed
updating the mask. the fix uses atomics rather than locking to avoid
introducing a lock in the fast path for messages whose priority is not
in the mask.
7. in some code paths, the syslog lock was being unlocked twice; this
could result in releasing a lock that was actually held by a different
thread.
some additional enhancements to syslog such as a default identifier
based on argv[0] or similar may still be desired; at this time, only
the above-listed bugs have been fixed.
|
|
also update syslog to use SOCK_CLOEXEC rather than separate fcntl
step, to make it safe in multithreaded programs that run external
programs.
emulation is not atomic; it could be made atomic by holding a lock on
forking during the operation, but this seems like overkill. my goal is
not to achieve perfect behavior on old kernels (which have plenty of
other imperfect behavior already) but to avoid catastrophic breakage
in (1) syslog, which would give no output on old kernels with the
change to use SOCK_CLOEXEC, and (2) programs built on a new kernel
where configure scripts detected a working SOCK_CLOEXEC, which later
get run on older kernels (they may otherwise fail to work completely).
|
|
i did some testing trying to switch malloc to use the new internal
lock with priority inheritance, and my malloc contention test got
20-100 times slower. if priority inheritance futexes are this slow,
it's simply too high a price to pay for avoiding priority inversion.
maybe we can consider them somewhere down the road once the kernel
folks get their act together on this (and perferably don't link it to
glibc's inefficient lock API)...
as such, i've switch __lock to use malloc's implementation of
lightweight locks, and updated all the users of the code to use an
array with a waiter count for their locks. this should give optimal
performance in the vast majority of cases, and it's simple.
malloc is still using its own internal copy of the lock code because
it seems to yield measurably better performance with -O3 when it's
inlined (20% or more difference in the contention stress test).
|
|
these functions are allowed to be cancellation points, but then we
would have to install cleanup handlers to avoid termination with locks
held.
|
|
with datagram sockets, depending on fprintf not to flush the output
early was very fragile; the new version simply uses a small fixed-size
buffer. it could be updated to dynamic-allocate large buffers if
needed, but i can't envision any admin being happy about finding
64kb-long lines in their syslog...
|
|
per the standard, SIGPIPE is not generated for SOCK_DGRAM.
|
|
it actually appears the hacks to block SIGPIPE are probably not
necessary, and potentially harmful. if i can confirm this, i'll remove
them.
|
|
|