From 54ca677983d47529bab8752315ac1a2b49888870 Mon Sep 17 00:00:00 2001 From: Rich Felker Date: Sun, 31 Mar 2019 18:03:27 -0400 Subject: implement priority inheritance mutexes priority inheritance is a feature to mitigate priority inversion situations, where a execution of a medium-priority thread can unboundedly block forward progress of a high-priority thread when a lock it needs is held by a low-priority thread. the natural way to do priority inheritance would be with a simple futex flag to donate the calling thread's priority to a target thread while it waits on the futex. unfortunately, linux does not offer such an interface, but instead insists on implementing the whole locking protocol in kernelspace with special futex commands that exist solely for the purpose of doing PI mutexes. this would require the entire "trylock" logic to be duplicated in the timedlock code path for PI mutexes, since, once the previous lock holder releases the lock and the futex call returns, the lock is already held by the caller. obviously such code duplication is undesirable. instead, I've made the PI timedlock success path set the mutex lock count to -1, which can be thought of as "not yet complete", since a lock count of 0 is "locked, with no recursive references". a simple branch in a non-hot path of pthread_mutex_trylock can then see and act on this state, skipping past the code that would check and take the lock to the same code path that runs after the lock is obtained for a non-PI mutex. because we're forced to let the kernel perform the actual lock and unlock operations whenever the mutex is contended, we have to patch things up when it does the wrong thing: 1. the lock operation is not aware of whether the mutex is error-checking, so it will always fail with EDEADLK rather than deadlocking. 2. the lock operation is not aware of whether the mutex is robust, so it will successfully obtain mutexes in the owner-died state even if they're non-robust, whereas this operation should deadlock. 3. the unlock operation always sets the lock value to zero, whereas for robust mutexes, we want to set it to a special value indicating that the mutex obtained after its owner died was unlocked without marking it consistent, so that future operations all fail with ENOTRECOVERABLE. the first of these is easy to solve, just by performing a futex wait on a dummy futex address to simulate deadlock or ETIMEDOUT as appropriate. but problems 2 and 3 interact in a nasty way. to solve problem 2, we need to back out the spurious success. but if waiters are present -- which we can't just ignore, because even if we don't want to wake them, the calling thread is incorrectly inheriting their priorities -- this requires using the kernel's unlock operation, which will zero the lock value, thereby losing the "owner died with lock held" state. to solve these problems, we overload the mutex's waiters field, which is unused for PI mutexes since they don't call the normal futex wait functions, as an indicator that the PI mutex is permanently non-lockable. originally I wanted to use the count field, but there is one code path that needs to access this flag without synchronization: trylock's CAS failure path needs to be able to decide whether to fail with EBUSY or ENOTRECOVERABLE, the waiters field is already treated as a relaxed-order atomic in our memory model, so this works out nicely. --- src/thread/pthread_mutex_trylock.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) (limited to 'src/thread/pthread_mutex_trylock.c') diff --git a/src/thread/pthread_mutex_trylock.c b/src/thread/pthread_mutex_trylock.c index 29622ff9..37e5c473 100644 --- a/src/thread/pthread_mutex_trylock.c +++ b/src/thread/pthread_mutex_trylock.c @@ -9,10 +9,17 @@ int __pthread_mutex_trylock_owner(pthread_mutex_t *m) old = m->_m_lock; own = old & 0x3fffffff; - if (own == tid && (type&3) == PTHREAD_MUTEX_RECURSIVE) { - if ((unsigned)m->_m_count >= INT_MAX) return EAGAIN; - m->_m_count++; - return 0; + if (own == tid) { + if ((type&8) && m->_m_count<0) { + old &= 0x40000000; + m->_m_count = 0; + goto success; + } + if ((type&3) == PTHREAD_MUTEX_RECURSIVE) { + if ((unsigned)m->_m_count >= INT_MAX) return EAGAIN; + m->_m_count++; + return 0; + } } if (own == 0x3fffffff) return ENOTRECOVERABLE; if (own || (old && !(type & 4))) return EBUSY; @@ -29,9 +36,18 @@ int __pthread_mutex_trylock_owner(pthread_mutex_t *m) if (a_cas(&m->_m_lock, old, tid) != old) { self->robust_list.pending = 0; + if ((type&12)==12 & m->_m_waiters) return ENOTRECOVERABLE; return EBUSY; } +success: + if ((type&8) && m->_m_waiters) { + int priv = (type & 128) ^ 128; + __syscall(SYS_futex, &m->_m_lock, FUTEX_UNLOCK_PI|priv); + self->robust_list.pending = 0; + return (type&4) ? ENOTRECOVERABLE : EBUSY; + } + volatile void *next = self->robust_list.head; m->_m_next = next; m->_m_prev = &self->robust_list.head; -- cgit v1.2.1