/sys/sys/proc.h
Thread and Turnstile
line 228
A thread may block for a short , medium, or long period of time depending on the reason that it needs to wait. A short wait occurs when a thread need to wait for access to a lock that protects a data structure. A medium wait occurs while a thread waits for an event that is expected to occur quickly such as waiting for data to be read from a disk. A long wait occurs when a thread is waiting for an event that will happen at an indeterminate time in the future such as input from a user.
Short-term waits arise only form a lock request. Short-term locks include mutexes, read-writer locks, and read-mostly locks.
A short-term lock is managed by a turnstile data structure.
Turnstile structure
Turnstile interfaces is provided in sys\sys\turnstile.h
and detailed implementation can be found in sys\kern\subr_turnstile.c
/*
* Implementation of turnstiles used to hold queue of threads blocked on
* non-sleepable locks. Sleepable locks use condition variables to
* implement their queues. Turnstiles differ from a sleep queue in that
* turnstile queue's are assigned to a lock held by an owning thread. Thus,
* when one thread is enqueued onto a turnstile, it can lend its priority
* to the owning thread.
*
* We wish to avoid bloating locks with an embedded turnstile and we do not
* want to use back-pointers in the locks for the same reason. Thus, we
* use a similar approach to that of Solaris 7 as described in Solaris
* Internals by Jim Mauro and Richard McDougall. Turnstiles are looked up
* in a hash table based on the address of the lock. Each entry in the
* hash table is a linked-lists of turnstiles and is called a turnstile
* chain. Each chain contains a spin mutex that protects all of the
* turnstiles in the chain.
*
* Each time a thread is created, a turnstile is allocated from a UMA zone
* and attached to that thread. When a thread blocks on a lock, if it is the
* first thread to block, it lends its turnstile to the lock. If the lock
* already has a turnstile, then it gives its turnstile to the lock's
* turnstile's free list. When a thread is woken up, it takes a turnstile from
* the free list if there are any other waiters. If it is the only thread
* blocked on the lock, then it reclaims the turnstile associated with the lock
* and removes it from the hash table.
*/
and here’s the hash table implementation
/*
* Constants for the hash table of turnstile chains. TC_SHIFT is a magic
* number chosen because the sleep queue's use the same value for the
* shift. Basically, we ignore the lower 8 bits of the address.
* TC_TABLESIZE must be a power of two for TC_MASK to work properly.
*/
#define TC_TABLESIZE 128 /* Must be power of 2. */
#define TC_MASK (TC_TABLESIZE - 1)
#define TC_SHIFT 8
#define TC_HASH(lock) (((uintptr_t)(lock) >> TC_SHIFT) & TC_MASK)
#define TC_LOOKUP(lock) &turnstile_chains[TC_HASH(lock)]
/*
* There are three different lists of turnstiles as follows. The list
* connected by ts_link entries is a per-thread list of all the turnstiles
* attached to locks that we own. This is used to fixup our priority when
* a lock is released. The other two lists use the ts_hash entries. The
* first of these two is the turnstile chain list that a turnstile is on
* when it is attached to a lock. The second list to use ts_hash is the
* free list hung off of a turnstile that is attached to a lock.
*
* Each turnstile contains three lists of threads. The two ts_blocked lists
* are linked list of threads blocked on the turnstile's lock. One list is
* for exclusive waiters, and the other is for shared waiters. The
* ts_pending list is a linked list of threads previously awakened by
* turnstile_signal() or turnstile_wait() that are waiting to be put on
* the run queue.
*
* Locking key:
* c - turnstile chain lock
* q - td_contested lock
*/
struct turnstile {
struct mtx ts_lock; /* Spin lock for self. */
struct threadqueue ts_blocked[2]; /* (c + q) Blocked threads. */
struct threadqueue ts_pending; /* (c) Pending threads. */
LIST_ENTRY(turnstile) ts_hash; /* (c) Chain and free list. */
LIST_ENTRY(turnstile) ts_link; /* (q) Contested locks. */
LIST_HEAD(, turnstile) ts_free; /* (c) Free turnstiles. */
struct lock_object *ts_lockobj; /* (c) Lock we reference. */
struct thread *ts_owner; /* (c + q) Who owns the lock. */
};
The most significant difference between turnstile and sleepqueue is that turnstile provide a mechanism called priority propagation, which is used to handle the occurrence of priority inversion. Detailed info about the priority inversion can be found on Wikipedia.
static void
propagate_priority(struct thread *td)
{
struct turnstile *ts, *top;
int pri;
THREAD_LOCK_ASSERT(td, MA_OWNED);
pri = td->td_priority;
top = ts = td->td_blocked;
THREAD_LOCKPTR_ASSERT(td, &ts->ts_lock);
/*
* The original turnstile lock is held across the entire
* operation. We only ever lock down the chain so the lock
* order is constant.
*/
for (;;) {
td = ts->ts_owner;
if (td == NULL) {
/*
* This might be a read lock with no owner. There's
* not much we can do, so just bail.
*/
propagate_unlock_ts(top, ts);
return;
}
/*
* Wait for the thread lock to be stable and then only
* acquire if it is not the turnstile lock.
*/
thread_lock_block_wait(td);
if (td->td_lock != &ts->ts_lock) {
thread_lock_flags(td, MTX_DUPOK);
propagate_unlock_ts(top, ts);
}
MPASS(td->td_proc != NULL);
MPASS(td->td_proc->p_magic == P_MAGIC);
/*
* If the thread is asleep, then we are probably about
* to deadlock. To make debugging this easier, show
* backtrace of misbehaving thread and panic to not
* leave the kernel deadlocked.
*/
if (TD_IS_SLEEPING(td)) {
printf(
"Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
td->td_tid, td->td_proc->p_pid);
kdb_backtrace_thread(td);
panic("sleeping thread");
}
/*
* If this thread already has higher priority than the
* thread that is being blocked, we are finished.
*/
if (td->td_priority <= pri) {
propagate_unlock_td(top, td);
return;
}
/*
* Bump this thread's priority.
*/
sched_lend_prio(td, pri);
/*
* If lock holder is actually running or on the run queue
* then we are done.
*/
if (TD_IS_RUNNING(td) || TD_ON_RUNQ(td)) {
MPASS(td->td_blocked == NULL);
propagate_unlock_td(top, td);
return;
}
#ifndef SMP
/*
* For UP, we check to see if td is curthread (this shouldn't
* ever happen however as it would mean we are in a deadlock.)
*/
KASSERT(td != curthread, ("Deadlock detected"));
#endif
/*
* If we aren't blocked on a lock, we should be.
*/
KASSERT(TD_ON_LOCK(td), (
"thread %d(%s):%d holds %s but isn't blocked on a lock\n",
td->td_tid, td->td_name, td->td_state,
ts->ts_lockobj->lo_name));
/*
* Pick up the lock that td is blocked on.
*/
ts = td->td_blocked;
MPASS(ts != NULL);
THREAD_LOCKPTR_ASSERT(td, &ts->ts_lock);
/* Resort td on the list if needed. */
if (!turnstile_adjust_thread(ts, td)) {
propagate_unlock_ts(top, ts);
return;
}
/* The thread lock is released as ts lock above. */
}
}
Basically the functional part of propagate_priority()
is contained in the for(;;)
loop. This function continues looking along the turnstile chain and lend the initial thread’s priority to the lower-priority thread which holds a lock and obstructs the higher-priority one to execute, until it meets a higher-priority thread, that means we’ve got job done, or an unwilling kernel panic, while it probably means we get into a deadlock. The kernel panic can be reported by KASSERT()
and its variant MPASS()
.