Proxy Management <-> Scheduling Layer Interface: Notifications & Decisions

Scheduling Layer Notifications

The scheduling layer can observe actions occurring in Proxy Management. These notifications do not modify proxy data structures, but they do allow the scheduling layer to update data structures that it maintains. Proxy Management has no knowledge of what the scheduling layer data consists of.

There are two types of scheduling layer data: data stored on the proxy and data stored on the waiter. Data stored on the proxy is useful when the scheduling characteristics of a proxy depend on its waiters. Data stored on the waiter allows scheduling layer data to be duplicated so that it can be accessed in a concurrency control context controlled by the proxy. In addition, the waiter data can serve as an element of a data structure. Often, a data structure will be rooted in the proxy data and its elements will be contained in the waiter data.

The proxy data is stored in the task_struct in include/linux/sched.h. This data is represented as an opaque typedef called pmgt_sched_proxy_t and it is stored in the task_struct as the field sched_proxy.

The waiter data is stored in the pmgt_waiter in include/linux/rtmutex.h. This data is represented as an opaque typedef called pmgt_sched_waiter_t and it is stored in the pmgt_waiter as the field sched_waiter.

When the proxy data is the root of a data structure and the waiter data supplies the elements of that data structure, the entire data structure is controlled by the root. Therefore, the task_struct->pi_lock controls both the sched_proxy and sched_waiter fields.
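
The rooting relationship can be illustrated with a user-space sketch. Everything below is a stand-in: sched_proxy_data and sched_waiter_data play the roles of the opaque pmgt_sched_proxy_t and pmgt_sched_waiter_t types, and the sketch is single-threaded, omitting the pi_lock that guards both fields in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pmgt_sched_waiter_t: duplicated scheduling info plus an
 * element of the proxy-rooted data structure. */
struct sched_waiter_data {
        int prio;                        /* duplicated scheduling info */
        struct sched_waiter_data *next;  /* list element rooted at the proxy */
};

/* Stand-in for pmgt_sched_proxy_t: the root of the waiter list. */
struct sched_proxy_data {
        struct sched_waiter_data *head;
};

/* Both the root and the elements would be guarded by the proxy task's
 * pi_lock in the real code. */
static void proxy_add_waiter(struct sched_proxy_data *p,
                             struct sched_waiter_data *w)
{
        w->next = p->head;
        p->head = w;
}

/* Walk the proxy-rooted list to find the highest duplicated priority. */
static int proxy_top_prio(const struct sched_proxy_data *p)
{
        int best = -1;
        const struct sched_waiter_data *w;

        for (w = p->head; w; w = w->next)
                if (w->prio > best)
                        best = w->prio;
        return best;
}
```

This shows why one lock suffices: the elements are only reachable through the root, so whoever controls the root controls the whole structure.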

struct pmgt_sched {
        void (*task_init)(struct task_struct*);
        void (*waiter_init)(struct pmgt_waiter*, struct task_struct*);
        void (*waiter_update)(struct pmgt_waiter*, struct task_struct*, struct task_struct*);
        void (*waiter_destroy)(struct pmgt_waiter*, struct task_struct*);
        void (*waiter_move_prepare)(struct pmgt_waiter*, struct task_struct*);
        void (*waiter_move)(struct pmgt_waiter*, struct task_struct*);
        void (*task_finalize)(struct task_struct*);
        void (*waiter_reset)(struct pmgt_waiter*, struct task_struct*);

        struct pmgt_arbiter *arbiter;
};
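
A scheduling layer observer supplies this callback table. The sketch below is a user-space illustration with stand-in definitions of task_struct and pmgt_waiter (the real ones come from include/linux/sched.h and include/linux/rtmutex.h); it shows an observer filling in only the callbacks it needs, with counters standing in for real scheduling bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles outside the kernel. */
struct task_struct { int pid; };
struct pmgt_waiter { int unused; };
struct pmgt_arbiter;

struct pmgt_sched {
        void (*task_init)(struct task_struct *);
        void (*waiter_init)(struct pmgt_waiter *, struct task_struct *);
        void (*waiter_update)(struct pmgt_waiter *, struct task_struct *,
                              struct task_struct *);
        void (*waiter_destroy)(struct pmgt_waiter *, struct task_struct *);
        void (*waiter_move_prepare)(struct pmgt_waiter *, struct task_struct *);
        void (*waiter_move)(struct pmgt_waiter *, struct task_struct *);
        void (*task_finalize)(struct task_struct *);
        void (*waiter_reset)(struct pmgt_waiter *, struct task_struct *);

        struct pmgt_arbiter *arbiter;
};

static int tasks_seen, waiters_seen;

/* Hypothetical observer callbacks; counters stand in for real state. */
static void my_task_init(struct task_struct *t) { (void)t; tasks_seen++; }
static void my_waiter_init(struct pmgt_waiter *w, struct task_struct *t)
{
        (void)w; (void)t;
        waiters_seen++;
}

/* An observer fills in only the notifications it cares about. */
static struct pmgt_sched my_sched = {
        .task_init   = my_task_init,
        .waiter_init = my_waiter_init,
};
```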

task_init

Allows an observer to perform initialization on the data that it keeps in the task_struct about the state of the task in proxy management.

  1. The task structure to initialize

Called in:

copy_process

CC:

No locks are held because the task is being initialized.

waiter_init

Initialize the waiter for a task that is about to block on a lock. This callback allows an observer to perform initialization on data that it keeps on the waiter.

  1. The pmgt_waiter for the task that is blocking
  2. The task that is blocking

Called in:

task_blocks_on_rt_mutex

CC:

lock->wait_lock

task->pi_lock

waiter_update

When the scheduling layer wants to change information about a waiting task, it may need to update information that it has duplicated, but still controls, on the pmgt_waiter structure. Using pmgt_update, the scheduling layer asks Proxy Management to notify it the next time that the waiter representing a task can have its scheduling layer information updated. Proxy Management delivers that notification through the waiter_update callback once it can guarantee that the waiter's relationships will not be modified during the update.

This callback is invoked with the pi_lock of the task acquired. This lock prevents the task from changing its scheduling criteria. The pi_lock of the task's proxy is also held; this lock prevents the information about the locking chain stored on the proxy task from being changed. Finally, the wait_lock for the lock held by the proxy is held to prevent the proxy relationship from changing.

  1. The waiter structure to update
  2. The task the waiter structure represents
  3. The proxy task of the task

Called in:

pmgt_update

CC:

task->pi_lock

proxy->pi_lock

task->pi_blocked_on->waiter->tnode->via_lock->wait_lock (lock held by proxy)
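
The deferred-update pattern can be sketched with stand-in types. Here request_update plays the role of pmgt_update and waiter_update_cb plays the role of the waiter_update callback refreshing the duplicated copy; the locking described above is omitted from this single-threaded sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins; the real flow runs under task->pi_lock, proxy->pi_lock, and
 * the wait_lock of the lock held by the proxy. */
struct task   { int prio; };
struct waiter {
        int prio_copy;        /* scheduling info duplicated on the waiter */
        bool needs_update;
        struct task *task;
};

/* pmgt_update analogue: ask to be notified when the waiter can be updated. */
static void request_update(struct waiter *w)
{
        w->needs_update = true;
}

/* waiter_update analogue: invoked once Proxy Management can guarantee the
 * waiter's relationships will not change during the update. */
static void waiter_update_cb(struct waiter *w)
{
        if (w->needs_update) {
                w->prio_copy = w->task->prio;  /* refresh duplicated data */
                w->needs_update = false;
        }
}
```

The point of the round trip is that the scheduling layer never touches the duplicated copy directly; it only does so inside the callback, when the waiter is guaranteed to be stable.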

waiter_destroy

Destroy the waiter for a task because the task is no longer blocked on a lock. This occurs when either:

The task has become the pending owner.

The task has aborted.

  1. The waiter for the task
  2. The task represented by the waiter

Called in:

wakeup_next_waiter

pmgt_remove_waiter

CC:

task->pi_blocked_on->waiter->pmgt_waiter->lock

proxy->pi_lock

task->pi_blocked_on->waiter->tnode->via_lock->wait_lock (lock held by proxy)

waiter_move_prepare

Prepare to move a waiter from its current proxy to a new proxy.

For a short period of time, the waiter is considered to be blocked on a lock but the proxy is unknown.

Called when moving a waiter down-chain because one chain is joining another.

Called when waiters are being prepared to move from the current owner of a lock to the pending owner of the lock because the lock is being released.

Called when a task steals a lock and the waiters are being prepared to move from the pending owner of the lock to the new owner.

  1. The waiter to be prepared for movement
  2. The old proxy of the waiter

Called in:

task_blocks_on_rt_mutex

pmgt_adjust_rt_mutex_chain

try_to_steal_lock

wakeup_next_waiter

CC:

task->pi_blocked_on->lock->wait_lock

task->pi_lock

waiter_move

Move a waiter to its new proxy.

Called when a chain C1 is joining another chain C2 and the head of C2 has been found.

Called when waiters are moving from an owner to the pending owner on lock release or when a lock has been stolen.

Called when a chain C1 has joined to a chain C2 and task T1 at the head of C1 is aborting. The waiters for the tasks that are up-chain from T1 must be moved from task T2 at the head of chain C2, which is their current proxy, to T1, which is their new proxy.

  1. The waiter being moved
  2. The new proxy it is being moved to

Called in:

try_to_steal_lock

wakeup_next_waiter

pmgt_adjust_rt_mutex_chain

pmgt_move_waiters (caused by abort, proxy changes to aborting task)

CC:

task->pi_blocked_on->pmgt_waiter->tnode->via_lock (lock held by proxy)

proxy->pi_lock
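
The two-phase move can be sketched with stand-in types. move_prepare and move below are hypothetical analogues of waiter_move_prepare and waiter_move; the waiter's proxy pointer is NULL during the window in which it is considered blocked but its proxy is unknown.

```c
#include <assert.h>
#include <stddef.h>

struct proxy {
        int nr_waiters;        /* scheduling data rooted at the proxy */
};

struct waiter {
        struct proxy *proxy;   /* NULL while the move is in flight */
};

/* waiter_move_prepare analogue: detach from the old proxy. For a short
 * period the waiter is blocked on a lock but its proxy is unknown. */
static void move_prepare(struct waiter *w, struct proxy *old)
{
        old->nr_waiters--;
        w->proxy = NULL;
}

/* waiter_move analogue: attach to the new proxy once it is known. */
static void move(struct waiter *w, struct proxy *new_proxy)
{
        new_proxy->nr_waiters++;
        w->proxy = new_proxy;
}
```

Splitting the move in two lets the scheduling layer tear down per-proxy state before the new proxy is known (e.g. before a chain walk finds the head of the chain being joined).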

task_finalize

Finalize the state of a task that has had the set of tasks waiting on it changed.

Called after a set of waiter_move_prepare or waiter_move operations have been performed on a proxy task.

Useful for changing scheduling classes.

  1. The task to finalize

CC:

task->pi_blocked_on->pmgt_waiter->tnode->via_lock (lock held by proxy)

proxy->pi_lock

waiter_reset

A combination of waiter_move_prepare and task_finalize.

Provided as an optimization.

Called for the waiter structure representing the pending owner of a lock that is being released.

Called when a chain C1 has joined to a chain C2 and task T1 at the head of C1 is aborting. The waiters for the tasks that are up-chain from T1 must be moved from task T2 at the head of chain C2, which is their current proxy, to T1, which is their new proxy.

  1. The waiter being prepared for movement
  2. The old proxy of the waiter

Called in:

pmgt_move_waiters

CC:

task->pi_blocked_on->lock->wait_lock

task->pi_lock

arbiter

The default arbiter to use for all locks unless a different arbiter has been set.

Lock Decisions

An arbiter is a decision maker for Proxy Management. Arbiters control the ordering of waiters on a lock, independently of the scheduling layer. The arbiter chosen by the scheduling layer is the default arbiter for all locks. A different arbiter can be defined and set for specific locks.

An arbiter is defined by the following set of callbacks. The callbacks for a particular arbiter will only be called with a mutex controlled by the arbiter.

All of these callbacks are called with the wait_lock for the mutex acquired.

struct pmgt_arbiter {
        void (*init)(struct rt_mutex*);
        bool (*has_waiters)(struct rt_mutex*);
        void (*insert)(struct rt_mutex*, struct rt_mutex_waiter*);
        struct rt_mutex_waiter* (*top)(struct rt_mutex*);
        void (*remove)(struct rt_mutex*, struct rt_mutex_waiter*);
        bool (*can_steal)(struct rt_mutex*, struct task_struct*, struct task_struct*, int mode);
        bool (*pendowner_wakeup)(struct rt_mutex*, struct task_struct*);
};

Additionally, every mutex contains a union of data structures that can be used to store the waiters for a lock. The union must contain the data structures needed by all arbiters on the system. The union is an anonymous union field named waiters inside the rt_mutex structure. A given arbiter picks one data structure in the union to use.

The data that stores direct lock waiters, which is shared by all arbiters, is a union instead of an opaque type (as the scheduling layer data is) because the arbiter for a lock can change at runtime.

union {
        struct list_head list;
        struct plist_head plist;
}                       waiters;

The rt_mutex_waiter structure contains the elements of this data structure in the waiters_ent union.

union {
        struct list_head node;
        struct plist_node pnode;
}                       waiters_ent;
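
As an illustration, here is a minimal user-space arbiter that serves waiters in strict FIFO order. The types and the fifo_* names are stand-ins, not kernel code: the waiters pointer plays the role of the list member of the waiters union, and next plays the role of waiters_ent.node. A real arbiter would operate on struct list_head under the mutex's wait_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rt_mutex_waiter {
        struct rt_mutex_waiter *next;    /* role of waiters_ent.node */
};

struct rt_mutex {
        struct rt_mutex_waiter *waiters; /* role of the waiters union */
};

/* init: set up the component of the union this arbiter uses. */
static void fifo_init(struct rt_mutex *m)
{
        m->waiters = NULL;
}

static bool fifo_has_waiters(struct rt_mutex *m)
{
        return m->waiters != NULL;
}

/* insert: append at the tail so arrival order is preserved. */
static void fifo_insert(struct rt_mutex *m, struct rt_mutex_waiter *w)
{
        struct rt_mutex_waiter **p = &m->waiters;

        while (*p)
                p = &(*p)->next;
        w->next = NULL;
        *p = w;
}

/* top: report the head of the queue without removing it. */
static struct rt_mutex_waiter *fifo_top(struct rt_mutex *m)
{
        return m->waiters;
}

/* remove: unlink a waiter on release or abort. */
static void fifo_remove(struct rt_mutex *m, struct rt_mutex_waiter *w)
{
        struct rt_mutex_waiter **p = &m->waiters;

        while (*p && *p != w)
                p = &(*p)->next;
        if (*p)
                *p = w->next;
}
```

Because the arbiter owns the ordering policy, swapping this FIFO for a priority-sorted structure changes which waiter top returns without any change to Proxy Management itself.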

init

Initialize the component of the waiters union that is going to be used to store waiters.

  1. The mutex to initialize

has_waiters

Return true if a mutex has waiters.

  1. The mutex to examine

insert

Insert an rt_mutex_waiter in the waiters field of an rt_mutex. This callback occurs when a task is blocking on a lock.

  1. The mutex that the waiter is blocking on
  2. The waiter to insert in the list of waiters for the mutex

top

Query who the top waiter for a mutex is. This should not remove the waiter.

  1. The mutex to examine

remove

Remove a waiter from a mutex. This occurs when a mutex is released or a waiter aborts.

  1. The mutex to remove the waiter from
  2. The waiter to remove

can_steal

Return true if a task can steal a mutex.

STEAL_LATERAL is used by rt_spin_lock_slowlock. STEAL_NORMAL is used by rt_mutex_slowlock.

  1. The mutex to examine
  2. The task trying to steal
  3. The task that is the pending owner
  4. STEAL_NORMAL or STEAL_LATERAL
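
A plausible priority-based policy can be sketched as follows. This is an assumption, not necessarily the kernel's actual rule: it supposes STEAL_NORMAL requires strictly higher priority than the pending owner, while STEAL_LATERAL also permits equal priority, using the kernel convention that a lower prio value means higher priority. The name prio_can_steal is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define STEAL_NORMAL  0
#define STEAL_LATERAL 1

/* Stand-in task: lower prio value = higher priority, as in the kernel. */
struct task { int prio; };

/* Assumed policy: lateral steals (rt_spin_lock path) may take the lock at
 * equal priority; normal steals require strictly higher priority. */
static bool prio_can_steal(struct task *thief, struct task *pendowner,
                           int mode)
{
        if (mode == STEAL_LATERAL)
                return thief->prio <= pendowner->prio;
        return thief->prio < pendowner->prio;
}
```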

pendowner_wakeup

When a lock has been released, should a wakeup be automatically issued to the pending owner by the locking code? Currently, this feature is used by Guided Execution to prevent pending owners from running out of turn.

  1. The mutex being released
  2. The pending owner