Integration of GS with Concurrency Control

The Proxy Management (PMGT) subsystem is responsible for all accounting required to track: (1) all owners of locks, (2) all waiters on locks, (3) all blocking relations among tasks and locks, and (4) all proxy relations among tasks. A detailed discussion of its design and implementation is provided in the Proxy Management section.

This section discusses how the PMGT Scheduling Layer interface is configured to use Group Scheduling (GS) as the Linux system Scheduling Layer (SL) for purposes of integrating concurrency control and scheduling. The PMGT layer implements its interface to the SL as a set of function pointers that are filled in as appropriate to the SL being used. The KUSP system configuration associates these pointers with GS routines, enabling the GS layer to make use of the PMGT information in controlling the behavior of the system. The system configuration demonstrating PMGT layer compatibility with default PREEMPT_RT patch semantics associates the pointers with routines that implement the Priority Inheritance semantics of RT-Mutexes under PREEMPT_RT using the PMGT software layer.

The set of routines implementing the PMGT-SL interface can be divided into two broad categories: notification and decision. Routines in both sets are called from the RT-Mutex code as the situations for which they are responsible arise. Notification routines are concerned with notifying the SL when the state or context of a task changes in a way that may affect the control of task execution by the SL. Decision routines are concerned with asking the scheduling layer for decisions related to lock management, including which waiter to select when a lock is released, and whether a given task should be permitted to steal an RT-Mutex from a pending owner.

Structures

gsched_task_state

This enumeration defines a set of task states used by the GS software in managing tasks as members of groups. The state is stored in the GS member structure. Since tasks may be members of more than one group, state variables in more than one member structure may be involved. In general, the set of task memberships is managed as a unit, and so the state variables of each member structure will generally have the same value, except during update periods when some members of the set have been changed but others have not.

This set of states is a superset of those used by the vanilla Linux system, since GS is a user of the proxy accounting information and is thus willing to select tasks which are blocked but have proxies, so that the proxy may be run and stop blocking the preferred task.

enum gsched_task_state {
        GSCHED_TASK_RUNNABLE,
        GSCHED_TASK_BLOCKED,
        GSCHED_TASK_PROXY_SEARCH,
        GSCHED_TASK_PROXIED,
};
  • GSCHED_TASK_RUNNABLE - The task is runnable. The member is on the task's CPU.

  • GSCHED_TASK_BLOCKED - The task is blocked and does not have a proxy. The member is on the task's CPU.

  • GSCHED_TASK_PROXY_SEARCH - The task is blocked and its proxy is in the process of being found.

  • GSCHED_TASK_PROXIED - The task is blocked and has a proxy.

A task A and its proxy B may be on the same CPU or on different CPUs. When they are on different CPUs, the member structure of A will be transferred to the per-CPU structure of B’s CPU so that if A is selected, B can be run. These state variables are thus important in controlling crucial aspects of GS behavior.

gsched_member_task

The gsched_member structure holds all information for the member of a group, and is discussed elsewhere. When a group member is a task, the gsched_member structure's anonymous union provides the 'task' and 'group' members, which are of structure types appropriate to representing tasks and groups respectively. The gsched_member_task structure is the type used to represent tasks as members of a group. It is defined in include/linux/kusp/sched_gsched.h.

struct gsched_member_task {
      struct task_struct *ptr;
      struct task_struct *proxy;
      struct list_head waiter_ent;
      struct list_head waiting_members_ent;
      int task_cpu;
      int proxy_cpu;
      enum gsched_task_state state;
      enum gsched_task_state blocked_state;
};

It holds proxy information and CPU assignment.

  • ptr - Pointer to the task struct of the task represented by this member.

  • proxy - Pointer to the proxy for the task represented by the member. The proxy must be on the same CPU as the member structure. NULL when a process is blocked on a lock but the proxy is still being searched for.

  • waiter_ent - Entry on the list rooted at the gsched_task_members field in the pmgt_waiter structure, which represents all group memberships of this task. While a task is blocking on a lock, the list is protected by the pi_lock of the task. After the proxy has been discovered, the list is protected by the pi_lock of the proxy.

  • waiting_members_ent - Entry on the list rooted at the gsched_waiting_members field in the task structure of the proxy. This list represents all of the tasks waiting on the proxy. Protected by the rq lock of the proxy. Contains the default values indicating an empty list if the proxy field is NULL.

  • task_cpu - The CPU that the task represented by this member is on.

  • proxy_cpu - The CPU of the proxy; -1 if proxy is NULL.

  • state - One of the values from gsched_task_state.

  • blocked_state - The state that a task will take on if it is runnable and then blocks. The blocked state can never be GSCHED_TASK_RUNNABLE. If the state of a task is not GSCHED_TASK_RUNNABLE, then state and blocked_state will have the same value. The reason for this is that, in the implementation of proxy accounting, there are some periods during which the state of the thread is blocked but it must continue running to complete some of the accounting code. This relationship is illustrated in the sketch following this list.
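The state/blocked_state relationship described above can be expressed as an invariant. The following helper is purely illustrative and not part of the actual sources:

static inline bool gsched_member_task_state_ok(const struct gsched_member_task *mt)
{
        /* blocked_state can never be GSCHED_TASK_RUNNABLE */
        if (mt->blocked_state == GSCHED_TASK_RUNNABLE)
                return false;
        /* a task that is not runnable must carry its blocked_state */
        if (mt->state != GSCHED_TASK_RUNNABLE && mt->state != mt->blocked_state)
                return false;
        return true;
}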

pmgt_waiter

The pmgt_waiter structure is used by the PMGT layer to represent a task waiting on a lock. The primary discussion of its design is in the Proxy Management documentation. The pmgt_waiter structure is created and associated with a proxy task when a task blocks on a lock. There can be several steps involved in identifying the proxy of the blocking task, so the pmgt_waiter structure is associated with tasks along the chain from the point at which a new waiter joins a set to the end where it finds its proxy.

When GS is the scheduling layer above PMGT, it stores the group memberships of a task on a gsched_task_members list in the pmgt_waiter structure. When the PI PREEMPT_RT scheduling layer is using PMGT, then a pi_waiters_ent field is used to keep the waiter on a queue sorted by priority.

The elements of the gsched_task_members list are removed when the task is no longer waiting on a lock and thus waiting on a proxy. Elements of the list are also removed when the waiting task represented by the pmgt_waiter structure is removed from a group, and thus no longer a member of it. An element is added to the list if the waiting task is added to a group while it is a waiter.
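The pmgt_waiter fields named in this section can be summarized as follows. This is only a sketch assembled from the fields mentioned here and in the lock discussion below (lock, proxy_task, gsched_task_members, pi_waiters_ent, tnode); the field types are assumptions, and the authoritative definition is in the PMGT documentation:

struct pmgt_waiter {
        raw_spinlock_t lock;                  /* guards reads of proxy_task (see transition_member) */
        struct task_struct *proxy_task;       /* the proxy of the waiting task, once identified */
        struct list_head gsched_task_members; /* root of the waiting task's group memberships (GS SL) */
        struct plist_node pi_waiters_ent;     /* priority-sorted queue entry (PI PREEMPT_RT SL) */
        /* ... tnode and other PMGT fields omitted ... */
};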

Relevant Lists

gsched_waiting_members

The gsched_waiting_members field in the task_struct structure representing task T is the root of a list which represents the tasks for which T is a proxy. The task structure is defined in include/linux/sched.h. The elements of the list are instances of the structure gsched_member which represents members of a group under GS, and is defined in include/linux/kusp/sched_gsched.h. Since the waiters are necessarily tasks, the member structures use the gsched_member_task version of the union in the gsched_member struct. Within that union containing information specific to tasks, the waiting_members_ent field is used to keep entries on the gsched_waiting_members list.

List Root: task_struct->gsched_waiting_members

List Element: gsched_member->task->waiting_members_ent

Use: If a task T is a proxy for one or more tasks, then this list holds all of the gsched_member structures for the tasks waiting on T. Otherwise, it is empty.

CC: rq lock of CPU to which the task is assigned.

gsched_memberships

The gsched_memberships field in the task_struct structure represents the memberships of the task in one or more GS groups. The task structure is defined in include/linux/sched.h. The elements of the list are instances of the structure gsched_member which represents members of a group under GS, and is defined in include/linux/kusp/sched_gsched.h. The list uses the samecomputation_entry member of the structure to form the list.

Note

The samecomputation_entry field is also used for groups which are members of a group. When the entity is a group, the list is rooted in the gsched_groups structure, defined in include/linux/kusp/sched_gsched.h, using the memberships field to root the list.

Root: task_struct->gsched_memberships

List Element: gsched_member->samecomputation_entry

Use: Links all of the gsched_member structures for a task,
which represent all of a task's group memberships. This is used whenever the state of a task changes in a way that is of interest to GS. The most obvious example is when a task exits and must be removed from all its groups.

CC: rq lock of CPU to which the task is assigned.

gsched_task_members

This list holds the same information as gsched_memberships, but is rooted in the pmgt_waiter structure, and uses the waiter_ent element to form the list of all of a waiting task's group memberships. The reason we create a "duplicate" list is that the concurrency control can be different, and less restrictive. The gsched_memberships list rooted in the task structure is protected with the run queue lock for the CPU to which the task is assigned. This obviously restricts concurrency for all tasks on the CPU. In contrast, this list is protected by the task_struct->pi_lock of the waiting task at first, and then by that of the proxy task when it is identified. This restricts concurrency only for tasks accessing the information associated with a single task, and is much less restrictive.

Root: pmgt_waiter->gsched_task_members

List Element: gsched_member_task->waiter_ent

Use: Holds all of the group memberships for the task represented
by the pmgt_waiter.

CC: Initially protected by task_struct->pi_lock of the task represented by the pmgt_waiter. When a proxy has been found, then it is protected by pi_lock of the proxy, pmgt_waiter->proxy_task->pi_lock.

CC -> GS Integration

The Proxy Management (PMGT) layer implementation assumes that some scheduling layer will exist in the system and use the proxy data being managed. The interface between the scheduling layer (SL) and PMGT is implemented as a set of hooks, divided into two subsets, those notifying the SL of information it may care about, and those requiring a decision from the SL.

The notification set are managed using the pmgt_sched structure, while the decision set are managed using the pmgt_arbiter structure. Both structures are defined in include/linux/rtmutex.h. The pmgt_sched structure represents the full set of PMGT SL interface functions since it contains the pointer field arbiter which refers to an instance of the pmgt_arbiter structure.

Note

This is a bit of a tangle. The text here would be easier to follow if it included some mention of where all these routines and structures are defined. The players include: kusp/gsched_core.c, kernel/rtmutex_proxy.c, kernel/pmgt_sched.c, include/linux/rtmutex.h, linux/kusp/sched_gsched.h

These hooks are called when the locking code must interact with the SL because the proxy code is in a situation where scheduling semantics influence behavior. One obvious example of such a hook is when a lock is released and a pending owner should be selected. Obviously the choice of pending owner is determined by the current scheduling semantics, and is beyond the jurisdiction of the PMGT layer. Other examples include when the lock code must ask the SL whether a lock can be stolen from its pending owner by another task, or when a task has stopped being a waiter on a lock and the SL must be notified that the set of waiters for a given proxy task has changed.

These functions are prefixed with gsched_pmgt.

init

Note

FIXME.D: This appears to have changed into task_init and waiter_init. Check with Tyrian on this.

Called when proxy management is creating a new waiter because a task is about to block on a lock.

Walk the gsched_memberships list in the task structure of the blocking task to find all its group memberships, in order to build the parallel list of group memberships rooted at gsched_task_members in the pmgt_waiter structure, with each element of the list using the waiter_ent field of gsched_member_task. The purpose of the parallel data structure is to shift the concurrency control domain from the rq lock required for gsched_memberships to the pi_lock required for pmgt_waiter, as illustrated in the sketch below.
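A minimal sketch of that walk, assuming the caller has already established the locking context described above; the exact list discipline is simplified for illustration:

static void gsched_pmgt_init_sketch(struct task_struct *task,
                                    struct pmgt_waiter *waiter)
{
        struct gsched_member *member;

        /* mirror each group membership of the blocking task onto the
         * waiter, shifting the concurrency-control domain from the rq
         * lock (gsched_memberships) to the pi_lock (pmgt_waiter) */
        list_for_each_entry(member, &task->gsched_memberships,
                            samecomputation_entry)
                list_add(&member->task.waiter_ent,
                         &waiter->gsched_task_members);
}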

destroy

Called when a task was a waiter on a lock but is becoming the owner or aborting the wait. We are thus destroying any record of the task as a waiter.

Usually, the task represented by a waiter will be GSCHED_TASK_RUNNABLE whenever the waiter is destroyed. However, it is possible for a GS SDF to modify these semantics, so GSCHED_TASK_PROXY_SEARCH and GSCHED_TASK_PROXIED are also valid states. For example, on lock release, guided execution does not permit an immediate wakeup of the pending owner, because it waits for the releasing thread to reach the next waypoint in userspace, and because an immediate wakeup could let tasks controlled by the Linux scheduling class violate the guided execution schedule. The pending owner could therefore have its waiter destroyed while it is GSCHED_TASK_PROXY_SEARCH or GSCHED_TASK_PROXIED.

The blocked state of the task having its waiter destroyed is set to GSCHED_TASK_BLOCKED, regardless of which state the task is currently in. This indicates to GS that the task is no longer blocked on a lock. Additionally, if the task is not currently runnable, then its state is set to the blocked_state.

Walk the gsched_task_members list in the pmgt_waiter structure, and for each member (see the sketch following this list):

  • remove the member from gsched_task_members, since the waiter is going to be destroyed

  • move the member back to the CPU of the task, since the task no longer has a proxy

  • remove the member from the gsched_waiting_members list of its former proxy, since the member is no longer waiting on another task
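A sketch of that walk; gsched_move_member_to_cpu() is a hypothetical helper standing in for the actual member-movement code:

static void gsched_pmgt_destroy_sketch(struct pmgt_waiter *waiter)
{
        struct gsched_member *member, *tmp;

        list_for_each_entry_safe(member, tmp, &waiter->gsched_task_members,
                                 task.waiter_ent) {
                /* the waiter is being destroyed */
                list_del_init(&member->task.waiter_ent);
                /* the task no longer has a proxy, so the member returns
                 * to the task's own CPU (hypothetical helper) */
                gsched_move_member_to_cpu(member, member->task.task_cpu);
                member->task.proxy = NULL;
                member->task.proxy_cpu = -1;
                /* the member is no longer waiting on another task */
                list_del_init(&member->task.waiting_members_ent);
        }
}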

move_prepare

Called when the proxy field for a waiter is being temporarily cleared because a new proxy is being found. The typical example of this is when chain 1 is joining chain 2, and thus the proxy of the already blocked tasks in chain 1 will be changing.

  • Requires the old proxy's pi_lock to be held by the calling context.

  • Acquires the rq lock of the old proxy in order to access its gsched_waiting_members list and remove the memberships of a task T in chain 1 which will be getting a new proxy. We step through the old proxy's list removing all memberships representing T.

  • Walks the gsched_task_members list in the pmgt_waiter, protected by the pi_lock of the old proxy, clearing the proxy for each member (erasing the old proxy).

  • Does not move the member back to the CPU of the task, as an optimization. If the task is blocked, the member remains on the CPU of the old proxy, since a new proxy will be found shortly and the member will be moved to the CPU of the new proxy. If the task is runnable, the members will be moved to the task's CPU by separate code in gsched_enqueue_task when the task is placed on the runqueue.

  • The blocked state of the waiting task is set to GSCHED_TASK_PROXY_SEARCH. If a task is not runnable, then its state is set to its blocked state.

  • The scheduling class of the task the waiter is being removed from is not changed. The class switch occurs after all waiters have been prepared for movement, and is invoked later by the PMGT code.

  • Optimization for chain walking: gsched_pmgt_move_prepare is called for all of the members waiting on the task T that is the proxy for chain 1. When T is blocking on a member of chain 2, all of T's waiting members need to be moved to become waiters of chain 2's proxy. A sketch of the walk follows.
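A condensed, illustrative sketch of the steps above, not the actual implementation:

static void gsched_pmgt_move_prepare_sketch(struct pmgt_waiter *waiter)
{
        struct gsched_member *member;

        /* caller holds the old proxy's pi_lock; the old proxy's rq lock
         * is taken to edit its gsched_waiting_members list */
        list_for_each_entry(member, &waiter->gsched_task_members,
                            task.waiter_ent) {
                /* leave the old proxy's waiting-members list */
                list_del_init(&member->task.waiting_members_ent);
                /* erase the old proxy */
                member->task.proxy = NULL;
                member->task.proxy_cpu = -1;
                /* optimization: do NOT move the member back to the
                 * task's CPU; either a new proxy is found shortly, or
                 * gsched_enqueue_task relocates it if the task runs */
                member->task.blocked_state = GSCHED_TASK_PROXY_SEARCH;
                if (member->task.state != GSCHED_TASK_RUNNABLE)
                        member->task.state = member->task.blocked_state;
        }
}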

move

Called when the proxy is being set for a waiter because its proxy has been identified.

Paired with gsched_pmgt_move_prepare, which must be called prior to gsched_pmgt_move. Generally, this is done earlier in the process in a different concurrency context.

Move should be called on tasks that are either GSCHED_TASK_RUNNABLE or GSCHED_TASK_PROXY_SEARCH. In the GSCHED_TASK_RUNNABLE case, the most common situation is the running end of chain 1 that is joining chain 2. This task will become GSCHED_TASK_PROXIED eventually, but outside the context of the call to move. A less common possibility is that a task that was previously GSCHED_TASK_PROXIED receives a signal or timeout and is thus in the GSCHED_TASK_RUNNABLE state. In these situations, the task will leave the RT-Mutex code and destroy its waiter structure. However, it may then immediately re-enter the locking code and try to reacquire the same lock. Rarely, a task will receive a wakeup that does not correspond to a signal or a timeout. In this case, the task will not leave the locking code and will immediately re-block.

If a task is GSCHED_TASK_PROXY_SEARCH, then the search is over at the time move is called because we have found its proxy. The task is already blocked and becomes GSCHED_TASK_PROXIED as part of what move does.

The blocked state of the waiting task is always set to GSCHED_TASK_PROXIED and its proxy task pointer is set because a proxy is being assigned. If a task is not runnable then its state is set to its blocked state.

  • Members are moved to the CPU of the proxy if it is different from the CPU they are on. The group->lock must be acquired to change the CPU assignment of members, and sdf->move_member is called to perform the move.

  • Members are added to the gsched_waiting_members list of the new proxy so that Group Scheduling knows that the proxy has more waiting members. The rq lock of the proxy must be acquired to access gsched_waiting_members.

  • Optimization for chain walking: gsched_pmgt_move is only called when all waiters have reached the end of the chain and the new proxy has been found. A sketch of the move walk follows.
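A condensed sketch of the move steps, with gsched_move_member_to_cpu() again standing in as a hypothetical helper for the actual member-movement code:

static void gsched_pmgt_move_sketch(struct pmgt_waiter *waiter,
                                    struct task_struct *proxy, int proxy_cpu)
{
        struct gsched_member *member;

        list_for_each_entry(member, &waiter->gsched_task_members,
                            task.waiter_ent) {
                /* a proxy is being assigned */
                member->task.proxy = proxy;
                member->task.proxy_cpu = proxy_cpu;
                member->task.blocked_state = GSCHED_TASK_PROXIED;
                if (member->task.state != GSCHED_TASK_RUNNABLE)
                        member->task.state = member->task.blocked_state;
                /* relocate the member to the proxy's CPU if necessary;
                 * requires group->lock (hypothetical helper) */
                if (member->task.task_cpu != proxy_cpu)
                        gsched_move_member_to_cpu(member, proxy_cpu);
                /* proxy's rq lock protects gsched_waiting_members */
                list_add(&member->task.waiting_members_ent,
                         &proxy->gsched_waiting_members);
        }
}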

switch_class

Place a task in the correct scheduling class. Tasks that are under group scheduling retain the memory of wanting to be in the KUSP scheduling class even when they are moved to the Linux class because they have Linux waiters. This enables the code to notice when a task wanting to be in the KUSP class no longer has Linux waiters and can thus be moved back to the KUSP class.

  • If a task has Linux waiters and is in the KUSP scheduling class, move it to CFS or RT based on its prio.

  • If a task does not have Linux waiters but wants to be in the KUSP scheduling class, move it to the KUSP scheduling class.

A sketch of this decision follows.
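In this sketch the predicates (has_linux_waiters, wants_kusp_class, task_in_kusp_class) and movement helpers are hypothetical names for the bookkeeping described above:

static void gsched_pmgt_switch_class_sketch(struct task_struct *p)
{
        if (has_linux_waiters(p)) {
                /* a KUSP-class proxy with Linux waiters must run in the
                 * Linux classes: CFS or RT, based on p->prio */
                if (task_in_kusp_class(p))
                        move_to_linux_class(p);
        } else {
                /* the memory of wanting the KUSP class is retained, so
                 * the task can return once its Linux waiters are gone */
                if (wants_kusp_class(p) && !task_in_kusp_class(p))
                        move_to_kusp_class(p);
        }
}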

reset

  • Performs the same operations as gsched_pmgt_move_prepare, and additionally performs gsched_pmgt_switch_class.

  • Used when the class switch operation cannot be performed at a later time.

  • Reset must be used if waiters are being moved from multiple tasks at the same time, as opposed to moving waiters from a single task.

pmgt_remove_waiters

  • Less efficient than the gsched_pmgt_move_prepare and gsched_pmgt_switch_class pair, because more class switch checks are performed.

GS -> CC Integration

These routines are used when scheduling code must influence locking data structures.

Members are added to or removed from the pmgt_waiter's gsched_task_members list when the corresponding group memberships are added to or removed from the task represented by the waiter. This can occur when an executive task is influencing group structure as it runs in its own context.

These functions are prefixed with gsched_pmgt.

transition_member

This is a wrapper function which unifies the common code required in several contexts. Acquiring the locks needed to access the waiter for a task is part of establishing the necessary context. A function pointer (add_member or rem_member) is one of the arguments of this function, and the referenced function is called once the proper context has been established. The locks involved are described next, followed by a sketch of the wrapper.

task->pi_lock

The task struct uses pi_blocked_on to refer to the rt_mutex_waiter structure representing that task as a waiter in the PI code. Access to this is controlled with the pi_lock. Proxy management currently creates a sibling data structure, pmgt_waiter, to hold proxy-specific state. The rt_mutex_waiter structure uses the pmgt_waiter field to refer to the pmgt_waiter structure.

pmgt_waiter->lock

Controls the read access required by transition_member to find the proxy task by reading the proxy_task field in the pmgt_waiter structure.

pmgt_waiter->tnode->via_lock->wait_lock

FIXME: Analysis of how tnode is used and some policy statements of what it is used for are pending.
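A sketch of the wrapper's shape. The pi_blocked_on and pmgt_waiter indirection follows the description above; the lock types, and the omission of the tnode wait_lock (still under analysis, per the FIXME), are simplifications:

typedef void (*member_op_t)(struct gsched_member *member,
                            struct pmgt_waiter *waiter);

static void gsched_pmgt_transition_member_sketch(struct task_struct *task,
                                                 struct gsched_member *member,
                                                 member_op_t op)
{
        struct pmgt_waiter *waiter = NULL;

        /* pi_lock stabilizes pi_blocked_on, which leads to the waiter */
        raw_spin_lock(&task->pi_lock);
        if (task->pi_blocked_on)
                waiter = task->pi_blocked_on->pmgt_waiter;
        if (waiter) {
                /* waiter->lock controls the read of proxy_task */
                raw_spin_lock(&waiter->lock);
                op(member, waiter);     /* add_member or rem_member */
                raw_spin_unlock(&waiter->lock);
        }
        raw_spin_unlock(&task->pi_lock);
}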

add_member

Put the member on the gsched_task_members list in the pmgt_waiter. An example of when this is called is when a task is given a new membership.

  • Acquire the rq lock of the proxy to access the gsched_waiting_members list, to add the new member structure, and to switch scheduling class as required.

  • Place the member structure on the gsched_waiting_members list of the proxy. This may require moving the member structure to the per-CPU area of the proxy's CPU if the proxy is remote; to do this, gsched_pmgt_move_member is called.

  • Switch the proxy to the appropriate scheduling class using gsched_pmgt_switch_class.

rem_member

Remove the member from the gsched_task_members list on the pmgt_waiter. An example of this is when a task is removed from a group.

  • Acquire the rq lock of the proxy to access the gsched_waiting_members list, to remove the member structure, and to switch scheduling class.

  • Remove the member from gsched_waiting_members. There is no need to move the member back to the CPU of its task, since the membership is being removed.

  • Switch the proxy to the appropriate scheduling class using gsched_pmgt_switch_class.

GS Proxy Awareness

These functions are prefixed with gsched and use the proxy-related information that is maintained by other parts of this subsystem.

task_cpu

Find the CPU of a group scheduling task member (see the sketch below):

  • If the task is GSCHED_TASK_RUNNABLE or GSCHED_TASK_BLOCKED, the CPU is the CPU of the task.

  • If the task is GSCHED_TASK_PROXY_SEARCH or GSCHED_TASK_PROXIED, the CPU is the CPU of the proxy.
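An illustrative version of this lookup:

static int gsched_task_cpu_sketch(const struct gsched_member_task *mt)
{
        switch (mt->state) {
        case GSCHED_TASK_RUNNABLE:
        case GSCHED_TASK_BLOCKED:
                return mt->task_cpu;    /* member is on the task's CPU */
        case GSCHED_TASK_PROXY_SEARCH:
        case GSCHED_TASK_PROXIED:
        default:
                return mt->proxy_cpu;   /* member is on the proxy's CPU */
        }
}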

pick_next_task

The hierarchy is evaluated by a particular CPU to make a scheduling decision. If a task is chosen, then we need to consider its proxy, if it has one. The task_cpu, proxy_cpu, proxy, and state fields in the gsched_member_task structure are protected by group->lock.

Check whether we need to run the proxy (see the sketch below):

  • Do not run the proxy if the task is runnable. Special case: generally, a runnable task should not have a proxy, but a task with a proxy can wake up and start running. For some span of instructions inside the locking code, it will still have a proxy noted, which will be removed in the course of emerging from the locking code; if the task instead decides to re-block, the proxy remains the same.

  • Do not run the proxy if the executing CPU is not the same as proxy_cpu.

  • Do not run the proxy if the proxy is not runnable.

If no proxy is chosen, run the task itself if the task is on this CPU and on the runqueue.
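A sketch of the proxy check made after a member is chosen; task_on_rq_queued() stands in for whatever on-runqueue test the kernel version provides:

static struct task_struct *
gsched_pick_resolve_proxy_sketch(struct gsched_member_task *mt, int this_cpu)
{
        /* fields are protected by group->lock, held by the caller */

        /* a runnable task runs itself, even if a stale proxy is still
         * noted while it emerges from (or re-enters) the locking code */
        if (mt->state == GSCHED_TASK_RUNNABLE) {
                if (mt->task_cpu == this_cpu && task_on_rq_queued(mt->ptr))
                        return mt->ptr;
                return NULL;
        }

        /* blocked task: run its proxy on its behalf, if possible here */
        if (mt->proxy && mt->proxy_cpu == this_cpu &&
            task_on_rq_queued(mt->proxy))
                return mt->proxy;

        return NULL;
}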

enqueue_task

Called when a task is placed on a runqueue.

This happens under several different circumstances. First, when a task blocks it is removed from a runqueue, and when it unblocks it is replaced. Second, when task affinity is changed, it is moved from the runqueue of one CPU to the runqueue of another. Third, when a task is first created, it is added to a runqueue. Fourth, when the scheduling class is changed, the task must be moved from one section of the runqueue to another, and removing and replacing the task on the same runqueue is part of this process.

  • If a task is waking up and has a proxy, move its members from the proxy's CPU to the CPU of the task.

  • Special case: the task is being placed on the runqueue but not due to a wakeup. This occurs when changing CPUs or switching scheduling classes. The thread is in atomic context when these operations are being performed; not only is preemption disabled, as indicated by atomic context, but interrupts are also disabled. For example, when changing scheduling classes the task spends a period in atomic context. When changing scheduling classes, the members should already be on the CPU of the task, because the task is being removed and replaced on the same runqueue. When changing CPU affinity, the task doing the changing is also in atomic context, but the memberships of the task whose affinity is being changed have to move from one CPU to another.

  • If a task switches CPUs, move its members from the old task CPU to the new task CPU.

  • If a task is a proxy and switches CPUs, move the members that are waiting on it to the new CPU of the proxy. The waiting tasks are recorded in the gsched_waiting_members list of the proxy task, which is the task being enqueued; step down gsched_waiting_members and move all elements. This is protected by the rq lock, which is held by the calling context. A sketch of this movement follows.
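A condensed sketch of the member movement at enqueue time; both movement helpers are hypothetical names for the code described above:

static void gsched_enqueue_task_sketch(struct task_struct *p, int new_cpu,
                                       bool waking)
{
        struct gsched_member *member;

        /* a waking task with a proxy reclaims its members from the
         * proxy's CPU (hypothetical helper) */
        if (waking)
                gsched_move_task_members(p, new_cpu);

        /* if p is itself a proxy, the members waiting on it follow it;
         * gsched_waiting_members is protected by the rq lock, which is
         * held by the calling context */
        list_for_each_entry(member, &p->gsched_waiting_members,
                            task.waiting_members_ent)
                gsched_move_member_to_cpu(member, new_cpu);
}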

dequeue_task

Called when a task T is taken off of a runqueue.

Task T may dequeue itself in schedule() or when it changes its own scheduling class or CPU affinity. Task T may also be dequeued by task R if R is changing the CPU affinity or scheduling class of T.

  • If task T is going to sleep (T will always be the task executing the dequeue): set the state of all of the memberships of task T to its blocked state, and if task T has a proxy, move T's memberships to the CPU of the proxy.

  • If task T is being taken off of the runqueue but is not going to sleep: this occurs when changing CPUs or switching scheduling classes, and either task T or task R may be executing.

The thread executing the dequeue is in atomic context with interrupts disabled. It will continue to run on the current cpu until task T is placed back on the runqueue. Additionally, the cpu associated with the runqueue cannot make any scheduling decisions until T has been placed back on the runqueue because the lock for the runqueue is held for the duration of the operation.

If task T is executing the dequeue then task T will run for a short period of time while it is not on the runqueue.

If task R is executing the dequeue then Task T will be taken off of the runqueue temporarily. Task T may or may not be running during this period of time. However, GS does not need to change the state of task T’s memberships because the cpu of the runqueue(s) involved cannot make a scheduling decision until task R has finished changing the cpu or scheduling class of task T. Task T remains runnable because no scheduling decision can be made for this duration and T will be runnable afterwards.

Special case: if a task is GSCHED_TASK_RUNNABLE, then we do not need to move its memberships to the proxy, because there is no proxy, or any indicated proxy relationship is about to be eliminated. This case arises most commonly because R is simply changing the assignment of a running task, or because a task that did have a proxy has aborted or been spontaneously awakened. A sketch of the sleep path of the dequeue accounting follows.
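This sketch uses the same hypothetical movement helper as the earlier sketches:

static void gsched_dequeue_task_sketch(struct task_struct *p, bool sleeping)
{
        struct gsched_member *member;

        /* affinity or class changes: memberships stay where they are */
        if (!sleeping)
                return;

        list_for_each_entry(member, &p->gsched_memberships,
                            samecomputation_entry) {
                /* the task is going to sleep: adopt the blocked state */
                member->task.state = member->task.blocked_state;
                /* if it has a proxy, its memberships follow the proxy */
                if (member->task.proxy)
                        gsched_move_member_to_cpu(member,
                                                  member->task.proxy_cpu);
        }
}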

Non-Locking Scenarios

Some scenarios requiring GS and/or proxy actions are not related to any lock actions. Tasks have CPU affinity and scheduling class properties that are of concern to the proxy accounting and scheduling subsystems, and which may change due to actions outside the context of a thread executing lock code.

It is important to remember that any Task A with adequate privileges can change the CPU affinity or scheduling class of any Task B.

Task CPU change

When a task is running and changes its own CPU affinity, its GS member structures must be moved from the per-CPU data structure associated with its current CPU to that associated with its destination CPU.

When a task is either running or blocked, and an executive Task A changes the CPU affinity of Task B, the GS member structures representing B must be moved from the data structures associated with its original CPU to those associated with its destination CPU.

Proxy CPU change

When tasks are blocked on a lock, they have a designated Proxy task, except during short periods when the Proxy is being determined or adjusted. The Proxy task is one not currently blocked on a lock, although it may block for other reasons in the normal course of execution.

When a Task B that is a Proxy for one or more tasks changes its own CPU affinity, or if another Task A changes the CPU affinity of Task B, then the GS membership structures for all tasks for which Task B is proxy must be moved from data structures associated with the old CPU of Task B to the data structures associated with the new CPU affinity of Task B. Note that these group member structures are "remote" in the sense that they are on data structures associated with a CPU that can be different from the CPU affinity of the task they represent. This is permitted because, while they can be chosen by the scheduler for a CPU on which they cannot execute, their proxy on the relevant CPU will be the task given execution context, not the blocked task.

This is accomplished as follows:

  • Move all memberships waiting on the proxy to the new CPU of the proxy.

  • Only move memberships if the task represented by the membership is not running; a special case covers timeout, signal, and stray wakeup.

Locking Scenarios

These scenarios involve tasks and locks in some way. The most obvious kind of situation is, of course, when a task is newly blocking on a lock and thus joining a set of waiting tasks. In this case, the proxy for the task must be found, and the data structures representing its direct and indirect blocking relations must be updated. This can be a fairly complex process if the newly blocking task is itself a proxy for other blocked tasks. The other most obvious case is when a lock is being released by the current proxy, and a new owner for the lock, and a new set of proxy relationships, must be established. More subtle scenarios can also occur: a blocked task may receive a signal or unblock for another reason, requiring the data structures representing the blocking relations and the identity of the proxy tasks to be updated.

At the most basic level the accounting for these scenarios is handled by the PMGT layer, and is of limited concern to the SL of the system. However, the SL clearly needs to know about these status changes so it can take whatever measures are deemed appropriate. In the case of GS, we want to make every task in the GSCHED_TASK_RUNNABLE and GSCHED_TASK_PROXIED states be selectable at scheduling time, because in both cases there is a clearly identified task to run.

Complicating factors relevant at the SL level which are not a concern at the PMGT level, however, are the CPU affinity and scheduling class of the tasks involved in a given scenario. Each task can only run on the CPU for which it currently has affinity, but a task and its proxy may have different CPU affinities. The scheduling classes of the set of tasks involved affect specific actions taken by the GS layer to perform its duties, and the waiting tasks and their proxy can be managed under an arbitrary mix of scheduling classes.

These complicating factors add configuration dimensions to the set of locking scenarios used to evaluate the correctness and completeness of the PMGT implementation. By adding these dimensions, we effectively convert each scenario at the PMGT level into several scenarios at the GS level. In addition, concerns at the GS level which are not relevant at the PMGT level make it necessary to create some additional scenarios.

The CPU affinity dimension has two values: same-cpu and different-cpu. In general, these values describe whether a task and its proxy have the same CPU affinity or different affinities. This matters because it affects where the gsched_member structure for the waiting task should be placed, since it must be selectable when the proxy's scheduling code is executed. Remember that the scheduling code is shared among all CPUs in a system, but the data structures used by that code are segregated by CPU. This means that the gsched_member structure for a waiting task must be in the per-CPU data structures for the CPU of the proxy in order for it to be selectable by the CPU which can run the proxy task.

The scheduling class of the tasks involved are relevant because the run queues are segregated by scheduling class, GS has its own class, and proxy relations across scheduling class boundaries are possible. In the context of these tests, we really only care about the “Linux” scheduling class, technically known as SCHED_OTHER, and the GS scheduling class. For purposes of discussing the various scenarios we will refer to the L and G scheduling classes of the tasks.

Lock Scenario Categories

While the lock scenarios described here are closely related to the locking scenarios used to test PMGT, there are important differences arising from the fact that the PMGT layer is only concerned with the accounting required to track task blocking relations and to identify proxy tasks. In contrast, the SL in the system, in this case GS, is responsible for controlling the execution of the tasks using the information provided by PMGT as well as other task state information. As a result, the set of testing scenarios must be understood in a more complex way when considering the GS scheduling layer than was sufficient for the tests concentrating on the actions of the PMGT layer.

We have identified four broad categories of locking scenarios from the GS perspective:

  • First Waiter Arrives, Inducing Proxy
  • Proxy Static, but Waiter Set Changes
  • Proxy and Waiter Set Static, but Thread State Changes
  • Proxy Releases Lock

We will still use the previous numbering scheme denoting the number of tasks, the number of locks, and the number of the scenario as the basic designation for a particular test. In addition, however, we will also note additional parameters that specify aspects of the scenario that are significant for the GS layer. Each basic scenario identified under the basic numbering scheme used for the PMGT tests will thus potentially become several tests at the GS layer. Most of the PMGT scenarios are relevant, but not all, so the relationship between PMGT and GS level tests is somewhat more complex than it may first appear.

First Waiter Arrives, Inducing Proxy

This covers a variety of scenarios related to establishing the first waiting relationship for a task that thus becomes a proxy. The scheduling classes of the new waiter and the proxy are relevant because some sets of circumstances require a task in the G class to temporarily be added to the L class.

Dimensions

Proxy Scheduling Class: L/G

Waiter Scheduling Class: L/G

CPU Affinity: waiter and proxy have the Same (S) or Different (D) CPU affinity

Scenarios: 2.1.1, 2.1.4, 2.1.8

Combinations: LLS, LGS, GLS, GGS, LLD, LGD, GLD, GGD

Proxy Static, but Waiter Set Changes

This covers a variety of scenarios related to changes in the membership of the existing set of waiters on a proxy. Such a change can occur because a new waiter is added to the set due to a task blocking on a lock, or because an existing waiter is removed due to a waiter aborting its wait on a lock.

Dimensions

Waiter Set State Before: L/G, indicates whether it contains only tasks in
the GS scheduling class (G), or it contains at least one waiter in the Linux scheduling class (L).
Waiter Set State After: L/G, indicates whether it contains only tasks in
the GS scheduling class (G), or it contains at least one waiter in the Linux scheduling class (L).

Scenarios:

Combinations: L->L, L->G, G->L, and G->G

Proxy and Waiter Set Static, but Thread State Changes

This covers a variety of scenarios that fall into two broad categories: changes to proxy state and changes to waiter state.

Dimensions

CPU Affinity wrt each Waiter: For each waiter, the proxy is either
on the Same (S) CPU as the waiter or on a Different (D) CPU.

Scheduling Class Change: L->G (LG) or G->L (GL)

Scheduling Parameter Change: should be irrelevant

Scenarios:

Combinations: LGS, GLS, LGD, GLD

Proxy Releases Lock

This covers a variety of scenarios that are distinguished from each other in a number of ways. The releasing task was the proxy, so a new proxy will have to be chosen. The releasing task can be called the old proxy, and the task chosen as its successor can be called the new proxy. The set of waiters is generally reduced by one, since one of the waiters is chosen as the new proxy. The set of waiters might also

Dimensions

Old Proxy: Real and effective domain LG, LL, GG, GL???

New Proxy: L/G

CPU Affinity of New Proxy: Same (S) or Different (D) from old. Same
or different from each waiter.

Scenarios:

Combinations:

2 Threads, 1 Lock Scenarios

2.1.1

_images/blocking-scenarios-2.1.1.png

Figure BS 2.1.1: Blocking Scenarios in the 2.1.1 Set

Two phases:

  • setup - the proxy relationship with T1 is created; T2 is executing.

  • block - T2 is no longer runnable.

gg.s

  • setup: GS records that T1 is the proxy of T2.

gg.d

  • setup: GS records that T1 is the proxy of T2.

  • block: move T2's memberships to the CPU of T1.

gl.s

  • setup: GS records that T1 is the proxy of T2.

gl.d

  • setup: GS records that T1 is the proxy of T2.

  • block: move T2's memberships to the CPU of T1.

lg.s

  • setup: change T1 to be in the Linux class.

lg.d

  • setup: change T1 to be in the Linux class.

2.1.2

_images/blocking-scenarios-2.1.2.png

Figure BS 2.1.2: Blocking Scenarios in the 2.1.2 Set

T1 is executing; T2 is blocked.

gg.s

  • GS clears the proxy of T2.

gg.d

  • GS clears the proxy of T2.

  • Move T2's memberships to the CPU of T2.

gl.s

  • GS clears the proxy of T2.

gl.d

  • GS clears the proxy of T2.

  • Move T2's memberships to the CPU of T2.

lg.s

  • Change T1 to be in the GS class.

lg.d

  • Change T1 to be in the GS class.

2.1.3

_images/blocking-scenarios-2.1.3.png

Figure BS 2.1.3: Blocking Scenarios in the 2.1.3 Set

No actions

2.1.4

_images/blocking-scenarios-2.1.4.png

Figure BS 2.1.4: Blocking Scenarios in the 2.1.4 Set

Similar to 2.1.1

2.1.5

_images/blocking-scenarios-2.1.5.png

Figure BS 2.1.5: Blocking Scenarios in the 2.1.5 Set

No actions

2.1.6

_images/blocking-scenarios-2.1.6.png

Figure BS 2.1.6: Blocking Scenarios in the 2.1.6 Set

Two phases:

  • run - T1 starts to run.

  • remove - T1 removes its waiter.

gg.s

  • remove: GS clears the proxy of T1.

gg.d

  • run: move T1's memberships to the CPU of T1.

  • remove: GS clears the proxy of T1.

gl.s

  • remove: GS clears the proxy of T1.

gl.d

  • run: move T1's memberships to the CPU of T1.

  • remove: GS clears the proxy of T1.

lg.s

  • remove: change T2 to be in the Linux class.

lg.d

  • remove: change T2 to be in the Linux class.

2.1.7

_images/blocking-scenarios-2.1.7.png

Figure BS 2.1.7: Blocking Scenarios in the 2.1.7 Set

Two phases:

  • run - T1 starts to run.

  • steal - T1 steals the lock.

gg.s

  • steal: GS clears the proxy of T1.

gg.d

  • steal: GS clears the proxy of T1.

  • run: move T1's memberships to the CPU of T1.

gl.s

  • steal: GS clears the proxy of T1.

gl.d

  • run: move T1's memberships to the CPU of T1.

  • steal: GS clears the proxy of T1.

lg.s

  • steal: change T2 to be in the GS class.

lg.d

  • steal: change T2 to be in the GS class.

2.1.8

_images/blocking-scenarios-2.1.8.png

Figure BS 2.1.8: Blocking Scenarios in the 2.1.8 Set

Similar to 2.1.1

2.1.9

_images/blocking-scenarios-2.1.9.png

Figure BS 2.1.9: Blocking Scenarios in the 2.1.9 Set

No actions

3 Threads, 1 Lock Scenarios

3.1.1

_images/blocking-scenarios-3.1.1.png

Figure BS 3.1.1: Blocking Scenarios in the 3.1.1 Set

Two phases:

  • setup - set the new proxy for waiters.

  • block - the locking task blocks on a lock.

setup

  • If T3 in GS: GS records that T1 is the proxy for T3.

  • If T3 in Linux, T2 in GS, and T1 in GS: put T1 in the Linux class. (FIXME: The class of T2 is not represented.)

block

  • If T3 in GS: move T3's memberships to the CPU of T1.

3.1.2

_images/blocking-scenarios-3.1.2.png

Figure BS 3.1.2: Blocking Scenarios in the 3.1.2 Set

  • If T2 in GS: GS clears the proxy of T2, and moves T2's memberships to the CPU of T2.

  • If T3 in GS: GS temporarily clears the proxy for T3, sets T2 to be the proxy for T3, and moves T3's memberships to the CPU of T2. (FIXME: The CPU of T3 is not represented.)

  • If T3 in Linux and T2 in GS: put T2 in the Linux class. (FIXME: The class of T3 is not represented.)

  • If T2 or T3 is in Linux, and T1 wants to be in GS but is in Linux because of T2 or T3: put T1 in the GS class.

3.1.3

_images/blocking-scenarios-3.1.3.png

Figure BS 3.1.3: Blocking Scenarios in the 3.1.3 Set

Similar to 3.1.1

3.1.4

_images/blocking-scenarios-3.1.4.png

Figure BS 3.1.4: Blocking Scenarios in the 3.1.4 Set

  • If T3 in GS: GS temporarily clears the proxy of T3, records that T1 is now the proxy of T3, and moves T3's memberships from the CPU of T2 to the CPU of T1. (FIXME: The CPU of T3 is not represented.)

  • If T3 in Linux, and T2 wants to be in GS but was in Linux because of T3: put T2 in the GS class.

  • If T3 in Linux and T1 in GS: put T1 in the Linux class. (FIXME: The class of T3 is not represented.)

3.1.5

_images/blocking-scenarios-3.1.5.png

Figure BS 3.1.5: Blocking Scenarios in the 3.1.5 Set

  • If T3 in GS: GS temporarily clears the proxy for T3, records that T1 is now the proxy of T3, and moves T3's memberships to the CPU of T1. (FIXME: The CPU of T3 is not represented.)

  • If T1 in GS: GS moves T1's memberships to the CPU of T1 and clears the proxy of T1.

  • If T3 or T1 in Linux, and T2 wants to be in GS but was in Linux because of T3 or T1: put T2 in the GS class.

  • If T3 in Linux and T1 in GS: put T1 in the Linux class. (FIXME: The class of T3 is not represented.)

3.1.6

_images/blocking-scenarios-3.1.6.png

Figure BS 3.1.6: Blocking Scenarios in the 3.1.6 Set

No action

3.1.7

_images/blocking-scenarios-3.1.7.png

Figure BS 3.1.7: Blocking Scenarios in the 3.1.7 Set

Similar to 3.1.1

3 Threads, 2 Locks Scenarios

3.2.1

_images/blocking-scenarios-3.2.1.png

Figure BS 3.2.1: Blocking Scenarios in the 3.2.1 Set

Three phases:

  • reset - clear the old proxy of waiters.

  • setup - set the new proxy for waiters.

  • block - the locking task blocks on a lock.

reset

  • If T3 in GS: GS clears the proxy for T3.

setup

  • If T3 in GS: GS records that T1 is the proxy for T3, and moves T3's memberships from the CPU of T2 to the CPU of T1.

  • If T3 in Linux, and T2 wants to be in GS but was in Linux because of T3: move T2 from the Linux class to the GS class. (FIXME: The class of T3 is not represented.)

  • If T2 in GS: GS records that T1 is the proxy for T2.

  • If T2 or T3 in Linux and T1 in GS: put T1 in the Linux class. (FIXME: The class of T3 is not represented.)

block

  • If T2 in GS: move T2's memberships to the CPU of T1.

3.2.2

_images/blocking-scenarios-3.2.2.png

Figure BS 3.2.2: Blocking Scenarios in the 3.2.2 Set

  • If T2 in GS: GS clears the proxy of T2 and moves T2's memberships to the CPU of T2.

  • If T3 in GS: GS temporarily clears the proxy of T3, records that T2 is the proxy for T3, and moves T3's memberships from the CPU of T1 to the CPU of T2. (FIXME: The CPU of T3 is not represented.)

  • If T3 in Linux and T2 in GS: put T2 in the Linux class. (FIXME: The class of T3 is not represented.)

  • If T1 wants to be in GS but is currently in Linux because T3 or T2 was in Linux: put T1 in the GS class.

3.2.3

_images/blocking-scenarios-3.2.3.png

Figure BS 3.2.3: Blocking Scenarios in the 3.2.3 Set

Two phases:

  • setup - set the new proxy for waiters.

  • block - the locking task blocks on a lock.

setup

  • If T1 in GS: GS records that T2 is the proxy for T1.

  • If T1 in Linux, T3 in GS, and T2 in GS: put T2 in the Linux class. (FIXME: The class of T3 is not represented.)

block

  • If T1 in GS: move T1's memberships to the CPU of T2.

3.2.4

_images/blocking-scenarios-3.2.4.png

Figure BS 3.2.4: Blocking Scenarios in the 3.2.4 Set

FIXME: The class of T3 is not represented.

  • If T3 in GS: clear the proxy of T3, and move T3's memberships from the CPU of T2 to the CPU of T1. (FIXME: The CPU of T3 is not represented.)

  • If T3 in Linux: if T2 wants to be in GS but is in Linux because of T3, put T2 in the GS class; if T1 in GS, put T1 in the Linux class.

3.2.5

_images/blocking-scenarios-3.2.5.png

Figure BS 3.2.5: Blocking Scenarios in the 3.2.5 Set

No action

3.2.6

_images/blocking-scenarios-3.2.6.png

Figure BS 3.2.6: Blocking Scenarios in the 3.2.6 Set

Similar to 3.2.1

3.2.7

_images/blocking-scenarios-3.2.7.png

Figure BS 3.2.7: Blocking Scenarios in the 3.2.7 Set

T1 becomes runnable even though it is a waiter (signal, timeout, etc.).

  • Move T1's memberships to the CPU of T1, because T1 is runnable.

  • If T1 in GS: GS clears the proxy of T1.

  • If T3 in GS: GS clears the proxy of T3, and moves T3's memberships from the CPU of T2 to the CPU of T1. (FIXME: The CPU of T3 is not represented.)

  • If T1 or T3 in Linux, and T2 wants to be in GS but is currently in Linux due to T1 or T3 (T1 is no longer waiting on T2, and T3 is now waiting on T1): move T2 to GS. (FIXME: The class of T3 is not represented.)

  • If T3 in Linux and T1 in GS: move T1 to Linux. (FIXME: The class of T3 is not represented.)

3.2.8

_images/blocking-scenarios-3.2.8.png

Figure BS 3.2.8: Blocking Scenarios in the 3.2.8 Set

  • If T1 in GS: GS clears the proxy for T1 and moves T1's memberships to the CPU of T1.

  • If T1 in Linux, T3 in GS, and T2 wants to be in GS but is currently in Linux due to T1: put T2 in the GS class.

3.2.9

_images/blocking-scenarios-3.2.9.png

Figure BS 3.2.9: Blocking Scenarios in the 3.2.9 Set

No actions

3.2.10

_images/blocking-scenarios-3.2.10.png

Figure BS 3.2.10: Blocking Scenarios in the 3.2.10 Set

  • If T2 in GS: GS records that T1 is the proxy for T2, and moves T2's memberships to the CPU of T1.

  • If T2 in Linux, T3 in GS, and T1 in GS: put T1 in the Linux class. (FIXME: The class of T3 is not represented.)

3.2.11

_images/blocking-scenarios-3.2.11.png

Figure BS 3.2.11: Blocking Scenarios in the 3.2.11 Set

Two phases:

  • abort - T2 aborts its attempt to lock L1.

  • exit - the T2 thread ends.

abort

  • If T2 in GS: GS clears the proxy for T2, and moves T2's memberships from the CPU of T1 to the CPU of T2.

  • If T3 in GS: GS temporarily clears the proxy for T3, sets T2 to be the proxy for T3, and moves T3's memberships from the CPU of T1 to the CPU of T2. (FIXME: The CPU and class of T3 are not represented.)

  • If T3 in Linux and T2 in GS: put T2 in the Linux class.

exit

  • The futex implementation unlocks L2.

  • If T3 in GS: GS clears the proxy of T3, and moves T3's memberships from the CPU of T2 to the CPU of T3. (FIXME: The CPU and class of T3 are not represented.)

4 Threads, 3 Locks Scenarios

4.3.1

_images/blocking-scenarios-4.3.1.png

Figure BS 4.3.1: Blocking Scenarios in the 4.3.1 Set

  • If T4 in GS: move T4's memberships to the CPU of T1.

  • If T4 in Linux and T1 in GS: put T1 in the Linux class.