Group Scheduling Internals

We believe that as both applications and computer system designs become more varied and sophisticated, a wide variety of scheduling semantics will become desirable. Implementers of specific applications will want to describe the execution control of their application in terms specific to the semantics of the application. Designers of embedded or special purpose systems will want to describe how sets of applications can be controlled, and how conflicts between the needs of different applications can be resolved, in terms specific to their needs and desires. For these and many other reasons we believe there is a strong current need for an approach to operating system scheduling which emphasizes both configurability and precision of control over all aspects of thread execution at both the user and system level. Group Scheduling is designed to meet this need.

Group Scheduling is a hierarchical scheduling framework within which a wide variety of scheduling semantics can be specified for specific applications, sets of applications, or for the system as a whole. Under it, different scheduling semantics can be used to control different sets of system components. Each group can have its own Scheduling Decision Function (SDF), which selects among the group's members when asked. A group can be used to represent a set of threads implementing an application at the user level, or a set of threads implementing interrupt service routines at the system level. Groups can be hierarchically composed, so sets of computations can be controlled under selected semantics and, by extension, the semantics under which every activity in the system is controlled can be configured. Customization is simplified by letting all threads in the system fall under the standard Linux scheduling semantics by default, thus requiring explicit specification only for those threads needing special semantics.
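The hierarchical decision process can be sketched in C. This is a hypothetical illustration under assumed names (gs_group, gs_member, sdf_priority, gs_decide are invented for clarity), not the KUSP implementation: each group carries an SDF hook, and a decision descends the hierarchy by asking each selected subgroup's SDF in turn.

```c
#include <stddef.h>
#include <string.h>

struct gs_member;

/* A group holds members and a Scheduling Decision Function (SDF)
 * that selects among them when the group is asked for a decision. */
struct gs_group {
    struct gs_member *members;
    size_t nmembers;
    struct gs_member *(*sdf)(struct gs_group *g);  /* per-group SDF hook */
};

/* A member is either a thread or a nested group. */
struct gs_member {
    enum { GS_THREAD, GS_GROUP } kind;
    int runnable;            /* thread state: 1 = runnable         */
    int priority;            /* per-member parameter used by SDFs  */
    struct gs_group *group;  /* valid only when kind == GS_GROUP   */
    const char *name;
};

/* One example SDF: fixed priority among runnable members. */
static struct gs_member *sdf_priority(struct gs_group *g)
{
    struct gs_member *best = NULL;
    for (size_t i = 0; i < g->nmembers; i++) {
        struct gs_member *m = &g->members[i];
        if (m->kind == GS_THREAD && !m->runnable)
            continue;
        if (!best || m->priority > best->priority)
            best = m;
    }
    return best;
}

/* Descend the hierarchy: each group's SDF picks a member, and a
 * nested group delegates the choice to its own SDF in turn. */
static struct gs_member *gs_decide(struct gs_group *g)
{
    struct gs_member *m = g->sdf(g);
    while (m && m->kind == GS_GROUP)
        m = m->group->sdf(m->group);
    return m;
}

/* Demo hierarchy: an application group nested under a top-level
 * group, alongside a thread left under default control. */
static struct gs_member app_threads[2] = {
    { GS_THREAD, 1, 5, NULL, "app_low" },
    { GS_THREAD, 1, 9, NULL, "app_high" },
};
static struct gs_group app = { app_threads, 2, sdf_priority };
static struct gs_member top_members[2] = {
    { GS_THREAD, 1, 3, NULL, "linux_default" },
    { GS_GROUP, 0, 7, &app, "app_group" },
};
static struct gs_group top = { top_members, 2, sdf_priority };

/* Returns 1 when the decision descends into the nested group and
 * picks its highest-priority runnable thread. */
int demo_picks_app_high(void)
{
    struct gs_member *m = gs_decide(&top);
    return m && strcmp(m->name, "app_high") == 0;
}
```

Note that each level applies its own semantics independently: the top-level SDF only decides that the application group is more important than the default thread, and the group's own SDF decides which of its threads runs.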

However, the generalization of scheduling semantics raises the need to generalize the semantics of concurrency control. It does little good to specify subtle and varied semantics for selecting among runnable threads if every thread is controlled by a single semantics whenever it waits, with an arbitrary set of other threads, on a user-level or OS-level lock for access to a shared resource. A need for generalized scheduling semantics thus implies the need for a generalized integration of concurrency control and scheduling. Proxy Management is our approach to integrating concurrency control and scheduling in a way that assumes as little as possible about the semantics of either. Neither Proxy Management nor Group Scheduling is a miraculous cure for all ills, however.

The user is granted the power to specify any semantics they want through the implementation of SDFs, the configuration of the Group Scheduling hierarchy for the system, and by configuration of semantics for the integration of concurrency control and scheduling. With this power comes the responsibility to make choices consistent with specified design goals. The system presented here provides the ability to say almost anything that can be imagined, but the developer is still responsible for imagining what they really want.

The rest of this document first provides a high-level overview of the motivation for generalizing scheduling and concurrency control in operating system design. It then provides an overview of the Group Scheduling framework and some of its uses in creating examples of various scheduling semantics, some of which require generalized support for integrating scheduling and concurrency control. We then discuss the integration of concurrency control and scheduling: first in the context of the RT-Mutex implementation provided by the PREEMPT_RT patch, and then in the context of the Proxy Management approach provided by the KUSP system. Finally, we discuss a number of ways in which Group Scheduling can be used to implement an extremely wide variety of specialized semantics.

Group Scheduling Framework

These sections describe how the Group Scheduling hierarchy works and how to implement an SDF, and discuss some existing SDFs.

Integration of Scheduling and Concurrency Control

These sections describe how scheduling and concurrency control can be integrated and generalized as a symbiotic pair. We begin by discussing the RT-Mutex implementation in the PREEMPT_RT patch, since it: (1) is a good example of scheduling and concurrency control integration and (2) serves as the foundation for the Proxy Management code modifications. The second role for RT-Mutex is motivated by our desire to make Proxy Management a patch to the PREEMPT_RT software, to minimize the size of the patch, and to maximize the familiarity of the approach to those already familiar with RT-Mutex.
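As background for that discussion, the core idea behind RT-Mutex is priority inheritance: the owner of a contended mutex is temporarily boosted to the highest priority among its waiters, so a high-priority waiter is not delayed by medium-priority threads preempting a low-priority owner. The sketch below is a hypothetical illustration of that idea, not the kernel code; the names (task_sketch, rt_mutex_sketch, pi_boost) are invented, and larger numbers mean higher priority here purely for readability.

```c
/* Hypothetical priority-inheritance sketch, not the kernel RT-Mutex. */
struct task_sketch {
    int prio;         /* effective (possibly boosted) priority */
    int normal_prio;  /* priority before any boosting          */
};

struct rt_mutex_sketch {
    struct task_sketch *owner;
    struct task_sketch *waiters[8];
    int nwaiters;
};

/* Boost the owner to the maximum of its own normal priority and
 * the priorities of all waiters on the mutex, so the owner cannot
 * be preempted by threads less important than any waiter. */
static void pi_boost(struct rt_mutex_sketch *m)
{
    int top = m->owner->normal_prio;
    for (int i = 0; i < m->nwaiters; i++)
        if (m->waiters[i]->prio > top)
            top = m->waiters[i]->prio;
    m->owner->prio = top;
}

/* Returns the owner's effective priority after a priority-9 waiter
 * blocks on a mutex held by a priority-1 owner. */
int demo_boosted_prio(void)
{
    struct task_sketch owner = { 1, 1 };
    struct task_sketch waiter = { 9, 9 };
    struct rt_mutex_sketch m = { &owner, { &waiter }, 1 };
    pi_boost(&m);
    return owner.prio;
}
```

Priority inheritance is one fixed answer to the integration problem; Proxy Management generalizes the bookkeeping so that other SDF semantics can supply their own answer.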

The Proxy Management section describes how we track the status of all RT-Mutex instances and generalize accounting required to track owners, waiters, and proxies. The Proxy Management Scenarios Section discusses the set of scenarios any accounting scheme will have to handle correctly to provide adequate information to the scheduling layer in support of integrating concurrency control and scheduling.

The next section discusses the interface provided by Proxy Management for use by the system scheduling layer, and the one after that discusses the use of that interface by Group Scheduling.

The GS and CC integration section then shows how arbitrary Proxy Execution brings processes blocked by concurrency control (BBCC) constraints into the scheduling decision and chooses proxy processes appropriate to the programming model (SDF) semantics.
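The idea can be illustrated with a minimal sketch (hypothetical names, not the KUSP interface): when the SDF's choice is blocked by concurrency control, the scheduling layer follows the chain of lock owners recorded by the accounting layer and runs the end of the chain as the proxy, so the chosen computation's importance is applied to the thread that can actually make progress.

```c
#include <stddef.h>

struct thread_sketch;

/* Minimal per-lock accounting record: who currently owns the lock. */
struct lock_rec {
    struct thread_sketch *owner;
};

struct thread_sketch {
    const char *name;
    int runnable;
    struct lock_rec *blocked_on;  /* NULL when not waiting on a lock */
};

/* If the SDF's choice is blocked by concurrency control (BBCC),
 * follow the owner chain until a thread that is not itself blocked
 * is found; that thread executes as the proxy for the choice. */
static struct thread_sketch *resolve_proxy(struct thread_sketch *choice)
{
    struct thread_sketch *t = choice;
    while (t && t->blocked_on)
        t = t->blocked_on->owner;
    return t;
}

/* Demo: "waiter" is blocked on a lock owned by runnable "owner",
 * so the proxy chosen on behalf of "waiter" is "owner". */
static struct thread_sketch owner_t = { "owner", 1, NULL };
static struct lock_rec lock = { &owner_t };
static struct thread_sketch waiter_t = { "waiter", 0, &lock };

int demo_proxy_is_owner(void)
{
    return resolve_proxy(&waiter_t) == &owner_t;
}
```

The point of the generalization is that the SDF remains free to choose a blocked member for any semantic reason it likes; the proxy machinery only translates that choice into a runnable thread.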

GS Implementation of a Variety of Programming Models

The customized programming model section describes how a range of customized programming models can be implemented using a customized SDF, CC semantics, and library routines of varying complexity. The future work section discusses further developments, including: integration of user-level CC (futexes) into the GS CC integration, extension to higher-level OS CC semantics, direct implementation of user-level condition variables, and CC for memory segments shared among threads in both user and OS mode.

Indices and tables