Initial Data Streams Tutorial Sequence

The Data Streams (DS) subsystem supports gathering system behavioral data from instrumentation points inserted in kernel and application source code. When a thread of execution crosses an instrumentation point, performance data, most commonly an event record, is produced. The DS subsystem lets users enable specific sets of instrumentation points to select the data relevant to a specific purpose. Multiple data streams may be open simultaneously, and the stream of performance data records in each can be processed on-line or stored for off-line processing. Processing of the stream of performance data records is referred to as “post-processing” partly for historical reasons: for many years, off-line processing of stored performance data after the completion of an experiment was the only type of processing performed. Methods that use views of system behavior derived by processing streams of performance data records on-line, in both OS and user contexts, are now of increasing interest.

DSUI Examples

The Data Streams User Interface (DSUI) enables instrumentation of user-level code and uses the same performance data types (events, counters, and histograms) that the Data Streams Kernel Interface (DSKI) uses. While performance evaluation based solely on user data is sometimes appropriate, the most interesting questions usually require analysis of a mixture of user-level and kernel-level performance data.
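To make the three performance data types concrete, the following is a toy model in Python. It is not the DSUI API: the class `ToyStream`, its methods, and the names `STAGE_ENTER`, `SIGNALS_SENT`, and `LATENCY_US` are all hypothetical, chosen only to illustrate how an event, a counter, and a histogram differ in what they record.

```python
import time
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """A timestamped record produced each time a thread crosses a point."""
    name: str
    timestamp: float
    tag: int = 0


class ToyStream:
    """Hypothetical stand-in for a data stream; not the real DSUI interface."""

    def __init__(self, bucket_width):
        self.events = []            # one record per crossing
        self.counters = Counter()   # one running total per counter name
        self.histograms = {}        # per name: bucket index -> occurrences
        self.bucket_width = bucket_width

    def event(self, name, tag=0):
        # An event keeps the full record, including when it happened.
        self.events.append(Event(name, time.monotonic(), tag))

    def count(self, name):
        # A counter keeps only how many times the point was crossed.
        self.counters[name] += 1

    def histogram(self, name, value):
        # A histogram keeps the distribution of a value, bucketed by width.
        bucket = int(value // self.bucket_width)
        self.histograms.setdefault(name, Counter())[bucket] += 1


stream = ToyStream(bucket_width=10)
stream.event("STAGE_ENTER", tag=1)
stream.count("SIGNALS_SENT")
stream.count("SIGNALS_SENT")
for latency_us in (25, 29, 3):
    stream.histogram("LATENCY_US", latency_us)
```

Events cost the most per crossing but preserve ordering and timing. Counters and histograms summarize at the instrumentation point, which trades detail for a much smaller volume of data.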

Simple DSUI

This is a basic example covering the use of various DSUI instrumentation point types. Also covered is a brief introduction to Data Streams Post-Processing (DSPP), and some of the existing filters used to extract various views of system and application behavior from the raw performance data gathered. The example is discussed in detail here:

Signal Pipeline - DSUI

This is a multi-threaded example application using only DSUI instrumentation points. The parent thread of the application creates a separate thread for each of the pipeline stages, and then generates a sequence of signals that are passed from the beginning of the pipeline to the end. The post-processing of this example is designed to extract several views of the application's behavior, using various filters from the DSPP Filter Library (DSPP-FL). This example also covers how to write a filter that provides a human-readable view of the raw performance data containing all the events generated by the multi-threaded pipeline. The example is discussed in detail here:

DSKI Examples

The Data Streams Kernel Interface (DSKI) supports a namespace of instrumentation points in the Linux operating system and any loaded drivers. These instrumentation points are available to user programs that wish to gather kernel-level performance data. A DSKI control program dski_cntl (FIXME.D: verify the command name) is available for managing data gathering as specified by configuration files, so most experiments use this command rather than using the Python or C API directly.

A wide range of data is available from the kernel, and precisely which data is relevant depends heavily on the experiment or application. While on-line uses of DSKI in support of applications and the system are possible and of increasing interest, the majority of DSKI uses at the time of this writing gather data for specific KUSP examples or experiments.

The examples listed here illustrate the use of DSKI to gather a richer view of the Signal Pipeline example, supplementing the DSUI events with a set of DSKI events. This illustrates the basic principle that fully understanding application behavior often requires the addition of DSKI performance data as well as the obvious data obtained directly from the application using DSUI. Note that individual projects and experiments often find that adding new instrumentation points to those already existing in applications, libraries and the kernel is necessary to gather the full set of data required for their experiments’ performance analysis. The design of Data Streams has striven to make that as simple and understandable a process as possible.

Signal Pipeline DSKI

This example extends the Signal Pipeline DSUI example with process context switch events from DSKI. This lets the DSPP filters extract, among other views, the execution intervals of each thread implementing a pipeline stage. It also illustrates a potential difficulty when using DSKI: information overload. The context switch events enabled here are generated for every process on the system, since every thread crosses the instrumentation points in the kernel context switch code. DSUI-level information identifying the PID of each pipeline thread is used to extract the relevant events from the raw data, but the volume of raw data is instructive. The example is discussed in detail here:

Signal Pipeline DSKI - Active Filter

This example refines the Signal Pipeline DSKI example by showing how an Active Filter can significantly reduce the number of DS events in the DSKI event stream, and thus reduce the instrumentation effect for a given experiment. The Active Filter in this case is the Per-Task filter, which accepts or rejects each event according to whether it was generated by a thread in a specified set. Here the set consists of the threads implementing the Signal Pipeline, and the filter rejects all context switch events generated by threads not in the set.

It is important to realize that an instrumentation point, once enabled, generates an event for every thread whose locus of control crosses it. The most efficient way to limit the instrumentation overhead is therefore to discard the events generated by irrelevant threads as early as possible. This is the function of the Per-Task filter. It is also important to note that Active Filters see every event in the Data Stream to which they are attached and can take arbitrary actions in response to each event, which makes them a powerful mechanism that can be used for a wide range of purposes.
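The accept-or-reject behavior described above can be sketched in a few lines of Python. This is a simplified model, not the DSKI implementation: `Event`, `PerTaskFilter`, `run_stream`, the event name `CONTEXT_SWITCH`, and the PIDs are all hypothetical, and a real Active Filter runs where events are generated rather than over a list already in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    pid: int  # thread that crossed the instrumentation point


class PerTaskFilter:
    """Hypothetical per-task filter: accept only events from tracked PIDs."""

    def __init__(self, pids):
        self.pids = frozenset(pids)

    def accept(self, event):
        return event.pid in self.pids


def run_stream(events, active_filters):
    """Log an event only if every attached filter accepts it."""
    logged = []
    for ev in events:
        # Rejecting here, before the event is logged, is what keeps the
        # volume of stored data (and the instrumentation effect) down.
        if all(f.accept(ev) for f in active_filters):
            logged.append(ev)
    return logged


# Assumed PIDs: 101-103 are the pipeline threads; 7 and 9 are unrelated tasks.
raw = [Event("CONTEXT_SWITCH", pid) for pid in (101, 7, 102, 7, 9, 103, 7)]
kept = run_stream(raw, [PerTaskFilter({101, 102, 103})])
print(len(raw), len(kept))  # prints "7 3"
```

In this toy run, four of the seven context switch events are dropped before they reach the log. On a real system the unrelated tasks usually far outnumber the pipeline threads, so the reduction would be correspondingly larger.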