KU: ACS : Functional Programming Environment (FPE)

Functional Programming Environment (FPE)

Functional programming languages are well-suited for expressing digital signal processing (DSP) algorithms. A signal processing algorithm is typically a recipe for solving a group of mathematical equations. Expressing such an algorithm in a low-level language such as C or Fortran is possible, but the resulting program is typically over specified and constrains how the algorithm may be implemented. For example, a loop in C will be expressed as an iterative sequence of operations when in reality, it may be possible to execute the operations in parallel. Functional programming languages, on the other hand, eliminate assignment from the language, and as a result there are fewer unintended constraints on how the program is executed. This in turn gives a compiler more freedom in mapping the language to a target language.

Functional programming languages are particularly useful for hardware specification. Functional programs are essentially dataflow programs and thus map more readily to hardware that consists of functional units connected by dataflow paths. Furthermore, a transformer can select from several implementations of the algorithm in hardware. For example, if speed is critical then functional units may be replicated and run in parallel when possible, but if space or power is critical then operations that could run in parallel are instead run sequentially using the same hardware.

Using a functional language not only facilitates the transformation to FPGAs, it also facilitates the determination of static properties of a program. Consider timing properties as an example. If the program is written in an imperative language, then the timing properties of the algorithm must be determined when the program is written. There is little chance to adjust timing later in the design process or at execution time.

The Functional Programming Environment (FPE) we propose is a programming environment designed to facilitate the specification of DSP applications and the translation of such applications to soft-hardware. The environment consists of a first-order functional language, a compiler from that language to hardware, and a resource management system to manage partially compiled programs. Each aspect of the environment is addressed in turn.

Language design

The FPE language is a first-order functional language. The language supports streams [6] of integers as its basic datatype. A stream is conceptually an infinite sequence of values. This datatype is particularly appropriate for DSP applications that process a continuous sequence of input values and produce a continuous sequence of output values. Furthermore, in contrast to a function from integers to integers, a function that maps a stream to a stream can manipulate several input and output values at each time interval. An input pattern recognition function is a common example of a function that may manipulate multiple stream values on each time interval.

The primitives of the language will be selected based on our experience with hand-constructing several radio and SAR DSP applications for FPGA boards. Examples include adders, multipliers, and filters, but other primitives may also be useful. At the language level these primitives will be expressed as primitive operators on streams. For example, a ten-bit adder might take as input two streams of ten-bit integers and produce a stream of ten-bit integers.

It will be possible to compute static properties of programs and use pragmas to guide compilation. The choice of pragmas and the static properties computed are dictated largely by the goals of the compiler. In particular, we expect that the compilation process will be constrained by three implementation characteristics: speed, space, and power. The pragmas allow the programmer to direct how the compiler should factor these constraints into the compilation process. For example, the programmer may instruct the compiler to optimize for power at the expense of speed. In terms of static properties computed, examples include timing behavior and power consumption as well as semantic properties of the program such as overflow and underflow behavior.

Compiler architecture

The compiler takes FPE programs as input and maps them to specifications that may be used to program soft-hardware. The compiler may be roughly divided into a front end that performs hardware-independent transformations on a source program and a back end that performs transformations designed to optimize the program for a particular hardware target. The compiler supports separate compilation of code modules and combines them in the back end to produce an application. The front- and back-ends are described separately.

The front end performs source-to-source program transformations and computes global program properties. Program transformations include common sub-expression elimination, procedure integration, copy propagation and constant folding, algebraic transformations, and loop optimizations such as unrolling and moving invariant computations out of loops. Global properties that might be computed include properties such as timing and size characteristics.

After the front end is complete the transformed program is written to a code module database. This database consists of compiled code as well as the code's static properties, as computed by the front end. This information is ultimately used by the back end when generating applications.

Issues associated with the front end's implementation are similar to issues in a traditional compiler, but they are complicated by pragmas. For example, procedure integration and loop unrolling both increase code size but may expose opportunities for copy propagation and constant folding. Fortunately, there has been some work on effective strategies for optimistically allowing code to grow in order to expose optimization opportunities[10], and our choice of a functional source language simplifies some standard optimizations such as hoisting loop invariants out of loop bodies [7].

Besides program transformations the front end must compute global static properties of the input program. We expect that the front end will consult the module database to resolve properties of externally defined functions. Furthermore, we expect that the properties in which we will be interested can be computed compositionally with the primitives being the ground terms. We are prepared, however, to use iterative fixpoint algorithms to compute solutions to recursively defined properties.

At the completion of front-end transformations, the program describes hardware far above the level of actual implementation. The program does not explicitly express timing constraints for example, and it does not specify how components are to be arranged on the FPGA. It also may refer to functions in separately compiled code modules. The role of the back end is to use the code database to combine separately compiled modules, satisfy pragmas on the source program regarding space, speed, and power constraints, and generate a specification that can be used to program soft-hardware.

The primary issues with the back end are respecting pragmas and determining the physical layout of components. While determining layout on the FPGA is a nontrivial problem, there are design tools that take position-independent hardware descriptions and map them onto an FPGA. We expect to target these high-level descriptions initially and ultimately incorporate our own algorithms for determining physical layout.

We expect to handle pragmas in two steps. The initial implementation of the compiler will optimize for only one criterion, but revised implementations will optimize for multiple criteria. For example, it will be possible to write a program that should be optimized for both space and speed, and the compiler will attempt to find a balance between the two.

While optimizing for a single criterion is relatively straightforward, optimizing for multiple criteria is a challenge. There is no analog in the compilers community from which to draw experience. We expect to use a constraint-based strategy in which tolerances are specified quantitatively, and the system attempts to find a compilation strategy that meets the tolerances. Although ideally the constraint-satisfaction process is global across the entire compiler, we expect that our preliminary implementation will attempt to satisfy constraints on a local basis. In other words, each transformation or group of transformations will solve for the constraints independently of previous and following transformations. While this may preclude some solutions, we expect that this approach is tractable while the global approach is not.

Execution-Time Resource Management

A given DSP application may remain largely invariant from one day to the next, but small parts of it may need to be changed. For example, an airplane may be operating over Germany one day and need its radio tailored for the needs of operating in German airspace. The next day, however, that same plane may be over Saudi Arabia and need different signal processing software.

One solution to this problem is to reload the hardware with a complete application designed for the needs of the airplane's crew. This solution, however, requires anticipating the crew's needs when the applications are developed and sent into the field. A more flexible solution would allow the crew to plug components into a parameterized application or more generally, combine isolated components to build an entirely new application for their hardware. By analogy with typical software systems, the more flexible solution corresponds to sending object code into the field and allowing the crews to build applications by linking the object code in whatever way suits their needs.

The proposed FPE environment supports this more flexible solution, although there are issues that must be addressed. After front-end compilation is complete, the compilation process could be suspended at any step. The current intermediate code can be taken as object code, and the compilation completed in the field, perhaps after combining it with other code modules. We envision two points at which the compilation may be suspended.

The first point is immediately after instantiation-independent optimizations have been performed. Object code taken at this point could be shipped as a library into the field where it would further be compiled for whatever programmable hardware is available. This allows generic object code modules to be distributed without concern for the hardware on which it will run. It also allows hardware to be interchanged without the concern of whether or not the compiled code can be compiled to the new hardware.

The second point is after some machine-dependent optimizations have been performed but before the compiler begins making layout decisions. The only remaining step to build an application is to collect the object code, resolve external references, and make layout decisions. One could envision this object code in read-only memory of a shrink-wrapped system, and the application built as part of the system_s configuration sequence.

The issues involved with using the proposed FPE for this model of code distribution includes determining where to stop compilation and whether incorporating these cutpoints will limit the compiler's precision in mapping to soft-hardware. The first issue is a design decision that will be driven by applied uses of the environment. The latter is complicated by experience with modular compilers. A modular compiler may accept programs in multiple source languages and emit code in multiple output languages. For example, a compiler may accept both C and Fortran as the input language, compile them to a common intermediate format, and then continue processing. Similarly, a compiler may target both Alpha and MIPS hardware and invoke separate back ends depending on the target. The disadvantage of such compilers is that in exchange for modularity they do not generate code of the same quality as a compiler that commits to a single source and target language. To what extent this will affect the FPE compiler is an open question.

In either of the above cases, there is a need for execution time management of the processing functions and their mapping to hardware resources. The FPE will support mechanisms to store multiple processing functions, assign those functions to hardware components on demand, and monitor the execution of those functions.

Some Useful links