#### System-wide Energy Optimization for Multiple DVS Components and Real-time Tasks

Heechul Yun, Po-Liang Wu, Anshu Arya, Tarek Abdelzaher, Cheolgi Kim, and Lui Sha



illinois.edu

## DVS in Real-time Systems

- The Goal
  - Minimize energy consumption by adjusting frequency and voltage while still meeting deadlines
- Most consider CPU only
  - Assume execution time depends on CPU freq.
- But memory and bus are also important
  - Affect execution time (e.g., a memory-intensive application slows down if the memory or bus is slow)
  - Consume considerable energy (on the same order as the CPU)
  - Are DVS-capable in many recent embedded processors



### Motivation

Memxfer5b : memory benchmark program





#### Motivation

Dhrystone: CPU benchmark program





#### Contents

- Motivation
- Energy Model
  - Considers CPU, BUS and Memory and task characteristics
  - Evaluation (Model validation)
- Energy Optimization of Real-time Tasks
  - Static multi-DVS problem and solution
  - Evaluation
- Conclusion



## Task Model



• Task = Computation + Memory fetch



# Task Model (3)

• Execution time of a task

$$e = \frac{C}{f_c} + \frac{M}{f_m}$$

- $C$: CPU cycles of a given task
- $M$: memory cycles of a given task
- $f_c$: CPU clock frequency
- $f_m$: memory clock frequency
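The execution-time model can be tried out with a few lines of Python. This is a minimal sketch: the function name and the cycle counts are illustrative assumptions, and the clock values simply reuse the 200 MHz and 100 MHz maxima quoted for the evaluation platform.

```python
def execution_time(cpu_cycles: float, mem_cycles: float,
                   f_cpu_hz: float, f_mem_hz: float) -> float:
    """Return e = C / f_c + M / f_m in seconds."""
    if f_cpu_hz <= 0 or f_mem_hz <= 0:
        raise ValueError("clock frequencies must be positive")
    return cpu_cycles / f_cpu_hz + mem_cycles / f_mem_hz


# Hypothetical task: 140e6 CPU cycles and 30e6 memory cycles.
C, M = 140e6, 30e6
e_max = execution_time(C, M, f_cpu_hz=200e6, f_mem_hz=100e6)
e_slow_mem = execution_time(C, M, f_cpu_hz=200e6, f_mem_hz=50e6)

print(f"all clocks at maximum: {e_max:.2f} s")
print(f"memory clock halved:   {e_slow_mem:.2f} s")
```

Halving only the memory clock lengthens the task even though the CPU clock is unchanged, which is the effect a CPU-only model misses.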

#### Power Model

• Power of a component (e.g., CPU)

$$W = kfV^2 + R$$

- $k$: capacitance constant
- Different $k$ for different modes:
  - $k_{active}$: active mode capacitance
  - $k_{standby}$: standby mode capacitance
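A small sketch of the power model follows. The function name and all numeric values are hypothetical and chosen only to show how the active and standby constants change the result; SI units (farads, hertz, volts, watts) are assumed.

```python
def component_power(k_farads: float, f_hz: float, v_volts: float,
                    static_w: float = 0.0) -> float:
    """Return W = k * f * V^2 + R in watts."""
    return k_farads * f_hz * v_volts ** 2 + static_w


# Hypothetical constants for one component (not measured values).
K_ACTIVE = 0.5e-9    # F, active mode
K_STANDBY = 0.2e-9   # F, standby mode
R_STATIC = 0.05      # W, static power

active_w = component_power(K_ACTIVE, 200e6, 1.8, R_STATIC)
standby_w = component_power(K_STANDBY, 200e6, 1.8, R_STATIC)

print(f"active:  {active_w * 1e3:.1f} mW")
print(f"standby: {standby_w * 1e3:.1f} mW")
```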



## **Energy Model**



• Total system energy is

$$E = E_{comp} + E_{mem} + E_{idle}$$



## **Pure Computation Block**



## Memory Fetch Block



## Idle Block



- $I$: idle mode power consumption
- $e$: execution time ($C/f_c + M/f_m$)

## **Energy Model Summary**



- System wide energy model
  - Considers CPU, bus, and memory power consumption
  - Considers active, standby and idle modes
  - Other components are assumed to be static (included in R)



#### **Energy Equation**

$$E = E_{comp} + E_{mem} + E_{idle}$$

$$= (k_{ca}V_{cpu}^2f_c + k_{bs}V_{bus}^2f_b + k_{ms}V_{mem}^2f_m + R) \times \frac{C}{f_c} \quad \text{(computation block)}$$

$$+ (k_{cs}V_{cpu}^2f_c + k_{ba}V_{bus}^2f_b + k_{ma}V_{mem}^2f_m + R) \times \frac{M}{f_m} \quad \text{(memory fetch block)}$$

$$+ (I+R) \times (P-e) \quad \text{(idle block)}$$

• System-wide energy consumption of a task during period P
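The three-block structure of the energy equation can be sketched as follows. The `Component` class and `task_energy` function are illustrative names, not part of the original work, and the numbers in the usage lines are hypothetical. SI units are assumed throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    k_active: float    # F, capacitance constant in active mode
    k_standby: float   # F, capacitance constant in standby mode
    voltage: float     # V
    freq: float        # Hz

    def power(self, active: bool) -> float:
        """Dynamic power k * V^2 * f for the selected mode."""
        k = self.k_active if active else self.k_standby
        return k * self.voltage ** 2 * self.freq


def task_energy(cpu, bus, mem, C, M, P, idle_w, static_w):
    """Energy of one task over its period P: E_comp + E_mem + E_idle."""
    t_comp = C / cpu.freq
    t_mem = M / mem.freq
    e = t_comp + t_mem
    if e > P:
        raise ValueError("execution time exceeds the period")
    # Computation block: CPU active, bus and memory in standby.
    p_comp = cpu.power(True) + bus.power(False) + mem.power(False) + static_w
    # Memory fetch block: CPU in standby, bus and memory active.
    p_mem = cpu.power(False) + bus.power(True) + mem.power(True) + static_w
    # Idle block: idle power plus static power for the rest of the period.
    p_idle = idle_w + static_w
    return p_comp * t_comp + p_mem * t_mem + p_idle * (P - e)


# Hypothetical components and task.
cpu = Component(0.5e-9, 0.2e-9, 1.8, 200e6)
bus = Component(0.3e-9, 0.1e-9, 1.8, 100e6)
mem = Component(0.3e-9, 0.1e-9, 1.8, 100e6)
print(f"{task_energy(cpu, bus, mem, 140e6, 30e6, 3.0, 0.007, 0.067):.4f} J")
```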

#### **Evaluation Platform**





# Evaluation Platform (2)

- ARM9 based SoC
  - CPU: up to 200 MHz, bus: up to 100 MHz
  - CPU and bus are synchronous (bus = CPU/N)
  - Memory (PSRAM) frequency is equal to the system bus frequency $(f_b = f_m)$
  - CPU, bus, and memory all share a common voltage
  - Vdd: 1.504 V ~ 1.804 V (0.32 V step)
- Energy equation

$$E = (k_{ca}V^{2}f_{c} + k_{ms}^{*}V^{2}f_{m} + R) \times \frac{C}{f_{c}} + (k_{cs}V^{2}f_{c} + k_{ma}^{*}V^{2}f_{m} + R) \times \frac{M}{f_{m}} + (I+R) \times (P-e)$$

- $V$: shared voltage for CPU, bus, and memory
- $k_{ma}^{*}$: active bus and memory constant
- $k_{ms}^{*}$: standby bus and memory constant
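As a consistency check on the platform equation, the products can be expanded for a fixed shared voltage $V$. Holding $V$ fixed is an assumption made here for illustration only; the slides do not state how $V$ is tied to the clock frequencies. With $e = C/f_c + M/f_m$, the $R$ terms of the two active blocks cancel against the $-R\,e$ part of the idle block:

$$
\begin{aligned}
E &= V^2 (k_{ca} C + k_{ma}^{*} M) + V^2 \left( k_{ms}^{*} C \frac{f_m}{f_c} + k_{cs} M \frac{f_c}{f_m} \right) + R\,e + (I+R)(P-e) \\
  &= V^2 (k_{ca} C + k_{ma}^{*} M) + V^2 \left( k_{ms}^{*} C \frac{f_m}{f_c} + k_{cs} M \frac{f_c}{f_m} \right) + (I+R)\,P - I\,e
\end{aligned}
$$

Under that assumption, the standby terms depend on the clocks only through the ratio $f_m/f_c$, which hints at why choosing the memory clock separately from the CPU clock can matter.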

# Validation

- Methodology
  - 4 synthetic programs with different cache stall ratios (0%, 10%, 25%, 55%)
  - 8 clock configurations (f<sub>c</sub>, f<sub>m</sub>) for each program
  - Nonlinear least-squares fit of the energy equation to all 32 data points (4 programs × 8 configurations)

## **Energy Model Fitting**



#### **Energy Equation for Our Platform**

$$E = (k_{ca}V^{2}f_{c} + k_{ms}^{*}V^{2}f_{m} + R) \times \frac{C}{f_{c}} + (k_{cs}V^{2}f_{c} + k_{ma}^{*}V^{2}f_{m} + R) \times \frac{M}{f_{m}} + (I+R) \times (P-e)$$

| $k_{ca}$ (nF) | $k_{cs}$ (nF) | $k_{ma}^{*}$ (nF) | $k_{ms}^{*}$ (nF) | $I$ (mW) | $R$ (mW) |
|---------------|---------------|-------------------|-------------------|----------|----------|
| 0.505         | 0.224         | 0.540             | 0.210             | 6.570    | 67.434   |

#### Obtained coefficients in the energy equation
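The fitted coefficients can be plugged back into the platform energy equation. The sketch below assumes the table values convert directly to SI units (nF to F, mW to W) with $V$ in volts and frequencies in Hz; the function name, the example workload, and the choice of voltage are illustrative assumptions.

```python
# Fitted coefficients from the table, converted to SI units (assumption).
K_CA = 0.505e-9      # F, CPU active
K_CS = 0.224e-9      # F, CPU standby
K_MA = 0.540e-9      # F, bus + memory active
K_MS = 0.210e-9      # F, bus + memory standby
I_IDLE = 6.570e-3    # W, idle mode power
R_STATIC = 67.434e-3 # W, static power


def platform_energy(C, M, P, f_c, f_m, V):
    """Energy in joules of a task with C CPU cycles and M memory cycles over period P."""
    e = C / f_c + M / f_m
    if e > P:
        raise ValueError("execution time exceeds the period")
    p_comp = K_CA * V**2 * f_c + K_MS * V**2 * f_m + R_STATIC
    p_mem = K_CS * V**2 * f_c + K_MA * V**2 * f_m + R_STATIC
    return p_comp * C / f_c + p_mem * M / f_m + (I_IDLE + R_STATIC) * (P - e)


# Example workload (illustrative): 140e6 CPU cycles, 30e6 memory cycles, 3 s period.
for f_c, f_m in [(200e6, 100e6), (200e6, 50e6), (100e6, 100e6)]:
    E = platform_energy(140e6, 30e6, 3.0, f_c, f_m, V=1.804)
    print(f"f_c={f_c / 1e6:5.0f} MHz  f_m={f_m / 1e6:5.0f} MHz  E={E:.4f} J")
```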



#### Contents

- Motivation
- Energy Model
  - Considers CPU, BUS and Memory and task characteristics
  - Evaluation
- Energy Optimization of Real-time Tasks
  - Static Multi-DVS Problem and optimal solution
  - Evaluation
- Conclusion



## Static Multi-DVS Problem

Given a set of periodic real-time tasks (T<sub>1</sub>, ..., T<sub>n</sub>), where each invocation of task T<sub>i</sub> requires at most C<sub>i</sub> CPU cycles and at most M<sub>i</sub> memory cycles,

find the energy-optimal static frequencies for the multiple DVS-capable components (CPU, bus, and memory).



#### Problem Formulation

Minimize

$$\sum_{i=1}^{n} \frac{H}{P_i} (E_{comp,i} + E_{mem,i}) + E_{idle}$$

subject to

$$\sum_{i=1}^{n} \frac{e_i}{P_i} \le 1$$

where

- $H$: hyper-period
- $e_i$: execution time of task $i$
- $E_{comp,i}$: computation block energy of task $i$
- $E_{mem,i}$: cache stall block energy of task $i$
- $E_{idle}$: idle block energy
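The quantities in the formulation can be computed for a concrete task set as sketched below. The two tasks are hypothetical, chosen so that their hyper-period totals come to $140 \times 10^6$ CPU cycles and $30 \times 10^6$ memory cycles over 3 s. The function name is an assumption, periods are given in integer milliseconds so that `math.lcm` (Python 3.9+) applies, and the hyper-period is taken to be the least common multiple of the periods.

```python
from math import lcm


def summarize(tasks, f_c, f_m):
    """tasks: list of (C_i, M_i, P_i_ms). Returns (H_s, C_H, M_H, utilization)."""
    H_ms = lcm(*(p for _, _, p in tasks))
    # Each task runs H / P_i times per hyper-period.
    C_H = sum((H_ms // p) * c for c, _, p in tasks)
    M_H = sum((H_ms // p) * m for _, m, p in tasks)
    # Constraint from the formulation: sum of e_i / P_i must not exceed 1.
    utilization = sum((c / f_c + m / f_m) / (p / 1000) for c, m, p in tasks)
    return H_ms / 1000, C_H, M_H, utilization


# Hypothetical task set: (CPU cycles, memory cycles, period in ms).
tasks = [(30e6, 5e6, 1000), (50e6, 15e6, 3000)]
H, C_H, M_H, U = summarize(tasks, f_c=200e6, f_m=100e6)

print(f"H = {H} s, C_H = {C_H:.3g}, M_H = {M_H:.3g}")
print(f"utilization = {U:.3f}, schedulable = {U <= 1}")
```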



# **Optimal Solution**

- Intuitive procedure
  - Find an unconstrained minimum over  $f_c$  and  $f_m$  ( $f_b = f_m$ )
  - Check boundary conditions due to system-specific constraints (e.g., minimum and maximum clock range)
  - Details are in the paper





Task set: $C_H = 140 \times 10^6$, $M_H = 30 \times 10^6$, $H = 3$ s

## Evaluation

- Compare the following schemes:
  - MAX
    - CPU and memory frequencies are both set to maximum
  - CPU-only static DVS
    - Memory frequency is set to maximum
  - Baseline static multi-DVS
    - CPU and memory frequencies change proportionally
  - Optimal static multi-DVS
    - Proposed scheme
  - Optimal dynamic multi-DVS
    - Can change frequencies each time a task is scheduled
    - Brute-force search over all possible combinations
- Simulation setup
  - Use energy equation obtained from measurements on our real hardware platform
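The static schemes can be compared with a small exhaustive search, sketched below. This is not the authors' analytic solution or their simulator. It makes several assumptions: the voltage is held at the maximum Vdd, the clock settings form a hypothetical grid (CPU 40 to 200 MHz in 10 MHz steps, divider N from 1 to 4, memory capped at 100 MHz), each scheme picks the lowest-energy feasible setting within its own restriction, and the table coefficients convert directly to SI units. The dynamic scheme is omitted.

```python
# Fitted coefficients from the slides' table (assumption: nF -> F, mW -> W).
K_CA, K_CS, K_MA, K_MS = 0.505e-9, 0.224e-9, 0.540e-9, 0.210e-9
I_IDLE, R_STATIC = 6.570e-3, 67.434e-3

V = 1.804                        # assumption: voltage fixed at maximum Vdd
C_H, M_H, H = 140e6, 30e6, 3.0   # task set quoted on the slides


def exec_time(fc_mhz, n):
    f_c = fc_mhz * 1e6
    f_m = f_c / n                # bus = CPU / N and f_b = f_m
    return C_H / f_c + M_H / f_m


def energy(fc_mhz, n):
    f_c = fc_mhz * 1e6
    f_m = f_c / n
    p_comp = K_CA * V**2 * f_c + K_MS * V**2 * f_m + R_STATIC
    p_mem = K_CS * V**2 * f_c + K_MA * V**2 * f_m + R_STATIC
    idle = (I_IDLE + R_STATIC) * (H - exec_time(fc_mhz, n))
    return p_comp * C_H / f_c + p_mem * M_H / f_m + idle


# Hypothetical grid of (CPU MHz, divider N) settings.
ALL = [(fc, n) for fc in range(40, 201, 10) for n in (1, 2, 3, 4)
       if fc / n <= 100]

SCHEMES = {
    "MAX": [(200, 2)],
    "CPU-only static DVS": [(fc, n) for fc, n in ALL if fc == 100 * n],
    "Baseline static multi-DVS": [(fc, n) for fc, n in ALL if n == 2],
    "Optimal static multi-DVS": ALL,
}


def best(configs):
    """Lowest-energy setting that still finishes within the hyper-period."""
    feasible = [c for c in configs if exec_time(*c) <= H]
    return min(feasible, key=lambda c: energy(*c))


results = {name: best(cfgs) for name, cfgs in SCHEMES.items()}
for name, (fc, n) in results.items():
    print(f"{name:26s} f_c={fc:3d} MHz  f_m={fc / n:6.1f} MHz  "
          f"E={energy(fc, n):.4f} J")
```

Because every restricted scheme searches a subset of the full grid, the unrestricted search can never do worse than the others under these assumptions.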





Task set cache stall ratio ($M_H/(C_H+M_H)$): 0.3



Task set utilization ratio ($e_H/H$): 0.5

#### Effect of Diversity of Cache Stall Ratio



# Conclusion

- Energy model
  - Considers multiple DVS-capable components and task characteristics
  - Validated on a real hardware platform
- Static multi-DVS problem
  - Assigns energy-optimal *static* frequencies to multiple DVS components for periodic real-time tasks
  - The optimal solution (static multi-DVS scheme) saves more energy than CPU-only DVS



## Thank you.



#### **Additional Slides**



#### **CPU-only DVS**



#### Not effective in allowed range


(\*) Based on the energy equation for our h/w platform. The memory clock was set to maximum.

#### **Power Distribution**

- Cache stall ratio = 55%, (CPU, bus) = (80 MHz, 80 MHz)
- Cache stall ratio = 10%, (CPU, bus) = (80 MHz, 80 MHz)





(\*) Based on the energy equation for our h/w platform: $E = E_{cpu} + E_{mem} + E_{static}$

#### Active and Idle



