Andrew Farmer

afarmer@ittc.ku.edu

Functional Programming Group

Information and Telecommunication Technology Center

University of Kansas

February 8, 2013

More General

More Specific

More General

More Specific

Can be performed mechanically

More General

More Specific

Can be performed mechanically

Strictly Improving

Broadly Applicable

No Eureka Steps

Broadly Applicable

No Eureka Steps

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
main = print (foldl (+) 0 [1..10000])
```

Notice that `f` is passed unmodified in the recursive call. It is static.

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
main = print (foldl (+) 0 [1..10000])
```

Notice that `f` is passed unmodified in the recursive call. It is static.

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z ys = let go acc [] = acc
go acc (x:xs) = go (f acc x) xs
in go z ys
main = print (foldl (+) 0 [1..10000])
```

After inlining foldl.

```
main = print (let go acc [] = acc
go acc (x:xs) = go (acc + x) xs
in go 0 [1..10000])
```

Runs about 15% faster.

After inlining foldl.

```
main = print (let go acc [] = acc
go acc (x:xs) = go (acc + x) xs
in go 0 [1..10000])
```

~~Runs about 15% faster.~~

Runs about 20x faster!

Downside: Code Explosion

```
foo n = (foldl (*) 1 [1..n]) `div` (foldl (+) 0 [1..n])
main = print (foo 10000)
```

Downside: Code Explosion

```
foo n = (foldl (*) 1 [1..n]) `div` (foldl (+) 0 [1..n])
main = print (foo 10000)
```

After inlining.

```
foo n = (let go acc [] = acc
go acc (x:xs) = go (acc * x) xs
in go 1 [1..n])
`div`
(let go acc [] = acc
go acc (x:xs) = go (acc + x) xs
in go 0 [1..n])
main = print (foo 10000)
```

Compiler falls back to heuristics.

- Function must have at least two static arguments.
- Do not perform on mutually recursive groups.
- Do not force inlining of wrapper.

The user does not have control.

Compiler falls back to heuristics.

- Function must have at least two static arguments.
- Do not perform on mutually recursive groups.
- Do not force inlining of wrapper.

The user does not have control.

```
{-# INLINE foldl #-}
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z ys = let go acc [] = acc
go acc (x:xs) = go (f acc x) xs
in go z ys
```

This code mixes functional and operational concerns.

Ideally we could separate functional concerns from operational concerns.

```
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
main = print (foldl (+) 0 [1..10000])
```

with 'foldl static-argument with 'main any-call (inline 'foldl)

"The idea is this: that ordinary programmers (i.e. not compiler writers) should be able to build libraries that, through the medium of programmer-speciﬁed rewrite rules, effectively extend the compiler with domain-specific knowledge."

S. Peyton Jones. Call-Pattern Specialisation for Haskell Programs. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional programming, pages 327-337, Freiburg, Germany, 2007.

- Compiler Flags
- Pragmas
- Rewrite Rules

- Enable and disable certain optimization passes.
- Specified externally.
- No fine-grained control.
- There are a lot of them.

Idea: Use genetic algorithms to find optimum compiler flags for your program. |

```
ghc ex1.hs -Odph --make -fforce-recomp -fllvm -optlo-O2
-optlo-globalopt -optlo-loop-unswitch
-optlo-mem2reg -optlo-prune-eh
```

"Now, it is unclear to me why these optimizations where[sic] what was needed..."

http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

- Annotate code with suggestions to optimizer.
- Require intimate knowledge of compiler.
- Obfuscate source.

```
data SPEC = SPEC | SPEC2
{-# ANN type SPEC ForceSpecConstr #-}
foldl :: (a -> b -> a) -> a -> Stream b -> a
{-# INLINE [1] foldl #-}
foldl f z (Stream step s _) = foldl_loop SPEC z s
where
foldl_loop !sPEC z s =
case step s of
Yield x s' -> foldl_loop sPEC (f z x) s'
Skip -> foldl_loop sPEC z s'
Done -> z
```

"This is all quite ugly; we ought to come up with a better design."

http://darcs.haskell.org/ghc/compiler/specialise/SpecConstr.lhs

- Extend compiler with equational-reasoning rewrites.
- Used extensively by library writers.

```
{-# RULES "map/map"
forall f g xs. map f (map g xs) = map (f . g) xs
#-}
```

Limitations

- No side conditions
- LHS must be function application or constant
- No control over rule application

A transformation that eliminates intermediate data structures.

As an example, consider this function defined using list combinators.

`f n = sum (map (*2) (enumFromTo 1 n))`

A transformation that eliminates intermediate data structures.

As an example, consider this function defined using list combinators.

`f n = sum (map (*2) (enumFromTo 1 n))`

This alternative definition is faster, because it does not allocate any intermediate data structures.

```
f n = let go acc x =
case x > n of
True -> acc
False -> go (acc + x * 2) (x + 1)
in go 0 1
```

Stream fusion allows us to write the first definition and get the performance of the second.

Using a type that can represent lists.

```
data Stream a = forall s. Stream (s -> Step a s) s
data Step a s = Done | Skip s | Yield a s
```

And conversion functions between lists and streams.

```
unstream :: Stream a -> [a]
stream :: [a] -> Stream a
```

List combinators can be written in terms of stream combinators.

```
foldl f z xs = foldlS f z (stream xs)
map f xs = unstream (mapS f (stream xs))
enumFromTo x y = unstream (enumFromToS x y)
```

`f n = foldl (+) 0 (map (*2) (enumFromTo 1 n))`

Inline list combinators.

```
f n = foldlS (+) 0
(stream (unstream
(mapS (*2)
(stream (unstream
(enumFromToS 1 n))))))
```

Apply fusion rule.

```
{-# RULES "stream fusion" forall s. stream (unstream s) = s #-}
f n = foldlS (+) 0 (mapS (*2) (enumFromToS 1 n))
```

Optimize to taste.

- Implemented as a Haskell library.
- Can optimize
`foldl`,`filter`, and`zip`. - Cannot optimize
`concatMap`, because it is too expressive.

- Implemented as a Haskell library.
- Can optimize
`foldl`,`filter`, and`zip`. - Cannot optimize
`concatMap`, because it is too expressive.

```
foreach x in xs {
case x:
A: foreach y in ys { ... }
B: foreach z in zs { ... }
...
}
```

- Rewrites expressible in HERMIT can optimize common cases.

Stream Fusion cannot optimize `concatMap`.

`concatMap :: (a -> Stream b) -> Stream a -> Stream b`

It can, however optimize this `concatMap`-like operation.

`concatMap' :: (s -> Step a s) -> (a -> s) -> Stream a -> Stream b`

In some (common) cases, we can transform the first into the second.

```
forall next f.
concatMap (\x -> Stream next (f x)) = concatMap' next f
```

This rewrite cannot be expressed using GHC's rewrite rules.

Stream Fusion appears to be an instance of a more general transformation, Worker/Wrapper.

Challenge: transform a recursive list function into an non-recursive stream function.

- Extend Stream Fusion to optimize
`concatMap`. - Implement this extended transformation using my language for specifying optimizations.
- Demonstrate Stream Fusion is an instance of Worker/Wrapper.

Evaluation:

Use extended Stream Fusion framework to optimize ADPfusion examples.

C. Höner zu Siederdissen. Sneaking around concatMap: Efficient Combinators for Dynamic Programming. Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, pages 215–226, Copenhagen, Denmark, 2012.

- Use HERMIT, tell me how to make it better!
- I need more examples!