Towards a HERMIT-based DSL for Specifying Optimizations in GHC


Andrew Farmer
afarmer@ittc.ku.edu

Functional Programming Group
Information and Telecommunication Technology Center
University of Kansas

February 8, 2013

Optimization Spectrum

More General
More Specific

Optimization Spectrum

More General
More Specific
Can be performed mechanically

Optimization Spectrum

More General
More Specific
Can be performed mechanically
Strictly Improving
Broadly Applicable
No Eureka Steps

Example: Static-Argument Transformation

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

main = print (foldl (+) 0 [1..10000])

Notice that f is passed unmodified in the recursive call. It is static.

Example: Static-Argument Transformation

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

main = print (foldl (+) 0 [1..10000])

Notice that f is passed unmodified in the recursive call. It is static.

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z ys = let go acc []     = acc
                   go acc (x:xs) = go (f acc x) xs
               in go z ys

main = print (foldl (+) 0 [1..10000])

Example: Static-Argument Transformation

After inlining foldl.

main = print (let go acc []     = acc
                  go acc (x:xs) = go (acc + x) xs
              in go 0 [1..10000])

Runs about 15% faster.

Example: Static-Argument Transformation

After inlining foldl.

main = print (let go acc []     = acc
                  go acc (x:xs) = go (acc + x) xs
              in go 0 [1..10000])

Runs about 15% faster.

Runs about 20x faster!

Example: Static-Argument Transformation

Downside: Code Explosion

foo n = (foldl (*) 1 [1..n]) `div` (foldl (+) 0 [1..n])

main = print (foo 10000)

Example: Static-Argument Transformation

Downside: Code Explosion

foo n = (foldl (*) 1 [1..n]) `div` (foldl (+) 0 [1..n])

main = print (foo 10000)

After inlining.

foo n = (let go acc []     = acc
             go acc (x:xs) = go (acc * x) xs
         in go 1 [1..n])
        `div`
        (let go acc []     = acc
             go acc (x:xs) = go (acc + x) xs
         in go 0 [1..n])

main = print (foo 10000)

Example: Static-Argument Transformation

Compiler falls back to heuristics.

The user does not have control.

Example: Static-Argument Transformation

Compiler falls back to heuristics.

The user does not have control.

{-# INLINE foldl #-}
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z ys = let go acc []     = acc
                   go acc (x:xs) = go (f acc x) xs
               in go z ys

This code mixes functional and operational concerns.

Example: Static-Argument Transformation

Ideally we could separate functional concerns from operational concerns.

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = foldl f (f z x) xs

main = print (foldl (+) 0 [1..10000])
with 'foldl
    static-argument

with 'main
    any-call (inline 'foldl)

Quote

"The idea is this: that ordinary programmers (i.e. not compiler writers) should be able to build libraries that, through the medium of programmer-specified rewrite rules, effectively extend the compiler with domain-specific knowledge."
S. Peyton Jones. Call-Pattern Specialisation for Haskell Programs. In Proceedings of the 12th ACM SIGPLAN International Conference on Functional programming, pages 327-337, Freiburg, Germany, 2007.

Interfaces to GHC's Optimizer

Flags

Flags

Idea: Use genetic algorithms to find optimum compiler flags for your program.
ghc ex1.hs -Odph --make -fforce-recomp -fllvm -optlo-O2
           -optlo-globalopt -optlo-loop-unswitch
           -optlo-mem2reg -optlo-prune-eh
"Now, it is unclear to me why these optimizations where[sic] what was needed..."

http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/

Pragmas

data SPEC = SPEC | SPEC2
{-# ANN type SPEC ForceSpecConstr #-}

foldl :: (a -> b -> a) -> a -> Stream b -> a
{-# INLINE [1] foldl #-}
foldl f z (Stream step s _) = foldl_loop SPEC z s
    where
        foldl_loop !sPEC z s =
            case step s of
                Yield x s' -> foldl_loop sPEC (f z x) s'
                Skip       -> foldl_loop sPEC z s'
                Done       -> z
"This is all quite ugly; we ought to come up with a better design."

http://darcs.haskell.org/ghc/compiler/specialise/SpecConstr.lhs

Rewrite Rules

{-# RULES "map/map"
    forall f g xs. map f (map g xs) = map (f . g) xs
#-}

Limitations

Case Study: Stream Fusion

A transformation that eliminates intermediate data structures.

As an example, consider this function defined using list combinators.

f n = sum (map (*2) (enumFromTo 1 n))

Case Study: Stream Fusion

A transformation that eliminates intermediate data structures.

As an example, consider this function defined using list combinators.

f n = sum (map (*2) (enumFromTo 1 n))

This alternative definition is faster, because it does not allocate any intermediate data structures.

f n = let go acc x =
            case x > n of
                True  -> acc
                False -> go (acc + x * 2) (x + 1)
      in go 0 1

Stream fusion allows us to write the first definition and get the performance of the second.

Stream Fusion

Using a type that can represent lists.

data Stream a = forall s. Stream (s -> Step a s) s
data Step a s = Done | Skip s | Yield a s

And conversion functions between lists and streams.

unstream :: Stream a -> [a]
stream   :: [a] -> Stream a

List combinators can be written in terms of stream combinators.

foldl f z xs = foldlS f z (stream xs)
map f xs = unstream (mapS f (stream xs))
enumFromTo x y = unstream (enumFromToS x y)

Stream Fusion

f n = foldl (+) 0 (map (*2) (enumFromTo 1 n))

Inline list combinators.

f n = foldlS (+) 0
        (stream (unstream
            (mapS (*2)
                (stream (unstream
                    (enumFromToS 1 n))))))

Apply fusion rule.

{-# RULES "stream fusion" forall s. stream (unstream s) = s #-}

f n = foldlS (+) 0 (mapS (*2) (enumFromToS 1 n))

Optimize to taste.

Case Study: Stream Fusion

Case Study: Stream Fusion

foreach x in xs {
    case x:
        A: foreach y in ys { ... }
        B: foreach z in zs { ... }
        ...
}

concatMap

Stream Fusion cannot optimize concatMap.

concatMap :: (a -> Stream b) -> Stream a -> Stream b

It can, however optimize this concatMap-like operation.

concatMap' :: (s -> Step a s) -> (a -> s) -> Stream a -> Stream b

In some (common) cases, we can transform the first into the second.

forall next f.
    concatMap (\x -> Stream next (f x)) = concatMap' next f

This rewrite cannot be expressed using GHC's rewrite rules.

Case Study: Stream Fusion

Stream Fusion appears to be an instance of a more general transformation, Worker/Wrapper.

Challenge: transform a recursive list function into an non-recursive stream function.

Case Study: Stream Fusion

Evaluation:

Use extended Stream Fusion framework to optimize ADPfusion examples.

C. Höner zu Siederdissen. Sneaking around concatMap: Efficient Combinators for Dynamic Programming. Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming, pages 215–226, Copenhagen, Denmark, 2012.

Conclusion