performance - Why doesn't the List fuse as well as the Vector? (Haskell) -
consider following benchmark:
module main import qualified data.list l import qualified data.vector.unboxed u import criterion.main goodsum :: int -> double {-# noinline goodsum #-} goodsum n = let ints = u.enumfromn 0 (n * n * 10) :: u.vector int in u.foldl' (+) 0 $ u.map fromintegral ints badsum :: int -> double {-# noinline badsum #-} badsum n = l.foldl' (+) 0.5 [fromintegral | <- [0 .. 10*n*n]] badsum2 :: int -> double {-# noinline badsum2 #-} badsum2 n = l.foldr (+) 0.5 [fromintegral | <- [0 .. 10*n*n]] worstsum :: int -> double {-# noinline worstsum #-} worstsum n = l.foldl1' (+) $ <- [0 .. n*n] return $ l.foldl1' (+) $ k <- [0 .. 10] return $ fromintegral $ k + main = defaultmain [ bench "good" $ nf goodsum 500 , bench "bad" $ nf badsum 500 , bench "bad2" $ nf badsum2 500 , bench "worst" $ nf worstsum 500 ]
the results:
benchmarking time 1.826 ms (1.819 ms .. 1.835 ms) 1.000 r² (1.000 r² .. 1.000 r²) mean 1.810 ms (1.803 ms .. 1.817 ms) std dev 23.18 μs (19.91 μs .. 27.96 μs) benchmarking bad time 38.38 ms (38.07 ms .. 38.74 ms) 1.000 r² (1.000 r² .. 1.000 r²) mean 38.23 ms (38.07 ms .. 38.38 ms) std dev 298.5 μs (220.6 μs .. 417.8 μs) benchmarking bad2 time 77.87 ms (73.74 ms .. 82.67 ms) 0.992 r² (0.976 r² .. 0.998 r²) mean 78.14 ms (75.33 ms .. 82.13 ms) std dev 5.184 ms (3.056 ms .. 7.966 ms) variance introduced outliers: 19% (moderately inflated) benchmarking worst time 80.80 ms (79.53 ms .. 82.10 ms) 1.000 r² (0.999 r² .. 1.000 r²) mean 80.73 ms (80.29 ms .. 81.19 ms) std dev 756.9 μs (591.9 μs .. 938.2 μs)
list comprehensions good producers , foldr
consumer, why didn't list fuse?
contrary question, foldr
, list comprehension did, in fact, fuse. however, have recall definition of foldr
, not not tail recursive function. prelude defines foldr
as, not compile down tight loop vector
based example.
foldr k z = go go [] = z go (y:ys) = y `k` go ys
the important bit of core generated badsum2
looks this
$wgo_s8ah [occ=loopbreaker] :: int# -> double# $wgo_s8ah = \ (w_s8ad :: int#) -> case tagtoenum# @ bool (==# w_s8ad y1_a69v) of _ [occ=dead] { false -> case $wgo_s8ah (+# w_s8ad 1) of ww_s8ag { __default -> +## (int2double# w_s8ad) ww_s8ag }; true -> +## (int2double# w_s8ad) 0.5
which equivalent function (modulo unboxed arithmetic)
badsum3 :: int -> double badsum3 n = go 0 stop = 10 * n * n go | == stop = fromintegral + 0.5 | otherwise = fromintegral + go (i + 1)
running through criterion, function gives same runtime badsum2
. while generated function not producing , consuming intermediate cons cells, still performing function calls , doing associated stack operations.
the poor performance for foldl'
based version due fact foldl'
not consumer, cannot fuse list comprehension. left-fold produce tail recursive loop walks list produced list comprehension, incurring overhead of allocation , associated memory operations.
i not sure if can same performance vector
operations using standard list operations, stream-fusion package provides list combinators using different fusion method, can achieve similar performance problem.
import qualified data.list.stream s -- stream-fusion package not seem have `enumfrom` functions enumerate :: int -> [int] enumerate n = s.unfoldr f 0 f | > n = nothing | otherwise = (i, + 1) goodsum2 :: int -> double goodsum2 n = s.foldl' (+) 0.5 $ s.map fromintegral $ enumerate (n * n * 10)
Comments
Post a Comment