performance - Why doesn't the List fuse as well as the Vector? (Haskell) -


consider following benchmark:

module main  import qualified data.list l import qualified data.vector.unboxed u import criterion.main    goodsum :: int -> double {-# noinline goodsum #-} goodsum n =   let ints = u.enumfromn 0 (n * n * 10) :: u.vector int   in u.foldl' (+) 0 $ u.map fromintegral ints  badsum :: int -> double {-# noinline badsum #-} badsum n = l.foldl' (+) 0.5 [fromintegral | <- [0 .. 10*n*n]]  badsum2 :: int -> double {-# noinline badsum2 #-} badsum2 n = l.foldr (+) 0.5 [fromintegral | <- [0 .. 10*n*n]]  worstsum :: int -> double {-# noinline worstsum #-} worstsum n = l.foldl1' (+) $   <- [0 .. n*n]   return $ l.foldl1' (+) $     k <- [0 .. 10]     return $ fromintegral $ k +  main =   defaultmain     [ bench "good" $ nf goodsum 500     , bench "bad" $ nf badsum 500     , bench "bad2" $ nf badsum2 500     , bench "worst" $ nf worstsum 500     ] 

the results:

benchmarking time                 1.826 ms   (1.819 ms .. 1.835 ms)                      1.000 r²   (1.000 r² .. 1.000 r²) mean                 1.810 ms   (1.803 ms .. 1.817 ms) std dev              23.18 μs   (19.91 μs .. 27.96 μs)  benchmarking bad time                 38.38 ms   (38.07 ms .. 38.74 ms)                      1.000 r²   (1.000 r² .. 1.000 r²) mean                 38.23 ms   (38.07 ms .. 38.38 ms) std dev              298.5 μs   (220.6 μs .. 417.8 μs)  benchmarking bad2 time                 77.87 ms   (73.74 ms .. 82.67 ms)                      0.992 r²   (0.976 r² .. 0.998 r²) mean                 78.14 ms   (75.33 ms .. 82.13 ms) std dev              5.184 ms   (3.056 ms .. 7.966 ms) variance introduced outliers: 19% (moderately inflated)  benchmarking worst time                 80.80 ms   (79.53 ms .. 82.10 ms)                      1.000 r²   (0.999 r² .. 1.000 r²) mean                 80.73 ms   (80.29 ms .. 81.19 ms) std dev              756.9 μs   (591.9 μs .. 938.2 μs) 

list comprehensions good producers , foldr consumer, why didn't list fuse?

contrary question, foldr , list comprehension did, in fact, fuse. however, have recall definition of foldr , not not tail recursive function. prelude defines foldr as, not compile down tight loop vector based example.

foldr k z = go       go []     = z     go (y:ys) = y `k` go ys 

the important bit of core generated badsum2 looks this

$wgo_s8ah [occ=loopbreaker] :: int# -> double# $wgo_s8ah =   \ (w_s8ad :: int#) ->     case tagtoenum#            @ bool (==# w_s8ad y1_a69v)     of _ [occ=dead] {       false ->         case $wgo_s8ah (+# w_s8ad 1) of ww_s8ag { __default ->         +## (int2double# w_s8ad) ww_s8ag         };       true -> +## (int2double# w_s8ad) 0.5 

which equivalent function (modulo unboxed arithmetic)

badsum3 :: int -> double badsum3 n = go 0       stop = 10 * n * n      go | == stop = fromintegral + 0.5          | otherwise = fromintegral + go (i + 1) 

running through criterion, function gives same runtime badsum2. while generated function not producing , consuming intermediate cons cells, still performing function calls , doing associated stack operations.

the poor performance for foldl' based version due fact foldl' not consumer, cannot fuse list comprehension. left-fold produce tail recursive loop walks list produced list comprehension, incurring overhead of allocation , associated memory operations.

i not sure if can same performance vector operations using standard list operations, stream-fusion package provides list combinators using different fusion method, can achieve similar performance problem.

import qualified data.list.stream s  -- stream-fusion package not seem have `enumfrom` functions enumerate :: int -> [int] enumerate n = s.unfoldr f 0       f | > n     = nothing         | otherwise = (i, + 1)  goodsum2 :: int -> double goodsum2 n = s.foldl' (+) 0.5 $ s.map fromintegral $ enumerate (n * n * 10) 

Comments

Popular posts from this blog

python - mat is not a numerical tuple : openCV error -

c# - MSAA finds controls UI Automation doesn't -

wordpress - .htaccess: RewriteRule: bad flag delimiters -