How to distribute MATLAB's parfor workers between GPUs and CPUs (cores)?


I have a computation that has the structure of a binary tree: at each node, a bunch of highly vectorized functions take the output of the previous branches and produce new branch(es) (nodes on the same level are independent). Since the functions are vectorized, they can run on either the CPU or the GPU, the latter naturally giving substantially faster execution.

I have access to a 4-GPU, 2-CPU workstation to run the code on, and I would like to use it as optimally as I can. I understand how to use parfor on the GPUs only or on the CPUs' cores only, but I would like to reasonably distribute the workload between the GPUs and the CPUs, since GPU execution leaves many CPU cores idle, and those cores, though slower than the GPUs, are still fast enough to have a noticeable impact on the total execution time.

(Q1) Since the functions in each node are vectorized, is it reasonable to run independent nodes in a one-node-per-core mode? Or does that strictly depend on the particular case? Is there a "rule of thumb" for such dilemmas?

(Q2) Assuming that in (Q1) simultaneous execution of one node per core is suboptimal, is there a way to assign several CPU cores to one worker?

(Q3) Is there a way to distribute parfor workers between GPUs and CPUs in an efficient way?

Here is what I don't consider particularly efficient for (Q3): depending on the loop index, a loop instance can execute GPU code on a given gpuDevice or on a CPU (core). Knowing the performance difference between GPU and CPU execution, one can deduce a suitable proportion of indices to be assigned to CPU execution. The problem is that parfor does not pick loop indices in any particular order, which in turn can lead to instances where it tries to execute two independent tasks on the same GPU, which is inefficient since the tasks have to be serialized.
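A minimal sketch of the index-based scheme described above, under stated assumptions. The worker count `nCpuWorkers`, the GPU-to-CPU `speedup`, the task count `nTasks`, and the stand-in function `nodeFcn` are all illustrative placeholders, not values or functions from the actual computation. The CPU share follows from balancing throughput: if one GPU is `speedup` times faster than one CPU worker, both groups finish at about the same time when the CPU receives `nCpuWorkers / (nCpuWorkers + speedup*nGpu)` of the tasks.

```matlab
% Sketch only: every number and name below is an illustrative assumption.
nGpu        = 4;    % GPUs in the workstation
nCpuWorkers = 8;    % CPU workers running alongside the GPUs (assumed)
speedup     = 10;   % assumed: one GPU is 10x faster than one CPU worker
nTasks      = 96;   % independent nodes on one tree level (assumed)

% Balance throughput: the CPUs contribute nCpuWorkers units, the GPUs speedup*nGpu.
cpuShare  = nCpuWorkers / (nCpuWorkers + speedup*nGpu);
nCpuTasks = round(cpuShare * nTasks);

% Hypothetical stand-in for the vectorized per-node functions.
nodeFcn = @(x) sum(x.^2, 1);
inputs  = arrayfun(@(k) rand(1000, 50), 1:nTasks, 'UniformOutput', false);
results = cell(1, nTasks);

% Fall back to CPU-only execution when fewer than nGpu devices are visible,
% so that the sketch still runs on a machine without GPUs.
useGpu = (exist('gpuDeviceCount') > 0) && (gpuDeviceCount >= nGpu);

parfor k = 1:nTasks
    if k <= nCpuTasks || ~useGpu
        % CPU path: the first nCpuTasks indices never touch a GPU.
        results{k} = nodeFcn(inputs{k});
    else
        % GPU path: the device is chosen from the loop index alone.
        gpuIdx = mod(k - nCpuTasks - 1, nGpu) + 1;
        gpuDevice(gpuIdx);   % note: selecting a device this way also resets it
        results{k} = gather(nodeFcn(gpuArray(inputs{k})));
    end
end
```

Because the device index depends only on `k`, nothing prevents two workers from picking up indices that map to the same GPU at the same time, which is exactly the serialization problem described above.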

Thanks!

