for loop - R: comparing unequal-length vectors with an inequality


I have two single-column data frames of unequal length:

aa <- data.frame(c(2, 12, 35))
bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))

For each observation in aa, I want to count the number of values in bb that are less than that observation.

My desired result:

  bb<aa
1     1
2     7
3     9

I have been able to do this in two ways, by writing a function and calling it with apply, but my datasets are large, and I let one of them run all night without it finishing.

Here is what I have:

fun1 <- function(a, b) {
  k <- colSums(b < a)   # number of values in b below a
  k <- k * .000058242   # scale the count
}
system.time(replicate(5000, data.frame(apply(aa, 1, fun1, b = bb))))
#    user  system elapsed
#   3.813   0.011   3.883

Second:

fun2 <- function(a, b) {
  k <- length(which(b < a))   # number of values in b below a
  k <- k * .000058242         # scale the count
}
system.time(replicate(5000, data.frame(apply(aa, 1, fun2, b = bb))))
#    user  system elapsed
#   3.648   0.006   3.664

The second function was faster in my tests. I let the first one run all night on the full dataset (bb has more than 1.7M rows, aa more than 160k rows).
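Both functions above pay for apply() on a data frame and for comparing a whole data frame on every call. A minimal sketch of the same count-and-scale logic on plain vectors with vapply() is below; the function name scaled_counts() is hypothetical, and 0.000058242 is simply the scale factor used in fun1 and fun2. It still compares every element of b for every element of a, so it trims overhead but does not change how the work grows with the data size.

```r
# Hypothetical helper: same logic as fun2, but on plain numeric vectors.
# 0.000058242 is the scale factor taken from fun1/fun2 in the question.
scaled_counts <- function(a, b, scale = 0.000058242) {
  # vapply() returns exactly one number per element of a
  vapply(a, function(x) sum(b < x) * scale, numeric(1))
}

aa <- data.frame(c(2, 12, 35))
bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))

# aa[[1]] and bb[[1]] extract the single columns as plain vectors
res <- scaled_counts(aa[[1]], bb[[1]])
print(res)
```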

I found this post and tried using with(), but I cannot seem to get it to work. I also tried a for loop, without success.
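For reference, a plain for loop that produces the desired result might look like the sketch below. The variable names a, b and counts are illustrative only. It preallocates the result instead of growing it, and it works on the extracted columns rather than the data frames; like the apply versions it is still proportional to length(a) * length(b), so it shows a working loop, not a faster algorithm.

```r
# Example data from the question: single-column data frames
aa <- data.frame(c(2, 12, 35))
bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))

# Work on plain numeric vectors instead of data frames
a <- aa[[1]]
b <- bb[[1]]

# Preallocate the result rather than growing it inside the loop
counts <- integer(length(a))
for (i in seq_along(a)) {
  # sum() over a logical vector counts the TRUE values
  counts[i] <- sum(b < a[i])
}

# Store the counts in a data frame with the column name from the question
result <- data.frame(`bb<aa` = counts, check.names = FALSE)
print(result)
```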

Any help or direction is appreciated.

Thank you!

aa <- data.frame(c(2, 12, 35))
bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))
sapply(aa[[1]], function(x) sum(bb[[1]] < x))
# [1] 1 7 9

Some more realistic examples:

n  <- 1.6e3
bb <- sample(1:n, 1.7e6, replace = TRUE)
aa <- 1:n
system.time(sapply(aa, function(x) sum(bb < x)))
#    user  system elapsed
#   14.63    2.23   16.87

n  <- 1.6e4
bb <- sample(1:n, 1.7e6, replace = TRUE)
aa <- 1:n
system.time(sapply(aa, function(x) sum(bb < x)))
#    user  system elapsed
#  148.77   18.11  167.26

So with length(aa) = 1.6e4 it takes about 2.5 min (on my system), and the run time scales as O(length(aa)), which is no surprise. The full dataset (length(aa) around 1.6e5) should therefore run in about 25 min. That is still kind of slow; maybe someone else will come up with a better way.
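One possible better way, offered as a sketch rather than a tested recommendation for this exact dataset: sort bb once and binary-search every value of aa in it with base R's findInterval(). With left.open = TRUE, findInterval(x, v) returns the number of elements of the sorted vector v that are strictly less than x, which is exactly the count asked for. The left.open argument only exists in newer R versions (3.3.0 or later, as far as I know), and the helper name count_less() is hypothetical. The cost is one sort plus one binary search per element of aa, instead of a full pass over bb for each element; no timings are claimed here.

```r
# Hypothetical helper: for each x in a, count the values of b strictly below x.
count_less <- function(a, b) {
  b_sorted <- sort(b)
  # left.open = TRUE makes the intervals (v[i], v[i+1]], so the returned
  # index i equals the number of elements of b_sorted strictly less than x
  findInterval(a, b_sorted, left.open = TRUE)
}

# Small example from the question
print(count_less(c(2, 12, 35), c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36)))

# Cross-check against the sapply() approach on a smaller random input with
# many ties (sizes chosen only so the slow version finishes quickly)
set.seed(42)
n  <- 1e3
bb <- sample(1:n, 1e4, replace = TRUE)
aa <- 1:n
stopifnot(identical(count_less(aa, bb), sapply(aa, function(x) sum(bb < x))))
```

On the original data frames this would be called as count_less(aa[[1]], bb[[1]]), and the result can be multiplied by the question's scale factor if the scaled values are needed.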

