for loop - R: comparing unequal vectors with inequality
I have two single-column data frames of unequal length:
    aa <- data.frame(c(2, 12, 35))
    bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))
For each observation in aa, I want to count the number of values in bb that are less than it.
My result:

      bb<aa
    1     1
    2     7
    3     9
I have been able to do it two ways, by creating a function and using apply, but my datasets are large, and I let one run all night without it finishing.

What I have:
    # Count the entries of b below a, then scale the count
    fun1 <- function(a, b) {
      k <- colSums(b < a)
      k <- k * .000058242
    }
    system.time(replicate(5000, data.frame(apply(aa, 1, fun1, b = bb))))
       user  system elapsed
      3.813   0.011   3.883
Second:
    # Same count, via the indices where b < a holds
    fun2 <- function(a, b) {
      k <- length(which(b < a))
      k <- k * .000058242
    }
    system.time(replicate(5000, data.frame(apply(aa, 1, fun2, b = bb))))
       user  system elapsed
      3.648   0.006   3.664
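The second function is slightly faster in my tests. I let the first one run all night on a dataset where bb has more than 1.7M rows and aa has more than 160k.

I found this post and have tried using with(), but cannot seem to get it to work. I also tried a for loop, without success.

Any help or direction is appreciated.

Thank you!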
    aa <- data.frame(c(2, 12, 35))
    bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))
    sapply(aa[[1]], function(x) sum(bb[[1]] < x))
    # [1] 1 7 9
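To tie this back to the functions in the question, here is a minimal sketch that does the same count and then applies the scaling constant used in fun1 and fun2. The names a, b, counts and scaled are only illustrative, and vapply is swapped in for sapply just to pin the return type; it is not a different algorithm.

```r
# Toy data from the question
aa <- data.frame(c(2, 12, 35))
bb <- data.frame(c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36))

# Pull the single columns out as plain vectors (illustrative names)
a <- aa[[1]]
b <- bb[[1]]

# b < x is a logical vector; sum() counts its TRUE values
counts <- vapply(a, function(x) sum(b < x), integer(1))

# Scaling constant taken from fun1/fun2 in the question
scaled <- counts * 0.000058242

print(counts)
```

Working on the extracted vectors instead of calling apply on the data frames avoids converting the data frame to a matrix on every call.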
Some more realistic examples:

    n <- 1.6e3
    bb <- sample(1:n, 1.7e6, replace = TRUE)
    aa <- 1:n
    system.time(sapply(aa, function(x) sum(bb < x)))
    #    user  system elapsed
    #   14.63    2.23   16.87

    n <- 1.6e4
    bb <- sample(1:n, 1.7e6, replace = TRUE)
    aa <- 1:n
    system.time(sapply(aa, function(x) sum(bb < x)))
    #    user  system elapsed
    #  148.77   18.11  167.26
So length(aa) = 1.6e4 takes about 2.5 min (on my system), and the process scales as O(length(aa)), no surprise there. Therefore, for your full dataset, it should run in about 25 min. Still kind of slow. Maybe someone else will come up with a better way.
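One possible way to cut this down, sketched here as a suggestion and not as a benchmarked solution: sort bb once and then look up each value of aa with a binary search, so each value no longer has to be compared against all of bb. The sketch below uses base R's findInterval and assumes an R version in which findInterval supports the left.open argument. The names a, b, b_sorted, counts_fast and counts_slow are illustrative.

```r
# Toy data from the question, as plain vectors
a <- c(2, 12, 35)
b <- c(1, 2, 3, 4, 5, 6, 7, 15, 22, 36)

# Sort b once up front
b_sorted <- sort(b)

# With left.open = TRUE the intervals are (b[i], b[i+1]], so the index
# returned for x is the number of elements of b strictly less than x
counts_fast <- findInterval(a, b_sorted, left.open = TRUE)

# Cross-check against the straightforward sapply version
counts_slow <- sapply(a, function(x) sum(b < x))
stopifnot(all(counts_fast == counts_slow))

print(counts_fast)
# [1] 1 7 9
```

The cost is then one sort of bb plus one lookup per element of aa. It would be worth timing on the real data before relying on it.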