r - Closest point to a path -


i have 2 sets of points, called path , centers. each point in path, efficient method finding id of closest point in centers. in r. below simple reproducible example.

set.seed(1) n <- 10000 x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002)) y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002))  path <- data.frame(cbind(x=x, y=y))  centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001),                         y=seq(0, 500, by=0.2) + rnorm(2501))  centers$id <- seq(nrow(centers)) 

x , y coordinates. add column path data.frame has id of closest center given x , y co-ordinate. want of unique ids.

my solution @ moment work, slow when scale of problem increases. more efficient.

path$closest.id <- sapply(seq(nrow(path)), function(z){    tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2)    as.numeric(centers[tmp == min(tmp), 'id']) })  output <- unique(path$closest.id) 

any on speeding appreciated.

i think data.table might help, ideally looking algorithm perhaps bit smarter in terms of search, i.e. instead of calculating distances each center , selecting minimum one... id...

i happy use rcpp/rcpp11 if improve performance.

my minimum acceptable time perform kind of calculation out 10 seconds, faster better.

you can nn2 rann package. on system, computes nearest center each of path points in under 2 seconds.

library(rann) system.time(closest <- nn2(centers[, 1:2], path, 1))  #   user  system elapsed  #   1.41    0.14    1.55     sapply(closest, head)  #      nn.idx   nn.dists # [1,] 247451 0.20334929 # [2,] 250454 0.12326323 # [3,] 250454 0.28540127 # [4,] 253457 0.05178687 # [5,] 253457 0.13324137 # [6,] 253457 0.09009626 

here's example 2.5 million candidate points fall within extent of path points (in example, centers have larger x , y range path points). it's little slower in case.

set.seed(1) centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y))) system.time(closest2 <- nn2(centers2, path, 1))  #   user  system elapsed  #   2.96    0.11    3.07   sapply(closest2, head)  #       nn.idx    nn.dists # [1,]  730127 0.025803703 # [2,]  375514 0.025999069 # [3,] 2443707 0.047259283 # [4,]   62780 0.022747930 # [5,] 1431847 0.002482623 # [6,] 2199405 0.028815865 

this can compared output using sp::spdistsn1 (which slower problem):

library(sp) apply(head(path), 1, function(x) which.min(spdistsn1(centers, x)))  #       1       2       3       4       5       6  #  730127  375514 2443707   62780 1431847 2199405  

adding point id path data.frame , reducing unique values trivial:

path$closest.id <- closest$nn.idx output <- unique(path$closest.id) 

Comments

Popular posts from this blog

python - mat is not a numerical tuple : openCV error -

c# - MSAA finds controls UI Automation doesn't -

wordpress - .htaccess: RewriteRule: bad flag delimiters -