r - Closest point to a path -
i have 2 sets of points, called path , centers. each point in path, efficient method finding id of closest point in centers. in r. below simple reproducible example.
set.seed(1) n <- 10000 x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002)) y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002)) path <- data.frame(cbind(x=x, y=y)) centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001), y=seq(0, 500, by=0.2) + rnorm(2501)) centers$id <- seq(nrow(centers)) x , y coordinates. add column path data.frame has id of closest center given x , y co-ordinate. want of unique ids.
my solution @ moment work, slow when scale of problem increases. more efficient.
path$closest.id <- sapply(seq(nrow(path)), function(z){ tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2) as.numeric(centers[tmp == min(tmp), 'id']) }) output <- unique(path$closest.id) any on speeding appreciated.
i think data.table might help, ideally looking algorithm perhaps bit smarter in terms of search, i.e. instead of calculating distances each center , selecting minimum one... id...
i happy use rcpp/rcpp11 if improve performance.
my minimum acceptable time perform kind of calculation out 10 seconds, faster better.
you can nn2 rann package. on system, computes nearest center each of path points in under 2 seconds.
library(rann) system.time(closest <- nn2(centers[, 1:2], path, 1)) # user system elapsed # 1.41 0.14 1.55 sapply(closest, head) # nn.idx nn.dists # [1,] 247451 0.20334929 # [2,] 250454 0.12326323 # [3,] 250454 0.28540127 # [4,] 253457 0.05178687 # [5,] 253457 0.13324137 # [6,] 253457 0.09009626 here's example 2.5 million candidate points fall within extent of path points (in example, centers have larger x , y range path points). it's little slower in case.
set.seed(1) centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y))) system.time(closest2 <- nn2(centers2, path, 1)) # user system elapsed # 2.96 0.11 3.07 sapply(closest2, head) # nn.idx nn.dists # [1,] 730127 0.025803703 # [2,] 375514 0.025999069 # [3,] 2443707 0.047259283 # [4,] 62780 0.022747930 # [5,] 1431847 0.002482623 # [6,] 2199405 0.028815865 this can compared output using sp::spdistsn1 (which slower problem):
library(sp) apply(head(path), 1, function(x) which.min(spdistsn1(centers, x))) # 1 2 3 4 5 6 # 730127 375514 2443707 62780 1431847 2199405 adding point id path data.frame , reducing unique values trivial:
path$closest.id <- closest$nn.idx output <- unique(path$closest.id)
Comments
Post a Comment