r - Closest point to a path -
i have 2 sets of points, called path
, centers
. each point in path
, efficient method finding id of closest point in centers
. in r. below simple reproducible example.
set.seed(1) n <- 10000 x <- 100*cumprod(1 + rnorm(n, 0.0001, 0.002)) y <- 50*cumprod(1 + rnorm(n, 0.0001, 0.002)) path <- data.frame(cbind(x=x, y=y)) centers <- expand.grid(x=seq(0, 500,by=0.5) + rnorm(1001), y=seq(0, 500, by=0.2) + rnorm(2501)) centers$id <- seq(nrow(centers))
x
, y
coordinates. add column path
data.frame has id of closest center given x , y co-ordinate. want of unique ids.
my solution @ moment work, slow when scale of problem increases. more efficient.
path$closest.id <- sapply(seq(nrow(path)), function(z){ tmp <- ((centers$x - path[z, 'x'])^2) + ((centers$y - path[z, 'y'])^2) as.numeric(centers[tmp == min(tmp), 'id']) }) output <- unique(path$closest.id)
any on speeding appreciated.
i think data.table
might help, ideally looking algorithm perhaps bit smarter in terms of search, i.e. instead of calculating distances each center , selecting minimum one... id...
i happy use rcpp
/rcpp11
if improve performance.
my minimum acceptable time perform kind of calculation out 10 seconds, faster better.
you can nn2
rann
package. on system, computes nearest center
each of path
points in under 2 seconds.
library(rann) system.time(closest <- nn2(centers[, 1:2], path, 1)) # user system elapsed # 1.41 0.14 1.55 sapply(closest, head) # nn.idx nn.dists # [1,] 247451 0.20334929 # [2,] 250454 0.12326323 # [3,] 250454 0.28540127 # [4,] 253457 0.05178687 # [5,] 253457 0.13324137 # [6,] 253457 0.09009626
here's example 2.5 million candidate points fall within extent of path
points (in example, centers
have larger x
, y
range path
points). it's little slower in case.
set.seed(1) centers2 <- cbind(runif(2.5e6, min(x), max(x)), runif(2.5e6, min(y), max(y))) system.time(closest2 <- nn2(centers2, path, 1)) # user system elapsed # 2.96 0.11 3.07 sapply(closest2, head) # nn.idx nn.dists # [1,] 730127 0.025803703 # [2,] 375514 0.025999069 # [3,] 2443707 0.047259283 # [4,] 62780 0.022747930 # [5,] 1431847 0.002482623 # [6,] 2199405 0.028815865
this can compared output using sp::spdistsn1
(which slower problem):
library(sp) apply(head(path), 1, function(x) which.min(spdistsn1(centers, x))) # 1 2 3 4 5 6 # 730127 375514 2443707 62780 1431847 2199405
adding point id path
data.frame , reducing unique values trivial:
path$closest.id <- closest$nn.idx output <- unique(path$closest.id)
Comments
Post a Comment