Posted to dev@sedona.apache.org by Chud Muckram <rm...@gmail.com> on 2022/07/22 16:00:51 UTC
Spatial KNN question
Hi,
I have a large dataset of points, on the order of hundreds of millions, and
a dataset of lines on the order of millions. I was looking for a method in
Apache Sedona to do the following:
For every point in my dataset, find the distance to the nearest line.
I think looping spatial KNN over every point would work, but I don't see
any function that does that in the documentation. In the documentation,
spatial KNN takes a single query point and a feature dataset.
Thanks,
Richard
Re: Spatial KNN question
Posted by Jia Yu <ji...@gmail.com>.
Hi Richard,
The problem that you are working on is called a KNN join. A distributed and
accurate KNN join is very hard to implement, although an existing paper
already describes the algorithm in detail.
I would suggest that you do an approximate KNN join instead. This can be done
in two steps: (1) Do a spatial distance join in Sedona. The distance acts as a
cutoff: you only look for the nearest neighbors of a spatial object within
that distance. (2) For each object and its candidate neighbors, perform a KNN
check. You could use the Sedona RDD API for step 1, then write a little bit of
code to implement step 2.
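As a rough illustration of the two steps above, here is a minimal
single-machine sketch in plain Python. It is not Sedona code and does not use
any Sedona API: the function names (distance_join, nearest_line), the uniform
grid index, and the planar Euclidean distance are all assumptions made for
illustration. In a real job, step 1 would be replaced by a Sedona distance
join and step 2 by a per-point reduction over its output.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Distance from p to a polyline given as a list of (x, y) vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def distance_join(points, lines, radius):
    """Step 1 (hypothetical stand-in for a distance join).

    Yields (point_id, line_id, distance) for every pair within `radius`.
    Each line is registered in every grid cell touched by its bounding box
    expanded by `radius`, so a point only checks the lines in its own cell.
    """
    cell = radius
    grid = defaultdict(list)
    for line_id, line in lines.items():
        xs = [x for x, _ in line]
        ys = [y for _, y in line]
        x_lo = math.floor((min(xs) - radius) / cell)
        x_hi = math.floor((max(xs) + radius) / cell)
        y_lo = math.floor((min(ys) - radius) / cell)
        y_hi = math.floor((max(ys) + radius) / cell)
        for cx in range(x_lo, x_hi + 1):
            for cy in range(y_lo, y_hi + 1):
                grid[(cx, cy)].append(line_id)

    for point_id, p in points.items():
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        for line_id in grid.get(key, []):
            d = point_line_distance(p, lines[line_id])
            if d <= radius:
                yield point_id, line_id, d


def nearest_line(points, lines, radius):
    """Step 2: keep the closest candidate (k = 1) for each point."""
    best = {}
    for point_id, line_id, d in distance_join(points, lines, radius):
        if point_id not in best or d < best[point_id][1]:
            best[point_id] = (line_id, d)
    return best


if __name__ == "__main__":
    points = {"p1": (0.0, 1.0), "p2": (5.0, 5.0), "p3": (100.0, 100.0)}
    lines = {"a": [(0.0, 0.0), (10.0, 0.0)], "b": [(0.0, 4.0), (10.0, 4.0)]}
    print(nearest_line(points, lines, radius=2.0))
```

Points with no line inside the cutoff (p3 here) are missing from the result,
which is what makes the join approximate. One possible way to handle them is
to rerun only those points with a larger distance.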
Thanks,
Jia