Posted to dev@sedona.apache.org by Chud Muckram <rm...@gmail.com> on 2022/07/22 16:00:51 UTC

Spatial KNN question

Hi,

I have a large dataset of points (on the order of hundreds of millions) and a
dataset of lines (on the order of millions). I was looking for a method in
Apache Sedona to do the following: for every point in my dataset, find the
distance to the nearest line.

I think somehow using spatial KNN to loop over every point would work, but I
don't see any function that does that in the documentation. In the
documentation, the spatial KNN query handles only one query point against a
feature dataset.
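As a point of reference, looping a nearest-line search over every point amounts to the brute-force computation below. This is a minimal single-machine Python sketch, not Sedona code. It assumes planar Euclidean distance and lines given as polylines of (x, y) vertices (at least two vertices each), and the function names are hypothetical. Its cost grows with the number of points times the number of line segments, which is why this approach does not scale to the dataset sizes above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Line = Sequence[Point]  # hypothetical polyline type: consecutive vertices form segments


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the infinite line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_line_distance(p: Point, line: Line) -> float:
    """Distance from p to a polyline: the minimum over its segments."""
    return min(point_segment_distance(p, line[i], line[i + 1])
               for i in range(len(line) - 1))


def nearest_line_distances(points: List[Point], lines: List[Line]) -> List[float]:
    """Brute force (k = 1): compare every point with every line."""
    return [min(point_line_distance(p, line) for line in lines) for p in points]
```

For example, `nearest_line_distances([(3.0, 2.0)], [[(0.0, 0.0), (10.0, 0.0)]])` returns `[2.0]`.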

Thanks,
Richard

Re: Spatial KNN question

Posted by Jia Yu <ji...@gmail.com>.
Hi Richard,

The problem that you are working on is called a KNN join. A distributed and
exact KNN join is very hard to implement, although an existing paper already
describes the algorithm in detail.
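For concreteness, with k = 1 (the case in the question) the join assigns every point its nearest line and that line's distance. Writing P for the points, L for the lines and dist(p, l) for the shortest Euclidean distance from p to l (this notation is ours, not Sedona's), and evaluating it naively, costs one distance computation per (point, line) pair:

```latex
\forall p \in P:\quad
d^*(p) = \min_{\ell \in L} \operatorname{dist}(p, \ell), \qquad
\ell^*(p) = \operatorname*{arg\,min}_{\ell \in L} \operatorname{dist}(p, \ell)

\text{naive cost} = |P| \cdot |L| \;\geq\; 10^{8} \cdot 10^{6} = 10^{14}
\ \text{distance computations}
```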

I would suggest that you do an approximate KNN join instead. This could be
done in two steps:
(1) Run a spatial distance join in Sedona. The distance bounds the search:
you only look for the KNN of a spatial object within that distance.
(2) For each object and its candidate neighbors, perform a KNN check (in
your case, keep the nearest candidate line).
You could use the Sedona RDD API for step 1, then write a little code to
implement step 2.
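A minimal single-machine sketch of these two steps, in plain Python rather than Sedona, might look like the following. It assumes a uniform grid whose cell size equals the search radius, standing in for Sedona's spatial partitioning and indexing in step 1. Step 2 keeps the closest candidate per point (k = 1). All names (`build_grid`, `distance_join`, `nearest_within`) are hypothetical, distances are planar Euclidean, and lines are assumed to be polylines with at least two vertices. On a cluster, step 1 would be Sedona's distance join and step 2 could be a group-by on the point with a minimum over distance.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Line = Sequence[Point]  # hypothetical polyline type with at least two vertices


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to segment a-b (projection clamped to the segment)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_line_distance(p: Point, line: Line) -> float:
    return min(point_segment_distance(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def build_grid(lines: List[Line], cell: float) -> Dict[Tuple[int, int], List[int]]:
    """Register each line id in every grid cell its bounding box overlaps."""
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for li, line in enumerate(lines):
        xs = [x for x, _ in line]
        ys = [y for _, y in line]
        for cx in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
            for cy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
                grid[(cx, cy)].append(li)
    return grid


def distance_join(points: List[Point], lines: List[Line],
                  radius: float) -> Iterator[Tuple[int, int, float]]:
    """Step 1: yield (point_id, line_id, distance) for every pair with distance <= radius."""
    grid = build_grid(lines, radius)  # cell size = radius (radius must be > 0)
    for pi, (x, y) in enumerate(points):
        # Candidate lines: those whose bounding box shares a cell with the
        # point's search square [x - r, x + r] x [y - r, y + r].
        candidates = set()
        for cx in range(math.floor((x - radius) / radius), math.floor((x + radius) / radius) + 1):
            for cy in range(math.floor((y - radius) / radius), math.floor((y + radius) / radius) + 1):
                candidates.update(grid.get((cx, cy), ()))
        # Refine: the exact distance check removes false candidates.
        for li in candidates:
            d = point_line_distance((x, y), lines[li])
            if d <= radius:
                yield pi, li, d


def nearest_within(points: List[Point], lines: List[Line],
                   radius: float) -> List[Optional[Tuple[int, float]]]:
    """Step 2: per point, keep the closest joined line (k = 1), or None if none is within radius."""
    best: List[Optional[Tuple[int, float]]] = [None] * len(points)
    for pi, li, d in distance_join(points, lines, radius):
        current = best[pi]
        if current is None or d < current[1]:
            best[pi] = (li, d)
    return best
```

For example, `nearest_within([(3.0, 2.0)], [[(0.0, 0.0), (10.0, 0.0)]], 3.0)` returns `[(0, 2.0)]`. A point whose nearest line is farther than the radius gets `None`. That is what makes the join approximate: the radius trades completeness against the number of candidate pairs produced in step 1.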

Thanks,
Jia

On Fri, Jul 22, 2022 at 10:39 AM Chud Muckram <rm...@gmail.com> wrote:
