You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2016/04/14 11:43:25 UTC

[jira] [Updated] (LUCENE-7069) Add LatLonPoint.nearest to find closest indexed point to a given query point

     [ https://issues.apache.org/jira/browse/LUCENE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7069:
---------------------------------------
    Attachment: LUCENE-7069.patch

Here's another patch ... I think it's nearly ready.

I added "nearest neighbor" to the luceneutil London, UK geo benchmark, and getting top 10 nearest hits from the center of each of the 225 boxes that benchmark tests is extremely fast ... less than 0.5 msec for each call ... KD trees are very good for this ;)

This adds {{LatLonPoint.nearest}}, and it now takes an {{int topN}} (vs. single nearest point in previous patches).

I think it solves the adversary issues of the previous patches, at least the ones that are solvable.  Some are not!  Imagine the index only contains many points indexed from a circle, and then you ask for nearest from its center ... no matter your algorithm, that will amount to a linear scan.

There's one wrinkle though: the search works directly with {{BKDReader}}, because it's a best first search across all segments, picking which cell (across all segments) is the most "promising" to descend into next.  This means it requires that you use {{Lucene60PointsFormat}} (the default codec), and it gets the underlying {{BKDReader}} from that.  I think making this work only through the codec APIs is ... tricky.

There are further optimizations we could do like using {{haversinSortKey}}, or maybe fixing the {{approxBestDistance}} to be a true best distance (I need heavy geo math help for that!), and using Lucene's priority queue not the JDK's, but these all can come later.


> Add LatLonPoint.nearest to find closest indexed point to a given query point
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-7069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7069
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7069.patch, LUCENE-7069.patch, LUCENE-7069.patch
>
>
> KD trees (used by Lucene's new dimensional points) excel at finding "nearest neighbors" to a given query point ... I think we should add this to Lucene's sandbox as:
> {noformat}
>   public static Document nearest(IndexReader r, String field, double lat, double lon) throws IOException
> {noformat}
> I only implemented the 1 nearest neighbor for starters ... I think we can easily generalize this in the future to K nearest.
> It could also be generalized to more than 2 dimensions, but for now I'm making the class package private in sandbox for just the geo2d (lat/lon) use case.
> I don't think this should go into 6.0.0, but should go into 6.1: it's a new feature, and we need to wrap up and ship 6.0.0 already ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org