You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2017/12/11 18:18:00 UTC

[jira] [Updated] (LUCENE-8091) Better nearest-neighbor queries

     [ https://issues.apache.org/jira/browse/LUCENE-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-8091:
---------------------------------
    Attachment: LUCENE-8091.patch

Here is a prototype that demonstrates the idea in the 1D case. For instance when running a nearest-neighbor search given a dataset of 1M points uniformly distributed between 0 and 1M, a regular sorted search needs to visit the 1M documents (as expected), while this new special query only requires ~11k calls to DocIdSetIterator.nextDoc / TopDocsCollector.collect, ~32k calls to IntersectVisitor.visit, ~3k calls to IntersectVisitor.compare and runs about 7x faster.

This patch needs a lot of cleaning/testing before being ready to commit.

> Better nearest-neighbor queries
> -------------------------------
>
>                 Key: LUCENE-8091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8091
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8091.patch
>
>
> LatLonPoint.nearest is very efficient at identifying the top-k documents sorted by distance from a given point, by working directly on the BKD tree. This doesn't support filtering though, so if you need to filter by another property, you need to switch to querying on the filter and sorting by a LatLonPointSortField. Unfortunately this requires visiting all documents that match the filter.
> We could leverage the new {{setMinCompetitiveScore}} API introduced in LUCENE-4100 in order to allow for retrieval of nearest neighbors with arbitrary filters, by recomputing a bounding-box when a new minimum competitive score is set.
> In the future we could also leverage this to speed up queries that are boosted by distance. For instance if the final score is a weighted sum of the score on a text field and a distance-based score, and the minimum competitive score gets higher than the maximum score that may be produced on the text field at some point, then we could dynamically prune hits based on the distance.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org