Posted to dev@lucene.apache.org by Joel Bernstein <jo...@gmail.com> on 2018/02/07 13:48:51 UTC

Lucene points current and future capabilities

I've been digging into the capabilities of the current points
implementation in Lucene. The use case I'm interested in is K nearest
neighbor search for vectors. The idea is to provide Lucene with a vector
to search for, seek to a location in the TermsEnum that most closely
matches the vector and then be able to retrieve the K nearest vectors above
and below the match.

Can the current implementation support this type of query? If it doesn't
support this yet, does the underlying structure of the KDTree support this
and it just needs a new TermsEnum implementation?

Thanks,
Joel

Re: Lucene points current and future capabilities

Posted by Joel Bernstein <jo...@gmail.com>.
Thanks Adrien, I'll take a look at the FloatPointNearestNeighbor
implementation. Points are really an exciting feature!


Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Feb 7, 2018 at 8:53 AM, Adrien Grand <jp...@gmail.com> wrote:

> Hi Joel,
>
> You can search for nearest neighbors of a vector, see e.g. Steve's
> FloatPointNearestNeighbor in the sandbox. One important limitation is that
> it can only work on the whole index, i.e. you can't find the nearest
> neighbors of a point that also match a filter. If you want to do this, you
> will need to store your vector in a doc-values field and implement a
> SortField that sorts by distance, similarly to LatLonPointSortField.
>
> Points do not have the concept of a terms enum; the structure is rather a
> tree of bounding boxes whose leaves store the points that fall within each
> box.

Re: Lucene points current and future capabilities

Posted by Adrien Grand <jp...@gmail.com>.
Hi Joel,

You can search for nearest neighbors of a vector, see e.g. Steve's
FloatPointNearestNeighbor in the sandbox. One important limitation is that
it can only work on the whole index, i.e. you can't find the nearest
neighbors of a point that also match a filter. If you want to do this, you
will need to store your vector in a doc-values field and implement a
SortField that sorts by distance, similarly to LatLonPointSortField.
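
To make the filtered case concrete, here is a minimal sketch in plain Java.
It is not Lucene code: the class FilteredNearest, the vectors array (standing
in for per-document values read from a doc-values field) and the IntPredicate
filter are all hypothetical names chosen for illustration. It only shows the
idea of keeping the documents that match a filter and sorting them by
distance to the query vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class FilteredNearest {
    /** Squared Euclidean distance between two vectors of equal length. */
    public static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Returns the ids of the k documents closest to the query among those
     * accepted by the filter. vectors[docId] is an assumption that stands in
     * for a value stored per document.
     */
    public static List<Integer> nearest(float[][] vectors, float[] query,
                                        IntPredicate filter, int k) {
        List<Integer> matches = new ArrayList<>();
        for (int doc = 0; doc < vectors.length; doc++) {
            if (filter.test(doc)) {
                matches.add(doc);
            }
        }
        // Sort the matching documents by their distance to the query.
        matches.sort(Comparator.comparingDouble(
            (Integer doc) -> squaredDistance(vectors[doc], query)));
        return new ArrayList<>(matches.subList(0, Math.min(k, matches.size())));
    }

    public static void main(String[] args) {
        float[][] vectors = { {0f, 0f}, {1f, 1f}, {2f, 2f}, {5f, 5f} };
        // Keep only even doc ids, then ask for the 2 nearest to (1.2, 1.2).
        System.out.println(
            nearest(vectors, new float[] {1.2f, 1.2f}, doc -> doc % 2 == 0, 2));
    }
}
```

This sketch scores every matching document, so its cost grows with the size
of the filtered set rather than with K.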

Points do not have the concept of a terms enum; the structure is rather a
tree of bounding boxes whose leaves store the points that fall within each
box.
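
As a rough illustration of why such a tree suits nearest-neighbor search,
the sketch below builds a small tree of bounding boxes in plain Java and
skips any box that cannot contain a closer point than the K best found so
far. It is an assumption-laden toy, not the Lucene implementation: the class
BoxTreeKnn, the leaf size of 2 and the median split are all made up for the
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class BoxTreeKnn {
    static final int LEAF_SIZE = 2; // illustrative only

    /** A node covers the bounding box [min, max]; only leaves hold points. */
    static final class Node {
        final float[] min, max;
        final Node left, right;
        final float[][] points; // non-null for leaves only

        Node(float[][] pts, int from, int to, int dims, int depth) {
            min = new float[dims];
            max = new float[dims];
            Arrays.fill(min, Float.POSITIVE_INFINITY);
            Arrays.fill(max, Float.NEGATIVE_INFINITY);
            for (int i = from; i < to; i++) {
                for (int d = 0; d < dims; d++) {
                    min[d] = Math.min(min[d], pts[i][d]);
                    max[d] = Math.max(max[d], pts[i][d]);
                }
            }
            if (to - from <= LEAF_SIZE) {
                points = Arrays.copyOfRange(pts, from, to);
                left = null;
                right = null;
            } else {
                // Split at the median along one dimension, cycling by depth.
                int dim = depth % dims;
                Arrays.sort(pts, from, to, Comparator.comparingDouble((float[] p) -> p[dim]));
                int mid = (from + to) >>> 1;
                points = null;
                left = new Node(pts, from, mid, dims, depth + 1);
                right = new Node(pts, mid, to, dims, depth + 1);
            }
        }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Squared distance from q to the closest possible point of the node's box. */
    static double boxDistance(Node n, float[] q) {
        double sum = 0;
        for (int d = 0; d < q.length; d++) {
            double diff = Math.max(0.0, Math.max(n.min[d] - q[d], q[d] - n.max[d]));
            sum += diff * diff;
        }
        return sum;
    }

    static void search(Node n, float[] q, int k, PriorityQueue<float[]> best) {
        // Prune: the box cannot beat the current k-th best candidate.
        if (best.size() == k && boxDistance(n, q) >= squaredDistance(best.peek(), q)) {
            return;
        }
        if (n.points != null) {
            for (float[] p : n.points) {
                best.add(p);
                if (best.size() > k) {
                    best.poll(); // drop the farthest candidate
                }
            }
            return;
        }
        // Visit the closer child first so that pruning kicks in sooner.
        Node first = boxDistance(n.left, q) <= boxDistance(n.right, q) ? n.left : n.right;
        Node second = first == n.left ? n.right : n.left;
        search(first, q, k, best);
        search(second, q, k, best);
    }

    /** Returns up to k points nearest to q, closest first. */
    public static List<float[]> nearest(float[][] points, float[] q, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        float[][] copy = points.clone(); // building the tree reorders the array
        Node root = new Node(copy, 0, copy.length, q.length, 0);
        // Max-heap on distance: the head is the farthest of the current candidates.
        PriorityQueue<float[]> best = new PriorityQueue<>(
            Comparator.comparingDouble((float[] p) -> -squaredDistance(p, q)));
        search(root, q, k, best);
        List<float[]> result = new ArrayList<>(best);
        result.sort(Comparator.comparingDouble((float[] p) -> squaredDistance(p, q)));
        return result;
    }
}
```

For example, with the points (1,1), (2,2), (8,8), (9,9), (5,5) and (0,3),
calling nearest with the query (2.1, 2.1) and k = 2 returns (2,2) followed by
(1,1), and the subtree covering (5,5), (8,8) and (9,9) is never descended
into.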
