Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2021/01/23 20:25:00 UTC

[jira] [Comment Edited] (LUCENE-9674) Faster advance on Vector Values

    [ https://issues.apache.org/jira/browse/LUCENE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270750#comment-17270750 ] 

Michael Sokolov edited comment on LUCENE-9674 at 1/23/21, 8:24 PM:
-------------------------------------------------------------------

Checking luceneutil, that does not surprise me. At some point I had added support for retrieval of vectors (as we had done previously for stored fields), which would in theory exercise this API. KNN search does not exercise it - it uses only the random access API, not the forward iterator (nextDoc) that was optimized here.

I think we had disabled this previously since turning it on would impact *all* tasks, not just the vector tasks. At the same time, I see that the way this was implemented *also* uses the random access API, although it should not. My past self was not thinking clearly. In SearchTask we added vector retrieval for tasks (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312), but the way it is done will retrieve the vector for the wrong doc, because it passes the docid to the ordinal-based API.

So -- we should fix that last issue and see if we can measure the impact of this change, and then maybe we can find a way to control this per-task?
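
To illustrate the docid/ordinal mix-up described above, here is a minimal sketch. It does not use the real Lucene or luceneutil classes: ORD_TO_DOC, VECTORS_BY_ORD, vectorByOrd and vectorByDoc are hypothetical stand-ins, and the only assumption is that vectors are stored densely by ordinal while only some documents have one.

```java
import java.util.Arrays;

class DocIdVersusOrdinal {
    // Hypothetical sparse field: only docs 3, 7 and 20 have a vector.
    // ORD_TO_DOC[ord] is the docid of the ord-th vector, sorted ascending.
    static final int[] ORD_TO_DOC = {3, 7, 20};
    static final float[][] VECTORS_BY_ORD = {{0.3f}, {0.7f}, {2.0f}};

    /** Random access by ordinal (position among docs that have a vector). */
    static float[] vectorByOrd(int ord) {
        return VECTORS_BY_ORD[ord];
    }

    /** Lookup by docid: translate to an ordinal first; null if the doc has no vector. */
    static float[] vectorByDoc(int docId) {
        int ord = Arrays.binarySearch(ORD_TO_DOC, docId);
        return ord < 0 ? null : vectorByOrd(ord);
    }

    public static void main(String[] args) {
        int docId = 1; // doc 1 has no vector in this toy index
        // Bug pattern: the docid is handed to the ordinal API, which
        // silently returns the vector that belongs to doc 7.
        System.out.println(Arrays.toString(vectorByOrd(docId)));
        // Mapping docid -> ordinal first reports that doc 1 has no vector.
        System.out.println(Arrays.toString(vectorByDoc(docId)));
    }
}
```

With a dense field (every doc has a vector) the two lookups coincide, which is one way a mistake like this can go unnoticed.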


was (Author: sokolov):
Checking luceneutil, that does not surprise me. At some point I had added support for retrieval of vectors (as we had done previously for stored fields), which would in theory exercise this API. KNN search does not exercise it - it uses only the random access API, not the forward iterator (nextDoc) that was optimized here.

It would be nice to measure the impact, but I think we had disabled this previously since turning it on would impact *all* tasks, not just the vector tasks. At the same time I see that the way this was implemented *also* uses the random access API, although it should not. My past self was not thinking clearly. In SearchTask we added vector retrieval for tasks https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312 but the way it is done will retrieve the vector for the wrong doc as it is using the docid to retrieve from the ordinal API.

So -- we should fix that last issue and see if we can measure the impact of this change, and then maybe we can find a way to control this per-task?

> Faster advance on Vector Values
> -------------------------------
>
>                 Key: LUCENE-9674
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9674
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: master (9.0)
>         Environment:  
>            Reporter: Anand Kotriwal
>            Priority: Major
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> The advance() function in the class Lucene90VectorReader does a linear search for the target document.
> To make it faster, we can do a binary search over the "ordToDoc" array, which makes the advance operation take logarithmic time. This will make retrieving vectors for a sparse set of documents efficient.
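
The idea in the issue description can be sketched roughly as follows. This is not the Lucene90VectorReader code: SparseVectorIterator is a hypothetical stand-in, and the sketch assumes ordToDoc is an ascending int array of the docids that have a vector. It relies only on java.util.Arrays.binarySearch over the part of the array that lies after the current position.

```java
import java.util.Arrays;

class OrdToDocAdvance {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical forward iterator over the docs that have a vector. */
    static final class SparseVectorIterator {
        private final int[] ordToDoc; // ascending; ordToDoc[ord] = docid
        private int ord = -1;
        private int doc = -1;

        SparseVectorIterator(int[] ordToDoc) {
            this.ordToDoc = ordToDoc;
        }

        int docID() {
            return doc;
        }

        int ord() {
            return ord;
        }

        /** Linear step to the next doc with a vector. */
        int nextDoc() {
            if (++ord >= ordToDoc.length) {
                ord = ordToDoc.length; // stay parked at the end
                return doc = NO_MORE_DOCS;
            }
            return doc = ordToDoc[ord];
        }

        /** Moves to the first doc >= target using binary search, O(log n). */
        int advance(int target) {
            int from = Math.min(ord + 1, ordToDoc.length);
            int idx = Arrays.binarySearch(ordToDoc, from, ordToDoc.length, target);
            if (idx < 0) {
                idx = -(idx + 1); // insertion point: first entry greater than target
            }
            ord = idx;
            if (ord >= ordToDoc.length) {
                return doc = NO_MORE_DOCS;
            }
            return doc = ordToDoc[ord];
        }
    }
}
```

Searching only the range after the current ordinal keeps the iterator forward-only, and the ordinal left in place after advance() is the one a caller would use to fetch the vector itself.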


