Posted to issues@lucene.apache.org by "Michael Sokolov (Jira)" <ji...@apache.org> on 2021/01/23 20:25:00 UTC
[jira] [Comment Edited] (LUCENE-9674) Faster advance on Vector Values
[ https://issues.apache.org/jira/browse/LUCENE-9674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270750#comment-17270750 ]
Michael Sokolov edited comment on LUCENE-9674 at 1/23/21, 8:24 PM:
-------------------------------------------------------------------
Checking luceneutil, that does not surprise me. At some point I had added support for retrieval of vectors (as we had done previously for stored fields), which would in theory exercise this API. KNN search does not exercise it - it uses only the random access API, not the forward iterator (nextDoc) that was optimized here.
I think we had disabled this previously since turning it on would impact *all* tasks, not just the vector tasks. At the same time I see that the way this was implemented *also* uses the random access API, although it should not. My past self was not thinking clearly. In SearchTask we added vector retrieval for tasks (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312), but the way it is done will retrieve the vector for the wrong doc, because it passes the docid to an API that expects an ordinal.
So -- we should fix that last issue and see if we can measure the impact of this change, and then maybe we can find a way to control this per-task?
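To illustrate the docid/ordinal mix-up described above, here is a minimal sketch. It is a hypothetical model, not the luceneutil SearchTask code or the Lucene API: the class, the `ORD_TO_DOC` and `VECTORS_BY_ORD` arrays, and both method names are assumptions made up for the example. It assumes vectors are stored densely by ordinal and that only some documents have a vector.

```java
import java.util.Arrays;

/** Hypothetical model of a sparse vector field; not Lucene or luceneutil code. */
final class DocIdVersusOrdinal {
    // Assumption: only docs 3, 7 and 12 have a vector, so ordinals 0..2 map to them.
    static final int[] ORD_TO_DOC = {3, 7, 12};
    static final float[][] VECTORS_BY_ORD = {{0.3f}, {0.7f}, {1.2f}};

    /** Random access by ordinal: the argument is a position, not a doc ID. */
    static float[] vectorByOrd(int ord) {
        return VECTORS_BY_ORD[ord];
    }

    /** Translates the doc ID to an ordinal first; returns null if the doc has no vector. */
    static float[] vectorForDoc(int docId) {
        int ord = Arrays.binarySearch(ORD_TO_DOC, docId);
        return ord >= 0 ? vectorByOrd(ord) : null;
    }
}
```

In this model, calling `vectorByOrd(1)` with the intent of fetching doc 1 silently returns the vector that belongs to doc 7, while `vectorForDoc(1)` reports that doc 1 has no vector. Translating through a lookup is only one possible fix; the comment above suggests using the forward iterator for retrieval instead.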
was (Author: sokolov):
Checking luceneutil, that does not surprise me. At some point I had added support for retrieval of vectors (as we had done previously for stored fields), which would in theory exercise this API. KNN search does not exercise it - it uses only the random access API, not the forward iterator (nextDoc) that was optimized here.
It would be nice to measure the impact, but I think we had disabled this previously since turning it on would impact *all* tasks, not just the vector tasks. At the same time I see that the way this was implemented *also* uses the random access API, although it should not. My past self was not thinking clearly. In SearchTask we added vector retrieval for tasks (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTask.java#L312), but the way it is done will retrieve the vector for the wrong doc, because it passes the docid to an API that expects an ordinal.
So -- we should fix that last issue and see if we can measure the impact of this change, and then maybe we can find a way to control this per-task?
> Faster advance on Vector Values
> -------------------------------
>
> Key: LUCENE-9674
> URL: https://issues.apache.org/jira/browse/LUCENE-9674
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: master (9.0)
> Environment:
> Reporter: Anand Kotriwal
> Priority: Major
> Time Spent: 4h
> Remaining Estimate: 0h
>
> The advance() function in the class Lucene90VectorReader does a linear search for the target document.
> To make it faster we can do a binary search over the "ordToDoc" array, so that advance takes logarithmic rather than linear time. This will make retrieving vectors for a sparse set of documents efficient.
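The proposal in the issue description can be sketched roughly as follows. This is a self-contained illustration, not the actual Lucene90VectorReader code: the class `OrdToDocCursor` and its fields are hypothetical, and it assumes `ordToDoc` is a sorted array of doc IDs indexed by vector ordinal. `NO_MORE_DOCS` mirrors the sentinel value used by Lucene's `DocIdSetIterator`.

```java
import java.util.Arrays;

/** Hypothetical forward iterator over a sorted ordToDoc array; a sketch only. */
final class OrdToDocCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] ordToDoc; // assumption: ascending doc IDs, one per ordinal
    private int ord = -1;         // ordinal of the current vector
    private int doc = -1;         // current doc ID

    OrdToDocCursor(int[] ordToDoc) {
        this.ordToDoc = ordToDoc;
    }

    int docID() {
        return doc;
    }

    int ord() {
        return ord;
    }

    /** Steps to the next document that has a vector. */
    int nextDoc() {
        return moveTo(Math.min(ord + 1, ordToDoc.length));
    }

    /** Moves to the first doc >= target that lies after the current position. */
    int advance(int target) {
        // Only ordinals after the current one are candidates.
        int from = Math.min(ord + 1, ordToDoc.length);
        int idx = Arrays.binarySearch(ordToDoc, from, ordToDoc.length, target);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first entry greater than target
        }
        return moveTo(idx);
    }

    private int moveTo(int newOrd) {
        ord = newOrd;
        doc = newOrd < ordToDoc.length ? ordToDoc[newOrd] : NO_MORE_DOCS;
        return doc;
    }
}
```

With `ordToDoc = {2, 5, 9, 40}`, `advance(6)` lands on doc 9 at ordinal 2 after a logarithmic number of comparisons instead of a linear scan. Restricting the search to ordinals after the current one keeps the iterator forward-only.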