Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/08/20 01:05:00 UTC
[jira] [Commented] (LUCENE-9583) How should we expose VectorValues.RandomAccess?

    [ https://issues.apache.org/jira/browse/LUCENE-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582097#comment-17582097 ] 

ASF subversion and git services commented on LUCENE-9583:
---------------------------------------------------------

Commit 8308688d786cd6c55fcbe4e59f67966f385989a2 in lucene's branch refs/heads/main from Julie Tibshirani
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=8308688d786 ]

LUCENE-9583: Remove RandomAccessVectorValuesProducer (#1071)

This change folds the `RandomAccessVectorValuesProducer` interface into
`RandomAccessVectorValues`. This reduces the number of interfaces and clarifies
the cloning/copying behavior.

This is a small simplification related to LUCENE-9583, but does not address the
main issue.
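
The commit says that folding the producer interface into `RandomAccessVectorValues` clarifies the cloning/copying behavior. Below is a minimal sketch of why a copy operation matters. For illustration only, it assumes an implementation that returns vectors through a reused scratch buffer. In that case two reads from the same instance alias each other, so code that needs two stored vectors at once (for example, to compare them) needs an independent copy. `RandomAccessVectors`, `FlatArrayVectors`, and `CopyDemo` are hypothetical, simplified names, not Lucene's actual API.

```java
// Hypothetical, simplified interface; not Lucene's actual RandomAccessVectorValues.
interface RandomAccessVectors {
  int size();

  int dimension();

  /** Returns the vector for ordinal {@code ord}. May reuse an internal buffer. */
  float[] vectorValue(int ord);

  /** Returns an independent instance that shares no mutable state with this one. */
  RandomAccessVectors copy();
}

// Illustrative implementation: all vectors in one flat array, read through a scratch buffer.
final class FlatArrayVectors implements RandomAccessVectors {
  private final float[] data; // shared between copies and never modified
  private final int dim;
  private final float[] scratch; // overwritten by every vectorValue() call

  FlatArrayVectors(float[] data, int dim) {
    this.data = data;
    this.dim = dim;
    this.scratch = new float[dim];
  }

  @Override
  public int size() {
    return data.length / dim;
  }

  @Override
  public int dimension() {
    return dim;
  }

  @Override
  public float[] vectorValue(int ord) {
    System.arraycopy(data, ord * dim, scratch, 0, dim);
    return scratch;
  }

  @Override
  public RandomAccessVectors copy() {
    // Shares the read-only data but gets its own scratch buffer.
    return new FlatArrayVectors(data, dim);
  }
}

public class CopyDemo {
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    RandomAccessVectors vectors = new FlatArrayVectors(new float[] {1f, 0f, 0f, 1f}, 2);

    // Both calls return the same buffer, so a and b both end up holding vector 1.
    float[] a = vectors.vectorValue(0);
    float[] b = vectors.vectorValue(1);
    System.out.println(dot(a, b)); // prints 1.0 (vector 1 dotted with itself)

    // A copy provides a second, independent cursor over the same data.
    RandomAccessVectors other = vectors.copy();
    System.out.println(dot(vectors.vectorValue(0), other.vectorValue(1))); // prints 0.0
  }
}
```

Having `copy()` on the same interface that serves the vectors means a caller never has to go back to a separate producer object to get a second cursor.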

> How should we expose VectorValues.RandomAccess?
> -----------------------------------------------
>
>                 Key: LUCENE-9583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9583
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 9.0
>            Reporter: Michael Sokolov
>            Assignee: Julie Tibshirani
>            Priority: Major
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> In the newly added {{VectorValues}} API, we have a {{RandomAccess}} sub-interface. [~jtibshirani] pointed out that it is not needed by some vector-indexing strategies, which can operate using only a forward iterator (HNSW does need it). In the interest of simplifying the public API, we should not expose this internal detail, which also surfaces internal ordinals that are of little interest outside the random-access API.
> I looked into how to move this inside the HNSW-specific code and remembered that we also currently use the random-access (RA) API when merging vector fields over sorted indexes. Without it, we would need to load all vectors into RAM while flushing/merging, as we currently do in {{BinaryDocValuesWriter.BinaryDVs}}. I wonder whether it's worth paying this cost for the simpler API.
> Another thing I noticed while reviewing this is that I moved the KNN {{search(float[] target, int topK, int fanout)}} method from {{VectorValues}} to {{VectorValues.RandomAccess}}. I think we could move it back and handle the HNSW requirements for search elsewhere. I wonder whether that would alleviate the main concern here.
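
The issue notes that some vector-indexing strategies need only a forward iterator, while HNSW needs random access. As an illustration (not Lucene code), an exact brute-force top-k search fits in a single forward pass. An HNSW graph walk, by contrast, must fetch the vector of an arbitrary neighbor by ordinal, which a forward-only view cannot provide. `ForwardVectors`, `ListForwardVectors`, and `ExactSearch` are hypothetical names, and dot product is assumed as the similarity.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical forward-only view: each document is visited once, in increasing doc id order.
interface ForwardVectors {
  int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Advances to the next document and returns its id, or NO_MORE_DOCS when exhausted. */
  int nextDoc();

  /** Returns the vector of the current document. */
  float[] vectorValue();
}

// Illustrative implementation over an in-memory list; every doc is assumed to have a vector.
final class ListForwardVectors implements ForwardVectors {
  private final List<float[]> vectors;
  private int doc = -1;

  ListForwardVectors(List<float[]> vectors) {
    this.vectors = vectors;
  }

  @Override
  public int nextDoc() {
    if (doc != NO_MORE_DOCS) {
      doc = doc + 1 < vectors.size() ? doc + 1 : NO_MORE_DOCS;
    }
    return doc;
  }

  @Override
  public float[] vectorValue() {
    return vectors.get(doc);
  }
}

public class ExactSearch {
  private static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Exact top-k by dot product in one forward pass; returns doc ids, best first. */
  static int[] search(ForwardVectors vectors, float[] target, int topK) {
    if (topK <= 0) {
      return new int[0];
    }
    // Min-heap on score: the head is the weakest hit kept so far.
    PriorityQueue<Hit> heap = new PriorityQueue<Hit>((x, y) -> Float.compare(x.score, y.score));
    for (int doc = vectors.nextDoc(); doc != ForwardVectors.NO_MORE_DOCS; doc = vectors.nextDoc()) {
      float score = dot(target, vectors.vectorValue());
      if (heap.size() < topK) {
        heap.add(new Hit(doc, score));
      } else if (score > heap.peek().score) {
        heap.poll();
        heap.add(new Hit(doc, score));
      }
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll().doc; // lowest score comes out first, so fill from the end
    }
    return result;
  }

  public static void main(String[] args) {
    List<float[]> docs =
        Arrays.asList(new float[] {1f, 0f}, new float[] {0f, 1f}, new float[] {0.7f, 0.7f});
    int[] top = search(new ListForwardVectors(docs), new float[] {1f, 0.2f}, 2);
    System.out.println(Arrays.toString(top)); // prints [0, 2]
  }
}
```

Nothing in `search` ever needs to revisit an earlier document, which is why an API offering only forward iteration is sufficient for this kind of strategy.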

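The issue also mentions that random access is used when merging vector fields over sorted indexes, and that without it all vectors would have to be loaded into RAM. A minimal sketch of that trade-off, assuming every document has a vector and the index sort is given as a permutation of doc ids: with only a forward pass in old doc order, every vector must be buffered before the first one can be written in new doc order; with random access, vectors can be fetched in new order and streamed out. `SortedReorderSketch` and its methods are hypothetical names, and the sketch does not reproduce {{BinaryDocValuesWriter.BinaryDVs}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

public class SortedReorderSketch {
  /**
   * Forward-only input: vectors arrive in old doc order but must be written in new doc order,
   * so all of them are buffered before the first write. Assumes oldToNew is a permutation and
   * every doc has a vector.
   */
  static void writeBuffered(Iterator<float[]> oldOrder, int[] oldToNew, Consumer<float[]> sink) {
    float[][] buffer = new float[oldToNew.length][]; // numDocs * dim floats held in RAM at once
    for (int oldDoc = 0; oldOrder.hasNext(); oldDoc++) {
      // Clone in case the producer reuses its array between calls.
      buffer[oldToNew[oldDoc]] = oldOrder.next().clone();
    }
    for (float[] vector : buffer) {
      sink.accept(vector);
    }
  }

  /**
   * Random access by old doc id: vectors are fetched in new doc order and passed straight to
   * the sink, so only one vector is in flight at a time.
   */
  static void writeRandomAccess(
      IntFunction<float[]> vectorForOldDoc, int[] newToOld, Consumer<float[]> sink) {
    for (int newDoc = 0; newDoc < newToOld.length; newDoc++) {
      sink.accept(vectorForOldDoc.apply(newToOld[newDoc]));
    }
  }

  static String firstComponents(List<float[]> vectors) {
    float[] firsts = new float[vectors.size()];
    for (int i = 0; i < firsts.length; i++) {
      firsts[i] = vectors.get(i)[0];
    }
    return Arrays.toString(firsts);
  }

  public static void main(String[] args) {
    // One-dimensional vectors whose value equals the old doc id, to make the order visible.
    List<float[]> oldVectors = Arrays.asList(new float[] {0f}, new float[] {1f}, new float[] {2f});
    int[] oldToNew = {2, 0, 1}; // old doc 0 becomes new doc 2, and so on
    int[] newToOld = {1, 2, 0}; // inverse permutation

    List<float[]> buffered = new ArrayList<>();
    writeBuffered(oldVectors.iterator(), oldToNew, buffered::add);

    List<float[]> streamed = new ArrayList<>();
    writeRandomAccess(oldVectors::get, newToOld, streamed::add);

    System.out.println(firstComponents(buffered)); // prints [1.0, 2.0, 0.0]
    System.out.println(firstComponents(streamed)); // prints [1.0, 2.0, 0.0]
  }
}
```

Both paths produce the same order; the difference is that the buffered path holds every vector at once, which is the memory cost the issue weighs against a simpler public API.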

