Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/02/08 08:04:13 UTC

[GitHub] [lucene] jpountz commented on a change in pull request #656: LUCENE-10382: Support filtering in KnnVectorQuery

jpountz commented on a change in pull request #656:
URL: https://github.com/apache/lucene/pull/656#discussion_r801354469



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene91/Lucene91HnswVectorsReader.java
##########
@@ -227,16 +231,36 @@ public TopDocs search(String field, float[] target, int k, Bits acceptDocs) thro
 
     // bound k by total number of vectors to prevent oversizing data structures
     k = Math.min(k, fieldEntry.size());
-
     OffHeapVectorValues vectorValues = getOffHeapVectorValues(fieldEntry);
+
+    DocIdSetIterator acceptIterator = null;
+    int visitedLimit = Integer.MAX_VALUE;
+
+    if (acceptDocs instanceof BitSet acceptBitSet) {

Review comment:
       I think I have a preference for option 2:
    - This feels like a high-level query planning decision, which belongs in the query API rather than in the codec API.
    - My gut feeling is that a limit on the number of considered candidates would generalize to most nearest-neighbor (NN) algorithms.
    - Queries might sometimes have better options than a BitSet. For example, if the filter is an `IndexSortSortedNumericDocValuesRangeQuery`, you could get both a `Bits` and a `DocIdSetIterator` view of the matches without materializing a BitSet.
    - Vectors are currently not handled by `ExitableDirectoryReader`. Option 1 would require adding a `BitSet` wrapper, while we'd like to keep the number of subclasses of `BitSet` at exactly 2, since the JVM optimizes call sites that only ever see two receiver types better. With option 2, could we instead use a plain `Bits` wrapper that checks the timeout whenever `Bits#get` is called?
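A minimal sketch of how option 2 might look at the query level, in plain Java with no Lucene dependency. `Bits` here is a hypothetical stand-in with the same two methods as Lucene's `org.apache.lucene.util.Bits`; `TimeoutBits`, `ApproximateSearcher`, and the "return null when the visited limit is hit" convention are illustrative assumptions, not Lucene APIs. The query bounds the approximate search by the filter's match count (the `visitedLimit` idea from the diff above), falls back to an exact scan of the filtered docs, and wraps the filter in a `Bits` that checks a timeout on every `get` call.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.BooleanSupplier;

final class FilteredKnnSketch {

  /** Hypothetical stand-in mirroring the shape of Lucene's Bits (get + length). */
  interface Bits {
    boolean get(int index);
    int length();
  }

  /** Hypothetical Bits wrapper that checks a timeout on every get() call. */
  static final class TimeoutBits implements Bits {
    private final Bits in;
    private final BooleanSupplier shouldExit;

    TimeoutBits(Bits in, BooleanSupplier shouldExit) {
      this.in = in;
      this.shouldExit = shouldExit;
    }

    @Override
    public boolean get(int index) {
      if (shouldExit.getAsBoolean()) {
        throw new IllegalStateException("query timed out");
      }
      return in.get(index);
    }

    @Override
    public int length() {
      return in.length();
    }
  }

  /** Hypothetical approximate searcher: returns null if it visits more than visitedLimit candidates. */
  interface ApproximateSearcher {
    int[] search(float[] target, int k, Bits acceptDocs, int visitedLimit);
  }

  /** Query-level planning: bounded approximate search first, exact scan of the filter as fallback. */
  static int[] search(float[][] vectors, float[] target, int k, int[] filterDocs,
                      ApproximateSearcher approx, BooleanSupplier shouldExit) {
    boolean[] matches = new boolean[vectors.length];
    for (int doc : filterDocs) {
      matches[doc] = true;
    }
    // Random-access view of the filter, wrapped so every lookup checks the timeout.
    Bits accept = new TimeoutBits(new Bits() {
      public boolean get(int index) { return matches[index]; }
      public int length() { return matches.length; }
    }, shouldExit);

    // Assumption: visiting more candidates than the filter matches is worse than scanning it exactly.
    int visitedLimit = filterDocs.length;
    int[] approximate = approx.search(target, k, accept, visitedLimit);
    if (approximate != null) {
      return approximate;
    }
    return exactSearch(vectors, target, k, filterDocs);
  }

  /** Brute-force top-k over the filtered docs by squared Euclidean distance. */
  static int[] exactSearch(float[][] vectors, float[] target, int k, int[] filterDocs) {
    return Arrays.stream(filterDocs).boxed()
        .sorted(Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], target)))
        .limit(k)
        .mapToInt(Integer::intValue)
        .toArray();
  }

  static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }
}
```

The planning decision and the timeout check both live outside the codec here; the graph search only ever sees a `Bits` and a limit.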




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-help@lucene.apache.org