You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "mccullocht (via GitHub)" <gi...@apache.org> on 2024/02/28 17:22:56 UTC

[PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

mccullocht opened a new pull request, #13143:
URL: https://github.com/apache/lucene/pull/13143

   ### Description
   
   As of #12806 the hnsw codec has implemented a more complete version of this logic that may trigger without a pre-filter query. The `exactSearch()` might also have some odd behavior surrounding scalar quantization -- IIUC any segment in
   which we hit this path may score against the unquantized vectors which may then be mixed with quantized scored results.
   
   This is related to #12505
   
   <!--
   If this is your first contribution to Lucene, please make sure you have reviewed the contribution guide.
   https://github.com/apache/lucene/blob/main/CONTRIBUTING.md
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "benwtrent (via GitHub)" <gi...@apache.org>.
benwtrent commented on code in PR #13143:
URL: https://github.com/apache/lucene/pull/13143#discussion_r1511329799


##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -125,24 +124,11 @@ private TopDocs getLeafResults(
       return NO_RESULTS;
     }
 
-    BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);
-    final int cost = acceptDocs.cardinality();
-
-    if (cost <= k) {
-      // If there are <= k possible matches, short-circuit and perform exact search, since HNSW
-      // must always visit at least k documents
-      return exactSearch(ctx, new BitSetIterator(acceptDocs, cost));
-    }
-
-    // Perform the approximate kNN search
-    // We pass cost + 1 here to account for the edge case when we explore exactly cost vectors
-    TopDocs results = approximateSearch(ctx, acceptDocs, cost + 1, knnCollectorManager);
-    if (results.totalHits.relation == TotalHits.Relation.EQUAL_TO) {
-      return results;
-    } else {
-      // We stopped the kNN search because it visited too many nodes, so fall back to exact search
-      return exactSearch(ctx, new BitSetIterator(acceptDocs, cost));
-    }

Review Comment:
   How do you handle the case when graph search early exits due to the visitation limit? Right now your change will return incomplete results to the user because we stopped exploring the graph.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "benwtrent (via GitHub)" <gi...@apache.org>.
benwtrent commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-1971336368

   @mccullocht How does this work against segments that were created before 9.9? I know the new reader will support this. But my concern is the SPI loading Lucene95HNSWReader and that code not handling the under sampling. 
   
   Maybe we should add my change to the other backwards codec readers to make your change possible?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "mccullocht (via GitHub)" <gi...@apache.org>.
mccullocht commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-1971644555

   @benwtrent Segments before 9.9 would perform a graph search. In 9c3679bf14b0d75cd58e1fda3ca4b8a76aa033b9 I back ported your change from #12806 to 9.1, 9.2, 9.4, and 9.5 since it wasn't too much effort.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "mccullocht (via GitHub)" <gi...@apache.org>.
mccullocht commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-1977032706

   @benwtrent I see a couple of possibilities to internalize this in `KnnVectorsReader`:
   * add a `public abstract void exhaustiveSearch(field, target, knnCollector, acceptDocs)` call that internalizes the exact search logic that lives in the query today.
   * add a `public VectorScorer createVectorScorer(field, target)` call to internalize creation of the `VectorScorer` and handling of the similarity function but otherwise leave much of the logic the same.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-2089326240

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-2060105356

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "benwtrent (via GitHub)" <gi...@apache.org>.
benwtrent commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-1977065449

   > add a public VectorScorer createVectorScorer(field, target) call to internalize creation of the VectorScorer and handling of the similarity function but otherwise leave much of the logic the same.
   
   I personally prefer this. It should allow it to combine with other docset iterators. I am not 100% sure of the API/name/etc. But it would have to be added to the leafReader.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "benwtrent (via GitHub)" <gi...@apache.org>.
benwtrent commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-2061143742

   I agree, I think it can be closed. We will still need the `exactSearch` method, even with refactorings that make it play well with quantization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "vigyasharma (via GitHub)" <gi...@apache.org>.
vigyasharma commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-2032786496

   I was looking at this PR since it's marked as stale by our bots. I see that we've already added the [fallback to exact search](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L258-L265) when `k > maxOrd()`, and we also have the [createVectorScorer()](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L177) that uses the similarity configured in FieldInfo.
   
   Do we plan to make any other changes as part of this PR? If not, we can close this one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [PR] Remove unnecessary `AbstractKnnVectorQuery.exactSearch()` [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #13143:
URL: https://github.com/apache/lucene/pull/13143#issuecomment-2005421733

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org