Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/20 18:16:39 UTC

[GitHub] [lucene] jtibshirani commented on a diff in pull request #11790: Mark HNSW search results incomplete when fewer than topK are found

jtibshirani commented on code in PR #11790:
URL: https://github.com/apache/lucene/pull/11790#discussion_r975678640


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -267,6 +267,9 @@ private NeighborQueue searchLevel(
     while (results.size() > topK) {
       results.pop();
     }
+    if (level == 0 && results.size() < topK && numVisited < size) {

Review Comment:
   I'm feeling unsure about this check -- I'm having trouble understanding the general principle. First, it only applies when a filter is present, but it seems like we could run into the same problem even without filtering (for example, in a highly disconnected graph).
   
   Also, could it sometimes cause surprisingly slow searches? Let's say you had a filter that matched 90% of documents. Because the graph is disconnected, you find fewer than `topK` matches. Then you fall back to an exact scan, but this needs to consider 90% of the index!
   
   For me it'd be better to just remove this test (as we chatted about in https://github.com/apache/lucene/issues/11787), since we generally don't have proper handling for poorly connected graphs.
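
To make both concerns concrete, here is a minimal toy sketch. It is not Lucene code: the class `DisconnectedGraphSketch`, its methods, and the two-ring graph are hypothetical and only model "a search that can reach just one component of a disconnected graph". It mirrors the shape of the condition in the diff (`results.size() < topK && numVisited < size`) and, as an assumption about the fallback, counts the accepted documents an exact scan would have to consider.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

/** Toy model of a disconnected graph search. Hypothetical names, not Lucene's API. */
public class DisconnectedGraphSketch {

  /** Builds two rings with no edges between them: nodes [0, smallSize) and [smallSize, size). */
  static int[][] twoRings(int smallSize, int size) {
    int[][] neighbors = new int[size][];
    for (int i = 0; i < size; i++) {
      int start = i < smallSize ? 0 : smallSize;
      int len = i < smallSize ? smallSize : size - smallSize;
      int next = start + (i - start + 1) % len;
      int prev = start + (i - start + len - 1) % len;
      neighbors[i] = new int[] {prev, next};
    }
    return neighbors;
  }

  /** Filter that accepts every node except ids divisible by 10 (90% when size is a multiple of 10). */
  static BitSet filterRejectingEveryTenth(int size) {
    BitSet accepted = new BitSet(size);
    for (int i = 0; i < size; i++) {
      if (i % 10 != 0) {
        accepted.set(i);
      }
    }
    return accepted;
  }

  /**
   * Breadth-first walk over everything reachable from entryPoint.
   * Returns {accepted nodes collected (capped at topK), nodes visited}.
   */
  static int[] graphSearch(int[][] neighbors, int entryPoint, BitSet accepted, int topK) {
    BitSet visited = new BitSet(neighbors.length);
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(entryPoint);
    visited.set(entryPoint);
    int found = 0;
    while (!queue.isEmpty()) {
      int node = queue.poll();
      if (accepted.get(node)) {
        found++;
      }
      for (int n : neighbors[node]) {
        if (!visited.get(n)) {
          visited.set(n);
          queue.add(n);
        }
      }
    }
    return new int[] {Math.min(found, topK), visited.cardinality()};
  }

  /** Same shape as the condition under review, minus the level check. */
  static boolean looksIncomplete(int found, int topK, int numVisited, int size) {
    return found < topK && numVisited < size;
  }

  public static void main(String[] args) {
    int size = 1000;
    int[][] graph = twoRings(10, size);

    // Case 1: no filter at all, yet the search still comes up short.
    BitSet all = new BitSet(size);
    all.set(0, size);
    int[] noFilter = graphSearch(graph, 0, all, 20);
    System.out.println("no filter: found=" + noFilter[0] + " visited=" + noFilter[1]
        + " incomplete=" + looksIncomplete(noFilter[0], 20, noFilter[1], size));

    // Case 2: a filter matching 90% of documents; an exact fallback must consider all of them.
    BitSet filter = filterRejectingEveryTenth(size);
    int[] filtered = graphSearch(graph, 0, filter, 10);
    System.out.println("90% filter: found=" + filtered[0] + " visited=" + filtered[1]
        + " incomplete=" + looksIncomplete(filtered[0], 10, filtered[1], size)
        + " exactScanDocs=" + filter.cardinality());
  }
}
```

In this toy setup the entry point sits in a 10-node component, so the unfiltered search with `topK = 20` collects only 10 results, and the filtered search with `topK = 10` collects 9 while the assumed exact fallback would have to consider 900 of the 1000 documents.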



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@lucene.apache.org