You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/06/14 08:12:27 UTC

[GitHub] [lucene] kaivalnp opened a new pull request, #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

kaivalnp opened a new pull request, #958:
URL: https://github.com/apache/lucene/pull/958

   ### Description
   
   Link to [Jira](https://issues.apache.org/jira/browse/LUCENE-10611)
   
   The HNSW graph search does not consider that visitedLimit may be reached in the upper levels of graph search itself
   
   This occurs when the pre-filter is too restrictive (and its count sets the visitedLimit). So instead of switching over to exactSearch, it tries to [pop from an empty heap](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#L90) and throws an error
   
   ### Solution
   
   We can check if results are incomplete after searching in upper levels, and break out accordingly. This way it won't throw heap errors, and gracefully switch to exactSearch instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on code in PR #958:
URL: https://github.com/apache/lucene/pull/958#discussion_r898465271


##########
lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java:
##########
@@ -498,7 +498,7 @@ public void testRandom() throws IOException {
 
   /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
   public void testRandomWithFilter() throws IOException {
-    int numDocs = 200;
+    int numDocs = 2000;

Review Comment:
   I found that `numDocs = 1000`, `int lower = random().nextInt(500);` will also trigger the bug. It's nice to use lower numbers when possible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

Posted by GitBox <gi...@apache.org>.
kaivalnp commented on code in PR #958:
URL: https://github.com/apache/lucene/pull/958#discussion_r898967077


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -87,10 +87,14 @@ public static NeighborQueue search(
     int numVisited = 0;
     for (int level = graph.numLevels() - 1; level >= 1; level--) {
       results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit);
-      eps[0] = results.pop();
 
       numVisited += results.visitedCount();
       visitedLimit -= results.visitedCount();
+
+      if (results.incomplete()) {

Review Comment:
   I had done this to prevent some duplicate code (as `searchLevel` won't do anything when `visitedLimit` <= 0)
   However, it also makes sense from a readability perspective to return `results` there itself



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] kaivalnp commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

Posted by GitBox <gi...@apache.org>.
kaivalnp commented on code in PR #958:
URL: https://github.com/apache/lucene/pull/958#discussion_r898955669


##########
lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java:
##########
@@ -498,7 +498,7 @@ public void testRandom() throws IOException {
 
   /** Tests with random vectors and a random filter. Uses RandomIndexWriter. */
   public void testRandomWithFilter() throws IOException {
-    int numDocs = 200;
+    int numDocs = 2000;

Review Comment:
   Yes, makes sense



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a diff in pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on code in PR #958:
URL: https://github.com/apache/lucene/pull/958#discussion_r898455193


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java:
##########
@@ -87,10 +87,14 @@ public static NeighborQueue search(
     int numVisited = 0;
     for (int level = graph.numLevels() - 1; level >= 1; level--) {
       results = graphSearcher.searchLevel(query, 1, level, eps, vectors, graph, null, visitedLimit);
-      eps[0] = results.pop();
 
       numVisited += results.visitedCount();
       visitedLimit -= results.visitedCount();
+
+      if (results.incomplete()) {

Review Comment:
   It seems like we should return here instead of breaking. If we've hit the visitedLimit, then we don't want to continue on and attempt to search the last level.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani merged pull request #958: LUCENE-10611: Fix Heap Error in HnswGraphSearcher

Posted by GitBox <gi...@apache.org>.
jtibshirani merged PR #958:
URL: https://github.com/apache/lucene/pull/958


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org