You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "tang-hi (via GitHub)" <gi...@apache.org> on 2023/05/18 15:31:30 UTC

[GitHub] [lucene] tang-hi commented on pull request #12255: allocate one NeighborQueue per search for results

tang-hi commented on PR #12255:
URL: https://github.com/apache/lucene/pull/12255#issuecomment-1553237625

   I've noticed that we create a NeighborQueue with an initialSize set to topK. For instance, if topK is 100, the maximum size of LongHeap is also 100. However, when we execute the searchLayer function for any layer other than 0, the maximum size of LongHeap in the original version is only 1.
   
   In the `searchLevel` method, we attempt to insert qualifying neighbors. LongHeap will push an element only if its size is less than maxSize.
   
   ````Java
    if (results.insertWithOverflow(friendOrd, friendSimilarity) && results.size() >= topK) {
   ````
   
   ```Java
   if (size >= maxSize) {
     if (value < heap[1]) {
       return false;
     }
     updateTop(value);
     return true;
   }
   push(value);
   return true;
   ```
   
   So, if maxSize is 100, it will store up to 100 points. However, if maxSize is 1, it will only store 1 point. 
   
   When we try to return the result, we pop elements from the heap. If maxSize is 100, it will pop 99 elements when the layer is not 0. This could potentially be a time-consuming operation. In contrast, the original version does not pop any elements; it simply returns the result.
   
   Here is the relevant code from the [Apache Lucene repository](https://github.com/apache/lucene/blob/04ef6de826eea6c1eaeffba6f6f3752b64250406/lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java#LL279C4-L281C6):
   
   ```Java
   while (results.size() > topK) {
     results.pop();
   }
   ```
   
   In summary, the performance degradation could be due to the increased number of pop operations when the maximum size of the heap is larger than 1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org