You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/02 14:37:10 UTC

[GitHub] [lucene] msokolov commented on a diff in pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

msokolov commented on code in PR #11743:
URL: https://github.com/apache/lucene/pull/11743#discussion_r961747487


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -104,8 +104,8 @@ public void removeLast() {
   }
 
   public void removeIndex(int idx) {
-    System.arraycopy(node, idx + 1, node, idx, size - idx);
-    System.arraycopy(score, idx + 1, score, idx, size - idx);
+    System.arraycopy(node, idx + 1, node, idx, size - idx - 1);

Review Comment:
   oh, good! Is it OK when `size - idx - 1` == 0? I think so, at least I checked javadocs and they say only length < 0 raises an error



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/OnHeapHnswGraph.java:
##########
@@ -175,20 +175,28 @@ public long ramBytesUsed() {
     long neighborArrayBytes0 =
         nsize0 * (Integer.BYTES + Float.BYTES)
             + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF;
+            + RamUsageEstimator.NUM_BYTES_OBJECT_REF
+            + Integer.BYTES * 2;
     long neighborArrayBytes =
         nsize * (Integer.BYTES + Float.BYTES)
             + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER * 2
-            + RamUsageEstimator.NUM_BYTES_OBJECT_REF;
-
+            + RamUsageEstimator.NUM_BYTES_OBJECT_REF
+            + Integer.BYTES * 2;
     long total = 0;
     for (int l = 0; l < numLevels; l++) {
       int numNodesOnLevel = graph.get(l).size();

Review Comment:
   were we overestimating before because not all the nodes were populated?



##########
lucene/core/src/test/org/apache/lucene/util/TestRamUsageEstimator.java:
##########
@@ -222,6 +229,33 @@ public void testPrintValues() {
     System.out.println("LONG_SIZE = " + LONG_SIZE);
   }
 
+  public void testHnswGraph() throws IOException {
+    int size = atLeast(2000);
+    int dim = randomIntBetween(100, 1024);
+    int M = randomIntBetween(4, 96);
+    VectorSimilarityFunction similarityFunction =
+        VectorSimilarityFunction.values()[
+            random().nextInt(VectorSimilarityFunction.values().length - 1) + 1];
+    VectorEncoding vectorEncoding;
+    if (similarityFunction == VectorSimilarityFunction.DOT_PRODUCT) {

Review Comment:
   I don't think you need this check -- it's OK now to use BYTE encoding with other similarities



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org