You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/09/05 07:26:56 UTC

[GitHub] [lucene] jpountz commented on a diff in pull request #11743: LUCENE-10592 Better estimate memory for HNSW graph

jpountz commented on code in PR #11743:
URL: https://github.com/apache/lucene/pull/11743#discussion_r962569779


##########
lucene/core/src/test/org/apache/lucene/util/TestRamUsageEstimator.java:
##########
@@ -222,6 +229,28 @@ public void testPrintValues() {
     System.out.println("LONG_SIZE = " + LONG_SIZE);
   }
 
+  public void testHnswGraph() throws IOException {
+    int size = atLeast(2000);
+    int dim = randomIntBetween(100, 1024);
+    int M = randomIntBetween(4, 96);
+    VectorSimilarityFunction similarityFunction =
+        VectorSimilarityFunction.values()[
+            random().nextInt(VectorSimilarityFunction.values().length - 1) + 1];
+    VectorEncoding vectorEncoding =
+        VectorEncoding.values()[random().nextInt(VectorEncoding.values().length - 1) + 1];
+    TestHnswGraph.RandomVectorValues vectors =
+        new TestHnswGraph.RandomVectorValues(size, dim, vectorEncoding, random());
+
+    HnswGraphBuilder<?> builder =
+        HnswGraphBuilder.create(
+            vectors, vectorEncoding, similarityFunction, M, M * 2, random().nextLong());
+    OnHeapHnswGraph hnsw = builder.build(vectors.copy());
+    long estimated = RamUsageEstimator.sizeOfObject(hnsw);
+    long actual = ramUsed(hnsw);
+
+    assertEquals((double) actual, (double) estimated, (double) actual * 0.3);
+  }
+

Review Comment:
   Nit: Since this test is about testing `OnHeapHnswGraph#ramBytesUsed` rather than `RamUsageEstimator`, it should be  in `TestHnswGraph`.



##########
lucene/core/src/java/org/apache/lucene/util/hnsw/NeighborArray.java:
##########
@@ -46,7 +46,7 @@ public NeighborArray(int maxSize, boolean descOrder) {
    * nodes.
    */
   public void add(int newNode, float newScore) {
-    if (size == node.length - 1) {
+    if (size == node.length) {
       node = ArrayUtil.grow(node, (size + 1) * 3 / 2);

Review Comment:
   I wonder if this line is intentional. `ArrayUtil#grow` already increases the provided size by 12.5%. And it looks like there is an intention here to increase the size by 50%. So overall this increases the size of the array by 68.75%. Should we either do `ArrayUtil.growExact(node, (size + 1) * 3 / 2)` to grow by 50% or possibly rely on `ArrayUtil`'s default by doing `ArrayUtil.grow(node)` and grow by 12.5%?



##########
lucene/core/src/test/org/apache/lucene/util/hnsw/TestHnswGraph.java:
##########
@@ -74,12 +74,8 @@ public void setup() {
     similarityFunction =
         VectorSimilarityFunction.values()[
             random().nextInt(VectorSimilarityFunction.values().length - 1) + 1];
-    if (similarityFunction == VectorSimilarityFunction.DOT_PRODUCT) {
-      vectorEncoding =
-          VectorEncoding.values()[random().nextInt(VectorEncoding.values().length - 1) + 1];
-    } else {
-      vectorEncoding = VectorEncoding.FLOAT32;
-    }
+    vectorEncoding =
+        VectorEncoding.values()[random().nextInt(VectorEncoding.values().length - 1) + 1];

Review Comment:
   I think you can do something like `RandomPicks.randomFrom(random(), VectorEncoding.values())` to make it more readable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org