You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/06/03 00:05:12 UTC

[GitHub] [lucene] jtibshirani opened a new pull request #166: LUCENE-9905: Move HNSW build parameters to codec

jtibshirani opened a new pull request #166:
URL: https://github.com/apache/lucene/pull/166


   Previously, the max connections and beam width parameters could be configured as
   field type attributes. This PR moves them to be parameters on
   Lucene90HnswVectorFormat, to avoid exposing details of the vector format
   implementation in the API.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #166:
URL: https://github.com/apache/lucene/pull/166#issuecomment-854021866


   @msokolov as a heads up, I didn't have the chance to try out `KnnGraphTester` to double-check the behavior didn't change. However the update to `KnnGraphTester` was straightforward and I don't anticipate problems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r644395730



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       I'm not sure that writing and validating these format attributes is necessary, since we don't use them when reading. It just seemed nice (and low cost) to have the construction parameters available in the segment infos for debugging.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645789797



##########
File path: lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestLucene90HnswVectorFormat.java
##########
@@ -16,14 +16,76 @@
  */
 package org.apache.lucene.codecs.lucene90;
 
+import java.util.List;
 import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.codecs.VectorFormat;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.VectorField;
 import org.apache.lucene.index.BaseVectorFormatTestCase;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.SegmentCommitInfo;
+import org.apache.lucene.index.SegmentInfo;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.store.Directory;
 import org.apache.lucene.util.TestUtil;
+import org.apache.lucene.util.hnsw.HnswGraphBuilder;
 
 public class TestLucene90HnswVectorFormat extends BaseVectorFormatTestCase {
 
   @Override
   protected Codec getCodec() {
     return TestUtil.getDefaultCodec();
   }
+
+  public void testFormatParameters() throws Exception {
+    Directory dir = newDirectory();
+
+    // Create a segment using the default parameters
+    IndexWriterConfig iwc = newIndexWriterConfig();
+    IndexWriter w = new IndexWriter(dir, iwc);
+    Document doc = new Document();
+    doc.add(new VectorField("field", new float[4], VectorValues.SimilarityFunction.DOT_PRODUCT));
+    w.addDocument(doc);
+    w.close();
+
+    // Write another segment using different parameters
+    int newMaxConn = HnswGraphBuilder.DEFAULT_MAX_CONN + 3;
+    int newBeamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH + 42;
+    IndexWriterConfig iwc2 =
+        newIndexWriterConfig()
+            .setCodec(
+                new Lucene90Codec() {
+                  @Override
+                  public VectorFormat getVectorFormatForField(String field) {
+                    return new Lucene90HnswVectorFormat(newMaxConn, newBeamWidth);
+                  }
+                });
+    IndexWriter w2 = new IndexWriter(dir, iwc2);
+    Document doc2 = new Document();
+    doc2.add(new VectorField("field", new float[4], VectorValues.SimilarityFunction.DOT_PRODUCT));
+    w2.addDocument(doc2);
+    w2.close();
+
+    // Check that we record the parameters that were used in the segment infos

Review comment:
       I tried a new testing approach where we randomize the parameters in `TestHnsw#testReadWrite`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #166:
URL: https://github.com/apache/lucene/pull/166#issuecomment-854021866


   @msokolov as a heads up, I didn't have the chance to try out `KnnGraphTester` to double-check the behavior didn't change. However the update to `KnnGraphTester` was straightforward and I don't anticipate problems.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
rmuir commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645249079



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       I don't think these should be written. If someone is using the per-field impl, and has different fields with different values, then they'd trample each other.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645698401



##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java
##########
@@ -38,17 +38,11 @@
   // expose for testing.
   public static long randSeed = DEFAULT_RAND_SEED;
 
-  /* These "default" hyper-parameter settings are exposed (and non-final) to enable performance
-   * testing since the indexing API doesn't provide any control over them.
-   */
-
   // default max connections per node
   public static final int DEFAULT_MAX_CONN = 16;
-  public static String HNSW_MAX_CONN_ATTRIBUTE_KEY = "max_connections";
 
   // default candidate list size
   public static final int DEFAULT_BEAM_WIDTH = 16;

Review comment:
       I also think we should bump the default. If you're okay with it, I'd prefer to do it in a separate PR to keep this one a straight refactor.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r644395730



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       I'm not sure that writing and validating these format attributes is necessary, since we don't use them when reading. It just seemed nice (and low cost) to have the construction parameters available in the segment infos for debugging.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] msokolov commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645594479



##########
File path: lucene/core/src/test/org/apache/lucene/codecs/lucene90/TestLucene90HnswVectorFormat.java
##########
@@ -16,14 +16,76 @@
  */
 package org.apache.lucene.codecs.lucene90;
 
+import java.util.List;
 import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.codecs.VectorFormat;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.VectorField;
 import org.apache.lucene.index.BaseVectorFormatTestCase;
+import org.apache.lucene.index.IndexWriter;
+import org.apache.lucene.index.IndexWriterConfig;
+import org.apache.lucene.index.SegmentCommitInfo;
+import org.apache.lucene.index.SegmentInfo;
+import org.apache.lucene.index.SegmentInfos;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.store.Directory;
 import org.apache.lucene.util.TestUtil;
+import org.apache.lucene.util.hnsw.HnswGraphBuilder;
 
 public class TestLucene90HnswVectorFormat extends BaseVectorFormatTestCase {
 
   @Override
   protected Codec getCodec() {
     return TestUtil.getDefaultCodec();
   }
+
+  public void testFormatParameters() throws Exception {
+    Directory dir = newDirectory();
+
+    // Create a segment using the default parameters
+    IndexWriterConfig iwc = newIndexWriterConfig();
+    IndexWriter w = new IndexWriter(dir, iwc);
+    Document doc = new Document();
+    doc.add(new VectorField("field", new float[4], VectorValues.SimilarityFunction.DOT_PRODUCT));
+    w.addDocument(doc);
+    w.close();
+
+    // Write another segment using different parameters
+    int newMaxConn = HnswGraphBuilder.DEFAULT_MAX_CONN + 3;
+    int newBeamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH + 42;
+    IndexWriterConfig iwc2 =
+        newIndexWriterConfig()
+            .setCodec(
+                new Lucene90Codec() {
+                  @Override
+                  public VectorFormat getVectorFormatForField(String field) {
+                    return new Lucene90HnswVectorFormat(newMaxConn, newBeamWidth);
+                  }
+                });
+    IndexWriter w2 = new IndexWriter(dir, iwc2);
+    Document doc2 = new Document();
+    doc2.add(new VectorField("field", new float[4], VectorValues.SimilarityFunction.DOT_PRODUCT));
+    w2.addDocument(doc2);
+    w2.close();
+
+    // Check that we record the parameters that were used in the segment infos

Review comment:
       Of course if we don't save the value it will be hard to test this! I guess we could check the *actual* maximum number of connections after adding more documents? Maybe make max conn 2?

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java
##########
@@ -38,17 +38,11 @@
   // expose for testing.
   public static long randSeed = DEFAULT_RAND_SEED;
 
-  /* These "default" hyper-parameter settings are exposed (and non-final) to enable performance
-   * testing since the indexing API doesn't provide any control over them.
-   */
-
   // default max connections per node
   public static final int DEFAULT_MAX_CONN = 16;
-  public static String HNSW_MAX_CONN_ATTRIBUTE_KEY = "max_connections";
 
   // default candidate list size
   public static final int DEFAULT_BEAM_WIDTH = 16;

Review comment:
       I wonder if we should change this to something more realistic, like 100?

##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       I agree, if we don't use them, we don't need to save them. I mean it's interesting in a theoretical sense to know how the graph was constructed, but we can figure out how to add them (per-field) later if need be.

##########
File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java
##########
@@ -566,13 +569,19 @@ private void writeNN(int[][] nn, Path nnPath) throws IOException {
 
   private int createIndex(Path docsPath, Path indexPath) throws IOException {
     IndexWriterConfig iwc = new IndexWriterConfig().setOpenMode(IndexWriterConfig.OpenMode.CREATE);
+    iwc.setCodec(

Review comment:
       It's OK not to test, this looks pretty safe. If you have time, you could try [knnPerfTest.py](https://github.com/mikemccand/luceneutil/blob/master/src/python/knnPerfTest.py). It takes a little setup to generate the GloVe vectors for wikipedia documents, documented in the script. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
rmuir commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645249079



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       I don't think these should be written. If someone is using the per-field impl, and has different fields with different values, then they'd trample each other.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on pull request #166:
URL: https://github.com/apache/lucene/pull/166#issuecomment-854940702


   This is ready for a second look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani merged pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani merged pull request #166:
URL: https://github.com/apache/lucene/pull/166


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jtibshirani commented on a change in pull request #166: LUCENE-9905: Move HNSW build parameters to codec

Posted by GitBox <gi...@apache.org>.
jtibshirani commented on a change in pull request #166:
URL: https://github.com/apache/lucene/pull/166#discussion_r645694641



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90HnswVectorFormat.java
##########
@@ -76,14 +79,55 @@
   static final int VERSION_START = 0;
   static final int VERSION_CURRENT = VERSION_START;
 
-  /** Sole constructor */
+  static final String BEAM_WIDTH_KEY =
+      Lucene90HnswVectorFormat.class.getSimpleName() + ".beam_width";
+  static final String MAX_CONN_KEY = Lucene90HnswVectorFormat.class.getSimpleName() + ".max_conn";
+
+  /**
+   * Controls how many of the nearest neighbor candidates are connected to the new node. See {@link
+   * HnswGraph} for details.
+   */
+  private final int maxConn;
+
+  /**
+   * The number of candidate neighbors to track while searching the graph for each newly inserted
+   * node. See {@link HnswGraph} for details.
+   */
+  private final int beamWidth;
+
   public Lucene90HnswVectorFormat() {
     super("Lucene90HnswVectorFormat");
+    this.maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    this.beamWidth = HnswGraphBuilder.DEFAULT_BEAM_WIDTH;
+  }
+
+  public Lucene90HnswVectorFormat(int maxConn, int beamWidth) {
+    super("Lucene90HnswVectorFormat");
+    this.maxConn = maxConn;
+    this.beamWidth = beamWidth;
   }
 
   @Override
   public VectorWriter fieldsWriter(SegmentWriteState state) throws IOException {
-    return new Lucene90HnswVectorWriter(state);
+    SegmentInfo segmentInfo = state.segmentInfo;
+    putFormatAttribute(segmentInfo, MAX_CONN_KEY, String.valueOf(maxConn));
+    putFormatAttribute(segmentInfo, BEAM_WIDTH_KEY, String.valueOf(beamWidth));
+    return new Lucene90HnswVectorWriter(state, maxConn, beamWidth);
+  }
+
+  private void putFormatAttribute(SegmentInfo si, String key, String value) {
+    String previousValue = si.putAttribute(key, value);
+    if (previousValue != null && previousValue.equals(value) == false) {

Review comment:
       Ah right this doesn't handle the per-field case correctly, I'll just remove!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org