You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/02/01 15:51:43 UTC

[GitHub] [lucene-solr] sbeniwal12 opened a new pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

sbeniwal12 opened a new pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282


   Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes
   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene or Solr:
   
   * https://issues.apache.org/jira/projects/LUCENE
   * https://issues.apache.org/jira/projects/SOLR
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   * SOLR-####: <short description of problem or changes>
   
   LUCENE and SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   HnswGraphBuilder has a few tunables: maxConnections, beamWidth, and we may add a few more, such as whether to use a diversity heuristic when choosing neighbors to link in the graph. Currently these are locked to defaults set by global variables. Instead we should provide some interface for configuring them.
   
   # Solution
   
   Here with this change we allow to pass down these parameters as FieldType attributes of VectorFields as well provide a convenience method in VectorFields.java to  add these parameters in FieldType attributes. The value passed in the attributes would be used in `Lucene90VectorWriter ` while creating a HNSW graph, in case no value is found we switch back to default parameter values present in `HnswGraphBuilder`. 
   
   # Tests
   
   Updated way of setting these parameters  in `KnnGraphTester.java` for testing indexing and search performance of a knn-graph. 
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov merged pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

Posted by GitBox <gi...@apache.org>.
msokolov merged pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov commented on a change in pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

Posted by GitBox <gi...@apache.org>.
msokolov commented on a change in pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282#discussion_r568075607



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorWriter.java
##########
@@ -188,9 +190,29 @@ private void writeGraph(
       RandomAccessVectorValuesProducer vectorValues,
       long graphDataOffset,
       long[] offsets,
-      int count)
+      int count,
+      String maxConnStr,
+      String beamWidthStr)
       throws IOException {
-    HnswGraphBuilder hnswGraphBuilder = new HnswGraphBuilder(vectorValues);
+    int maxConn, beamWidth;
+    if (maxConnStr == null) {
+      maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    } else if (!maxConnStr.matches("[0-9]+")) {

Review comment:
       I don't think we need this - we can allow parseInt to throw an exception. Let's catch `NumberFormatException` and rethrow with more context (which attribute caused the exception). Also HnswGraphBuilder tests for `<= 0`, so we don't need to check that here.

##########
File path: lucene/core/src/java/org/apache/lucene/document/VectorField.java
##########
@@ -53,6 +54,44 @@ private static FieldType getType(float[] v, VectorValues.SearchStrategy searchSt
     return type;
   }
 
+  /**
+   * Public method to create HNSW field type with the given max-connections and beam-width
+   * parameters that would be used by HnswGraphBuilder while constructing HNSW graph.
+   *
+   * @param dimension dimension of vectors
+   * @param searchStrategy a function defining vector proximity.
+   * @param maxConn max-connections at each HNSW graph node
+   * @param beamWidth size of list to be used while constructing HNSW graph
+   * @throws IllegalArgumentException if any parameter is null, or has dimension &gt; 1024.
+   */
+  public static FieldType createHnswType(
+      int dimension, VectorValues.SearchStrategy searchStrategy, int maxConn, int beamWidth) {
+    if (dimension == 0) {
+      throw new IllegalArgumentException("cannot index an empty vector");
+    }
+    if (dimension > VectorValues.MAX_DIMENSIONS) {
+      throw new IllegalArgumentException(
+          "cannot index vectors with dimension greater than " + VectorValues.MAX_DIMENSIONS);
+    }
+    if (searchStrategy == null) {
+      throw new IllegalArgumentException("search strategy must not be null");

Review comment:
       Let's also assert `searchStrategy.isHnsw()` to catch attempts to use `NONE` or some other unsupported future strategy.

##########
File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java
##########
@@ -132,13 +135,13 @@ private void run(String... args) throws Exception {
           if (iarg == args.length - 1) {
             throw new IllegalArgumentException("-beamWidthIndex requires a following number");
           }
-          HnswGraphBuilder.DEFAULT_BEAM_WIDTH = Integer.parseInt(args[++iarg]);

Review comment:
       With this change, we no longer have any need to make these static variables writable - let's change them to `final` in `HnswGraphBuilder`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] sbeniwal12 commented on a change in pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

Posted by GitBox <gi...@apache.org>.
sbeniwal12 commented on a change in pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282#discussion_r568327126



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorWriter.java
##########
@@ -188,9 +190,29 @@ private void writeGraph(
       RandomAccessVectorValuesProducer vectorValues,
       long graphDataOffset,
       long[] offsets,
-      int count)
+      int count,
+      String maxConnStr,
+      String beamWidthStr)
       throws IOException {
-    HnswGraphBuilder hnswGraphBuilder = new HnswGraphBuilder(vectorValues);
+    int maxConn, beamWidth;
+    if (maxConnStr == null) {
+      maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    } else if (!maxConnStr.matches("[0-9]+")) {

Review comment:
       Thanks for pointing out, threw `NumberFormatException` with message describing which attribute caused the exception.

##########
File path: lucene/core/src/java/org/apache/lucene/document/VectorField.java
##########
@@ -53,6 +54,44 @@ private static FieldType getType(float[] v, VectorValues.SearchStrategy searchSt
     return type;
   }
 
+  /**
+   * Public method to create HNSW field type with the given max-connections and beam-width
+   * parameters that would be used by HnswGraphBuilder while constructing HNSW graph.
+   *
+   * @param dimension dimension of vectors
+   * @param searchStrategy a function defining vector proximity.
+   * @param maxConn max-connections at each HNSW graph node
+   * @param beamWidth size of list to be used while constructing HNSW graph
+   * @throws IllegalArgumentException if any parameter is null, or has dimension &gt; 1024.
+   */
+  public static FieldType createHnswType(
+      int dimension, VectorValues.SearchStrategy searchStrategy, int maxConn, int beamWidth) {
+    if (dimension == 0) {
+      throw new IllegalArgumentException("cannot index an empty vector");
+    }
+    if (dimension > VectorValues.MAX_DIMENSIONS) {
+      throw new IllegalArgumentException(
+          "cannot index vectors with dimension greater than " + VectorValues.MAX_DIMENSIONS);
+    }
+    if (searchStrategy == null) {
+      throw new IllegalArgumentException("search strategy must not be null");

Review comment:
       Added this check and also added a unit test for this check. 

##########
File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java
##########
@@ -132,13 +135,13 @@ private void run(String... args) throws Exception {
           if (iarg == args.length - 1) {
             throw new IllegalArgumentException("-beamWidthIndex requires a following number");
           }
-          HnswGraphBuilder.DEFAULT_BEAM_WIDTH = Integer.parseInt(args[++iarg]);

Review comment:
       Made them final and also made changes to `TestKnnGraph.java` to accommodate this change. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] sbeniwal12 commented on a change in pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

Posted by GitBox <gi...@apache.org>.
sbeniwal12 commented on a change in pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282#discussion_r568327126



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90VectorWriter.java
##########
@@ -188,9 +190,29 @@ private void writeGraph(
       RandomAccessVectorValuesProducer vectorValues,
       long graphDataOffset,
       long[] offsets,
-      int count)
+      int count,
+      String maxConnStr,
+      String beamWidthStr)
       throws IOException {
-    HnswGraphBuilder hnswGraphBuilder = new HnswGraphBuilder(vectorValues);
+    int maxConn, beamWidth;
+    if (maxConnStr == null) {
+      maxConn = HnswGraphBuilder.DEFAULT_MAX_CONN;
+    } else if (!maxConnStr.matches("[0-9]+")) {

Review comment:
       Thanks for pointing out, threw `NumberFormatException` with message describing which attribute caused the exception.

##########
File path: lucene/core/src/java/org/apache/lucene/document/VectorField.java
##########
@@ -53,6 +54,44 @@ private static FieldType getType(float[] v, VectorValues.SearchStrategy searchSt
     return type;
   }
 
+  /**
+   * Public method to create HNSW field type with the given max-connections and beam-width
+   * parameters that would be used by HnswGraphBuilder while constructing HNSW graph.
+   *
+   * @param dimension dimension of vectors
+   * @param searchStrategy a function defining vector proximity.
+   * @param maxConn max-connections at each HNSW graph node
+   * @param beamWidth size of list to be used while constructing HNSW graph
+   * @throws IllegalArgumentException if any parameter is null, or has dimension &gt; 1024.
+   */
+  public static FieldType createHnswType(
+      int dimension, VectorValues.SearchStrategy searchStrategy, int maxConn, int beamWidth) {
+    if (dimension == 0) {
+      throw new IllegalArgumentException("cannot index an empty vector");
+    }
+    if (dimension > VectorValues.MAX_DIMENSIONS) {
+      throw new IllegalArgumentException(
+          "cannot index vectors with dimension greater than " + VectorValues.MAX_DIMENSIONS);
+    }
+    if (searchStrategy == null) {
+      throw new IllegalArgumentException("search strategy must not be null");

Review comment:
       Added this check and also added a unit test for this check. 

##########
File path: lucene/core/src/test/org/apache/lucene/util/hnsw/KnnGraphTester.java
##########
@@ -132,13 +135,13 @@ private void run(String... args) throws Exception {
           if (iarg == args.length - 1) {
             throw new IllegalArgumentException("-beamWidthIndex requires a following number");
           }
-          HnswGraphBuilder.DEFAULT_BEAM_WIDTH = Integer.parseInt(args[++iarg]);

Review comment:
       Made them final and also made changes to `TestKnnGraph.java` to accommodate this change. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] msokolov merged pull request #2282: LUCENE-9615: Expose HnswGraphBuilder index-time hyperparameters as FieldType attributes

Posted by GitBox <gi...@apache.org>.
msokolov merged pull request #2282:
URL: https://github.com/apache/lucene-solr/pull/2282


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org