You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/20 15:46:41 UTC

[GitHub] [lucene] msokolov commented on a change in pull request #250: LUCENE-10054 Make HnswGraph hierarchical

msokolov commented on a change in pull request #250:
URL: https://github.com/apache/lucene/pull/250#discussion_r693041906



##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -56,22 +56,28 @@
 public final class HnswGraph extends KnnGraphValues {
 
   private final int maxConn;
-
-  // Each entry lists the top maxConn neighbors of a node. The nodes correspond to vectors added to
-  // HnswBuilder, and the
-  // node values are the ordinals of those vectors.
-  private final List<NeighborArray> graph;
+  // graph is a list of graph levels.
+  // Each level is represented as List<NeighborArray> – nodes' connections on this level.
+  // Each entry in the list has the top maxConn neighbors of a node. The nodes correspond to vectors
+  // added to HnswBuilder, and the node values are the ordinals of those vectors.
+  private final List<List<NeighborArray>> graph;
 
   // KnnGraphValues iterator members
   private int upto;
   private NeighborArray cur;
 
-  HnswGraph(int maxConn) {
-    graph = new ArrayList<>();
-    // Typically with diversity criteria we see nodes not fully occupied; average fanout seems to be
-    // about 1/2 maxConn. There is some indexing time penalty for under-allocating, but saves RAM
-    graph.add(new NeighborArray(Math.max(32, maxConn / 4)));
+  HnswGraph(int maxConn, int numLevels, int levelOfFirstNode) {
     this.maxConn = maxConn;
+    this.graph = new ArrayList<>(numLevels);
+    for (int i = 0; i < numLevels; i++) {
+      graph.add(new ArrayList<>());
+    }
+    for (int i = 0; i <= levelOfFirstNode; i++) {
+      // Typically with diversity criteria we see nodes not fully occupied; average fanout seems to
+      // be

Review comment:
       can we re-flow the comment? Maybe use a block comment so spotless doesn't get confused

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -40,10 +40,10 @@
  * <h2>Hyperparameters</h2>
  *
  * <ul>
- *   <li><code>numSeed</code> is the equivalent of <code>m</code> in the 2012 paper; it controls the
+ *   <li><code>numSeed</code> is the equivalent of <code>m</code> in the 2014 paper; it controls the

Review comment:
       heh

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -166,25 +173,35 @@ public static NeighborQueue search(
   /**
    * Returns the {@link NeighborQueue} connected to the given node.
    *
+   * @param level level of the graph
    * @param node the node whose neighbors are returned
    */
-  public NeighborArray getNeighbors(int node) {
-    return graph.get(node);
+  public NeighborArray getNeighbors(int level, int node) {
+    NeighborArray result = graph.get(level).get(node);
+    assert result != null;
+    return result;
   }
 
   @Override
   public int size() {
-    return graph.size();
+    return graph.get(0).size(); // all nodes are located on the 0th level
   }
 
-  int addNode() {
-    graph.add(new NeighborArray(maxConn + 1));
-    return graph.size() - 1;
+  public void addNode(int level, int node) {
+    if (level > 0) {
+      // Levels above 0th don't contain all nodes,
+      // so for missing nodes we add null NeighborArray
+      int nullsToAdd = node - graph.get(level).size();
+      for (int i = 0; i < nullsToAdd; i++) {

Review comment:
       This seems kind of expensive. Perhaps we should have a data structure that has NeighborArray, with values that are dense ordinals in the current layer *plus* the ordinal of the current node in the next layer. If we do that, each layer can be dense instead of creating N*numlayers entries, most of which are null. Maybe just throw an UnsupportedOperationException here for now and we can resolve this as we get deeper into the implementation though




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org