You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/09/07 19:38:06 UTC

[GitHub] [lucene] jtibshirani commented on a change in pull request #287: Optimize storage of neighbours on level > 0

jtibshirani commented on a change in pull request #287:
URL: https://github.com/apache/lucene/pull/287#discussion_r703752457



##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -230,36 +244,50 @@ static NeighborQueue searchLevel(
    * Returns the {@link NeighborQueue} connected to the given node.
    *
    * @param level level of the graph
-   * @param node the node whose neighbors are returned
+   * @param node the node whose neighbors are returned, represented as an ordinal on the level 0.
    */
   public NeighborArray getNeighbors(int level, int node) {
-    NeighborArray result = graph.get(level).get(node);
-    assert result != null;
-    return result;
+    if (level == 0) {
+      return graph.get(level).get(node);
+    }
+    int nodeOrd = Arrays.binarySearch(nodesByLevel.get(level), 0, graph.get(level).size(), node);

Review comment:
       Would a clearer name for this be `nodeIndex`? I think of `nodeOrd` as representing the ordinal on level 0.

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -56,21 +58,27 @@
 public final class HnswGraph extends KnnGraphValues {
 
   private final int maxConn;
+  private int curMaxLevel; // the current max graph level
+  private int entryNode; // the current graph entry node on the top level
+
+  // Nodes by level expressed as the level 0's nodes' ordinals.
+  // As level 0 contains all nodes, nodesByLevel.get(0) is null.
+  private final List<int[]> nodesByLevel;
+
   // graph is a list of graph levels.
   // Each level is represented as List<NeighborArray> – nodes' connections on this level.
   // Each entry in the list has the top maxConn neighbors of a node. The nodes correspond to vectors
   // added to HnswBuilder, and the node values are the ordinals of those vectors.
+  // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
   private final List<List<NeighborArray>> graph;
-  private int curMaxLevel; // the current max graph level
-  private int entryNode; // the current graph entry node on the top level
 
   // KnnGraphValues iterator members
   private int upto;
   private NeighborArray cur;
 
   // used for iterating over graph values
   private int curLevel = -1;
-  private int curNode = -1;
+  private int curNodeOrd = -1;

Review comment:
       This is not directly related to this PR, but I am finding the seeking behavior hard to follow. We have two pieces of state that seem similar: `cur` for the current node's NeighborArray, and `curNodeOrd` which is a node index within a level. These can get out of sync depending on what methods are called.
   
   I wonder if we could remove `curNodeOrd` -- I think it is only used to support `nextNodeOnLevel()`, which in turn is only used in tests. Maybe we could replace it with a package-private method for testing like `getAllNodesOnLevel(level)?

##########
File path: lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java
##########
@@ -56,21 +58,27 @@
 public final class HnswGraph extends KnnGraphValues {
 
   private final int maxConn;
+  private int curMaxLevel; // the current max graph level
+  private int entryNode; // the current graph entry node on the top level
+
+  // Nodes by level expressed as the level 0's nodes' ordinals.
+  // As level 0 contains all nodes, nodesByLevel.get(0) is null.
+  private final List<int[]> nodesByLevel;
+
   // graph is a list of graph levels.
   // Each level is represented as List<NeighborArray> – nodes' connections on this level.
   // Each entry in the list has the top maxConn neighbors of a node. The nodes correspond to vectors
   // added to HnswBuilder, and the node values are the ordinals of those vectors.
+  // Thus, on all levels, neighbors expressed as the level 0's nodes' ordinals.
   private final List<List<NeighborArray>> graph;
-  private int curMaxLevel; // the current max graph level
-  private int entryNode; // the current graph entry node on the top level
 
   // KnnGraphValues iterator members
   private int upto;
   private NeighborArray cur;
 
   // used for iterating over graph values
   private int curLevel = -1;
-  private int curNode = -1;
+  private int curNodeOrd = -1;

Review comment:
       This is not directly related to this PR, but I am finding the seeking behavior hard to follow. We have two pieces of state that seem similar: `cur` for the current node's NeighborArray, and `curNodeOrd` which is a node index within a level. These can get out of sync depending on what methods are called.
   
   I wonder if we could remove `curNodeOrd` -- I think it is only used to support `nextNodeOnLevel()`, which in turn is only used in tests. Maybe we could replace it with a package-private method for testing like `getAllNodesOnLevel(level)`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org