Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/26 18:26:57 UTC

[GitHub] [lucene-solr] shaie commented on a change in pull request #2247: WIP: LUCENE-9476 Add getBulkPath API for the Taxonomy index

shaie commented on a change in pull request #2247:
URL: https://github.com/apache/lucene-solr/pull/2247#discussion_r564721526



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();

Review comment:
      I think Mike asked about this above: it's better to call `ensureOpen()` in the outer *public* API, so it's visible and clear that we perform this check, rather than relying on a call inside a private method.
   
   Also, a `private` method may be called many times during a single `public` API invocation, and we don't need to repeat this check each time.
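   
   A minimal sketch of the suggested pattern, using a hypothetical `OpenCheckSketch` class rather than the real `DirectoryTaxonomyReader`. Lucene's `ensureOpen()` throws `AlreadyClosedException`; an `IllegalStateException` stands in for it here. The public method checks once, and the private helper assumes the check has already happened:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Hypothetical stand-in for a taxonomy reader; not Lucene's actual class.
   class OpenCheckSketch {
     private final Map<Integer, String> cache = new HashMap<>();
     private boolean closed = false;
     // Counts how often the open check runs, to show it happens once per public call.
     int ensureOpenCalls = 0;
   
     void close() {
       closed = true;
     }
   
     private void ensureOpen() {
       ensureOpenCalls++;
       if (closed) {
         throw new IllegalStateException("this reader is closed");
       }
     }
   
     // Private helper: assumes the public caller already verified the reader is open.
     private String getFromCache(int ordinal) {
       synchronized (cache) {
         return cache.get(ordinal);
       }
     }
   
     void put(int ordinal, String label) {
       synchronized (cache) {
         cache.put(ordinal, label);
       }
     }
   
     // Public API: performs the check once, up front, where readers of the code can see it.
     public String[] getLabels(int... ordinals) {
       ensureOpen();
       String[] result = new String[ordinals.length];
       for (int i = 0; i < ordinals.length; i++) {
         result[i] = getFromCache(ordinals[i]);
       }
       return result;
     }
   }
   ```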

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;

Review comment:
      This is a style comment: can we do this instead?
   
   ```
   FacetLabel res = null;
   synchronized (categoryCache) {
     res = categoryCache.get(ordinal);
   }
   return res;
   ```
   
   It's shorter, and I believe it achieves the same outcome.
   
   In fact, the entire lookup can be reduced to:
   
   ```
   synchronized (categoryCache) {
     return categoryCache.get(ordinal);
   }
   ```
   Or am I missing something?
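   
   To check that the shorter form behaves the same, here is a self-contained sketch comparing both shapes. It assumes an access-ordered `LinkedHashMap` as a stand-in for the reader's LRU category cache (not necessarily the class Lucene uses) and a hypothetical capacity. Because an access-ordered `get` reorders entries, the lookup still needs the lock:
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   // Sketch only: a LinkedHashMap in access order stands in for the LRU category cache.
   class CacheLookupSketch {
     private final Map<Integer, String> categoryCache =
         new LinkedHashMap<Integer, String>(16, 0.75f, true) {
           @Override
           protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
             return size() > 1000; // hypothetical capacity
           }
         };
   
     void put(int ordinal, String label) {
       synchronized (categoryCache) {
         categoryCache.put(ordinal, label);
       }
     }
   
     // Original shape: explicit null check, then a fall-through "return null".
     String getVerbose(int ordinal) {
       synchronized (categoryCache) {
         String res = categoryCache.get(ordinal);
         if (res != null) {
           return res;
         }
       }
       return null;
     }
   
     // Suggested shape: Map.get already returns null on a miss, so return it directly.
     // The lock is released when the synchronized block exits, including via return.
     String getConcise(int ordinal) {
       synchronized (categoryCache) {
         return categoryCache.get(ordinal);
       }
     }
   }
   ```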

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {
+    FacetLabel[] bulkPath = new FacetLabel[ordinal.length];
+    Map<Integer, Integer> originalPosition = new HashMap<>();
+    for (int i = 0; i < ordinal.length; i++) {
+      if (ordinal[i] < 0 || ordinal[i] >= indexReader.maxDoc()) {
+        return null;
+      }
+      FacetLabel ordinalPath = getPathFromCache(ordinal[i]);

Review comment:
      There you go: you call the new private method once per ordinal within a single public API invocation, so there's no need to repeat the `ensureOpen` check for each ordinal 😄 

##########
File path: lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestDirectoryTaxonomyReader.java
##########
@@ -569,4 +569,31 @@ public void testAccountable() throws Exception {
     taxoReader.close();
     dir.close();
   }
+
+  public void testBulkPath() throws Exception {

Review comment:
      I know Lucene tests follow that naming convention, but over the last few years I've started giving tests names that describe what's being tested, e.g. `testCallingBulkPathReturnsCorrectResult` vs. `testBulkPathFailsIfReaderIsClosed` ...
   
   It would be nice if we gradually started giving our tests more descriptive names :)

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {
+    FacetLabel[] bulkPath = new FacetLabel[ordinal.length];
+    Map<Integer, Integer> originalPosition = new HashMap<>();
+    for (int i = 0; i < ordinal.length; i++) {
+      if (ordinal[i] < 0 || ordinal[i] >= indexReader.maxDoc()) {
+        return null;

Review comment:
      I don't want to optimize prematurely here, but `indexReader.maxDoc()` is called for each ordinal even though its value cannot change between ordinals, so could we extract it outside the loop?
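   
   A small sketch of the hoisting, where a hypothetical `IntSupplier` stands in for `indexReader.maxDoc()` so the test can count how often it is read:
   
   ```java
   import java.util.function.IntSupplier;
   
   class HoistMaxDocSketch {
     // Returns false if any ordinal is outside [0, maxDoc).
     static boolean allInRange(int[] ordinals, IntSupplier maxDocSupplier) {
       // Read the bound once, before the loop; it cannot change between ordinals.
       final int maxDoc = maxDocSupplier.getAsInt();
       for (int ord : ordinals) {
         if (ord < 0 || ord >= maxDoc) {
           return false;
         }
       }
       return true;
     }
   }
   ```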

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {
+    FacetLabel[] bulkPath = new FacetLabel[ordinal.length];
+    Map<Integer, Integer> originalPosition = new HashMap<>();
+    for (int i = 0; i < ordinal.length; i++) {
+      if (ordinal[i] < 0 || ordinal[i] >= indexReader.maxDoc()) {
+        return null;
+      }
+      FacetLabel ordinalPath = getPathFromCache(ordinal[i]);
+      if (ordinalPath != null) {
+        bulkPath[i] = ordinalPath;
+      }
+      originalPosition.put(ordinal[i], i);
+    }
+
+    Arrays.sort(ordinal);
+    int readerIndex = 0;
+    BinaryDocValues values = null;
+
+    for (int ord : ordinal) {
+      if (bulkPath[originalPosition.get(ord)] == null) {
+        if (values == null
+            || values.advanceExact(ord - indexReader.leaves().get(readerIndex).docBase) == false) {

Review comment:
      Again, I don't want to optimize prematurely, but could we extract `docBase` outside the loop and update it only when we update `readerIndex`?
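   
   A sketch of tracking `docBase` alongside `readerIndex`, under a hypothetical model where leaves are described by `docBase` and `maxDoc` arrays instead of real `LeafReaderContext`s. `subIndex` is a binary-search stand-in for `ReaderUtil.subIndex`, and unlike the diff, this sketch decides when to switch leaves with a range check rather than by `advanceExact` returning false. Because the ordinals are sorted, the leaf (and its `docBase`) only needs to be looked up again when an ordinal crosses the current leaf's end:
   
   ```java
   // Hypothetical model: leafDocBases[i] is the docBase of leaf i (ascending, starting at 0),
   // and leafMaxDocs[i] is the number of docs in leaf i (0 for an empty leaf).
   class HoistDocBaseSketch {
   
     // Index of the last leaf whose docBase <= doc; this skips empty leaves that share a docBase.
     static int subIndex(int doc, int[] leafDocBases) {
       int lo = 0;
       int hi = leafDocBases.length - 1;
       while (lo <= hi) {
         int mid = (lo + hi) >>> 1;
         if (leafDocBases[mid] <= doc) {
           lo = mid + 1;
         } else {
           hi = mid - 1;
         }
       }
       return hi;
     }
   
     // Maps sorted global ordinals to {leafIndex, localDoc} pairs, caching docBase per leaf.
     static int[][] toLeafLocal(int[] sortedOrdinals, int[] leafDocBases, int[] leafMaxDocs) {
       int[][] out = new int[sortedOrdinals.length][];
       int readerIndex = -1;
       int docBase = 0;
       int leafEnd = 0; // exclusive end of the current leaf's global doc range
       for (int i = 0; i < sortedOrdinals.length; i++) {
         int ord = sortedOrdinals[i];
         if (readerIndex == -1 || ord >= leafEnd) {
           // Only look up the leaf, and refresh docBase, when we cross into a new leaf.
           readerIndex = subIndex(ord, leafDocBases);
           docBase = leafDocBases[readerIndex];
           leafEnd = docBase + leafMaxDocs[readerIndex];
         }
         out[i] = new int[] {readerIndex, ord - docBase};
       }
       return out;
     }
   }
   ```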

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {

Review comment:
      Should we also change the method to take `int... ordinal` instead? That's more convenient if you only want the path for a single ordinal (even though the method is called `getBulkXXX`).
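   
   A minimal illustration of the varargs signature, using a hypothetical `getBulkLabels` on a toy array-backed taxonomy. A single ordinal, several ordinals, and an existing `int[]` all go through the same method, so current callers that pass an array keep compiling:
   
   ```java
   // Hypothetical toy taxonomy: the label for ordinal i is labelsByOrdinal[i].
   class VarargsSketch {
     private final String[] labelsByOrdinal;
   
     VarargsSketch(String... labelsByOrdinal) {
       this.labelsByOrdinal = labelsByOrdinal;
     }
   
     // `int...` accepts one ordinal, several ordinals, or an existing int[] unchanged.
     String[] getBulkLabels(int... ordinals) {
       String[] out = new String[ordinals.length];
       for (int i = 0; i < ordinals.length; i++) {
         out[i] = labelsByOrdinal[ordinals[i]];
       }
       return out;
     }
   }
   ```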

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {

Review comment:
      Separately (and it's been a while since I looked at this code, so treat this comment cautiously): it confuses me that the method is called `getBulkPath` when it returns `FacetLabel`s ... perhaps we should rename it to `getFacetLabels`?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -353,12 +349,65 @@ public FacetLabel getPath(int ordinal) throws IOException {
     }
 
     synchronized (categoryCache) {
-      categoryCache.put(catIDInteger, ret);
+      categoryCache.put(ordinal, ret);
     }
 
     return ret;
   }
 
+  private FacetLabel getPathFromCache(int ordinal) {
+    ensureOpen();
+
+    // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+    // wrapped as LRU?
+    synchronized (categoryCache) {
+      FacetLabel res = categoryCache.get(ordinal);
+      if (res != null) {
+        return res;
+      }
+    }
+    return null;
+  }
+
+  /* This API is only supported for indexes created with Lucene 8.7+ codec **/
+  public FacetLabel[] getBulkPath(int[] ordinal) throws IOException {
+    FacetLabel[] bulkPath = new FacetLabel[ordinal.length];
+    Map<Integer, Integer> originalPosition = new HashMap<>();
+    for (int i = 0; i < ordinal.length; i++) {
+      if (ordinal[i] < 0 || ordinal[i] >= indexReader.maxDoc()) {
+        return null;
+      }
+      FacetLabel ordinalPath = getPathFromCache(ordinal[i]);
+      if (ordinalPath != null) {
+        bulkPath[i] = ordinalPath;
+      }
+      originalPosition.put(ordinal[i], i);
+    }
+
+    Arrays.sort(ordinal);
+    int readerIndex = 0;
+    BinaryDocValues values = null;
+
+    for (int ord : ordinal) {
+      if (bulkPath[originalPosition.get(ord)] == null) {
+        if (values == null
+            || values.advanceExact(ord - indexReader.leaves().get(readerIndex).docBase) == false) {
+          readerIndex = ReaderUtil.subIndex(ord, indexReader.leaves());
+          LeafReader leafReader = indexReader.leaves().get(readerIndex).reader();
+          values = leafReader.getBinaryDocValues(Consts.FULL);
+          assert values.advanceExact(ord - indexReader.leaves().get(readerIndex).docBase);

Review comment:
      @mikemccand it could also be that the test runs just haven't yet hit a seed that disables assertions for this code, right? 
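   
   For context on why this matters: the diff calls `values.advanceExact(...)` inside an `assert`, so when the JVM runs without `-ea` the call is skipped entirely and its side effect (positioning the iterator) never happens. A sketch with a hypothetical cursor class, contrasting that with always making the call and asserting only on its result:
   
   ```java
   // Hypothetical cursor; advanceExact moves it, mimicking the side effect of a doc-values iterator.
   class AssertSideEffectSketch {
     private int position = -1;
   
     boolean advanceExact(int target) {
       position = target;
       return true;
     }
   
     int position() {
       return position;
     }
   
     // Risky: with assertions disabled (no -ea), advanceExact is never called.
     static int risky(AssertSideEffectSketch values, int target) {
       assert values.advanceExact(target);
       return values.position();
     }
   
     // Safe: always perform the call, and only assert on its result.
     static int safe(AssertSideEffectSketch values, int target) {
       boolean found = values.advanceExact(target);
       assert found : "target " + target + " should exist in this leaf";
       return values.position();
     }
   }
   ```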



