Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/11/02 21:51:30 UTC

[GitHub] [lucene] gsmiller commented on a change in pull request #420: [DRAFT] LUCENE-10122 Explore using NumericDocValue to store taxonomy parent array

gsmiller commented on a change in pull request #420:
URL: https://github.com/apache/lucene/pull/420#discussion_r741472877



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -129,39 +124,19 @@ private void initParents(IndexReader reader, int first) throws IOException {
     if (reader.maxDoc() == first) {
       return;
     }
-
-    // it's ok to use MultiTerms because we only iterate on one posting list.
-    // breaking it to loop over the leaves() only complicates code for no
-    // apparent gain.
-    PostingsEnum positions =
-        MultiTerms.getTermPostingsEnum(
-            reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, PostingsEnum.PAYLOADS);
-
-    // shouldn't really happen, if it does, something's wrong
-    if (positions == null || positions.advance(first) == DocIdSetIterator.NO_MORE_DOCS) {
-      throw new CorruptIndexException(
-          "Missing parent data for category " + first, reader.toString());
-    }
-
-    int num = reader.maxDoc();
-    for (int i = first; i < num; i++) {
-      if (positions.docID() == i) {
-        if (positions.freq() == 0) { // shouldn't happen
-          throw new CorruptIndexException(
-              "Missing parent data for category " + i, reader.toString());
-        }
-
-        parents[i] = positions.nextPosition();
-
-        if (positions.nextDoc() == DocIdSetIterator.NO_MORE_DOCS) {
-          if (i + 1 < num) {
-            throw new CorruptIndexException(
-                "Missing parent data for category " + (i + 1), reader.toString());
-          }
-          break;
+    for (LeafReaderContext leafContext: reader.leaves()) {

Review comment:
       I know this is just for benchmarking right now, but for what it's worth, I think you can probably simplify here and rely on `MultiDocValues#getNumericValues` if you move forward with this work.
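
       A minimal sketch of what that simplification might look like, under stated assumptions. Rather than looping over `reader.leaves()`, the parent array is filled from one merged iterator that walks doc IDs in order across the whole index. To keep the example self-contained, `ParentValues` is a hypothetical stand-in for the few `NumericDocValues` methods the loop needs (`advance`, `nextDoc`, `longValue`). In Lucene the iterator would presumably come from `MultiDocValues.getNumericValues(reader, field)`, which can return null when the field has no doc values. The class name, the `dense` helper, and the exception type are illustrative only and are not taken from the PR.

```java
final class ParentArraySketch {

  /** Hypothetical stand-in for the subset of NumericDocValues used below. */
  interface ParentValues {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Moves to the first doc at or after target and returns its ID. */
    int advance(int target);

    /** Moves to the next doc that has a value and returns its ID. */
    int nextDoc();

    /** Returns the value stored for the current doc. */
    long longValue();
  }

  /**
   * Fills parents[first..] from a single merged iterator. Every ordinal is
   * expected to carry a parent value, so a gap is treated as corruption,
   * mirroring the checks in the code this diff removes.
   */
  static void fillParents(ParentValues values, int[] parents, int first) {
    if (parents.length == first) {
      return; // nothing new to load
    }
    if (values == null) {
      throw new IllegalStateException("Missing parent data for category " + first);
    }
    int doc = values.advance(first);
    for (int i = first; i < parents.length; i++) {
      if (doc != i) {
        throw new IllegalStateException("Missing parent data for category " + i);
      }
      parents[i] = Math.toIntExact(values.longValue());
      doc = values.nextDoc();
    }
  }

  /** Array-backed iterator in which every doc has a value; for illustration only. */
  static ParentValues dense(long[] vals) {
    return new ParentValues() {
      private int doc = -1;

      public int advance(int target) {
        doc = target < vals.length ? target : NO_MORE_DOCS;
        return doc;
      }

      public int nextDoc() {
        return doc == NO_MORE_DOCS ? doc : advance(doc + 1);
      }

      public long longValue() {
        return vals[doc];
      }
    };
  }
}
```

       With the real API, the per-leaf loop and any docBase arithmetic would collapse into the single `fillParents`-style pass, at the cost of going through the merged view instead of reading each segment directly.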




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


