Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/08/26 11:27:17 UTC

[GitHub] [lucene] mikemccand commented on a change in pull request #264: LUCENE-10062: Switch to numeric doc values for encoding taxonomy ordinals (instead of custom binary format)

mikemccand commented on a change in pull request #264:
URL: https://github.com/apache/lucene/pull/264#discussion_r696530876



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -410,7 +411,16 @@ private void processFacetFields(
 
       // Facet counts:
       // DocValues are considered stored fields:

Review comment:
       Hmm, maybe remove this old and misleading comment?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -410,7 +411,16 @@ private void processFacetFields(
 
       // Facet counts:
       // DocValues are considered stored fields:
-      doc.add(new BinaryDocValuesField(indexFieldName, dedupAndEncode(ordinals.get())));
+      IntsRef o = ordinals.get();
+      Arrays.sort(o.ints, o.offset, o.length);
+      int prev = -1;
+      for (int i = 0; i < o.length; i++) {
+        int ord = o.ints[o.offset + i];
+        if (ord > prev) {
+          doc.add(new SortedNumericDocValuesField(indexFieldName, ord));

Review comment:
       Lucene also does this same sorting during indexing, so it is redundant here.  But we do indeed need to dedup.  Are we sure nothing above this has already dedup'd the added SSDV facet labels?
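
For reference, a minimal standalone sketch of the sort-then-dedup step under discussion, using plain `int[]` slices rather than Lucene's `IntsRef` so it runs without Lucene on the classpath. The class and method names are hypothetical and not part of the PR. The `ord > prev` check only dedups when the slice is sorted locally, and `Arrays.sort(int[], int, int)` takes an exclusive end index rather than a length, so the sketch passes `offset + length`. It assumes ordinals are non-negative.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from the pull request.
public class OrdinalDedupSketch {

  /** Sorts ints[offset, offset + length) in place and returns its distinct values, ascending. */
  public static int[] sortAndDedup(int[] ints, int offset, int length) {
    // The third argument of Arrays.sort is an exclusive end index, not a length.
    Arrays.sort(ints, offset, offset + length);
    int[] out = new int[length];
    int count = 0;
    int prev = -1; // assumption: ordinals are never negative
    for (int i = 0; i < length; i++) {
      int ord = ints[offset + i];
      if (ord > prev) { // strictly greater means a new distinct value in sorted input
        out[count++] = ord;
        prev = ord;
      }
    }
    return Arrays.copyOf(out, count);
  }

  public static void main(String[] args) {
    int[] ords = {7, 3, 7, 1, 3};
    System.out.println(Arrays.toString(sortAndDedup(ords, 0, ords.length))); // prints [1, 3, 7]
  }
}
```

If the local sort were dropped because Lucene sorts during indexing anyway, the dedup would need a different approach (for example a set), since comparing against `prev` relies on sorted order.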

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/DocValuesOrdinalsReader.java
##########
@@ -59,16 +53,21 @@ public void get(int docID, IntsRef ordinals) throws IOException {
               "docs out of order: lastDocID=" + lastDocID + " vs docID=" + docID);
         }
         lastDocID = docID;
-        if (docID > values.docID()) {
-          values.advance(docID);
-        }
-        final BytesRef bytes;
-        if (values.docID() == docID) {
-          bytes = values.binaryValue();
-        } else {
-          bytes = new BytesRef(BytesRef.EMPTY_BYTES);
+
+        ordinals.offset = 0;
+        ordinals.length = 0;
+
+        if (dv.advanceExact(docID)) {
+          int count = dv.docValueCount();
+          if (ordinals.ints.length < count) {
+            ordinals.ints = ArrayUtil.grow(ordinals.ints, count);
+          }
+
+          for (int i = 0; i < count; i++) {
+            ordinals.ints[ordinals.length] = (int) dv.nextValue();

Review comment:
       Maybe use `Math.toIntExact` instead of `(int)` for better safety (in case somehow a too-big `long` shows up)?
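
A small standalone sketch of the difference, for illustration: a plain `(int)` cast silently wraps a `long` that does not fit, while `Math.toIntExact` throws `ArithmeticException`. The class and helper names are hypothetical.

```java
// Hypothetical illustration only; not code from the pull request.
public class ToIntExactSketch {

  /** Returns true if narrowing the value to int would lose information. */
  public static boolean overflowsInt(long value) {
    try {
      Math.toIntExact(value);
      return false;
    } catch (ArithmeticException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    long tooBig = (long) Integer.MAX_VALUE + 1; // 2147483648

    System.out.println(Math.toIntExact(42L)); // prints 42
    System.out.println((int) tooBig);         // prints -2147483648 (silent wraparound)
    System.out.println(overflowsInt(tooBig)); // prints true
  }
}
```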




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


