You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/14 18:28:59 UTC

[GitHub] [lucene] gsmiller commented on a change in pull request #133: LUCENE-9950: New facet counting implementation for general string doc value fields

gsmiller commented on a change in pull request #133:
URL: https://github.com/apache/lucene/pull/133#discussion_r632717530



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java
##########
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.MultiDocValues;
+import org.apache.lucene.index.OrdinalMap;
+import org.apache.lucene.index.ReaderUtil;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.search.ConjunctionDISI;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.search.MatchAllDocsQuery;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.LongValues;
+
+/**
+ * Compute facet counts from a previously indexed {@link SortedSetDocValues} or {@link
+ * org.apache.lucene.index.SortedDocValues} field. This approach will execute facet counting against
+ * the string values found in the specified field, with no assumptions on their format. Unlike
+ * {@link org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts}, no assumption is made
+ * about a "dimension" path component being indexed. Because of this, the field itself is
+ * effectively treated as the "dimension", and counts for all unique string values are produced.
+ * This approach is meant to compliment {@link LongValueFacetCounts} in that they both provide facet
+ * counting on a doc value field with no assumptions of content.
+ *
+ * <p>This implementation is useful if you want to dynamically count against any string doc value
+ * field without relying on {@link FacetField} and {@link FacetsConfig}. The disadvantage is that a
+ * separate field is required for each "dimension". If you want to pack multiple dimensions into the
+ * same doc values field, you probably want one of {@link
+ * org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts} or {@link
+ * org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts}.
+ *
+ * <p>Note that there is an added cost on every {@link IndexReader} open to create a new {@link
+ * StringDocValuesReaderState}. Also note that this class should be instantiated and used from a
+ * single thread, because it holds a thread-private instance of {@link SortedSetDocValues}.
+ *
+ * @lucene.experimental
+ */
+// TODO: Add a concurrent version much like ConcurrentSortedSetDocValuesFacetCounts?
+public class StringValueFacetCounts extends Facets {
+
+  private final IndexReader reader;
+  private final String field;
+  private final OrdinalMap ordinalMap;
+  private final SortedSetDocValues docValues;
+
+  private final int[] counts;
+
+  private int totalDocCount = 0;

Review comment:
       Good point; will remove.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org