You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/09 17:17:29 UTC

[GitHub] [druid] morokosi commented on issue #10293: Filtered Aggregator at ingestion time don't work

morokosi commented on issue #10293:
URL: https://github.com/apache/druid/issues/10293#issuecomment-689703547


   This issue can be reproduced with the following modification to the test case:
   (I used approxHistogram for an example, but any complex aggregator will do)
   
   ```diff
   diff --git a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   index c38b3350a1..058c4c7388 100644
   --- a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   +++ b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   @@ -139,10 +139,17 @@ public class ApproximateHistogramAggregationTest extends InitializedNullHandling
        String ingestionAgg = ignoreNulls ? "approxHistogramFold" : "approxHistogram";
   
        String metricSpec = "[{"
   -                        + "\"type\": \"" + ingestionAgg + "\","
   -                        + "\"name\": \"index_ah\","
   -                        + "\"fieldName\": \"index\""
   -                        + "}]";
   +            + "\"type\":\"filtered\","
   +            + "\"filter\": {"
   +            + "    \"type\": \"regex\","
   +            + "    \"dimension\":\"market\","
   +            + "    \"pattern\":\".+\""
   +            + "},"
   +            + "\"aggregator\":{"
   +            + "    \"type\": \"" + ingestionAgg + "\","
   +            + "    \"name\": \"index_ah\","
   +            + "    \"fieldName\": \"index\""
   +            + "}}]";
   
        String parseSpec = "{"
                           + "\"type\" : \"string\","
   ```
   
   Note that the filtered aggregator matches all values, so the test result should not be affected by it.
   Currently, it filters out all the rows.
   
   After some investigation, I found the modification above did not affect the test result before https://github.com/apache/druid/pull/9484 was introduced.
   ValueMatcher returned in `FilteredAggregatorFactory#factorize` was made by `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` before, now it is made by `IncrementalIndexInputRowColumnSelectorFactory#makeColumnValueSelector`.
   In this case, `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` infers the columnValueSelector to be made is complex since underlying aggregator has complex type, so it wraps the selector to use ComplexMetricSerde.
   https://github.com/apache/druid/blob/e5f0da30ae15369f66c7e9ecc05a41c3d49eb2e6/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L133
   This is a bit strange, as the selected column is for filtering unlike normal aggregation, but it successfully handled complex types and applied filters before.
   https://github.com/apache/druid/pull/9484/files#diff-310a154902921cae9b118d769560f297L123-L130
   https://github.com/apache/druid/blob/c6c2282b59cda107089a9b3944477fd630bc0657/processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java#L215
   
   Now, if the selector is complex, it is considered unfilterable and it never match values.
   https://github.com/apache/druid/blob/c557a1448d872e3aab03aec3bafb035e74583656/processing/src/main/java/org/apache/druid/segment/filter/PredicateValueMatcherFactory.java#L89
   
   An easy fix will be to add some condition to judge whether a columnValueSelector should be complex when referenced column is for filtering and not for delegated complex aggregation of filtered aggregator:
   ```diff
   diff --git a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   index 8778700a7e..37c43cd0ce 100644
   --- a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   +++ b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   @@ -131,7 +130,7 @@ public abstract class IncrementalIndex<AggregatorType> extends AbstractIndex imp
          @Override
          public ColumnValueSelector<?> makeColumnValueSelector(final String column)
          {
   -        final boolean isComplexMetric = ValueType.COMPLEX.equals(agg.getType());
   +        final boolean isComplexMetric = agg.requiredFields().contains(column) && ValueType.COMPLEX.equals(agg.getType());
   ```
   I can do a PR, but more appropriate solutions are welcome.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org