You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/09 17:17:29 UTC
[GitHub] [druid] morokosi commented on issue #10293: Filtered Aggregator at ingestion time don't work
morokosi commented on issue #10293:
URL: https://github.com/apache/druid/issues/10293#issuecomment-689703547
This issue can be reproduced with the following modification to the test case:
(I used approxHistogram for an example, but any complex aggregator will do)
```diff
diff --git a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
index c38b3350a1..058c4c7388 100644
--- a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
+++ b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
@@ -139,10 +139,17 @@ public class ApproximateHistogramAggregationTest extends InitializedNullHandling
String ingestionAgg = ignoreNulls ? "approxHistogramFold" : "approxHistogram";
String metricSpec = "[{"
- + "\"type\": \"" + ingestionAgg + "\","
- + "\"name\": \"index_ah\","
- + "\"fieldName\": \"index\""
- + "}]";
+ + "\"type\":\"filtered\","
+ + "\"filter\": {"
+ + " \"type\": \"regex\","
+ + " \"dimension\":\"market\","
+ + " \"pattern\":\".+\""
+ + "},"
+ + "\"aggregator\":{"
+ + " \"type\": \"" + ingestionAgg + "\","
+ + " \"name\": \"index_ah\","
+ + " \"fieldName\": \"index\""
+ + "}}]";
String parseSpec = "{"
+ "\"type\" : \"string\","
```
Note that the filtered aggregator matches all values, so the test result should not be affected by it.
Currently, it filters out all the rows.
After some investigation, I found the modification above did not affect the test result before https://github.com/apache/druid/pull/9484 was introduced.
ValueMatcher returned in `FilteredAggregatorFactory#factorize` was made by `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` before, now it is made by `IncrementalIndexInputRowColumnSelectorFactory#makeColumnValueSelector`.
In this case, `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` infers the columnValueSelector to be made is complex since underlying aggregator has complex type, so it wraps the selector to use ComplexMetricSerde.
https://github.com/apache/druid/blob/e5f0da30ae15369f66c7e9ecc05a41c3d49eb2e6/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L133
This is a bit strange, as the selected column is for filtering unlike normal aggregation, but it successfully handled complex types and applied filters before.
https://github.com/apache/druid/pull/9484/files#diff-310a154902921cae9b118d769560f297L123-L130
https://github.com/apache/druid/blob/c6c2282b59cda107089a9b3944477fd630bc0657/processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java#L215
Now, if the selector is complex, it is considered unfilterable and it never match values.
https://github.com/apache/druid/blob/c557a1448d872e3aab03aec3bafb035e74583656/processing/src/main/java/org/apache/druid/segment/filter/PredicateValueMatcherFactory.java#L89
An easy fix will be to add some condition to judge whether a columnValueSelector should be complex when referenced column is for filtering and not for delegated complex aggregation of filtered aggregator:
```diff
diff --git a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
index 8778700a7e..37c43cd0ce 100644
--- a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
+++ b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
@@ -131,7 +130,7 @@ public abstract class IncrementalIndex<AggregatorType> extends AbstractIndex imp
@Override
public ColumnValueSelector<?> makeColumnValueSelector(final String column)
{
- final boolean isComplexMetric = ValueType.COMPLEX.equals(agg.getType());
+ final boolean isComplexMetric = agg.requiredFields().contains(column) && ValueType.COMPLEX.equals(agg.getType());
```
I can do a PR, but more appropriate solutions are welcome.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org