You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/16 15:32:37 UTC

[GitHub] [druid] doenis-usk opened a new issue #10293: Filtered Aggregator at ingestion time don't work

doenis-usk opened a new issue #10293:
URL: https://github.com/apache/druid/issues/10293


   ### affected version
   0.18.1
   
   ### Description
   I use extension metrics(DataSketches Quantiles Sketch or Approximate Histogram) with Filtered Aggregator at ingestion time.
   When I post a query, I got only `NaN` result.
   
   However, when I use longSum or count metrics with Filtered Aggregator at ingestion time, I got the numeric result.
   
   ####  ingestion spec
   For example, I ingest two metrics (1)latency_histogram, (2)latency_histogram_without_filter and status 200 data exists.
   
   ```
   {
   ...
     "metricsSpec": [
       {
         "aggregator": {
           "name": "latency_histogram",
           "type": "approxHistogramFold",
           "filedName": "fe_latency"
         },
         "type": "filtered",
         "filter": {
           "type": "selector",
           "dimension": "status",
           "value": "200"
         }
       },
       {
         "name": "latency_histogram_without_filter",
         "type": "approxHistogramFold",
         "filedName": "fe_latency"
       }
     ]
   ...
   }
   ```
   
   #### query
   When I post below query, latency_histogram is NaN and latency_histogram_without_filter is numeric result.
   
   ```
   {
     "queryType": "timeseries",
     "dataSource": <DATASOURCE>,
     "granularity": "minute",
     "aggregations": [
       {
         "type": "approxHistogramFold",
         "name": "latency_histogram",
         "fieldName": "latency_histogram"
       },
       {
         "type": "approxHistogramFold",
         "name": "latency_histogram_without_filter",
         "fieldName": "latency_histogram_without_filter"
       }
     ],
     "intervals": <INTERVALS>
   }
   ```
   
   ```
   [
       {
           "timestamp": "2020-08-16T13:00:00.000Z",
           "result": {
               "latency_histogram": {
                   "breaks": [
                       "Infinity",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "-Infinity"
                   ],
                   "counts": [
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN",
                       "NaN"
                   ]
               },
               "latency_histogram_without_filter": {
                   "breaks": [
                       -395428.5,
                       61553,
                       518534.5,
                       975516,
                       1432497.5,
                       1889479,
                       2346460.5,
                       2803442
                   ],
                   "counts": [
                       0,
                       66000.5625,
                       2844.926513671875,
                       9.512149810791016,
                       1,
                       2,
                       2
                   ]
               }
           }
       }
   ]
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] morokosi edited a comment on issue #10293: Filtered Aggregator at ingestion time don't work

Posted by GitBox <gi...@apache.org>.
morokosi edited a comment on issue #10293:
URL: https://github.com/apache/druid/issues/10293#issuecomment-689703547


   This issue can be reproduced with the following modification to the test case:
   (I used approxHistogram for an example, but any complex aggregator will do)
   
   ```diff
   diff --git a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   index c38b3350a1..058c4c7388 100644
   --- a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   +++ b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   @@ -139,10 +139,17 @@ public class ApproximateHistogramAggregationTest extends InitializedNullHandling
        String ingestionAgg = ignoreNulls ? "approxHistogramFold" : "approxHistogram";
   
        String metricSpec = "[{"
   -                        + "\"type\": \"" + ingestionAgg + "\","
   -                        + "\"name\": \"index_ah\","
   -                        + "\"fieldName\": \"index\""
   -                        + "}]";
   +            + "\"type\":\"filtered\","
   +            + "\"filter\": {"
   +            + "    \"type\": \"regex\","
   +            + "    \"dimension\":\"market\","
   +            + "    \"pattern\":\".+\""
   +            + "},"
   +            + "\"aggregator\":{"
   +            + "    \"type\": \"" + ingestionAgg + "\","
   +            + "    \"name\": \"index_ah\","
   +            + "    \"fieldName\": \"index\""
   +            + "}}]";
   
        String parseSpec = "{"
                           + "\"type\" : \"string\","
   ```
   
   Note that the filtered aggregator matches all values, so the test result should not be affected by it.
   Currently, it filters out all the rows.
   
   After some investigation, I found the modification above did not affect the test result before https://github.com/apache/druid/pull/9484 was introduced.
   ValueMatcher returned in `FilteredAggregatorFactory#factorize` was made by `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` before, now it is made by `IncrementalIndexInputRowColumnSelectorFactory#makeColumnValueSelector`.
   In this case, `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` infers the columnValueSelector to be made is complex since underlying aggregator has complex type, so it wraps the selector to use ComplexMetricSerde.
   https://github.com/apache/druid/blob/e5f0da30ae15369f66c7e9ecc05a41c3d49eb2e6/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L133
   This is a bit strange, as the selected column is for filtering unlike normal aggregation, but it successfully handled complex types and applied filters before.
   https://github.com/apache/druid/pull/9484/files#diff-310a154902921cae9b118d769560f297L123-L130
   https://github.com/apache/druid/blob/c6c2282b59cda107089a9b3944477fd630bc0657/processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java#L215
   
   Now, if the selector is complex, it is considered unfilterable and it never matches values.
   https://github.com/apache/druid/blob/c557a1448d872e3aab03aec3bafb035e74583656/processing/src/main/java/org/apache/druid/segment/filter/PredicateValueMatcherFactory.java#L89
   
   An easy fix will be to add some condition to judge whether a columnValueSelector should be complex when referenced column is for filtering and not for delegated complex aggregation of filtered aggregator:
   ```diff
   diff --git a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   index 8778700a7e..37c43cd0ce 100644
   --- a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   +++ b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   @@ -131,7 +130,7 @@ public abstract class IncrementalIndex<AggregatorType> extends AbstractIndex imp
          @Override
          public ColumnValueSelector<?> makeColumnValueSelector(final String column)
          {
   -        final boolean isComplexMetric = ValueType.COMPLEX.equals(agg.getType());
   +        final boolean isComplexMetric = agg.requiredFields().contains(column) && ValueType.COMPLEX.equals(agg.getType());
   ```
   I can do a PR, but more appropriate solutions are welcome.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] morokosi commented on issue #10293: Filtered Aggregator at ingestion time don't work

Posted by GitBox <gi...@apache.org>.
morokosi commented on issue #10293:
URL: https://github.com/apache/druid/issues/10293#issuecomment-689703547


   This issue can be reproduced with the following modification to the test case:
   (I used approxHistogram for an example, but any complex aggregator will do)
   
   ```diff
   diff --git a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   index c38b3350a1..058c4c7388 100644
   --- a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   +++ b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
   @@ -139,10 +139,17 @@ public class ApproximateHistogramAggregationTest extends InitializedNullHandling
        String ingestionAgg = ignoreNulls ? "approxHistogramFold" : "approxHistogram";
   
        String metricSpec = "[{"
   -                        + "\"type\": \"" + ingestionAgg + "\","
   -                        + "\"name\": \"index_ah\","
   -                        + "\"fieldName\": \"index\""
   -                        + "}]";
   +            + "\"type\":\"filtered\","
   +            + "\"filter\": {"
   +            + "    \"type\": \"regex\","
   +            + "    \"dimension\":\"market\","
   +            + "    \"pattern\":\".+\""
   +            + "},"
   +            + "\"aggregator\":{"
   +            + "    \"type\": \"" + ingestionAgg + "\","
   +            + "    \"name\": \"index_ah\","
   +            + "    \"fieldName\": \"index\""
   +            + "}}]";
   
        String parseSpec = "{"
                           + "\"type\" : \"string\","
   ```
   
   Note that the filtered aggregator matches all values, so the test result should not be affected by it.
   Currently, it filters out all the rows.
   
   After some investigation, I found the modification above did not affect the test result before https://github.com/apache/druid/pull/9484 was introduced.
   ValueMatcher returned in `FilteredAggregatorFactory#factorize` was made by `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` before, now it is made by `IncrementalIndexInputRowColumnSelectorFactory#makeColumnValueSelector`.
   In this case, `IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector` infers the columnValueSelector to be made is complex since underlying aggregator has complex type, so it wraps the selector to use ComplexMetricSerde.
   https://github.com/apache/druid/blob/e5f0da30ae15369f66c7e9ecc05a41c3d49eb2e6/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L133
   This is a bit strange, as the selected column is for filtering unlike normal aggregation, but it successfully handled complex types and applied filters before.
   https://github.com/apache/druid/pull/9484/files#diff-310a154902921cae9b118d769560f297L123-L130
   https://github.com/apache/druid/blob/c6c2282b59cda107089a9b3944477fd630bc0657/processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java#L215
   
   Now, if the selector is complex, it is considered unfilterable and it never match values.
   https://github.com/apache/druid/blob/c557a1448d872e3aab03aec3bafb035e74583656/processing/src/main/java/org/apache/druid/segment/filter/PredicateValueMatcherFactory.java#L89
   
   An easy fix will be to add some condition to judge whether a columnValueSelector should be complex when referenced column is for filtering and not for delegated complex aggregation of filtered aggregator:
   ```diff
   diff --git a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   index 8778700a7e..37c43cd0ce 100644
   --- a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   +++ b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
   @@ -131,7 +130,7 @@ public abstract class IncrementalIndex<AggregatorType> extends AbstractIndex imp
          @Override
          public ColumnValueSelector<?> makeColumnValueSelector(final String column)
          {
   -        final boolean isComplexMetric = ValueType.COMPLEX.equals(agg.getType());
   +        final boolean isComplexMetric = agg.requiredFields().contains(column) && ValueType.COMPLEX.equals(agg.getType());
   ```
   I can do a PR, but more appropriate solutions are welcome.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org