You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/06 04:28:13 UTC

[GitHub] [druid] clintropolis opened a new pull request #11208: fix count and average SQL aggregators on constant virtual columns

clintropolis opened a new pull request #11208:
URL: https://github.com/apache/druid/pull/11208


   ### Description
   This PR fixes a regression I believed was caused by #10135, which refactored the `AVG` and `COUNT` `SqlAggregator` implementations to directly construct the `FilteredAggregatorFactory` instead of [using `Aggregation.create(...).filter(...)`](https://github.com/apache/druid/pull/10135/files#diff-8816fcddac6763b07323c31d729ba5c3991c1fb902b74999b3f3385e407431a5L120), which caused any virtual columns which were created by the filter to no longer be associated with the `Aggregation` and so [not retained when the final query was constructed](https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java#L639), effectively making these aggregators process nil inputs instead of the correct values.
   
   While looking into this issue, I noticed that `FilteredAggregatorFactory.requiredFields` did not include the required columns of the `DimFilter`, but only of the delegate `AggregatorFactory`. This seemed a mistake to me so I have adjusted this so that it will now return the correct set of columns, and am using this, along with a new method on `VirtualColumnRegistry` which can filter a list of columns to only include columns which are virtual so that we can continue to populate the `Aggregation` virtual columns list.
   
   I think we can rework `Aggregation` to not require having this list of `VirtualColumn` at all and instead just check the registry against `requiredFields` when constructing the query, but it was a more disruptive change than I wanted to tie up in this fix, so I will save it for a follow-up.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
jon-wei commented on pull request #11208:
URL: https://github.com/apache/druid/pull/11208#issuecomment-834013171


   LGTM after CI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] Makesh-Gmak commented on pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
Makesh-Gmak commented on pull request #11208:
URL: https://github.com/apache/druid/pull/11208#issuecomment-905232955


   Could you please let us know when this fix will be officially released  (any date) ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis merged pull request #11208:
URL: https://github.com/apache/druid/pull/11208


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #11208:
URL: https://github.com/apache/druid/pull/11208#discussion_r627625346



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/CountSqlAggregator.java
##########
@@ -134,15 +134,17 @@ public Aggregation toDruidAggregation(
     } else {
       // Not COUNT(*), not distinct
       // COUNT(x) should count all non-null values of x.
-      return Aggregation.create(createCountAggregatorFactory(
+      AggregatorFactory theCount = createCountAggregatorFactory(

Review comment:
       How about making createCountAggregatorFactory return Aggregation instead of AggregatorFactory? It seems like that would be less error-prone. (Currently, since it returns an AggregatorFactory that might reference some virtual columns, all callers must check for that, which is easy to forget [hence this bug] and makes the callers more complex.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #11208:
URL: https://github.com/apache/druid/pull/11208#issuecomment-834093942


   This pull request **introduces 1 alert** when merging 4ee69fbd7b43b980220554da291ea0db664f48ed into 351059ca435233a5ddf60ef998b2fd4f3612f1e2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-1037e4c023a8eb80e44c9d7195cd0196f50f8794)
   
   **new alerts:**
   
   * 1 for Container contents are never accessed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11208:
URL: https://github.com/apache/druid/pull/11208#discussion_r627869215



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/CountSqlAggregator.java
##########
@@ -134,15 +134,17 @@ public Aggregation toDruidAggregation(
     } else {
       // Not COUNT(*), not distinct
       // COUNT(x) should count all non-null values of x.
-      return Aggregation.create(createCountAggregatorFactory(
+      AggregatorFactory theCount = createCountAggregatorFactory(

Review comment:
       Instead I have just removed the `VirtualColumn` tracking from `Aggregation` entirely in favor of replacing it with a new `getRequiredColumns ` that can list all columns between agg factories and the post agg, and using the new find method on the `VirtualColumnRegistry` to look them up.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #11208: fix count and average SQL aggregators on constant virtual columns

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #11208:
URL: https://github.com/apache/druid/pull/11208#discussion_r627869215



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/CountSqlAggregator.java
##########
@@ -134,15 +134,17 @@ public Aggregation toDruidAggregation(
     } else {
       // Not COUNT(*), not distinct
       // COUNT(x) should count all non-null values of x.
-      return Aggregation.create(createCountAggregatorFactory(
+      AggregatorFactory theCount = createCountAggregatorFactory(

Review comment:
       Instead I have just removed the `VirtualColumn` tracking from `Aggregation` entirely in favor of replacing it with a new `getRequiredColumns ` that can list all columns between agg factories and the post agg, and using the new find method on the `VirtualColumnRegistry` to look them up.
   
   I think this is probably better overall because `Aggregation` is now simpler, and more importantly, its impossible to mess up (unless agg factory or postagg has a bug causing it to not list all of their column names)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org