You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/29 19:18:13 UTC

[GitHub] [druid] erkdursun opened a new issue #9792: Aggregation functions are not working properly when using join feature

erkdursun opened a new issue #9792:
URL: https://github.com/apache/druid/issues/9792


   Aggregation functions are not working properly when using join feature
   
   ### Affected Version
   0.18.0
   ### Description
   I have a query to find duration of event which returns 1 record in total
   
   ![image](https://user-images.githubusercontent.com/35697140/80636547-57a5c380-8a66-11ea-9c1a-de85530d5ed3.png)
   
   When I try to **count the results with query** it returns **6 records** and if I try to sum all duration for one key it returns some dummy record which should be 86341
   
   ![image](https://user-images.githubusercontent.com/35697140/80636836-b9fec400-8a66-11ea-8f27-6436810c98f7.png)
   
   ![image](https://user-images.githubusercontent.com/35697140/80636900-d7cc2900-8a66-11ea-8ddd-200e111c1f10.png)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621425704


   By the way, could you also try running these queries _without_ "Smart query limit" enabled?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] erkdursun commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
erkdursun commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621486273


   > Hi @erkdursun,
   > 
   > Thanks for the report.
   > 
   > Could you please include the results of "EXPLAIN PLAN" for this query? i.e. `EXPLAIN PLAN FOR SELECT ...`
   
   
   Hello these are the EXPLAIN PLAN results
   
   ![image](https://user-images.githubusercontent.com/35697140/80650360-89c22000-8a7c-11ea-8077-f27d2bd59970.png)
   
   ~~~
   ----SUM----
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"longSum","name":"a0","fieldName":"duration","expression":null}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[left], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[{"type":"longMin","name":"a0","fieldName":"__time0","expression":null}],"postAggregations":[{"type":"expression","name":"p0","expression":"div((\"a0\" - \"d1\"),1000)","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:33:13.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:33:13.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   ![image](https://user-images.githubusercontent.com/35697140/80650371-90509780-8a7c-11ea-9c1b-1cc611741ae1.png)
   
   ~~~
   ----COUNT----
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[left], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[],"postAggregations":[{"type":"expression","name":"p0","expression":"0","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:40:00.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:40:00.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   > By the way, could you also try running these queries _without_ "Smart query limit" enabled?
   
   Disabling smart query didn't work. Same results again


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-624148930


   @erkdursun I think you can unblock yourself by changing your query from a left join to an inner join. Druid appears to produce consistent results if an inner join query is used.
   
   For your query, since you are joining a table with itself, I believe a left join should be identical to an inner join, thanks again for finding and reporting this bug.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei closed issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
jon-wei closed issue #9792:
URL: https://github.com/apache/druid/issues/9792


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s edited a comment on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
suneet-s edited a comment on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-623834181


   I was able to re-produce this on the sample wikipedia dataset.
   
   ```
   with abc as
   (
     SELECT page, "__time", sum_added, sum_deleted from wikipedia WHERE "countryName" = 'United States' and "__time" > '2016'
   )
   , klm AS
   (
       SELECT 
         t1.page
         from 
           abc as t1 left join abc as t2 on t1.page = t2.page
         where t1.sum_added < t2.sum_added
         group by 1
   )
   SELECT count(*) from klm
   ```
   
   The query above `count(*)` returns 306 results. This is the same as the number of results if there is no filter applied (ie remove where `t1.sum_added < t2.sum_added`)
   
   If we change the query to `select *`, it returns 18 results, which appears to be correct.
   
   This issue only happens if the filter contains an expression from t2. For example, using a filter `t1.sum_added > t1.sum_deleted` or `t1.sum_added > 1000` returns the expected consistent results, but `t2.sum_added > 1999` returns inconsistent results.
   
   My hypothesis is that something in calcite is trying to deciding that we don't need to apply the filters for the aggregate query. I believe this to be the case because I ran the queries with `enableJoinFilterPushDown` set to false.
   
   I could not reproduce this against a druid datasource
   ```
   SELECT t1.page from  wikipedia as t1 left join wikipedia as t2 on t1.page = t2.page WHERE t1."countryName" = 'United States' and t1.sum_added < t2.sum_added group by 1
   ```
   
   Next steps:
   * Look at the calcite plans more closely to find the difference in the generated plans


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] erkdursun edited a comment on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
erkdursun edited a comment on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621486273


   > Hi @erkdursun,
   > 
   > Thanks for the report.
   > 
   > Could you please include the results of "EXPLAIN PLAN" for this query? i.e. `EXPLAIN PLAN FOR SELECT ...`
   
   
   Hello these are the EXPLAIN PLAN results
   
   ![image](https://user-images.githubusercontent.com/35697140/80650360-89c22000-8a7c-11ea-8077-f27d2bd59970.png)
   
   ~~~
   ----SUM----
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"longSum","name":"a0","fieldName":"duration","expression":null}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[left], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[{"type":"longMin","name":"a0","fieldName":"__time0","expression":null}],"postAggregations":[{"type":"expression","name":"p0","expression":"div((\"a0\" - \"d1\"),1000)","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:33:13.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:33:13.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f79085fd-5296-4899-aa77-de389875b939"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   ![image](https://user-images.githubusercontent.com/35697140/80650371-90509780-8a7c-11ea-9c1b-1cc611741ae1.png)
   
   ~~~
   ----COUNT----
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[left], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[],"postAggregations":[{"type":"expression","name":"p0","expression":"0","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:40:00.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-22T00:00:00.000Z/2020-04-28T21:40:00.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"f22dc038-f377-47b4-8802-40abe3a68bf1"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   > By the way, could you also try running these queries _without_ "Smart query limit" enabled?
   
   Disabling smart query didn't work. Same results again
   
   ![image](https://user-images.githubusercontent.com/35697140/80651475-a3646700-8a7e-11ea-9187-8d2a2c0400c0.png)
   
   ![image](https://user-images.githubusercontent.com/35697140/80651481-a6f7ee00-8a7e-11ea-82f2-4f6fae1d1f4b.png)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] erkdursun commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
erkdursun commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-624314268


   Hello @suneet-s 
   First of all thanks for your and @gianm efforts. I had already tried inner join but unfortunately it showed same behavior in my query.
   
   ![image](https://user-images.githubusercontent.com/35697140/81116228-aa332400-8f2d-11ea-9c83-bd672b6ccd3a.png)
   
   ![image](https://user-images.githubusercontent.com/35697140/81116312-ce8f0080-8f2d-11ea-9ece-67d1ba1c8e1b.png)
   
   ![image](https://user-images.githubusercontent.com/35697140/81116353-e36b9400-8f2d-11ea-9e79-51e657b60009.png)
   
   PLANS for the queries
   
   #### select * from klm
   ~~~
   DruidJoinQueryRel(condition=[=($0, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"v0","outputName":"d0","outputType":"STRING"},{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[{"type":"longMin","name":"a0","fieldName":"__time0","expression":null}],"postAggregations":[{"type":"expression","name":"p0","expression":"div((\"a0\" - \"d1\"),1000)","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"6c957e82-c860-4c27-aa24-db664bacba2c"},"descending":false}], signature=[{d0:STRING, d1:LONG, d2:LONG, a0:LONG, d3:LONG, p0:LONG}])
     DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:04:42.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6c957e82-c860-4c27-aa24-db664bacba2c"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
     DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:04:42.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6c957e82-c860-4c27-aa24-db664bacba2c"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   #### select count(*) from klm
   ~~~
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"6c2eda1c-df9b-40d2-8218-bbcb72181825"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[],"postAggregations":[{"type":"expression","name":"p0","expression":"0","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"6c2eda1c-df9b-40d2-8218-bbcb72181825"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:05:57.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6c2eda1c-df9b-40d2-8218-bbcb72181825"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:05:57.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6c2eda1c-df9b-40d2-8218-bbcb72181825"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   #### select sum(duration) from klm
   ~~~
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[],"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"longSum","name":"a0","fieldName":"duration","expression":null}],"postAggregations":[],"limit":2147483647,"context":{"skipEmptyBuckets":true,"sqlQueryId":"9dbd9333-701c-4c3c-a409-5d50a3bb7386"}}], signature=[{a0:LONG}])
     DruidJoinQueryRel(condition=[=($0, $3)], joinType=[inner], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[{"type":"longMin","name":"a0","fieldName":"__time0","expression":null}],"postAggregations":[{"type":"expression","name":"p0","expression":"div((\"a0\" - \"d1\"),1000)","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"9dbd9333-701c-4c3c-a409-5d50a3bb7386"},"descending":false}], signature=[{p0:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:06:21.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"9dbd9333-701c-4c3c-a409-5d50a3bb7386"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
       DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-21T00:00:00.000Z/2020-04-28T21:06:21.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"9dbd9333-701c-4c3c-a409-5d50a3bb7386"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] erkdursun commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
erkdursun commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621687185


   > @erkdursun Could you please attach the plan for the basic `select * from klm` as well? It'd be useful to be able to compare them.
   
   ![image](https://user-images.githubusercontent.com/35697140/80688191-19df8400-8ad4-11ea-8c4a-4432f1c39979.png)
   
   ~~~
   DruidJoinQueryRel(condition=[=($0, $3)], joinType=[left], query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"__join__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns":[],"filter":{"type":"and","fields":[{"type":"expression","expression":"(\"__time\" < \"__time0\")"},{"type":"expression","expression":"(\"value\" != \"value0\")"},{"type":"not","field":{"type":"selector","dimension":"value","value":"0","extractionFn":null}}]},"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"__time","outputName":"d1","outputType":"LONG"},{"type":"default","dimension":"value","outputName":"d2","outputType":"LONG"},{"type":"default","dimension":"value0","outputName":"d3","outputType":"LONG"}],"aggregations":[{"type":"longMin","name":"a0","fieldName":"__time0","expression":null}],"postAggregations":[{"type":"expression","name":"p0","expression":"'BAC011'","ordering":null},{"type":"expression","name":"p1","expression":"div((\"a0\" - \"d1\"),1000)","ordering":null}],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"sqlQueryId":"6dfa7834-01c3-44c8-8885-c6bdf7d1f321"},"descending":false}], signature=[{p0:STRING, d1:LONG, d2:LONG, a0:LONG, d3:LONG, p1:LONG}])
     DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-23T00:00:00.000Z/2020-04-29T08:08:28.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6dfa7834-01c3-44c8-8885-c6bdf7d1f321"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
     DruidQueryRel(query=[{"queryType":"scan","dataSource":{"type":"table","name":"rich-event-ordered"},"intervals":{"type":"intervals","intervals":["2020-04-23T00:00:00.000Z/2020-04-29T08:08:28.001Z"]},"virtualColumns":[{"type":"expression","name":"v0","expression":"'BAC011'","outputType":"STRING"}],"resultFormat":"compactedList","batchSize":20480,"limit":9223372036854775807,"order":"none","filter":{"type":"and","fields":[{"type":"selector","dimension":"ELEMENT_NAME","value":"TcLruSta","extractionFn":null},{"type":"selector","dimension":"B1_NAME","value":"BAC011","extractionFn":null}]},"columns":["__time","v0","value"],"legacy":false,"context":{"sqlQueryId":"6dfa7834-01c3-44c8-8885-c6bdf7d1f321"},"descending":false,"granularity":{"type":"all"}}], signature=[{v0:STRING, __time:LONG, value:LONG}])
   ~~~
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621424927


   Hi @erkdursun,
   
   Thanks for the report.
   
   Could you please include the results of "EXPLAIN PLAN" for this query? i.e. `EXPLAIN PLAN FOR SELECT ...`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-623852159


   A `count(*)` query generates a native query that looks like
   
   ```
   {
         "queryType": "groupBy",
         "dataSource": {
           "type": "query",
           "query": {
             "queryType": "groupBy",
             "dataSource": {
               "type": "join",
               ...
   ```
   
   However a `select *` query generates a native query that looks like
   
   ```
   "queryType": "groupBy",
         "dataSource": {
           "type": "join",
           ...
   ```
   
   Too late in the day to actually figure out what's going on here, but hopefully tomorrow it will be obvious what the bug is 🤞 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-621526352


   @erkdursun Could you please attach the plan for the basic `select * from klm` as well? It'd be useful to be able to compare them.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on issue #9792: Aggregation functions are not working properly when using join feature

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9792:
URL: https://github.com/apache/druid/issues/9792#issuecomment-623834181


   I was able to re-produce this on the sample wikipedia dataset.
   
   ```
   with abc as
   (
     SELECT page, "__time", sum_added, sum_deleted from wikipedia WHERE "countryName" = 'United States' and "__time" > '2016'
   )
   , klm AS
   (
       SELECT 
         t1.page
         from 
           abc as t1 left join abc as t2 on t1.page = t2.page
         where t1.sum_added < t2.sum_added
         group by 1
   )
   SELECT count(*) from klm
   ```
   
   The query above [count(*)] returns 306 results. This is the same as the number of results if there is no filter applied (ie remove where t1.sum_added < t2.sum_added)
   
   If we change the query to select *, it returns 18 results, which appears to be correct.
   
   This issue only happens if the filter contains an expression from t2. For example, using a filter `t1.sum_added > t1.sum_deleted` or `t1.sum_added > 1000` returns the expected consistent results, but `t2.sum_added > 1999` returns inconsistent results.
   
   My hypothesis is that something in calcite is trying to deciding that we don't need to apply the filters for the aggregate query. I believe this to be the case because I ran the queries with `enableJoinFilterPushDown` set to false.
   
   I could not reproduce this against a druid datasource
   ```
   SELECT t1.page from  wikipedia as t1 left join wikipedia as t2 on t1.page = t2.page WHERE t1."countryName" = 'United States' and t1.sum_added < t2.sum_added group by 1
   ```
   
   Next steps:
   * Look at the calcite plans more closely to find the difference in the generated plans


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org