You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "somu-imply (via GitHub)" <gi...@apache.org> on 2023/10/23 23:42:00 UTC

[PR] Fixing nested group by query with order by in outer query (druid)

somu-imply opened a new pull request, #15237:
URL: https://github.com/apache/druid/pull/15237

   Consider this query
   ```
   with t AS (SELECT distinct(m2) as mo, APPROX_COUNT_DISTINCT(m1) / ( TIMESTAMPDIFF(DAY,min( __time ), CURRENT_TIMESTAMP  ) + 0.0000000001) as trend_score
   FROM "foo"
   GROUP BY 1
   ORDER BY trend_score DESC
   LIMIT 2)
   select mo, (MAX(trend_score)) from t
   where mo > 2
   GROUP BY 1 
   ORDER BY 2 DESC
   ```
   This currently fails to plan with the stack trace
   
   
   ```
   Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not enough rules to produce a node with desired properties: convention=DRUID, sort=[1 DESC].
   Missing conversion is LogicalSort[convention: NONE -> DRUID]
   There is 1 empty subset: rel#126:RelSubset#7.DRUID.[1 DESC], the relevant part of the original plan is as follows
   124:LogicalSort(sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT])
     120:LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)])
       118:LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2])
         116:LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))])
           114:LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)])
             112:LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0])
               63:LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]])
   
   Root: rel#126:RelSubset#7.DRUID.[1 DESC]
   Original rel:
   LogicalSort(subset=[rel#126:RelSubset#7.DRUID.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[1001:BIGINT]): rowcount = 1.0, cumulative cost = {1.0 rows, 20.0 cpu, 0.0 io}, id = 124
     LogicalFilter(subset=[rel#128:RelSubset#5.NONE.[]], condition=[>($0, 2)]): rowcount = 1.0, cumulative cost = {1.0 rows, 2.0 cpu, 0.0 io}, id = 120
       LogicalSort(subset=[rel#119:RelSubset#4.NONE.[1 DESC]], sort0=[$1], dir0=[DESC], fetch=[2]): rowcount = 2.0, cumulative cost = {2.0 rows, 200.0 cpu, 0.0 io}, id = 118
         LogicalProject(subset=[rel#117:RelSubset#3.NONE.[]], mo=[$0], trend_score=[/($1, +(CAST(/INT(Reinterpret(-(2023-10-17 23:46:28.328, $2)), 86400000)):INTEGER NOT NULL, 1E-10:DECIMAL(11, 10)))]): rowcount = 10.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io}, id = 116
           LogicalAggregate(subset=[rel#115:RelSubset#2.NONE.[]], group=[{0}], agg#0=[APPROX_COUNT_DISTINCT($1)], agg#1=[MIN($2)]): rowcount = 10.0, cumulative cost = {12.5 rows, 0.0 cpu, 0.0 io}, id = 114
             LogicalProject(subset=[rel#113:RelSubset#1.NONE.[]], mo=[$5], m1=[$4], __time=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 300.0 cpu, 0.0 io}, id = 112
               LogicalTableScan(subset=[rel#111:RelSubset#0.NONE.[]], table=[[druid, foo]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 63
   
           at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:718) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:391) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:533) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:317) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.calcite.tools.Programs$SequenceProgram.run(Programs.java:337) ~[calcite-core-1.35.0.jar:1.35.0]
           at org.apache.druid.sql.calcite.planner.CalcitePlanner.transform(CalcitePlanner.java:452) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler.planWithDruidConvention(QueryHandler.java:590) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler$SelectHandler.planForDruid(QueryHandler.java:738) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           at org.apache.druid.sql.calcite.planner.QueryHandler.plan(QueryHandler.java:220) ~[druid-sql-29.0.0-SNAPSHOT.jar:29.0.0-SNAPSHOT]
           ... 84 more
   ```
   In this PR we fix planning by removing the Aggregate remove rule which plans these queries as scan over group by with order by on a non-time column which Druid does not support.
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on PR #15237:
URL: https://github.com/apache/druid/pull/15237#issuecomment-1781516392

   This will be reverted back in https://github.com/apache/druid/pull/15241 as it introduces a better solution to solve the issue without needing to change plans for existing queries


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] remove calcite AggregateRemoveRule to fix nested group by query with order by in outer query (druid)

Posted by "soumyava (via GitHub)" <gi...@apache.org>.
soumyava merged PR #15237:
URL: https://github.com/apache/druid/pull/15237


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Fixing nested group by query with order by in outer query (druid)

Posted by "somu-imply (via GitHub)" <gi...@apache.org>.
somu-imply commented on code in PR #15237:
URL: https://github.com/apache/druid/pull/15237#discussion_r1369403744


##########
sql/src/test/resources/drill/window/queries/nestedAggs/multiWin_5.e:
##########
@@ -8,15 +8,15 @@ false	280487665.00748299428571428571
 false	295757331.64717262000000000000
 false	302662813.16785714333333333333
 false	304608131.82107142900000000000
-false	303537640.51502361272727272727
+false	303537640.515023651272727272727

Review Comment:
   These queries do not compare the query plan. The underlying query plan changed a bit due to removal of the rule hence the change in the terminating decimal places



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org