Posted to dev@calcite.apache.org by Volodymyr Tkach <vo...@gmail.com> on 2017/10/03 13:26:32 UTC
Newbie: Help with VolcanoPlanner query optimization
Hello everyone,
I'm new to Calcite and am dealing with an issue where VolcanoPlanner
selects the wrong RelNode as the best-cost one for the current RelSubset.
I run the following query with Apache Drill:
*select count(distinct l_linenumber), sum(l_linenumber) from
dfs.`/tmp/lineitem.parquet` group by l_shipdate;*
The JVM is running with assertions enabled and Calcite debugging turned on: ...
-ea -Dcalcite.debug=true
The VolcanoPlanner@validate method throws the following exception:
Error: SYSTEM ERROR: AssertionError: rel
[rel#319:SortPrel.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[0](input=rel#301:Subset#15.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[],sort0=$0,dir0=ASC)]
has lower cost {313511.75 rows, 1.01686322832289E7 cpu, 0.0 io,
1.03520256E9 network, 1078336.0 memory} than best cost {318927.5 rows,
1.0424536013971556E7 cpu, 0.0 io, 1.03520256E9 network, 1059080.0 memory}
of subset [rel#200:Subset#15.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[0]]
I tried to find the place in the code where the RelNode is selected
incorrectly, and why, but so far without success.
Can somebody suggest how to find out why a RelNode that is not the cheapest is
selected as the best one?
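
For reference, the invariant behind the assertion above can be sketched roughly as follows: no rel in a subset may be cheaper than the best cost recorded for that subset. This is only an illustrative model, not Calcite or Drill code. The classes Cost and Rel, the method findViolations, and the comparison rule in isLt (cpu + io + network only) are all assumptions made for the example; the cost values are the ones from the error message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT Calcite or Drill classes.
final class SubsetCostCheck {

    /** Simplified cost holding the five components printed in the error message. */
    static final class Cost {
        final double rows, cpu, io, network, memory;

        Cost(double rows, double cpu, double io, double network, double memory) {
            this.rows = rows;
            this.cpu = cpu;
            this.io = io;
            this.network = network;
            this.memory = memory;
        }

        // Assumption: compare on cpu + io + network only. The comparison the
        // real planner uses may weigh the components differently.
        boolean isLt(Cost other) {
            return cpu + io + network < other.cpu + other.io + other.network;
        }
    }

    /** A candidate plan node: a display name plus its cumulative cost. */
    static final class Rel {
        final String name;
        final Cost cost;

        Rel(String name, Cost cost) {
            this.name = name;
            this.cost = cost;
        }
    }

    /** Returns the names of rels that are strictly cheaper than the recorded best cost. */
    static List<String> findViolations(Cost recordedBest, List<Rel> rels) {
        List<String> violations = new ArrayList<>();
        for (Rel rel : rels) {
            if (rel.cost.isLt(recordedBest)) {
                violations.add(rel.name);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Best cost recorded for the subset, as reported in the exception.
        Cost best = new Cost(318927.5, 1.0424536013971556E7, 0.0, 1.03520256E9, 1059080.0);

        // The rel that the exception reports as cheaper than the recorded best.
        List<Rel> rels = new ArrayList<>();
        rels.add(new Rel("rel#319:SortPrel",
            new Cost(313511.75, 1.01686322832289E7, 0.0, 1.03520256E9, 1078336.0)));

        // A non-empty result means the recorded best cost is stale or wrong.
        System.out.println(findViolations(best, rels));
    }
}
```

Under the assumed comparison, the rel from the error message is reported as a violation, which matches the situation described: the subset's recorded best cost was not updated (or was computed differently) when a cheaper rel was registered.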