Posted to dev@calcite.apache.org by Volodymyr Tkach <vo...@gmail.com> on 2017/10/03 13:26:32 UTC

Newbie: Help with VolcanoPlanner query optimization

Hello everyone,

I'm new to Calcite, and I'm dealing with an issue where VolcanoPlanner does
not pick the RelNode with the lowest cost as the best one for its RelSubset.

I run the following query with Apache Drill:

  select count(distinct l_linenumber), sum(l_linenumber)
  from dfs.`/tmp/lineitem.parquet` group by l_shipdate;

The JVM runs with assertions enabled and Calcite debugging turned on: ...
-ea -Dcalcite.debug=true

The VolcanoPlanner.validate() method throws the following exception:

Error: SYSTEM ERROR: AssertionError: rel
[rel#319:SortPrel.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[0](input=rel#301:Subset#15.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[],sort0=$0,dir0=ASC)]
has lower cost {313511.75 rows, 1.01686322832289E7 cpu, 0.0 io,
1.03520256E9 network, 1078336.0 memory} than best cost {318927.5 rows,
1.0424536013971556E7 cpu, 0.0 io, 1.03520256E9 network, 1059080.0 memory}
of subset [rel#200:Subset#15.PHYSICAL.HASH_DISTRIBUTED([[$0]]).[0]]
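To see what "lower cost" means here, it may help to compare the two cost vectors from the error field by field. The Java sketch below does only that, using a hypothetical CostVector class (not a Calcite or Drill type) filled with the numbers printed in the message; it makes no assumption about how Drill's cost model actually orders two costs.

```java
import java.util.ArrayList;
import java.util.List;

public class CostDiff {
  /** Hypothetical cost vector mirroring the five fields printed in the error. */
  static final class CostVector {
    final double rows, cpu, io, network, memory;

    CostVector(double rows, double cpu, double io, double network, double memory) {
      this.rows = rows;
      this.cpu = cpu;
      this.io = io;
      this.network = network;
      this.memory = memory;
    }

    double[] values() {
      return new double[] {rows, cpu, io, network, memory};
    }
  }

  static final String[] NAMES = {"rows", "cpu", "io", "network", "memory"};

  /** Describes, per component, whether the rel's value is <, = or > the subset's best. */
  static List<String> diff(CostVector rel, CostVector best) {
    double[] a = rel.values();
    double[] b = best.values();
    List<String> out = new ArrayList<>();
    for (int i = 0; i < NAMES.length; i++) {
      String op = a[i] < b[i] ? "<" : (a[i] > b[i] ? ">" : "=");
      out.add(NAMES[i] + ": rel " + op + " best");
    }
    return out;
  }

  public static void main(String[] args) {
    // Cost of rel#319 (SortPrel), copied from the AssertionError.
    CostVector rel = new CostVector(313511.75, 1.01686322832289E7, 0.0, 1.03520256E9, 1078336.0);
    // Best cost recorded for subset rel#200, copied from the same message.
    CostVector best = new CostVector(318927.5, 1.0424536013971556E7, 0.0, 1.03520256E9, 1059080.0);
    for (String line : diff(rel, best)) {
      System.out.println(line);
    }
  }
}
```

With these values, rel#319 is cheaper in rows and cpu, equal in io and network, and more expensive in memory, so whether it counts as "lower" depends on how the cost implementation's comparison (RelOptCost.isLt / isLe) weighs those components.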


I tried to find where in the code the wrong RelNode is selected, and why,
but without success so far.

Can anybody suggest how to find out why the lowest-cost RelNode is not
selected as the best one?
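One hypothesis worth checking is that a rel's cost changed after the subset recorded its best cost, and the new cost was never propagated back to the subset. The Java sketch below is a simplified, hypothetical model (Subset, Rel and plain double costs are illustrative stand-ins, not Calcite classes) of the kind of check the assertion performs: every rel in a subset is compared against the subset's recorded best cost, and any rel that is strictly cheaper is reported.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsetInvariant {
  /** Hypothetical rel whose (cumulative) cost can change after registration. */
  static final class Rel {
    final String name;
    double cost;

    Rel(String name, double cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  /** Hypothetical subset that remembers the best cost seen when each rel was added. */
  static final class Subset {
    final List<Rel> rels = new ArrayList<>();
    Rel best;
    double bestCost = Double.POSITIVE_INFINITY;

    /** Registers a rel and updates best using the cost known at this moment only. */
    void add(Rel rel) {
      rels.add(rel);
      if (rel.cost < bestCost) {
        best = rel;
        bestCost = rel.cost;
      }
    }

    /** Returns the names of rels that are strictly cheaper than the recorded best cost. */
    List<String> violations() {
      List<String> out = new ArrayList<>();
      for (Rel rel : rels) {
        if (rel.cost < bestCost) {
          out.add(rel.name);
        }
      }
      return out;
    }
  }

  public static void main(String[] args) {
    Subset subset = new Subset();
    Rel a = new Rel("A", 100.0);
    Rel b = new Rel("B", 120.0);
    subset.add(a);
    subset.add(b);
    System.out.println(subset.violations()); // []

    // B's cost drops later (for example, because an input became cheaper),
    // but the subset is never notified, so bestCost still holds A's 100.0.
    b.cost = 90.0;
    System.out.println(subset.violations()); // [B]
  }
}
```

If something similar is happening here, computing the cost of rel#319 both when it is registered and again at validation time, and comparing the two values, could help narrow down where the recorded best cost goes stale.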