Posted to dev@calcite.apache.org by Γιώργος Θεοδωράκης <gi...@gmail.com> on 2016/11/01 11:05:58 UTC

Re: How to use VolcanoPlanner

I am wondering whether it is possible to push down projections in Volcano
at all. With Volcano's cost model, a projection adds row and CPU cost, so a
plan with extra projections can never be chosen. For example, for the
following query:

"select s.products.productid "
+ "from s.products,s.orders "
+ "where s.orders.productid = s.products.productid  "
+ "and units>5 "

the optimized plan I get from Volcano is this:

EnumerableProject(productid=[$0]): rowcount = 5.3999999999999995,
cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
  EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
5.3999999999999995, cumulative cost = {46.15595681536833 rows, 32.0 cpu,
0.0 io}, id = 85
    EnumerableTableScan(table=[[s, products]]): rowcount = 6.0, cumulative
cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
    EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative cost
= {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
      EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0, cumulative
cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59

However, the plan I would like to get is the one with projections pushed
down. Under Volcano's cost model it has a larger cumulative cost, because
there would be two extra projections at the inputs (each adding rowCount and
CPU cost); in most real cases it is actually cheaper, though, because it
reduces the memory needed, which is not taken into consideration here.

Are there any suggestions? At first I used HepPlanner, but I couldn't get
join orders optimized with the built-in rules (using
JoinPushThroughJoinRule.RIGHT and JoinPushThroughJoinRule.LEFT never
terminates, and using the MultiJoin rules does nothing). Then I tried
VolcanoPlanner and got optimized join orders, but with no pushed-down
projections. I also tried to use a combination of both planners, but I got
errors. What should I do?

2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:

> I fixed the second error by changing my Statistics to:
>
> public Statistic getStatistic() {
>   int rowCount = rows.size();
>   return Statistics.of(rowCount, ImmutableList.<ImmutableBitSet>of());
> }
>
> Any suggestions of a better solution are welcome. The project push-down
> problem still exists...
>
> 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
>
>> Hi,
>> I was missing the implementations of the operators, so I added the
>> built-in EnumerableRules until I create my own, in order to fix it.
>> However, the plan I get from the Volcano optimizer is different from the
>> one I get from HepPlanner, although I use the same rules. My problem is
>> projection push-down: HepPlanner pushes projections to the bottom of the
>> RelNode tree, while VolcanoPlanner always keeps them at the top (it
>> doesn't push them through joins). I use these rules in both:
>>
>>     ProjectRemoveRule.INSTANCE,
>>     ProjectJoinTransposeRule.INSTANCE,
>>     ProjectFilterTransposeRule.INSTANCE, /* it is better to apply the
>>         filter first and then the project */
>>     ProjectTableScanRule.INSTANCE,
>>     ProjectWindowTransposeRule.INSTANCE,
>>     ProjectMergeRule.INSTANCE
>>
>> and
>>
>>     EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
>>     EnumerableRules.ENUMERABLE_PROJECT_RULE,
>>     ...
>>
>> Finally, when trying to use an aggregate I get this error:
>>
>> Exception in thread "main" java.lang.AssertionError: Internal error: Error while applying rule EnumerableTableScanRule, args [rel#4:LogicalTableScan.NONE.[](table=[s, orders])]
>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
>> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
>> at calcite.VolcanoTester.main(VolcanoTester.java:106)
>> Caused by: java.lang.AssertionError: Internal error: Error occurred while applying rule EnumerableTableScanRule
>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
>> at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
>> at org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
>> ... 4 more
>> Caused by: java.lang.NullPointerException
>> at org.apache.calcite.schema.Statistics$2.isKey(Statistics.java:70)
>> at org.apache.calcite.prepare.RelOptTableImpl.isKey(RelOptTableImpl.java:288)
>> at org.apache.calcite.rel.metadata.RelMdColumnUniqueness.areColumnsUnique(RelMdColumnUniqueness.java:76)
>> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique_$(Unknown Source)
>> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.areColumnsUnique(RelMetadataQuery.java:461)
>> at org.apache.calcite.rel.metadata.RelMdUtil.areColumnsDefinitelyUnique(RelMdUtil.java:216)
>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:75)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:138)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>> at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:194)
>> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>> at org.apache.calcite.rel.core.Aggregate.computeSelfCost(Aggregate.java:304)
>> at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
>> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown Source)
>> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown Source)
>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1830)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1766)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1032)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1052)
>> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1942)
>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:136)
>> ... 7 more
>>
>> I define the Statistics in the tables I use like this:
>>
>> public Statistic getStatistic() {
>>   int rowCount = rows.size();
>>   return Statistics.of(rowCount, null); // TODO: add List<ImmutableBitSet>
>> }
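[Editor's note: the NullPointerException above comes from that null key list; Statistics$2.isKey dereferences it. A sketch of a Statistic that avoids the NPE and, if the table really has a unique column, also gives the metadata provider (column uniqueness, distinct row counts) something to work with. The choice of column 0 (productid) as a unique key is an assumption for illustration.]

```java
import com.google.common.collect.ImmutableList;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.util.ImmutableBitSet;

// Fragment of a Table implementation; `rows` is the table's backing list.
public Statistic getStatistic() {
  int rowCount = rows.size();
  // Never pass null for the key list; an empty list means "no known keys".
  // If column 0 (productid) is in fact unique, declaring it helps the planner:
  return Statistics.of(rowCount, ImmutableList.of(ImmutableBitSet.of(0)));
}
```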
>>
>>
>> Thanks,
>> George
>>
>> 2016-10-16 7:28 GMT+03:00 Jungtaek Lim <ka...@gmail.com>:
>>
>>> Hi George,
>>>
>>> This patch is a ported version (with small fixes) of Milinda's samza-sql
>>> implementation for Storm SQL.
>>> https://github.com/apache/storm/pull/1736
>>>
>>> In this patch I removed the HepPlanner and rely on VolcanoPlanner alone
>>> (so the patch may be closer to what you want).
>>> For now I have also removed the code regarding metadata, since I'm not
>>> clear on how it works and what it helps with, but I'll re-address it
>>> once I can find its usage and benefits.
>>>
>>> Hope this helps.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Tue, Oct 4, 2016 at 7:08 PM, Γιώργος Θεοδωράκης <gi...@gmail.com>
>>> wrote:
>>>
>>> > I think I did as you said:
>>> >
>>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/VolcanoTester.java
>>> >
>>> > and I get for every query I use:
>>> >
>>> > Exception in thread "main"
>>> > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
>>> > [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
>>> > Root: rel#10:Subset#2.NONE.[]
>>> > Original rel:
>>> > ....
>>> > at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
>>> > at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
>>> > at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:835)
>>> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>>> > at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
>>> > at calcite.VolcanoTester.main(VolcanoTester.java:77)
>>> >
>>> > My table is defined here:
>>> >
>>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/utils/OrdersTableFactory.java
>>> >
>>> > Thank you for your time,
>>> > George
>>> >
>>> >
>>> > 2016-10-04 2:38 GMT+03:00 Jordan Halterman <jordan.halterman@gmail.com
>>> >:
>>> >
>>> > > The link you provided is a pretty good example. Build a
>>> FrameworkConfig
>>> > > with your schema, parser config, and other information, and use it to
>>> > > create a Planner. That Planner uses a VolcanoPlanner internally.
>>> What's
>>> > > missing from that particular example is just the addition of
>>> programs.
>>> > > Programs are effectively sets of rules you will use to optimize your
>>> > query.
>>> > > So, to add your FilterProjectTransposeRule to the planner, call this
>>> when
>>> > > building your FrameworkConfig:
>>> > >
>>> > > .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
>>> > >
>>> > > That adds your program(s) to the set of programs in the planner, and
>>> > those
>>> > > programs can be accessed to optimize the query. Use the planner to
>>> > parse()
>>> > > your query, validate() your query, and then convert() your query
>>> into a
>>> > > logical plan. Then call...
>>> > >
>>> > > RelTraitSet traitSet = planner.getEmptyTraitSet().replace(Convention.NONE);
>>> > > planner.transform(0, traitSet, logicalPlan);
>>> > >
>>> > > to apply the rules you added to the configuration. That should use
>>> the
>>> > > VolcanoPlanner to apply the rules you added in your Program. The
>>> trait
>>> > set
>>> > > that's passed to that method is the required output trait set. So,
>>> if you
>>> > > wanted to convert the logical plan into some physical convention,
>>> you'd
>>> > > pass your physical convention instead of Convention.NONE. I can
>>> respond
>>> > > with a full example if you need it in a bit. I just don't have the
>>> > capacity
>>> > > to write it ATM.
>>> > >
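[Editor's note: Jordan didn't have time to write the full example, so here is a hedged sketch of the steps he describes, assembled in one place. The schema object (`rootSchema`), the rule list, and the Enumerable target convention are placeholder assumptions; also, on newer Calcite versions the conversion step is `planner.rel(validated).project()` rather than `planner.convert(validated)`.]

```java
import org.apache.calcite.adapter.enumerable.EnumerableConvention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.FilterProjectTransposeRule;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.tools.Programs;

public class VolcanoSketch {
  static RelNode optimize(SchemaPlus rootSchema, String sql) throws Exception {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)   // this is how the planner "sees" your SchemaPlus
        .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
        .build();
    Planner planner = Frameworks.getPlanner(config);  // uses VolcanoPlanner internally

    SqlNode parsed = planner.parse(sql);
    SqlNode validated = planner.validate(parsed);
    RelNode logical = planner.convert(validated);     // logical plan, Convention.NONE

    // Program 0 is the rule set registered above; the trait set names the
    // REQUIRED output traits -- pass your physical convention to go physical.
    RelTraitSet traits = planner.getEmptyTraitSet()
        .replace(EnumerableConvention.INSTANCE);
    return planner.transform(0, traits, logical);
  }
}
```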
>>> > > On Mon, Oct 3, 2016 at 8:51 AM, Γιώργος Θεοδωράκης
>>> > > <giwrgosrtheod@gmail.com> wrote:
>>> > >
>>> > > > Hi,
>>> > > >
>>> > > > I want to parse a SQL query and transform it into an optimized
>>> > > > relational plan (not convert it to a physical one!) using Calcite
>>> > > > rules based on my database schema and metadata. Right now, the
>>> > > > only helpful example I have found for my purpose is
>>> > > > https://github.com/milinda/samza-sql/blob/master/samza-sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
>>> > > > in which a simple Planner is used for parsing and validating SQL,
>>> > > > and a HepPlanner is used for searching for an optimized plan
>>> > > > based on imported rules.
>>> > > >
>>> > > > Is there any way to use the VolcanoPlanner in my case? The only
>>> > > > examples I have seen so far in the test classes suggest that it
>>> > > > should be used for converting relational expressions to physical
>>> > > > ones. How can I make the VolcanoPlanner "see" my SchemaPlus
>>> > > > schema, when I can only define a RelOptSchema? Can someone
>>> > > > provide me with a complete example of using the VolcanoPlanner
>>> > > > and adding rules, such as FilterProjectTransposeRule.INSTANCE?
>>> > > >
>>> > > > Thanks in advance,
>>> > > > George
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: How to use VolcanoPlanner

Posted by Colm McHugh <co...@gmail.com>.
What errors did you get? It should be possible to use both Volcano and Hep
when query planning (Drill does this, and possibly others).
It superficially sounds like applying a heuristic pass that includes the
project pushdown (and any other rules you may want to apply)
after the Volcano pass should work.
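[Editor's note: one way to wire up the combination Colm suggests, sketched under assumptions: the rule list is illustrative, and `volcanoBest` stands for whatever plan your Volcano pass produced.]

```java
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.ProjectJoinTransposeRule;
import org.apache.calcite.rel.rules.ProjectMergeRule;

public class HepAfterVolcano {
  // Run a heuristic project-pushdown pass over the Volcano planner's output.
  public static RelNode pushProjects(RelNode volcanoBest) {
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(ProjectJoinTransposeRule.INSTANCE)
        .addRuleInstance(ProjectMergeRule.INSTANCE)
        .build();
    HepPlanner hepPlanner = new HepPlanner(program);
    hepPlanner.setRoot(volcanoBest);   // e.g. the result of VolcanoPlanner.findBestExp()
    return hepPlanner.findBestExp();   // applies the rules to fixpoint, not by cost
  }
}
```

Because HepPlanner fires rules unconditionally rather than by cost, the projections get pushed even though Volcano's cost model would charge for them.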


Re: How to use VolcanoPlanner

Posted by Julian Hyde <jh...@apache.org>.
There is a simple form of projection push-down, namely column pruning aka field trimming. If you compute x + y from a table that has columns (x, y, z), then it will prune away z, and project only (x, y). Column pruning doesn’t fit into the Volcano model (or in fact into the general transformation rule model) because it returns RelNodes that have a different row-type.  So, use RelFieldTrimmer for that. Most production optimizers will trim early in the planning process and probably late in the planning process also.
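[Editor's note: for reference, field trimming is invoked outside the rule engine, roughly like this. A sketch under assumptions: `config` is the same FrameworkConfig used to create the planner, and a null validator is accepted when trimming an already-validated tree.]

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql2rel.RelFieldTrimmer;
import org.apache.calcite.tools.RelBuilder;

// Prune unused columns from a logical plan, e.g. drop z when only x + y is used.
RelBuilder relBuilder = RelBuilder.create(config);
RelFieldTrimmer trimmer = new RelFieldTrimmer(null, relBuilder);
RelNode trimmed = trimmer.trim(logicalPlan);
```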

Your point about cost-based projection push-down is well taken. The solution, I think, is to use more realistic cost functions. A project that doesn’t compute any expressions, but just selects columns, should have virtually zero cost. (On top of a table scan, in a typical implementation, it actually has a negative cost: it makes the scan cheaper by reading fewer disk blocks and returning less data.)
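[Editor's note: a hedged sketch of such a cost function, as an override in a hypothetical custom Project subclass. This is not Calcite's built-in costing, and the method signature shown here varies across Calcite versions.]

```java
// Fragment of a custom Project subclass (illustrative, not Calcite code).
@Override
public RelOptCost computeSelfCost(RelOptPlanner planner, RelMetadataQuery mq) {
  for (RexNode e : getProjects()) {
    if (!(e instanceof RexInputRef)) {
      // Real computed expressions keep their real cost.
      return super.computeSelfCost(planner, mq);
    }
  }
  // Pure column selection: charge almost nothing, so pushing the
  // project below a join can only make the plan look cheaper.
  double rows = mq.getRowCount(this);
  return planner.getCostFactory().makeCost(rows, 0, 0).multiplyBy(0.001);
}
```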

Other operators should also have a significantly lower cost if project has been pushed down. For example, the cost of a Sort implemented as an external merge sort should depend on log of the number of BYTES processed, not just log of the number of ROWS processed. It is the number of bytes that determines how many times the data needs to be partitioned before we get a sort run small enough to fit into available memory.
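[Editor's note: the point about bytes versus rows can be made concrete with a little arithmetic. This is my own illustration of external merge sort passes, not Calcite's actual cost code; the memory size and merge fan-in are arbitrary assumptions.]

```java
// Number of merge passes an external sort needs: it grows with the log of
// the data SIZE in bytes, not the row count, so trimming row width helps.
public class SortCostSketch {
  // With merge fan-in F and M bytes of memory: runs = ceil(bytes / M),
  // then each pass merges up to F runs until one run remains.
  static int mergePasses(long bytes, long memoryBytes, int fanIn) {
    int passes = 0;
    long runs = (bytes + memoryBytes - 1) / memoryBytes; // initial sorted runs
    while (runs > 1) {
      runs = (runs + fanIn - 1) / fanIn; // one merge pass
      passes++;
    }
    return passes;
  }

  public static void main(String[] args) {
    // Same 1M rows, 1 MB of sort memory, fan-in 10:
    System.out.println(mergePasses(100_000_000L, 1_000_000L, 10)); // 100-byte rows: 2 passes
    System.out.println(mergePasses(10_000_000L, 1_000_000L, 10));  // 10-byte rows after trimming: 1 pass
  }
}
```

So a project pushed below the sort halves the merge work here, even though the row count is unchanged.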

Feel free to suggest improvements to the built-in (default) cost functions. But also don’t be afraid to use “no brainer” transformations such as RelFieldTrimmer at key points in the planning process.

Julian


> On Nov 1, 2016, at 11:09 AM, Colm McHugh <co...@gmail.com> wrote:
> 
> What errors did you get? It should be possible to use both Volcano and hep
> when query planning (Drill does this, and possibly others).
> It superficially sounds like applying a heuristics pass that includes the
> project pushdown (and any other rules you may want to apply)
> after the volcano pass should work.
> 
> On Tue, Nov 1, 2016 at 4:05 AM, Γιώργος Θεοδωράκης <gi...@gmail.com>
> wrote:
> 
>> I am wondering whether it is possible to push down projections in Volcano
>> in general. With Volcano's cost model, a projection adds row and CPU
>> cost, so it can never be chosen. For example, for the following query:
>> 
>> "select s.products.productid "
>> + "from s.products,s.orders "
>> + "where s.orders.productid = s.products.productid  "
>> + "and units>5 "
>> 
>> the optimized plan I get from Volcano is this:
>> 
>> EnumerableProject(productid=[$0]): rowcount = 5.3999999999999995, cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
>>   EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount = 5.3999999999999995, cumulative cost = {46.15595681536833 rows, 32.0 cpu, 0.0 io}, id = 85
>>     EnumerableTableScan(table=[[s, products]]): rowcount = 6.0, cumulative cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
>>     EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative cost = {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
>>       EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0, cumulative cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59
>> 
>> However, the plan I would like to get is the one with projections pushed
>> down. Under Volcano's cost model it has a bigger cumulative cost, as we
>> would have two more projections at the sources of the inputs (with
>> additional row count and CPU cost), but in most cases it is cheaper,
>> because it reduces the memory needed (which is not taken into
>> consideration here).
>> 
>> Are there any suggestions? At first I used HepPlanner, but I didn't get
>> plans with optimized join order from the built-in rules (using
>> JoinPushThroughJoinRule.RIGHT and JoinPushThroughJoinRule.LEFT never
>> terminates, and using the MultiJoin rules doesn't do anything). Then I
>> tried VolcanoPlanner and got plans with optimized join order but no
>> pushed-down projections. I also tried to use a combination of both
>> planners but I got errors. What should I do?
>> 
>> 2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
>> 
>>> I fixed the second error by changing my Statistics to:
>>> 
>>> public Statistic getStatistic() {
>>>   int rowCount = rows.size();
>>>   return Statistics.of(rowCount, ImmutableList.<ImmutableBitSet>of());
>>> }
>>> 
>>> Any suggestions of a better solution are welcome. The project push-down
>>> problem still exists...
>>> 
>>> 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
>>> 
>>>> Hi,
>>>> I was missing the implementations of the operators, so to fix it I added
>>>> the built-in EnumerableRules until I create my own. However, the plan I
>>>> get from the Volcano optimizer is different from the one I get from
>>>> HepPlanner, although I use the same rules. My problem is projection
>>>> push-down: HepPlanner pushes projections to the bottom of the RelNode
>>>> tree, while VolcanoPlanner always keeps them at the top (it doesn't push
>>>> them through joins). I use these rules in both:
>>>> 
>>>>        ProjectRemoveRule.INSTANCE,
>>>>        ProjectJoinTransposeRule.INSTANCE,
>>>>        ProjectFilterTransposeRule.INSTANCE, /* it is better to filter first and then project */
>>>>        ProjectTableScanRule.INSTANCE,
>>>>        ProjectWindowTransposeRule.INSTANCE,
>>>>        ProjectMergeRule.INSTANCE
>>>> 
>>>> and
>>>> 
>>>>        EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
>>>>        EnumerableRules.ENUMERABLE_PROJECT_RULE,
>>>>        ...
>>>> 
>>>> Finally, when trying to use an aggregate I get this error:
>>>> 
>>>> Exception in thread "main" java.lang.AssertionError: Internal error:
>>>> Error while applying rule EnumerableTableScanRule, args
>>>> [rel#4:LogicalTableScan.NONE.[](table=[s, orders])]
>>>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>>>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
>>>> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>>>> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
>>>> at calcite.VolcanoTester.main(VolcanoTester.java:106)
>>>> Caused by: java.lang.AssertionError: Internal error: Error occurred while applying rule EnumerableTableScanRule
>>>> at org.apache.calcite.util.Util.newInternal(Util.java:792)
>>>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
>>>> at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
>>>> at org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
>>>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
>>>> ... 4 more
>>>> Caused by: java.lang.NullPointerException
>>>> at org.apache.calcite.schema.Statistics$2.isKey(Statistics.java:70)
>>>> at org.apache.calcite.prepare.RelOptTableImpl.isKey(RelOptTableImpl.java:288)
>>>> at org.apache.calcite.rel.metadata.RelMdColumnUniqueness.areColumnsUnique(RelMdColumnUniqueness.java:76)
>>>> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique_$(Unknown Source)
>>>> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.areColumnsUnique(RelMetadataQuery.java:461)
>>>> at org.apache.calcite.rel.metadata.RelMdUtil.areColumnsDefinitelyUnique(RelMdUtil.java:216)
>>>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:75)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>>>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>>>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:138)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>>>> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
>>>> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
>>>> at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:194)
>>>> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
>>>> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
>>>> at org.apache.calcite.rel.core.Aggregate.computeSelfCost(Aggregate.java:304)
>>>> at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
>>>> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown Source)
>>>> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown Source)
>>>> at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
>>>> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1830)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1766)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1032)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1052)
>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1942)
>>>> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:136)
>>>> ... 7 more
>>>> 
>>>> I define the Statistics in the tables I use like this:
>>>> public Statistic getStatistic() {
>>>>   int rowCount = rows.size();
>>>>   return Statistics.of(rowCount, null); // add List<ImmutableBitSet>
>>>> }
>>>> 
>>>> 
>>>> Thanks,
>>>> George
>>>> 
>>>> 2016-10-16 7:28 GMT+03:00 Jungtaek Lim <ka...@gmail.com>:
>>>> 
>>>>> Hi George,
>>>>> 
>>>>> This patch is a ported version (with small fixes) of Milinda's samza-sql
>>>>> implementation for Storm SQL.
>>>>> https://github.com/apache/storm/pull/1736
>>>>> 
>>>>> In this patch I removed the HepPlanner and rely on the VolcanoPlanner
>>>>> alone (so the patch may be closer to what you want).
>>>>> For now I have also removed the code regarding metadata, since I'm not
>>>>> clear on how it works and what it helps with, but I'll re-address it
>>>>> once I can find its usage and benefits.
>>>>> 
>>>>> Hope this helps.
>>>>> 
>>>>> Thanks,
>>>>> Jungtaek Lim (HeartSaVioR)
>>>>> 
>>>>> On Tue, Oct 4, 2016 at 7:08 PM, Γιώργος Θεοδωράκης <giwrgosrtheod@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I think I did as you said:
>>>>>> 
>>>>>> https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/VolcanoTester.java
>>>>>> 
>>>>>> and I get for every query I use:
>>>>>> Exception in thread "main"
>>>>>> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
>>>>>> [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
>>>>>> Root: rel#10:Subset#2.NONE.[]
>>>>>> Original rel:
>>>>>> ....
>>>>>> at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
>>>>>> at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
>>>>>> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:835)
>>>>>> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
>>>>>> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
>>>>>> at calcite.VolcanoTester.main(VolcanoTester.java:77)
>>>>>> 
>>>>>> My table is defined here:
>>>>>> https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/utils/OrdersTableFactory.java
>>>>>> 
>>>>>> 
>>>>>> Thank you for your time,
>>>>>> George
>>>>>> 
>>>>>> 
>>>>>> 2016-10-04 2:38 GMT+03:00 Jordan Halterman <
>> jordan.halterman@gmail.com
>>>>>> :
>>>>>> 
>>>>>>> The link you provided is a pretty good example. Build a
>>>>> FrameworkConfig
>>>>>>> with your schema, parser config, and other information, and use it
>> to
>>>>>>> create a Planner. That Planner uses a VolcanoPlanner internally.
>>>>> What's
>>>>>>> missing from that particular example is just the addition of
>>>>> programs.
>>>>>>> Programs are effectively sets of rules you will use to optimize
>> your
>>>>>> query.
>>>>>>> So, to add your FilterProjectTransposeRule to the planner, call
>> this
>>>>> when
>>>>>>> building your FrameworkConfig:
>>>>>>> 
>>>>>>> .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
>>>>>>> 
>>>>>>> That adds your program(s) to the set of programs in the planner,
>> and
>>>>>> those
>>>>>>> programs can be accessed to optimize the query. Use the planner to
>>>>>> parse()
>>>>>>> your query, validate() your query, and then convert() your query
>>>>> into a
>>>>>>> logical plan. Then call...
>>>>>>> 
>>>>>>> RelTraitSet traitSet = planner.emptyTraitSet().replace(Convention.NONE);
>>>>>>> planner.transform(0, traitSet, logicalPlan);
>>>>>>> 
>>>>>>> to apply the rules you added to the configuration. That should use
>>>>> the
>>>>>>> VolcanoPlanner to apply the rules you added in your Program. The
>>>>> trait
>>>>>> set
>>>>>>> that's passed to that method is the required output trait set. So,
>>>>> if you
>>>>>>> wanted to convert the logical plan into some physical convention,
>>>>> you'd
>>>>>>> pass your physical convention instead of Convention.NONE. I can
>>>>> respond
>>>>>>> with a full example if you need it in a bit. I just don't have the
>>>>>> capacity
>>>>>>> to write it ATM.
>>>>>>> 
>>>>>>> On Mon, Oct 3, 2016 at 8:51 AM, Γιώργος Θεοδωράκης <
>>>>>>> giwrgosrtheod@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I want to parse an SQL query and transform it into an optimized
>>>>>>>> relational plan (not convert it to a physical one!) using Calcite
>>>>>>>> rules based on my database schema and metadata. Right now, the only
>>>>>>>> helpful example I have found for my purpose is
>>>>>>>> https://github.com/milinda/samza-sql/blob/master/samza-sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
>>>>>>>> in which a simple Planner is used for parsing and validating SQL,
>>>>>>>> and a HepPlanner is used to search for an optimized plan based on
>>>>>>>> imported rules.
>>>>>>>> 
>>>>>>>> Is there any way to use the VolcanoPlanner in my case? The only
>>>>>>>> examples I have seen so far in the test classes suggest that it
>>>>>>>> should be used for converting relational expressions to physical
>>>>>>>> ones. How can I make the VolcanoPlanner "see" my SchemaPlus schema,
>>>>>>>> when I can only define a RelOptSchema? Can someone provide me with a
>>>>>>>> complete example of using the VolcanoPlanner and adding rules, such
>>>>>>>> as FilterProjectTransposeRule.INSTANCE?
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> George
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 


Re: How to use VolcanoPlanner

Posted by Γιώργος Θεοδωράκης <gi...@gmail.com>.
I fixed the errors (they occurred because of the way I added the
EnumerableRules to Volcano).

I have implemented something like what you suggested:
1) I use VolcanoPlanner (with both logical and Enumerable rules) and get a
plan with Enumerable operators. As I have found out the hard way, I cannot
get a plan with logical operators (which is what I would prefer) from
VolcanoPlanner; I have to use some convention rules and traits.
2) I use HepPlanner with two custom rules I created, which push
EnumerableProjects down through EnumerableJoins (this is what I wanted).
The problem is that my final plan has every operator as an Enumerable
operator except the projections, which remain logical because of the way I
implemented the custom rules. The result is fine for now, but I will try to
fix it.

If I am missing something, I look forward to better suggestions.

Thanks,
George

2016-11-01 20:09 GMT+02:00 Colm McHugh <co...@gmail.com>:

Re: How to use VolcanoPlanner

Posted by Colm McHugh <co...@gmail.com>.
What errors did you get? It should be possible to use both the Volcano and
Hep planners when query planning (Drill does this, and possibly others).
It superficially sounds like applying a heuristic pass that includes the
project push-down (and any other rules you may want to apply)
after the Volcano pass should work.
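A minimal sketch of that two-phase setup, assuming Calcite is on the classpath; the method names come from Calcite's org.apache.calcite.tools.Programs API, and the exact rule list here is illustrative:

```java
import java.util.Arrays;
import org.apache.calcite.rel.metadata.DefaultRelMetadataProvider;
import org.apache.calcite.rel.rules.ProjectJoinTransposeRule;
import org.apache.calcite.rel.rules.ProjectMergeRule;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;

public class TwoPhaseProgram {
    public static Program buildProgram(Program volcanoPhase) {
        // Phase 2: a HepPlanner pass (wrapped as a Program) that pushes
        // projects below joins, applied to the Volcano phase's output.
        Program hepPhase = Programs.hep(
            Arrays.asList(ProjectJoinTransposeRule.INSTANCE,
                          ProjectMergeRule.INSTANCE),
            false,                                  // noDag
            DefaultRelMetadataProvider.INSTANCE);
        // Programs.sequence runs each phase on the previous phase's result.
        return Programs.sequence(volcanoPhase, hepPhase);
    }
}
```

The combined program can be passed to the FrameworkConfig via .programs(...), so a single planner.transform(0, traitSet, rel) runs both phases; alternatively, register the two phases separately and call transform(0, ...) then transform(1, ...).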

On Tue, Nov 1, 2016 at 4:05 AM, Γιώργος Θεοδωράκης <gi...@gmail.com>
wrote:

> I am wondering if it is possible to push down projections in Volcano
> generally. With Volcano's cost model, a projection adds row and cpu cost,
> so a plan with extra projections can't be chosen. For example, for the
> following query:
>
> "select s.products.productid "
> + "from s.products,s.orders "
> + "where s.orders.productid = s.products.productid  "
> + "and units>5 "
>
> the optimized plan I get from Volcano is this:
>
> EnumerableProject(productid=[$0]): rowcount = 5.3999999999999995,
> cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
>   EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
> 5.3999999999999995, cumulative cost = {46.15595681536833 rows, 32.0 cpu,
> 0.0 io}, id = 85
>     EnumerableTableScan(table=[[s, products]]): rowcount = 6.0, cumulative
> cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
>     EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative
> cost = {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
>       EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0, cumulative
> cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59
>
> However, the plan I would like to get is the one with projections pushed
> down. Under Volcano's cost model it has a bigger cumulative cost, since we
> would have two more projections at the inputs (each adding rowCount and cpu
> cost), but in most cases it is actually cheaper, as it reduces the memory
> needed (which is not taken into consideration here).
>
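The cumulative-cost arithmetic in the plan above can be reproduced by hand. The sketch below uses the printed numbers (the join's own 22.156-row cost is read off the plan) and a simplified self-plus-inputs formula, not Calcite's actual VolcanoCost class, to show why the pushed-down plan always costs more under this model:

```java
// Simplified cumulative cost: a node's own (rows, cpu) plus its inputs' costs.
public class CostSketch {
    static double[] cumulative(double selfRows, double selfCpu, double[]... inputs) {
        double rows = selfRows, cpu = selfCpu;
        for (double[] in : inputs) {
            rows += in[0];
            cpu += in[1];
        }
        return new double[] {rows, cpu};
    }

    public static void main(String[] args) {
        double[] scanOrders   = {12.0, 13.0};                      // EnumerableTableScan(orders)
        double[] filterOrders = cumulative(6.0, 12.0, scanOrders); // {18, 25}, as printed
        double[] scanProducts = {6.0, 7.0};                        // EnumerableTableScan(products)

        // Plan A: the single Project above the join, as Volcano chose it.
        double[] joinA    = cumulative(22.156, 0.0, scanProducts, filterOrders); // ~{46.156, 32}
        double[] projectA = cumulative(5.4, 5.4, joinA);                         // ~{51.556, 37.4}

        // Plan B: the same join with a trimming Project pushed below each input.
        // Each extra Project adds its row count to both rows and cpu, so the
        // cumulative cost can only grow, and Volcano never chooses plan B.
        double[] projOrders   = cumulative(6.0, 6.0, filterOrders);
        double[] projProducts = cumulative(6.0, 6.0, scanProducts);
        double[] joinB    = cumulative(22.156, 0.0, projProducts, projOrders);
        double[] projectB = cumulative(5.4, 5.4, joinB);

        System.out.printf("plan A: {%.1f rows, %.1f cpu}%n", projectA[0], projectA[1]); // {51.6, 37.4}
        System.out.printf("plan B: {%.1f rows, %.1f cpu}%n", projectB[0], projectB[1]); // {63.6, 49.4}
    }
}
```

The memory saving of plan B is invisible here because this cost model (like the printed one) tracks only rows, cpu, and io.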
> Are there any suggestions? At first I used HepPlanner, but I didn't get
> optimized join orders with the built-in rules (using
> JoinPushThroughJoinRule.RIGHT and JoinPushThroughJoinRule.LEFT never
> terminates, and using the MultiJoin rules does nothing). Then I tried
> VolcanoPlanner and got optimized join orders, but with no pushed-down
> projections. I also tried a combination of both planners but I got errors.
> What should I do?
>
> 2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
>
> > I fixed the second error by changing my Statistics to:
> >
> > public Statistic getStatistic() {
> > int rowCount = rows.size();
> > return Statistics.of(rowCount, ImmutableList.<ImmutableBitSet>of());
> > }
> >
> > Any suggestions of a better solution are welcome. The project push-down
> > problem still exists...
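The fix above works because Statistics.of stores the key list it is given, and its isKey() iterates that list, so a null list NPEs on the first uniqueness query. A hedged sketch of a slightly fuller Statistic (class and field names are illustrative; which column, if any, is a real key depends on the actual table):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.calcite.schema.Statistic;
import org.apache.calcite.schema.Statistics;
import org.apache.calcite.util.ImmutableBitSet;
import com.google.common.collect.ImmutableList;

class StatisticsSketch {
    private final List<Object[]> rows = new ArrayList<>();

    public Statistic getStatistic() {
        // Never pass null for the key list. If some column really is unique
        // (column 0 is assumed here), declaring it gives the metadata
        // handlers genuine uniqueness information for cost estimation;
        // otherwise pass ImmutableList.of() as in the fix above.
        return Statistics.of(rows.size(),
            ImmutableList.of(ImmutableBitSet.of(0)));
    }
}
```

Declare only columns that truly are keys; the planner trusts this metadata when estimating distinct row counts.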
> >
> > 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
> >
> >> Hi,
> >> I was missing the implementations of the operators, and I added the
> >> built-in EnumerableRules until I create my own, in order to fix it.
> >> However, the plan I get from the Volcano optimizer is different from the
> >> one I get from HepPlanner, although I use the same rules. My problem is
> >> projection push-down: HepPlanner pushes projections to the bottom of the
> >> RelNode tree, while VolcanoPlanner always keeps them at the top (it
> >> doesn't push them through joins). I use these rules in both:
> >>
> >>        ProjectRemoveRule.INSTANCE,
> >>        ProjectJoinTransposeRule.INSTANCE,
> >>        ProjectFilterTransposeRule.INSTANCE, /* it is better to use
> >>        filter first and then project */
> >>        ProjectTableScanRule.INSTANCE,
> >>        ProjectWindowTransposeRule.INSTANCE,
> >>        ProjectMergeRule.INSTANCE
> >>
> >> and
> >>        EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
> >>        EnumerableRules.ENUMERABLE_PROJECT_RULE,
> >>        ...
> >>
> >> Finally, when trying to use an aggregate I get this error:
> >>
> >> Exception in thread "main" java.lang.AssertionError: Internal error:
> >> Error while applying rule EnumerableTableScanRule, args
> >> [rel#4:LogicalTableScan.NONE.[](table=[s, orders])]
> >> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:819)
> >> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
> >> at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
> >> at calcite.VolcanoTester.main(VolcanoTester.java:106)
> >> Caused by: java.lang.AssertionError: Internal error: Error occurred while
> >> applying rule EnumerableTableScanRule
> >> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:148)
> >> at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:225)
> >> at org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:117)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:213)
> >> ... 4 more
> >> Caused by: java.lang.NullPointerException
> >> at org.apache.calcite.schema.Statistics$2.isKey(Statistics.java:70)
> >> at org.apache.calcite.prepare.RelOptTableImpl.isKey(RelOptTableImpl.java:288)
> >> at org.apache.calcite.rel.metadata.RelMdColumnUniqueness.areColumnsUnique(RelMdColumnUniqueness.java:76)
> >> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique_$(Unknown Source)
> >> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.areColumnsUnique(RelMetadataQuery.java:461)
> >> at org.apache.calcite.rel.metadata.RelMdUtil.areColumnsDefinitelyUnique(RelMdUtil.java:216)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:75)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:138)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDistinctRowCount(RelMdDistinctRowCount.java:292)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.getDistinctRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinctRowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:194)
> >> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:201)
> >> at org.apache.calcite.rel.core.Aggregate.computeSelfCost(Aggregate.java:304)
> >> at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
> >> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown Source)
> >> at GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1128)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:336)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:348)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1830)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1766)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1032)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1052)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1942)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:136)
> >> ... 7 more
> >>
> >> I define the Statistics in the tables I use like this:
> >> public Statistic getStatistic() {
> >> int rowCount = rows.size();
> >> return Statistics.of(rowCount, null); //add List<ImmutableBitSet>
> >> }
> >>
> >>
> >> Thanks,
> >> George
> >>
> >> 2016-10-16 7:28 GMT+03:00 Jungtaek Lim <ka...@gmail.com>:
> >>
> >>> Hi George,
> >>>
> >>> This patch is a ported version (with small fixes) of Milinda's samza-sql
> >>> implementation for Storm SQL.
> >>> https://github.com/apache/storm/pull/1736
> >>>
> >>> In this patch I removed the HepPlanner and rely just on the
> >>> VolcanoPlanner (so the patch may be closer to what you want).
> >>> For now I have also removed the code regarding metadata, since I'm not
> >>> clear on how it works and what it helps with, but I'll re-address it
> >>> once I can find its usage and benefits.
> >>>
> >>> Hope this helps.
> >>>
> >>> Thanks,
> >>> Jungtaek Lim (HeartSaVioR)
> >>>
> >>> On Tue, Oct 4, 2016 at 7:08 PM, Γιώργος Θεοδωράκης <giwrgosrtheod@gmail.com> wrote:
> >>>
> >>> > I think I did as you said:
> >>> >
> >>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/VolcanoTester.java
> >>> >
> >>> > and I get for every query I use:
> >>> > Exception in thread "main"
> >>> > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> >>> > [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
> >>> > Root: rel#10:Subset#2.NONE.[]
> >>> > Original rel:
> >>> > ....
> >>> > at org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
> >>> > at org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
> >>> > at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:835)
> >>> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:334)
> >>> > at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:308)
> >>> > at calcite.VolcanoTester.main(VolcanoTester.java:77)
> >>> >
> >>> > My table is defined here:
> >>> >
> >>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/src/main/java/calcite/utils/OrdersTableFactory.java
> >>> >
> >>> >
> >>> > Thank you for your time,
> >>> > George
> >>> >
> >>> >
> >>> > 2016-10-04 2:38 GMT+03:00 Jordan Halterman <jordan.halterman@gmail.com>:
> >>> >
> >>> > > The link you provided is a pretty good example. Build a FrameworkConfig
> >>> > > with your schema, parser config, and other information, and use it to
> >>> > > create a Planner. That Planner uses a VolcanoPlanner internally. What's
> >>> > > missing from that particular example is just the addition of programs.
> >>> > > Programs are effectively sets of rules you will use to optimize your
> >>> > > query. So, to add your FilterProjectTransposeRule to the planner, call
> >>> > > this when building your FrameworkConfig:
> >>> > >
> >>> > > .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
> >>> > >
> >>> > > That adds your program(s) to the set of programs in the planner, and
> >>> > > those programs can be accessed to optimize the query. Use the planner
> >>> > > to parse() your query, validate() your query, and then convert() your
> >>> > > query into a logical plan. Then call...
> >>> > >
> >>> > > RelTraitSet traitSet = planner.emptyTraitSet().replace(Convention.NONE);
> >>> > > planner.transform(0, traitSet, logicalPlan);
> >>> > >
> >>> > > to apply the rules you added to the configuration. That should use the
> >>> > > VolcanoPlanner to apply the rules you added in your Program. The trait
> >>> > > set that's passed to that method is the required output trait set. So,
> >>> > > if you wanted to convert the logical plan into some physical convention,
> >>> > > you'd pass your physical convention instead of Convention.NONE. I can
> >>> > > respond with a full example if you need it in a bit. I just don't have
> >>> > > the capacity to write it ATM.
> >>> > >
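Putting the quoted steps together, a hedged end-to-end sketch (assuming Calcite on the classpath and a populated SchemaPlus; method names follow Calcite's org.apache.calcite.tools API of that era, where the trait-set getter is getEmptyTraitSet() and convert() turns the validated SqlNode into a RelNode):

```java
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.FilterProjectTransposeRule;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;
import org.apache.calcite.tools.Programs;

public class PlannerExample {
    public static RelNode optimize(SchemaPlus rootSchema, String sql) throws Exception {
        FrameworkConfig config = Frameworks.newConfigBuilder()
            .defaultSchema(rootSchema)
            .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
            .build();
        Planner planner = Frameworks.getPlanner(config);
        SqlNode parsed = planner.parse(sql);           // parse
        SqlNode validated = planner.validate(parsed);  // validate
        RelNode logicalPlan = planner.convert(validated); // convert to logical plan
        // Convention.NONE keeps the result logical; pass a physical
        // convention (e.g. EnumerableConvention.INSTANCE) to physicalize.
        RelTraitSet traitSet = planner.getEmptyTraitSet().replace(Convention.NONE);
        return planner.transform(0, traitSet, logicalPlan); // program index 0
    }
}
```

The index passed to transform() selects which of the registered programs to run, so multiple .programs(...) phases can be applied in turn.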
> >>> > > On Mon, Oct 3, 2016 at 8:51 AM, Γιώργος Θεοδωράκης <
> >>> > > giwrgosrtheod@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi,
> >>> > > >
> >>> > > > I want to parse an SQL query and transform it to an optimized relational
> >>> > > > plan (not convert it to a physical one!) using Calcite rules based on my
> >>> > > > database schema and metadata. Right now, the only helpful example I have
> >>> > > > found for my purpose is taken from
> >>> > > > https://github.com/milinda/samza-sql/blob/master/samza-sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
> >>> > > > in which a simple Planner is used for parsing and validating SQL and a
> >>> > > > HepPlanner is used for searching for an optimized plan based on imported
> >>> > > > rules.
> >>> > > >
> >>> > > > Is there any way to use the VolcanoPlanner in my case? The only examples I
> >>> > > > have seen so far from the test classes suggest that it should be used for
> >>> > > > converting relational expressions to physical ones. How can I make the
> >>> > > > VolcanoPlanner "see" my SchemaPlus schema, when I can only define a
> >>> > > > RelOptSchema? Can someone provide me with a complete example of using the
> >>> > > > VolcanoPlanner and adding rules, such as FilterProjectTransposeRule.INSTANCE?
> >>> > > >
> >>> > > > Thanks in advance,
> >>> > > > George
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>