You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Γιώργος Θεοδωράκης <gi...@gmail.com> on 2016/10/03 15:51:36 UTC

How to use VolcanoPlanner

Hi,

I want to parse an Sql query and transform it to an optimized relational
plan (not convert it to physical !!) using calcite rules based on my
database schema and metadata. Right now, the only helpful example I have
found for my purpose is taken from
https://github.com/milinda/samza-sql/blob/master/samza-sql-planner/src/main/java/org/apache/samza/sql/planner/QueryPlanner.java
,
in which a simple Planner is used for parsing and validating Sql and a
HepPlanner is used for searching for an optimized plan based on imported
rules.

Is there any way to use in my case the VolcanoPlanner? The only examples I
have seen so far from the test classes suggest that it should be used for
converting relational expressions to physical ones. How can I make the
Volcano Planner "see" my SchemaPlus schema ,when I can only define
RelOptSchema? Can someone provide me with a complete example of using
Volcano Planner and adding rules, such
as FilterProjectTransposeRule.INSTANCE?

Thanks in advance,
George

Re: How to use VolcanoPlanner

Posted by Colm McHugh <co...@gmail.com>.

What errors did you get? It should be possible to use both Volcano and hep
when query planning (Drill does this, and possibly others).
It superficially sounds like applying a heuristics pass that includes the
project pushdown (and any other rules you may want to apply)
after the volcano pass should work.

On Tue, Nov 1, 2016 at 4:05 AM, Γιώργος Θεοδωράκης <gi...@gmail.com>
wrote:

> I am wondering if is it possible to push down projections in Volcano
> generally. With the cost model of Volcano, a projection adds rows and cpu
> cost and it can't be chosen. For example  for the next query:
>
> "select s.products.productid "
> + "from s.products,s.orders "
> + "where s.orders.productid = s.products.productid  "
> + "and units>5 "
>
> , the optimized plan I get from Volcano is this
>
> EnumerableProject(productid=[$0]): rowcount = 5.3999999999999995,
> cumulative cost = {51.555956815368326 rows, 37.4 cpu, 0.0 io}, id = 86
>   EnumerableJoin(condition=[=($0, $3)], joinType=[inner]): rowcount =
> 5.3999999999999995, cumulative cost = {46.15595681536833 rows, 32.0 cpu,
> 0.0 io}, id = 85
>     EnumerableTableScan(table=[[s, products]]): rowcount = 6.0, cumulative
> cost = {6.0 rows, 7.0 cpu, 0.0 io}, id = 40
>     EnumerableFilter(condition=[>($2, 5)]): rowcount = 6.0, cumulative
> cost
> = {18.0 rows, 25.0 cpu, 0.0 io}, id = 84
>       EnumerableTableScan(table=[[s, orders]]): rowcount = 12.0, cumulative
> cost = {12.0 rows, 13.0 cpu, 0.0 io}, id = 59
>
> However the plan I would like to get is the one with projections pushed
> down, which has bigger cumulative cost as we would have two more
> projections at the source of inputs (with additional rowCount and cpu cost)
> with Volcano's cost model, but smaller in most cases as we reduce the
> memory needed(which is not taken in consideration here).
>
> Are there any suggestions? At first I used hepPlanner, but I didn't get
> optimized plans for join order with the built-in
> rules(using JoinPushThroughJoinRule.RIGHT and JoinPushThroughJoinRule.LEFT
> never stops, and using MultiJoinRules doesn't do anything). Then, I tried
> to use VolcanoPlanner and I got optimized plans for join order with no
> pushed-down projections. I also tried to use a combination of both planners
> but I got errors. What should I do?
>
> 2016-10-27 16:00 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
>
> > I fixed the second error by changing my Statistics to:
> >
> > public Statistic getStatistic() {
> > int rowCount = rows.size();
> > return Statistics.of(rowCount, ImmutableList.<ImmutableBitSet>of());
> > }
> >
> > Any suggestions of a better solution are welcome. The project push-down
> > problem still exists...
> >
> > 2016-10-27 15:42 GMT+03:00 Γιώργος Θεοδωράκης <gi...@gmail.com>:
> >
> >> Hi,
> >> I was missing the implementations of operators, and I added the built in
> >> EnumerableRules until I create my own, in order to fix it. However, the
> >> plan I get from Volcano Optimizer is different from the one I get from
> >> HepPlanner, although I use the same rules. My problem is about
> Projection
> >> push-down. The hepPlanner pushes Projections to the bottom of the
> RelNode
> >> tree, and VolcanoPlanner keeps them always at the top (doesn't push them
> >> through joins). I use these rules in both :
> >>
> >>        ProjectRemoveRule.INSTANCE,
> >>        ProjectJoinTransposeRule.INSTANCE,
> >>         ProjectFilterTransposeRule.INSTANCE, /*it is better to use
> >> filter first and then project*/
> >>        ProjectTableScanRule.INSTANCE,
> >>        ProjectWindowTransposeRule.INSTANCE,
> >>        ProjectMergeRule.INSTANCE
> >>
> >> and
> >>                             EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE,
> >>         EnumerableRules.ENUMERABLE_PROJECT_RULE,
> >>                             ...
> >>
> >> Finally , when trying to use aggregate I get this error:
> >>
> >> Exception in thread "main" java.lang.AssertionError: Internal error:
> >> Error while applying rule EnumerableTableScanRule, args
> >> [rel#4:LogicalTableScan.NONE.[](table=[s, orders])]
> >> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
> >> VolcanoRuleCall.java:236)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(
> >> VolcanoPlanner.java:819)
> >> at org.apache.calcite.tools.Programs$RuleSetProgram.run(Program
> >> s.java:334)
> >> at org.apache.calcite.prepare.PlannerImpl.transform(
> PlannerImpl.java:308)
> >> at calcite.VolcanoTester.main(VolcanoTester.java:106)
> >> Caused by: java.lang.AssertionError: Internal error: Error occurred
> while
> >> applying rule EnumerableTableScanRule
> >> at org.apache.calcite.util.Util.newInternal(Util.java:792)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(
> >> VolcanoRuleCall.java:148)
> >> at org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRul
> >> eCall.java:225)
> >> at org.apache.calcite.rel.convert.ConverterRule.onMatch(Convert
> >> erRule.java:117)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(
> >> VolcanoRuleCall.java:213)
> >> ... 4 more
> >> Caused by: java.lang.NullPointerException
> >> at org.apache.calcite.schema.Statistics$2.isKey(Statistics.java:70)
> >> at org.apache.calcite.prepare.RelOptTableImpl.isKey(RelOptTable
> >> Impl.java:288)
> >> at org.apache.calcite.rel.metadata.RelMdColumnUniqueness.areCol
> >> umnsUnique(RelMdColumnUniqueness.java:76)
> >> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_ColumnUniqueness.areColumnsUnique(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.areColumnsU
> >> nique(RelMetadataQuery.java:461)
> >> at org.apache.calcite.rel.metadata.RelMdUtil.areColumnsDefinite
> >> lyUnique(RelMdUtil.java:216)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDis
> >> tinctRowCount(RelMdDistinctRowCount.java:75)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinct
> >> RowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDis
> >> tinctRowCount(RelMdDistinctRowCount.java:292)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinct
> >> RowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDis
> >> tinctRowCount(RelMdDistinctRowCount.java:138)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinct
> >> RowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdDistinctRowCount.getDis
> >> tinctRowCount(RelMdDistinctRowCount.java:292)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_DistinctRowCount.
> getDistinctRowCount(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getDistinct
> >> RowCount(RelMetadataQuery.java:700)
> >> at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(
> >> RelMdRowCount.java:194)
> >> at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
> >> at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount
> >> (RelMetadataQuery.java:201)
> >> at org.apache.calcite.rel.core.Aggregate.computeSelfCost(Aggreg
> >> ate.java:304)
> >> at org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.
> >> getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
> >> at GeneratedMetadataHandler_NonCumulativeCost.
> getNonCumulativeCost_$(Unknown
> >> Source)
> >> at GeneratedMetadataHandler_NonCumulativeCost.
> getNonCumulativeCost(Unknown
> >> Source)
> >> at org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumul
> >> ativeCost(RelMetadataQuery.java:258)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(
> >> VolcanoPlanner.java:1128)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements0(RelSubset.java:336)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements0(RelSubset.java:348)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements0(RelSubset.java:348)
> >> at org.apache.calcite.plan.volcano.RelSubset.propagateCostImpro
> >> vements(RelSubset.java:319)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(
> >> VolcanoPlanner.java:1830)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(
> >> VolcanoPlanner.java:1766)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.register(
> >> VolcanoPlanner.java:1032)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegiste
> >> red(VolcanoPlanner.java:1052)
> >> at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegiste
> >> red(VolcanoPlanner.java:1942)
> >> at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(
> >> VolcanoRuleCall.java:136)
> >> ... 7 more
> >>
> >> I define the Statistics in the tables I use like this:
> >> public Statistic getStatistic() {
> >> int rowCount = rows.size();
> >> return Statistics.of(rowCount, null); //add List<ImmutableBitSet>
> >> }
> >>
> >>
> >> Thanks,
> >> George
> >>
> >> 2016-10-16 7:28 GMT+03:00 Jungtaek Lim <ka...@gmail.com>:
> >>
> >>> Hi George,
> >>>
> >>> This patch is ported version (with small fixes) of Milinda's samza-sql
> >>> implementation for Storm SQL.
> >>> https://github.com/apache/storm/pull/1736
> >>>
> >>> In this patch I removed adding HepPlanner and just rely on Volcano
> >>> Planner
> >>> (so the patch may be the closer thing what you want).
> >>> For now I also remove code regarding metadata since I'm not clear on
> how
> >>> it
> >>> works and what it helps, but I'll re-address once I can find its usage
> >>> and
> >>> benefits.
> >>>
> >>> Hope this helps.
> >>>
> >>> Thanks,
> >>> Jungtaek Lim (HeartSaVioR)
> >>>
> >>> 2016년 10월 4일 (화) 오후 7:08, Γιώργος Θεοδωράκης <giwrgosrtheod@gmail.com
> >님이
> >>> 작성:
> >>>
> >>> > I think I did as you said:
> >>> >
> >>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/sr
> >>> c/main/java/calcite/VolcanoTester.java
> >>> >
> >>> > and I get for every query I use:
> >>> > Exception in thread "main"
> >>> > org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
> >>> > [rel#10:Subset#2.NONE.[]] could not be implemented; planner state:
> >>> > Root: rel#10:Subset#2.NONE.[]
> >>> > Original rel:
> >>> > ....
> >>> > at
> >>> >
> >>> > org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplac
> >>> er.visit(RelSubset.java:443)
> >>> > at
> >>> >
> >>> > org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(
> >>> RelSubset.java:293)
> >>> > at
> >>> >
> >>> > org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(V
> >>> olcanoPlanner.java:835)
> >>> > at org.apache.calcite.tools.Programs$RuleSetProgram.run(Program
> >>> s.java:334)
> >>> > at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl
> >>> .java:308)
> >>> > at calcite.VolcanoTester.main(VolcanoTester.java:77)
> >>> >
> >>> > My table's is defined here :
> >>> >
> >>> > https://github.com/giwrgostheod/Calcite-Saber/blob/master/sr
> >>> c/main/java/calcite/utils/OrdersTableFactory.java
> >>> >
> >>> >
> >>> > Thank you for your time,
> >>> > George
> >>> >
> >>> >
> >>> > 2016-10-04 2:38 GMT+03:00 Jordan Halterman <
> jordan.halterman@gmail.com
> >>> >:
> >>> >
> >>> > > The link you provided is a pretty good example. Build a
> >>> FrameworkConfig
> >>> > > with your schema, parser config, and other information, and use it
> to
> >>> > > create a Planner. That Planner uses a VolcanoPlanner internally.
> >>> What's
> >>> > > missing from that particular example is just the addition of
> >>> programs.
> >>> > > Programs are effectively sets of rules you will use to optimize
> your
> >>> > query.
> >>> > > So, to add your FilterProjectTransposeRule to the planner, call
> this
> >>> when
> >>> > > building your FrameworkConfig:
> >>> > >
> >>> > > .programs(Programs.ofRules(FilterProjectTransposeRule.INSTANCE))
> >>> > >
> >>> > > That adds your program(s) to the set of programs in the planner,
> and
> >>> > those
> >>> > > programs can be accessed to optimize the query. Use the planner to
> >>> > parse()
> >>> > > your query, validate() your query, and then convert() your query
> >>> into a
> >>> > > logical plan. Then call...
> >>> > >
> >>> > > RelTraitSet traitSet = planner.emptyTraitSet().replac
> >>> e(Convention.NONE);
> >>> > > planner.transform(0. traitSet, logicalPlan);
> >>> > >
> >>> > > to apply the rules you added to the configuration. That should use
> >>> the
> >>> > > VolcanoPlanner to apply the rules you added in your Program. The
> >>> trait
> >>> > set
> >>> > > that's passed to that method is the required output trait set. So,
> >>> if you
> >>> > > wanted to convert the logical plan into some physical convention,
> >>> you'd
> >>> > > pass your physical convention instead of Convention.NONE. I can
> >>> respond
> >>> > > with a full example if you need it in a bit. I just don't have the
> >>> > capacity
> >>> > > to write it ATM.
> >>> > >
> >>> > > On Mon, Oct 3, 2016 at 8:51 AM, Γιώργος Θεοδωράκης <
> >>> > > giwrgosrtheod@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi,
> >>> > > >
> >>> > > > I want to parse an Sql query and transform it to an optimized
> >>> > relational
> >>> > > > plan (not convert it to physical !!) using calcite rules based on
> >>> my
> >>> > > > database schema and metadata. Right now, the only helpful
> example I
> >>> > have
> >>> > > > found for my purpose is taken from
> >>> > > > https://github.com/milinda/samza-sql/blob/master/samza-
> >>> > > >
> >>> > sql-planner/src/main/java/org/apache/samza/sql/planner/Query
> >>> Planner.java
> >>> > > > ,
> >>> > > > in which a simple Planner is used for parsing and validating Sql
> >>> and a
> >>> > > > HepPlanner is used for searching for an optimized plan based on
> >>> > imported
> >>> > > > rules.
> >>> > > >
> >>> > > > Is there any way to use in my case the VolcanoPlanner? The only
> >>> > examples
> >>> > > I
> >>> > > > have seen so far from the test classes suggest that it should be
> >>> used
> >>> > for
> >>> > > > converting relational expressions to physical ones. How can I
> make
> >>> the
> >>> > > > Volcano Planner "see" my SchemaPlus schema ,when I can only
> define
> >>> > > > RelOptSchema? Can someone provide me with a complete example of
> >>> using
> >>> > > > Volcano Planner and adding rules, such
> >>> > > > as FilterProjectTransposeRule.INSTANCE?
> >>> > > >
> >>> > > > Thanks in advance,
> >>> > > > George
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>