You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Sharma, Ishan" <is...@brown.edu> on 2021/03/12 20:53:32 UTC

Issue with generating a basic physical plan

Hi Folks,
We are working on a dataflow system
<http://cs.brown.edu/people/malte/research/pbc/> and are using calcite to
parse the query and generate a plan. The generated plan is then later used
to build a dataflow graph in C++. We are facing issues when it comes to
generating a basic physical plan via the rules that calcite provides.

Our code is pretty much based on Milinda's tutorial
<https://github.com/milinda/calcite-tutorial>. I have it working in an
independent java project with the version of calcite that the tutorial uses
(1.6.0), but am unable to get it to work with 1.26.0.

*Following is the relevant subset of code:*
public class QueryPlanner {
  private final DataFlowGraphLibrary.DataFlowGraphGenerator generator;
  private final SchemaPlus schema;
  private final Planner planner;
  private final RuleSet rules;
  final ArrayList<RelTraitDef> traitDefs;

  // Construct a QueryPlanner given a C++ DataFlowGraphLibrary interface
exposing
  // Pelton state.
  private QueryPlanner(DataFlowGraphLibrary.DataFlowGraphGenerator
generator) {
    // Store the c++ interface.
    this.generator = generator;
    // Rules used for optimizing the logical plan
    this.rules = RuleSets.ofList(
    // CoreRules.FILTER_INTO_JOIN,
    Bindables.RULES);
    // Initialize traitdefs
    this.traitDefs = new ArrayList<RelTraitDef>();
    this.traitDefs.add(ConventionTraitDef.INSTANCE);
    this.traitDefs.add(RelCollationTraitDef.INSTANCE);

    // Create a calcite-understandable representation of the schema of the
logical
    // un-sharded database, as exposed from c++ via generator.
    PeltonSchemaFactory schemaFactory = new PeltonSchemaFactory(generator);
    this.schema = schemaFactory.createSchema();

    FrameworkConfig calciteFrameworkConfig =
        Frameworks.newConfigBuilder()
            // MySQL dialect.

.parserConfig(SqlParser.configBuilder().setLex(Lex.MYSQL).build())
            // Sets the schema to use by the planner
            .defaultSchema(this.schema)
            .traitDefs(this.traitDefs)
            .context(Contexts.EMPTY_CONTEXT)
            .ruleSets(this.rules)
            .costFactory(null)
            .build();

    this.planner = Frameworks.getPlanner(calciteFrameworkConfig);
  }

  // Parse query and plan it into a physical plan.
  private RelNode getPhysicalPlan(String query) throws ValidationException,
RelConversionException {
    SqlNode sqlNode;

    try {
      sqlNode = this.planner.parse(query);
    } catch (SqlParseException e) {
      throw new RuntimeException("Query parsing error.", e);
    }

    SqlNode validatedSqlNode = this.planner.validate(sqlNode);
    RelNode logicalPlan = this.planner.rel(validatedSqlNode).project();
    System.out.println(RelOptUtil.toString(logicalPlan));
    RelNode physicalPlan = this.planner.transform(0,
logicalPlan.getTraitSet().plus(BindableConvention.INSTANCE), logicalPlan);
    // System.out.println(RelOptUtil.toString(physicalPlan));
    return logicalPlan;
  }


*Following is the corresponding error*, where the query being supplied is
SELECT * FROM test_table WHERE Col2 = 'hello!':

Exception in thread "main"
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not
enough rules to produce a node with desired properties:
convention=BINDABLE, sort=[]. All the inputs have relevant nodes, however,
the cost is still infinite.
Root: rel#12:RelSubset#2.BINDABLE.[]
Original rel:
LogicalProject(subset=[rel#12:RelSubset#2.BINDABLE.[]], Col1=[$0],
Col2=[$1], Col3=[$2]): rowcount = 15.0, cumulative cost = {15.0 rows, 45.0
cpu, 0.0 io}, id = 10
  LogicalFilter(subset=[rel#9:RelSubset#1.NONE.[]], condition=[=($1,
'hello!')]): rowcount = 15.0, cumulative cost = {15.0 rows, 100.0 cpu, 0.0
io}, id = 8
    LogicalTableScan(subset=[rel#7:RelSubset#0.NONE.[]],
table=[[test_table]]): rowcount = 100.0, cumulative cost = {100.0 rows,
101.0 cpu, 0.0 io}, id = 4

Sets:
Set#0, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
        rel#7:RelSubset#0.NONE.[], best=null
                rel#4:LogicalTableScan.NONE.[](table=[test_table]),
rowcount=100.0, cumulative cost={inf}
        rel#16:RelSubset#0.BINDABLE.[], best=null

rel#15:InterpretableConverter.BINDABLE.[](input=RelSubset#7),
rowcount=100.0, cumulative cost={inf}
Set#1, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
        rel#9:RelSubset#1.NONE.[], best=null

rel#8:LogicalFilter.NONE.[](input=RelSubset#7,condition==($1, 'hello!')),
rowcount=15.0, cumulative cost={inf}
        rel#19:RelSubset#1.BINDABLE.[], best=null

rel#18:InterpretableConverter.BINDABLE.[](input=RelSubset#9),
rowcount=15.0, cumulative cost={inf}

rel#20:BindableFilter.BINDABLE.[](input=RelSubset#16,condition==($1,
'hello!')), rowcount=15.0, cumulative cost={inf}
Set#2, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
        rel#11:RelSubset#2.NONE.[], best=null

rel#10:LogicalProject.NONE.[](input=RelSubset#9,inputs=0..2),
rowcount=15.0, cumulative cost={inf}
        rel#12:RelSubset#2.BINDABLE.[], best=null

rel#13:AbstractConverter.BINDABLE.[](input=RelSubset#11,convention=BINDABLE,sort=[]),
rowcount=15.0, cumulative cost={inf}

rel#22:InterpretableConverter.BINDABLE.[](input=RelSubset#11),
rowcount=15.0, cumulative cost={inf}

rel#23:BindableProject.BINDABLE.[](input=RelSubset#19,inputs=0..2),
rowcount=15.0, cumulative cost={inf}

*Graphviz:*
digraph G {
        root [style=filled,label="Root"];
        subgraph cluster0{
                label="Set 0 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
Col3)";
                rel4 [label="re

I
l#4:LogicalTableScan\ntable=[test_table]\nrows=100.0, cost={inf}",shape=box]
                rel15
[label="rel#15:InterpretableConverter\ninput=RelSubset#7\nrows=100.0,
cost={inf}",shape=box]
                subset7 [label="rel#7:RelSubset#0.NONE.[]"]
                subset16 [label="rel#16:RelSubset#0.BINDABLE.[]"]
        }
        subgraph cluster1{
                label="Set 1 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
Col3)";
                rel8
[label="rel#8:LogicalFilter\ninput=RelSubset#7,condition==($1,
'hello!')\nrows=15.0, cost={inf}",shape=box]
                rel18
[label="rel#18:InterpretableConverter\ninput=RelSubset#9\nrows=15.0,
cost={inf}",shape=box]
                rel20
[label="rel#20:BindableFilter\ninput=RelSubset#16,condition==($1,
'hello!')\nrows=15.0, cost={inf}",shape=box]
                subset9 [label="rel#9:RelSubset#1.NONE.[]"]
                subset19 [label="rel#19:RelSubset#1.BINDABLE.[]"]
        }
        subgraph cluster2{
                label="Set 2 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
Col3)";
                rel10
[label="rel#10:LogicalProject\ninput=RelSubset#9,inputs=0..2\nrows=15.0,
cost={inf}",shape=box]
                rel13
[label="rel#13:AbstractConverter\ninput=RelSubset#11,convention=BINDABLE,sort=[]\nrows=15.0,
cost={inf}",shape=box]
                rel22
[label="rel#22:InterpretableConverter\ninput=RelSubset#11\nrows=15.0,
cost={inf}",shape=box]
                rel23
[label="rel#23:BindableProject\ninput=RelSubset#19,inputs=0..2\nrows=15.0,
cost={inf}",shape=box]
                subset11 [label="rel#11:RelSubset#2.NONE.[]"]
                subset12 [label="rel#12:RelSubset#2.BINDABLE.[]"]
        }
        root -> subset12;
        subset7 -> rel4;
        subset16 -> rel15; rel15 -> subset7;
        subset9 -> rel8; rel8 -> subset7;
        subset19 -> rel18; rel18 -> subset9;
        subset19 -> rel20; rel20 -> subset16;
        subset11 -> rel10; rel10 -> subset9;
        subset12 -> rel13; rel13 -> subset11;
        subset12 -> rel22; rel22 -> subset11;
        subset12 -> rel23; rel23 -> subset19;
}
        at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:742)
        at
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:365)
        at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:520)
        at
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:312)
        at
org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:356)
        at
com.brownsys.pelton.QueryPlanner.getPhysicalPlan(QueryPlanner.java:87)
        at com.brownsys.pelton.QueryPlanner.plan(QueryPlanner.java:118)

Based on the discussion here
<https://www.mail-archive.com/dev@calcite.apache.org/msg12131.html> and the
corresponding graphviz the issue seems to be related to the bindable
convention and/or the use of it. I am super new to working with calcite and
would appreciate your advice.

Thanks,
Ishan

Re: Issue with generating a basic physical plan

Posted by Chunwei Lei <ch...@gmail.com>.
Hi, Ishan.

From what I can see, that is because the table scan is not converted to
BindableTableScan.


Best,
Chunwei


On Sat, Mar 13, 2021 at 2:41 PM Sharma, Ishan <is...@brown.edu>
wrote:

> Hi Folks,
> We are working on a dataflow system
> <http://cs.brown.edu/people/malte/research/pbc/> and are using calcite to
> parse the query and generate a plan. The generated plan is then later used
> to build a dataflow graph in C++. We are facing issues when it comes to
> generating a basic physical plan via the rules that calcite provides.
>
> Our code is pretty much based on Milinda's tutorial
> <https://github.com/milinda/calcite-tutorial>. I have it working in an
> independent java project with the version of calcite that the tutorial uses
> (1.6.0), but am unable to get it to work with 1.26.0.
>
> *Following is the relevant subset of code:*
> public class QueryPlanner {
>   private final DataFlowGraphLibrary.DataFlowGraphGenerator generator;
>   private final SchemaPlus schema;
>   private final Planner planner;
>   private final RuleSet rules;
>   final ArrayList<RelTraitDef> traitDefs;
>
>   // Construct a QueryPlanner given a C++ DataFlowGraphLibrary interface
> exposing
>   // Pelton state.
>   private QueryPlanner(DataFlowGraphLibrary.DataFlowGraphGenerator
> generator) {
>     // Store the c++ interface.
>     this.generator = generator;
>     // Rules used for optimizing the logical plan
>     this.rules = RuleSets.ofList(
>     // CoreRules.FILTER_INTO_JOIN,
>     Bindables.RULES);
>     // Initialize traitdefs
>     this.traitDefs = new ArrayList<RelTraitDef>();
>     this.traitDefs.add(ConventionTraitDef.INSTANCE);
>     this.traitDefs.add(RelCollationTraitDef.INSTANCE);
>
>     // Create a calcite-understandable representation of the schema of the
> logical
>     // un-sharded database, as exposed from c++ via generator.
>     PeltonSchemaFactory schemaFactory = new PeltonSchemaFactory(generator);
>     this.schema = schemaFactory.createSchema();
>
>     FrameworkConfig calciteFrameworkConfig =
>         Frameworks.newConfigBuilder()
>             // MySQL dialect.
>
> .parserConfig(SqlParser.configBuilder().setLex(Lex.MYSQL).build())
>             // Sets the schema to use by the planner
>             .defaultSchema(this.schema)
>             .traitDefs(this.traitDefs)
>             .context(Contexts.EMPTY_CONTEXT)
>             .ruleSets(this.rules)
>             .costFactory(null)
>             .build();
>
>     this.planner = Frameworks.getPlanner(calciteFrameworkConfig);
>   }
>
>   // Parse query and plan it into a physical plan.
>   private RelNode getPhysicalPlan(String query) throws ValidationException,
> RelConversionException {
>     SqlNode sqlNode;
>
>     try {
>       sqlNode = this.planner.parse(query);
>     } catch (SqlParseException e) {
>       throw new RuntimeException("Query parsing error.", e);
>     }
>
>     SqlNode validatedSqlNode = this.planner.validate(sqlNode);
>     RelNode logicalPlan = this.planner.rel(validatedSqlNode).project();
>     System.out.println(RelOptUtil.toString(logicalPlan));
>     RelNode physicalPlan = this.planner.transform(0,
> logicalPlan.getTraitSet().plus(BindableConvention.INSTANCE), logicalPlan);
>     // System.out.println(RelOptUtil.toString(physicalPlan));
>     return logicalPlan;
>   }
>
>
> *Following is the corresponding error*, where the query being supplied is
> SELECT * FROM test_table WHERE Col2 = 'hello!':
>
> Exception in thread "main"
> org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not
> enough rules to produce a node with desired properties:
> convention=BINDABLE, sort=[]. All the inputs have relevant nodes, however,
> the cost is still infinite.
> Root: rel#12:RelSubset#2.BINDABLE.[]
> Original rel:
> LogicalProject(subset=[rel#12:RelSubset#2.BINDABLE.[]], Col1=[$0],
> Col2=[$1], Col3=[$2]): rowcount = 15.0, cumulative cost = {15.0 rows, 45.0
> cpu, 0.0 io}, id = 10
>   LogicalFilter(subset=[rel#9:RelSubset#1.NONE.[]], condition=[=($1,
> 'hello!')]): rowcount = 15.0, cumulative cost = {15.0 rows, 100.0 cpu, 0.0
> io}, id = 8
>     LogicalTableScan(subset=[rel#7:RelSubset#0.NONE.[]],
> table=[[test_table]]): rowcount = 100.0, cumulative cost = {100.0 rows,
> 101.0 cpu, 0.0 io}, id = 4
>
> Sets:
> Set#0, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
>         rel#7:RelSubset#0.NONE.[], best=null
>                 rel#4:LogicalTableScan.NONE.[](table=[test_table]),
> rowcount=100.0, cumulative cost={inf}
>         rel#16:RelSubset#0.BINDABLE.[], best=null
>
> rel#15:InterpretableConverter.BINDABLE.[](input=RelSubset#7),
> rowcount=100.0, cumulative cost={inf}
> Set#1, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
>         rel#9:RelSubset#1.NONE.[], best=null
>
> rel#8:LogicalFilter.NONE.[](input=RelSubset#7,condition==($1, 'hello!')),
> rowcount=15.0, cumulative cost={inf}
>         rel#19:RelSubset#1.BINDABLE.[], best=null
>
> rel#18:InterpretableConverter.BINDABLE.[](input=RelSubset#9),
> rowcount=15.0, cumulative cost={inf}
>
> rel#20:BindableFilter.BINDABLE.[](input=RelSubset#16,condition==($1,
> 'hello!')), rowcount=15.0, cumulative cost={inf}
> Set#2, type: RecordType(INTEGER Col1, VARCHAR Col2, INTEGER Col3)
>         rel#11:RelSubset#2.NONE.[], best=null
>
> rel#10:LogicalProject.NONE.[](input=RelSubset#9,inputs=0..2),
> rowcount=15.0, cumulative cost={inf}
>         rel#12:RelSubset#2.BINDABLE.[], best=null
>
>
> rel#13:AbstractConverter.BINDABLE.[](input=RelSubset#11,convention=BINDABLE,sort=[]),
> rowcount=15.0, cumulative cost={inf}
>
> rel#22:InterpretableConverter.BINDABLE.[](input=RelSubset#11),
> rowcount=15.0, cumulative cost={inf}
>
> rel#23:BindableProject.BINDABLE.[](input=RelSubset#19,inputs=0..2),
> rowcount=15.0, cumulative cost={inf}
>
> *Graphviz:*
> digraph G {
>         root [style=filled,label="Root"];
>         subgraph cluster0{
>                 label="Set 0 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
> Col3)";
>                 rel4 [label="re
>
> I
> l#4:LogicalTableScan\ntable=[test_table]\nrows=100.0,
> cost={inf}",shape=box]
>                 rel15
> [label="rel#15:InterpretableConverter\ninput=RelSubset#7\nrows=100.0,
> cost={inf}",shape=box]
>                 subset7 [label="rel#7:RelSubset#0.NONE.[]"]
>                 subset16 [label="rel#16:RelSubset#0.BINDABLE.[]"]
>         }
>         subgraph cluster1{
>                 label="Set 1 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
> Col3)";
>                 rel8
> [label="rel#8:LogicalFilter\ninput=RelSubset#7,condition==($1,
> 'hello!')\nrows=15.0, cost={inf}",shape=box]
>                 rel18
> [label="rel#18:InterpretableConverter\ninput=RelSubset#9\nrows=15.0,
> cost={inf}",shape=box]
>                 rel20
> [label="rel#20:BindableFilter\ninput=RelSubset#16,condition==($1,
> 'hello!')\nrows=15.0, cost={inf}",shape=box]
>                 subset9 [label="rel#9:RelSubset#1.NONE.[]"]
>                 subset19 [label="rel#19:RelSubset#1.BINDABLE.[]"]
>         }
>         subgraph cluster2{
>                 label="Set 2 RecordType(INTEGER Col1, VARCHAR Col2, INTEGER
> Col3)";
>                 rel10
> [label="rel#10:LogicalProject\ninput=RelSubset#9,inputs=0..2\nrows=15.0,
> cost={inf}",shape=box]
>                 rel13
>
> [label="rel#13:AbstractConverter\ninput=RelSubset#11,convention=BINDABLE,sort=[]\nrows=15.0,
> cost={inf}",shape=box]
>                 rel22
> [label="rel#22:InterpretableConverter\ninput=RelSubset#11\nrows=15.0,
> cost={inf}",shape=box]
>                 rel23
> [label="rel#23:BindableProject\ninput=RelSubset#19,inputs=0..2\nrows=15.0,
> cost={inf}",shape=box]
>                 subset11 [label="rel#11:RelSubset#2.NONE.[]"]
>                 subset12 [label="rel#12:RelSubset#2.BINDABLE.[]"]
>         }
>         root -> subset12;
>         subset7 -> rel4;
>         subset16 -> rel15; rel15 -> subset7;
>         subset9 -> rel8; rel8 -> subset7;
>         subset19 -> rel18; rel18 -> subset9;
>         subset19 -> rel20; rel20 -> subset16;
>         subset11 -> rel10; rel10 -> subset9;
>         subset12 -> rel13; rel13 -> subset11;
>         subset12 -> rel22; rel22 -> subset11;
>         subset12 -> rel23; rel23 -> subset19;
> }
>         at
>
> org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:742)
>         at
>
> org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:365)
>         at
>
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:520)
>         at
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:312)
>         at
> org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:356)
>         at
> com.brownsys.pelton.QueryPlanner.getPhysicalPlan(QueryPlanner.java:87)
>         at com.brownsys.pelton.QueryPlanner.plan(QueryPlanner.java:118)
>
> Based on the discussion here
> <https://www.mail-archive.com/dev@calcite.apache.org/msg12131.html> and
> the
> corresponding graphviz the issue seems to be related to the bindable
> convention and/or the use of it. I am super new to working with calcite and
> would appreciate your advice.
>
> Thanks,
> Ishan
>