You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Luis Fernando Kauer <lf...@yahoo.com.br.INVALID> on 2017/07/21 19:42:53 UTC

Rule to remove right part of a left join when no project is used after the join

Hi,

I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.

I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.


But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.

Any suggestion?

How can I register my own rules besides the build in rules when connecting through CalciteConnection?

Best regards,

Luis Fernando

Re: Rule to remove right part of a left join when no project is used after the join

Posted by Julian Hyde <jh...@apache.org>.
Referential constraint ("at least once") is not necessary because this
an outer join.

For uniqueness ("at most once") you would need to use RelMdUniqueKeys
or RelMdColumnUniqueness.

On Fri, Jul 28, 2017 at 1:03 PM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
>> I think you also need a check that the right-hand key is unique. For
>> example, it's not valid to rewrite
>>
>>   select d.name from dept left join emp using (deptno)
>>
>> to
>>
>> select d.name from dept
>>
>> because the second query gives fewer rows.
>
>
> Yes, you are right. I forgot about these cases where it can increase the number of rows.
> My main use case for the rule is to simplify a query that uses a view with many joins to dimension tables.
> So how can I check if the right-hand key is unique?
> Jdbc-adapter doesn't seem to load any information about keys.
> You mentioned RelReferencialConstraint.
> How can I use it?
>
> I'll try to implement your other comments too.
>
> Thank you.
>
> Luis Fernando
>
> Em Sexta-feira, 28 de Julho de 2017 14:27, Julian Hyde <jh...@apache.org> escreveu:
>
>
>
> That looks good.
>
> I think you also need a check that the right-hand key is unique. For
> example, it's not valid to rewrite
>
>   select d.name from dept left join emp using (deptno)
>
> to
>
> select d.name from dept
>
> because the second query gives fewer rows.
>
> I think there's some library code to ensure that a join only uses
> fields from a particular input.
>
> Often people choose to parameterize rule with classes. Your rule, for
> instance, could work on any sub-class of Project and Join, not just
> LogicalProject and LogicalJoin.
>
> I also strongly recommend that you add sufficient tests to
> RelOptRulesTest to prove that the rule handles the main cases.
> Including cases, like above, where it would be wrong to fire the rule.
>
> If you intend to contribute this at some point, please file a JIRA
> case. You can attach a PR later.
>
> Julian
>
>
>
> On Fri, Jul 28, 2017 at 7:53 AM, Luis Fernando Kauer
> <lf...@yahoo.com.br.invalid> wrote:
>> I'm still learning Calcite and how to convert and transform the nodes, but I managed to create a rule to do this that seems to be working, but I would like your comments about flaws and improvements.
>> It drops the right hand side of a join when the join type is "left" and the project above the join includes only field reference to the input from the left.
>>
>> public class LeftJoinRemoveRule extends RelOptRule {
>>   public static final LeftJoinRemoveRule INSTANCE =
>>     new LeftJoinRemoveRule();
>>
>>   //~ Constructors -----------------------------------------------------------
>>
>>   /** Creates a SemiJoinRemoveRule. */
>>   private LeftJoinRemoveRule() {
>>     super(operand(LogicalProject.class,
>>       operand(LogicalJoin.class, any())));
>>   }
>>
>>   //~ Methods ----------------------------------------------------------------
>>
>>   public void onMatch(RelOptRuleCall call) {
>>     LogicalProject project = call.rel(0);
>>     LogicalJoin join = call.rel(1);
>>     RelNode input = join.getInput(0);
>>     int fieldCount = input.getRowType().getFieldCount();
>>     if( join.getJoinType()!=JoinRelType.LEFT)
>>       return;
>>     for(RexNode rex: project.getProjects()) {
>>       if (! (rex instanceof RexInputRef)) {
>>         return;
>>       }
>>       RexInputRef ref = (RexInputRef) rex;
>>       if( ref.getIndex() >= fieldCount ) {
>>         return;
>>       }
>>     }
>>    call.transformTo(project.copy(project.getTraitSet(), ImmutableList.of(input)));
>>   }
>> }
>>
>> Best regards,
>>
>> Luis Fernando
>>
>> Em Sexta-feira, 21 de Julho de 2017 22:21, Julian Hyde <jh...@apache.org> escreveu:
>>
>>
>>
>> As you're probably aware, It's only safe to drop a table if (a) none
>> of its fields are used, (b) the join is to a unique key of that table,
>> (c) there is a referential constraint to that table.
>>
>> Without conditions (b) and (c) are necessary, the table could be
>> converting each row on the left to zero or more than one rows, and
>> therefore removing the join would affect the result.
>>
>> This kind of schema is common in data warehousing environments, so I
>> created the Lattice construct. Tables in a lattice implicitly have
>> many-to-one relationships. That makes it safe to transform a query on,
>> say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
>> Customer) JOIN Product), because adding/removing the Product table
>> will not affect the row count.
>>
>> With Lattices the referential constraints are implicit, but we didn't
>> have the explicit referential constraints that made it safe to do this
>> transform. Jesus added them (class RelReferentialConstraint) as part
>> of https://issues.apache.org/jira/browse/CALCITE-1731.
>> Maybe you can leverage them in a rule you write. It would match
>> (Project (Join A B)) and convert to (Project A).
>>
>> I don't recall how to add rules. Some people use Frameworks or Planner.
>>
>> Julian
>>
>>
>>
>>
>> On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
>> <lf...@yahoo.com.br.invalid> wrote:
>>> Hi,
>>>
>>> I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.
>>>
>>> I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.
>>>
>>>
>>> But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.
>>>
>>> Any suggestion?
>>>
>>> How can I register my own rules besides the build in rules when connecting through CalciteConnection?
>>>
>>> Best regards,
>>>
>>> Luis Fernando

Re: Rule to remove right part of a left join when no project is used after the join

Posted by Luis Fernando Kauer <lf...@yahoo.com.br.INVALID>.
> I think you also need a check that the right-hand key is unique. For
> example, it's not valid to rewrite
>
>   select d.name from dept left join emp using (deptno)
>
> to
>
> select d.name from dept
>
> because the second query gives fewer rows.


Yes, you are right. I forgot about these cases where it can increase the number of rows.
My main use case for the rule is to simplify a query that uses a view with many joins to dimension tables.
So how can I check if the right-hand key is unique?
Jdbc-adapter doesn't seem to load any information about keys.
You mentioned RelReferencialConstraint.
How can I use it?

I'll try to implement your other comments too.

Thank you.

Luis Fernando

Em Sexta-feira, 28 de Julho de 2017 14:27, Julian Hyde <jh...@apache.org> escreveu:



That looks good.

I think you also need a check that the right-hand key is unique. For
example, it's not valid to rewrite

  select d.name from dept left join emp using (deptno)

to

select d.name from dept

because the second query gives fewer rows.

I think there's some library code to ensure that a join only uses
fields from a particular input.

Often people choose to parameterize rule with classes. Your rule, for
instance, could work on any sub-class of Project and Join, not just
LogicalProject and LogicalJoin.

I also strongly recommend that you add sufficient tests to
RelOptRulesTest to prove that the rule handles the main cases.
Including cases, like above, where it would be wrong to fire the rule.

If you intend to contribute this at some point, please file a JIRA
case. You can attach a PR later.

Julian



On Fri, Jul 28, 2017 at 7:53 AM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
> I'm still learning Calcite and how to convert and transform the nodes, but I managed to create a rule to do this that seems to be working, but I would like your comments about flaws and improvements.
> It drops the right hand side of a join when the join type is "left" and the project above the join includes only field reference to the input from the left.
>
> public class LeftJoinRemoveRule extends RelOptRule {
>   public static final LeftJoinRemoveRule INSTANCE =
>     new LeftJoinRemoveRule();
>
>   //~ Constructors -----------------------------------------------------------
>
>   /** Creates a SemiJoinRemoveRule. */
>   private LeftJoinRemoveRule() {
>     super(operand(LogicalProject.class,
>       operand(LogicalJoin.class, any())));
>   }
>
>   //~ Methods ----------------------------------------------------------------
>
>   public void onMatch(RelOptRuleCall call) {
>     LogicalProject project = call.rel(0);
>     LogicalJoin join = call.rel(1);
>     RelNode input = join.getInput(0);
>     int fieldCount = input.getRowType().getFieldCount();
>     if( join.getJoinType()!=JoinRelType.LEFT)
>       return;
>     for(RexNode rex: project.getProjects()) {
>       if (! (rex instanceof RexInputRef)) {
>         return;
>       }
>       RexInputRef ref = (RexInputRef) rex;
>       if( ref.getIndex() >= fieldCount ) {
>         return;
>       }
>     }
>    call.transformTo(project.copy(project.getTraitSet(), ImmutableList.of(input)));
>   }
> }
>
> Best regards,
>
> Luis Fernando
>
> Em Sexta-feira, 21 de Julho de 2017 22:21, Julian Hyde <jh...@apache.org> escreveu:
>
>
>
> As you're probably aware, It's only safe to drop a table if (a) none
> of its fields are used, (b) the join is to a unique key of that table,
> (c) there is a referential constraint to that table.
>
> Without conditions (b) and (c) are necessary, the table could be
> converting each row on the left to zero or more than one rows, and
> therefore removing the join would affect the result.
>
> This kind of schema is common in data warehousing environments, so I
> created the Lattice construct. Tables in a lattice implicitly have
> many-to-one relationships. That makes it safe to transform a query on,
> say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
> Customer) JOIN Product), because adding/removing the Product table
> will not affect the row count.
>
> With Lattices the referential constraints are implicit, but we didn't
> have the explicit referential constraints that made it safe to do this
> transform. Jesus added them (class RelReferentialConstraint) as part
> of https://issues.apache.org/jira/browse/CALCITE-1731.
> Maybe you can leverage them in a rule you write. It would match
> (Project (Join A B)) and convert to (Project A).
>
> I don't recall how to add rules. Some people use Frameworks or Planner.
>
> Julian
>
>
>
>
> On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
> <lf...@yahoo.com.br.invalid> wrote:
>> Hi,
>>
>> I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.
>>
>> I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.
>>
>>
>> But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.
>>
>> Any suggestion?
>>
>> How can I register my own rules besides the build in rules when connecting through CalciteConnection?
>>
>> Best regards,
>>
>> Luis Fernando

Re: Rule to remove right part of a left join when no project is used after the join

Posted by Julian Hyde <jh...@apache.org>.
That looks good.

I think you also need a check that the right-hand key is unique. For
example, it's not valid to rewrite

  select d.name from dept left join emp using (deptno)

to

 select d.name from dept

because the second query gives fewer rows.

I think there's some library code to ensure that a join only uses
fields from a particular input.

Often people choose to parameterize rule with classes. Your rule, for
instance, could work on any sub-class of Project and Join, not just
LogicalProject and LogicalJoin.

I also strongly recommend that you add sufficient tests to
RelOptRulesTest to prove that the rule handles the main cases.
Including cases, like above, where it would be wrong to fire the rule.

If you intend to contribute this at some point, please file a JIRA
case. You can attach a PR later.

Julian


On Fri, Jul 28, 2017 at 7:53 AM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
> I'm still learning Calcite and how to convert and transform the nodes, but I managed to create a rule to do this that seems to be working, but I would like your comments about flaws and improvements.
> It drops the right hand side of a join when the join type is "left" and the project above the join includes only field reference to the input from the left.
>
> public class LeftJoinRemoveRule extends RelOptRule {
>   public static final LeftJoinRemoveRule INSTANCE =
>     new LeftJoinRemoveRule();
>
>   //~ Constructors -----------------------------------------------------------
>
>   /** Creates a SemiJoinRemoveRule. */
>   private LeftJoinRemoveRule() {
>     super(operand(LogicalProject.class,
>       operand(LogicalJoin.class, any())));
>   }
>
>   //~ Methods ----------------------------------------------------------------
>
>   public void onMatch(RelOptRuleCall call) {
>     LogicalProject project = call.rel(0);
>     LogicalJoin join = call.rel(1);
>     RelNode input = join.getInput(0);
>     int fieldCount = input.getRowType().getFieldCount();
>     if( join.getJoinType()!=JoinRelType.LEFT)
>       return;
>     for(RexNode rex: project.getProjects()) {
>       if (! (rex instanceof RexInputRef)) {
>         return;
>       }
>       RexInputRef ref = (RexInputRef) rex;
>       if( ref.getIndex() >= fieldCount ) {
>         return;
>       }
>     }
>    call.transformTo(project.copy(project.getTraitSet(), ImmutableList.of(input)));
>   }
> }
>
> Best regards,
>
> Luis Fernando
>
> Em Sexta-feira, 21 de Julho de 2017 22:21, Julian Hyde <jh...@apache.org> escreveu:
>
>
>
> As you're probably aware, It's only safe to drop a table if (a) none
> of its fields are used, (b) the join is to a unique key of that table,
> (c) there is a referential constraint to that table.
>
> Without conditions (b) and (c) are necessary, the table could be
> converting each row on the left to zero or more than one rows, and
> therefore removing the join would affect the result.
>
> This kind of schema is common in data warehousing environments, so I
> created the Lattice construct. Tables in a lattice implicitly have
> many-to-one relationships. That makes it safe to transform a query on,
> say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
> Customer) JOIN Product), because adding/removing the Product table
> will not affect the row count.
>
> With Lattices the referential constraints are implicit, but we didn't
> have the explicit referential constraints that made it safe to do this
> transform. Jesus added them (class RelReferentialConstraint) as part
> of https://issues.apache.org/jira/browse/CALCITE-1731.
> Maybe you can leverage them in a rule you write. It would match
> (Project (Join A B)) and convert to (Project A).
>
> I don't recall how to add rules. Some people use Frameworks or Planner.
>
> Julian
>
>
>
>
> On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
> <lf...@yahoo.com.br.invalid> wrote:
>> Hi,
>>
>> I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.
>>
>> I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.
>>
>>
>> But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.
>>
>> Any suggestion?
>>
>> How can I register my own rules besides the build in rules when connecting through CalciteConnection?
>>
>> Best regards,
>>
>> Luis Fernando

Re: Rule to remove right part of a left join when no project is used after the join

Posted by Luis Fernando Kauer <lf...@yahoo.com.br.INVALID>.
I'm still learning Calcite and how to convert and transform the nodes, but I managed to create a rule to do this that seems to be working, but I would like your comments about flaws and improvements.
It drops the right hand side of a join when the join type is "left" and the project above the join includes only field reference to the input from the left.

public class LeftJoinRemoveRule extends RelOptRule { 
  public static final LeftJoinRemoveRule INSTANCE = 
    new LeftJoinRemoveRule(); 

  //~ Constructors ----------------------------------------------------------- 

  /** Creates a SemiJoinRemoveRule. */ 
  private LeftJoinRemoveRule() { 
    super(operand(LogicalProject.class, 
      operand(LogicalJoin.class, any()))); 
  } 

  //~ Methods ---------------------------------------------------------------- 

  public void onMatch(RelOptRuleCall call) { 
    LogicalProject project = call.rel(0); 
    LogicalJoin join = call.rel(1); 
    RelNode input = join.getInput(0); 
    int fieldCount = input.getRowType().getFieldCount(); 
    if( join.getJoinType()!=JoinRelType.LEFT) 
      return; 
    for(RexNode rex: project.getProjects()) { 
      if (! (rex instanceof RexInputRef)) { 
        return; 
      } 
      RexInputRef ref = (RexInputRef) rex; 
      if( ref.getIndex() >= fieldCount ) { 
        return; 
      } 
    } 
   call.transformTo(project.copy(project.getTraitSet(), ImmutableList.of(input))); 
  } 
}

Best regards,

Luis Fernando

Em Sexta-feira, 21 de Julho de 2017 22:21, Julian Hyde <jh...@apache.org> escreveu:



As you're probably aware, It's only safe to drop a table if (a) none
of its fields are used, (b) the join is to a unique key of that table,
(c) there is a referential constraint to that table.

Without conditions (b) and (c) are necessary, the table could be
converting each row on the left to zero or more than one rows, and
therefore removing the join would affect the result.

This kind of schema is common in data warehousing environments, so I
created the Lattice construct. Tables in a lattice implicitly have
many-to-one relationships. That makes it safe to transform a query on,
say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
Customer) JOIN Product), because adding/removing the Product table
will not affect the row count.

With Lattices the referential constraints are implicit, but we didn't
have the explicit referential constraints that made it safe to do this
transform. Jesus added them (class RelReferentialConstraint) as part
of https://issues.apache.org/jira/browse/CALCITE-1731.
Maybe you can leverage them in a rule you write. It would match
(Project (Join A B)) and convert to (Project A).

I don't recall how to add rules. Some people use Frameworks or Planner.

Julian




On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
> Hi,
>
> I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.
>
> I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.
>
>
> But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.
>
> Any suggestion?
>
> How can I register my own rules besides the build in rules when connecting through CalciteConnection?
>
> Best regards,
>
> Luis Fernando

Re: Rule to remove right part of a left join when no project is used after the join

Posted by Julian Hyde <jh...@apache.org>.
As you're probably aware, It's only safe to drop a table if (a) none
of its fields are used, (b) the join is to a unique key of that table,
(c) there is a referential constraint to that table.

Without conditions (b) and (c) are necessary, the table could be
converting each row on the left to zero or more than one rows, and
therefore removing the join would affect the result.

This kind of schema is common in data warehousing environments, so I
created the Lattice construct. Tables in a lattice implicitly have
many-to-one relationships. That makes it safe to transform a query on,
say (Sales JOIN Customer) to use a materialized view on (Sales JOIN
Customer) JOIN Product), because adding/removing the Product table
will not affect the row count.

With Lattices the referential constraints are implicit, but we didn't
have the explicit referential constraints that made it safe to do this
transform. Jesus added them (class RelReferentialConstraint) as part
of https://issues.apache.org/jira/browse/CALCITE-1731.
Maybe you can leverage them in a rule you write. It would match
(Project (Join A B)) and convert to (Project A).

I don't recall how to add rules. Some people use Frameworks or Planner.

Julian



On Fri, Jul 21, 2017 at 12:42 PM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
> Hi,
>
> I have a schema with a lot of normalized tables, which makes it hard for users to build the queries.
>
> I would like to create views with all the joins already included so that users could just select the fields without worrying with multiple tables and join conditions.
>
>
> But the optimizer should drop all the tables that are not actually used (when possible), otherwise the performance is not good.
>
> Any suggestion?
>
> How can I register my own rules besides the build in rules when connecting through CalciteConnection?
>
> Best regards,
>
> Luis Fernando