You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Ian Bertolacci (Jira)" <ji...@apache.org> on 2021/10/14 22:18:00 UTC

[jira] [Created] (CALCITE-4852) RelToSqlConverter creates unparseable SQL string from right associative multi-way join.

Ian Bertolacci created CALCITE-4852:
---------------------------------------

             Summary: RelToSqlConverter creates unparseable SQL string from right associative multi-way join.
                 Key: CALCITE-4852
                 URL: https://issues.apache.org/jira/browse/CALCITE-4852
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Ian Bertolacci


As discussed inĀ CALCITE-35, Calcite cannot parse parenthesized join expressions (such as `select ... from A join (B join C)`).
But as suggested in CALCITE-2152, those expressions can be converted into parenthesized select on the join expression (`select ... from A join (select ... from B join C)`).

However, RelToSqlConverter will convert the RelNode representation of right-associative joins into a parenthesized join expression, which is unparseable.

For example, this RelNode tree
{code}
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
  LogicalJoin(condition=[=($0, $4)], joinType=[inner])
    LogicalProject(_T3_ID=[$0])
      LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
      // RHS child of right-associate join
      LogicalJoin(condition=[=($0, $2)], joinType=[inner])
        LogicalProject(_T1_ID=[$0])
          LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
        LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
          LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
{code}
creates the following unparseable SQL string
{code}
SELECT `t1`.`_t5_id`,
       `t1`.`c0_51`,
       `t0`.`_t1_id`,
       `t1`.`c0_53`,
       `t`.`_t3_id`
FROM   (SELECT `id` AS `_T3_ID`
        FROM   `query`.`t3`) AS `t`
       INNER JOIN ((SELECT `id` AS `_T1_ID`
                   FROM   `query`.`t1`) AS `t0`
                   INNER JOIN (SELECT `id` AS `_T5_ID`,
                                      `c0_51`,
                                      `c0_53`
                               FROM   `query`.`t5`) AS `t1`
                           ON `t0`.`_t1_id` = `t1`.`c0_51`)
               ON `t`.`_t3_id` = `t1`.`c0_53`
{code}

This is an issue, because it is very easy to make such trees, and ideally, all SQL strings generated by Calcite would also be parseable by Calcite.

To get around this, we found that the insertion of a projection node between a join and it's RHS child (the problematic join) forces RelToSqlConverter to create a parenthesized select statement, which *is* parseable, and (at least for us) semantically identical.

For example, this virtually identical tree:
{code}
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
  LogicalJoin(condition=[=($0, $4)], joinType=[inner])
    LogicalProject(_T3_ID=[$0])
      LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
    // Inserted projection on top of RHS child
    LogicalProject(_T1_ID=[$0], _T5_ID=[$1], C0_51=[$2], C0_53=[$3])
      // RHS child of right-associate join
      LogicalJoin(condition=[=($0, $2)], joinType=[inner])
        LogicalProject(_T1_ID=[$0])
          LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
        LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
          LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
{code}
creates the following parseable SQL string
{code}
SELECT `t2`.`_t5_id`,
       `t2`.`c0_51`,
       `t2`.`_t1_id`,
       `t2`.`c0_53`,
       `t`.`_t3_id`
FROM   (SELECT `id` AS `_T3_ID`
        FROM   `query`.`t3`) AS `t`
       INNER JOIN (SELECT *
                   FROM   (SELECT `id` AS `_T1_ID`
                           FROM   `query`.`t1`) AS `t0`
                          INNER JOIN (SELECT `id` AS `_T5_ID`,
                                             `c0_51`,
                                             `c0_53`
                                      FROM   `query`.`t5`) AS `t1`
                                  ON `t0`.`_t1_id` = `t1`.`c0_51`) AS `t2`
               ON `t`.`_t3_id` = `t2`.`c0_53`
{code}

We solved this our RelToSqlConverter extending class by cloning the parent join and inserting a projection between it and an RHS join:
{code}
override def visit(join: Join): Result = {
    if (join.getRight.isInstanceOf[Join]) {
      super.visit(
        join.copy(
          join.getTraitSet,
          join.getCondition,
          join.getLeft,
          new LogicalProject(
            join.getCluster,
            join.getTraitSet,
            join.getHints,
            join.getRight,
            // Construct projection of all expressions from the rhs join
            join.getRight.getRowType.getFieldList.asScala.map(field => new RexInputRef(field.getIndex, field.getType)).asJava,
            join.getRight.getRowType
          ),
          join.getJoinType,
          join.isSemiJoinDone
        )
      )
    } else {
      super.visit(join)
    }
  }
{code}
(we tried to do it "correctly" by constructing a proper Result object, but found it too difficult. Hopefully someone else can figure out how do it the right way).




--
This message was sent by Atlassian Jira
(v8.3.4#803005)