You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Ian Bertolacci (Jira)" <ji...@apache.org> on 2021/10/14 22:18:00 UTC
[jira] [Created] (CALCITE-4852) RelToSqlConverter creates
unparseable SQL string from right associative multi-way join.
Ian Bertolacci created CALCITE-4852:
---------------------------------------
Summary: RelToSqlConverter creates unparseable SQL string from right associative multi-way join.
Key: CALCITE-4852
URL: https://issues.apache.org/jira/browse/CALCITE-4852
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Ian Bertolacci
As discussed inĀ CALCITE-35, Calcite cannot parse parenthesized join expressions (such as `select ... from A join (B join C)`).
But as suggested in CALCITE-2152, those expressions can be converted into parenthesized select on the join expression (`select ... from A join (select ... from B join C)`).
However, RelToSqlConverter will convert the RelNode representation of right-associative joins into a parenthesized join expression, which is unparseable.
For example, this RelNode tree
{code}
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
LogicalJoin(condition=[=($0, $4)], joinType=[inner])
LogicalProject(_T3_ID=[$0])
LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
// RHS child of right-associate join
LogicalJoin(condition=[=($0, $2)], joinType=[inner])
LogicalProject(_T1_ID=[$0])
LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
{code}
creates the following unparseable SQL string
{code}
SELECT `t1`.`_t5_id`,
`t1`.`c0_51`,
`t0`.`_t1_id`,
`t1`.`c0_53`,
`t`.`_t3_id`
FROM (SELECT `id` AS `_T3_ID`
FROM `query`.`t3`) AS `t`
INNER JOIN ((SELECT `id` AS `_T1_ID`
FROM `query`.`t1`) AS `t0`
INNER JOIN (SELECT `id` AS `_T5_ID`,
`c0_51`,
`c0_53`
FROM `query`.`t5`) AS `t1`
ON `t0`.`_t1_id` = `t1`.`c0_51`)
ON `t`.`_t3_id` = `t1`.`c0_53`
{code}
This is an issue, because it is very easy to make such trees, and ideally, all SQL strings generated by Calcite would also be parseable by Calcite.
To get around this, we found that the insertion of a projection node between a join and it's RHS child (the problematic join) forces RelToSqlConverter to create a parenthesized select statement, which *is* parseable, and (at least for us) semantically identical.
For example, this virtually identical tree:
{code}
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
LogicalJoin(condition=[=($0, $4)], joinType=[inner])
LogicalProject(_T3_ID=[$0])
LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
// Inserted projection on top of RHS child
LogicalProject(_T1_ID=[$0], _T5_ID=[$1], C0_51=[$2], C0_53=[$3])
// RHS child of right-associate join
LogicalJoin(condition=[=($0, $2)], joinType=[inner])
LogicalProject(_T1_ID=[$0])
LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
{code}
creates the following parseable SQL string
{code}
SELECT `t2`.`_t5_id`,
`t2`.`c0_51`,
`t2`.`_t1_id`,
`t2`.`c0_53`,
`t`.`_t3_id`
FROM (SELECT `id` AS `_T3_ID`
FROM `query`.`t3`) AS `t`
INNER JOIN (SELECT *
FROM (SELECT `id` AS `_T1_ID`
FROM `query`.`t1`) AS `t0`
INNER JOIN (SELECT `id` AS `_T5_ID`,
`c0_51`,
`c0_53`
FROM `query`.`t5`) AS `t1`
ON `t0`.`_t1_id` = `t1`.`c0_51`) AS `t2`
ON `t`.`_t3_id` = `t2`.`c0_53`
{code}
We solved this our RelToSqlConverter extending class by cloning the parent join and inserting a projection between it and an RHS join:
{code}
override def visit(join: Join): Result = {
if (join.getRight.isInstanceOf[Join]) {
super.visit(
join.copy(
join.getTraitSet,
join.getCondition,
join.getLeft,
new LogicalProject(
join.getCluster,
join.getTraitSet,
join.getHints,
join.getRight,
// Construct projection of all expressions from the rhs join
join.getRight.getRowType.getFieldList.asScala.map(field => new RexInputRef(field.getIndex, field.getType)).asJava,
join.getRight.getRowType
),
join.getJoinType,
join.isSemiJoinDone
)
)
} else {
super.visit(join)
}
}
{code}
(we tried to do it "correctly" by constructing a proper Result object, but found it too difficult. Hopefully someone else can figure out how do it the right way).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)