You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "winds (Jira)" <ji...@apache.org> on 2023/06/26 00:30:00 UTC

[jira] [Comment Edited] (CALCITE-5756) Expand ProjectJoinRemoveRule to support inner join removal by using the foreign-unique constraints

    [ https://issues.apache.org/jira/browse/CALCITE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736942#comment-17736942 ] 

winds edited comment on CALCITE-5756 at 6/26/23 12:29 AM:
----------------------------------------------------------

Thanks for your check. Get your point, my understanding is as follows:

Firstly.
For the following sql, assumption there are constraints that Emp.deptno foreign key reference Dept.deptno unique key, 
Emp.ename foreign key reference Dept.deptno unique key.
{code:java}
SELECT e.ename, e.deptno, DEPT.name
FROM
   (SELECT Emp.ename, Emp.deptno FROM Emp) e
INNER JOIN
Dept
ON e.deptno on Dept.deptno{code}
It seems that There are two ways to get constraints on current rel node.

a. 
RelMetadataQuery#getForeignKeys mark out which position is the valid foreign key(if the position is on aggregate function, it seems invalid),
if we want to know which unqinue key the foreign key reference, we can get the relation by RelMetadataQuery#getColumnOrigin,such as the logic in method 
[ProjectJoinRemoveRule#areForeignKeysValid()|https://github.com/JingDas/calcite/blob/586dbf40d6ef7752b554c08fe573e600da456876/core/src/main/java/org/apache/calcite/rel/rules/ProjectJoinRemoveRule.java#L170].

b. 
If we want to get all constraint information from RelMetadataQuery#getForeignKeys, it seems that
RelMetadataQuery#getForeignKeys should return Set<Pair<ImmutableBitSet, ImmutableBitSet>>, which represent that
foreign key positions reference unique key positions correspondingly on current relNode.

As above sql and logic, if we call the method RelMetadataQuery#getForeignKeys(Project(Emp.ename, Emp.deptno),  true),
it seems return [<[Emp.ename, Emp.deptno], null>], because it reference another table Dept, but the Dept does not appear
in the current project relnode.

To be precise, I don't have ideas for that how to keep track and record this constraint relationship 
during the bottom-up transfer derivation process of rel node. So I took the first approach above.

Secondly.
I just searched some docs. And find that in some database such as Mysql, SqlServer and so on, Foreign Key are allowed to be
composited.For the method RelMetadataQuery#getForeignKeys, it seems to return Set<ImmutableBitSet> which can represent the information
for composite foreign keys, I will use Set<ImmutableBitSet> to fix the composite foreign keys representation.

WDYT?


was (Author: JIRAUSER292370):
Thanks for your check. Get your point, my understanding is as follows:

Firstly.
For the following sql, assumption there are constraints that Emp.deptno foreign key reference Dept.deptno unique key, 
Emp.ename foreign key reference Dept.deptno unique key.
{code:java}
SELECT e.ename, e.deptno, DEPT.name
FROM
   (SELEECT Emp.ename, Emp.deptno FROM Emp) e
INNER JOIN
Dept
ON e.deptno on Dept.deptno{code}
It seems that There are two ways to get constraints on current rel node.

a. 
RelMetadataQuery#getForeignKeys mark out which position is the valid foreign key(if the position is on aggregate function, it seems invalid),
if we want to know which unqinue key the foreign key reference, we can get the relation by RelMetadataQuery#getColumnOrigin,such as the logic in method 
[ProjectJoinRemoveRule#areForeignKeysValid()|https://github.com/JingDas/calcite/blob/586dbf40d6ef7752b554c08fe573e600da456876/core/src/main/java/org/apache/calcite/rel/rules/ProjectJoinRemoveRule.java#L170].

b. 
If we want to get all constraint information from RelMetadataQuery#getForeignKeys, it seems that
RelMetadataQuery#getForeignKeys should return Set<Pair<ImmutableBitSet, ImmutableBitSet>>, which represent that
foreign key positions reference unique key positions correspondingly on current relNode.

As above sql and logic, if we call the method RelMetadataQuery#getForeignKeys(Project(Emp.ename, Emp.deptno),  true),
it seems return [<[Emp.ename, Emp.deptno], null>], because it reference another table Dept, but the Dept does not appear
in the current project relnode.

To be precise, I don't have ideas for that how to keep track and record this constraint relationship 
during the bottom-up transfer derivation process of rel node. So I took the first approach above.


Secondly.
I just searched some docs. And find that in some database such as Mysql, SqlServer and so on, Foreign Key are allowed to be
composited.For the method RelMetadataQuery#getForeignKeys, it seems to return Set<ImmutableBitSet> which can represent the information
for composite foreign keys, I will use Set<ImmutableBitSet> to fix the composite foreign keys representation.

WDYT?

> Expand ProjectJoinRemoveRule to support inner join removal by using the foreign-unique constraints
> --------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5756
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5756
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: winds
>            Assignee: winds
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.35.0
>
>
> Join elimination is a useful optmize improvement. 
> Consider a query that joins the two tables but does not make use of the Dept columns:
> {code:java}
> SELECT Emp.name, Emp.salary
> FROM Emp, Dept
> WHERE Emp.deptno = Dept.dno {code}
> Assuming Emp.deptno is the foreign-key and is non-null, Dept.dno is the unique-key. The sql above can be rewritten as following. remove the Dept table without affecting the resultset.
> {code:java}
> SELECT Emp.name, Emp.salary
> FROM Emp {code}
> Without redundant join elimination, this query execution may perform poorly.
> The optimize improvement is also available in SQL Server, Oracle and Snowflake and so on.
> In Calcite, i think that is also useful. The infrastructure that join elimination depend on is already available.
> The main steps are as follows:
> 1. Analyse the column used by project, and then split them to left and right side.
> 2. Acccording to the project info above and outer join type, bail out in some scene.
> 3. Get join info such as join keys.
> 4. For inner join check foreign and unique keys, these may use
> RelMetadataQuery#getForeignKeys(newly add, similar to RelMetadataQuery#getUniqueKeys),
> RelOptTable#getReferentialConstraints.
> 5. Check removing side join keys are areColumnsUnique both for outer join and inner join.
> 6. If all done, calculate the fianl project and transform. 
> Please help me to check the improvement whether is useful or not.
> And i would like to add this improvement to Calcite.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)