You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2020/06/01 22:57:00 UTC
[jira] [Commented] (CALCITE-3975) ProjectFilterTransposeRule should succeed for project that happens to reference all input columns

    [ https://issues.apache.org/jira/browse/CALCITE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121388#comment-17121388 ] 

Julian Hyde commented on CALCITE-3975:
--------------------------------------

I looked into this some more. It turns out that the rule is functioning as intended, but the intention is to push down *field references* not whole expressions.

Consider this query (as discussed in an [email thread|https://lists.apache.org/thread.html/r4e9873a8dcc1186623837fcc7f92a61ab921ad95bd07ff15a5f03568%40%3Cdev.calcite.apache.org%3E]):
{code}
SELECT sal + comm AS remuneration
FROM Emp
WHERE deptno = 10
{code}

There are 3 things that ProjectFilterTransposeRule could do, and here are the resulting queries:

*Option A*. Push down references to the fields needed by filter and project expressions.

This is often the preferred option when you are trying optimize performance. You want to narrow the stream of records (reducing the amount of memory and copying) but defer the effort of computing expressions until after the Filter, when the number of rows has been decreased.

{code}
SELECT sal + comm AS remuneration
FROM (
    SELECT sal, comm, deptno
    FROM Emp)
WHERE deptno = 10
{code}

*Option B*. Push down the project expression, and references to the fields needed by the filter.

{code}
SELECT remuneration
FROM (
    SELECT sal + comm  AS remuneration, deptno
    FROM Emp)
WHERE deptno = 10
{code}

*Option C*. Push down the project and filter expressions.

{code}
SELECT remuneration
FROM (
  SELECT sal + comm  AS remuneration, deptno = 10 AS predicate
  FROM Emp)
WHERE predicate
{code}

The rule currently does option A. I propose to add configuration options to allow B or C.


> ProjectFilterTransposeRule should succeed for project that happens to reference all input columns
> -------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-3975
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3975
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.22.0
>            Reporter: Steven Talbot
>            Assignee: Julian Hyde
>            Priority: Major
>
> ... that is, I think
> If I make the trivial fix of just "only skip trivial projects", something like 
> {noformat}
> && origProj.getProjects().stream().allMatch((proj) -> proj instanceof RexInputRef) {noformat}
> at [https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/rel/rules/PushProjector.java#L354 |https://github.com/apache/calcite/blob/571731b80a58eb095ebac7123285c375e7afff90/core/src/main/java/org/apache/calcite/rel/rules/PushProjector.java#L354]HepPlanner goes into infinite recursion with the rule.
> But here's the test case:
>  
> {code:java}
> @Test public void testPushProjectPastFilter3() {
>   final String sql = "select empno + deptno, ename, job, mgr, hiredate, sal, comm, slacker from emp where sal = 10 * comm\n"
>       + "and upper(ename) = 'FOO'";
>   sql(sql).withRule(ProjectFilterTransposeRule.INSTANCE).check();
> }
> {code}
>  
>  
>  
> {noformat}
>     <TestCase name="testPushProjectPastFilter3">
>         <Resource name="sql">
>             <![CDATA[select empno + deptno, emp* from emp where sal = 10 * comm
> and upper(ename) = 'FOO']]>
>         </Resource>
>         <Resource name="planBefore">
>             <![CDATA[
> LogicalProject(EXPR$0=[+($0, $7)], EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>   LogicalFilter(condition=[AND(=($5, *(10, $6)), =(UPPER($1), 'FOO'))])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> ]]>
>         </Resource>
>         <Resource name="planAfter">
>             <![CDATA[
> -- note: probably a LogicalProject here, although see below comment
> LogicalFilter(condition=[AND(=($5, *(10, $6)), =(UPPER($1), 'FOO'))])
>   LogicalProject(EXPR$0=[+($0, $7)], EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> ]]>
>         </Resource>
>     </TestCase>{noformat}
>  
> There's no reason the rule shouldn't succeed here, right? Or am I missing something?
> The reason this rule goes into an infinite recursion with hepplanner is because it sticks a project on top after transpose to handle common expressions extracted from the filter and the project. Ideally, it could have a mode where it could avoid doing that and do a true "transpose" if there was no need for it. For example, I don't think there is a a need for a reproject on top in this test case: you can just transpose and everything works as it should. This would be another way to avoid infinite recursion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)