You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "godfrey he (JIRA)" <ji...@apache.org> on 2018/05/08 07:59:00 UTC
[jira] [Comment Edited] (CALCITE-2298) Correlated SubQuery in Having generates error plan when correlated fields does not exist

    [ https://issues.apache.org/jira/browse/CALCITE-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467037#comment-16467037 ] 

godfrey he edited comment on CALCITE-2298 at 5/8/18 7:58 AM:
-------------------------------------------------------------

Thanks for the suggestions [~julianhyde].

yes, there are a lot of code to handle the following cases:
 # correlated fields in SubQuery are in parent's grouping fields. (we can link the scope for HAVING to the aggregating scope as your suggestion, and call _checkAggregateExpr_ to validate the expr)
 # correlated fields in SubQuery are in an aggregator. detects this pattern, no validation for this case）
 # nested SubQuery, e.g. 
{code:java}
check("select ename from sales.emp group by ename, deptno\n" + "having max(sal) <= \n" + " (select max(sal) from sales.emp_b e1 group by e1.ename\n" + " having min(sal) < \n" + " (select min(sal) from sales.emp_b e2 where emp.deptno = e2.deptno group by ename))");
{code}

        (we need find the correct aggregating scope which outputs the correlated fields. for the above case, the relationship of scopes is:
{noformat}
EmptyScope
 |__ CatalogScope (table emp)
      |__ SelectScope 
           |__ AggregatingSelectScope (group by ename, deptno)
                |__ SelectScope (SubQuery1, emp_b e1)
                     |__ AggregatingSelectScope (group by e1.name)
                          |__ SelectScope (SubQuery2, emp_b e2)
{noformat}
_emp.deptno = e2.deptno_ is in _SelectScope (SubQuery2, emp_b e2)_ and _emp.deptno_ is from _AggregatingSelectScope (group by ename, deptno),_ I don't know how to build the link for this case.  )

     4. grouping expr and correlated expr are SqlCall, e.g.
{code:java}
check("select deptno + 1, sum(sal) from emp group by deptno + 1\n"
        + "having max(sal) > \n"
        + "    (select max(sal) from emp e where emp.deptno + 1 = e.deptno group by ename)"); {code}
      5. correlated SubQuery is in an aggregator（detects this pattern, no validation for this case）
{code:java}
check("SELECT SUM(\n"
        + "  10 + (select char_length(name) from dept\n"
        + "   where dept.deptno = emp.empno))\n"
        + "FROM emp");
{code}
Looking forward to your advice, many thanks!

 


was (Author: godfreyhe):
Thanks for the suggestions [~julianhyde].

yes, there are a lot of code to handle the following cases:
 # correlated fields in SubQuery are in parent's grouping fields. (we can link the scope for HAVING to the aggregating scope as your suggestion, and call _checkAggregateExpr_ to validate the expr)
 # correlated fields in SubQuery are in an aggregator. detects this pattern, no validation for this case）
 # nested SubQuery, e.g. 
{code:java}
check("select ename from sales.emp group by ename, deptno\n" + "having max(sal) <= \n" + " (select max(sal) from sales.emp_b e1 group by e1.ename\n" + " having min(sal) < \n" + " (select min(sal) from sales.emp_b e2 where emp.deptno = e2.deptno group by ename))");
{code}

        (we need find the correct aggregating scope which outputs the correlated fields. for the above case, the relationship of scopes is:
{noformat}
EmptyScope
 |__ CatalogScope (table emp)
      |__ SelectScope 
           |__ AggregatingSelectScope (group by ename, deptno)
                |__ SelectScope (SubQuery1, emp_b e1)
                     |__ AggregatingSelectScope (group by e1.name)
                          |__ SelectScope (SubQuery2, emp_b e2)
{noformat}
_emp.deptno = e2.deptno_ is in _SelectScope (SubQuery2, emp_b e2)_ and _emp.deptno_ is from _AggregatingSelectScope (group by ename, deptno),_ I don't know how to build the link for this case. __ )

     4. grouping expr and correlated expr are SqlCall, e.g.
{code:java}
check("select deptno + 1, sum(sal) from emp group by deptno + 1\n"
        + "having max(sal) > \n"
        + "    (select max(sal) from emp e where emp.deptno + 1 = e.deptno group by ename)"); {code}
      5. correlated SubQuery is in an aggregator（detects this pattern, no validation for this case）
{code:java}
check("SELECT SUM(\n"
        + "  10 + (select char_length(name) from dept\n"
        + "   where dept.deptno = emp.empno))\n"
        + "FROM emp");
{code}
Looking forward to your advice, many thanks!

 

> Correlated SubQuery in Having generates error plan when correlated fields does not exist
> ----------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2298
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2298
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: godfrey he
>            Assignee: Julian Hyde
>            Priority: Major
>
> {code}
>  @Test public void testDecorrelateWithUnresolvedField() throws Exception {
>     final String sql = "select ename\n"
>                        + "from sales.emp\n"
>                        + "group by ename, deptno\n"
>                        + "having max(sal) <= \n"
>                        + "  (select max(sal) from sales.emp_b where emp.job = emp_b.job group by ename)";
>     checkSubQuery(sql).withLateDecorrelation(true).check();
>   }
> {code}
> for now, we will get the following plan:
> {code}
> LogicalProject(ENAME=[$0])
>   LogicalFilter(condition=[<=($2, $SCALAR_QUERY({
> LogicalProject(EXPR$0=[$1])
>   LogicalAggregate(group=[{0}], EXPR$0=[MAX($1)])
>     LogicalProject(ENAME=[$1], SAL=[$5])
>       LogicalFilter(condition=[=($cor0.JOB, $2)])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP_B]])
> }))])
>     LogicalAggregate(group=[{0, 1}], agg#0=[MAX($2)])
>       LogicalProject(ENAME=[$1], DEPTNO=[$7], SAL=[$5])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> {code}
> However emp.job is not the grouping fields or the aggCall result.
> the expected result is throwing an Exception, like: 
> {code}
> org.apache.calcite.runtime.CalciteContextException: From line 5, column 47 to line 5, column 50: Column 'JOB' not found in 'LogicalAggregate(group=[{0, 1}], agg#0=[MAX($2)])'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)