You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Stamatis Zampetakis (JIRA)" <ji...@apache.org> on 2018/10/18 08:24:00 UTC

[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

    [ https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16654853#comment-16654853 ] 

Stamatis Zampetakis commented on CALCITE-2630:
----------------------------------------------

As far as it concerns IN expressions with a list of literals it might make sense. However, when it comes to IN expressions with subqueries I am not sure if it will be beneficial. In particular, because IN with subquery can often be transformed to a semi-join, which has clear semantics and many known properties for optimization. Moreover, note that the existing runtime does not provide an implementation for the IN operator (see [CALCITE-2618|https://issues.apache.org/jira/browse/CALCITE-2618]).

> Convert SqlInOperator to In-Expression
> --------------------------------------
>
>                 Key: CALCITE-2630
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2630
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: pengzhiwei
>            Assignee: Julian Hyde
>            Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's operands less than "inSubQueryThreshold" or  to "Join" when the operands count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>      LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>     LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is no translation on "IN" and just keep it as it is.This would be much clear in the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)