You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Haisheng Yuan (JIRA)" <ji...@apache.org> on 2019/02/01 19:32:00 UTC

[jira] [Commented] (CALCITE-2630) Convert SqlInOperator to In-Expression

    [ https://issues.apache.org/jira/browse/CALCITE-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758607#comment-16758607 ] 

Haisheng Yuan commented on CALCITE-2630:
----------------------------------------

We hit this issue quite often. Sometimes the query is generated from BI tools, and may have tens of thousands of IN constants. Converting IN expression in Project list to Join would make the plan complex. Yet converting to OR expressions would increase the number of Rex nodes dramatically, which may cause optimization overhead, as CALCITE-2792 shows. Converting constant IN list to join also leads to [CALCITE-2696|https://issues.apache.org/jira/browse/CALCITE-2696]. I would prefer leave IN scalar operator as it is if the number of constants greater than a threshold, and let the runtime engine do the hash lookup.

> Convert SqlInOperator to In-Expression
> --------------------------------------
>
>                 Key: CALCITE-2630
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2630
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.17.0
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>
> Currently Calcite translate "IN" to "OR" expression when the count of  IN's operands less than "inSubQueryThreshold" or  to "Join" when the operands count greater  than "inSubQueryThreshold" to get better performance.
>   However this translation to "JOIN" is so complex. Especially when the "IN" expression located in the "select" or "join on condition".
> For example:
> {code:java}
> select case when deptno in (1,2) then 0 else 1 end from emp
> {code}
> the logical plan generated as follow:
> {code:java}
> LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11), null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
> LogicalJoin(condition=[=($11, $12)], joinType=[left])
>  LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6], DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
>   LogicalJoin(condition=[true], joinType=[inner])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
>      LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
>   LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
>     LogicalProject(ROW_VALUE=[$0], $f1=[true])
>       LogicalValues(tuples=[[{ 1 }, { 2 }]])
> {code}
> The generated logical plan is so complex for such a simple sql!
> I think we can treat "IN" as a function like "plus" and "minus".So there is no translation on "IN" and just keep it as it is.This would be much clear in the logical plan!
> In the execute stage,We can provide a "InExpression":
> {code:java}
> InExpression(left,condition0,condition1,...) {code}
>  We can put all the constant conditions to a "Set".In that way,the computational complexity can reduce from O(n)to O(1).
> It would be much clear and have a good performance. 
> PS: "In sub-query" is not included in our talk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)