Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2023/04/19 01:15:00 UTC

[jira] [Commented] (CALCITE-5661) Introduce another way to convert IN predicate to RelNode when IN list is large

    [ https://issues.apache.org/jira/browse/CALCITE-5661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713814#comment-17713814 ] 

Julian Hyde commented on CALCITE-5661:
--------------------------------------

If it takes 2 minutes there's probably a quadratic algorithm somewhere. I suggest you find the problem (stack sampling usually works well) and fix it.
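
For readers unfamiliar with stack sampling, here is a minimal sketch in plain Java. It is not part of Calcite: the class StackSampler and the method quadraticWork are hypothetical names, and quadraticWork merely stands in for whatever call is slow (here, the planning of the large IN query). The idea is to snapshot the busy thread's stack at a fixed interval and count how often each method appears; a method that shows up in nearly every sample is where the time goes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StackSampler {
  /**
   * Samples the target thread's stack every intervalMillis until the thread
   * ends. Returns, per method, the number of samples in which that method
   * appeared anywhere on the stack (inclusive counts).
   */
  static Map<String, Integer> sample(Thread target, long intervalMillis) {
    Map<String, Integer> hits = new HashMap<>();
    while (target.isAlive()) {
      // Collect each method once per sample so recursion is not double-counted.
      Set<String> methods = new HashSet<>();
      for (StackTraceElement frame : target.getStackTrace()) {
        methods.add(frame.getClassName() + "." + frame.getMethodName());
      }
      for (String method : methods) {
        hits.merge(method, 1, Integer::sum);
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return hits;
  }

  /** Hypothetical stand-in for the slow call: O(n^2) because of list scans. */
  static int quadraticWork(int n) {
    List<Integer> seen = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      if (!seen.contains(i)) { // linear scan on every iteration
        seen.add(i);
      }
    }
    return seen.size();
  }

  public static void main(String[] args) {
    Thread worker = new Thread(() -> quadraticWork(30_000), "planner");
    worker.start();
    // Print the ten methods that appeared in the most samples.
    sample(worker, 10).entrySet().stream()
        .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
        .limit(10)
        .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
  }
}
```

In practice the same information can be gathered without writing code, for example by running jstack against the process a few times while the query is being planned and comparing the dumps.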

> Introduce another way to convert IN predicate to RelNode when IN list is large
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-5661
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5661
>             Project: Calcite
>          Issue Type: Improvement
>    Affects Versions: 1.34.0
>            Reporter: Runkang He
>            Priority: Major
>
> When the IN list is large, plan generation is time-consuming. In our benchmark, with an IN list of 30,000 values, it took 2 minutes to generate the final plan.
> {code:sql}
> select empno from emp where deptno in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..., 30000){code}
> We find that in the sql-to-rel phase, there are two ways to convert an IN predicate to a RelNode:
> 1. If the IN list size is below InSubQueryThreshold, convert IN to OR;
> 2. If the IN list size is over InSubQueryThreshold, convert IN to VALUES + JOIN.
> The first is very time-consuming in the expression simplification stage because of the large OR predicate. As mentioned above, with 30,000 values in the IN list it took 2 minutes, which is not acceptable in OLAP scenarios.
> The second prevents predicate pushdown, which is very important in OLAP scenarios.
> So maybe we need to support converting IN to a RexCall directly, to avoid the disadvantages of the two methods above.
> In a POC, converting IN to a RexCall directly generated the final plan in less than 1 second.
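
To illustrate how processing the operands of a large OR could turn quadratic, here is a self-contained Java sketch. It is a hypothetical illustration only, not Calcite's simplifier code and not a diagnosis of this issue: the class OrOperandDedup and both methods are invented for the example. It contrasts removing duplicate operands by scanning a list, which is O(n^2), with the same operation backed by a hash set, which is O(n).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OrOperandDedup {
  /** Quadratic: every operand triggers a linear scan of the output list. */
  static List<Integer> dedupQuadratic(List<Integer> operands) {
    List<Integer> out = new ArrayList<>();
    for (Integer operand : operands) {
      if (!out.contains(operand)) { // O(n) scan, repeated n times
        out.add(operand);
      }
    }
    return out;
  }

  /** Linear: same result and order, with a hash set for membership checks. */
  static List<Integer> dedupLinear(List<Integer> operands) {
    Set<Integer> seen = new HashSet<>();
    List<Integer> out = new ArrayList<>();
    for (Integer operand : operands) {
      if (seen.add(operand)) { // add returns false for a duplicate
        out.add(operand);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Same size as the IN list in the report: the values 1..30000.
    List<Integer> operands = new ArrayList<>();
    for (int i = 1; i <= 30_000; i++) {
      operands.add(i);
    }
    long t0 = System.nanoTime();
    dedupQuadratic(operands);
    long t1 = System.nanoTime();
    dedupLinear(operands);
    long t2 = System.nanoTime();
    System.out.printf("quadratic: %d ms, linear: %d ms%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
  }
}
```

If profiling shows a pattern like the first method on the hot path, replacing it with the second form would keep the IN-to-OR conversion and its predicate pushdown while removing the quadratic cost.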


