Posted to issues@calcite.apache.org by "Runkang He (Jira)" <ji...@apache.org> on 2023/04/19 00:39:00 UTC

[jira] [Created] (CALCITE-5661) Introduce another way to convert IN predicate to RelNode when IN list is large

Runkang He created CALCITE-5661:
-----------------------------------

             Summary: Introduce another way to convert IN predicate to RelNode when IN list is large
                 Key: CALCITE-5661
                 URL: https://issues.apache.org/jira/browse/CALCITE-5661
             Project: Calcite
          Issue Type: Improvement
    Affects Versions: 1.34.0
            Reporter: Runkang He


When the IN list is large, plan generation is very slow. In our benchmark, with an IN value list of 30,000 entries, it took 2 minutes to generate the final plan.
{code:java}
select empno from emp where deptno in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ..., 30000){code}
We found that in the sql-to-rel phase there are two ways to convert an IN predicate to a RelNode:
1. If the IN list size is below InSubQueryThreshold, IN is converted to OR;
2. If the IN list size is over InSubQueryThreshold, IN is converted to VALUES + JOIN.
The first method is very time-consuming in the expression simplification stage because of the large OR predicate. As mentioned above, with an IN value list of 30,000 entries it took 2 minutes, which is not acceptable in OLAP scenarios.
The second method cannot apply IN predicate pushdown, which is very important in OLAP scenarios.
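The two existing conversions described above can be modelled roughly as follows. This is a minimal sketch, not Calcite's code: the class, enum, and method names are hypothetical, and the behaviour when the list size is exactly equal to the threshold is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the two existing strategies; not Calcite's actual code.
final class InListStrategy {
    enum Strategy { OR_CHAIN, VALUES_JOIN }

    // Assumption: lists strictly smaller than the threshold are expanded to OR,
    // everything else goes through VALUES + JOIN.
    static Strategy choose(int inListSize, int inSubQueryThreshold) {
        return inListSize < inSubQueryThreshold
            ? Strategy.OR_CHAIN
            : Strategy.VALUES_JOIN;
    }

    // Renders the OR expansion as text, one equality term per IN value.
    static String expandToOr(String column, List<Integer> values) {
        return values.stream()
            .map(v -> column + " = " + v)
            .collect(Collectors.joining(" OR "));
    }
}
```

For a 30,000-value list, expandToOr produces 30,000 equality terms, which is the large OR predicate that the simplification stage then has to process.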
So maybe we need to support converting IN to a RexCall directly, to avoid the disadvantages of the above two methods.
In a POC, when IN is converted to a RexCall directly, it takes less than 1 second to generate the final plan.
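The idea of keeping IN as a single call can be illustrated with a small stand-alone model. This is only a sketch under assumptions: InCallSketch is a hypothetical class, it is not Calcite's RexCall and not the POC mentioned above, and it shows only that the whole value list stays in one node instead of being expanded into one term per value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a single IN call node; it is not Calcite's RexCall.
final class InCallSketch {
    final String column;
    final Set<Integer> values;

    InCallSketch(String column, List<Integer> values) {
        this.column = column;
        // Duplicate values collapse into one entry.
        this.values = new HashSet<>(values);
    }

    // Membership test against the whole value list at once.
    boolean matches(int candidate) {
        return values.contains(candidate);
    }

    // Number of distinct values held by this single node.
    int size() {
        return values.size();
    }
}
```

Because the predicate remains one node on a single column, it can in principle still be handed to a scan as an IN filter, which the VALUES + JOIN form does not allow.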



--
This message was sent by Atlassian Jira
(v8.20.10#820010)