You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Shiven D <sh...@gmail.com> on 2022/04/29 23:09:30 UTC

Transforming the query after decorrelation

Hi,
Would like to get some inputs on following problem,
We have the following source SQL:

{code}
SELECT T1_FIELD1, T1_FIELD3, Amountdue
FROM T1, LATERAL (
    SELECT SUM( T2_NUM1 + T2_NUM2 + T2_NUM3) as Amountdue FROM T2
    WHERE T2_NUM4 = T1.ID
)
WHERE T1_FIELD3 IN (200, 300, 800) AND T1_FIELD4 = "Sales"
{code}

We applied the decorrelation API: (RelDecorrelator.decorrelateQuery()) and
the resulted decorrelated sql:

{code}
SELECT
   `t`.`T1_FIELD1`,
   `t`.`T1_FIELD3`,
   `t1`.`Amountdue`
FROM
   (
      SELECT
         *
      FROM
         `T1`
      WHERE
         `T1_FIELD3` IN
         (
            200,
            300,
            800
         )
         AND `T1_FIELD4` = 'Sales'
   )
   AS `t`
   LEFT JOIN
      (
         SELECT
            `T2_NUM4`,
            SUM(`T2_NUM1` + `T2_NUM2` + `T2_NUM3`) AS `Amountdue`
         FROM
            `T2`

         GROUP BY
            `T2_NUM4`
      )
      AS `t1`
      ON `t`.`ID` = `t1`.`T2_NUM4`
    {code}

The cardinality of the query on right hand side of the join (on table T2)
is very high.
We are exploring some options to push the filter that is on left side of
join to right side so that we can potentially get a reduced set of results.
something like this:

    {code}
    SELECT
       `t`.`T1_FIELD1`,
       `t`.`T1_FIELD3`,
       `t1`.`Amountdue`
    FROM
       (
          SELECT
             *
          FROM
             `T1`
          WHERE
             `T1_FIELD3` IN
             (
                200,
                300,
                800
             )
             AND `T1_FIELD4` = 'Sales'
       )
       AS `t`
       LEFT JOIN
          (
             SELECT
                `T2_NUM4`,
                SUM(`T2_NUM1` + `T2_NUM2` + `T2_NUM3`) AS `Amountdue`
             FROM
                `T2`
             WHERE T2_NUM4 IN  ( SELECT `ID` FROM `T1` WHERE `T1_FIELD3` IN
(200, 300, 800) AND `T1_FIELD4` = 'Sales')
             GROUP BY
                `T2_NUM4`
          )
          AS `t1`
          ON `t`.`ID` = `t1`.`T2_NUM4`
        {code}

is there any API or existing rule that can do such transformation? any
other suggestions?

Thanks

Re: Transforming the query after decorrelation

Posted by Benchao Li <li...@apache.org>.
Hi Shiven,

AFAIK, Calcite does not have a such rule for now.

There is a similar rule[1], which only pushes the predicates inferred from
the other side.

[1]
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/rules/JoinPushTransitivePredicatesRule.java


Shiven D <sh...@gmail.com> 于2022年5月1日周日 06:41写道:

> Hi,
> Would like to get some inputs on following problem,
> We have the following source SQL:
>
> {code}
> SELECT T1_FIELD1, T1_FIELD3, Amountdue
> FROM T1, LATERAL (
>     SELECT SUM( T2_NUM1 + T2_NUM2 + T2_NUM3) as Amountdue FROM T2
>     WHERE T2_NUM4 = T1.ID
> )
> WHERE T1_FIELD3 IN (200, 300, 800) AND T1_FIELD4 = "Sales"
> {code}
>
> We applied the decorrelation API: (RelDecorrelator.decorrelateQuery()) and
> the resulted decorrelated sql:
>
> {code}
> SELECT
>    `t`.`T1_FIELD1`,
>    `t`.`T1_FIELD3`,
>    `t1`.`Amountdue`
> FROM
>    (
>       SELECT
>          *
>       FROM
>          `T1`
>       WHERE
>          `T1_FIELD3` IN
>          (
>             200,
>             300,
>             800
>          )
>          AND `T1_FIELD4` = 'Sales'
>    )
>    AS `t`
>    LEFT JOIN
>       (
>          SELECT
>             `T2_NUM4`,
>             SUM(`T2_NUM1` + `T2_NUM2` + `T2_NUM3`) AS `Amountdue`
>          FROM
>             `T2`
>
>          GROUP BY
>             `T2_NUM4`
>       )
>       AS `t1`
>       ON `t`.`ID` = `t1`.`T2_NUM4`
>     {code}
>
> The cardinality of the query on right hand side of the join (on table T2)
> is very high.
> We are exploring some options to push the filter that is on left side of
> join to right side so that we can potentially get a reduced set of results.
> something like this:
>
>     {code}
>     SELECT
>        `t`.`T1_FIELD1`,
>        `t`.`T1_FIELD3`,
>        `t1`.`Amountdue`
>     FROM
>        (
>           SELECT
>              *
>           FROM
>              `T1`
>           WHERE
>              `T1_FIELD3` IN
>              (
>                 200,
>                 300,
>                 800
>              )
>              AND `T1_FIELD4` = 'Sales'
>        )
>        AS `t`
>        LEFT JOIN
>           (
>              SELECT
>                 `T2_NUM4`,
>                 SUM(`T2_NUM1` + `T2_NUM2` + `T2_NUM3`) AS `Amountdue`
>              FROM
>                 `T2`
>              WHERE T2_NUM4 IN  ( SELECT `ID` FROM `T1` WHERE `T1_FIELD3` IN
> (200, 300, 800) AND `T1_FIELD4` = 'Sales')
>              GROUP BY
>                 `T2_NUM4`
>           )
>           AS `t1`
>           ON `t`.`ID` = `t1`.`T2_NUM4`
>         {code}
>
> is there any API or existing rule that can do such transformation? any
> other suggestions?
>
> Thanks
>


-- 

Best,
Benchao Li