You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Alessandro Solimando (Jira)" <ji...@apache.org> on 2022/10/28 10:28:00 UTC

[jira] [Comment Edited] (CALCITE-5345) UnionPullUpConstantsRule could also pull up constants requiring a cast

    [ https://issues.apache.org/jira/browse/CALCITE-5345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625580#comment-17625580 ] 

Alessandro Solimando edited comment on CALCITE-5345 at 10/28/22 10:27 AM:
--------------------------------------------------------------------------

Keep in mind that the original plan shown in the description is the original plan we get after query validation, it's OT for this ticket (and the rule altogether).

Just for the records, in Calcite:
{code:java}
select deptno, ename from emp where ename = 1.0{code}
is valid and translates to:
{noformat}
LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)]){noformat}
 

In Postgres, it's not accepted:
{code:java}
SELECT * FROM t where ename = 1.0;{code}
{noformat}
ERROR: operator does not exist: character varying = numeric Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. Position: 28{noformat}
 

Adding the explicit cast raises an error, as you say:
{code:java}
SELECT * FROM t where ename::int = 1;{code}
{noformat}
ERROR: invalid input syntax for integer: "stamatis"{noformat}
Getting back to the specific rule (and the enhancement proposed in this ticket), for what you say we have two edge cases:

1) cast over the column gives an error: in this case the query will give error, as the filter has to be executed anyway, including the cast, therefore the plan change won't make any difference

2) cast over the column returns null: in this case the specific row will be filtered out under unknownAs=false semantics of _WHERE_ clauses, so the top-most constant in the project seem valid to me.

In both cases, I guess we are fine, so the current behaviour of _UnionPullUpConstantRule_ seems safe and sound to me.

For the top-most cast, the one applied to the constant, the story is different, and I need to give it some more thinking because if the cast produces _null_ or raises an error, that's probably an invalid plan.

While playing with this, I realized another issue, related to the fact that casting is not symmetric, _cast(a, t_b) = b_ is not equivalent to _a = cast(b, t_a)._

I need to see if there are cases where this can pose a real issue, I tried with _varchar_ of different lengths but this is handled well (if you use a longer constant, it's truncated to fit).


was (Author: asolimando):
Keep in mind that the original plan shown in the description is the original plan we get after query validation, it's OT for this ticket (and the rule altogether).

Just for the records, in Calcite:

 
{code:java}
select deptno, ename from emp where ename = 1.0{code}
is valid and translates to:

 

 
{noformat}
LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)]){noformat}
 

In Postgres, it's not accepted:

 
{code:java}
SELECT * FROM t where ename = 1.0;{code}
 
{noformat}
ERROR: operator does not exist: character varying = numeric Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts. Position: 28{noformat}
Adding the explicit cast raises an error, as you say:

 
{code:java}
SELECT * FROM t where ename::int = 1;{code}
 
{noformat}
ERROR: invalid input syntax for integer: "stamatis"{noformat}
Getting back to the specific rule (and the enhancement proposed in this ticket), for what you say we have two edge cases:

1) cast over the column gives an error: in this case the query will give error, as the filter has to be executed anyway, including the cast, therefore the plan change won't make any difference

2) cast over the column returns null: in this case the specific row will be filtered out under unknownAs=false semantics of _WHERE_ clauses, so the top-most constant in the project seem valid to me.

In both cases, I guess we are fine, so the current behaviour of _UnionPullUpConstantRule_ seems safe and sound to me.

For the top-most cast, the one applied to the constant, the story is different, and I need to give it some more thinking because if the cast produces _null_ or raises an error, that's probably an invalid plan.

While playing with this, I realized another issue, related to the fact that casting is not symmetric, _cast(a, t_b) = b_ is not equivalent to _a = cast(b, t_a)._

I need to see if there are cases where this can pose a real issue, I tried with _varchar_ of different lengths but this is handled well (if you use a longer constant, it's truncated to fit).

> UnionPullUpConstantsRule could also pull up constants requiring a cast
> ----------------------------------------------------------------------
>
>                 Key: CALCITE-5345
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5345
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.32.0
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>
> Consider the following SQL query:
> {code:java}
> select deptno, ename from emp where deptno = 1.0
> union all
> select deptno, ename from emp where deptno = 1.0
> {code}
> The associated plan is as follows:
> {code:java}
> LogicalUnion(all=[true])
>   LogicalProject(DEPTNO=[$1], ENAME=[$0])
>     LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)])
>       LogicalProject(ENAME=[$1], DEPTNO=[$7])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalProject(DEPTNO=[$1], ENAME=[$0])
>     LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)])
>       LogicalProject(ENAME=[$1], DEPTNO=[$7])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
> Note that since _deptno_ is of type {_}int{_}, a cast is needed in the filter ({_}i.e., LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)]){_}).
> {_}UnionPullUpConstantsRule{_}, as currently written, processes only (pulled-up) predicates of the form "{_}=($i, $literal){_}", while now that CALCITE-5337 is present, it could also process "{_}=(CAST($i, $type), $literal){_}", because the need of a cast is recognized and the cast added in the projection when the constant is pulled up (if needed).
> The aforementioned query would be optimized in this way:
> {code:java}
> LogicalProject(DEPTNO=[1], ENAME=[$0])
>   LogicalUnion(all=[true])
>     LogicalProject(ENAME=[$0])
>       LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)])
>         LogicalProject(ENAME=[$1], DEPTNO=[$7])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalProject(ENAME=[$0])
>       LogicalFilter(condition=[=(CAST($1):DECIMAL(11, 1) NOT NULL, 1.0)])
>         LogicalProject(ENAME=[$1], DEPTNO=[$7])
>           LogicalTableScan(table=[[CATALOG, SALES, EMP]]){code}
> Without this improvement, the plan would not change after applying {_}UnionPullUpConstantsRule{_}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)