Posted to dev@calcite.apache.org by Vladimir Sitnikov <si...@gmail.com> on 2019/01/13 19:51:11 UTC

[CALCITE-2454] RexLiteral digest

Hi,

Historically, the RexLiteral digest did not include the type. That caused
confusion in cases like Project(x=1), where 1 might be "integer" or "bigint".

The solution was to use Pair<digestString, relDataType> as the relation
key in the planner (Volcano used that, and Hep used just the digest).
So far so good, but we also need to distinguish
Project(x=cast(1:int+2:int):float) from
Project(x=cast(1:float+2:float):float). Both produce the same output type,
so adding the "rel type" to the key does not help much.
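
To make the collision concrete, here is a minimal Java sketch. It does not
use Calcite at all: DigestCollisionSketch, Lit, untypedDigest and typedDigest
are hypothetical names, and the digest string format is only an
approximation for illustration. It registers the two expressions above in a
map keyed by digest, first without and then with literal types.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; these are not Calcite classes. */
final class DigestCollisionSketch {
    /** A literal value together with the name of its SQL type. */
    static final class Lit {
        final String value;
        final String type;

        Lit(String value, String type) {
            this.value = value;
            this.type = type;
        }
    }

    /** Digest that prints literals without their types (historical behaviour). */
    static String untypedDigest(Lit a, Lit b, String castTo) {
        return "Project(x=CAST(+(" + a.value + ", " + b.value + ")):" + castTo + ")";
    }

    /** Digest that appends the type to every literal. */
    static String typedDigest(Lit a, Lit b, String castTo) {
        return "Project(x=CAST(+(" + a.value + ":" + a.type + ", "
            + b.value + ":" + b.type + ")):" + castTo + ")";
    }

    public static void main(String[] args) {
        Lit i1 = new Lit("1", "INTEGER");
        Lit i2 = new Lit("2", "INTEGER");
        Lit f1 = new Lit("1", "FLOAT");
        Lit f2 = new Lit("2", "FLOAT");

        // Keyed by untyped digest: the second put overwrites the first.
        Map<String, String> byUntyped = new HashMap<>();
        byUntyped.put(untypedDigest(i1, i2, "FLOAT"), "integer addition");
        byUntyped.put(untypedDigest(f1, f2, "FLOAT"), "float addition");

        // Keyed by typed digest: the two expressions stay distinct.
        Map<String, String> byTyped = new HashMap<>();
        byTyped.put(typedDigest(i1, i2, "FLOAT"), "integer addition");
        byTyped.put(typedDigest(f1, f2, "FLOAT"), "float addition");

        System.out.println("untyped keys: " + byUntyped.size()); // 1
        System.out.println("typed keys: " + byTyped.size());     // 2
    }
}
```

Pairing the untyped digest with the output type would not separate the two
entries either, since both expressions end in a cast to float.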

The proposed fix is to add the data type to the RexLiteral
representation.
For instance, use 1:BIGINT instead of just 1. The downside is that the extra
types add verbosity.

So the idea is to hide types for "well known cases":
1) Hide "NOT NULL" for not null literals
2) Hide INTEGER, BOOLEAN, SYMBOL, TIME(0), TIMESTAMP(0), DATE(0) types
3) Hide collation when it matches IMPLICIT/COERCIBLE
4) Hide charset when it matches default
5) Hide CHAR(xx) when literal length is equal to the precision of the type.
In other words, use 'Bob' instead of 'Bob':CHAR(3)
6) Hide BOOL for AND/OR arguments. In other words, AND(true, null) means
null is BOOL.
7) Hide types for literals in simple binary operations (e.g. +, -, *, /,
comparison) when type of the other argument is clear.
For instance: =(true, null) means null is BOOL.  =($0, null) means the type
of null matches the type of $0.
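
As a rough illustration of rules 2 and 5 only, the following Java sketch
formats a literal and drops the type suffix in the "well known" cases. It is
not the code from the pull request: LiteralDigestSketch, literal and
charLiteral are hypothetical names, types are passed as plain strings, and
nullability, collation, charset and the context-dependent rules 6 and 7 are
left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of rules 2 and 5; not Calcite code. */
final class LiteralDigestSketch {
    /** Type names from rule 2 whose suffix is hidden in the digest. */
    private static final Set<String> HIDDEN_TYPES = new HashSet<>(Arrays.asList(
        "INTEGER", "BOOLEAN", "SYMBOL", "TIME(0)", "TIMESTAMP(0)", "DATE(0)"));

    /** Rule 2: print value:TYPE unless the type is one of the hidden ones. */
    static String literal(String value, String type) {
        return HIDDEN_TYPES.contains(type) ? value : value + ":" + type;
    }

    /**
     * Rule 5: hide CHAR(n) when the literal length equals the precision n.
     * Quote escaping is ignored to keep the sketch short.
     */
    static String charLiteral(String text, int precision) {
        String quoted = "'" + text + "'";
        return text.length() == precision
            ? quoted
            : quoted + ":CHAR(" + precision + ")";
    }

    public static void main(String[] args) {
        System.out.println(literal("1", "INTEGER"));  // 1
        System.out.println(literal("1", "BIGINT"));   // 1:BIGINT
        System.out.println(charLiteral("Bob", 3));    // 'Bob'
        System.out.println(charLiteral("Bob", 5));    // 'Bob':CHAR(5)
    }
}
```

The suffix is omitted only where a reader can recover the type from the
literal alone, so two digests still differ whenever the types differ.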

The main aim is to enforce org.apache.calcite.plan.RelOptNode#getDigest
contract:

* Returns a string which concisely describes the definition of this
* relational expression. Two relational expressions are equivalent if and
* only if their digests are the same.


Please feel free to comment/review
https://github.com/apache/calcite/pull/1002

I'm afraid the change might result in unwanted ripples here and there.
On the other hand, it might result in wanted ripples as well. For instance,
it happens to kill CHARACTER SET \"ISO-8859-1\" COLLATE
\"ISO-8859-1$en_US$primary\" that was present here and there :)

Vladimir

Re: [CALCITE-2454] RexLiteral digest

Posted by Julian Hyde <jh...@apache.org>.
Thanks Vladimir. I have reviewed and added comments to
https://issues.apache.org/jira/browse/CALCITE-2454.
