You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2022/01/12 19:37:00 UTC

[jira] [Comment Edited] (CALCITE-4872) UNKNOWN SqlTypeName erroneously treated as NULL

    [ https://issues.apache.org/jira/browse/CALCITE-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450692#comment-17450692 ] 

Julian Hyde edited comment on CALCITE-4872 at 1/12/22, 7:36 PM:
----------------------------------------------------------------

Yes, there's a difference between {{ANY}} and {{{}UNKNOWN{}}}, but it's not very well defined. I'm not very sure what the definitions are, or what they should be.

The definitions, such as they are, are in the descriptions of the types (e.g. in reference.md and SqlTypeName), in the test suite, and in people's expectations. The way to improve those definitions is by writing tests, proposing changes to existing tests, proposing changes to those definitions, and discussions.

I'm going to throw into the mix a proposed {{VARIANT}} type. (See CALCITE-4918.) Unlike {{ANY}} and {{{}UNKNOWN{}}}, {{VARIANT}} actually exists in commercial databases (MSSQL, Snowflake, Teradata). It would be useful for [Snowflake-style JSON support|https://lists.apache.org/thread/2zvqqbsg6t9zr3njjo8mfpxt2fkfo5nx] which is similar to the late-binding schema implemented in Drill and proposed for Calcite in CALCITE-1257.

My definitions are:
 * {{UNKNOWN}} is a placeholder for a type that exists but hasn't been figured out yet.
 * {{ANY}} is a shorthand for function signatures that says that the overload exists for all types. For example, the "IS NULL" operator has type "ANY → BOOLEAN".
 * {{VARIANT}} is the type of a slot (e.g. a column) that can have a variety of types at runtime. The type of the value can be inspected at runtime (say by an {{INSTANCEOF}} or [TYPEOF|https://docs.snowflake.com/en/sql-reference/functions/typeof.html] or {{IS_type}} operator), and the value can safely be cast to its actual type (say by a [TRY_CAST|https://docs.microsoft.com/en-us/sql/t-sql/functions/try-cast-transact-sql?view=sql-server-ver15] operator). 

One way to figure out the real type from an {{UNKNOWN}} is by applying more and more constraints. For example, if {{u}} has type {{UNKNOWN}} and I use the expression "{{{}u + 10{}}}" then I know that {{u}} has a numeric type, because only numerics (and possibly also {{INTERVAL}} types) can be on the left side of "+ INTEGER". If I write "{{{}u IS NULL{}}}" then I don't find out anything about the type of {{u}} (because {{IS NULL}} applies to all types) but I know that the result is {{{}BOOLEAN NOT NULL{}}}.

There are more details around {{{}VARIANT{}}}. Perhaps {{VARIANT}} would allow syntactic sugar: if {{v}} has type \{{VARIANT}} then I can write {{v.deptno}} and it expands to {{CASE WHEN v INSTANCEOF MAP THEN (CAST(v AS MAP)['deptno'] ELSE NULL END}}. Perhaps {{v}} has a constraint {{CHECK (v INSTANCEOF DATE OR v INSTANCEOF TIMESTAMP)}} and therefore the planner can start to do some optimizations.

But anyway, given the above, {{UNKNOWN}} and {{ANY}} are not real data types. They are more like type schemes or type variables.


was (Author: julianhyde):
Yes, there's a difference between {{ANY}} and {{{}UNKNOWN{}}}, but it's not very well defined. I'm not very sure what the definitions are, or what they should be.

The definitions, such as they are, are in the descriptions of the types (e.g. in reference.md and SqlTypeName), in the test suite, and in people's expectations. The way to improve those definitions is by writing tests, proposing changes to existing tests, proposing changes to those definitions, and discussions.

I'm going to throw into the mix a proposed {{VARIANT}} type. Unlike {{ANY}} and {{{}UNKNOWN{}}}, {{VARIANT}} actually exists in commercial databases (MSSQL, Snowflake, Teradata). It would be useful for [Snowflake-style JSON support|https://lists.apache.org/thread/2zvqqbsg6t9zr3njjo8mfpxt2fkfo5nx] which is similar to the late-binding schema implemented in Drill and proposed for Calcite in CALCITE-1257.

My definitions are:
 * {{UNKNOWN}} is a placeholder for a type that exists but hasn't been figured out yet.
 * {{ANY}} is a shorthand for function signatures that says that the overload exists for all types. For example, the "IS NULL" operator has type "ANY → BOOLEAN".
 * {{VARIANT}} is the type of a slot (e.g. a column) that can have a variety of types at runtime. The type of the value can be inspected at runtime (say by an {{INSTANCEOF}} or [TYPEOF|https://docs.snowflake.com/en/sql-reference/functions/typeof.html] or {{IS_type}} operator), and the value can safely be cast to its actual type (say by a [TRY_CAST|https://docs.microsoft.com/en-us/sql/t-sql/functions/try-cast-transact-sql?view=sql-server-ver15] operator). 

One way to figure out the real type from an {{UNKNOWN}} is by applying more and more constraints. For example, if {{u}} has type {{UNKNOWN}} and I use the expression "{{{}u + 10{}}}" then I know that {{u}} has a numeric type, because only numerics (and possibly also {{INTERVAL}} types) can be on the left side of "+ INTEGER". If I write "{{{}u IS NULL{}}}" then I don't find out anything about the type of {{u}} (because {{IS NULL}} applies to all types) but I know that the result is {{{}BOOLEAN NOT NULL{}}}.

There are more details around {{{}VARIANT{}}}. Perhaps {{VARIANT}} would allow syntactic sugar: if {{v}} has type \{{VARIANT}} then I can write {{v.deptno}} and it expands to {{CASE WHEN v INSTANCEOF MAP THEN (CAST(v AS MAP)['deptno'] ELSE NULL END}}. Perhaps {{v}} has a constraint {{CHECK (v INSTANCEOF DATE OR v INSTANCEOF TIMESTAMP)}} and therefore the planner can start to do some optimizations.

But anyway, given the above, {{UNKNOWN}} and {{ANY}} are not real data types. They are more like type schemes or type variables.

> UNKNOWN SqlTypeName erroneously treated as NULL
> -----------------------------------------------
>
>                 Key: CALCITE-4872
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4872
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Will Noble
>            Assignee: Will Noble
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.30.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Call {{createWithNullability}} on an {{UnknownSqlType}} currently returns a {{NULL}} type. Furthermore, it's impossible to cast to unknown types, according to {{SqlTypeUtil.canCastFrom}}. This introduces a new {{SqlTypeName}} -- {{UNKNOWN}} -- which behaves like {{NULL}} in all ways except that it's no longer actually equal to {{NULL}}, and can always be cast to.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)