You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Dawid Wysakowicz (Jira)" <ji...@apache.org> on 2020/06/24 08:08:00 UTC
[jira] [Comment Edited] (CALCITE-4085) Improve nullability support for fields of structured type

    [ https://issues.apache.org/jira/browse/CALCITE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143625#comment-17143625 ] 

Dawid Wysakowicz edited comment on CALCITE-4085 at 6/24/20, 8:07 AM:
---------------------------------------------------------------------

My end goal is to be able to have a nullable record with not null fields. E.g. {{USER(id NOT NULL) <NULLABLE>}}. That is because we might have a UDF that expects {{USER(id NOT NULL) <NULLABLE>}}. From that point of view the logic in RelDataTypeFactoryImpl[1] is incorrect as it changes the nullability of nested fields.

I fully agree with 

??"For expression "[A].[B]", if [A] is nullable then [A].[B] should also be nullable". ??

I don't think though this should be fixed at the time when the RecordType is created, as it makes, in my opinion, a valid use case described in the first paragraph impossible. This logic in my understanding/opinion is very similar if not exactly the same as {{SqlTypeTransforms#TO_NULLABLE}}, and it does belong to every operator that accesses nested fields. Of course you cannot prevent people to add new operators, but again in my opinion this belongs to the operator logic the same as the mentioned {{SqlTypeTransforms#TO_NULLABLE}}.

This is incorrect:

??You want the int type to be nullable. But in the POJO code, you want to translate the id to int instead of integer. In my point, the type inference of current Calcite is very close to correct. What we should do is the QE (when we translate the POJO, to do a promotion in the code instead) which seems more feasible.??

I want the int to be {{NOT NULL}} and be translated to {{int}}. Calcite will convert it to {{NULLABLE}} because of the logic in [1].

From my perspective changes in RexBuilder#makeFieldAccessInternal, SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are necessary because I don't want the logic in the type factory as it breaks my use case from the first paragraph.

The changes in RexBuilder and SqlValidatorImpl are required for a query like {{SELECT u.nullable_row.not_null_nested_field FROM tableWithNullableRowColumn}}, because those do not use SqlDotOperator, but create a {{RexFieldAccess}}.  I split the PR into three commits because I do understand the changes in RexBuilder#makeFieldAccessInternal, SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are quite invasive and you might not agree to incorporate them. So if you are strongly against those I will just drop those from the PR.

I'd really like to see the other two commits merged. Still the changes in {{AliasNamespace}} make tests in {{TableFunctionTest}} fail imo because of the logic in [1].  The tests there join with lateral table generated from a UDF which return type is {{IntString(n INT NOT NULL, s String) <NULLABLE>}}. If I change the {{AliasNamespace}} to keep the original nullability(which imo is correct, I don't see a reason why aliasing should change nullability), then the logic in [1] changes the nullability of n which in turn causes mismatch.

[1] https://github.com/apache/calcite/blob/0769a8b31cbbeb5bca66ade30cf3710523da4aaa/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactoryImpl.java#L338


was (Author: dawidwys):
My end goal is to be able to have a nullable record with not null fields. E.g. {{USER(id NOT NULL) <NULLABLE>}}. That is because we might have a UDF that expects {{USER(id NOT NULL) <NULLABLE>}}. From that point of view the logic in RelDataTypeFactoryImpl[1] is incorrect as it changes the nullability of nested fields.

I fully agree with "For expression "[A].[B]", if [A] is nullable then [A].[B] should also be nullable". I don't think though this should be fixed at the time when the RecordType is created, as it makes, in my opinion, a valid use case described in the first paragraph impossible. This logic in my understanding/opinion is very similar if not exactly the same as {{SqlTypeTransforms#TO_NULLABLE}}, and it does belong to every operator that accesses nested fields. Of course you cannot prevent people to add new operators, but again in my opinion this belongs to the operator logic the same as the mentioned {{SqlTypeTransforms#TO_NULLABLE}}.

This is incorrect:

> You want the int type to be nullable. But in the POJO code, you want to translate the id to int instead of integer. In my point, the type inference of current Calcite is very close to correct. What we should do is the QE (when we translate the POJO, to do a promotion in the code instead) which seems more feasible.

I want the int to be {{NOT NULL}} and be translated to {{int}}. Calcite will convert it to {{NULLABLE}} because of the logic in [1].

From my perspective changes in RexBuilder#makeFieldAccessInternal, SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are necessary because I don't want the logic in the type factory as it breaks my use case from the first paragraph.

The changes in RexBuilder and SqlValidatorImpl are required for a query like {{SELECT u.nullable_row.not_null_nested_field FROM tableWithNullableRowColumn}}, because those do not use SqlDotOperator, but create a {{RexFieldAccess}}.  I split the PR into three commits because I do understand the changes in RexBuilder#makeFieldAccessInternal, SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier) are quite invasive and you might not agree to incorporate them. So if you are strongly against those I will just drop those from the PR.

I'd really like to see the other two commits merged. Still the changes in {{AliasNamespace}} make tests in {{TableFunctionTest}} fail imo because of the logic in [1].  The tests there join with lateral table generated from a UDF which return type is {{IntString(n INT NOT NULL, s String) <NULLABLE>}}. If I change the {{AliasNamespace}} to keep the original nullability(which imo is correct, I don't see a reason why aliasing should change nullability), then the logic in [1] changes the nullability of n which in turn causes mismatch.

[1] https://github.com/apache/calcite/blob/0769a8b31cbbeb5bca66ade30cf3710523da4aaa/core/src/main/java/org/apache/calcite/rel/type/RelDataTypeFactoryImpl.java#L338

> Improve nullability support for fields of structured type
> ---------------------------------------------------------
>
>                 Key: CALCITE-4085
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4085
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Dawid Wysakowicz
>            Assignee: Dawid Wysakowicz
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in https://lists.apache.org/thread.html/r602ac95fff23dd1ef974fb396df7be061ab861384ec42f5c57ce0bc2%40%3Cdev.calcite.apache.org%3E I would like to change the way a type of a field of a record is derived at couple of locations. This helps frameworks such as Apache Flink to build support for nullable records with not null fields.
> I suggest to change:
> * SqlDotOperator#deriveType
> * SqlItemOperator#inferReturnType
> * AliasNamespace#validateImpl
> * RexBuilder#makeFieldAccessInternal
> * SqlValidatorImpl.DeriveTypeVisitor#visit(SqlIdentifier)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)