You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (JIRA)" <ji...@apache.org> on 2015/07/24 06:35:04 UTC

[jira] [Commented] (CALCITE-815) Add an option to allow empty strings to represent null values

    [ https://issues.apache.org/jira/browse/CALCITE-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639905#comment-14639905 ] 

Julian Hyde commented on CALCITE-815:
-------------------------------------

This is to support Phoenix, which has semantics similar to Oracle.

It covers both character strings (VARCHAR) and binary strings (BINARY). Note that in standard SQL, those types behave similarly, and have a lot of methods in common (concatenate, length, substring, replace, etc.)

I don't think it's possible for a CHAR(10) or BINARY(10) to be "empty", so I don't think it applies to those types.

Configure this as part of the type system: a new method boolean isEmptyStringNull() in interface RelDataTypeSystem.

One design decision is whether this should be done by the producer or the consumer.
* If by the producer, then with this option enabled, empty strings would never exist, only null strings. Built-in functions that produce strings would have to be careful, and there would need to be an adapter for user-defined functions. And we would have to be careful reading from an external data source (e.g. using the Csv adapter) that empty strings become null values. And we would have to be careful that empty strings sent as parameter values become null values.
* If by the consumer, then every method that consumes a (nullable) string should check whether it is empty. And we would have to be careful sending results back over JDBC.

Which option is preferable: producer or consumer enforcement? I think producer. Otherwise we have a lot of functions to check.

In standard SQL most built-in functions are strict, meaning that if one of their arguments is null, the result will be null. For example, x || y returns null if either x or y is null. In Oracle semantics, such functions are not strict: 'hello' || null evaluates to 'hello'.

Functions that produce string values:
* concat (||), UPPER, LOWER, TRIM, OVERLAY, SUBSTRING, INITCAP
* JDBC functions LOCATE, INSERT, LCASE

Functions that consume character values:
* relational operators: =, <>, >, >=, <, <= (in Oracle, does " 'x' > '' " evaluate to true or unknown?)
* BETWEEN, LIKE [ESCAPE], IS [NOT] DISTINCT FROM are similar to the relational operators
* Check what "CASE x WHEN y THEN ..." is supposed to do when x and y are both null.

Type derivation. In Calcite, strict functions whose arguments are NOT NULL are usually NOT NULL. For example, the type of SUBSTRING(VARCHAR(2) NOT NULL, INT NOT NULL, INT NOT NULL) is VARCHAR(2) NOT NULL. But in Oracle compatibility mode, the type would be VARCHAR(2), because an empty string might be produced.

In Oracle, null strings sort first. Check that null values of other types sort first as well. Then ensure that "ORDER BY ... ASC NULLS FIRST" and "ORDER BY ... DESC NULLS LAST" is the default behavior.

> Add an option to allow empty strings to represent null values
> -------------------------------------------------------------
>
>                 Key: CALCITE-815
>                 URL: https://issues.apache.org/jira/browse/CALCITE-815
>             Project: Calcite
>          Issue Type: New Feature
>            Reporter: Maryann Xue
>            Assignee: Julian Hyde
>              Labels: phoenix
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)