Posted to issues@flink.apache.org by "Piotr Nowojski (JIRA)" <ji...@apache.org> on 2018/08/30 13:14:00 UTC

[jira] [Commented] (FLINK-10257) Incorrect CHAR type support in Flink SQL and Table API

    [ https://issues.apache.org/jira/browse/FLINK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597434#comment-16597434 ] 

Piotr Nowojski commented on FLINK-10257:
----------------------------------------

I can think of three solutions to this problem:
 # Drop {{CHAR}} support and convert all string literals to {{VARCHAR}} in SQL. Any attempt to cast a column to {{CHAR}} should then throw an exception.
 # Provide internal {{CHAR}} support in SQL, while keeping it unsupported in the Table API.
 # Provide internal {{CHAR}} support for both SQL and the Table API, potentially changing the type of string literals in the Table API to {{CHAR}} as well, to make it consistent with SQL.

For option 1., we could either:
 * convince Calcite to do this,
 * or rewrite all {{CHAR}} types on our side in all RexNodes and RelNodes (similar to, but a bit more complicated than, what {{RelTimeIndicatorConverter}} does).
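
In Calcite terms the second bullet would be a shuttle pass over the plan. As a toy illustration of the idea only (this is not Calcite's API; {{Node}} and {{rewrite}} are hypothetical stand-ins for RexNodes and a RexShuttle), the rewrite walks the expression tree and replaces every {{CHAR(n)}} type with {{VARCHAR(n)}}:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CharToVarcharRewrite {
    // Toy stand-in for a RexNode: a node with a SQL type string and children.
    static final class Node {
        final String type;        // e.g. "CHAR(10)", "VARCHAR(10)", "INTEGER"
        final List<Node> children;
        Node(String type, List<Node> children) { this.type = type; this.children = children; }
    }

    // Recursively replace every CHAR(n) type with VARCHAR(n), analogous to
    // what a shuttle pass over all RexNodes/RelNodes would have to do.
    static Node rewrite(Node node) {
        String newType = node.type.startsWith("CHAR(")
                ? "VARCHAR(" + node.type.substring("CHAR(".length())
                : node.type;
        List<Node> newChildren = node.children.stream()
                .map(CharToVarcharRewrite::rewrite)
                .collect(Collectors.toList());
        return new Node(newType, newChildren);
    }

    public static void main(String[] args) {
        Node literal = new Node("CHAR(10)", List.of());
        Node cast = new Node("VARCHAR(20)", List.of(literal));
        Node rewritten = rewrite(cast);
        System.out.println(rewritten.children.get(0).type); // VARCHAR(10)
    }
}
```

The real implementation would additionally have to keep derived row types in RelNodes consistent with the rewritten expressions, which is where the extra complexity relative to {{RelTimeIndicatorConverter}} comes from.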

For option 2., we would need to properly support {{CHAR}} in all string functions and comparisons, with respect to padding. To make things consistent, we should establish a contract that {{CHAR}} values are internally stored either always padded or never padded (right now it is semi-random). For writing to connectors, we would need to cast all {{CHAR}} columns to {{VARCHAR}}, which would require trimming the padding.
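A minimal sketch of the "always padded" variant of such a contract (hypothetical helper names, not Flink internals): values are padded to the declared length when cast to {{CHAR}}, and trailing padding is trimmed again when casting to {{VARCHAR}} for a sink.

```java
public class CharPaddingContract {
    // Pad a value to the declared CHAR(length); a longer value is truncated,
    // which is what CAST to CHAR(n) typically does for overflow.
    static String toChar(String value, int length) {
        if (value.length() > length) {
            return value.substring(0, length);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Cast an internally padded CHAR back to VARCHAR for a connector:
    // trim the trailing padding spaces.
    static String toVarchar(String paddedChar) {
        int end = paddedChar.length();
        while (end > 0 && paddedChar.charAt(end - 1) == ' ') {
            end--;
        }
        return paddedChar.substring(0, end);
    }

    public static void main(String[] args) {
        String stored = toChar("Hi", 10);      // "Hi" plus 8 trailing spaces
        System.out.println(toVarchar(stored)); // "Hi" again, safe to write to a sink
    }
}
```

With a single contract like this, every string function and comparison knows exactly what representation it receives, instead of the current semi-random mix.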

For option 3., we would additionally need to add {{CHAR}} support to the Table API.
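Whichever of options 2 or 3 is chosen, the SQL-standard comparison semantics described in the issue boil down to padding the shorter operand with trailing spaces before comparing, while functions such as {{LENGTH}} see the value as stored. A sketch of that rule (hypothetical helpers, not Flink code):

```java
public class CharComparison {
    // SQL-standard (PAD SPACE) CHAR comparison: pad the shorter operand with
    // trailing spaces to the length of the longer one, then compare.
    static boolean charEquals(String a, String b) {
        int n = Math.max(a.length(), b.length());
        return pad(a, n).equals(pad(b, n));
    }

    static String pad(String s, int length) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Comparison ignores trailing padding...
        System.out.println(charEquals("POLAND", "POLAND  ")); // true
        // ...but a LENGTH-style function must not:
        System.out.println("POLAND  ".length()); // 8, not 6
    }
}
```

This is exactly the rule that would make {{'POLAND' = 'POLAND  '}} behave the same whether Calcite folds it at planning time or Flink evaluates it at runtime.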

> Incorrect CHAR type support in Flink SQL and Table API
> ------------------------------------------------------
>
>                 Key: FLINK-10257
>                 URL: https://issues.apache.org/jira/browse/FLINK-10257
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API &amp; SQL
>            Reporter: Piotr Nowojski
>            Priority: Critical
>
> Although we officially do not support the CHAR type, it is visible and accessible to users. First, string literals have the default type of CHAR in SQL. Second, users can always cast expressions/columns to CHAR.
> The problem is that we do not support CHAR correctly. We mishandle it in:
>  # comparisons and functions
>  # writing values to sinks
> According to the SQL standard (and as almost all other databases do), CHAR comparisons should ignore trailing white space. On the other hand, functions like {{CONCAT}} or {{LENGTH}} should not: [http://troels.arvin.dk/db/rdbms/#data_types-char] .
> Currently in Flink we completely ignore those rules. Sometimes we internally store CHAR with padded spaces, sometimes without. This results in semi-random behaviour with respect to comparisons/functions/writing to sinks. For example, the following query:
>  
> {code:java}
> tEnv.sqlQuery("SELECT CAST(s AS CHAR(10)) FROM sourceTable").insertInto("targetTable")
> env.execute()
> {code}
> where {{sourceTable}} has a single {{VARCHAR(10)}} column with the values "Hi", "Hello", "Hello world", correctly writes non-padded strings to the sink. The following query, however:
>  
> {code:java}
> tEnv.sqlQuery("SELECT * FROM (SELECT CAST(s AS CHAR(10)) c FROM sourceTable) WHERE c = 'Hi'")
>   .insertInto("targetTable")
> env.execute()
> {code}
> incorrectly filters out all of the results: {{CAST(s AS CHAR(10))}} is a no-op in Flink, while the 'Hi' constant is padded with 8 trailing spaces by Calcite before it reaches us.
> On the other hand, the following query produces strings padded with spaces:
>  
> {code:java}
> tEnv.sqlQuery("SELECT CASE l WHEN 1 THEN 'GERMANY' WHEN 2 THEN 'POLAND' ELSE 'this should not happen' END FROM sourceTable")
>   .insertInto("targetTable")
> env.execute()
> val expected = Seq(
>   "GERMANY",
>   "POLAND",
>   "POLAND").mkString("\n")
> org.junit.ComparisonFailure: Different elements in arrays: expected 3 elements and received 3
> expected: [GERMANY, POLAND, POLAND]
> received: [GERMANY , POLAND , POLAND ]
> {code}
>  
> To make matters even worse, Calcite's constant folding performs comparisons correctly, while the same comparisons performed by Flink yield different results. In other words, in SQL:
> {code:java}
> SELECT 'POLAND' = 'POLAND  '
> {code}
> returns true, but the same expression performed on columns:
> {code:java}
> SELECT CAST(country AS CHAR(10)) = CAST(country_padded AS CHAR(10)) FROM countries
> {code}
> returns false.
> To further complicate things, in SQL our string literals have the {{CHAR}} type, while in the Table API our literals have the String type (effectively {{VARCHAR}}), making results inconsistent between the two APIs.
>  
> CC [~twalthr] [~fhueske] [~hequn8128]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)