You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "James Starr (Jira)" <ji...@apache.org> on 2021/04/17 05:04:00 UTC

[jira] [Updated] (CALCITE-4493) Postgres String Literal Equality Creates incorrect results.

     [ https://issues.apache.org/jira/browse/CALCITE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Starr updated CALCITE-4493:
---------------------------------
    Description: 
Calcite always interprets String Literals as CHAR type.  This is normally not an issue since if are literal is being compared to VARCHAR column, then the literal is cast to a VARCHAR.  VARCHAR and CHAR equality have slight different behavior.  CHAR ignores trailing whitespace when comparing, where VARCHAR does not.  Postgres defaults the a strings literals type as unkown in a given expression, but will default them to text if there evaluated to row.

Postgres treats string literals types as unknown or TEXT.  Postgres TEXT follows VARCHAR semantics.

RelToSqlConverterTest.java
{code:java}
@Test void testDefaultTypeStringLiteralType() {
  RuleSet rules = RuleSets.ofList(CoreRules.PROJECT_VALUES_MERGE, CoreRules.PROJECT_REDUCE_EXPRESSIONS);
  HepProgramBuilder builder = new HepProgramBuilder();
  builder.addRuleClass(ProjectToWindowRule.class);
  HepPlanner hepPlanner = new HepPlanner(builder.build());
  sql("SELECT 'foo ' = 'foo'")
      .optimize(rules, hepPlanner)
      .withPostgresql().ok("SELECT *\nFROM (VALUES (TRUE)) AS \"t\" (\"EXPR$0\")");
}
{code}
 This is generating incorrect results since postgres would treat the literals TEXT which follow the VARCHAR semantics and treats trailing whitespace as significant.
{code:sql}
postgres=# SELECT pg_typeof(a), pg_typeof('b') FROM (VALUES('foo')) AS t(a);
 pg_typeof | pg_typeof 
-----------+-----------
 text      | unknown
(1 row)
{code}
 
{code:sql}
postgres=# SELECT 'foo ' = 'foo';
 ?column? 
----------
 f
(1 row)
{code}

Concerning other dialects, I ran the following query:
{code:sql}
SELECT
case when t.a = t.b then 'true' else 'false' end as example1, --example 1 No Type Value literal equality
case when 'foo' = 'foo ' then 'true' else 'false' end as example2, --example 2 No Type literal comparison
case when t.a = 'foo ' then 'true' else 'false' end as example3, --example 3 No Type Value to No Type Literal
case when t.c = 'foo ' then 'true' else 'false' end as example4, --example 4 Typed Value to No Type Literal
CASE WHEN t.b = CAST('foo' AS CHAR(3)) then 'true' else 'false' end as example5, --example 5 No Type Value to Typed Literal
'|' || CAST('a ' AS VARCHAR(2)) || '|' as example6
FROM (
        select 'foo' a, 'foo ' b, CAST('foo' AS CHAR(3)) c --FROM DUAL 
) t
;
{code}


DB
a = b
‘foo’ = ‘foo ‘
a = ‘foo ‘
c = ‘foo ‘
b = CAST(‘foo’ AS CHAR(3))
 CAST(‘a ‘ TO VARCHAR(2)
Dremio
false
false
false
false
false
|a |
Postgres
false
false
false
true
false
|a |
Oracle
false
false
false
true
true
|a |
MySQL
false
false
false
false
false
|a |
SQL Server
true
true
true
true
true
|a |
Snowflake
false
false
false
false
false
|a |
ISO SQL
true
true
true
true
true
|a|



  was:
Calcite always interprets String Literals as CHAR type.  This is normally not an issue since if are literal is being compared to VARCHAR column, then the literal is cast to a VARCHAR.  VARCHAR and CHAR equality have slight different behavior.  CHAR ignores trailing whitespace when comparing, where VARCHAR does not.  Postgres defaults the a strings literals type as unkown in a given expression, but will default them to text if there evaluated to row.

Postgres treats string literals types as unknown or TEXT.  Postgres TEXT follows VARCHAR semantics.

RelToSqlConverterTest.java
{code:java}
@Test void testDefaultTypeStringLiteralType() {
  RuleSet rules = RuleSets.ofList(CoreRules.PROJECT_VALUES_MERGE, CoreRules.PROJECT_REDUCE_EXPRESSIONS);
  HepProgramBuilder builder = new HepProgramBuilder();
  builder.addRuleClass(ProjectToWindowRule.class);
  HepPlanner hepPlanner = new HepPlanner(builder.build());
  sql("SELECT 'foo ' = 'foo'")
      .optimize(rules, hepPlanner)
      .withPostgresql().ok("SELECT *\nFROM (VALUES (TRUE)) AS \"t\" (\"EXPR$0\")");
}
{code}
 This is generating incorrect results since postgres would treat the literals TEXT which follow the VARCHAR semantics and treats trailing whitespace as significant.
{code:sql}
postgres=# SELECT pg_typeof(a), pg_typeof('b') FROM (VALUES('foo')) AS t(a);
 pg_typeof | pg_typeof 
-----------+-----------
 text      | unknown
(1 row)
{code}
 
{code:sql}
postgres=# SELECT 'foo ' = 'foo';
 ?column? 
----------
 f
(1 row)
{code}


> Postgres String Literal Equality Creates incorrect results.
> -----------------------------------------------------------
>
>                 Key: CALCITE-4493
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4493
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: James Starr
>            Priority: Major
>
> Calcite always interprets String Literals as CHAR type.  This is normally not an issue since if are literal is being compared to VARCHAR column, then the literal is cast to a VARCHAR.  VARCHAR and CHAR equality have slight different behavior.  CHAR ignores trailing whitespace when comparing, where VARCHAR does not.  Postgres defaults the a strings literals type as unkown in a given expression, but will default them to text if there evaluated to row.
> Postgres treats string literals types as unknown or TEXT.  Postgres TEXT follows VARCHAR semantics.
> RelToSqlConverterTest.java
> {code:java}
> @Test void testDefaultTypeStringLiteralType() {
>   RuleSet rules = RuleSets.ofList(CoreRules.PROJECT_VALUES_MERGE, CoreRules.PROJECT_REDUCE_EXPRESSIONS);
>   HepProgramBuilder builder = new HepProgramBuilder();
>   builder.addRuleClass(ProjectToWindowRule.class);
>   HepPlanner hepPlanner = new HepPlanner(builder.build());
>   sql("SELECT 'foo ' = 'foo'")
>       .optimize(rules, hepPlanner)
>       .withPostgresql().ok("SELECT *\nFROM (VALUES (TRUE)) AS \"t\" (\"EXPR$0\")");
> }
> {code}
>  This is generating incorrect results since postgres would treat the literals TEXT which follow the VARCHAR semantics and treats trailing whitespace as significant.
> {code:sql}
> postgres=# SELECT pg_typeof(a), pg_typeof('b') FROM (VALUES('foo')) AS t(a);
>  pg_typeof | pg_typeof 
> -----------+-----------
>  text      | unknown
> (1 row)
> {code}
>  
> {code:sql}
> postgres=# SELECT 'foo ' = 'foo';
>  ?column? 
> ----------
>  f
> (1 row)
> {code}
> Concerning other dialects, I ran the following query:
> {code:sql}
> SELECT
> case when t.a = t.b then 'true' else 'false' end as example1, --example 1 No Type Value literal equality
> case when 'foo' = 'foo ' then 'true' else 'false' end as example2, --example 2 No Type literal comparison
> case when t.a = 'foo ' then 'true' else 'false' end as example3, --example 3 No Type Value to No Type Literal
> case when t.c = 'foo ' then 'true' else 'false' end as example4, --example 4 Typed Value to No Type Literal
> CASE WHEN t.b = CAST('foo' AS CHAR(3)) then 'true' else 'false' end as example5, --example 5 No Type Value to Typed Literal
> '|' || CAST('a ' AS VARCHAR(2)) || '|' as example6
> FROM (
>         select 'foo' a, 'foo ' b, CAST('foo' AS CHAR(3)) c --FROM DUAL 
> ) t
> ;
> {code}
> DB
> a = b
> ‘foo’ = ‘foo ‘
> a = ‘foo ‘
> c = ‘foo ‘
> b = CAST(‘foo’ AS CHAR(3))
>  CAST(‘a ‘ TO VARCHAR(2)
> Dremio
> false
> false
> false
> false
> false
> |a |
> Postgres
> false
> false
> false
> true
> false
> |a |
> Oracle
> false
> false
> false
> true
> true
> |a |
> MySQL
> false
> false
> false
> false
> false
> |a |
> SQL Server
> true
> true
> true
> true
> true
> |a |
> Snowflake
> false
> false
> false
> false
> false
> |a |
> ISO SQL
> true
> true
> true
> true
> true
> |a|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)