You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by James Starr <ja...@gmail.com> on 2021/02/11 21:13:31 UTC

Supporting Postgres String Literals Default Type

I created the ticket CALCITE-4493
<https://issues.apache.org/jira/browse/CALCITE-4493>*.*


Postgres does not assume string literals are type CHAR, instead it assigns
them to unknown or TEXT.  This behavior is particularly relevant when a
literal is compared to another literal.  Postgres TEXT type from the type
system perspective is equivalent to VARCHAR.  However, it is further
complicated by a string literal compared to a CHAR is treated as CHAR.


Currently the flow for generating literal types goes:

1. Sql parsing, during SqlLiteral creation, a char type is created and
stored

2. Sql Validation, a row types are created from the SqlLiteral

3. Sql To Rel, RexBuilder.createCharLiteral creates a RexLiteral, then the
Rex Literal is cast to type of the SqlLiteral.


RexLiteral and SqlLiteral currently do not allow VARCHAR literals.


Non-starter Routes forward:

1.  During sql parsing add a cast to VARCHAR.  This will create the
incorrect behavior for example 7.

2.  Rework row type creation during validation, and it's interaction in Sql
to Rel .  Then, override RexBuilder.createCharLiteral, using the type
returned by Rexbuilder.  The type is determined contextually which
RexBuilder.createCharLiteral will not have.



Possible Routes Forward:

1.  Create a new type with 2 members: default type, down cast type.  For
this particular instance it would be VARCHAR and CHAR.  When checking for a
cast the later is used, otherwise the form is used.  This would closely
model the postgres behavior, but would also alter the type system.
2.  Add another property to SqlLiteral, default type.  This would have an
identical value as sqlTypeName except for postgres string literals, where
sqlTypeName would be CHAR and default type of VARCHAR.  RowType creation
would use the default type as well as determining if casting is required
for literals.

Example of Calcite not correctly normalizing a Postgres query using
RelToSqlConverterTest.java:

@Test void testDefaultTypeStringLiteralType() {
  RuleSet rules = RuleSets.ofList(CoreRules.PROJECT_VALUES_MERGE,
CoreRules.PROJECT_REDUCE_EXPRESSIONS);
  HepProgramBuilder builder = new HepProgramBuilder();
  builder.addRuleClass(ProjectToWindowRule.class);
  HepPlanner hepPlanner = new HepPlanner(builder.build());
  sql("SELECT 'foo ' = 'foo'")
      .optimize(rules, hepPlanner)
      .withPostgresql().ok("SELECT *\nFROM (VALUES (TRUE)) AS \"t\"
(\"EXPR$0\")");
}


Example against Postgres:

SELECT
  pg_typeof(a), --example 1 No Type Value Literal
  pg_typeof(c), --example 2 Literal cast to char(x)
  pg_typeof('b'), --example 3 No Type Literal
  a = b, --example 4 No Type Value literal equality
  'foo' = 'foo ', --example 5 No Type literal comparison
  a = 'foo ', --example 6 No Type Value to No Type Literal
  c = 'foo ', --example 7 Typed Value to No Type Literal
  b = CAST('foo' AS CHAR(3)) --example 8 No Type Value to Typed Literal
  FROM (VALUES('foo', 'foo ', CAST('foo' AS CHAR(3)))) AS t(a, b, c);


  pg_typeof | pg_typeof | pg_typeof | ?column? | ?column? | ?column? |
?column? | ?column?
 -----------+-----------+-----------+----------+----------+----------+----------+----------
  text      | character | unknown   | f        | f        | f        | t
     | f
 (1 row)