Posted to issues@spark.apache.org by "Max Gekk (Jira)" <ji...@apache.org> on 2023/03/20 16:13:00 UTC
[jira] [Created] (SPARK-42873) Define Spark SQL types as keywords
Max Gekk created SPARK-42873:
--------------------------------
Summary: Define Spark SQL types as keywords
Key: SPARK-42873
URL: https://issues.apache.org/jira/browse/SPARK-42873
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk
Assignee: Max Gekk
Currently, Spark SQL defines primitive types as:
{code}
| identifier (LEFT_PAREN INTEGER_VALUE
(COMMA INTEGER_VALUE)* RIGHT_PAREN)? #primitiveDataType
{code}
where identifier is parsed later by visitPrimitiveDataType():
{code:scala}
override def visitPrimitiveDataType(ctx: PrimitiveDataTypeContext): DataType = withOrigin(ctx) {
  val dataType = ctx.identifier.getText.toLowerCase(Locale.ROOT)
  (dataType, ctx.INTEGER_VALUE().asScala.toList) match {
    case ("boolean", Nil) => BooleanType
    case ("tinyint" | "byte", Nil) => ByteType
    case ("smallint" | "short", Nil) => ShortType
    case ("int" | "integer", Nil) => IntegerType
    case ("bigint" | "long", Nil) => LongType
    case ("float" | "real", Nil) => FloatType
    ...
{code}
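The name-based resolution above can be sketched as a standalone snippet (a simplified illustration, not the actual AstBuilder code: the `TypeNameResolver` object and the string results standing in for Spark's DataType objects are hypothetical):
{code:scala}
import java.util.Locale

// Minimal sketch of the identifier-based type resolution performed by
// visitPrimitiveDataType. The names are matched case-insensitively as
// plain identifiers, which is why the types need not be lexer keywords.
object TypeNameResolver {
  def resolve(raw: String): Option[String] =
    raw.toLowerCase(Locale.ROOT) match {
      case "boolean"            => Some("BooleanType")
      case "tinyint" | "byte"   => Some("ByteType")
      case "smallint" | "short" => Some("ShortType")
      case "int" | "integer"    => Some("IntegerType")
      case "bigint" | "long"    => Some("LongType")
      case "float" | "real"     => Some("FloatType")
      case _                    => None
    }
}
{code}
Because the match happens only inside this visitor, tools that inspect the parse tree itself see just a generic identifier, not a type token.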
So, the type names are not Spark SQL keywords, which causes inconveniences when analysing or transforming the parse tree, for example when forming stable column aliases.
The Spark SQL types need to be defined as keywords in SqlBaseLexer.g4.
Also, typed literals have the same issue. The types "DATE", "TIMESTAMP_NTZ", "TIMESTAMP", "TIMESTAMP_LTZ", "INTERVAL", and "X" should be defined as base lexer tokens.
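A rough sketch of what such lexer tokens could look like in SqlBaseLexer.g4 (the token names and exact rule set here are assumptions; the real change would also have to keep these names usable as non-reserved identifiers where the grammar allows it):
{code}
// Hypothetical keyword token definitions for SqlBaseLexer.g4.
BOOLEAN: 'BOOLEAN';
TINYINT: 'TINYINT';
SMALLINT: 'SMALLINT';
INT: 'INT';
INTEGER: 'INTEGER';
BIGINT: 'BIGINT';
FLOAT: 'FLOAT';
DATE: 'DATE';
TIMESTAMP: 'TIMESTAMP';
TIMESTAMP_NTZ: 'TIMESTAMP_NTZ';
TIMESTAMP_LTZ: 'TIMESTAMP_LTZ';
INTERVAL: 'INTERVAL';
{code}
With tokens like these, the #primitiveDataType rule could reference explicit type tokens instead of a bare identifier.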
--
This message was sent by Atlassian Jira
(v8.20.10#820010)