Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/20 21:49:22 UTC

[GitHub] [spark] anchovYu opened a new pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

anchovYu opened a new pull request #35915:
URL: https://github.com/apache/spark/pull/35915


   ### What changes were proposed in this pull request?
   This PR improves the "no viable alternative", "extraneous input" and "missing .. at " ANTLR error messages in ParseExceptions, as mentioned in https://issues.apache.org/jira/browse/SPARK-38456 
   
   **With this PR, all ANTLR exceptions are unified to the same error class, `PARSE_SYNTAX_ERROR`.**
   
   #### No viable alternative
   * Query
       ```sql
       select ( 
       ```
   * Before
       ```
       no viable alternative at input '('(line 1, pos 8)
       ```
   * After
       ```
       Syntax error at or near end of input(line 1, pos 8)
       ```
   
   #### Extraneous Input
   * Query
       ```sql
       CREATE TABLE my_tab(a: INT COMMENT 'test', b: STRING) USING parquet 
       ```
   * Before
       ```
       extraneous input ':' expecting {'APPLY', 'CALLED', 'CHANGES', 'CLONE', 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIONS', 'CREDENTIAL', 'CREDENTIALS', 'DEEP', 'DEFINER', 'DELTA', 'DETERMINISTIC', 'ENCRYPTION', 'EXPECT', 'FAIL', 'FILES', 'FORMAT_OPTIONS', 'HISTORY', 'INCREMENTAL', 'INPUT', 'INVOKER', 'LANGUAGE', 'LIVE', 'MATERIALIZED', 'MODIFIES', 'OPTIMIZE', 'PATTERN', 'READS', 'RESTORE', 'RETURN', 'RETURNS', 'SAMPLE', 'SCD TYPE 1', 'SCD TYPE 2', 'SECURITY', 'SEQUENCE', 'SHALLOW', 'SNAPSHOT', 'SPECIFIC', 'SQL', 'STORAGE', 'STREAMING', 'UPDATES', 'UP_TO_DATE', 'VIOLATION', 'ZORDER', 'ADD', 'AFTER', 'ALL', 'ALTER', 'ALWAYS', 'ANALYZE', 'AND', 'ANTI', 'ANY', 'ARCHIVE', 'ARRAY', 'AS', 'ASC', 'AT', 'AUTHORIZATION', 'BETWEEN', 'BOTH', 'BUCKET', 'BUCKETS', 'BY', 'CACHE', 'CASCADE', 'CASE', 'CAST', 'CATALOG', 'CATALOGS', 'CHANGE', 'CHECK', 'CLEAR', 'CLUSTER', 'CLUSTERED', 'CODE', 'CODEGEN', 'COLLATE', 'COLLECTION', 'COLUMN', 'COLUMNS', 'COMMENT', 'COMMIT', 'COMPACT', 'COMPACTIONS', 'COMPUTE', 'CONCATENATE', 'CONSTRAINT', 'COST', 'CREATE', 'CROSS', 'CUBE', 'CURRENT', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURRENT_USER', 'DAY', 'DATA', 'DATABASE', 'DATABASES', 'DATEADD', 'DATE_ADD', 'DATEDIFF', 'DATE_DIFF', 'DBPROPERTIES', 'DEFAULT', 'DEFINED', 'DELETE', 'DELIMITED', 'DESC', 'DESCRIBE', 'DFS', 'DIRECTORIES', 'DIRECTORY', 'DISTINCT', 'DISTRIBUTE', 'DIV', 'DROP', 'ELSE', 'END', 'ESCAPE', 'ESCAPED', 'EXCEPT', 'EXCHANGE', 'EXISTS', 'EXPLAIN', 'EXPORT', 'EXTENDED', 'EXTERNAL', 'EXTRACT', 'FALSE', 'FETCH', 'FIELDS', 'FILTER', 'FILEFORMAT', 'FIRST', 'FN', 'FOLLOWING', 'FOR', 'FOREIGN', 'FORMAT', 'FORMATTED', 'FROM', 'FULL', 'FUNCTION', 'FUNCTIONS', 'GENERATED', 'GLOBAL', 'GRANT', 'GRANTS', 'GROUP', 'GROUPING', 'HAVING', 'HOUR', 'IDENTITY', 'IF', 'IGNORE', 'IMPORT', 'IN', 'INCREMENT', 'INDEX', 'INDEXES', 'INNER', 'INPATH', 'INPUTFORMAT', 'INSERT', 'INTERSECT', 'INTERVAL', 'INTO', 'IS', 'ITEMS', 'JOIN', 'KEY', 'KEYS', 'LAST', 'LATERAL', 'LAZY', 'LEADING', 'LEFT', 'LIKE', 'ILIKE', 'LIMIT', 'LINES', 'LIST', 'LOAD', 'LOCAL', 'LOCATION', 'LOCK', 'LOCKS', 'LOGICAL', 'MACRO', 'MAP', 'MATCHED', 'MERGE', 'MINUTE', 'MONTH', 'MSCK', 'NAMESPACE', 'NAMESPACES', 'NATURAL', 'NO', NOT, 'NULL', 'NULLS', 'OF', 'ON', 'ONLY', 'OPTION', 'OPTIONS', 'OR', 'ORDER', 'OUT', 'OUTER', 'OUTPUTFORMAT', 'OVER', 'OVERLAPS', 'OVERLAY', 'OVERWRITE', 'PARTITION', 'PARTITIONED', 'PARTITIONS', 'PERCENTILE_CONT', 'PERCENT', 'PIVOT', 'PLACING', 'POSITION', 'PRECEDING', 'PRIMARY', 'PRINCIPALS', 'PROPERTIES', 'PROVIDER', 'PROVIDERS', 'PURGE', 'QUALIFY', 'QUERY', 'RANGE', 'RECIPIENT', 'RECIPIENTS', 'RECORDREADER', 'RECORDWRITER', 'RECOVER', 'REDUCE', 'REFERENCES', 'REFRESH', 'REMOVE', 'RENAME', 'REPAIR', 'REPEATABLE', 'REPLACE', 'REPLICAS', 'RESET', 'RESPECT', 'RESTRICT', 'REVOKE', 'RIGHT', RLIKE, 'ROLE', 'ROLES', 'ROLLBACK', 'ROLLUP', 'ROW', 'ROWS', 'SECOND', 'SCHEMA', 'SCHEMAS', 'SELECT', 'SEMI', 'SEPARATED', 'SERDE', 'SERDEPROPERTIES', 'SESSION_USER', 'SET', 'MINUS', 'SETS', 'SHARE', 'SHARES', 'SHOW', 'SKEWED', 'SOME', 'SORT', 'SORTED', 'START', 'STATISTICS', 'STORED', 'STRATIFY', 'STRUCT', 'SUBSTR', 'SUBSTRING', 'SYNC', 'SYSTEM_TIME', 'SYSTEM_VERSION', 'TABLE', 'TABLES', 'TABLESAMPLE', 'TBLPROPERTIES', TEMPORARY, 'TERMINATED', 'THEN', 'TIME', 'TIMESTAMP', 'TIMESTAMPADD', 'TIMESTAMPDIFF', 'TO', 'TOUCH', 'TRAILING', 'TRANSACTION', 'TRANSACTIONS', 'TRANSFORM', 'TRIM', 'TRUE', 'TRUNCATE', 'TRY_CAST', 'TYPE', 'UNARCHIVE', 'UNBOUNDED', 'UNCACHE', 'UNION', 'UNIQUE', 'UNKNOWN', 'UNLOCK', 'UNSET', 'UPDATE', 'USE', 'USER', 'USING', 'VALUES', 'VERSION', 'VIEW', 'VIEWS', 'WHEN', 'WHERE', 'WINDOW', 'WITH', 'WITHIN', 'YEAR', 'ZONE', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 21)
       ```
   * After
       ```
       Syntax error at or near ':': extra input ':'(line 1, pos 21)
       ```
   
   #### Missing token
   * Query
       ```sql
       select count(a from b 
       ```
   * Before
       ```
       missing ')' at 'from'(line 2, pos 0)
       ```
   * After
       ```
       Syntax error at or near 'from': missing ')'(line 2, pos 0)
       ```
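
   The three "After" messages above share one shape: the offending token, an optional hint, and the position. A minimal Scala sketch of that shape follows. `SyntaxErrorMessages` and `format` are hypothetical names used only for illustration; this is not the code in this PR, which goes through the `PARSE_SYNTAX_ERROR` error class.

   ```scala
   // Hypothetical helper, not the implementation in this PR: it only
   // reproduces the message shapes shown in the examples above.
   object SyntaxErrorMessages {
     /** Builds the unified message; pass an empty `hint` when there is nothing to add. */
     def format(offending: String, hint: String, line: Int, pos: Int): String = {
       val suffix = if (hint.isEmpty) "" else s": $hint"
       s"Syntax error at or near $offending$suffix(line $line, pos $pos)"
     }
   }
   ```

   For example, `SyntaxErrorMessages.format("'from'", "missing ')'", 2, 0)` yields the "Missing token" message shown above.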
   
   ### Why are the changes needed?
   The description of https://issues.apache.org/jira/browse/SPARK-38384 states the reason for the change.
   TL;DR: the ParseException error messages that come directly from ANTLR are not user-friendly, and we want to improve them.
   
   ### Does this PR introduce _any_ user-facing change?
   If error message changes are considered user-facing, then yes.
   Example cases are listed at the top of this PR description.
   
   
   ### How was this patch tested?
   Local unit test.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35915:
URL: https://github.com/apache/spark/pull/35915#issuecomment-1074128596


   Can one of the admins verify this patch?




[GitHub] [spark] cloud-fan commented on a change in pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35915:
URL: https://github.com/apache/spark/pull/35915#discussion_r831117075



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SparkParserErrorStrategy.scala
##########
@@ -49,6 +48,10 @@ class SparkRecognitionException(
       },
       Some(errorClass),
       messageParameters)
+
+  /** Construct with pure errorClass and messageParameter information.  */
+  def this(errorClass: String, messageParameters: Array[String]) =

Review comment:
       Does `SparkRecognitionException` still need to take the error class name as a parameter if it's always `PARSE_SYNTAX_ERROR`?






[GitHub] [spark] anchovYu commented on a change in pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
anchovYu commented on a change in pull request #35915:
URL: https://github.com/apache/spark/pull/35915#discussion_r831316199



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SparkParserErrorStrategy.scala
##########
@@ -49,6 +48,10 @@ class SparkRecognitionException(
       },
       Some(errorClass),
       messageParameters)
+
+  /** Construct with pure errorClass and messageParameter information.  */
+  def this(errorClass: String, messageParameters: Array[String]) =

Review comment:
       That's a good question. I left it in case we want to create more syntax error types in the future, or handle the remaining exceptions (there should still be at least one) differently from the current `PARSE_SYNTAX_ERROR`.
   I think I can keep this parameter but give it a default value, so existing calls don't need to pass `PARSE_SYNTAX_ERROR` explicitly. Thanks.
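
   A rough sketch of the default-value idea, under stated assumptions: `SketchRecognitionException` is a simplified, hypothetical stand-in (the real `SparkRecognitionException` extends ANTLR's `RecognitionException` and takes more parameters), and `errorClass` is moved to the last position here so that callers can simply omit it.

   ```scala
   // Simplified, hypothetical stand-in for illustration only; it is not the
   // class in SparkParserErrorStrategy.scala.
   class SketchRecognitionException(
       val messageParameters: Array[String],
       val errorClass: String = "PARSE_SYNTAX_ERROR") // default spares callers the argument
     extends Exception(s"[$errorClass] ${messageParameters.mkString(", ")}")
   ```

   Existing call sites could then write `new SketchRecognitionException(Array("'from'"))`, while a future error type would pass `errorClass = ...` explicitly.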








[GitHub] [spark] anchovYu commented on pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
anchovYu commented on pull request #35915:
URL: https://github.com/apache/spark/pull/35915#issuecomment-1073388348


   @cloud-fan @MaxGekk would you take a look? Thanks!




[GitHub] [spark] cloud-fan commented on pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35915:
URL: https://github.com/apache/spark/pull/35915#issuecomment-1075314647


   thanks, merging to master/3.3!




[GitHub] [spark] cloud-fan closed pull request #35915: [SPARK-38456][SQL] Improve error messages of no viable alternative, extraneous input and missing token in ParseException

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #35915:
URL: https://github.com/apache/spark/pull/35915


   

