Posted to issues@spark.apache.org by "Xinyi Yu (Jira)" <ji...@apache.org> on 2022/03/02 05:26:00 UTC

[jira] [Created] (SPARK-38384) Improve error messages of ParseException from ANTLR

Xinyi Yu created SPARK-38384:
--------------------------------

             Summary: Improve error messages of ParseException from ANTLR
                 Key: SPARK-38384
                 URL: https://issues.apache.org/jira/browse/SPARK-38384
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Xinyi Yu


This task aims to improve the ParseException error messages that come directly from ANTLR.
h2. Bad Error Messages

Many error messages defined in ANTLR are not user-friendly. For example,
{code:java}
spark.sql("sel 1")
 
ParseException: 
mismatched input 'sel' expecting {'(', 'APPLY', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'SYNC', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)
 
== SQL ==
sel 1
^^^ {code}
Measured against the [Spark Error Message Guidelines|https://spark.apache.org/error-message-guidelines.html], this message is vague and hard to follow: it states the ‘What’ but is unclear on the ‘Why’ and ‘How’.

Another example:
{code:java}
spark.sql("") // empty query

ParseException: 
mismatched input '<EOF>' expecting {'(', 'CONVERT', 'COPY', 'OPTIMIZE', 'RESTORE', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

== SQL ==

^^^ {code}
Instead of simply telling users that the query is empty, it prints a long list of expected tokens and even exposes the parser jargon '<EOF>'.
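
As a rough illustration of the kind of special-casing this suggests, the sketch below detects the empty-query case and swaps in a short message. It is plain Python rather than Spark code; the function name {{friendlier_message}} and the replacement wording are hypothetical, and the actual wording is left to the sub-tasks.

```python
def friendlier_message(antlr_message: str, sql: str) -> str:
    """Hypothetical rewrite step; the replacement wording is illustrative only."""
    # In the example above, an empty query makes ANTLR report '<EOF>' as the
    # mismatched input, so both conditions are checked before rewriting.
    if sql.strip() == "" and "'<EOF>'" in antlr_message:
        return "Syntax error: the SQL query is empty."
    # Every other message is passed through unchanged in this sketch.
    return antlr_message
```

Only the empty-query case is handled here; other messages would need their own rules.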
h2. Where do these error messages come from?

There has been much work on improving ParseException in general (see [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala] for example). However, many error messages like the ones above are defined in ANTLR and pass through Spark unmodified.

When ANTLR encounters such an error, it notifies the exception listener with a message like ‘mismatched input {} expecting {}’. The Spark exception listener _appends_ the line and position to the message, along with the problematic SQL and a ‘^^^’ marker under the error position, and then throws a ParseException with the resulting message. Spark does not modify the message text that ANTLR provides.
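
The flow described above can be sketched in a few lines. This is a simplified, self-contained Python illustration, not Spark's actual code: the function name {{format_parse_error}} and its parameters are hypothetical, and the layout is inferred from the sample output earlier in this description.

```python
def format_parse_error(antlr_message: str, sql: str, line: int, pos: int) -> str:
    """Hypothetical sketch: wrap an ANTLR message without changing its text."""
    sql_lines = sql.split("\n")
    # Place the '^^^' marker under the offending line (1-based), indented to
    # the reported character position.
    marked = sql_lines[:line] + [" " * pos + "^^^"] + sql_lines[line:]
    # The ANTLR text is kept verbatim; only the position and the SQL are appended.
    header = "\n" + antlr_message + "(line " + str(line) + ", pos " + str(pos) + ")\n"
    return header + "\n== SQL ==\n" + "\n".join(marked)


# The expected-token list is abbreviated here for readability.
print(format_parse_error("mismatched input 'sel' expecting {'SELECT'}", "sel 1", 1, 0))
```

Because the ANTLR text is passed through untouched, any improvement to the wording has to happen before or instead of this appending step.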

This task focuses on those error messages from ANTLR.
h2. Goals
 # Improve the error messages of ParseException that come from ANTLR, and modify all affected test cases accordingly.
 # Make sure the new error message framework is applied in this change.

h2. Proposed Error Messages Change

The proposed changes belong in each sub-task and should include concrete before-and-after cases. See the description of each sub-task for more details.

 



