Posted to commits@spark.apache.org by ya...@apache.org on 2019/03/18 06:20:20 UTC

[spark] branch master updated: [SPARK-27161][SQL] improve the document of SQL keywords

This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new dbcb479  [SPARK-27161][SQL] improve the document of SQL keywords
dbcb479 is described below

commit dbcb4792f2396a31ab620210c6a8177c3b5db10a
Author: Wenchen Fan <we...@databricks.com>
AuthorDate: Mon Mar 18 15:19:52 2019 +0900

    [SPARK-27161][SQL] improve the document of SQL keywords
    
    ## What changes were proposed in this pull request?
    
    Make it clearer how Spark categorizes keywords with regard to the config `spark.sql.parser.ansi.enabled`.
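The categorization described in the new `docs/sql-keywords.md` can be summarized as a small lookup. The following is a minimal Python sketch under stated assumptions, not Spark's implementation. The keyword sets are small hand-copied subsets of the table rows in this commit (plus `WEEK`, which the new text names as non-reserved). They are not the full `ansiNonReserved` / `strictNonReserved` rules in `SqlBase.g4`. The function names `keyword_category` and `usable_as_identifier` are hypothetical. The sketch also ignores the context-sensitivity of non-reserved keywords.

```python
# Hypothetical sketch of the keyword categories in docs/sql-keywords.md.
# ASSUMPTION: the sets below are small subsets copied from the keyword table
# in this commit; they are NOT the complete lists from SqlBase.g4.

# Keywords whose default-mode category becomes strict-non-reserved in this diff.
STRICT_NON_RESERVED = {
    "ANTI", "CROSS", "EXCEPT", "FULL", "INNER", "INTERSECT", "JOIN",
    "LEFT", "NATURAL", "ON", "RIGHT", "SEMI", "UNION", "USING",
}

# Keywords the table marks as non-reserved when ANSI mode is on
# (i.e. members of `ansiNonReserved`).
ANSI_NON_RESERVED = {"ALTER", "ANALYZE", "ARCHIVE", "INSERT", "WEEK"}


def keyword_category(keyword: str, ansi_enabled: bool) -> str:
    """Return the category of a Spark SQL keyword (input assumed to be a keyword)."""
    kw = keyword.upper()
    if ansi_enabled:
        # ANSI mode: any keyword not listed in `ansiNonReserved` is reserved.
        return "non-reserved" if kw in ANSI_NON_RESERVED else "reserved"
    # Default mode: each keyword is either strict-non-reserved or non-reserved.
    return "strict-non-reserved" if kw in STRICT_NON_RESERVED else "non-reserved"


def usable_as_identifier(keyword: str, ansi_enabled: bool,
                         table_alias: bool = False) -> bool:
    """Whether the keyword may appear as an identifier (context simplified)."""
    category = keyword_category(keyword, ansi_enabled)
    if category == "reserved":
        # Reserved keywords can't be used for tables, columns, aliases, etc.
        return False
    if category == "strict-non-reserved":
        # Strict-non-reserved keywords are rejected only as table aliases.
        return not table_alias
    # Non-reserved keywords: usable outside the contexts where they are special.
    return True
```

For example, `JOIN` is reserved with the config on, and with the config off it can name a column but not serve as a table alias.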
    
    ## How was this patch tested?
    
    existing tests
    
    Closes #24093 from cloud-fan/parser.
    
    Authored-by: Wenchen Fan <we...@databricks.com>
    Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
 ...nd-non-reserved-keywords.md => sql-keywords.md} | 48 +++++++-------
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    | 75 ++++++++++++++--------
 2 files changed, 74 insertions(+), 49 deletions(-)

diff --git a/docs/sql-reserved-and-non-reserved-keywords.md b/docs/sql-keywords.md
similarity index 95%
rename from docs/sql-reserved-and-non-reserved-keywords.md
rename to docs/sql-keywords.md
index b1561fb..5ba3ad8 100644
--- a/docs/sql-reserved-and-non-reserved-keywords.md
+++ b/docs/sql-keywords.md
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
 ---
 
-In Spark SQL, there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
-special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
-column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
-table alias, but can be used as other identifiers.
+When `spark.sql.parser.ansi.enabled` is true, Spark SQL has two kinds of keywords:
+* Reserved keywords: Keywords that are reserved and can't be used as identifiers for table, view, column, function, alias, etc.
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts. For example, `SELECT 1 WEEK` is an interval literal, but `WEEK` can be used as an identifier in other places.
 
-The list of reserved and non-reserved keywords can change according to the config
-`spark.sql.parser.ansi.enabled`, which is false by default.
+When `spark.sql.parser.ansi.enabled` is false, Spark SQL has two kinds of keywords:
+* Non-reserved keywords: Same definition as the one when `spark.sql.parser.ansi.enabled=true`.
+* Strict-non-reserved keywords: A strict version of non-reserved keywords, which cannot be used as a table alias.
+
+By default `spark.sql.parser.ansi.enabled` is false.
+
+Below is a list of all the keywords in Spark SQL.
 
 <table class="table">
   <tr><th rowspan="2" style="vertical-align: middle;"><b>Keyword</b></th><th colspan="2"><b>Spark SQL</b></th><th rowspan="2" style="vertical-align: middle;"><b>SQL-2011</b></th></tr>
@@ -26,7 +30,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>ALTER</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>ANALYZE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>AND</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>ANTI</td><td>reserved</td><td>reserved</td><td>non-reserved</td></tr>
+  <tr><td>ANTI</td><td>reserved</td><td>strict-non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ANY</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>ARE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>ARCHIVE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -116,7 +120,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>COVAR_POP</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>COVAR_SAMP</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>CREATE</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>CROSS</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>CROSS</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>CUBE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>CUME_DIST</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>CURRENT</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -185,7 +189,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>ESCAPE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>ESCAPED</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>EVERY</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>EXCEPT</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>EXCEPT</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>EXCEPTION</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>EXCHANGE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>EXEC</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -215,7 +219,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>FRAME_ROW</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>FREE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>FROM</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>FULL</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>FULL</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>FUNCTION</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>FUNCTIONS</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>FUSION</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -244,7 +248,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>INDEXES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>INITIAL</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>INITIALLY</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
-  <tr><td>INNER</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>INNER</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>INOUT</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>INPATH</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>INPUT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -253,7 +257,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>INSERT</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>INT</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>INTEGER</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>INTERSECT</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>INTERSECT</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>INTERSECTION</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>INTERVAL</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>INTO</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -261,7 +265,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>ISOLATION</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ITEMS</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ITERATE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>JOIN</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>JOIN</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>JSON_ARRAY</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>JSON_ARRAYAGG</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>JSON_EXISTS</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -283,7 +287,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>LEAD</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>LEADING</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>LEAVE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>LEFT</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>LEFT</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>LEVEL</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>LIKE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>LIKE_REGEX</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -332,7 +336,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>MULTISET</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>NAMES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>NATIONAL</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>NATURAL</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>NATURAL</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>NCHAR</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>NCLOB</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>NEW</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -354,7 +358,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>OFFSET</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>OLD</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>OMIT</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
-  <tr><td>ON</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>ON</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>ONE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ONLY</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>OPEN</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
@@ -440,7 +444,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>RETURN</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>RETURNS</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>REVOKE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>RIGHT</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>RIGHT</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>RLIKE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ROLE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>ROLES</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -461,7 +465,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>SECTION</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>SEEK</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>SELECT</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>SEMI</td><td>reserved</td><td>reserved</td><td>non-reserved</td></tr>
+  <tr><td>SEMI</td><td>reserved</td><td>strict-non-reserved</td><td>non-reserved</td></tr>
   <tr><td>SENSITIVE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>SEPARATED</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>SERDE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -545,7 +549,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>UNCACHE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>UNDER</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>UNDO</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>UNION</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>UNION</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>UNIQUE</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>UNKNOWN</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>UNLOCK</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
@@ -557,7 +561,7 @@ The list of reserved and non-reserved keywords can change according to the confi
   <tr><td>USAGE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>USE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
   <tr><td>USER</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
-  <tr><td>USING</td><td>reserved</td><td>reserved</td><td>reserved</td></tr>
+  <tr><td>USING</td><td>reserved</td><td>strict-non-reserved</td><td>reserved</td></tr>
   <tr><td>VALUE</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>VALUES</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
   <tr><td>VALUE_OF</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
index be36aaa..4d02d62 100644
--- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
+++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
@@ -758,7 +758,7 @@ qualifiedName
 
 identifier
     : strictIdentifier
-    | {!ansi}? defaultReserved
+    | {!ansi}? strictNonReserved
     ;
 
 strictIdentifier
@@ -782,7 +782,16 @@ number
     | MINUS? BIGDECIMAL_LITERAL       #bigDecimalLiteral
     ;
 
-// The list of the non-reserved keywords when `spark.sql.parser.ansi.enabled` is true.
+// When `spark.sql.parser.ansi.enabled=true`, there are 2 kinds of keywords in Spark SQL.
+// - Reserved keywords:
+//     Keywords that are reserved and can't be used as identifiers for table, view, column,
+//     function, alias, etc.
+// - Non-reserved keywords:
+//     Keywords that have a special meaning only in particular contexts and can be used as
+//     identifiers in other contexts. For example, `SELECT 1 WEEK` is an interval literal, but WEEK
+//     can be used as an identifier in other places.
+// You can find the full keywords list by searching "Start of the keywords list" in this file.
+// The non-reserved keywords are listed below. Keywords not in this list are reserved keywords.
 ansiNonReserved
     : ADD
     | AFTER
@@ -961,7 +970,16 @@ ansiNonReserved
     | YEARS
     ;
 
-defaultReserved
+// When `spark.sql.parser.ansi.enabled=false`, there are 2 kinds of keywords in Spark SQL.
+// - Non-reserved keywords:
+//     Same definition as the one when `spark.sql.parser.ansi.enabled=true`.
+// - Strict-non-reserved keywords:
+//     A strict version of non-reserved keywords, which cannot be used as a table alias.
+// You can find the full keywords list by searching "Start of the keywords list" in this file.
+// The strict-non-reserved keywords are listed in `strictNonReserved`.
+// The non-reserved keywords are listed in `nonReserved`.
+// These 2 together contain all the keywords.
+strictNonReserved
     : ANTI
     | CROSS
     | EXCEPT
@@ -1215,6 +1233,9 @@ nonReserved
     | YEARS
     ;
 
+//============================
+// Start of the keywords list
+//============================
 SELECT: 'SELECT';
 FROM: 'FROM';
 ADD: 'ADD';
@@ -1350,37 +1371,13 @@ IGNORE: 'IGNORE';
 BOTH: 'BOTH';
 LEADING: 'LEADING';
 TRAILING: 'TRAILING';
-
 IF: 'IF';
 POSITION: 'POSITION';
 EXTRACT: 'EXTRACT';
-
-EQ  : '=' | '==';
-NSEQ: '<=>';
-NEQ : '<>';
-NEQJ: '!=';
-LT  : '<';
-LTE : '<=' | '!>';
-GT  : '>';
-GTE : '>=' | '!<';
-
-PLUS: '+';
-MINUS: '-';
-ASTERISK: '*';
-SLASH: '/';
-PERCENT: '%';
-DIV: 'DIV';
-TILDE: '~';
-AMPERSAND: '&';
-PIPE: '|';
-CONCAT_PIPE: '||';
-HAT: '^';
-
 PERCENTLIT: 'PERCENT';
 BUCKET: 'BUCKET';
 OUT: 'OUT';
 OF: 'OF';
-
 SORT: 'SORT';
 CLUSTER: 'CLUSTER';
 DISTRIBUTE: 'DISTRIBUTE';
@@ -1487,6 +1484,30 @@ SESSION_USER: 'SESSION_USER';
 SOME: 'SOME';
 UNIQUE: 'UNIQUE';
 USER: 'USER';
+//============================
+// End of the keywords list
+//============================
+
+EQ  : '=' | '==';
+NSEQ: '<=>';
+NEQ : '<>';
+NEQJ: '!=';
+LT  : '<';
+LTE : '<=' | '!>';
+GT  : '>';
+GTE : '>=' | '!<';
+
+PLUS: '+';
+MINUS: '-';
+ASTERISK: '*';
+SLASH: '/';
+PERCENT: '%';
+DIV: 'DIV';
+TILDE: '~';
+AMPERSAND: '&';
+PIPE: '|';
+CONCAT_PIPE: '||';
+HAT: '^';
 
 STRING
     : '\'' ( ~('\''|'\\') | ('\\' .) )* '\''

