Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/29 07:44:42 UTC

[GitHub] [spark] cloud-fan commented on a change in pull request #29146: [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command

cloud-fan commented on a change in pull request #29146:
URL: https://github.com/apache/spark/pull/29146#discussion_r461323124



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##########
@@ -58,6 +58,9 @@ class SparkSqlParser(conf: SQLConf) extends AbstractSqlParser(conf) {
 class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
   import org.apache.spark.sql.catalyst.parser.ParserUtils._
 
+  private def normalizeConfigString(s: String) =

Review comment:
      I checked `tableIdentifier`; it includes `BACKQUOTED_IDENTIFIER`, and in `AstBuilder` we just call `xxx.getText`. Why doesn't `getText` work here?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##########
@@ -58,6 +58,9 @@ class SparkSqlParser(conf: SQLConf) extends AbstractSqlParser(conf) {
 class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
   import org.apache.spark.sql.catalyst.parser.ParserUtils._
 
+  private def normalizeConfigString(s: String) =

Review comment:
      I found it. See `PostProcessor.exitQuotedIdentifier`. `quotedIdentifier` is a special parser rule that has a post-processor to remove the back-quotes.
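   To illustrate the point above: the effect of that post-processing step can be sketched in plain Scala. This is only an illustration, not Spark's actual implementation; the object and method names (`BackQuoteDemo`, `stripBackQuotes`) are made up here.

   ```scala
   object BackQuoteDemo {
     // Strip the surrounding back-quotes from a back-quoted identifier and
     // un-escape doubled back-quotes (inside such an identifier, a literal
     // ` is written as ``). This mimics what a back-quote post-processor does.
     def stripBackQuotes(raw: String): String = {
       require(raw.length >= 2 && raw.head == '`' && raw.last == '`',
         s"not a back-quoted identifier: $raw")
       raw.substring(1, raw.length - 1).replace("``", "`")
     }
   }
   ```

   For example, `stripBackQuotes("`+"`"+`"a``b`+"`"+`")` yields `a`+"`"+`b`, which is why `getText` on the raw token would still contain the back-quotes while the post-processed rule does not.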

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -244,11 +244,19 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET configKey (EQ value=.+)?                                     #setConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET configKey?                                                 #resetConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;
 
+configKey
+    : IDENTIFIER (('.' | ':') IDENTIFIER)*
+    | quotedIdentifier
+    | STRING

Review comment:
      On second thought, I'm wondering if we should support STRING. A table identifier can't be a string literal, so maybe it's better to stay consistent, since we already support `quotedIdentifier`.

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -244,11 +244,19 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET configKey (EQ value=.+)?                                     #setConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET configKey?                                                 #resetConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;
 
+configKey
+    : IDENTIFIER (('.' | ':') IDENTIFIER)*

Review comment:
       maybe we should replace `IDENTIFIER (('.' | ':') IDENTIFIER)*` with
   ```
   ( ~' ' | ~'=' )+
   ```

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -244,11 +244,19 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET configKey (EQ value=.+)?                                     #setConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET configKey?                                                 #resetConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;
 
+configKey
+    : IDENTIFIER (('.' | ':') IDENTIFIER)*

Review comment:
       It's because `IDENTIFIER` doesn't include keywords.
   
   maybe we should replace `IDENTIFIER (('.' | ':') IDENTIFIER)*` with
   ```
   ( ~' ' | ~'=' )+
   ```

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -244,11 +244,19 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET configKey (EQ value=.+)?                                     #setConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET configKey?                                                 #resetConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;
 
+configKey
+    : IDENTIFIER (('.' | ':') IDENTIFIER)*

Review comment:
       or replace `IDENTIFIER` with `IDENTIFIER | nonReserved | strictNonReserved`
   
   `nonReserved | strictNonReserved` matches all the keywords

##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -244,11 +244,22 @@ statement
     | SET TIME ZONE interval                                           #setTimeZone
     | SET TIME ZONE timezone=(STRING | LOCAL)                          #setTimeZone
     | SET TIME ZONE .*?                                                #setTimeZone
+    | SET configureKey (EQ value=.+)?                                  #setConfiguration
     | SET .*?                                                          #setConfiguration
+    | RESET configureKey?                                              #resetConfiguration
     | RESET .*?                                                        #resetConfiguration
     | unsupportedHiveNativeCommands .*?                                #failNativeCommand
     ;
 
+configureKey

Review comment:
      `configure` is a verb; use `configKey` or `configurationKey` instead.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
##########
@@ -66,17 +66,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * character in the raw string.
    */
   override def visitSetConfiguration(ctx: SetConfigurationContext): LogicalPlan = withOrigin(ctx) {
-    // Construct the command.
-    val raw = remainder(ctx.SET.getSymbol)
-    val keyValueSeparatorIndex = raw.indexOf('=')
-    if (keyValueSeparatorIndex >= 0) {
-      val key = raw.substring(0, keyValueSeparatorIndex).trim
-      val value = raw.substring(keyValueSeparatorIndex + 1).trim
-      SetCommand(Some(key -> Option(value)))
-    } else if (raw.nonEmpty) {
-      SetCommand(Some(raw.trim -> None))
+    if (ctx.configureKey() != null) {
+      val keyStr = ctx.configureKey().getText
+      if (ctx.value != null) {
+        SetCommand(Some(keyStr -> Option(ctx.value.getText)))
+      } else {
+        SetCommand(Some(keyStr -> None))
+      }
     } else {
-      SetCommand(None)
+      remainder(ctx.SET().getSymbol).trim match {
+        case "-v" => SetCommand(Some("-v" -> None))
+        case s if s.isEmpty => SetCommand(None)
+        case _ => throw new ParseException("Expected format is 'SET', 'SET key', or " +
+          "'SET key=value'. If you want to include special characters in key and value, " +

Review comment:
       `in key and value` -> `in key`
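   The fallback branch in the rewrite quoted above can be sketched in plain Scala, without the ANTLR context objects. This is a simplified illustration, not Spark's actual code: `SetFallbackDemo`, `parseFallback`, and the result types are names invented here, and `remainder` stands in for the raw text after the SET keyword.

   ```scala
   object SetFallbackDemo {
     sealed trait SetResult
     case object ListAll extends SetResult                  // bare `SET`
     case object ListVerbose extends SetResult              // `SET -v`
     final case class Invalid(msg: String) extends SetResult

     // Mirror of the fallback logic: anything that reaches the `SET .*?`
     // alternative and is neither empty nor `-v` is an explicit error.
     def parseFallback(remainder: String): SetResult = remainder.trim match {
       case "-v"             => ListVerbose
       case s if s.isEmpty   => ListAll
       case _ => Invalid(
         "Expected format is 'SET', 'SET key', or 'SET key=value'. " +
         "If you want to include special characters in key, " +
         "please use quotes, e.g., SET `ke y`=value.")
     }
   }
   ```

   With this shape, `SET spark.sql.key value` falls through to `Invalid` with the explicit message instead of being silently split on the first `=`.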

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
##########
@@ -61,6 +61,39 @@ class SparkSqlParserSuite extends AnalysisTest {
   private def intercept(sqlCommand: String, messages: String*): Unit =
     interceptParseException(parser.parsePlan)(sqlCommand, messages: _*)
 
+  test("Report Error for invalid usage of SET command") {
+    assertEqual("SET", SetCommand(None))
+    assertEqual("SET -v", SetCommand(Some("-v", None)))
+    assertEqual("SET spark.sql.key", SetCommand(Some("spark.sql.key" -> None)))
+    assertEqual("SET spark:sql:key=false", SetCommand(Some("spark:sql:key" -> Some("false"))))
+    assertEqual("SET spark.sql.    key=value",

Review comment:
       This is a bit weird. Is it possible to forbid it?

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
##########
@@ -61,6 +61,39 @@ class SparkSqlParserSuite extends AnalysisTest {
   private def intercept(sqlCommand: String, messages: String*): Unit =
     interceptParseException(parser.parsePlan)(sqlCommand, messages: _*)
 
+  test("Report Error for invalid usage of SET command") {
+    assertEqual("SET", SetCommand(None))
+    assertEqual("SET -v", SetCommand(Some("-v", None)))
+    assertEqual("SET spark.sql.key", SetCommand(Some("spark.sql.key" -> None)))
+    assertEqual("SET spark:sql:key=false", SetCommand(Some("spark:sql:key" -> Some("false"))))
+    assertEqual("SET spark.sql.    key=value",

Review comment:
      I think a table name is fine, but a config name is really weird here. It's kind of a silent behavior change, since `SET spark.sql.    key=value` used to set the config `spark.sql.    key`.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
##########
@@ -61,6 +61,39 @@ class SparkSqlParserSuite extends AnalysisTest {
   private def intercept(sqlCommand: String, messages: String*): Unit =
     interceptParseException(parser.parsePlan)(sqlCommand, messages: _*)
 
+  test("Report Error for invalid usage of SET command") {
+    assertEqual("SET", SetCommand(None))
+    assertEqual("SET -v", SetCommand(Some("-v", None)))
+    assertEqual("SET spark.sql.key", SetCommand(Some("spark.sql.key" -> None)))
+    assertEqual("SET spark:sql:key=false", SetCommand(Some("spark:sql:key" -> Some("false"))))
+    assertEqual("SET spark.sql.    key=value",

Review comment:
       https://www.antlr3.org/pipermail/antlr-interest/2007-October/024030.html
   
   One way is to look at the hidden channel after parsing `configIdentifier`, and see if the next token is `WS`.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala
##########
@@ -61,6 +63,53 @@ class SparkSqlParserSuite extends AnalysisTest {
   private def intercept(sqlCommand: String, messages: String*): Unit =
     interceptParseException(parser.parsePlan)(sqlCommand, messages: _*)
 
+  test("Checks if SET/RESET can accept all the configurations") {
+    // Force to build static SQL configurations
+    StaticSQLConf
+    SQLConf.sqlConfEntries.values.asScala.foreach { e =>
+      assertEqual(s"SET ${e.key}", SetCommand(Some(e.key -> None)))
+      assertEqual(s"RESET ${e.key}", ResetCommand(Some(e.key)))
+    }
+  }
+
+  test("Report Error for invalid usage of SET command") {
+    assertEqual("SET", SetCommand(None))
+    assertEqual("SET -v", SetCommand(Some("-v", None)))
+    assertEqual("SET spark.sql.key", SetCommand(Some("spark.sql.key" -> None)))
+    assertEqual("SET spark:sql:key=false", SetCommand(Some("spark:sql:key" -> Some("false"))))
+    assertEqual("SET `spark.sql.    key`=value",
+      SetCommand(Some("spark.sql.    key" -> Some("value"))))
+
+    val expectedErrMsg = "Expected format is 'SET', 'SET key', or " +
+      "'SET key=value'. If you want to include special characters in key, " +
+      "please use quotes, e.g., SET `ke y`=value."
+    intercept("SET spark.sql.key value", expectedErrMsg)
+    intercept("SET spark.sql.key   'value'", expectedErrMsg)
+    intercept("SET    spark.sql.key \"value\" ", expectedErrMsg)
+    intercept("SET spark.sql.key value1 value2", expectedErrMsg)
+
+    intercept("SET spark.sql.   key=value", "no viable alternative at input '.'")
+    intercept("SET spark.sql   :key=value", "rule configKey failed predicate")
+    intercept("SET spark.sql .  key=value", "rule configKey failed predicate")

Review comment:
      I thought it would fall back to `SET .*?` and give a nice error message.

    We can improve it later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org