You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/25 22:17:57 UTC

[GitHub] [spark] rdblue opened a new pull request #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

rdblue opened a new pull request #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026
 
 
   ### What changes were proposed in this pull request?
   
   * Unify the create table syntax in the parser
   * Add `SerdeInfo` and `external` boolean to statement plans
   
   ### Why are the changes needed?
   
   Current behavior is confusing.
   
   ### Does this PR introduce any user-facing change?
   
   Not by default, but v2 sources will be able to handle STORED AS and other Hive clauses.
   
   ### How was this patch tested?
   
   Existing tests for current behavior, new parser tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606974026
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120659/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606769368
 
 
   You made a point that the same problem occurs for `STORED AS`/`ROW FORMAT DELIMITED`. How are we going to support them in the v2 catalog?
   
   IIRC the v2 catalog API was design based on the native CREATE TABLE syntax, i.e. provider, table properties, etc. These Hive specific syntaxes were not considered.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607416120
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973637
 
 
   **[Test build #120659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120659/testReport)** for PR 28026 at commit [`50019e6`](https://github.com/apache/spark/commit/50019e64d4271ac914d207c6f7c2c4b08716612b).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607544833
 
 
   The [last test results](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120684/testReport) were mostly tests that failed because of the syntax differences. Those tests used SQL syntax that couldn't be parsed by the DataSource CREATE rule, but also didn't specify either USING or Hive storage options. That shows that there were at least a lot of test queries silently affected by this ambiguity that were uncovered by unifying the syntax.
   
   To avoid too many changes in this PR, I changed the default to create Hive tables. That fixes most of the tests that I looked at and we'll see how the next run goes. We will eventually have to fix these tests when we swap the default, so they set the property to create Hive tables. But modifying so many tests should be done in a separate PR to keep this small and get it in more quickly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607364105
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606968414
 
 
   Build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343800
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-608535485
 
 
   Also, the option names are minor, so we may want to discuss changing those separately and get this in sooner.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607410361
 
 
   **[Test build #120684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120684/testReport)** for PR 28026 at commit [`2484059`](https://github.com/apache/spark/commit/2484059c4d0e47911821ca3d65931d24fcdb5202).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607522454
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120686/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607522454
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120686/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r399642347
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2715,24 +2729,198 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     (filtered, path)
   }
 
+  /**
+   * Create a [[SerdeInfo]] for creating tables.
+   *
+   * Format: STORED AS (name | INPUTFORMAT input_format OUTPUTFORMAT output_format)
+   */
+  override def visitCreateFileFormat(ctx: CreateFileFormatContext): SerdeInfo = withOrigin(ctx) {
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        SerdeInfo(formatClasses = Some((string(c.inFmt), string(c.outFmt))))
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        SerdeInfo(storedAs = Some(c.identifier.getText))
+      case (null, storageHandler) =>
+        operationNotAllowed("STORED BY", ctx)
+      case _ =>
+        throw new ParseException("Expected either STORED AS or STORED BY, not both", ctx)
+    }
+  }
+
+  /**
+   * Create a [[SerdeInfo]] used for creating tables.
+   *
+   * Example format:
+   * {{{
+   *   SERDE serde_name [WITH SERDEPROPERTIES (k1=v1, k2=v2, ...)]
+   * }}}
+   *
+   * OR
+   *
+   * {{{
+   *   DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
+   *   [COLLECTION ITEMS TERMINATED BY char]
+   *   [MAP KEYS TERMINATED BY char]
+   *   [LINES TERMINATED BY char]
+   *   [NULL DEFINED AS char]
+   * }}}
+   */
+  def visitRowFormat(
+      ctx: RowFormatContext): SerdeInfo = withOrigin(ctx) {
+    ctx match {
+      case serde: RowFormatSerdeContext => visitRowFormatSerde(serde)
+      case delimited: RowFormatDelimitedContext => visitRowFormatDelimited(delimited)
+    }
+  }
+
+  /**
+   * Create SERDE row format name and properties pair.
+   */
+  override def visitRowFormatSerde(ctx: RowFormatSerdeContext): SerdeInfo = withOrigin(ctx) {
+    import ctx._
+    SerdeInfo(
+      serde = Some(string(name)),
+      serdeProperties = Option(tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty))
+  }
+
+  /**
+   * Create a delimited row format properties object.
+   */
+  override def visitRowFormatDelimited(
+      ctx: RowFormatDelimitedContext): SerdeInfo = withOrigin(ctx) {
+    // Collect the entries if any.
+    def entry(key: String, value: Token): Seq[(String, String)] = {
+      Option(value).toSeq.map(x => key -> string(x))
+    }
+    // TODO we need proper support for the NULL format.
+    val entries =
+      entry("field.delim", ctx.fieldsTerminatedBy) ++
+          entry("serialization.format", ctx.fieldsTerminatedBy) ++
+          entry("escape.delim", ctx.escapedBy) ++
+          // The following typo is inherited from Hive...
+          entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+          entry("mapkey.delim", ctx.keysTerminatedBy) ++
+          Option(ctx.linesSeparatedBy).toSeq.map { token =>
+            val value = string(token)
+            validate(
+              value == "\n",
+              s"LINES TERMINATED BY only supports newline '\\n' right now: $value",
+              ctx)
+            "line.delim" -> value
+          }
+    SerdeInfo(serdeProperties = entries.toMap)
+  }
+
+  /**
+   * Throw a [[ParseException]] if the user specified incompatible SerDes through ROW FORMAT
+   * and STORED AS.
+   *
+   * The following are allowed. Anything else is not:
+   *   ROW FORMAT SERDE ... STORED AS [SEQUENCEFILE | RCFILE | TEXTFILE]
+   *   ROW FORMAT DELIMITED ... STORED AS TEXTFILE
+   *   ROW FORMAT ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
+   */
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: RowFormatContext,
+      createFileFormatCtx: CreateFileFormatContext,
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx == null || createFileFormatCtx == null) {
+      return
+    }
+    (rowFormatCtx, createFileFormatCtx.fileFormat) match {
+      case (_, ffTable: TableFileFormatContext) => // OK
+      case (rfSerde: RowFormatSerdeContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case ("sequencefile" | "textfile" | "rcfile") => // OK
+          case fmt =>
+            operationNotAllowed(
+              s"ROW FORMAT SERDE is incompatible with format '$fmt', which also specifies a serde",
+              parentCtx)
+        }
+      case (rfDelimited: RowFormatDelimitedContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case "textfile" => // OK
+          case fmt => operationNotAllowed(
+            s"ROW FORMAT DELIMITED is only compatible with 'textfile', not '$fmt'", parentCtx)
+        }
+      case _ =>
+        // should never happen
+        def str(ctx: ParserRuleContext): String = {
+          (0 until ctx.getChildCount).map { i => ctx.getChild(i).getText }.mkString(" ")
+        }
+        operationNotAllowed(
+          s"Unexpected combination of ${str(rowFormatCtx)} and ${str(createFileFormatCtx)}",
+          parentCtx)
+    }
+  }
+
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: Seq[RowFormatContext],
+      createFileFormatCtx: Seq[CreateFileFormatContext],
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) {
+      validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx)
+    }
+  }
+
   override def visitCreateTableClauses(ctx: CreateTableClausesContext): TableClauses = {
     checkDuplicateClauses(ctx.TBLPROPERTIES, "TBLPROPERTIES", ctx)
     checkDuplicateClauses(ctx.OPTIONS, "OPTIONS", ctx)
     checkDuplicateClauses(ctx.PARTITIONED, "PARTITIONED BY", ctx)
+    checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
 
 Review comment:
   already exists - see below

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gatorsmile commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-610748104
 
 
   >  Again, this doesn't accomplish much. It makes this only marginally smaller and introduces more work to get the feature done.
   
   @rdblue We already cut RC. We try to avoid adding any feature work during RC stage. Can we just fix the correctness issue and implement the new feature in Spark 3.1? Is it a must-have feature in 3.0? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608510899
 
 
   I just recalled that we tried to unify Hive table before, to support `CREATE TABLE ... USING hive`. And we already defined option names to represent Hive serde information. These names are defined in `HiveOptions`, and we re-construct the Hive Serde from these options in the rule `ResolveHiveSerdeTable`.
   
   Is it possible to unify them?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-605538309
 
 
   Same here, please describe TODO list for this PR to remove WIP.
   
   If I skimmed the code correctly, this "requires" end users to add `USING hive` even they add Hive create table specific clause. Do I understand correctly?
   
   It's OK if that's one of TODO - we should concern about documentation to make clear which clause changes the provider. (Personally I'm not in favor of this, but otherwise why not just add `HIVE` to second syntax as both are requiring end users to change their query and adding `HIVE` is clearer?)
   
   If the change is intentional, that's my favor but I may concern about removing compatibility flag, as this will force end users to migrate their query in any way.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-612921360
 
 
   You are right that the main argument is how to pass the metadata.
   
   For the EXTERNAL keyword, according to `SparkSqlAstBuilder.visitCreateHiveTable`:
   ```
   ...
   // If we are creating an EXTERNAL table, then the LOCATION field is required
   if (external && location.isEmpty) {
     operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
   }
   ...
   // If location is defined, we'll assume this is an external table.
   // Otherwise, we may accidentally delete existing data.
   val tableType = if (external || location.isDefined) {
     CatalogTableType.EXTERNAL
   } else {
      CatalogTableType.MANAGED
   }
   ```
   If we implement Hive with catalog v2, what we care about is the table type, not the existence of the EXTERNAL keyword. The table type is decided by the existence of the LOCATION keyword, so I don't think we need the extra "PROP_EXTERNAL".
   
   Another problem: How are we going to differentiate SERDEPROPERTIES, TBLPROPERTIES, and OPTIONS? How shall we tell end-users that when to use which statement?
   
   > I left a comment with a compromise for Hive-specific information but didn't get a reply.
   
   Can you post the link to your comment? Maybe I missed it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606967859
 
 
   **[Test build #120658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120658/testReport)** for PR 28026 at commit [`8bf400d`](https://github.com/apache/spark/commit/8bf400d769417b2e4085af4019571ecfcfa412e3).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973968
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25358/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606375737
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120620/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609071094
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25508/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604162661
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120376/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973963
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-613498890
 
 
   @rdblue Let's also think about the end-users. For example, currently we can say: "the EXTERNAL keyword means the table is external, and if the LOCATION is specified, it implicitly indicates EXTERNAL". If we add the "PROP_EXTERNAL", then it becomes: "the meaning of EXTERNAL depends on the catalog implementation". I think that makes Spark less user-friendly, as the behavior is less clear.
   
   The same thing applies to SERDEPROPERTIES, TBLPROPERTIES, and OPTIONS. I understand that we need to distinguish them by adding the "option." prefix, but what's the story for end-users? What they should put as OPTIONS and what for TBLPROPERTIES?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608217794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120742/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r399024977
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   The parse doesn't accept `EXTERNAL` except for hive serde tables. I don't agree that because Hive accepts it then all Spark data sources should accept it.
   
   Why not forbid EXTERNAL for hive serde tables? It can be an option as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343429
 
 
   **[Test build #120620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120620/testReport)** for PR 28026 at commit [`12a17fc`](https://github.com/apache/spark/commit/12a17fcf2189796741c3e11bab2f98c1bf996c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-615322877
 
 
   > EXTERNAL doesn't give you useful information IMO
   
   Hive allows EXTERNAL with LOCATION and without it. Spark would be preventing EXTERNAL tables with a default location in Hive, which is perfectly valid to a Hive catalog. Your opinion that it is not useful doesn't mean it isn't a feature that Hive supports. This would be Spark deciding what a specific catalog can do.
   
   > Are we going to tell end-users that these are just two different ways to specify some configs and the catalog is free to interpret them?
   
   Yes. I don't see much of a benefit to either making sure OPTIONS and TBLPROPERTIES don't conflict as long as we document how OPTIONS are added.
   
   If you're concerned about conflicts between OPTIONS and SERDEPROPERTIES, then I think it would be reasonable to ensure that only one is used in a SQL statement because these are from different flavors of CREATE TABLE. I also don't think it is likely that anyone would use both, given how nested SERDEPROPERTIES is in the parser.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398299449
 
 

 ##########
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##########
 @@ -123,33 +117,17 @@ statement
         (RESTRICT | CASCADE)?                                          #dropNamespace
     | SHOW (DATABASES | NAMESPACES) ((FROM | IN) multipartIdentifier)?
         (LIKE? pattern=STRING)?                                        #showNamespaces
-    | {!legacy_create_hive_table_by_default_enabled}?
-        createTableHeader ('(' colTypeList ')')? tableProvider?
+    | createTableHeader ('(' colTypeList ')')? tableProvider?
 
 Review comment:
   shall we put `tableProvider` into `createTableClauses` as well?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543449
 
 
   **[Test build #120699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120699/testReport)** for PR 28026 at commit [`5e8f5d0`](https://github.com/apache/spark/commit/5e8f5d01ce6e9c9eb8ab2a05dff45c73b1cbbc90).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r400550699
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2715,24 +2729,198 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     (filtered, path)
   }
 
+  /**
+   * Create a [[SerdeInfo]] for creating tables.
+   *
+   * Format: STORED AS (name | INPUTFORMAT input_format OUTPUTFORMAT output_format)
+   */
+  override def visitCreateFileFormat(ctx: CreateFileFormatContext): SerdeInfo = withOrigin(ctx) {
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        SerdeInfo(formatClasses = Some((string(c.inFmt), string(c.outFmt))))
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        SerdeInfo(storedAs = Some(c.identifier.getText))
+      case (null, storageHandler) =>
+        operationNotAllowed("STORED BY", ctx)
+      case _ =>
+        throw new ParseException("Expected either STORED AS or STORED BY, not both", ctx)
+    }
+  }
+
+  /**
+   * Create a [[SerdeInfo]] used for creating tables.
+   *
+   * Example format:
+   * {{{
+   *   SERDE serde_name [WITH SERDEPROPERTIES (k1=v1, k2=v2, ...)]
+   * }}}
+   *
+   * OR
+   *
+   * {{{
+   *   DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
+   *   [COLLECTION ITEMS TERMINATED BY char]
+   *   [MAP KEYS TERMINATED BY char]
+   *   [LINES TERMINATED BY char]
+   *   [NULL DEFINED AS char]
+   * }}}
+   */
+  def visitRowFormat(
+      ctx: RowFormatContext): SerdeInfo = withOrigin(ctx) {
+    ctx match {
+      case serde: RowFormatSerdeContext => visitRowFormatSerde(serde)
+      case delimited: RowFormatDelimitedContext => visitRowFormatDelimited(delimited)
+    }
+  }
+
+  /**
+   * Create SERDE row format name and properties pair.
+   */
+  override def visitRowFormatSerde(ctx: RowFormatSerdeContext): SerdeInfo = withOrigin(ctx) {
+    import ctx._
+    SerdeInfo(
+      serde = Some(string(name)),
+      serdeProperties = Option(tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty))
+  }
+
+  /**
+   * Create a delimited row format properties object.
+   */
+  override def visitRowFormatDelimited(
+      ctx: RowFormatDelimitedContext): SerdeInfo = withOrigin(ctx) {
+    // Collect the entries if any.
+    def entry(key: String, value: Token): Seq[(String, String)] = {
+      Option(value).toSeq.map(x => key -> string(x))
+    }
+    // TODO we need proper support for the NULL format.
+    val entries =
+      entry("field.delim", ctx.fieldsTerminatedBy) ++
+          entry("serialization.format", ctx.fieldsTerminatedBy) ++
+          entry("escape.delim", ctx.escapedBy) ++
+          // The following typo is inherited from Hive...
+          entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+          entry("mapkey.delim", ctx.keysTerminatedBy) ++
+          Option(ctx.linesSeparatedBy).toSeq.map { token =>
+            val value = string(token)
+            validate(
+              value == "\n",
+              s"LINES TERMINATED BY only supports newline '\\n' right now: $value",
+              ctx)
+            "line.delim" -> value
+          }
+    SerdeInfo(serdeProperties = entries.toMap)
+  }
+
+  /**
+   * Throw a [[ParseException]] if the user specified incompatible SerDes through ROW FORMAT
+   * and STORED AS.
+   *
+   * The following are allowed. Anything else is not:
+   *   ROW FORMAT SERDE ... STORED AS [SEQUENCEFILE | RCFILE | TEXTFILE]
+   *   ROW FORMAT DELIMITED ... STORED AS TEXTFILE
+   *   ROW FORMAT ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
+   */
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: RowFormatContext,
+      createFileFormatCtx: CreateFileFormatContext,
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx == null || createFileFormatCtx == null) {
+      return
+    }
+    (rowFormatCtx, createFileFormatCtx.fileFormat) match {
+      case (_, ffTable: TableFileFormatContext) => // OK
+      case (rfSerde: RowFormatSerdeContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case ("sequencefile" | "textfile" | "rcfile") => // OK
+          case fmt =>
+            operationNotAllowed(
+              s"ROW FORMAT SERDE is incompatible with format '$fmt', which also specifies a serde",
+              parentCtx)
+        }
+      case (rfDelimited: RowFormatDelimitedContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case "textfile" => // OK
+          case fmt => operationNotAllowed(
+            s"ROW FORMAT DELIMITED is only compatible with 'textfile', not '$fmt'", parentCtx)
+        }
+      case _ =>
+        // should never happen
+        def str(ctx: ParserRuleContext): String = {
+          (0 until ctx.getChildCount).map { i => ctx.getChild(i).getText }.mkString(" ")
+        }
+        operationNotAllowed(
+          s"Unexpected combination of ${str(rowFormatCtx)} and ${str(createFileFormatCtx)}",
+          parentCtx)
+    }
+  }
+
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: Seq[RowFormatContext],
+      createFileFormatCtx: Seq[CreateFileFormatContext],
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) {
+      validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx)
+    }
+  }
+
   override def visitCreateTableClauses(ctx: CreateTableClausesContext): TableClauses = {
     checkDuplicateClauses(ctx.TBLPROPERTIES, "TBLPROPERTIES", ctx)
     checkDuplicateClauses(ctx.OPTIONS, "OPTIONS", ctx)
     checkDuplicateClauses(ctx.PARTITIONED, "PARTITIONED BY", ctx)
+    checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
+    checkDuplicateClauses(ctx.createFileFormat, "STORED AS/BY", ctx)
+    checkDuplicateClauses(ctx.rowFormat, "ROW FORMAT", ctx)
     checkDuplicateClauses(ctx.commentSpec(), "COMMENT", ctx)
     checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
     checkDuplicateClauses(ctx.locationSpec, "LOCATION", ctx)
 
-    val partitioning: Seq[Transform] =
-      Option(ctx.partitioning).map(visitTransformList).getOrElse(Nil)
+    if (ctx.skewSpec.size > 0) {
+      operationNotAllowed("CREATE TABLE ... SKEWED BY", ctx)
+    }
+
+    val (partTransforms, partCols) =
+      Option(ctx.partitioning).map(visitPartitionFieldList).getOrElse((Nil, Nil))
     val bucketSpec = ctx.bucketSpec().asScala.headOption.map(visitBucketSpec)
     val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)
     val cleanedProperties = cleanTableProperties(ctx, properties)
     val options = Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty)
     val location = visitLocationSpecList(ctx.locationSpec())
     val (cleanedOptions, newLocation) = cleanTableOptions(ctx, options, location)
     val comment = visitCommentSpecList(ctx.commentSpec())
-    (partitioning, bucketSpec, cleanedProperties, cleanedOptions, newLocation, comment)
+
+    validateRowFormatFileFormat(ctx.rowFormat.asScala, ctx.createFileFormat.asScala, ctx)
+    val fileFormatSerdeInfo = ctx.createFileFormat.asScala.map(visitCreateFileFormat)
+    val rowFormatSerdeInfo = ctx.rowFormat.asScala.map(visitRowFormat)
+    val serdeInfo =
+      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+
+    (partTransforms, partCols, bucketSpec, cleanedProperties, cleanedOptions, newLocation, comment,
+        serdeInfo)
+  }
+
+  private def partitionExpressions(
+      partTransforms: Seq[Transform],
+      partCols: Seq[StructField],
+      ctx: ParserRuleContext): Seq[Transform] = {
+    if (partTransforms.nonEmpty) {
+      if (partCols.nonEmpty) {
+        val references = partTransforms.map(_.describe()).mkString(", ")
+        val columns = partCols.map(field => s"${field.name}: ${field.dataType}").mkString(", ")
+        throw new ParseException(
+          s"""PARTITION BY: Cannot mix partition expressions and partition columns:
+             |Expressions: $references
+             |Columns: $columns""".stripMargin, ctx)
 
 Review comment:
   @HeartSaVioR, this is where `PARTITION BY` is validated. It can be either column-referencing expressions, or column definitions but not a mix of the two.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606300393
 
 
   > Spark hides the EXTERNAL TABLE concept from users
   
   From [Hive's documentation for managed tables](https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables):
   > The default location can be overridden by the location property during table creation
   
   Hive allows `EXTERNAL` with a custom `LOCATION`, so I don't think that Spark should prevent this from being passed to catalogs. A catalog may choose to allow this. Spark can prevent its internal catalog from creating such tables, but it can't prevent them from existing.
   
   Since this is a valid configuration for a table, it must be possible to pass this through to a v2 catalog.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606974016
 
 
   **[Test build #120659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120659/testReport)** for PR 28026 at commit [`50019e6`](https://github.com/apache/spark/commit/50019e64d4271ac914d207c6f7c2c4b08716612b).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606619877
 
 
   > Hive allows EXTERNAL with a custom LOCATION, so I don't think that Spark should prevent this from being passed to catalogs.
   
   This is similar to the table option/property discussion. Hive has more features (more flexible parser) than Spark, but it doesn't mean Spark needs to accept them all.
   
   When we introduced the Spark native CREATE TABLE syntax, we evaluated the Hive syntax and decided to forbid the EXTERNAL keyword. Hive requires LOCATION to be specified if EXTERNAL exists, but also allows LOCATION to be specified even there is no EXTERNAL. So the missing feature is: creating a managed table with a custom location. This feature is weird and actually Hive metatore gives a warning message if the client creates managed tables with custom location, which implies that Hive doesn't recommend using this feature as well.
   
   I think we shouldn't put EXTERNAL into v2 API and we should probably forbid EXTERNAL for hive tables as well eventually, to be consistent with Spark native tables.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607003471
 
 
   > The create test in SparkSqlParserSuite highlighted an existing problem with spark.sql.legacy.createHiveTableByDefault.enabled. With that table property disabled (default to USING), Hive was still used to create a table because Hive's PARTITION BY syntax was used. I think this is very confusing behavior! (@HeartSaVioR probably agrees.)
   
   Sure, I totally agree. That's one of the issue I mentioned in mail thread, refer below link:
   
   https://lists.apache.org/thread.html/ra2aa31ed7cbd4e34d3504adc97cae1301cc249bfb8f95565b808b0cb%40%3Cdev.spark.apache.org%3E
   
   It's wrong if someone simply thinks that if some words (`ROW FORMAT`, `STORED AS`) exist then Hive create table will be used. There're more differences on details, including the point @rdblue pointed out here. The issue didn't exist before, as `USING` clause plays as a marker to distinguish two syntaxes.
   
   > Because of this problem, I think we should either roll back the patch to make USING the default in 3.0 since it is not applied everywhere. That, or get this syntax unification into 3.0.
   
   That's in line with my proposal as well. The ideal approach would be reverting SPARK-30098 in Spark 3.0 and make it correct with syntax unification in further version. I've also proposed some alternatives (add a marker, turn on legacy config by default) but haven't heard any feedback.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609100553
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120809/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398731723
 
 

 ##########
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##########
 @@ -123,33 +117,17 @@ statement
         (RESTRICT | CASCADE)?                                          #dropNamespace
     | SHOW (DATABASES | NAMESPACES) ((FROM | IN) multipartIdentifier)?
         (LIKE? pattern=STRING)?                                        #showNamespaces
-    | {!legacy_create_hive_table_by_default_enabled}?
-        createTableHeader ('(' colTypeList ')')? tableProvider?
+    | createTableHeader ('(' colTypeList ')')? tableProvider?
 
 Review comment:
   You're welcome to do that in a follow-up. Here, I'm avoiding unnecessary changes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604162652
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606966281
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25357/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609070981
 
 
   **[Test build #120809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120809/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606299173
 
 
   > I don't know who marked comments as resolved
   
   That was me because I thought they were fairly straight-forward and definitive answers. We can still discuss them more since that wasn't the case.
   
   > If I skimmed the code correctly, this "requires" end users to add USING hive even they add Hive create table specific clause.
   
   That's not quite right.
   
   The parser now creates a `CreateTableStatement` that contains everything that was parsed from the SQL string, with minimal interpretation. That is what the user requested, assuming that request was well-formed -- meaning that it doesn't have duplicate clauses or mixed partition fields.
   
   The next step is to convert that statement to a plan. At the moment, this only converts to v2 plans because we no longer use v1 and this was ported from our internal version. The interpretation of that statement for v1 -- converting to v1 `CreateTable` -- has not yet been added. I plan to implement that like this:
   * If `provider` is present from USING, set the `CatalogTable` provider and validate that no `SerdeInfo` is set.
   * If `SerdeInfo` is present, validate that `provider` was not set and use a Hive `CatalogTable`.
   * If neither `SerdeInfo` nor `provider` is set, use `SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED` or similar setting to set a default provider or default serde properties.
   
   Speaking of well-formed, I think it might be reasonable to ensure that either a provider or serde info is present, but not both. That would raise an exception if both `USING` and `STORED AS` or `SERDE` clauses were used. I think that's probably reasonable. In that case, we could skip the checks above. What do you think?
   
   Coming back to the current implementation for the v2 interface: Spark is passing through everything that was parsed because Spark doesn't know what kind of table the catalog is going to create. The v2 catalog that we use can create Hive tables, Spark datasource tables, and Iceberg tables, so all of the information needs to be passed through -- again, with the requirement that the SQL was well-formed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604805210
 
 
   I understand that we want to make Hive implements the v2 API eventually, but can we focus on syntax unification right now? Let's not change the behavior, for example, EXTERNAL should still be disallowed when creating native data source tables.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607415571
 
 
   **[Test build #120686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120686/testReport)** for PR 28026 at commit [`dc1b761`](https://github.com/apache/spark/commit/dc1b76107d7bbb9d4782b0a080ea037324df810d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608524608
 
 
   @cloud-fan, it would be nice, but I don't think it is a good idea. I have no problem with the option convention for formats and serde, but the delimiter properties are different with no obvious value. I don't think it is a good idea to force sources to translate delimiter options because it is going to be error-prone (just look at the mess from `colelection.delim` for example).
   
   How about using `fileFormat`, `inputFormat`, `outputFormat`, and `serde` as options, but not translating the delimiters?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607353908
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120683/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r400540742
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2715,24 +2729,198 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     (filtered, path)
   }
 
+  /**
+   * Create a [[SerdeInfo]] for creating tables.
+   *
+   * Format: STORED AS (name | INPUTFORMAT input_format OUTPUTFORMAT output_format)
+   */
+  override def visitCreateFileFormat(ctx: CreateFileFormatContext): SerdeInfo = withOrigin(ctx) {
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        SerdeInfo(formatClasses = Some((string(c.inFmt), string(c.outFmt))))
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        SerdeInfo(storedAs = Some(c.identifier.getText))
+      case (null, storageHandler) =>
+        operationNotAllowed("STORED BY", ctx)
+      case _ =>
+        throw new ParseException("Expected either STORED AS or STORED BY, not both", ctx)
+    }
+  }
+
+  /**
+   * Create a [[SerdeInfo]] used for creating tables.
+   *
+   * Example format:
+   * {{{
+   *   SERDE serde_name [WITH SERDEPROPERTIES (k1=v1, k2=v2, ...)]
+   * }}}
+   *
+   * OR
+   *
+   * {{{
+   *   DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
+   *   [COLLECTION ITEMS TERMINATED BY char]
+   *   [MAP KEYS TERMINATED BY char]
+   *   [LINES TERMINATED BY char]
+   *   [NULL DEFINED AS char]
+   * }}}
+   */
+  def visitRowFormat(
+      ctx: RowFormatContext): SerdeInfo = withOrigin(ctx) {
+    ctx match {
+      case serde: RowFormatSerdeContext => visitRowFormatSerde(serde)
+      case delimited: RowFormatDelimitedContext => visitRowFormatDelimited(delimited)
+    }
+  }
+
+  /**
+   * Create SERDE row format name and properties pair.
+   */
+  override def visitRowFormatSerde(ctx: RowFormatSerdeContext): SerdeInfo = withOrigin(ctx) {
+    import ctx._
+    SerdeInfo(
+      serde = Some(string(name)),
+      serdeProperties = Option(tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty))
+  }
+
+  /**
+   * Create a delimited row format properties object.
+   */
+  override def visitRowFormatDelimited(
+      ctx: RowFormatDelimitedContext): SerdeInfo = withOrigin(ctx) {
+    // Collect the entries if any.
+    def entry(key: String, value: Token): Seq[(String, String)] = {
+      Option(value).toSeq.map(x => key -> string(x))
+    }
+    // TODO we need proper support for the NULL format.
+    val entries =
+      entry("field.delim", ctx.fieldsTerminatedBy) ++
+          entry("serialization.format", ctx.fieldsTerminatedBy) ++
+          entry("escape.delim", ctx.escapedBy) ++
+          // The following typo is inherited from Hive...
+          entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+          entry("mapkey.delim", ctx.keysTerminatedBy) ++
+          Option(ctx.linesSeparatedBy).toSeq.map { token =>
+            val value = string(token)
+            validate(
+              value == "\n",
+              s"LINES TERMINATED BY only supports newline '\\n' right now: $value",
+              ctx)
+            "line.delim" -> value
+          }
+    SerdeInfo(serdeProperties = entries.toMap)
+  }
+
+  /**
+   * Throw a [[ParseException]] if the user specified incompatible SerDes through ROW FORMAT
+   * and STORED AS.
+   *
+   * The following are allowed. Anything else is not:
+   *   ROW FORMAT SERDE ... STORED AS [SEQUENCEFILE | RCFILE | TEXTFILE]
+   *   ROW FORMAT DELIMITED ... STORED AS TEXTFILE
+   *   ROW FORMAT ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
+   */
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: RowFormatContext,
+      createFileFormatCtx: CreateFileFormatContext,
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx == null || createFileFormatCtx == null) {
+      return
+    }
+    (rowFormatCtx, createFileFormatCtx.fileFormat) match {
+      case (_, ffTable: TableFileFormatContext) => // OK
+      case (rfSerde: RowFormatSerdeContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case ("sequencefile" | "textfile" | "rcfile") => // OK
+          case fmt =>
+            operationNotAllowed(
+              s"ROW FORMAT SERDE is incompatible with format '$fmt', which also specifies a serde",
+              parentCtx)
+        }
+      case (rfDelimited: RowFormatDelimitedContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case "textfile" => // OK
+          case fmt => operationNotAllowed(
+            s"ROW FORMAT DELIMITED is only compatible with 'textfile', not '$fmt'", parentCtx)
+        }
+      case _ =>
+        // should never happen
+        def str(ctx: ParserRuleContext): String = {
+          (0 until ctx.getChildCount).map { i => ctx.getChild(i).getText }.mkString(" ")
+        }
+        operationNotAllowed(
+          s"Unexpected combination of ${str(rowFormatCtx)} and ${str(createFileFormatCtx)}",
+          parentCtx)
+    }
+  }
+
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: Seq[RowFormatContext],
+      createFileFormatCtx: Seq[CreateFileFormatContext],
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) {
+      validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx)
+    }
+  }
+
   override def visitCreateTableClauses(ctx: CreateTableClausesContext): TableClauses = {
     checkDuplicateClauses(ctx.TBLPROPERTIES, "TBLPROPERTIES", ctx)
     checkDuplicateClauses(ctx.OPTIONS, "OPTIONS", ctx)
     checkDuplicateClauses(ctx.PARTITIONED, "PARTITIONED BY", ctx)
+    checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
 
 Review comment:
   I'll fix it. It's an artifact of porting to master.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606746750
 
 
   `EXTERNAL` is never supported in Spark native tables, and it's intentional. At least at that time, we thought `EXTERNAL` is a bad feature and Spark shouldn't have it. Currently Spark supports it only for Hive compatibility, and it's an option to make the v2 API simpler by dropping EXTERNAL completely and sacrificing Hive compatibility.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607350987
 
 
   **[Test build #120683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120683/testReport)** for PR 28026 at commit [`d7500c8`](https://github.com/apache/spark/commit/d7500c84b9ac941b73f46ae99aa5a6a9c4cfe8e8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608152053
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25440/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607410570
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120684/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-613177337
 
 
   > If we implement Hive with catalog v2 . . .
   
   Anyone can plug an implementation into the catalog API. Spark cannot dictate what those implementations do or what they support, it can only choose how the Spark catalogs behave.
   
   What if Spark decided that ROW FORMAT SERDE can only be used with SEQUENCEFILE and not TEXTFILE? That's a reasonable choice because sequence file is splittable and compressable, but it would be ridiculous for Spark to do that kind of validation that is so specific to a source. The idea to validate EXTERNAL is the same: it breaks the catalog abstraction and disallows a valid Hive configuration.
   
   If Spark chooses to enforce the LOCATION/EXTERNAL relationship, then there are correct ways to do it:
   1. Enforce this the Hive connector that ships with Spark, but pass external through the API
   2. Remove EXTERNAL from the parser and stop passing it through the API
   
   > How are we going to differentiate SERDEPROPERTIES, TBLPROPERTIES, and OPTIONS?
   
   * TBLPROPERTIES are passed as-is, with options added
   * OPTIONS are prefixed with `option.` so they can be recovered
   * SERDEPROPERTIES are prefixed with `option.` for the same reason. I used the same prefix because both OPTIONS and SERDEPROPERTIES are stored in Hive's serde properties
   
   We could be a lot more strict here. We could throw an error if both OPTIONS and SERDEPROPERTIES are set. We could also throw an error if keys in OPTIONS conflict with SERDEPROPERTIES, or if the `option.` prefix is used by TBLPROPERTIES. I don't think these rules are worth the trouble. As long as we document how these are passed, it is fine that they are synonyms. I think it is very unlikely that this would cause confusion, let alone a problem.
   
   > Can you post the link to your comment?
   
   https://github.com/apache/spark/pull/28026#issuecomment-608524608

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398300454
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   Spark hides the `EXTERNAL TABLE` concept from users, that's why we forbid `EXTERNAL` in the native CREATE TABLE syntax. I don't think we should expose the `EXTERNAL` to catalog plugins.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608152045
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607658943
 
 
   Sorry if this is answered above but why do we handle v2 APIs here together? V2 APIs are in development, and we can make the changes after Spark 3.0 too whereas seems it's best to unify the syntax in Spark 3.0 first considering it passed code freeze and now it's RC period. Can we split these two separately if I am not mistaken?
   
   Also, @rdblue, please read the PR template and follow the guide there for PR description (https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE).
   From a glance, seems difficult to follow +1,328 −1,134 changes, and what this PR really proposes. I strongly think this is a bad pattern to avoid for big PRs, which I initially aimed to address at https://github.com/apache/spark/pull/25310. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606775156
 
 
   Hive syntax was considered. Anything specific to a source will be passed as table properties.
   
   In this PR, these are passed as follows:
   
   * `STORED AS formatName` -> `hive.stored-as=formatName`
   * `INPUTFORMAT formatClass` -> `hive.input-format=formatClass`
   * `OUTPUTFORMAT formatClass` -> `hive.output-format=formatClass`
   * `SERDE serdeClass` -> `hive.serde=serdeClass`
   * `SERDEPROPERTIES ('key'='value')` -> `option.key=value`
   
   Again, the only thing that Spark should be doing here is ensuring the data is well-formed and then passing it directly to the source.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607364115
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25383/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-614372327
 
 
   @cloud-fan, the meaning of external is dictated by Hive because that's where it came from. Spark is trying to change Hive's meaning of external, while maintaining "compatibility", but you can't do both. Because we now have a catalog abstraction, we're hitting the problem directly: you can't delegate to a catalog _and_ dictate what makes sense in cases specific to that catalog.
   
   If you want to drop support for external, then get rid of the keyword. Otherwise, it needs to be passed so that Spark isn't making arbitrary choices about details specific to a catalog. To be clear, I support failing all create statements that use EXTERNAL. That's fine with me, but you can't parse it and not pass it along.
   
   For SERDEPROPERTIES and OPTIONS, I fail to see how it is a similar situation. In that case, Spark has introduced a situation where it has clauses mean the same thing, so I implemented them to be passed the same way. We can pass these using separate prefixes if you'd like and let the catalog implementations choose. The problem is that this distinction has zero value for end users.
   
   Spark is easier to use and better overall if these are synonyms, but I can't say I care that much about it. Let me know what prefix you would like to use for SERDEPROPERTIES.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973637
 
 
   **[Test build #120659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120659/testReport)** for PR 28026 at commit [`50019e6`](https://github.com/apache/spark/commit/50019e64d4271ac914d207c6f7c2c4b08716612b).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543770
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973963
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607415571
 
 
   **[Test build #120686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120686/testReport)** for PR 28026 at commit [`dc1b761`](https://github.com/apache/spark/commit/dc1b76107d7bbb9d4782b0a080ea037324df810d).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606968418
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120658/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607522448
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609071094
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25508/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-612921360
 
 
   You are right that the main argument is how to pass the metadata.
   
   For the EXTERNAL keyword, according to `SparkSqlAstBuilder.visitCreateHiveTable`:
   ```
   ...
   // If we are creating an EXTERNAL table, then the LOCATION field is required
   if (external && location.isEmpty) {
     operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)
   }
   ...
   // If location is defined, we'll assume this is an external table.
   // Otherwise, we may accidentally delete existing data.
   val tableType = if (external || location.isDefined) {
     CatalogTableType.EXTERNAL
   } else {
      CatalogTableType.MANAGED
   }
   ```
   If we implement Hive with catalog v2, what we care about is the table type, not the existence of the EXTERNAL keyword. The table type is decided by the existence of the LOCATION keyword, so I don't think we need the extra "PROP_EXTERNAL". The "PROP_LOCATION" should be good enough.
   
   Another problem: How are we going to differentiate SERDEPROPERTIES, TBLPROPERTIES, and OPTIONS? How shall we tell end-users that when to use which statement?
   
   > I left a comment with a compromise for Hive-specific information but didn't get a reply.
   
   Can you post the link to your comment? Maybe I missed it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607003471
 
 
   > The create test in SparkSqlParserSuite highlighted an existing problem with spark.sql.legacy.createHiveTableByDefault.enabled. With that table property disabled (default to USING), Hive was still used to create a table because Hive's PARTITION BY syntax was used. I think this is very confusing behavior! (@HeartSaVioR probably agrees.)
   
   Sure, I totally agree. That's one of the issue I mentioned in mail thread, refer below link:
   
   https://lists.apache.org/thread.html/ra2aa31ed7cbd4e34d3504adc97cae1301cc249bfb8f95565b808b0cb%40%3Cdev.spark.apache.org%3E
   
   It's wrong if someone simply thinks that if some words (`ROW FORMAT`, `STORED AS`) exist then Hive create table will be used and if not then native create table will be used. There're more differences on details, including the point @rdblue pointed out here. The issue didn't exist in Spark 2.x, as `USING` clause plays as a marker to distinguish two syntaxes.
   
   > Because of this problem, I think we should either roll back the patch to make USING the default in 3.0 since it is not applied everywhere. That, or get this syntax unification into 3.0.
   
   That's in line with my proposal as well. The ideal approach would be reverting SPARK-30098 in Spark 3.0 and make it correct with syntax unification in further version. I've also proposed some alternatives (add a marker, turn on legacy config by default) but haven't heard any feedback.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-614703404
 
 
   > you can't delegate to a catalog and dictate what makes sense in cases specific to that catalog.
   
   I'm not sure I agree. Spark doesn't delegate the SQL string to a catalog literally, it abstract the information from the SQL and passes the information to the catalog. `EXTERNAL` doesn't give you useful information IMO: If it's specified, then `LOCATION` must be specified. If it's not specified, then `LOCATION` shouldn't be specified otherwise it's the same as with `EXTERNAL` specified. I think it's OK to have some decorative works in a language and we can ignore them.
   
   
   I'm OK to treat SERDEPROPERTIES and OPTIONS the same, it's better to have some validation to make sure these two don't conflict. My concern is more about OPTIONS vs TBLPROPERTIES. Are we going to tell end-users that these are just two different ways to specify some configs and the catalog is free to interpret them?
   
   Note: I'm asking these because we need to update the SQL reference doc after this PR, which is end-user facing: https://spark.apache.org/docs/3.0.0-preview2/sql-ref-syntax-ddl-create-table.html

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398732526
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2413,8 +2413,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
    * Type to keep track of table clauses:
    * (partitioning, bucketSpec, properties, options, location, comment).
 
 Review comment:
   Will do.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609070981
 
 
   **[Test build #120809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120809/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607353892
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607353867
 
 
   **[Test build #120683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120683/testReport)** for PR 28026 at commit [`d7500c8`](https://github.com/apache/spark/commit/d7500c84b9ac941b73f46ae99aa5a6a9c4cfe8e8).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606974026
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120659/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-610486502
 
 
   > Can we do it with an individual PR?
   
   I think it's better to do it in this one, since the SERDEPROPERTIES need to be prefixed as well. It isn't worth an extra PR to remove 4 lines from this one.
   
   > Then at least it's not a correctness bug.
   
   Again, this doesn't accomplish much. It makes this only marginally smaller and introduces more work to get the feature done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r399205585
 
 

 ##########
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##########
 @@ -732,8 +713,13 @@ namedExpressionSeq
     : namedExpression (',' namedExpression)*
     ;
 
-transformList
-    : '(' transforms+=transform (',' transforms+=transform)* ')'
+partitionFieldList
+    : '(' fields+=partitionField (',' fields+=partitionField)* ')'
+    ;
+
+partitionField
 
 Review comment:
   I'd understand that we will have assertion to force using only one side of pattern at a time.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118010
 
 
   **[Test build #120376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120376/testReport)** for PR 28026 at commit [`0522c93`](https://github.com/apache/spark/commit/0522c93d7f34b74faaf0abcfe02428855491a6a4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607367493
 
 
   **[Test build #120684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120684/testReport)** for PR 28026 at commit [`2484059`](https://github.com/apache/spark/commit/2484059c4d0e47911821ca3d65931d24fcdb5202).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607416132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25385/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607567650
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120699/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607351727
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607410570
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120684/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607416120
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606375737
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120620/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609070135
 
 
   Retest this please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343803
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25322/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608217792
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607410563
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604808159
 
 
   I don't know who marked comments as resolved so please correct me if I'm wrong, but assuming that @cloud-fan comments to the resolved comment, it doesn't seem @cloud-fan marked comments as resolved.
   
   Could we make sure comments are marked as resolved only when both agree?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607364105
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606375730
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607567642
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607538903
 
 
   I tend to agree it would be ideal to get this patch in Spark 3.0.0, but then this patch will become a blocker for Spark 3.0.0, which means (at least for me) this PR should be handled as urgent manner. Reverting the commit would leave us in a good shape (not improved but not worse) which basically means this issue would become a type of "improvement" and never be a blocker.
   
   I agree the change may need to be introduced in Spark 4.0 due to semver (the main point of worry), but it might be possible the community decides to ship it in minor version if the community thinks the benefit clearly outweighs the semantic of semver.
   
   It would be nice to sync up with release manager for Spark 3.0.0 to weigh the benefit on marking this as blocker for Spark 3.0.0. cc. @rxin 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608217794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120742/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118640
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25085/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607567650
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120699/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609838028
 
 
   I'm a bit worried about merging this `+1661 −1146` huge PR to 3.0 at this stage. I agree that this may be in the right direction to implement Hive as a v2 catalog/data source, but this is way beyond what we need for 3.0.
   
   Can we have a more surgical patch that only unifies the CREATE TABLE syntax? Basically, for the items you listed in the PR description.
   
   These are still needed:
   - Unify the create table syntax in the parser by merging Hive and DataSource clauses
   - Add SerdeInfo and external boolean to statement plans and update AstBuilder to produce them
   - Add conversion from create statement plan to v1 create plans in ResolveSessionCatalog
   - Remove SparkSqlParser rules for Hive syntax
   
   With one more item: update ResolveCatalogs to fail if Hive specific clauses are specified in the create statement plan for v2 catalogs.
   
   These can be done later
   - Support new statement clauses in ResolveCatalogs conversion to v2 create plans
   - Add "option." namespace to distinguish SERDEPROPERTIES and OPTIONS in table properties
   
   BTW, the `Does this PR introduce any user-facing change?` section looks more like a new feature. Can we focus on the consequence of the syntax unification? For example:
   1. `CREATE TABLE ... USING ... PARTITIONED BY (colName colType)` is now supported.
   2. `CREATE EXTERNAL TABLE ...` without USING and STORED BY/AS now creates Hive table.
   3. ...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543449
 
 
   **[Test build #120699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120699/testReport)** for PR 28026 at commit [`5e8f5d0`](https://github.com/apache/spark/commit/5e8f5d01ce6e9c9eb8ab2a05dff45c73b1cbbc90).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r400378232
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   The v2 API is not the right place to make this decision. If we support an option in the parser, the source should know whether it was used, or else you are making assumptions about how the source will operate.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607344537
 
 
   @cloud-fan, @Ngone51: to be clear, I think that the PR that added that config property should be reverted if this one doesn't get into 3.0. Allowing people to turn on this confusing behavior is a bad idea.
   
   I think we should fix this, so that we don't need to wait until 4.0 for the behavior changes. After this PR, datasource tables will support Hive's `PARTITION BY` syntax as an alternative to partition expressions. I think it would be best to get all those changes into 3.0.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607567642
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606375730
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608152045
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607532712
 
 
   I agree turning on the legacy config is just a workaround which I will never recommend end users to turn off. I expect I would fail to explain to end users why this query creates parquet provider but other query creates Hive provider, and I guess that's not only me. (You may be able to understand although such complication but that's different story if you have to make others understand.)
   
   If I remember correctly, the Spark community practice for similar case has been "revert". There's no point of keep the commit be live as it's known to be bad and we will have better patch in near future. It only makes sense if someone disagrees the commit is bad; is that a case? Or do we want to let some individuals/teams/vendors to manage it with risks?
   
   ps. Honestly, I feel really odd we allow turning on legacy config by default - not for this issue but for other issues - end users may never indicate the fix, although there should be the valid reason the fix is landed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608151542
 
 
   **[Test build #120742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120742/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607032955
 
 
   Maybe we also need to change two "create table" pages and migration guide page as well  regardless of the approach, since we are changing the default behavior.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604162661
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120376/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609071090
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607927357
 
 
   > Sorry if this is answered above but why do we handle v2 APIs here together?
   
   Create statement plans are already converted to v2 because the DataSource create syntax is supported. If the conversion to v2 plans didn't handle the new statement fields, then the new data would be ignored and that's a correctness error.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118010
 
 
   **[Test build #120376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120376/testReport)** for PR 28026 at commit [`0522c93`](https://github.com/apache/spark/commit/0522c93d7f34b74faaf0abcfe02428855491a6a4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608217792
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-611035487
 
 
   @gatorsmile, this isn't a new feature. This is fixing a long-standing problem that there are two different parser rules for create table that generate different plans. Now, there is one rule and additional metadata passed to conversions that already take place. Updating the existing paths for those properties is a reasonable part of the fix, instead of throwing exceptions.
   
   What you're asking for is to purposely disable passing only some of that metadata through one code path, but not the other. I've heard no rationale for that or discussion of risks introduced by passing it.
   
   What is the risk that justifies doing more work to remove passing a few metadata properties?
   
   If the only justification is that this is a "new feature" then should we also detect when Hive's `PARTITION BY` syntax is used for DataSource tables and throw an exception there? I think we all agree that would be silly. So what is the distinction between that and what you're asking?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607522448
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607364115
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25383/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604141901
 
 
   Could you please pick up SPARK-31257 instead as the JIRA issue explains better for the "situation" we are in? It's still good to keep the PR title as it is, as we tend to explain the "problem/issue" in JIRA and explain "how" to deal with it in PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607350987
 
 
   **[Test build #120683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120683/testReport)** for PR 28026 at commit [`d7500c8`](https://github.com/apache/spark/commit/d7500c84b9ac941b73f46ae99aa5a6a9c4cfe8e8).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607351734
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25382/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606310697
 
 
   Thanks for the detailed answer. Really appreciated.
   
   I misunderstood something as you commented, but it seems to come from something which is not addressed yet and the plan was not shared. Without sharing the detailed plan (TODO) I wonder the review can be done correctly. Maybe we'd be better to wait for you to make more progress?
   
   Btw, this change doesn't seem to be trivial but PR description has shortage of information - one line per a major change. I guess we tend to describe at least how it works in high level on the PR description so that commit message itself becomes informative. We may want to get it once the PR is ready to merge.
   
   Back to the comment, your plan looks promising to me, but I'm not an expert of DSv2 (what I originally pointed out was ambiguous parser syntaxes, not about DSv2) so I feel I'm not qualified.
   
   > Speaking of well-formed, I think it might be reasonable to ensure that either a provider or serde info is present, but not both. That would raise an exception if both USING and STORED AS or SERDE clauses were used. I think that's probably reasonable. In that case, we could skip the checks above. What do you think?
   
   Would we like to disallow STORED AS/SERDE clauses if the provider is specified to Hive via USING clause? That would be still OK if we provide good error message and guide alternatives (either doc or error message).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-605538309
 
 
   Same here, please describe TODO list for this PR to remove WIP.
   
   If I skimmed the code correctly, this "requires" end users to add `USING hive` even they add Hive create table specific clause. Do I understand correctly?
   
   It's OK if that's one of TODO that let some clauses change the provider - we should concern about documentation to make clear which clause changes the provider. (Personally I'm not in favor of this, but otherwise why not just add `HIVE` to second syntax as both are requiring end users to change their query and adding `HIVE` is clearer?)
   
   If the change is intentional, that's my favor but I may concern about removing compatibility flag, as this will force end users to migrate their query in any way.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606751060
 
 
   What you're suggesting is to maintain compatibility with Hive SQL, but arbitrarily limit what a source connector can do.
   
   How about this: users shouldn't be allowed to use the `|` character as the delimiter for Hive's delimited format in sequence files. Let's add special code to Spark to check for that case and fail it before passing everything to the catalog.
   
   Hopefully you see my point: making decisions about what catalogs should support is not a reasonable thing for Spark to do. Spark should reject what is not well-formed, but should not decide details for specific sources.
   
   And to be clear, if you want to do this in Spark's built-in Hive catalog/connector then that's fine with me. But this kind of decision should be made by the catalog, not in the parser or in logical plans.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607142332
 
 
   > @Ngone51 can you open a PR to flip the config value?
   
   > Maybe we also need to change two "create table" pages and migration guide page as well regardless of the approach, since we are changing the default behavior.
   
   Ok.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343429
 
 
   **[Test build #120620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120620/testReport)** for PR 28026 at commit [`12a17fc`](https://github.com/apache/spark/commit/12a17fcf2189796741c3e11bab2f98c1bf996c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118631
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-605538309
 
 
   Same here, please describe TODO list for this PR to remove WIP.
   
   If I skimmed the code correctly, this "requires" end users to add `USING hive` even they add Hive create table specific clause. Do I understand correctly?
   
   It's OK if that's one of TODO that let some clauses change the provider - we should concern about documentation to make clear which clause changes the provider. (Personally I'm not in favor of this, but otherwise why not just add `HIVE` to second syntax as both are requiring end users to change their query and adding `HIVE` is clearer?)
   
   If the change is intentional, that's in my favor but I may concern about removing compatibility flag, as this will force end users to migrate their query in any way.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343803
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25322/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607416132
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25385/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609100329
 
 
   **[Test build #120809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120809/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class SerdeInfo(`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607003471
 
 
   > The create test in SparkSqlParserSuite highlighted an existing problem with spark.sql.legacy.createHiveTableByDefault.enabled. With that table property disabled (default to USING), Hive was still used to create a table because Hive's PARTITION BY syntax was used. I think this is very confusing behavior! (@HeartSaVioR probably agrees.)
   
   Sure, I totally agree. That's one of the issue I mentioned in mail thread, refer below link:
   
   https://lists.apache.org/thread.html/ra2aa31ed7cbd4e34d3504adc97cae1301cc249bfb8f95565b808b0cb%40%3Cdev.spark.apache.org%3E
   
   It's wrong if someone simply thinks that if some words (`ROW FORMAT`, `STORED AS`) exist then Hive create table will be used and if not then native create table will be used. There're more differences on details, including the point @rdblue pointed out here. The issue didn't exist before, as `USING` clause plays as a marker to distinguish two syntaxes.
   
   > Because of this problem, I think we should either roll back the patch to make USING the default in 3.0 since it is not applied everywhere. That, or get this syntax unification into 3.0.
   
   That's in line with my proposal as well. The ideal approach would be reverting SPARK-30098 in Spark 3.0 and make it correct with syntax unification in further version. I've also proposed some alternatives (add a marker, turn on legacy config by default) but haven't heard any feedback.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606733289
 
 
   > Hive has more features (more flexible parser) than Spark, but it doesn't mean Spark needs to accept them all.
   
   I don't think Spark needs to make a choice on this for the v2 API. It is up to the catalog what features of a table format are useful or not. Because Spark supports parsing EXTERNAL, it makes no sense to arbitrarily suppress it. That requires _more_ code in Spark that is specific to what is now a responsibility delegated to a catalog. It breaks the abstraction.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606966281
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25357/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606343800
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398732356
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   It is up to source providers how to handle this. If the parser accepts `EXTERNAL` then the fact that it was added should be passed to sources.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398731276
 
 

 ##########
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##########
 @@ -732,8 +713,13 @@ namedExpressionSeq
     : namedExpression (',' namedExpression)*
     ;
 
-transformList
-    : '(' transforms+=transform (',' transforms+=transform)* ')'
+partitionFieldList
+    : '(' fields+=partitionField (',' fields+=partitionField)* ')'
+    ;
+
+partitionField
 
 Review comment:
   This is allowed in the grammar to give a better error message later. Otherwise it is confusing that `b string` works sometimes, but not others.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606375628
 
 
   **[Test build #120620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120620/testReport)** for PR 28026 at commit [`12a17fc`](https://github.com/apache/spark/commit/12a17fcf2189796741c3e11bab2f98c1bf996c03).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609100553
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120809/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118640
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25085/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607351734
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25382/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609100550
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607351727
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r399079445
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   What's the goal here? Would we want to pass all the configs to source provider "transparently"? If then yes it makes sense, but if not is there any case which LOCATION doesn't supersede EXTERNAL?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606974023
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398299827
 
 

 ##########
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##########
 @@ -732,8 +713,13 @@ namedExpressionSeq
     : namedExpression (',' namedExpression)*
     ;
 
-transformList
-    : '(' transforms+=transform (',' transforms+=transform)* ')'
+partitionFieldList
+    : '(' fields+=partitionField (',' fields+=partitionField)* ')'
+    ;
+
+partitionField
 
 Review comment:
   It's a bit weird if we allow mixed usage, like `PARTITION BY (year(a), b string)`, shall we just define 2 `PARTITIONED BY` clauses?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606969684
 
 
   @HeartSaVioR, here's some additional info to give you context.
   
   When we convert plans to v2, we alter the parser to produce a `Statement` plan that contains what was parsed, without modification. The reason is that the v1 plans put a lot of logic about what was allowed in the parser itself, which required the parser to be broken across spark-catalyst and spark-sql modules and was difficult to maintain. Now, logic about what is allowed is in the analyzer.
   
   Statement plans are converted to either the exact same v1 logical plan as before, or a v2 plan. To validate that we are producing the exact same logical plans, we move tests from the spark-sql `DDLParserSuite` (only runs the parser) into `PlanResolutionSuite` that runs the parser and conversion rules. These tests should be nearly identical to the original and are as close to a copy & paste as possible to be easy to verify. I just completed this step, which is why tests should be passing.
   
   In addition, we add tests to spark-catalyst's `DDLParserSuite` that test SQL to `Statement` conversion. This is where we will also add new tests for the unified behavior, like rejecting `PARTITION BY` clauses with both expressions and column declarations. These are the last tests that I need to add for this PR and I'll work on them tomorrow.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607353908
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120683/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606968414
 
 
   Build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606967363
 
 
   @cloud-fan, I've implemented the conversions to v1 plans and updated the tests like the other commands that we moved to use statements. Tests from DDLParserSuite are now in PlanResolutionSuite. I also needed to move a create test from SparkSqlParserSuite into PlanResolutionSuite for the same reason.
   
   The create test in SparkSqlParserSuite **highlighted an existing problem with `spark.sql.legacy.createHiveTableByDefault.enabled`**. With that table property disabled (default to USING), Hive was still used to create a table because Hive's `PARTITION BY` syntax was used. I think this is *very* confusing behavior! (@HeartSaVioR probably agrees.)
   
   Because of this problem, I think we should either roll back the patch to make USING the default in 3.0 since it is not applied everywhere. That, or get this syntax unification into 3.0.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607522179
 
 
   **[Test build #120686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120686/testReport)** for PR 28026 at commit [`dc1b761`](https://github.com/apache/spark/commit/dc1b76107d7bbb9d4782b0a080ea037324df810d).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606967859
 
 
   **[Test build #120658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120658/testReport)** for PR 28026 at commit [`8bf400d`](https://github.com/apache/spark/commit/8bf400d769417b2e4085af4019571ecfcfa412e3).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606974023
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608217400
 
 
   **[Test build #120742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120742/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class SerdeInfo(`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604162490
 
 
   **[Test build #120376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120376/testReport)** for PR 28026 at commit [`0522c93`](https://github.com/apache/spark/commit/0522c93d7f34b74faaf0abcfe02428855491a6a4).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607410563
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606966276
 
 
   Build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609100550
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543775
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25397/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609071090
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607353892
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607567521
 
 
   **[Test build #120699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120699/testReport)** for PR 28026 at commit [`5e8f5d0`](https://github.com/apache/spark/commit/5e8f5d01ce6e9c9eb8ab2a05dff45c73b1cbbc90).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607367493
 
 
   **[Test build #120684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120684/testReport)** for PR 28026 at commit [`2484059`](https://github.com/apache/spark/commit/2484059c4d0e47911821ca3d65931d24fcdb5202).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-609905547
 
 
   You're right that options _could_ be done as a separate commit. The other, not so much. In both cases, consider how much of this PR actually concerns those changes and you'll see that removing the parts you suggest doesn't actually help make this a smaller change. The option prefix is very small, but an important part of how we pass SERDEPROPERTIES.
   
   > Support new statement clauses in ResolveCatalogs conversion to v2 create plans
   
   No, this can't be done later. If the clauses are parsed and passed in the statement plan, the conversion to v2 cannot simply ignore them without being a correctness bug. Fixing that correctness bug would be a breaking behavior change.
   
   > Add "option." namespace to distinguish SERDEPROPERTIES and OPTIONS in table properties
   
   This is an outstanding item that needs to be done before 3.0. A community member volunteered to pick it up and never did and this is a good time to get it done. If we wait until after 3.0, then it will never happen because it will be a breaking change that options are not prefixed when passed to v2 implementations. Not only that, this is how SERDEPROPERTIES _should_ be passed, so changing over the OPTIONS clause in isolation makes little sense.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-610189239
 
 
   > the conversion to v2 cannot simply ignore them without being a correctness bug
   
   I agree with it, that's why I propose "update ResolveCatalogs to fail if Hive specific clauses are specified in the create statement plan for v2 catalogs". Then at least it's not a correctness bug.
   
   > The option prefix is very small, but an important part of how we pass SERDEPROPERTIES.
   
   Good to know that it's a small change. Can we do it with an individual PR? This can make the PR reviews more concentrated.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543775
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25397/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606968418
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120658/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607543770
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608152053
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25440/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-608151542
 
 
   **[Test build #120742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120742/testReport)** for PR 28026 at commit [`7204dd4`](https://github.com/apache/spark/commit/7204dd40f1e0de1fa99241e538b62dfcc7fd3058).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606969684
 
 
   @HeartSaVioR, here's some additional info to give you context.
   
   When we convert plans to v2, we alter the parser to produce a `Statement` plan that contains what was parsed, without modification. The reason is that the v1 plans put a lot of logic about what was allowed in the parser itself, which required the parser to be broken across spark-catalyst and spark-sql modules and was difficult to maintain. Now, logic about what is allowed is in the analyzer.
   
   Statement plans are converted to either the exact same v1 logical plan as before, or a v2 plan. To validate that we are producing the exact same logical plans, we move tests from the spark-sql `DDLParserSuite` (only runs the parser) into `PlanResolutionSuite` that runs the parser and conversion rules. These tests should be nearly identical to the original and are as close to a copy & paste as possible to be easy to verify. I just completed this step, which is why tests should be passing.
   
   In addition, we add tests to spark-catalyst's `DDLParserSuite` that test SQL to `Statement` conversion. This is where we will also add new tests for the unified behavior, like rejecting `PARTITION BY` clauses with both expressions and column declarations. These are the last tests that I need to add for this PR and I'll work on them tomorrow.
   
   Other than that, I need to rebase on master because there are conflicts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-611472963
 
 
   Supporting the Hive style PARTITION BY is very straightforward and I don't see other opinions. Supporting the Hive style CREATE TABLE completely in v2 catalog API does have some different opinions:
   1. shall we support the EXTERNAL keyword? As I said in https://github.com/apache/spark/pull/28026#issuecomment-606746750, it's not useful IMO.
   2. what should be the option key names for these Hive specific information?
   
   I'm not saying that this PR makes the wrong decision, but including more work other than "unifying the CREATE TABLE syntax" does trigger more discussions and makes it harder to get this PR merged.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606969684
 
 
   @HeartSaVioR, here's some additional info to give you context.
   
   When we convert plans to v2, we alter the parser to produce a `Statement` plan that contains what was parsed, without modification. The reason is that the v1 plans put a lot of logic about what was allowed in the parser itself, which required the parser to be broken across spark-catalyst and spark-sql modules and was difficult to maintain. Now, logic about what is allowed is in the analyzer.
   
   Statement plans are converted to either the exact same v1 logical plan as before, or a v2 plan. To validate that we are producing the exact same logical plans, we move tests from the spark-sql `DDLParserSuite` (only runs the parser) into `PlanResolutionSuite` that runs the parser and conversion rules. These tests should be nearly identical to the original and are as close to a copy & paste as possible to be easy to verify. I just completed this step, which is why tests should be passing.
   
   In addition, we add tests to spark-catalyst's `DDLParserSuite` that test SQL to `Statement` conversion. These are the last tests that I need to add for this PR.
   
   Other than that, I need to rebase on master because there are conflicts.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606966276
 
 
   Build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r400553481
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2715,24 +2729,198 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     (filtered, path)
   }
 
+  /**
+   * Create a [[SerdeInfo]] for creating tables.
+   *
+   * Format: STORED AS (name | INPUTFORMAT input_format OUTPUTFORMAT output_format)
+   */
+  override def visitCreateFileFormat(ctx: CreateFileFormatContext): SerdeInfo = withOrigin(ctx) {
+    (ctx.fileFormat, ctx.storageHandler) match {
+      // Expected format: INPUTFORMAT input_format OUTPUTFORMAT output_format
+      case (c: TableFileFormatContext, null) =>
+        SerdeInfo(formatClasses = Some((string(c.inFmt), string(c.outFmt))))
+      // Expected format: SEQUENCEFILE | TEXTFILE | RCFILE | ORC | PARQUET | AVRO
+      case (c: GenericFileFormatContext, null) =>
+        SerdeInfo(storedAs = Some(c.identifier.getText))
+      case (null, storageHandler) =>
+        operationNotAllowed("STORED BY", ctx)
+      case _ =>
+        throw new ParseException("Expected either STORED AS or STORED BY, not both", ctx)
+    }
+  }
+
+  /**
+   * Create a [[SerdeInfo]] used for creating tables.
+   *
+   * Example format:
+   * {{{
+   *   SERDE serde_name [WITH SERDEPROPERTIES (k1=v1, k2=v2, ...)]
+   * }}}
+   *
+   * OR
+   *
+   * {{{
+   *   DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
+   *   [COLLECTION ITEMS TERMINATED BY char]
+   *   [MAP KEYS TERMINATED BY char]
+   *   [LINES TERMINATED BY char]
+   *   [NULL DEFINED AS char]
+   * }}}
+   */
+  def visitRowFormat(
+      ctx: RowFormatContext): SerdeInfo = withOrigin(ctx) {
+    ctx match {
+      case serde: RowFormatSerdeContext => visitRowFormatSerde(serde)
+      case delimited: RowFormatDelimitedContext => visitRowFormatDelimited(delimited)
+    }
+  }
+
+  /**
+   * Create SERDE row format name and properties pair.
+   */
+  override def visitRowFormatSerde(ctx: RowFormatSerdeContext): SerdeInfo = withOrigin(ctx) {
+    import ctx._
+    SerdeInfo(
+      serde = Some(string(name)),
+      serdeProperties = Option(tablePropertyList).map(visitPropertyKeyValues).getOrElse(Map.empty))
+  }
+
+  /**
+   * Create a delimited row format properties object.
+   */
+  override def visitRowFormatDelimited(
+      ctx: RowFormatDelimitedContext): SerdeInfo = withOrigin(ctx) {
+    // Collect the entries if any.
+    def entry(key: String, value: Token): Seq[(String, String)] = {
+      Option(value).toSeq.map(x => key -> string(x))
+    }
+    // TODO we need proper support for the NULL format.
+    val entries =
+      entry("field.delim", ctx.fieldsTerminatedBy) ++
+          entry("serialization.format", ctx.fieldsTerminatedBy) ++
+          entry("escape.delim", ctx.escapedBy) ++
+          // The following typo is inherited from Hive...
+          entry("colelction.delim", ctx.collectionItemsTerminatedBy) ++
+          entry("mapkey.delim", ctx.keysTerminatedBy) ++
+          Option(ctx.linesSeparatedBy).toSeq.map { token =>
+            val value = string(token)
+            validate(
+              value == "\n",
+              s"LINES TERMINATED BY only supports newline '\\n' right now: $value",
+              ctx)
+            "line.delim" -> value
+          }
+    SerdeInfo(serdeProperties = entries.toMap)
+  }
+
+  /**
+   * Throw a [[ParseException]] if the user specified incompatible SerDes through ROW FORMAT
+   * and STORED AS.
+   *
+   * The following are allowed. Anything else is not:
+   *   ROW FORMAT SERDE ... STORED AS [SEQUENCEFILE | RCFILE | TEXTFILE]
+   *   ROW FORMAT DELIMITED ... STORED AS TEXTFILE
+   *   ROW FORMAT ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
+   */
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: RowFormatContext,
+      createFileFormatCtx: CreateFileFormatContext,
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx == null || createFileFormatCtx == null) {
+      return
+    }
+    (rowFormatCtx, createFileFormatCtx.fileFormat) match {
+      case (_, ffTable: TableFileFormatContext) => // OK
+      case (rfSerde: RowFormatSerdeContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case ("sequencefile" | "textfile" | "rcfile") => // OK
+          case fmt =>
+            operationNotAllowed(
+              s"ROW FORMAT SERDE is incompatible with format '$fmt', which also specifies a serde",
+              parentCtx)
+        }
+      case (rfDelimited: RowFormatDelimitedContext, ffGeneric: GenericFileFormatContext) =>
+        ffGeneric.identifier.getText.toLowerCase(Locale.ROOT) match {
+          case "textfile" => // OK
+          case fmt => operationNotAllowed(
+            s"ROW FORMAT DELIMITED is only compatible with 'textfile', not '$fmt'", parentCtx)
+        }
+      case _ =>
+        // should never happen
+        def str(ctx: ParserRuleContext): String = {
+          (0 until ctx.getChildCount).map { i => ctx.getChild(i).getText }.mkString(" ")
+        }
+        operationNotAllowed(
+          s"Unexpected combination of ${str(rowFormatCtx)} and ${str(createFileFormatCtx)}",
+          parentCtx)
+    }
+  }
+
+  protected def validateRowFormatFileFormat(
+      rowFormatCtx: Seq[RowFormatContext],
+      createFileFormatCtx: Seq[CreateFileFormatContext],
+      parentCtx: ParserRuleContext): Unit = {
+    if (rowFormatCtx.size == 1 && createFileFormatCtx.size == 1) {
+      validateRowFormatFileFormat(rowFormatCtx.head, createFileFormatCtx.head, parentCtx)
+    }
+  }
+
   override def visitCreateTableClauses(ctx: CreateTableClausesContext): TableClauses = {
     checkDuplicateClauses(ctx.TBLPROPERTIES, "TBLPROPERTIES", ctx)
     checkDuplicateClauses(ctx.OPTIONS, "OPTIONS", ctx)
     checkDuplicateClauses(ctx.PARTITIONED, "PARTITIONED BY", ctx)
+    checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
+    checkDuplicateClauses(ctx.createFileFormat, "STORED AS/BY", ctx)
+    checkDuplicateClauses(ctx.rowFormat, "ROW FORMAT", ctx)
     checkDuplicateClauses(ctx.commentSpec(), "COMMENT", ctx)
     checkDuplicateClauses(ctx.bucketSpec(), "CLUSTERED BY", ctx)
     checkDuplicateClauses(ctx.locationSpec, "LOCATION", ctx)
 
-    val partitioning: Seq[Transform] =
-      Option(ctx.partitioning).map(visitTransformList).getOrElse(Nil)
+    if (ctx.skewSpec.size > 0) {
+      operationNotAllowed("CREATE TABLE ... SKEWED BY", ctx)
+    }
+
+    val (partTransforms, partCols) =
+      Option(ctx.partitioning).map(visitPartitionFieldList).getOrElse((Nil, Nil))
     val bucketSpec = ctx.bucketSpec().asScala.headOption.map(visitBucketSpec)
     val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)
     val cleanedProperties = cleanTableProperties(ctx, properties)
     val options = Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty)
     val location = visitLocationSpecList(ctx.locationSpec())
     val (cleanedOptions, newLocation) = cleanTableOptions(ctx, options, location)
     val comment = visitCommentSpecList(ctx.commentSpec())
-    (partitioning, bucketSpec, cleanedProperties, cleanedOptions, newLocation, comment)
+
+    validateRowFormatFileFormat(ctx.rowFormat.asScala, ctx.createFileFormat.asScala, ctx)
+    val fileFormatSerdeInfo = ctx.createFileFormat.asScala.map(visitCreateFileFormat)
+    val rowFormatSerdeInfo = ctx.rowFormat.asScala.map(visitRowFormat)
+    val serdeInfo =
+      (fileFormatSerdeInfo ++ rowFormatSerdeInfo).reduceLeftOption((x, y) => x.merge(y))
+
+    (partTransforms, partCols, bucketSpec, cleanedProperties, cleanedOptions, newLocation, comment,
+        serdeInfo)
+  }
+
+  private def partitionExpressions(
+      partTransforms: Seq[Transform],
+      partCols: Seq[StructField],
+      ctx: ParserRuleContext): Seq[Transform] = {
+    if (partTransforms.nonEmpty) {
+      if (partCols.nonEmpty) {
+        val references = partTransforms.map(_.describe()).mkString(", ")
+        val columns = partCols.map(field => s"${field.name}: ${field.dataType}").mkString(", ")
+        throw new ParseException(
+          s"""PARTITION BY: Cannot mix partition expressions and partition columns:
+             |Expressions: $references
+             |Columns: $columns""".stripMargin, ctx)
 
 Review comment:
   I guess @cloud-fan commented regarding this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607004494
 
 
   OK let's flip the value of `spark.sql.legacy.createHiveTableByDefault.enabled` for 3.0, so we have more time to discuss this PR.
   
   @Ngone51 can you open a PR to flip the config value?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-607538903
 
 
   I tend to agree it would be ideal to get this patch in Spark 3.0.0, but then this patch will become a blocker for Spark 3.0.0, which means (at least for me) this PR should be handled as urgent manner. Reverting the commit would leave us in a good shape (not improved but back to normal) which basically means this issue would become a type of "improvement" and never be a blocker.
   
   I agree the change may need to be introduced in Spark 4.0 due to semver (the main point of worry), but it might be possible the community decides to ship it in minor version if the community thinks the benefit clearly outweighs the semantic of semver.
   
   It would be nice to sync up with release manager for Spark 3.0.0 to weigh the benefit on marking this as blocker for Spark 3.0.0. cc. @rxin 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398407329
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ##########
 @@ -2413,8 +2413,9 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
    * Type to keep track of table clauses:
    * (partitioning, bucketSpec, properties, options, location, comment).
 
 Review comment:
   Let's update comment as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-18885][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604118631
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606968404
 
 
   **[Test build #120658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120658/testReport)** for PR 28026 at commit [`8bf400d`](https://github.com/apache/spark/commit/8bf400d769417b2e4085af4019571ecfcfa412e3).
    * This patch **fails Scala style tests**.
    * This patch **does not merge cleanly**.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax
URL: https://github.com/apache/spark/pull/28026#issuecomment-611610372
 
 
   I don't think it's reasonable to characterize this as "supporting the hive style create table completely". It passes metadata from Hive clauses through existing paths. That's not much code and as reasonable as the partition by changes: we could also throw exceptions for using the wrong partitioning syntax, we just all agree not to. I'm not hearing a real distinction, other than that there is disagreement on how to pass the metadata.
   
   Is the central problem the way the metadata is passed? Or maybe it's the size of the changes? Not sure, there have been different reasons cited.
   
   If it's the way metadata is passed, then let's discuss that. I left a comment with a compromise for Hive-specific information but didn't get a reply. Let's focus on the real points to get this done.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r398404370
 
 

 ##########
 File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCatalog.java
 ##########
 @@ -46,6 +46,11 @@
    */
   String PROP_LOCATION = "location";
 
+  /**
+   * A reserved property to specify a table was created with EXTERNAL.
+   */
+  String PROP_EXTERNAL = "external";
 
 Review comment:
   Because `LOCATION` supersedes `EXTERNAL` and all external tables require location, right? If my understanding is correct, I agree `PROP_EXTERNAL` is redundant and `PROP_LOCATION` can be used instead.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-604162652
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#discussion_r401301040
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
 ##########
 @@ -1557,6 +1568,634 @@ class PlanResolutionSuite extends AnalysisTest {
     checkFailure("testcat.tab", "foo")
   }
 
+  private def compareNormalized(plan1: LogicalPlan, plan2: LogicalPlan): Unit = {
+    /**
+     * Normalizes plans:
+     * - CreateTable the createTime in tableDesc will replaced by -1L.
+     */
+    def normalizePlan(plan: LogicalPlan): LogicalPlan = {
+      plan match {
+        case CreateTable(tableDesc, mode, query) =>
+          val newTableDesc = tableDesc.copy(createTime = -1L)
+          CreateTable(newTableDesc, mode, query)
+        case _ => plan // Don't transform
+      }
+    }
+    comparePlans(normalizePlan(plan1), normalizePlan(plan2))
+  }
+
+  test("create table - schema") {
+    def createTable(
+        table: String,
+        database: Option[String] = None,
+        tableType: CatalogTableType = CatalogTableType.MANAGED,
+        storage: CatalogStorageFormat = CatalogStorageFormat.empty.copy(
+          inputFormat = HiveSerDe.sourceToSerDe("textfile").get.inputFormat,
+          outputFormat = HiveSerDe.sourceToSerDe("textfile").get.outputFormat,
+          serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe")),
+        schema: StructType = new StructType,
+        provider: Option[String] = Some("hive"),
+        partitionColumnNames: Seq[String] = Seq.empty,
+        comment: Option[String] = None,
+        mode: SaveMode = SaveMode.ErrorIfExists,
+        query: Option[LogicalPlan] = None): CreateTable = {
+      CreateTable(
+        CatalogTable(
+          identifier = TableIdentifier(table, database),
+          tableType = tableType,
+          storage = storage,
+          schema = schema,
+          provider = provider,
+          partitionColumnNames = partitionColumnNames,
+          comment = comment
+        ), mode, query
+      )
+    }
+
+    def compare(sql: String, plan: LogicalPlan): Unit = {
+      compareNormalized(parseAndResolve(sql), plan)
+    }
+
+    compare("CREATE TABLE my_tab(a INT COMMENT 'test', b STRING) STORED AS textfile",
+      createTable(
+        table = "my_tab",
+        database = Some("default"),
+        schema = (new StructType)
+            .add("a", IntegerType, nullable = true, "test")
+            .add("b", StringType)
+      )
+    )
+    withSQLConf(SQLConf.LEGACY_CREATE_HIVE_TABLE_BY_DEFAULT_ENABLED.key -> "true") {
 
 Review comment:
   @cloud-fan, this is the SQL that uses Hive without this patch because of the `PARTITION BY` clause (and the statement using this config option below).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28026: [SPARK-31257][SQL] Unify create table syntax (WIP)
URL: https://github.com/apache/spark/pull/28026#issuecomment-606973968
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25358/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org