Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/01 01:46:00 UTC

[GitHub] [spark] dtenedor opened a new pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

dtenedor opened a new pull request #35690:
URL: https://github.com/apache/spark/pull/35690


   ### What changes were proposed in this pull request?
   
   Implement the parser changes needed to support DEFAULT column values, as tracked in https://issues.apache.org/jira/browse/SPARK-38334.
   
   Note that these are the parser changes only. Analysis support will come in a follow-up PR.
   
   Background: in the future, CREATE TABLE and ALTER TABLE invocations will support setting column default values for later operations. Subsequent INSERT, UPDATE, and MERGE statements may then reference those values with the DEFAULT keyword as needed.
   
   Examples:
   ```sql
   CREATE TABLE T(a INT, b INT NOT NULL);
   
   -- The default default is NULL
   INSERT INTO T VALUES (DEFAULT, 0);
   INSERT INTO T(b)  VALUES (1);
   SELECT * FROM T;
   (NULL, 0)
   (NULL, 1)
   
   -- Adding a default to a table that already has rows sets the value for the
   -- existing rows (exist default) and for new rows (current default).
   ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
   INSERT INTO T VALUES (1, 2, DEFAULT);
   SELECT * FROM T;
   (NULL, 0, 5)
   (NULL, 1, 5)
   (1, 2, 5) 
   ```
   
   ### Why are the changes needed?
   
   This new SQL feature helps users write DDL and DML statements that add new values to tables and read existing values back more easily.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   Added unit test coverage in DDLParserSuite.scala




[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818358129



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -1078,6 +1090,8 @@ alterColumnAction
     | commentSpec
     | colPosition
     | setOrDrop=(SET | DROP) NOT NULL
+    | SET defaultExpression

Review comment:
       Yes, it is part of the ANSI SQL standard.
   (Screenshot of the relevant part of the ANSI SQL standard: https://user-images.githubusercontent.com/1097932/156511258-88c5619a-38c6-46db-8ca6-4c510d052b11.png)
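   In SQL terms, the form in question is roughly the following (a sketch; `t` and `c` are placeholder names):
   ```sql
   -- ANSI <alter column definition>: set or drop a column default
   ALTER TABLE t ALTER COLUMN c SET DEFAULT 42;
   ALTER TABLE t ALTER COLUMN c DROP DEFAULT;
   ```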
   






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r816930604



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -136,7 +136,7 @@ statement
         (RESTRICT | CASCADE)?                                          #dropNamespace
     | SHOW namespaces ((FROM | IN) multipartIdentifier)?
         (LIKE? pattern=STRING)?                                        #showNamespaces
-    | createTableHeader ('(' colTypeList ')')? tableProvider?
+    | createTableHeader ('(' createTableColTypeList ')')? tableProvider?

Review comment:
       I think we need to update line 149 `replaceTableHeader ('(' colTypeList ')')? tableProvider?` as well.
   
   But we can revisit REPLACE TABLE later.
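   For context, a sketch of the kind of statement that the `replaceTableHeader` path would eventually need to accept (the table name and provider here are placeholders):
   ```sql
   -- REPLACE TABLE with a column-level DEFAULT, analogous to the CREATE TABLE case
   REPLACE TABLE t (a INT, b INT DEFAULT 42) USING parquet;
   ```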






[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061497002


   Thank you for your confirmations, @gengliangwang and @MaxGekk !




[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061482658


   @gengliangwang and @cloud-fan, why do we have a Spark 3.4 patch in the `master` branch while it is still targeting Apache Spark 3.3?
   > @dongjoon-hyun This is for Spark 3.4
   
   Are we going to revert this from `branch-3.3`? cc @MaxGekk 




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061483489


   As I mentioned in https://github.com/apache/spark/pull/35690#pullrequestreview-899459801, I assumed we were going to wait until @MaxGekk cut branch-3.3.




[GitHub] [spark] gengliangwang edited a comment on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang edited a comment on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057620996


   Supporting default column values is very common among DBMSs. However, this will be a breaking change for Spark SQL.
   Currently, Spark SQL behaves as follows:
   ```
   > create table t(i int, j int);
   > insert into t values(1);
   Error in query: `default`.`t` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
   ```
   
   After supporting default column values:
   ```
   > create table t(i int, j int);
   > insert into t values(1);
   > select * from t;
   1	NULL
   
   > create table t2(i int, j int default 0);
   > insert into t2 values(1);
   > select * from t2;
   1	0
   ```
   
   I am +1 on the change.
   Before merging this PR, I would like to collect the opinions of more committers. We can send an SPIP for a vote if necessary.
   cc @cloud-fan  @dongjoon-hyun @viirya @dbtsai @huaxingao @maropu @zsxwing @wangyum @yaooqinn WDYT? 




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057730161


   > what is your release target for this
   
   @dongjoon-hyun This is for Spark 3.4




[GitHub] [spark] MaxGekk commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061494935


   > Are we going to revert this from branch-3.3? cc @MaxGekk
   
   If there is a risk that it can hurt stability, let's revert it. I will open a blocker for 3.3 so that we do not forget this.




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057727061


   > Creating NULL default value for NOT NULL column
   > Type mismatch between default value literal and column type.
   > Upcasting or not in case of type mismatch
   
   IMO:
   - A NOT NULL column can't have a NULL default.
   - Type mismatch between the default value literal and the column type: we can simply forbid this. Note that we have many numeric types (Byte/Short/Int/Long/Decimal/Float/Double). If both the default value literal type and the column type are numeric, it is not considered a mismatch.
   - Upcasting or not in case of type mismatch: casting can happen if both the literal type and the column type are numeric.
   
   @dtenedor WDYT?




[GitHub] [spark] gengliangwang closed pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang closed pull request #35690:
URL: https://github.com/apache/spark/pull/35690


   




[GitHub] [spark] HyukjinKwon commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057716458


   Adding the default support itself should be fine - it's pretty common, to the best of my knowledge. My only concern is the breaking behavior (https://github.com/apache/spark/pull/35690#issuecomment-1057620996). I am fine with this as long as it matches most DBMSs. I suspect it's less critical since the query failed before and now works.




[GitHub] [spark] MaxGekk commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1068999577


   > Hey, @MaxGekk . Did you make a block issue?
   
   Here is the blocker SPARK-38566




[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818919712



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3651,12 +3695,20 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     }
   }
 
+  private def defaultColumnNotImplementedYetError = {
+    "Support for DEFAULT column values is not implemented yet"
+  }
+
   /**
    * Parse new column info from ADD COLUMN into a QualifiedColType.
    */
   override def visitQualifiedColTypeWithPosition(
       ctx: QualifiedColTypeWithPositionContext): QualifiedColType = withOrigin(ctx) {
     val name = typedVisit[Seq[String]](ctx.name)
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {

Review comment:
       Done.






[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818973943



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -2235,4 +2236,57 @@ class DDLParserSuite extends AnalysisTest {
     comparePlans(parsePlan(timestampTypeSql), insertPartitionPlan(timestamp))
     comparePlans(parsePlan(binaryTypeSql), insertPartitionPlan(binaryStr))
   }
+
+  test("SPARK-38335: Implement support for DEFAULT values for columns in tables") {

Review comment:
       Done.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818235977



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
       metadata = builder.build())
   }
 
+  /**
+   * Create a [[StructType]] from a number of CREATE TABLE column definitions.
+   */
+  override def visitCreateOrReplaceTableColTypeList(
+      ctx: CreateOrReplaceTableColTypeListContext): Seq[StructField] = withOrigin(ctx) {
+    ctx.createOrReplaceTableColType().asScala.map(visitCreateOrReplaceTableColType).toSeq
+  }
+
+  /**
+   * Create a top level [[StructField]] from a CREATE TABLE column definition.
+   */
+  override def visitCreateOrReplaceTableColType(
+      ctx: CreateOrReplaceTableColTypeContext): StructField = withOrigin(ctx) {
+    import ctx._
+
+    val builder = new MetadataBuilder
+    // Add comment to metadata
+    Option(commentSpec()).map(visitCommentSpec).foreach {
+      builder.putString("comment", _)
+    }
+
+    // Process the 'DEFAULT expression' clause in the column definition, if any.
+    val name: String = colName.getText
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {

Review comment:
       
   ```suggestion
       if (defaultExpr.isDefined) {
   ```

##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1448,6 +1448,7 @@ class DDLParserSuite extends AnalysisTest {
             Assignment(UnresolvedAttribute("target.col2"), UnresolvedAttribute("source.col2")))))))
   }
 
+

Review comment:
       ```suggestion
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3813,6 +3871,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
           throw QueryParsingErrors.operationInHiveStyleCommandUnsupportedError(
             "Replacing with a nested column", "REPLACE COLUMNS", ctx)
         }
+        if (Option(colType.defaultExpression()).map(visitDefaultExpression) != None) {

Review comment:
       ```suggestion
           if (Option(colType.defaultExpression()).map(visitDefaultExpression).isDefined) {
   ```

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3651,12 +3695,20 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
     }
   }
 
+  private def defaultColumnNotImplementedYetError = {
+    "Support for DEFAULT column values is not implemented yet"
+  }
+
   /**
    * Parse new column info from ADD COLUMN into a QualifiedColType.
    */
   override def visitQualifiedColTypeWithPosition(
       ctx: QualifiedColTypeWithPositionContext): QualifiedColType = withOrigin(ctx) {
     val name = typedVisit[Seq[String]](ctx.name)
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {

Review comment:
       ```suggestion
       if (defaultExpr.isDefined) {
   ```






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r817374943



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1424,6 +1424,30 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("Columns aliases are not allowed in UPDATE."))
   }
 
+  private def basicMergeIntoCommand(firstValue: String, secondValue: String): String =
+    """
+      |MERGE INTO testcat1.ns1.ns2.tbl AS target
+      |USING testcat2.ns1.ns2.tbl AS source
+      |ON target.col1 = source.col1
+      |WHEN MATCHED AND (target.col2='delete') THEN DELETE
+      |WHEN MATCHED AND (target.col2='update') THEN UPDATE SET target.col2 = source.col2
+      |WHEN NOT MATCHED AND (target.col2='insert')
+      |THEN INSERT (target.col1, target.col2) values (
+      """.stripMargin + firstValue + ", " + secondValue + ")"
+
+  private def basicMergeIntoPlan(firstValue: String, secondValue: String): LogicalPlan =

Review comment:
       this method is not used






[GitHub] [spark] viirya commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818308245



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -2235,4 +2236,57 @@ class DDLParserSuite extends AnalysisTest {
     comparePlans(parsePlan(timestampTypeSql), insertPartitionPlan(timestamp))
     comparePlans(parsePlan(binaryTypeSql), insertPartitionPlan(binaryStr))
   }
+
+  test("SPARK-38335: Implement support for DEFAULT values for columns in tables") {

Review comment:
       nit: support -> parser support?






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r816955843



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3459,7 +3494,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {

Review comment:
       This method is overridden in the sql/core module: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L303
   We need to update it as well.






[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r817103393



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3459,7 +3494,8 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   override def visitCreateTable(ctx: CreateTableContext): LogicalPlan = withOrigin(ctx) {

Review comment:
       Thanks for pointing this out! I had forgotten to run the relevant subset of unit tests. Fixed this.






[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1059894724


   (Note the prior merge conflict in the lexer has now been resolved.)




[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1058387995


   > This PR doesn't seem to have the full body yet, what is your release target for this, @dtenedor and @gengliangwang ? I'm curious about the general error handling.
   > 
   > * Creating `NULL` default value for `NOT NULL` column
   > * Type mismatch between `default value literal` and column type.
   > * Upcasting or not in case of type mismatch
   
   These are good questions.
   
   * The future plan for `NOT NULL` columns is to require a `DEFAULT` value. If no `DEFAULT` value is provided, the engine will throw an error. We can also return an error in the case of a provided `DEFAULT NULL` value for a `NOT NULL` column.
   * The engine can perform an explicit type coercion from the provided type to the required type, e.g. integer to floating-point. If these types are not coercible in this way, the engine can return an error, e.g. floating-point to boolean.
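   
   A sketch of what those rules would mean in practice (illustrative only; none of this behavior is implemented in this parser-only PR):
   ```sql
   CREATE TABLE t1(a DOUBLE DEFAULT 1);          -- OK: the integer literal 1 coerces to DOUBLE
   CREATE TABLE t2(b BOOLEAN DEFAULT 1.5);       -- error: the literal is not coercible to BOOLEAN
   CREATE TABLE t3(c INT NOT NULL DEFAULT NULL); -- error: a NULL default on a NOT NULL column
   ```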




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057713645


   > As this is a breaking change, I'd like to see some clarification on why this is necessary to have. 
   
   @viirya Column default support is very helpful for table schema evolution and DML operations on wide tables with sparse data.
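   For example (a sketch with hypothetical table and column names), with column defaults an INSERT into a wide table only has to list the columns that actually carry data for that row:
   ```sql
   CREATE TABLE events(id BIGINT, ts TIMESTAMP, clicks INT DEFAULT 0, notes STRING DEFAULT '');
   INSERT INTO events(id, ts) VALUES (1, TIMESTAMP'2022-03-01 00:00:00');
   -- clicks and notes fall back to their declared defaults instead of being spelled out.
   ```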




[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818919421



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -3813,6 +3871,9 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
           throw QueryParsingErrors.operationInHiveStyleCommandUnsupportedError(
             "Replacing with a nested column", "REPLACE COLUMNS", ctx)
         }
+        if (Option(colType.defaultExpression()).map(visitDefaultExpression) != None) {

Review comment:
       Done.






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818344523



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -1078,6 +1090,8 @@ alterColumnAction
     | commentSpec
     | colPosition
     | setOrDrop=(SET | DROP) NOT NULL
+    | SET defaultExpression
+    | dropDefault=DROP DEFAULT

Review comment:
       For supporting the ANSI SQL syntax
   ```
   ALTER TABLE table_name ALTER COLUMN column_name SET DEFAULT expr
   
   ALTER TABLE table_name ALTER COLUMN column_name DROP DEFAULT
   ```






[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818922112



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
       metadata = builder.build())
   }
 
+  /**
+   * Create a [[StructType]] from a number of CREATE TABLE column definitions.
+   */
+  override def visitCreateOrReplaceTableColTypeList(
+      ctx: CreateOrReplaceTableColTypeListContext): Seq[StructField] = withOrigin(ctx) {
+    ctx.createOrReplaceTableColType().asScala.map(visitCreateOrReplaceTableColType).toSeq
+  }
+
+  /**
+   * Create a top level [[StructField]] from a CREATE TABLE column definition.
+   */
+  override def visitCreateOrReplaceTableColType(
+      ctx: CreateOrReplaceTableColTypeContext): StructField = withOrigin(ctx) {
+    import ctx._
+
+    val builder = new MetadataBuilder
+    // Add comment to metadata
+    Option(commentSpec()).map(visitCommentSpec).foreach {
+      builder.putString("comment", _)
+    }
+
+    // Process the 'DEFAULT expression' clause in the column definition, if any.
+    val name: String = colName.getText
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {
+      throw new ParseException(defaultColumnNotImplementedYetError, ctx)

Review comment:
       Sure, thanks for letting me know about this framework. Done.






[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1058723452


   This is ready for another review round @gengliangwang @viirya @wangyum @HyukjinKwon @dongjoon-hyun :)




[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057312339


   jenkins merge




[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057656111


   cc @aokolnychyi , @RussellSpitzer , @rdblue , too.




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061486936


   @dongjoon-hyun I am merging this one to unblock @dtenedor's work on the actual changes in catalogs.
   I will:
   * revert this one on branch-3.3
   * make sure no new related PRs are merged to master until branch-3.3 is cut
   




[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1061483489


   As I mentioned in https://github.com/apache/spark/pull/35690#pullrequestreview-899459801, I assumed we were going to wait until @MaxGekk cut branch-3.3.




[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1068337535


   Hey, @MaxGekk. Did you make a blocker issue?
   To @gengliangwang and @HyukjinKwon: I saw the JIRA was resolved as 3.3 because we want to avoid having our merge script show `3.4` as a new version.
   (Screenshot: https://user-images.githubusercontent.com/9700541/158449120-e9329b98-a289-4b0a-a738-302333c8981e.png)
   
   Since today is the feature freeze day, I am recovering it to 3.4.




[GitHub] [spark] viirya commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818307182



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -1078,6 +1090,8 @@ alterColumnAction
     | commentSpec
     | colPosition
     | setOrDrop=(SET | DROP) NOT NULL
+    | SET defaultExpression
+    | dropDefault=DROP DEFAULT

Review comment:
       Hmm, what is this for?






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818236870



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -1974,3 +1991,4 @@ WS
 UNRECOGNIZED
     : .
     ;
+

Review comment:
       ```suggestion
   ```






[GitHub] [spark] MaxGekk edited a comment on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
MaxGekk edited a comment on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1068999577


   > Hey, @MaxGekk . Did you make a block issue?
   
   Here is the blocker SPARK-38566 and the PR which reverts the commit https://github.com/apache/spark/pull/35875




[GitHub] [spark] HyukjinKwon commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1068604205


   👍 




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057620996


   Before merging this PR, I would like to collect the opinions of more committers.
   Supporting default column values is very common among DBMSs. However, this will be a breaking change for Spark SQL.
   Currently, Spark SQL behaves as follows:
   ```
   > create table t(i int, j int);
   > insert into t values(1);
   Error in query: `default`.`t` requires that the data to be inserted have the same number of columns as the target table: target table has 2 column(s) but the inserted data has 1 column(s), including 0 partition column(s) having constant value(s).
   ```
   
   After supporting default column values:
   ```
   > create table t(i int, j int);
   > insert into t values(1);
   > select * from t;
   1	NULL
   ```
   
   cc @cloud-fan  @dongjoon-hyun @viirya @dbtsai @huaxingao @maropu @zsxwing @wangyum @yaooqinn WDYT? 




[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1058385738


   > Thanks for the work.
   > 
   > For the parser change itself, looks okay. As this is a breaking change, I'd like to see some clarification on why this is necessary to have. What issue we have if we don't have this (because we don't have the default value for long time) and do we have any workaround now?
   
   This is for supporting the ANSI SQL syntax for specifying default column values in DDL commands and then referencing them later in INSERT/UPDATE/MERGE commands. Examples:
   
   CREATE TABLE T(a INT, b INT NOT NULL);
   
   -- The default default is NULL
   INSERT INTO T VALUES (DEFAULT, 0);
   INSERT INTO T(b)  VALUES (1);
   SELECT * FROM T;
   (NULL, 0)
   (NULL, 1)
   
   -- Adding a default to a table that already has rows sets the value for the
   -- existing rows (exist default) and for new rows (current default).
   ALTER TABLE T ADD COLUMN c INT DEFAULT 5;
   INSERT INTO T VALUES (1, 2, DEFAULT);
   SELECT * FROM T;
   (NULL, 0, 5)
   (NULL, 1, 5)
   (1, 2, 5) 
   




[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r817924493



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1424,6 +1424,30 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("Columns aliases are not allowed in UPDATE."))
   }
 
+  private def basicMergeIntoCommand(firstValue: String, secondValue: String): String =

Review comment:
       Done. (I had considered refactoring some MERGE INTO unit testing to share the logical plans, but it turned out to be more confusing than expected, so I skipped that approach.)

##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1424,6 +1424,30 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("Columns aliases are not allowed in UPDATE."))
   }
 
+  private def basicMergeIntoCommand(firstValue: String, secondValue: String): String =
+    """
+      |MERGE INTO testcat1.ns1.ns2.tbl AS target
+      |USING testcat2.ns1.ns2.tbl AS source
+      |ON target.col1 = source.col1
+      |WHEN MATCHED AND (target.col2='delete') THEN DELETE
+      |WHEN MATCHED AND (target.col2='update') THEN UPDATE SET target.col2 = source.col2
+      |WHEN NOT MATCHED AND (target.col2='insert')
+      |THEN INSERT (target.col1, target.col2) values (
+      """.stripMargin + firstValue + ", " + secondValue + ")"
+
+  private def basicMergeIntoPlan(firstValue: String, secondValue: String): LogicalPlan =

Review comment:
       Done.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818237466



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
       metadata = builder.build())
   }
 
+  /**
+   * Create a [[StructType]] from a number of CREATE TABLE column definitions.
+   */
+  override def visitCreateOrReplaceTableColTypeList(
+      ctx: CreateOrReplaceTableColTypeListContext): Seq[StructField] = withOrigin(ctx) {
+    ctx.createOrReplaceTableColType().asScala.map(visitCreateOrReplaceTableColType).toSeq
+  }
+
+  /**
+   * Create a top level [[StructField]] from a CREATE TABLE column definition.
+   */
+  override def visitCreateOrReplaceTableColType(
+      ctx: CreateOrReplaceTableColTypeContext): StructField = withOrigin(ctx) {
+    import ctx._
+
+    val builder = new MetadataBuilder
+    // Add comment to metadata
+    Option(commentSpec()).map(visitCommentSpec).foreach {
+      builder.putString("comment", _)
+    }
+
+    // Process the 'DEFAULT expression' clause in the column definition, if any.
+    val name: String = colName.getText
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {
+      throw new ParseException(defaultColumnNotImplementedYetError, ctx)

Review comment:
       Should we leverage our error framework e.g., in `QueryParsingErrors`? cc @MaxGekk FYI






[GitHub] [spark] dongjoon-hyun commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057730727


   Thank you for the confirmation.




[GitHub] [spark] wangyum commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818334272



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -1078,6 +1090,8 @@ alterColumnAction
     | commentSpec
     | colPosition
     | setOrDrop=(SET | DROP) NOT NULL
+    | SET defaultExpression

Review comment:
       Do other databases support `SET DEFAULT expression`?
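
       For context, several databases already accept this form; PostgreSQL and MySQL, for example, support statements along these lines (the table and column names here are purely illustrative):

       ```sql
       -- Set or clear a column default on an existing table
       -- (PostgreSQL syntax; MySQL accepts the same SET/DROP DEFAULT forms)
       ALTER TABLE t ALTER COLUMN c SET DEFAULT 42;
       ALTER TABLE t ALTER COLUMN c DROP DEFAULT;
       ```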






[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r817058508



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -136,7 +136,7 @@ statement
         (RESTRICT | CASCADE)?                                          #dropNamespace
     | SHOW namespaces ((FROM | IN) multipartIdentifier)?
         (LIKE? pattern=STRING)?                                        #showNamespaces
-    | createTableHeader ('(' colTypeList ')')? tableProvider?
+    | createTableHeader ('(' createTableColTypeList ')')? tableProvider?

Review comment:
       No problem, we can support `REPLACE TABLE` here as well. I renamed `createTableColTypeList` to `createOrReplaceTableColTypeList` and updated the test cases so they pass.
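
       For illustration, the renamed `createOrReplaceTableColTypeList` rule is meant to accept a DEFAULT clause in both statement forms, roughly like this (the table name and provider are placeholders, and with this parser-only change the statements parse but do not yet execute):

       ```sql
       CREATE TABLE t (col INT DEFAULT 42) USING parquet;
       REPLACE TABLE t (col INT DEFAULT 42) USING parquet;
       ```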






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38334][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r816930604



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -136,7 +136,7 @@ statement
         (RESTRICT | CASCADE)?                                          #dropNamespace
     | SHOW namespaces ((FROM | IN) multipartIdentifier)?
         (LIKE? pattern=STRING)?                                        #showNamespaces
-    | createTableHeader ('(' colTypeList ')')? tableProvider?
+    | createTableHeader ('(' createTableColTypeList ')')? tableProvider?

Review comment:
       I think we need to update line 149 `replaceTableHeader ('(' colTypeList ')')? tableProvider?` as well






[GitHub] [spark] gengliangwang commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r817374898



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1424,6 +1424,30 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("Columns aliases are not allowed in UPDATE."))
   }
 
+  private def basicMergeIntoCommand(firstValue: String, secondValue: String): String =

Review comment:
       this method is not used






[GitHub] [spark] AmplabJenkins commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1056427705


   Can one of the admins verify this patch?




[GitHub] [spark] gengliangwang commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1060361087


   @dtenedor Thanks for the first contribution!
   @HyukjinKwon @dongjoon-hyun @viirya @wangyum @cloud-fan thanks for the input! I am merging this parser-only PR to unblock @dtenedor's follow-up work on this feature.




[GitHub] [spark] cloud-fan commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1060339678


   This is a parser-only change and the feature is not implemented yet, so it is definitely not a breaking change. But I'd like to confirm: is every new SQL feature a breaking change? For example, adding a new SQL function means that a query which previously failed with "function not found" now succeeds. That doesn't seem like a breaking change to me. The same applies to accepting more parameters in a SQL function, accepting more parameter types, etc.
   
   The code change itself LGTM.




[GitHub] [spark] dtenedor commented on a change in pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on a change in pull request #35690:
URL: https://github.com/apache/spark/pull/35690#discussion_r818916692



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -2756,6 +2763,41 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
       metadata = builder.build())
   }
 
+  /**
+   * Create a [[StructType]] from a number of CREATE TABLE column definitions.
+   */
+  override def visitCreateOrReplaceTableColTypeList(
+      ctx: CreateOrReplaceTableColTypeListContext): Seq[StructField] = withOrigin(ctx) {
+    ctx.createOrReplaceTableColType().asScala.map(visitCreateOrReplaceTableColType).toSeq
+  }
+
+  /**
+   * Create a top level [[StructField]] from a CREATE TABLE column definition.
+   */
+  override def visitCreateOrReplaceTableColType(
+      ctx: CreateOrReplaceTableColTypeContext): StructField = withOrigin(ctx) {
+    import ctx._
+
+    val builder = new MetadataBuilder
+    // Add comment to metadata
+    Option(commentSpec()).map(visitCommentSpec).foreach {
+      builder.putString("comment", _)
+    }
+
+    // Process the 'DEFAULT expression' clause in the column definition, if any.
+    val name: String = colName.getText
+    val defaultExpr = Option(ctx.defaultExpression()).map(visitDefaultExpression)
+    if (defaultExpr != None) {

Review comment:
       Done. Thanks for your reviews!

##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
##########
@@ -1448,6 +1448,7 @@ class DDLParserSuite extends AnalysisTest {
             Assignment(UnresolvedAttribute("target.col2"), UnresolvedAttribute("source.col2")))))))
   }
 
+

Review comment:
       Done.






[GitHub] [spark] viirya commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1057800492


   > Adding the default support itself should be fine - it's pretty common, to the best of my knowledge. My only concern is the breaking behavior ([#35690 (comment)](https://github.com/apache/spark/pull/35690#issuecomment-1057620996)). I am fine w/ this as long as it matches what most DBMSes do. I suspect it's less critical since it failed before but now works.
   
   I agree with you, @HyukjinKwon.
   
   And thanks for the clarification, @gengliangwang.




[GitHub] [spark] dtenedor commented on pull request #35690: [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Posted by GitBox <gi...@apache.org>.
dtenedor commented on pull request #35690:
URL: https://github.com/apache/spark/pull/35690#issuecomment-1058390966


   > > Creating a NULL default value for a NOT NULL column
   > > Type mismatch between the default value literal and the column type
   > > Upcasting or not in case of type mismatch
   > 
   > IMO:
   > 
   > * A NOT NULL column can't have a NULL default.
   > * Type mismatch between the default value literal and the column type: we can simply forbid this. Note that we have many numeric types (Byte/Short/Int/Long/Decimal/Float/Double). If both the default value literal type and the column type are numeric, it is not considered a mismatch.
   > * Upcasting or not in case of type mismatch: casting can happen if both the literal type and the column type are numeric.
   > 
   > @dtenedor WDYT?
   
   Good questions; I replied to these above. We can perform type coercion from the provided default value type to the required column type, or return an error if the types are not coercible in this way. For consistency with the rest of Spark, we can reuse the analyzer's existing type coercion rules for this. For example, coercing an integer to floating point should work, but coercing a floating point to boolean should return an error to the user.
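   
   As a sketch of that intended behavior (not implemented in this parser-only PR; the table names are made up):
   
   ```sql
   CREATE TABLE t1 (a DOUBLE DEFAULT 1) USING parquet;    -- INT literal 1 coerces to DOUBLE: accepted
   CREATE TABLE t2 (b BOOLEAN DEFAULT 1.5) USING parquet; -- no coercion from DOUBLE to BOOLEAN: error
   ```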

