You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/26 15:38:23 UTC

[GitHub] [spark] johanl-db opened a new pull request, #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

johanl-db opened a new pull request, #38400:
URL: https://github.com/apache/spark/pull/38400

   ### What changes were proposed in this pull request?
   This change adds a third type of WHEN clause to the MERGE INTO command that allows updating or deleting rows from the target table that have no match in the source table based on the merge condition.
   
   The following example updates all rows from the target table that have a match in the source table using the source value. For target rows that have no match in the source table, the 'state' column of rows that were created before '2022-10-26' is set to 'active', while rows created before that date are deleted from the target table.
   ```
   MERGE INTO target
   USING source
   ON target.key = source.key
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED BY SOURCE AND target.create_at > '2022-10-26' THEN UPDATE SET target.status = 'active'
   WHEN NOT MATCHED BY SOURCE THEN DELETE 
   ```
   
   In addition, the existing WHEN NOT MATCHED clause can now also include an optional BY TARGET qualifier that has no effect on its semantic other than allowing a more consistent use of the clauses together:
   ```
   WHEN NOT MATCHED BY TARGET THEN INSERT * 
   ```
   is equivalent to:
   ```
   WHEN NOT MATCHED THEN INSERT *
   ```
   The updated SQL syntax for the MERGE INTO command is described more precisely in the user-facing change section below.
   
   The changes proposed in this pull request are two-fold:
   1. Update SQL parsing to handle the new clause introduced. This results in a new field `notMatchedBySourceActions` being populated in the logical plan node of the MERGE INTO command `MergeIntoTable`.
   2. Handle the newly added merge clause during analysis. In particular, resolve all attribute references used in WHEN NOT MATCHED BY SOURCE conditions and actions. The attributes used in a NOT MATCHED BY SOURCE clause may only refer to the target table.
   
   ### Why are the changes needed?
   The new clause opens up uses cases leveraging the merge command to sync a target from a source table by conditionally deleting or updating records that are not present in the source. As an example, the following command incrementally syncs the target table from the source table for the past 5 days:
   ```
   MERGE INTO target
   USING (SELECT `columns`  FROM source  WHERE created_at >= (current_date - INTERVAL ‘5’ DAY)  AS tmp_name
   ON FALSE
   WHEN NOT MATCHED BY SOURCE AND (current_date - INTERVAL ‘5’ DAY) THEN DELETE
   WHEN NOT MATCHED BY TARGET THEN INSERT `columns`
   ```
   After running this command, all rows older than 5 days in the target table are left unmodified while rows newer than 5 days that are either not already in the target table or not in the source table anymore are inserted and deleted, respectively.
   
   ### Does this PR introduce _any_ user-facing change?
   Two user-facing changes are introduced in the MERGE INTO syntax:
   - WHEN NOT MATCHED BY SOURCE clause.
   - Optional BY TARGET qualifier for WHEN NOT MATCHED clauses.
   
   The updated Spark SQL syntax is:
   ```
   MERGE INTO target_table_name [target_alias]
      USING source_table_reference [source_alias]
      ON merge_condition
      { WHEN MATCHED [ AND matched_condition ] THEN matched_action |
        WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action
        WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]
   
   matched_action
    { DELETE |
      UPDATE SET * |
      UPDATE SET { column = [ expr | DEFAULT ] } [, ...] }
   
   not_matched_action
    { INSERT * |
      INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )
   
   not_matched_by_source_action
    { DELETE |
      UPDATE SET { column = [ expr | DEFAULT ] } [, ...] }
   ```
   This syntax replicates the semantics used by other vendors, see:
   - (Microsoft T-SQL Merge)[https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql]
   - (Google BigQuery Merge)[https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_statement]
   
   
   ### How was this patch tested?
   Tests are extended or added to cover the following aspect of the change:
   - Parsing (in DDLParserSuite.scala):
     - Existing tests are extended to cover parsing of WHEN NOT MATCHED BY SOURCE clauses in a range of cases. This covers parsing the clause with optional conditions and a variety of UPDATE and DELETE actions.
     - New tests are added to cover NOT MATCHED BY TARGET and invalid action UPDATE SET * for WHEN NOT MATCHED BY SOURCE.
   - Analysis (in PlanResolutionSuite.scala):
     - Existing tests are extended to also cover attribute reference resolution for WHEN NOT MATCHED BY SOURCE conditions and actions together with other type of clauses.
     - New tests are added to cover reference resolution specific to WHEN NOT MATCHED BY SOURCE clauses:
       - Unqualified reference to a column present both in the target and source table is not ambiguous in WHEN NOT MATCHED BY SOURCE conditions or actions since it can only refer to the target table.
       - Reference to columns in the source table are invalid in WHEN NOT MATCHED BY SOURCE.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008123172


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -602,6 +602,24 @@
     ],
     "sqlState" : "42000"
   },
+  "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED BY SOURCE clauses in a MERGE statement, only the last NOT MATCHED BY SOURCE clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION" : {

Review Comment:
   is it more accurate to name it `NON_LAST_NOT_MATCHED_BY_TARGET_CLAUSE_OMIT_CONDITION `



##########
core/src/main/resources/error/error-classes.json:
##########
@@ -602,6 +602,24 @@
     ],
     "sqlState" : "42000"
   },
+  "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED BY SOURCE clauses in a MERGE statement, only the last NOT MATCHED BY SOURCE clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION" : {

Review Comment:
   is it more accurate to name it `NON_LAST_NOT_MATCHED_BY_TARGET_CLAUSE_OMIT_CONDITION`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #38400:
URL: https://github.com/apache/spark/pull/38400#issuecomment-1298784075

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008130147


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1563,13 +1563,29 @@ class Analyzer(override val catalogManager: CatalogManager)
                 }
                 InsertAction(
                   resolvedInsertCondition,
-                  resolveAssignments(assignments, m, resolveValuesWithSourceOnly = true))
+                  resolveAssignments(assignments, m, resolveValuesFrom = sourceTable))
               case o => o
             }
+            val newNotMatchedBySourceActions = m.notMatchedBySourceActions.map {
+              case DeleteAction(deleteCondition) =>
+                val resolvedDeleteCondition = deleteCondition.map(
+                  resolveExpressionByPlanChildren(_, Project(Nil, targetTable)))

Review Comment:
   nit: we can call `resolveExpressionByPlanOutput`, then we don't need to wrap the plan with `Project(Nil, ...)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] johanl-db commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
johanl-db commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008270419


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -451,7 +451,22 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
         }
       }
     }
-    if (matchedActions.isEmpty && notMatchedActions.isEmpty) {
+    val notMatchedBySourceActions = ctx.notMatchedBySourceClause().asScala.map {
+      clause => {
+        val notMatchedBySourceAction = clause.notMatchedBySourceAction()
+        if (notMatchedBySourceAction.DELETE() != null) {
+          DeleteAction(Option(clause.notMatchedBySourceCond).map(expression))
+        } else if (notMatchedBySourceAction.UPDATE() != null) {
+          val condition = Option(clause.notMatchedBySourceCond).map(expression)
+          UpdateAction(condition,
+            withAssignments(clause.notMatchedBySourceAction().assignmentList()))
+        } else {
+          // It should not be here.
+          throw QueryParsingErrors.unrecognizedNotMatchedBySourceActionError(clause)

Review Comment:
   Parsing will fail if a user tries to use an invalid action. We don't cover this error in tests since it can't surface unless there's a bug. I replaced it (and similar  error for matched / not matched) with internal errors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008124090


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -801,6 +819,21 @@
     ],
     "sqlState" : "42000"
   },
+  "UNRECOGNIZED_MATCHED_ACTION" : {
+    "message" : [
+      "Unrecognized matched action: <matchedAction>."
+    ]
+  },
+  "UNRECOGNIZED_NOT_MATCHED_ACTION" : {

Review Comment:
   ditto, `UNRECOGNIZED_NOT_MATCHED_BY_TARGET_ACTION`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] johanl-db commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
johanl-db commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1009246467


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1583,21 +1599,20 @@ class Analyzer(override val catalogManager: CatalogManager)
     def resolveAssignments(
         assignments: Seq[Assignment],
         mergeInto: MergeIntoTable,
-        resolveValuesWithSourceOnly: Boolean): Seq[Assignment] = {
+        resolveValuesFrom: LogicalPlan): Seq[Assignment] = {
       assignments.map { assign =>
         val resolvedKey = assign.key match {
           case c if !c.resolved =>
             resolveMergeExprOrFail(c, Project(Nil, mergeInto.targetTable))
           case o => o
         }
         val resolvedValue = assign.value match {
-          // The update values may contain target and/or source references.
           case c if !c.resolved =>
-            if (resolveValuesWithSourceOnly) {
-              resolveMergeExprOrFail(c, Project(Nil, mergeInto.sourceTable))
-            } else {
-              resolveMergeExprOrFail(c, mergeInto)
+            val resolveFromChildren = resolveValuesFrom match {
+              case m: MergeIntoTable => m
+              case p => Project(Nil, p)

Review Comment:
   I went the enum route, introduced MergeResolvePolicy.



##########
core/src/main/resources/error/error-classes.json:
##########
@@ -589,6 +589,24 @@
       "More than one row returned by a subquery used as an expression."
     ]
   },
+  "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED BY SOURCE clauses in a MERGE statement, only the last NOT MATCHED BY SOURCE clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_TARGET_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED clauses in a MERGE statement, only the last NOT MATCHED clause can omit the condition."

Review Comment:
   Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO
URL: https://github.com/apache/spark/pull/38400


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1009084918


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1583,21 +1599,20 @@ class Analyzer(override val catalogManager: CatalogManager)
     def resolveAssignments(
         assignments: Seq[Assignment],
         mergeInto: MergeIntoTable,
-        resolveValuesWithSourceOnly: Boolean): Seq[Assignment] = {
+        resolveValuesFrom: LogicalPlan): Seq[Assignment] = {
       assignments.map { assign =>
         val resolvedKey = assign.key match {
           case c if !c.resolved =>
             resolveMergeExprOrFail(c, Project(Nil, mergeInto.targetTable))
           case o => o
         }
         val resolvedValue = assign.value match {
-          // The update values may contain target and/or source references.
           case c if !c.resolved =>
-            if (resolveValuesWithSourceOnly) {
-              resolveMergeExprOrFail(c, Project(Nil, mergeInto.sourceTable))
-            } else {
-              resolveMergeExprOrFail(c, mergeInto)
+            val resolveFromChildren = resolveValuesFrom match {
+              case m: MergeIntoTable => m
+              case p => Project(Nil, p)

Review Comment:
   or we pass an enum:
   ```
     object MergeResolvePolicy extends Enumeration {
       val BOTH, TARGET, SOURCE = Value
     }
   ```
   then the code here can be
   ```
   val basePlan = policy match {
     case BOTH => m
     case SOURCE => Project(Nil, m.source)
     case TARGET => Project(Nil, m.target)
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1009082879


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1583,21 +1599,20 @@ class Analyzer(override val catalogManager: CatalogManager)
     def resolveAssignments(
         assignments: Seq[Assignment],
         mergeInto: MergeIntoTable,
-        resolveValuesWithSourceOnly: Boolean): Seq[Assignment] = {
+        resolveValuesFrom: LogicalPlan): Seq[Assignment] = {
       assignments.map { assign =>
         val resolvedKey = assign.key match {
           case c if !c.resolved =>
             resolveMergeExprOrFail(c, Project(Nil, mergeInto.targetTable))
           case o => o
         }
         val resolvedValue = assign.value match {
-          // The update values may contain target and/or source references.
           case c if !c.resolved =>
-            if (resolveValuesWithSourceOnly) {
-              resolveMergeExprOrFail(c, Project(Nil, mergeInto.sourceTable))
-            } else {
-              resolveMergeExprOrFail(c, mergeInto)
+            val resolveFromChildren = resolveValuesFrom match {
+              case m: MergeIntoTable => m
+              case p => Project(Nil, p)

Review Comment:
   this pattern match looks a bit hacky. How about adding a new method?
   ```
   def resolveAssignmentsByPlanOutput ... {
     resolveAssignments(..., Project(Nil, p))
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1009081519


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -589,6 +589,24 @@
       "More than one row returned by a subquery used as an expression."
     ]
   },
+  "NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one MATCHED clauses in a MERGE statement, only the last MATCHED clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED BY SOURCE clauses in a MERGE statement, only the last NOT MATCHED BY SOURCE clause can omit the condition."
+    ],
+    "sqlState" : "42000"
+  },
+  "NON_LAST_NOT_MATCHED_BY_TARGET_CLAUSE_OMIT_CONDITION" : {
+    "message" : [
+      "When there are more than one NOT MATCHED clauses in a MERGE statement, only the last NOT MATCHED clause can omit the condition."

Review Comment:
   ```suggestion
         "When there are more than one NOT MATCHED [BY TARGET] clauses in a MERGE statement, only the last NOT MATCHED [BY TARGET] clause can omit the condition."
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38400:
URL: https://github.com/apache/spark/pull/38400#issuecomment-1292521738

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008135830


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -451,7 +451,22 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
         }
       }
     }
-    if (matchedActions.isEmpty && notMatchedActions.isEmpty) {
+    val notMatchedBySourceActions = ctx.notMatchedBySourceClause().asScala.map {
+      clause => {
+        val notMatchedBySourceAction = clause.notMatchedBySourceAction()
+        if (notMatchedBySourceAction.DELETE() != null) {
+          DeleteAction(Option(clause.notMatchedBySourceCond).map(expression))
+        } else if (notMatchedBySourceAction.UPDATE() != null) {
+          val condition = Option(clause.notMatchedBySourceCond).map(expression)
+          UpdateAction(condition,
+            withAssignments(clause.notMatchedBySourceAction().assignmentList()))
+        } else {
+          // It should not be here.
+          throw QueryParsingErrors.unrecognizedNotMatchedBySourceActionError(clause)

Review Comment:
   If this means a bug, we should throw `SparkException.internalError`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] johanl-db commented on a diff in pull request #38400: [SPARK-40921][SQL] Add WHEN NOT MATCHED BY SOURCE clause to MERGE INTO

Posted by GitBox <gi...@apache.org>.
johanl-db commented on code in PR #38400:
URL: https://github.com/apache/spark/pull/38400#discussion_r1008270419


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##########
@@ -451,7 +451,22 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] with SQLConfHelper wit
         }
       }
     }
-    if (matchedActions.isEmpty && notMatchedActions.isEmpty) {
+    val notMatchedBySourceActions = ctx.notMatchedBySourceClause().asScala.map {
+      clause => {
+        val notMatchedBySourceAction = clause.notMatchedBySourceAction()
+        if (notMatchedBySourceAction.DELETE() != null) {
+          DeleteAction(Option(clause.notMatchedBySourceCond).map(expression))
+        } else if (notMatchedBySourceAction.UPDATE() != null) {
+          val condition = Option(clause.notMatchedBySourceCond).map(expression)
+          UpdateAction(condition,
+            withAssignments(clause.notMatchedBySourceAction().assignmentList()))
+        } else {
+          // It should not be here.
+          throw QueryParsingErrors.unrecognizedNotMatchedBySourceActionError(clause)

Review Comment:
   Parsing will fail if a user tries to use an invalid action. We don't cover this error in tests since it can't surface unless there's a but. I replaced it (and similar  error for matched / not matched) by internal errors.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1563,13 +1563,29 @@ class Analyzer(override val catalogManager: CatalogManager)
                 }
                 InsertAction(
                   resolvedInsertCondition,
-                  resolveAssignments(assignments, m, resolveValuesWithSourceOnly = true))
+                  resolveAssignments(assignments, m, resolveValuesFrom = sourceTable))
               case o => o
             }
+            val newNotMatchedBySourceActions = m.notMatchedBySourceActions.map {
+              case DeleteAction(deleteCondition) =>
+                val resolvedDeleteCondition = deleteCondition.map(
+                  resolveExpressionByPlanChildren(_, Project(Nil, targetTable)))

Review Comment:
   Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org