You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/23 02:39:12 UTC

[GitHub] [spark] brkyvz opened a new pull request #27677: [SPARK-30924][SQL] Add additional checks to Merge Into

brkyvz opened a new pull request #27677: [SPARK-30924][SQL] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677
 
 
   ### What changes were proposed in this pull request?
   
   Merge Into is currently missing additional validation around:
   
    1. The lack of any WHEN statements
    2. The first WHEN MATCHED statement needs to have a condition if there are two WHEN MATCHED statements.
    3. Single use of UPDATE/DELETE
   
   This PR introduces these validations.
   (1) is required, because otherwise the MERGE statement is useless.
   (2) is required, because otherwise the second WHEN MATCHED condition becomes dead code
   (3) is up for debate, but the idea there is that a single expression should be sufficient to specify when you would like to update or delete your records. We restrict it for now to reduce surface area and ambiguity.
   
   ### Why are the changes needed?
   
   To ease DataSource developers when building implementations for MERGE
   
   ### Does this PR introduce any user-facing change?
   
   Adds additional validation checks
   
   ### How was this patch tested?
   
   Unit tests

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590033074
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118824/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590180594
 
 
   LGTM except for https://github.com/apache/spark/pull/27677/files#r383097562

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#discussion_r383097633
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
 ##########
 @@ -1190,6 +1190,56 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("There should be at most 1 'WHEN NOT MATCHED' clause."))
   }
 
+  test("merge into table: the first matched clause must have a condition if there's a second") {
+    val exc = intercept[ParseException] {
+      parsePlan(
+        """
+          |MERGE INTO testcat1.ns1.ns2.tbl AS target
+          |USING testcat2.ns1.ns2.tbl AS source
+          |ON target.col1 = source.col1
+          |WHEN MATCHED THEN DELETE
+          |WHEN MATCHED THEN UPDATE SET target.col2 = source.col2
+          |WHEN NOT MATCHED AND (target.col2='insert')
+          |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)
+        """.stripMargin)
+    }
+
+    assert(exc.getMessage.contains("the first MATCHED clause must have a condition"))
+  }
+
+  test("merge into table: there must be a when (not) matched condition") {
+    val exc = intercept[ParseException] {
+      parsePlan(
+        """
+          |MERGE INTO testcat1.ns1.ns2.tbl AS target
+          |USING testcat2.ns1.ns2.tbl AS source
+          |ON target.col1 = source.col1
+        """.stripMargin)
+    }
+
+    assert(exc.getMessage.contains("There must be at least one WHEN clause in a MERGE statement"))
+  }
+
+  test("merge into table: there can be only a single use DELETE or UPDATE") {
+    Seq("UPDATE SET *", "DELETE").foreach { op =>
 
 Review comment:
   seems better to fail at the implementation side if it's not supported.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020181
 
 
   **[Test build #118824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118824/testReport)** for PR 27677 at commit [`a7614ac`](https://github.com/apache/spark/commit/a7614ac7014f6547c20c5303bf05bdc10a15f9d2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#discussion_r383097562
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
 ##########
 @@ -1190,6 +1190,56 @@ class DDLParserSuite extends AnalysisTest {
     assert(exc.getMessage.contains("There should be at most 1 'WHEN NOT MATCHED' clause."))
   }
 
+  test("merge into table: the first matched clause must have a condition if there's a second") {
+    val exc = intercept[ParseException] {
+      parsePlan(
+        """
+          |MERGE INTO testcat1.ns1.ns2.tbl AS target
+          |USING testcat2.ns1.ns2.tbl AS source
+          |ON target.col1 = source.col1
+          |WHEN MATCHED THEN DELETE
+          |WHEN MATCHED THEN UPDATE SET target.col2 = source.col2
+          |WHEN NOT MATCHED AND (target.col2='insert')
+          |THEN INSERT (target.col1, target.col2) values (source.col1, source.col2)
+        """.stripMargin)
+    }
+
+    assert(exc.getMessage.contains("the first MATCHED clause must have a condition"))
+  }
+
+  test("merge into table: there must be a when (not) matched condition") {
+    val exc = intercept[ParseException] {
+      parsePlan(
+        """
+          |MERGE INTO testcat1.ns1.ns2.tbl AS target
+          |USING testcat2.ns1.ns2.tbl AS source
+          |ON target.col1 = source.col1
+        """.stripMargin)
+    }
+
+    assert(exc.getMessage.contains("There must be at least one WHEN clause in a MERGE statement"))
+  }
+
+  test("merge into table: there can be only a single use DELETE or UPDATE") {
+    Seq("UPDATE SET *", "DELETE").foreach { op =>
 
 Review comment:
   This is a bit hard to justify. For example, we may want to update different rows with different new values according to the condition.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590032957
 
 
   **[Test build #118824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118824/testReport)** for PR 27677 at commit [`a7614ac`](https://github.com/apache/spark/commit/a7614ac7014f6547c20c5303bf05bdc10a15f9d2).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020251
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590182425
 
 
   You still can though. It just makes the update condition a bit
   more complicated with a when() function. If we're going to limit the number
   of statements to two, I think this is fine to have for now. What do you
   think?
   
   On Sun, Feb 23, 2020, 10:15 PM Wenchen Fan <no...@github.com> wrote:
   
   > LGTM except for
   > https://github.com/apache/spark/pull/27677/files#r383097562
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/spark/pull/27677?email_source=notifications&email_token=ABIAE62HV5HXSESI3DN6OPLRENQW3A5CNFSM4KZWXRZKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMWXB4Q#issuecomment-590180594>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/ABIAE6234ILSU57EHFGZK7TRENQW3ANCNFSM4KZWXRZA>
   > .
   >
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590033073
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590193091
 
 
   thanks, merging to master/3.0!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020253
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23574/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590188948
 
 
   > It just makes the update condition a bit more complicated with a when() function.
   
   OK this makes sense. Since we only allow 2 WHEN matched statements, users should try to compact updates into a single statement as possible as they can.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590033073
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020181
 
 
   **[Test build #118824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118824/testReport)** for PR 27677 at commit [`a7614ac`](https://github.com/apache/spark/commit/a7614ac7014f6547c20c5303bf05bdc10a15f9d2).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590033074
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/118824/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020251
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27677: [SPARK-30924][SQL][3.0] Add additional checks to Merge Into
URL: https://github.com/apache/spark/pull/27677#issuecomment-590020253
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23574/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org