You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/05 11:05:30 UTC

[GitHub] [spark] cxzl25 opened a new pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

cxzl25 opened a new pull request #34493:
URL: https://github.com/apache/spark/pull/34493


   ### What changes were proposed in this pull request?
   SPARK-29295 introduces a mechanism that writes to external tables is a dynamic partition method, and the data in the target partition will be deleted first.
   
   Assuming that 1001 partitions are written, the data of 10001 partitions will be deleted first, but because `hive.exec.max.dynamic.partitions` is 1000 by default, loadDynamicPartitions will fail at this time, but the data of 1001 partitions has been deleted.
   
   So we can check whether the number of dynamic partitions is greater than `hive.exec.max.dynamic.partitions` before deleting, it should fail quickly at this time.
   
   ### Why are the changes needed?
   Avoid data that cannot be recovered when the job fails.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   add UT
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-993124234


   Merged, thanks! also going to cherry-pick to branch-3.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on a change in pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r766874775



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
##########
@@ -1905,4 +1905,15 @@ object QueryExecutionErrors {
   def cannotConvertOrcTimestampToTimestampNTZError(): Throwable = {
     new RuntimeException("Unable to convert timestamp of Orc to data type 'timestamp_ntz'")
   }
+
+  def writePartitionExceedConfigSizeWhenDynamicPartitionError(
+    numWrittenParts: Int,

Review comment:
       Sorry, I did not notice the indentation problem here, you have already provided an example before, thank you.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-963408906


   I feel even though we can't guarantee operations like delete to be atomic, we should make effort to do so. This PR looks simple enough and fixes a potential issue which could corrupt an external Hive table, so I think it's well worth it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r766869518



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
##########
@@ -1905,4 +1905,15 @@ object QueryExecutionErrors {
   def cannotConvertOrcTimestampToTimestampNTZError(): Throwable = {
     new RuntimeException("Unable to convert timestamp of Orc to data type 'timestamp_ntz'")
   }
+
+  def writePartitionExceedConfigSizeWhenDynamicPartitionError(
+    numWrittenParts: Int,

Review comment:
       check here: https://github.com/databricks/scala-style-guide#indent




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-962657954


   cc @viirya too since this is related to your change in SPARK-29295


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-993137663


   > @cxzl25 could you open another PR to backport this to branch-3.2? I tried to cherry-pick it but there's some conflict.
   
   ok, i will do it now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-975113863


   Can we continue to review this pr? @dongjoon-hyun @sunchao @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-963486352


   Sounds good to me. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-962812348


   > Hmm, my question is, as we are going to overwrite the table partitions, why we need to prevent data to be deleted? Any other delete-like command, I think if any failure happens during deletion, there will be some data that are already deleted before the failure. I think we don't provide atomicity guarantee for this command, right?
   
   Yes. I agree with you.
   Operation is not guaranteed to be atomic.
   Failure during the deletion process is not guaranteed to be restored.
   
   
   
   But in this case, if the number of dynamic partitions exceeds `hive.exec.max.dynamic.partitions`, Spark deletes the partition data first, and then checks that the number of partitions exceeds the configured number when `client.loadDynamicPartitions` loads the data, and it fails immediately. No data is written to the partition.
   
   The user thought that the operation was not successful, and theoretically the original data should still be there.
   
   Or the user will check whether the number of partitions meets expectations. If it does, the user needs to adjust the hive configuration. If it does not, it needs to modify the sql logic.
   It also takes time to re-run sql, and the data during this period will not be able to be read.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r744036832



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -192,6 +192,17 @@ case class InsertIntoHiveTable(
     if (partition.nonEmpty) {
       if (numDynamicPartitions > 0) {
         if (overwrite && table.tableType == CatalogTableType.EXTERNAL) {
+          val numWrittenParts = writtenParts.size
+          val maxDynamicPartitionsKey = "hive.exec.max.dynamic.partitions"
+          val maxDynamicPartitions = hadoopConf.getInt(maxDynamicPartitionsKey, 1000)
+          if (numWrittenParts > maxDynamicPartitions) {
+            val maxDynamicPartitionsErrMsg =
+              s"Number of dynamic partitions created is $numWrittenParts" +
+                s", which is more than $maxDynamicPartitions" +
+                s". To solve this try to set $maxDynamicPartitionsKey" +
+                s" to at least $numWrittenParts."

Review comment:
       Do you think we set `hive.exec.max.dynamic.partitions` automatically from Spark side in this case?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r766116988



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -192,6 +192,17 @@ case class InsertIntoHiveTable(
     if (partition.nonEmpty) {
       if (numDynamicPartitions > 0) {
         if (overwrite && table.tableType == CatalogTableType.EXTERNAL) {
+          val numWrittenParts = writtenParts.size
+          val maxDynamicPartitionsKey = "hive.exec.max.dynamic.partitions"

Review comment:
       can we use `HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTS.varname` for the key and `HiveConf.ConfVars.DYNAMICPARTITIONMAXPARTS.defaultIntVal` instead of `1000`?

##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -192,6 +192,17 @@ case class InsertIntoHiveTable(
     if (partition.nonEmpty) {
       if (numDynamicPartitions > 0) {
         if (overwrite && table.tableType == CatalogTableType.EXTERNAL) {
+          val numWrittenParts = writtenParts.size
+          val maxDynamicPartitionsKey = "hive.exec.max.dynamic.partitions"
+          val maxDynamicPartitions = hadoopConf.getInt(maxDynamicPartitionsKey, 1000)
+          if (numWrittenParts > maxDynamicPartitions) {
+            val maxDynamicPartitionsErrMsg =
+              s"Number of dynamic partitions created is $numWrittenParts" +
+                s", which is more than $maxDynamicPartitions" +
+                s". To solve this try to set $maxDynamicPartitionsKey" +
+                s" to at least $numWrittenParts."
+            throw new SparkException(maxDynamicPartitionsErrMsg)

Review comment:
       nit: we may want to group this error message and define it in `QueryExecutionErrors`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao closed pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao closed pull request #34493:
URL: https://github.com/apache/spark/pull/34493


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-993143043


   branch-3.2 https://github.com/apache/spark/pull/34889


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-961829324


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on a change in pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r744092734



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
##########
@@ -192,6 +192,17 @@ case class InsertIntoHiveTable(
     if (partition.nonEmpty) {
       if (numDynamicPartitions > 0) {
         if (overwrite && table.tableType == CatalogTableType.EXTERNAL) {
+          val numWrittenParts = writtenParts.size
+          val maxDynamicPartitionsKey = "hive.exec.max.dynamic.partitions"
+          val maxDynamicPartitions = hadoopConf.getInt(maxDynamicPartitionsKey, 1000)
+          if (numWrittenParts > maxDynamicPartitions) {
+            val maxDynamicPartitionsErrMsg =
+              s"Number of dynamic partitions created is $numWrittenParts" +
+                s", which is more than $maxDynamicPartitions" +
+                s". To solve this try to set $maxDynamicPartitionsKey" +
+                s" to at least $numWrittenParts."

Review comment:
       It is possible to automatically adjust the number of `hive.exec.max.dynamic.partitions`. 
   However, if it is automatically adjusted, many partitions may be created accidentally, and this parameter is meaningless. 
   
   https://github.com/apache/hive/blob/135629b8d6b538fed092641537034a9fbc59c7a0/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1857-L1864
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-963475448


   It may be that the PR title is not clear.
   Maybe i can change to
   The number of dynamic partitions should early check when writing to external tables ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on a change in pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r766868871



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
##########
@@ -1905,4 +1905,15 @@ object QueryExecutionErrors {
   def cannotConvertOrcTimestampToTimestampNTZError(): Throwable = {
     new RuntimeException("Unable to convert timestamp of Orc to data type 'timestamp_ntz'")
   }
+
+  def writePartitionExceedConfigSizeWhenDynamicPartitionError(
+    numWrittenParts: Int,

Review comment:
       nit: we need to use 4-space indentation here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sunchao commented on pull request #34493: [SPARK-37217][SQL] The number of dynamic partitions should early check when writing to external tables

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-993136875


   @cxzl25 could you open another PR to backport this to branch-3.2? I tried to cherry-pick it but there's some conflict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cxzl25 commented on a change in pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
cxzl25 commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r744294467



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2642,6 +2642,35 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-37217 Dynamic partitions should fail quickly " +
+    "when writing to external tables to prevent data deletion") {
+    withTable("test") {
+      withTempDir { f =>
+        sql("CREATE EXTERNAL TABLE test(id int) PARTITIONED BY (p1 string, p2 string) " +
+          s"STORED AS PARQUET LOCATION '${f.getAbsolutePath}'")
+
+        withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false",
+          "hive.exec.dynamic.partition.mode" -> "nonstrict") {
+          val insertSQL =
+            """
+              |INSERT OVERWRITE TABLE test PARTITION(p1='n1', p2)
+              |SELECT * FROM VALUES (1, 'n2'), (2, 'n3'), (3, 'n4') AS t(id, p2)
+              """.stripMargin

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-961829324


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-961829324


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #34493:
URL: https://github.com/apache/spark/pull/34493#discussion_r744036266



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##########
@@ -2642,6 +2642,35 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
       }
     }
   }
+
+  test("SPARK-37217 Dynamic partitions should fail quickly " +
+    "when writing to external tables to prevent data deletion") {
+    withTable("test") {
+      withTempDir { f =>
+        sql("CREATE EXTERNAL TABLE test(id int) PARTITIONED BY (p1 string, p2 string) " +
+          s"STORED AS PARQUET LOCATION '${f.getAbsolutePath}'")
+
+        withSQLConf(HiveUtils.CONVERT_METASTORE_PARQUET.key -> "false",
+          "hive.exec.dynamic.partition.mode" -> "nonstrict") {
+          val insertSQL =
+            """
+              |INSERT OVERWRITE TABLE test PARTITION(p1='n1', p2)
+              |SELECT * FROM VALUES (1, 'n2'), (2, 'n3'), (3, 'n4') AS t(id, p2)
+              """.stripMargin

Review comment:
       indentation?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #34493: [SPARK-37217][SQL] Dynamic partitions should fail quickly when writing to external tables to prevent data deletion

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-962294661


   cc @sunchao 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org