Posted to commits@hudi.apache.org by "boneanxs (via GitHub)" <gi...@apache.org> on 2023/03/07 02:45:37 UTC

[GitHub] [hudi] boneanxs commented on a diff in pull request #8076: [HUDI-5884] Support bulk_insert for insert_overwrite and insert_overwrite_table

boneanxs commented on code in PR #8076:
URL: https://github.com/apache/hudi/pull/8076#discussion_r1127280692


##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##########
@@ -88,17 +88,15 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
           extraOptions: Map[String, String] = Map.empty): Boolean = {
     val catalogTable = new HoodieCatalogTable(sparkSession, table)
 
-    var mode = SaveMode.Append
-    var isOverWriteTable = false
-    var isOverWritePartition = false
-    if (overwrite && partitionSpec.isEmpty) {
-      // insert overwrite table
-      mode = SaveMode.Overwrite
-      isOverWriteTable = true
+    val mode = if (overwrite) {
+      SaveMode.Overwrite

Review Comment:
   Given that `Overwrite` mode doesn't care about the old data, should we enable `bulk_insert` by default when the mode is `Overwrite`?
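
   A minimal sketch of that idea (the helper and its placement are hypothetical; only the `SaveMode` check and the operation names are Hudi's own):

   ```scala
   import org.apache.spark.sql.SaveMode

   object OperationDefaults {
     // Pick the write operation when the user has not set
     // "hoodie.datasource.write.operation" explicitly. Overwrite replaces
     // all existing data, so there is nothing to merge with and the cheaper
     // bulk_insert path is a safe default there.
     def defaultWriteOperation(mode: SaveMode, userSpecified: Option[String]): String =
       userSpecified.getOrElse {
         if (mode == SaveMode.Overwrite) "bulk_insert" // no old records to reconcile
         else "upsert"                                 // conservative default that merges
       }
   }
   ```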



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -850,9 +854,12 @@ object HoodieSparkSqlWriter {
     if (operation != WriteOperationType.DELETE) {
       if (mode == SaveMode.ErrorIfExists && tableExists) {
         throw new HoodieException(s"hoodie table at $tablePath already exists.")
-      } else if (mode == SaveMode.Overwrite && tableExists && operation != WriteOperationType.INSERT_OVERWRITE_TABLE) {
-        // When user set operation as INSERT_OVERWRITE_TABLE,
-        // overwrite will use INSERT_OVERWRITE_TABLE operator in doWriteOperation
+      } else if (mode == SaveMode.Overwrite && tableExists &&

Review Comment:
   Not sure why we need to explicitly delete the old data in `Overwrite` mode; this behavior actually makes Hudi non-ACID-compliant (I keep it here to make the tests pass).
   
   Maybe we should only delete old data when running a `drop table` command?
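
   For context, a hedged sketch of the contrast this comment draws (the helper and its wiring are invented for illustration; the type names and the physical-delete behavior come from the surrounding code):

   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileSystem, Path}
   import org.apache.hudi.common.model.WriteOperationType
   import org.apache.spark.sql.SaveMode

   object OverwriteSketch {
     def handleExistingTable(mode: SaveMode, tableExists: Boolean,
                             operation: WriteOperationType,
                             tablePath: String, conf: Configuration): Unit = {
       val isInsertOverwrite =
         operation == WriteOperationType.INSERT_OVERWRITE_TABLE ||
           operation == WriteOperationType.INSERT_OVERWRITE
       if (mode == SaveMode.Overwrite && tableExists && !isInsertOverwrite) {
         // Non-atomic path: a reader that lists the table between this delete
         // and the subsequent rewrite sees no data at all.
         val path = new Path(tablePath)
         FileSystem.get(path.toUri, conf).delete(path, true)
       }
       // Atomic path: INSERT_OVERWRITE(_TABLE) instead records a replacecommit
       // on the Hudi timeline, and readers only switch to the new file groups
       // once that commit completes.
     }
   }
   ```

   The ACID concern above follows directly: the physical delete breaks isolation for concurrent readers, while the replacecommit path keeps the overwrite transactional.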


