Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/22 17:53:16 UTC

[GitHub] [hudi] satishkotha commented on a change in pull request #2196: [HUDI-1349]spark sql support overwrite use replace action

satishkotha commented on a change in pull request #2196:
URL: https://github.com/apache/hudi/pull/2196#discussion_r510350766



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -93,6 +93,11 @@ private[hudi] object HoodieSparkSqlWriter {
       operation = WriteOperationType.INSERT
     }
 
+    // If the mode is Overwrite, should use INSERT_OVERWRITE operation

Review comment:
       I think this won't work. Insert overwrite only replaces partitions that have records in the dataframe; other partitions keep their old data.
   
   But I like the idea. Maybe we can add an additional configuration to insert_overwrite that marks old partitions as 'deleted'? This could be done in a way that also supports https://issues.apache.org/jira/browse/HUDI-1350. 
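   To make this concrete, below is a minimal sketch (table name, path, and data are illustrative assumptions, not taken from the PR) of how insert_overwrite behaves when the incoming dataframe covers only a subset of partitions:
   
       import org.apache.spark.sql.{SaveMode, SparkSession}
   
       val spark = SparkSession.builder()
         .appName("insert-overwrite-sketch")
         .master("local[1]")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate()
       import spark.implicits._
   
       // Assume the table at /tmp/demo_tbl already holds rows in partitions
       // dt=2020-10-21 and dt=2020-10-22. This batch touches only 2020-10-22.
       val batch = Seq(("id1", "v2", "2020-10-22")).toDF("key", "value", "dt")
   
       batch.write.format("hudi")
         .option("hoodie.table.name", "demo_tbl")
         .option("hoodie.datasource.write.recordkey.field", "key")
         .option("hoodie.datasource.write.partitionpath.field", "dt")
         .option("hoodie.datasource.write.precombine.field", "value")
         // insert_overwrite replaces only the partitions present in this batch
         .option("hoodie.datasource.write.operation", "insert_overwrite")
         .mode(SaveMode.Append)
         .save("/tmp/demo_tbl")
   
       // Partition dt=2020-10-22 is replaced, but dt=2020-10-21 keeps its old
       // rows -- so mapping SaveMode.Overwrite directly to INSERT_OVERWRITE
       // does not give whole-table overwrite semantics.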



