Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/02/04 19:10:25 UTC

[GitHub] [hudi] vinothchandar commented on a change in pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpty() in HoodieSparkSqlWriter#write

vinothchandar commented on a change in pull request #2296:
URL: https://github.com/apache/hudi/pull/2296#discussion_r570476603



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -173,6 +173,10 @@ public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Opti
 
   public boolean commitStats(String instantTime, List<HoodieWriteStat> stats, Option<Map<String, String>> extraMetadata,
                              String commitActionType, Map<String, List<String>> partitionToReplaceFileIds) {
+    // Skip the empty commit
+    if (stats.isEmpty()) {

Review comment:
       I think there was an explicit ask to allow empty commits before. Let's take DeltaStreamer, which stores the Kafka checkpoint offsets in the commit metadata. If we don't commit when stats are empty, the checkpoint will never advance. For example, the transformer in DeltaStreamer could filter out all records read in a batch and lead to an empty commit, but the Kafka offsets would still have advanced. So it's not good to do this, IMO.
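
       To make the concern concrete, here is a minimal, self-contained sketch. It is a toy model, not Hudi code: the class `EmptyCommitSketch`, the `checkpoint` metadata key, and the helper methods are all hypothetical names chosen for illustration. It assumes only what is described above: the checkpoint lives in the commit metadata, and each batch resumes from the last committed checkpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are real Hudi APIs.
final class EmptyCommitSketch {
  // Hypothetical metadata key under which the source offset is stored.
  static final String CHECKPOINT_KEY = "checkpoint";

  private final boolean skipEmptyCommits;
  // Each "commit" is reduced to its extra metadata map.
  private final List<Map<String, String>> timeline = new ArrayList<>();

  EmptyCommitSketch(boolean skipEmptyCommits) {
    this.skipEmptyCommits = skipEmptyCommits;
  }

  boolean commitStats(List<String> stats, Map<String, String> extraMetadata) {
    if (skipEmptyCommits && stats.isEmpty()) {
      // Nothing is written, so the checkpoint in extraMetadata is dropped too.
      return true;
    }
    timeline.add(new HashMap<>(extraMetadata));
    return true;
  }

  // The next batch resumes from the checkpoint of the latest commit (0 if none).
  long lastCommittedOffset() {
    if (timeline.isEmpty()) {
      return 0L;
    }
    return Long.parseLong(timeline.get(timeline.size() - 1).get(CHECKPOINT_KEY));
  }

  // One batch in which the transformer filters out every record that was read.
  void ingestBatchFilteringEverything(long batchSize) {
    long end = lastCommittedOffset() + batchSize; // offsets consumed from the source
    List<String> stats = new ArrayList<>();       // no write stats: all records dropped
    Map<String, String> extraMetadata = new HashMap<>();
    extraMetadata.put(CHECKPOINT_KEY, Long.toString(end));
    commitStats(stats, extraMetadata);
  }

  static long offsetAfterBatches(boolean skipEmptyCommits, int batches, long batchSize) {
    EmptyCommitSketch client = new EmptyCommitSketch(skipEmptyCommits);
    for (int i = 0; i < batches; i++) {
      client.ingestBatchFilteringEverything(batchSize);
    }
    return client.lastCommittedOffset();
  }

  public static void main(String[] args) {
    System.out.println("skip empty commits:  " + offsetAfterBatches(true, 3, 10));
    System.out.println("allow empty commits: " + offsetAfterBatches(false, 3, 10));
  }
}
```

       In this sketch, three fully filtered batches of 10 offsets leave the committed checkpoint at 0 when empty commits are skipped, so every run re-reads the same range. With empty commits allowed, the checkpoint reaches 30.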



