Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/25 19:10:37 UTC

[GitHub] [hudi] hussein-awala commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

hussein-awala commented on PR #7041:
URL: https://github.com/apache/hudi/pull/7041#issuecomment-1291019081

   I tested the PR in our project, and it works as expected. Each clean now goes through the three states (requested, inflight, and completed), and the clean planner checks only the partitions that have been modified since `earliestCommitToRetain`.
   
   We recently increased `CLEAN_MAX_COMMITS` to 24, as @nsivabalan [proposed](https://github.com/apache/hudi/issues/6953#issuecomment-1283143573), in order to clean the tables every 24 hours (we have one commit per hour) and to avoid listing S3 partitions in tables whose partitions change infrequently. However, the config doesn't work as expected: after 24 commits, if the list of files to delete is empty, the cleaner runs on every subsequent commit until it deletes something. This happens because, for the clean planner, the last clean is the one in which there were files to delete; the subsequent clean operations are not taken into account because they write nothing to the timeline.
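
   To illustrate the behaviour described above, here is a rough simulation of the trigger logic. It is only a sketch, not Hudi's actual code: the function name, its parameters, and the assumption that the clean plan is non-empty from a given commit onward are all hypothetical simplifications.

```python
def commits_that_trigger_clean(total_commits, max_commits,
                               first_deletable_commit, record_empty_clean):
    """Return the commit numbers at which the cleaner runs.

    Hypothetical model: the cleaner runs once `max_commits` commits have
    accumulated since the last clean recorded on the timeline.
    """
    last_recorded_clean = 0  # commit number of the last clean on the timeline
    triggered = []
    for commit in range(1, total_commits + 1):
        if commit - last_recorded_clean < max_commits:
            continue  # not enough commits since the last recorded clean
        triggered.append(commit)
        # Simplification: from `first_deletable_commit` onward there is
        # always something to delete.
        has_files_to_delete = commit >= first_deletable_commit
        # Without the patch, an empty clean leaves no trace on the timeline,
        # so `last_recorded_clean` does not advance.
        if has_files_to_delete or record_empty_clean:
            last_recorded_clean = commit
    return triggered


# One commit per hour, clean every 24 commits, nothing to delete before commit 30.
without_patch = commits_that_trigger_clean(60, 24, 30, record_empty_clean=False)
with_patch = commits_that_trigger_clean(60, 24, 30, record_empty_clean=True)
print(without_patch)
print(with_patch)
```

   In this model, without recording empty cleans the cleaner runs at commits 24 through 30 and again at 54; when empty cleans are recorded, it runs only at commits 24 and 48.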
   
   In brief, we need this patch ASAP. Could you please add it to 0.13.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org