Posted to commits@hudi.apache.org by "umehrot2 (via GitHub)" <gi...@apache.org> on 2023/01/27 00:55:54 UTC

[GitHub] [hudi] umehrot2 commented on issue #7600: Hoodie clean is not deleting old files for MOR table

umehrot2 commented on issue #7600:
URL: https://github.com/apache/hudi/issues/7600#issuecomment-1405874402

   @SabyasachiDasTR @koochiswathiTR The issue here is similar to https://github.com/apache/hudi/issues/3739 . I believe what is happening is that you are setting CLEANER_HOURS_RETAINED to 2 days, while archival is running more aggressively. By default, archival keeps at most 30 commits in the active timeline (see https://hudi.apache.org/docs/0.11.1/configurations#hoodiekeepmaxcommits). Hence, in your case, by the time the cleaner runs and tries to clean up commits older than 2 days, those commits have already been archived. So even though the cleaner is scheduled, it finds nothing to clean, which matches the logs you have provided.
   
   If you want to continue with your current cleaner config, you should set https://hudi.apache.org/docs/0.11.1/configurations#hoodiekeepmaxcommits higher than the number of commits you produce in a span of 2 days. Essentially, you want the cleaner to run at a higher frequency than archival, so that commits are cleaned before they are archived.
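
   As a rough illustration of the sizing above, here is a minimal Python sketch that derives archival settings from an estimated commit rate. The helper name `archival_settings`, the `headroom` factor, and the gap of 10 between min and max are assumptions made for this example, not Hudi defaults. It also raises `hoodie.keep.min.commits`, on the assumption that archival trims the active timeline down to that value.

```python
import math


def archival_settings(commits_per_hour, hours_retained, headroom=1.5):
    """Build Hudi write options so archival lags behind the cleaner.

    `headroom` is a hypothetical safety factor, not a Hudi setting.
    """
    # Number of commits expected inside the cleaner's retention window.
    commits_in_window = math.ceil(commits_per_hour * hours_retained)
    # Keep more commits in the active timeline than the window contains.
    keep_min = math.ceil(commits_in_window * headroom)
    # Arbitrary gap so archival does not trigger on every commit.
    keep_max = keep_min + 10
    return {
        "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
        "hoodie.cleaner.hours.retained": str(hours_retained),
        "hoodie.keep.min.commits": str(keep_min),
        "hoodie.keep.max.commits": str(keep_max),
    }


if __name__ == "__main__":
    # Example: roughly 4 commits per hour with a 48-hour retention window.
    for key, value in archival_settings(4, 48).items():
        print(f"{key}={value}")
```

   The resulting dictionary could be passed as writer options; measure your own commit rate rather than relying on the example figures.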
   
   As for cleaning the existing data, you should disable https://hudi.apache.org/docs/configurations/#hoodiecleanerincrementalmode while running the clean manually. This is needed because, in your case, you want the cleaner to go back in time and clean dangling files that are older than the last time the cleaner ran.
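
   A minimal sketch of the overrides for such a one-off manual clean, written as a small Python helper. The function name `manual_clean_conf` is hypothetical, and how the `key=value` pairs are handed to the cleaner (CLI, utility job, or writer options) depends on your setup and is not shown here.

```python
def manual_clean_conf(hours_retained=48):
    """Return key=value overrides for a one-off manual clean.

    Incremental mode is switched off so the cleaner considers the whole
    table rather than only changes since the last clean.
    """
    conf = {
        "hoodie.cleaner.policy": "KEEP_LATEST_BY_HOURS",
        "hoodie.cleaner.hours.retained": str(hours_retained),
        "hoodie.cleaner.incremental.mode": "false",
    }
    return [f"{key}={value}" for key, value in conf.items()]


if __name__ == "__main__":
    for line in manual_clean_conf():
        print(line)
```

   Once the dangling files are gone, incremental mode can be left at its usual setting for regular runs.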
   

