Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/01 12:53:13 UTC

[GitHub] [hudi] nsivabalan edited a comment on issue #3739: Hoodie clean is not deleting old files

nsivabalan edited a comment on issue #3739:
URL: https://github.com/apache/hudi/issues/3739#issuecomment-932199950


   Awesome, good to know. So here is the thing.
   Hudi has an active timeline and an archived timeline. Archival kicks in after every commit and moves some older commits from the active timeline to the archived one, to keep the number of active commits (the contents of the .hoodie folder) within bounds.
   All operations within Hudi (inserts, upserts, compaction, cleaning, etc.) operate only on commits in the active timeline.
   Since the archival configs here are very aggressive, archival always keeps the number of commits in the active timeline low. So when the cleaner checks for commits eligible to be cleaned up, it can't find more than the configured value. If we instead raise the max commits for archival to kick in, say to 20, archival will not run until there are 20 commits, and the cleaner should then be able to find more commits than its configured threshold, so cleaning can trigger.
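   For example, here is a minimal sketch (spark-shell / Scala) of the config relationship described above. The option keys are standard Hudi write configs; the table name, field names, path, and numeric values are placeholders for illustration:
   
   ```scala
   // Sketch assuming an existing DataFrame `df` with columns `uuid` and `ts`.
   import org.apache.spark.sql.SaveMode
   
   df.write.format("hudi").
     option("hoodie.table.name", "my_table").                    // placeholder table name
     option("hoodie.datasource.write.recordkey.field", "uuid").  // placeholder key field
     option("hoodie.datasource.write.precombine.field", "ts").   // placeholder precombine field
     option("hoodie.cleaner.commits.retained", "10").  // cleaner retains file versions for the latest 10 commits
     option("hoodie.keep.min.commits", "20").          // archival leaves at least 20 commits on the active timeline
     option("hoodie.keep.max.commits", "30").          // archival triggers once active commits exceed 30
     mode(SaveMode.Append).
     save("/tmp/hudi/my_table")                        // placeholder base path
   ```
   
   With hoodie.keep.min.commits (20) greater than hoodie.cleaner.commits.retained (10), the cleaner always sees enough commits on the active timeline before archival moves them away.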
   Maybe we can add a warning, or even throw an exception, if the archival configs are aggressive compared to the cleaner's, because otherwise cleaning silently becomes a no-op.
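   The proposed check could look something like this (a hypothetical sketch; the method and parameter names are illustrative, not Hudi's actual internals):
   
   ```scala
   // Hypothetical validation: archival must keep more commits active than the
   // cleaner needs to retain, or the cleaner never finds eligible commits.
   def validateArchivalVsCleaning(minCommitsToKeep: Int, cleanerCommitsRetained: Int): Unit = {
     if (minCommitsToKeep <= cleanerCommitsRetained) {
       throw new IllegalArgumentException(
         s"hoodie.keep.min.commits ($minCommitsToKeep) must be greater than " +
         s"hoodie.cleaner.commits.retained ($cleanerCommitsRetained); otherwise archival " +
         "moves commits off the active timeline before the cleaner can act on them.")
     }
   }
   ```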
   
   I have filed a [jira](https://issues.apache.org/jira/browse/HUDI-2511) in this regard and will follow up.
   
   Let me know if you need any more details. If not, can we close this issue out?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org