Posted to commits@hudi.apache.org by "bithw1 (via GitHub)" <gi...@apache.org> on 2023/03/29 12:38:54 UTC

[GitHub] [hudi] bithw1 opened a new issue, #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?

bithw1 opened a new issue, #8315:
URL: https://github.com/apache/hudi/issues/8315

   Hi,
   
   I am reading https://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner/#examples.
   
   It gives an example for the clean policy KEEP_LATEST_COMMITS and sets the following two options:
   
   ```
   hoodie.cleaner.policy=KEEP_LATEST_COMMITS
   hoodie.cleaner.commits.retained=2
   ```
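
For context, a minimal sketch of how these two options might be supplied from Python. The table name is a hypothetical placeholder, and the commented-out write call assumes a Spark DataFrame `df` and a `base_path` that are not part of this thread; only the two cleaner keys come from the post above.

```python
# Hypothetical table name; the two cleaner keys are the ones quoted in the post.
cleaner_options = {
    "hoodie.table.name": "example_table",
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
    "hoodie.cleaner.commits.retained": "2",
}

# Assuming a Spark DataFrame `df` and a target `base_path` exist, the options
# would typically be passed along with the rest of the write configuration:
# df.write.format("hudi").options(**cleaner_options).mode("append").save(base_path)
```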
   
   
   With `hoodie.cleaner.commits.retained=2`, will Hudi keep at least two versions of each file?
   For example, I have a COW table with 10 commits (commit times c1, c2, ..., c10, where c1 is the earliest and c10 the latest), and one file was written in only two of them, c1 and c8.
   
   When I run the clean command, will the file still keep both versions, c1 and c8, even though c1 is very old given that there have been 10 commits?
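
The scenario in the question can be explored with a small toy model. This is only a sketch of one common reading of KEEP_LATEST_COMMITS, not Hudi's actual cleaner code: it assumes the cutoff is the N-th most recent commit on the timeline, that every file version written at or after the cutoff is kept, and that the newest version written before the cutoff is also kept because a query as of the cutoff commit would still read it. The function name `versions_kept` and its parameters are hypothetical, and real behavior may differ between Hudi versions.

```python
def versions_kept(timeline, file_commits, commits_retained):
    """Toy model of KEEP_LATEST_COMMITS for a single file (not Hudi's code).

    timeline: all completed commit times, oldest first.
    file_commits: the commits in which this file got a new version.
    commits_retained: the assumed meaning of hoodie.cleaner.commits.retained.
    """
    order = {commit: i for i, commit in enumerate(timeline)}
    versions = sorted(file_commits, key=order.__getitem__)

    # Assumption: nothing is cleaned until the timeline exceeds the retained count.
    if len(timeline) <= commits_retained:
        return versions

    # Assumption: the cutoff is the N-th most recent commit.
    cutoff = order[timeline[-commits_retained]]
    older = [v for v in versions if order[v] < cutoff]
    newer = [v for v in versions if order[v] >= cutoff]

    # Keep the newest version older than the cutoff (still needed to serve
    # reads as of the cutoff commit), plus everything written since.
    return older[-1:] + newer


timeline = [f"c{i}" for i in range(1, 11)]           # c1 .. c10
print(versions_kept(timeline, ["c1", "c8"], 2))      # -> ['c8']
```

Under these assumptions the c1 version would be eligible for cleaning and only c8 would remain, since c8 is both the latest version of the file and the one a reader at c9 or c10 would see.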


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] bithw1 commented on issue #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #8315:
URL: https://github.com/apache/hudi/issues/8315#issuecomment-1491735648

   Thanks @duc-dn , it helps a lot!




[GitHub] [hudi] bithw1 closed issue #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 closed issue #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?
URL: https://github.com/apache/hudi/issues/8315




[GitHub] [hudi] boundarymate commented on issue #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?

Posted by "boundarymate (via GitHub)" <gi...@apache.org>.
boundarymate commented on issue #8315:
URL: https://github.com/apache/hudi/issues/8315#issuecomment-1489641203

   As I understand it, each base file is a full snapshot of the data. Although the file was only written and updated at c1 and c8, the base files for c9 and c10 would also contain the latest data from c8, so I think it is enough to keep c9 and c10.




[GitHub] [hudi] duc-dn commented on issue #8315: [SUPPORT]What does clean policy KEEP_LATEST_COMMITS really mean?

Posted by "duc-dn (via GitHub)" <gi...@apache.org>.
duc-dn commented on issue #8315:
URL: https://github.com/apache/hudi/issues/8315#issuecomment-1491698903

   @bithw1 you can refer to this link to understand it better: https://medium.com/@simpsons/how-to-configure-cleaner-configs-with-apache-hudi-7eec0ab21359

