Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/17 14:36:36 UTC

[GitHub] [hudi] codope commented on issue #5880: Need help in removing old files from S3 after upsert leading to `duplicate records` within `S3` prefix which `dont appear in the AWS Athena` editor

codope commented on issue #5880:
URL: https://github.com/apache/hudi/issues/5880#issuecomment-1158935896

   @gtwuser You can set up cleaning to remove old file versions, either with hudi-cli or as part of your writes. Check [this doc](https://hudi.apache.org/docs/hoodie_cleaner) for more details.
   That said, the old files should not show up as duplicates in query results (as you see in the case of Athena). Any query engine that understands the Hudi metadata should be able to filter out older data and return only the latest snapshot.
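
As a rough illustration of running the cleaner along with the writes, the sketch below builds a set of cleaner options that could be passed to a Hudi write from PySpark. The helper name `hudi_cleaner_options` and the retention value of 10 are assumptions for illustration only; the right values depend on how far back your queries and incremental reads need to go, so check them against the cleaner doc linked above.

```python
def hudi_cleaner_options(commits_retained: int = 10) -> dict:
    """Build Hudi cleaner options to merge into the write options.

    Hypothetical helper: the name and the default retention value are
    illustrative assumptions, not taken from the issue.
    """
    if commits_retained < 1:
        raise ValueError("commits_retained must be at least 1")
    return {
        # Run the cleaner automatically after each commit.
        "hoodie.clean.automatic": "true",
        # Keep the file versions needed by the last N commits.
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": str(commits_retained),
    }


cleaner_opts = hudi_cleaner_options(commits_retained=10)

# Assumed usage with an existing DataFrame `df`, your own write options
# `hudi_write_opts`, and a table path `base_path` (none defined here):
# (df.write.format("hudi")
#    .options(**hudi_write_opts)
#    .options(**cleaner_opts)
#    .mode("append")
#    .save(base_path))
```

With a setup like this, files that are no longer needed by the retained commits become eligible for removal from the S3 prefix on later writes, while queries against the latest snapshot are unaffected.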


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.