Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/17 06:34:09 UTC

[GitHub] [hudi] boneanxs commented on issue #6938: [SUPPORT] HoodieTimelineArchiver could archive uncleaned replace commits causing duplicates

boneanxs commented on issue #6938:
URL: https://github.com/apache/hudi/issues/6938#issuecomment-1280359415

   @yihua Yes, identifying replaced file groups might be time-consuming: we have to list the affected partitions to build a `FileSystemView` and get the replaced file groups. I think that if we end up using `HoodieMetadataFileSystemView`, the cost of the listing operation can be reduced a lot. Besides, one replace operation usually doesn't touch many partitions, so the time spent here may be acceptable (we can also run this step in parallel if many partitions are affected).
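
   A minimal sketch of the check discussed above, assuming the archiver can be told which file groups a replace commit replaced in each partition. All names here are hypothetical and none of them are Hudi APIs; the `listFileGroups` function merely stands in for a file-system-view lookup (for example one backed by the metadata table). A replace commit is treated as safe to archive only when none of its replaced file groups is still present in the affected partitions.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: none of these names are Hudi APIs.
final class ReplaceCommitArchiveCheck {

    private ReplaceCommitArchiveCheck() {}

    /**
     * @param replacedByPartition partition path -> file group IDs that the
     *                            replace commit marked as replaced
     * @param listFileGroups      stand-in for a file-system-view lookup:
     *                            partition path -> file group IDs still on storage
     * @return true only if no replaced file group is still present in any
     *         affected partition
     */
    static boolean isSafeToArchive(Map<String, Set<String>> replacedByPartition,
                                   Function<String, Set<String>> listFileGroups) {
        // Only partitions touched by the replace commit are listed; a parallel
        // stream spreads the lookups out when many partitions are affected.
        return replacedByPartition.entrySet()
            .parallelStream()
            .noneMatch(entry -> {
                Set<String> present = listFileGroups.apply(entry.getKey());
                // Any replaced file group still present means cleaning is not done.
                return entry.getValue().stream().anyMatch(present::contains);
            });
    }
}
```

   With a check like this, the archiver would skip a replace commit until the cleaner has removed its replaced file groups, at the price of one listing per affected partition.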
   
   By the way, maybe we can first provide a basic, simple fix that at least addresses the issue (duplicates are a critical problem), and then improve this logic in the long term.
   
   Do you think it's worth a try? I'd really appreciate your suggestions!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org