Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/01 10:50:11 UTC

[GitHub] [hudi] boneanxs edited a comment on pull request #5048: [HUDI-3634] Could read empty or partial HoodieCommitMetaData in downstream if using HDFS

boneanxs edited a comment on pull request #5048:
URL: https://github.com/apache/hudi/pull/5048#issuecomment-1085748174


   > Why turn it on by default? Do reads fail too often?
   
   Yeah, I only turned it on by default to make sure all tests pass with it enabled. I agree with you that we should keep the default behavior; I'll change this.
   
   > What about cloud object stores? Rename may not be atomic there right?
   
   Yes, rename is not atomic on cloud object stores. I see we use `ConsistencyGuard` to ensure consistency there; can that guarantee it? Another problem is that rename is very time-consuming on cloud object stores, and I don't have a better idea for addressing this yet. Maybe we should add a commit list file to track completed commits, but that could introduce other concurrency issues, such as two different writers modifying the commit list file at the same time. Or we might need to wait for the Hudi metastore server to ensure commit consistency.
   
   Anyway, with this patch we can at least guarantee consistency on HDFS for now.
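   For readers unfamiliar with the approach, the write-then-rename idea can be sketched roughly as below. This is a minimal sketch, not the patch's actual code: it uses `java.nio.file` on a local filesystem as a stand-in for Hadoop's `FileSystem#rename` on HDFS, and the class name `AtomicCommitWriter`, the method `writeCommitAtomically`, and the example file name are all hypothetical. The full commit contents go to a hidden temporary file first, and only a single atomic rename makes the commit file visible, so a downstream reader sees either no file or the complete file, never an empty or partial one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of "write to temp, then atomically rename".
// Not Hudi's API; local java.nio stands in for HDFS.
public class AtomicCommitWriter {

  public static void writeCommitAtomically(Path commitFile, String contents) throws IOException {
    Path dir = commitFile.toAbsolutePath().getParent();
    // Write the full contents to a hidden temp file in the SAME directory,
    // so the later rename never crosses filesystems. A slow write only
    // affects this temp file, never the visible commit file.
    Path tmp = Files.createTempFile(dir, "." + commitFile.getFileName(), ".tmp");
    try {
      Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
      // Publish with one atomic rename. ATOMIC_MOVE makes this throw
      // instead of silently falling back to a non-atomic copy.
      Files.move(tmp, commitFile, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      // No-op after a successful move; removes the temp file on failure.
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("timeline");
    // Illustrative instant file name only.
    Path commit = dir.resolve("20220401105011.commit");
    writeCommitAtomically(commit, "{\"example\":true}");
    System.out.println(Files.readString(commit));
  }
}
```

   Keeping the temp file next to the target is the key design choice: on HDFS a same-directory rename is a NameNode metadata operation, which is why it stays fast even when writing the contents is slow. On object stores a "rename" is typically a copy plus delete, so this pattern alone does not give the same guarantee there.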
   
   > Have you run this in a high throughput scenario? Any lags? I think for hdfs we should be good as rename just needs to change metadata.
   
   Yes, this patch is already deployed in our internal environment and works well. Our HDFS cluster can see very high throughput around midnight, and in some extreme cases writing the commit contents can take more than 2s, but the rename is very quick.

