Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/18 08:04:04 UTC

[GitHub] [hudi] guanziyue commented on pull request #4444: [HUDI-3026] don't allow HoodieAppendHandle roll over to next fileID

guanziyue commented on pull request #4444:
URL: https://github.com/apache/hudi/pull/4444#issuecomment-1044103610


   > Thanks, I saw your description in JIRA:
   > 
   > > In the first attempt (attempt 1), we write three records, 5,4,3, to fileID_1_log.1_attempt1, but this attempt fails. Spark retries the task in a second attempt (attempt 2), which writes four records, 1,2,3,4, to fileID_1_log.1_attempt2. Calling canWrite then shows that this file group is already large enough, so Hudi writes record 5 to fileID_2_log.1_attempt2 and finishes the commit.
   > 
   > Do you mean that attempt 1 writes a complete log block to the log file and then fails? Can we write a rollback block when the Spark task fails over?
   > 
   > If we make this change, we lose the ability for precise file size control/bin-packing.
   
   Thanks, I agree with your concern. I will dig into the marker file mechanism to see whether there is a way to detect previous failed task attempts. I am not yet very familiar with marker files, so I will update this PR as soon as I come up with a good solution.
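To make the JIRA scenario concrete, here is a minimal Python sketch. It is not Hudi code, only a model under stated assumptions: a file group is a list of log blocks, a hypothetical record-count limit `MAX_RECORDS` stands in for the byte-size check behind `canWrite`, and the failed attempt's block is left behind because no rollback block is written. All helper names (`can_write`, `run_attempt`, `simulate`, `keys_in_multiple_groups`) are illustrative. With rollover, record 5 ends up in both fileID_1 and fileID_2. Without rollover, every copy stays in fileID_1 (where, assuming log blocks in one file group are merged by record key, they collapse to one version), but that file group grows past the limit, which is the bin-packing cost raised in the review.

```python
from collections import defaultdict

# Hypothetical stand-in for the size limit behind canWrite(); real Hudi
# compares file sizes in bytes, not record counts.
MAX_RECORDS = 7


def can_write(groups, file_id, pending):
    """True while the file group (existing blocks + pending block) is under the limit."""
    existing = sum(len(block) for block in groups[file_id])
    return existing + len(pending) < MAX_RECORDS


def run_attempt(groups, records, allow_rollover):
    """Simulate one task attempt appending `records` as log blocks, starting at fileID_1."""
    current, pending = "fileID_1", []
    for key in records:
        if allow_rollover and not can_write(groups, current, pending):
            # canWrite is false: flush the block and roll over to a new file group.
            if pending:
                groups[current].append(pending)
            current, pending = f"fileID_{len(groups) + 1}", []
        pending.append(key)
    if pending:
        groups[current].append(pending)


def simulate(allow_rollover):
    groups = defaultdict(list)
    # Attempt 1 flushes a block of 5,4,3 and then fails; no rollback block is
    # written, so the block stays in fileID_1 (assumption from the scenario).
    run_attempt(groups, [5, 4, 3], allow_rollover)
    # Attempt 2 (the retry) processes the task's full input, here as 1,2,3,4,5.
    run_attempt(groups, [1, 2, 3, 4, 5], allow_rollover)
    return dict(groups)


def keys_in_multiple_groups(groups):
    """Record keys that appear in more than one file group."""
    owners = defaultdict(set)
    for file_id, blocks in groups.items():
        for block in blocks:
            for key in block:
                owners[key].add(file_id)
    return sorted(key for key, ids in owners.items() if len(ids) > 1)


with_rollover = simulate(allow_rollover=True)
print(with_rollover)  # {'fileID_1': [[5, 4, 3], [1, 2, 3, 4]], 'fileID_2': [[5]]}
print(keys_in_multiple_groups(with_rollover))  # [5]

no_rollover = simulate(allow_rollover=False)
print(no_rollover)  # {'fileID_1': [[5, 4, 3], [1, 2, 3, 4, 5]]}
print(keys_in_multiple_groups(no_rollover))  # []
```

The no-rollover run keeps fileID_1 at 8 records against a limit of 7, which illustrates the trade-off: avoiding the cross-file-group duplicate comes at the cost of size control unless the earlier failed attempt can be detected, for example through markers.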


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org