Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/20 22:38:42 UTC

[GitHub] [iceberg] abmo-x commented on pull request #5311: DataWriter - failure to close should not add file to completedDataFiles

abmo-x commented on PR #5311:
URL: https://github.com/apache/iceberg/pull/5311#issuecomment-1190837724

   @rdblue @RussellSpitzer 
   I added a commit that clears currentWriter on close in BaseTaskWriter, along with 2 test cases covering failures in close and complete.
   
   I agree that close should be called only once; we rely on that behavior quite strongly when adding the data files.
   However, I have found that writers are held and closed more than once in various scenarios. This causes the issue here: a close fails, the writer is left in a bad state, and a later close still adds its data file.
   
   1. When user-defined functions catch all exceptions and ignore write failures, as seen in Flink's processElement, which internally triggers a roll to a new file.
   2. This behavior was also observed before, and a fix was made in https://github.com/apache/iceberg/pull/1749
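
To illustrate the idea, here is a minimal sketch of clearing the current writer before closing it, so that a failed close can never lead to the file being added on a repeated close. This is not the code from the commit: RollingWriterSketch, FileWriter, StubWriter and closeCurrent are hypothetical stand-ins, and only the names currentWriter and completedDataFiles come from the discussion above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a rolling task writer; not Iceberg's BaseTaskWriter.
class RollingWriterSketch {

  /** Hypothetical minimal writer: close() may fail. */
  interface FileWriter {
    String path();

    void close() throws IOException;
  }

  /** Test double that optionally fails on close. */
  static class StubWriter implements FileWriter {
    private final String path;
    private final boolean failOnClose;

    StubWriter(String path, boolean failOnClose) {
      this.path = path;
      this.failOnClose = failOnClose;
    }

    @Override
    public String path() {
      return path;
    }

    @Override
    public void close() throws IOException {
      if (failOnClose) {
        throw new IOException("simulated close failure for " + path);
      }
    }
  }

  private final List<String> completedDataFiles = new ArrayList<>();
  private FileWriter currentWriter;

  RollingWriterSketch(FileWriter writer) {
    this.currentWriter = writer;
  }

  /** Closes the current writer at most once; records the file only on success. */
  void closeCurrent() throws IOException {
    if (currentWriter == null) {
      return; // already closed, or an earlier close failed: nothing to do
    }
    FileWriter writer = currentWriter;
    currentWriter = null; // clear first, so a repeated call cannot reuse a bad writer
    writer.close(); // if this throws, the file is never recorded
    completedDataFiles.add(writer.path());
  }

  List<String> completedDataFiles() {
    return completedDataFiles;
  }

  public static void main(String[] args) {
    RollingWriterSketch sketch = new RollingWriterSketch(new StubWriter("bad.parquet", true));
    try {
      sketch.closeCurrent();
    } catch (IOException e) {
      // Mimics a caller that swallows the failure and closes again later.
      System.out.println("first close failed: " + e.getMessage());
    }
    try {
      sketch.closeCurrent(); // no-op: currentWriter was already cleared
    } catch (IOException e) {
      throw new AssertionError("second close should not fail", e);
    }
    System.out.println("completed files: " + sketch.completedDataFiles());
  }
}
```

The design choice in this sketch is to clear the reference before calling close rather than after it, so the state is consistent even when the caller ignores the exception.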
   
   Let me know your thoughts. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org