Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/03 00:10:30 UTC

[GitHub] [incubator-iceberg] rdblue edited a comment on issue #873: Clean orphan data files

rdblue edited a comment on issue #873: Clean orphan data files 
URL: https://github.com/apache/incubator-iceberg/issues/873#issuecomment-608157501
 
 
   @stevenzwu, this isn't the responsibility of the appender. When writing a file, the writer should invoke close to clean up resources whether or not the write was successful. If the write succeeded, the writer can move on; otherwise, it is the writer's responsibility to delete the file.
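
   A minimal sketch of that close-then-delete pattern, for illustration only. It uses `java.nio.file` on a local path rather than Iceberg's appender classes, and `CloseThenDeleteSketch`, `writeOrCleanUp`, and the null-row failure are hypothetical names and behavior invented for the example:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of Iceberg: shows "always close, delete on failure".
class CloseThenDeleteSketch {

  static void writeOrCleanUp(Path path, List<String> rows) throws IOException {
    boolean success = false;
    try {
      // try-with-resources closes the stream whether or not the write succeeded.
      try (OutputStream out = Files.newOutputStream(path)) {
        for (String row : rows) {
          if (row == null) {
            // Simulated failure so the cleanup path can be exercised.
            throw new IOException("simulated write failure: null row");
          }
          out.write((row + "\n").getBytes(StandardCharsets.UTF_8));
        }
      }
      // Reached only if both the writes and close() completed normally.
      success = true;
    } finally {
      if (!success) {
        // The writer, not the appender, removes the partial file.
        Files.deleteIfExists(path);
      }
    }
  }
}
```

   Setting the success flag only after the stream has closed means a failure in close itself also triggers the delete.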
   
   The case you're talking about is specific to our internal S3 file system. When the internal details of the output stream are unknown, there is no way to "abort" expensive operations. Because we primarily use the Hadoop FS API, that's the situation we're in: the only options are to close the stream and clean up afterwards. That's why we haven't implemented this suggestion.
   
   The right behavior here is to make a FileIO delete call to clean up a file if a write failed. In addition, tasks are responsible for deleting other completed data files that will not be committed because of task failure.
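
   As a rough sketch of the task-level cleanup, assuming a task that writes several files in sequence: `TaskCleanupSketch`, `FileDeleter`, `FileWriterFn`, and `runTask` are hypothetical names made up for this example. `FileDeleter` only stands in for the `deleteFile(String)` call on `FileIO`; real task code would differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Iceberg code: a task that deletes every file it
// created when it fails before its output can be committed.
class TaskCleanupSketch {

  /** Stand-in for the single delete call this example needs from a FileIO. */
  interface FileDeleter {
    void deleteFile(String path);
  }

  /** Stand-in for whatever writes one data file and may fail. */
  interface FileWriterFn {
    void write(String path) throws Exception;
  }

  static List<String> runTask(List<String> paths, FileWriterFn writer, FileDeleter io)
      throws Exception {
    List<String> created = new ArrayList<>();
    try {
      for (String path : paths) {
        // Record the path before writing so a partially written file is cleaned up too.
        created.add(path);
        writer.write(path);
      }
      return created;
    } catch (Exception e) {
      // Task failure: none of these files will be committed, so delete them all.
      for (String path : created) {
        try {
          io.deleteFile(path);
        } catch (RuntimeException cleanupFailure) {
          // Best effort: keep the original failure as the primary error.
          e.addSuppressed(cleanupFailure);
        }
      }
      throw e;
    }
  }
}
```

   Cleanup here is best effort: a failed delete is attached to the original exception rather than replacing it, so some files can still be left behind for a separate orphan-file cleanup.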
