Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/04/02 18:43:38 UTC

[GitHub] [incubator-iceberg] stevenzwu commented on issue #873: Clean orphan data files

stevenzwu commented on issue #873: Clean orphan data files 
URL: https://github.com/apache/incubator-iceberg/issues/873#issuecomment-608036978
 
 
   @waterlx The current code does try to clean up uncommitted files after closing and uploading them. If a deletion fails, we simply swallow the exception and rely on S3 bucket retention to purge the garbage.
   https://github.com/Netflix-Skunkworks/nfflink-connector-iceberg/blob/master/nfflink-connector-iceberg/src/main/java/com/netflix/spaas/nfflink/connector/iceberg/sink/IcebergWriter.java#L301-L313
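
For readers who don't want to follow the link, the best-effort pattern described above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `BestEffortCleanup`, `deleteQuietly`, and the `deleter` callback are hypothetical names, and the callback stands in for whatever file-system delete call the writer really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration only; not taken from the linked IcebergWriter.
final class BestEffortCleanup {
    private BestEffortCleanup() {}

    /**
     * Attempts to delete every uncommitted file. A failure on one file is
     * swallowed so that the remaining files are still attempted.
     *
     * @param uncommittedPaths paths of files that were uploaded but never committed
     * @param deleter          assumed delete callback (e.g. wrapping a file-system delete)
     * @return the paths whose deletion failed; these are left for bucket retention to purge
     */
    static List<String> deleteQuietly(List<String> uncommittedPaths, Consumer<String> deleter) {
        List<String> leftovers = new ArrayList<>();
        for (String path : uncommittedPaths) {
            try {
                deleter.accept(path);
            } catch (RuntimeException e) {
                // Swallow the failure on purpose: cleanup is best effort.
                leftovers.add(path);
            }
        }
        return leftovers;
    }
}
```

Returning the leftovers is a design choice of this sketch; it makes the swallowed failures observable (for logging or metrics) without turning them into job failures.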
   
   @rdblue It would be great if `FileAppender` could provide an `abort()` API. Right now, the Iceberg writer calls the `close()` API, which also uploads the file even though we don't need it uploaded. That just creates cleanup work for us.
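
To make the request concrete, here is a rough sketch of the semantics I have in mind. Everything in it is an assumption for illustration: `AbortableAppender` and `LocalStagingAppender` are hypothetical names, not Iceberg's `FileAppender`, and the "upload" is simulated by moving a local staging file to a destination path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical interface: shows the proposed abort() next to close().
interface AbortableAppender extends AutoCloseable {
    void add(String record) throws IOException;

    /** Discards buffered data without publishing it. */
    void abort() throws IOException;

    /** Flushes buffered data and publishes ("uploads") the file. */
    @Override
    void close() throws IOException;
}

// Illustrative implementation that stages records in a local file.
final class LocalStagingAppender implements AbortableAppender {
    private final Path staging;
    private final Path destination;
    private final OutputStream out;
    private boolean finished;

    LocalStagingAppender(Path staging, Path destination) throws IOException {
        this.staging = staging;
        this.destination = destination;
        this.out = Files.newOutputStream(staging);
    }

    @Override
    public void add(String record) throws IOException {
        out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public void close() throws IOException {
        if (finished) {
            return; // already closed or aborted
        }
        finished = true;
        out.close();
        // Stand-in for the upload step: the file becomes visible at the destination.
        Files.move(staging, destination, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public void abort() throws IOException {
        if (finished) {
            return; // already closed or aborted
        }
        finished = true;
        out.close();
        // Nothing is published, so there is nothing to clean up remotely.
        Files.deleteIfExists(staging);
    }
}
```

With something like this, a writer that knows its output will not be committed could call `abort()` instead of `close()` and never create the remote file in the first place.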
   
   However, there is one case where I am not sure how to do active cleanup: writers upload their files and then fail or crash before the commit. Where do we keep track of those files? In S3, these corner cases can be handled by a bucket- or prefix-level retention policy.
