Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/07/30 21:01:51 UTC

[GitHub] [incubator-iceberg] rdblue commented on issue #330: Orphan manifest file when performing delete in transaction

URL: https://github.com/apache/incubator-iceberg/issues/330#issuecomment-516594310
 
 
   Here's my response from the dev list:
   
   I suspect that this is related to PR #218, which introduced special handling for files that are deleted in transactions. The problem that PR fixed was that a manifest was created, merged, and then deleted. The transaction then failed to commit and retried. The retry reused the manifest that had been created, but this time it was not merged, so it was still a valid metadata file. Because the file had been deleted on the first attempt, the table was missing a manifest.
   
   The fix was to delete lazily: the transaction keeps track of files to delete and deletes them only after the commit succeeds. What might be happening here is that the first commit attempt is out of date and has to retry, and then the original manifest is not deleted on the second attempt. Looking at the cleanup code, I think this is the problem, because the filtered manifest cache is cleared as files are deleted: https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L336
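   A minimal Java sketch of the lazy delete described above, assuming storage can be modeled as a set of paths. LazyDeleter, deleteFile, and commitSucceeded are hypothetical names, not Iceberg's API. Delete requests are only recorded until the commit succeeds, which is what keeps a file available if a retried attempt reuses it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of lazy deletion in a transaction. The class and method
// names are illustrative only; this is not Iceberg's API.
class LazyDeleter {
  // Stand-in for the file system: the paths that currently exist.
  private final Set<String> storage;
  // Deletes requested by operations, in request order; nothing is removed yet.
  private final Set<String> pending = new LinkedHashSet<>();

  LazyDeleter(Set<String> storage) {
    this.storage = storage;
  }

  // Operations call this instead of deleting the file immediately.
  void deleteFile(String path) {
    pending.add(path);
  }

  // Called only after the table commit has succeeded; until then every file
  // still exists, so a retried attempt can safely reuse it.
  void commitSucceeded() {
    storage.removeAll(pending);
    pending.clear();
  }
}
```

   In this model, a file passed to deleteFile stays in storage until commitSucceeded runs, and only then is it removed.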
   
   I think the fix is to add a list of files that should be deleted on every attempt. When the filtered cache is cleared, each file in it should be deleted and moved to that list. That way, later attempts also delete those files.
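   One possible reading of this proposal, as a minimal Java sketch rather than the actual patch. All names (FilteringOperation, filter, clearFilteredCache, filesToDelete, deleteOnEveryAttempt) are hypothetical, storage is modeled as a set of paths, and each attempt is assumed to pass in the set of manifests it committed. When the cache is cleared, its files move to a persistent set, and every attempt requests deletion of any of them that it did not commit.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed fix; names are illustrative and this is
// not the code in MergingSnapshotProducer.
class FilteringOperation {
  // Stand-in for the file system: the paths that currently exist.
  private final Set<String> storage;
  // Cache: original manifest -> filtered copy written by an earlier attempt.
  private final Map<String, String> filteredManifests = new HashMap<>();
  // Proposed addition: files that must be considered for deletion on every attempt.
  private final Set<String> deleteOnEveryAttempt = new LinkedHashSet<>();
  private int written = 0;

  FilteringOperation(Set<String> storage) {
    this.storage = storage;
  }

  // Returns the cached filtered copy, or writes a new one on a cache miss.
  String filter(String manifest) {
    return filteredManifests.computeIfAbsent(manifest, m -> {
      String copy = m + ".filtered-" + (++written);
      storage.add(copy);
      return copy;
    });
  }

  // Clears the cache, but remembers every file it held so that later
  // attempts still clean those files up instead of orphaning them.
  void clearFilteredCache() {
    deleteOnEveryAttempt.addAll(filteredManifests.values());
    filteredManifests.clear();
  }

  // Files this attempt should hand to the lazy delete: anything written by
  // this operation, now or in an earlier attempt, that the attempt did not commit.
  Set<String> filesToDelete(Set<String> committed) {
    Set<String> toDelete = new LinkedHashSet<>();
    for (String path : deleteOnEveryAttempt) {
      if (!committed.contains(path)) {
        toDelete.add(path);
      }
    }
    for (String path : filteredManifests.values()) {
      if (!committed.contains(path)) {
        toDelete.add(path);
      }
    }
    return toDelete;
  }
}
```

   For example, if attempt 1 writes m1.filtered-1, the cache is cleared before the retry, and attempt 2 writes and commits m1.filtered-2, then filesToDelete returns m1.filtered-1. Without the extra set, it would return nothing and m1.filtered-1 would be left behind as an orphan.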

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
