Posted to issues@iceberg.apache.org by "ChristinaTech (via GitHub)" <gi...@apache.org> on 2023/04/03 20:22:49 UTC

[GitHub] [iceberg] ChristinaTech commented on issue #7151: GlueTableOperations/DynamoDbTableOperations can delete current metadata file after incorrect exception handling

ChristinaTech commented on issue #7151:
URL: https://github.com/apache/iceberg/issues/7151#issuecomment-1494931953

   @ryanyuan @c0d3monk If the issue does occur, the way to recover the table is described in the overview: make a manual Glue `UpdateTable` API call that sets the `metadata_location` property to the value of `previous_metadata_location`. That is not so much a workaround, though, as a recovery after the fact.
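
A minimal sketch of the parameter change behind that manual recovery, assuming the Glue `GetTable` and `UpdateTable` calls themselves are made separately (for example with the AWS SDK). The `metadata_location` and `previous_metadata_location` keys come from this comment; the class and method names are hypothetical. Because `UpdateTable` takes a full `TableInput`, the other fields from the `GetTable` response would presumably also need to be copied over unchanged; only the parameters step is shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: given the Glue table parameters from a GetTable
 * response, return a copy in which only metadata_location is rolled back
 * to previous_metadata_location.
 */
final class GlueMetadataRollback {
    static final String METADATA_LOCATION = "metadata_location";
    static final String PREVIOUS_METADATA_LOCATION = "previous_metadata_location";

    private GlueMetadataRollback() {}

    static Map<String, String> rolledBackParameters(Map<String, String> current) {
        String previous = current.get(PREVIOUS_METADATA_LOCATION);
        if (previous == null || previous.isEmpty()) {
            // Nothing to roll back to; refuse rather than guess.
            throw new IllegalStateException("table has no " + PREVIOUS_METADATA_LOCATION);
        }
        // Copy so every other parameter is preserved exactly as it was
        // and the caller's map is not mutated.
        Map<String, String> updated = new HashMap<>(current);
        updated.put(METADATA_LOCATION, previous);
        return updated;
    }
}
```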
   
   Since the missing-metadata failure is easy to detect in code once it happens (by catching `NotFoundException`), it would technically be possible to automate this fix and then retry the job. To be safe, though, you would have to be careful not to change anything else in the Glue table metadata in the process.
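
A minimal sketch of automating that recover-and-retry step, assuming the job can simply be re-run from the start. In real use the exception type would presumably be Iceberg's `org.apache.iceberg.exceptions.NotFoundException` and the recovery step would perform the Glue rollback; `runWithRecovery` and its parameter names are hypothetical. Recovery and retry happen at most once, so a table that is broken in some other way fails loudly instead of looping.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch: run a job; if it fails with the "missing metadata"
 * exception type, run a recovery step once and retry the job a single time.
 */
final class MissingMetadataRetry {
    private MissingMetadataRetry() {}

    static <T> T runWithRecovery(
            Supplier<T> job,
            Runnable recoverMetadataPointer,
            Class<? extends RuntimeException> missingMetadataType) {
        try {
            return job.get();
        } catch (RuntimeException e) {
            if (!missingMetadataType.isInstance(e)) {
                // Unrelated failure: do not touch the catalog.
                throw e;
            }
            // e.g. roll metadata_location back to previous_metadata_location.
            recoverMetadataPointer.run();
            // Retry exactly once; a second failure propagates to the caller.
            return job.get();
        }
    }
}
```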
   
   As for a workaround that avoids ending up in this situation in the first place, some potential options besides my pending upstream fix are to:
   1. Set the catalog option [`s3.delete-enabled`](https://iceberg.apache.org/docs/1.2.0/aws/#s3-tags) to `false` so the step that actually corrupts the table becomes a no-op. If you do this, though, you will orphan any files the system attempts to delete or expire, so make sure that option is not set in whatever context you use to run [DeleteOrphanFiles](https://iceberg.apache.org/docs/1.2.0/maintenance/#delete-orphan-files).
   2. If you want the effect of the prior option but want to limit what gets orphaned, you could extend [S3FileIO](https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java) with a version whose `deleteFile` is a no-op for metadata files, then specify that modified S3FileIO class as your `io-impl` parameter.
   3. Provide a [custom AWS client factory](https://iceberg.apache.org/docs/1.2.0/aws/#aws-client-customization) that disables API retries for the Glue API client. The downside, as documented earlier, is:
   > This hurts reliability in normal usage to an unacceptable degree.
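
A minimal sketch of the filtering logic behind option 2, written against a plain `Consumer<String>` so it compiles without the Iceberg AWS module. The assumption that "metadata files" means table metadata JSON files (names ending in `.metadata.json`) is mine, as are the class and method names.

```java
import java.util.function.Consumer;

/**
 * Hypothetical sketch of option 2: wrap a delete operation so that Iceberg
 * table metadata files are skipped while all other paths are deleted normally.
 */
final class MetadataSafeDelete {
    private MetadataSafeDelete() {}

    // Assumption: only table metadata JSON files need protecting; manifests
    // and data files are still deleted as usual.
    static boolean isTableMetadataFile(String path) {
        return path.endsWith(".metadata.json");
    }

    static Consumer<String> skippingMetadataFiles(Consumer<String> delegate) {
        return path -> {
            if (isTableMetadataFile(path)) {
                // No-op: the file stays in place and may later become an orphan.
                return;
            }
            delegate.accept(path);
        };
    }
}
```

In an actual subclass of `S3FileIO`, the same check would go in an override of `deleteFile(String path)` that only calls `super.deleteFile(path)` for non-metadata paths, with that subclass's fully qualified name set as `io-impl`. `S3FileIO` also supports bulk deletion through `deleteFiles(Iterable<String>)`, which would likely need the same filter. Skipped files are left behind as orphans, just as with option 1.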
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

