You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/10 23:03:58 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #2317: Corrupted Metadata when Catalog Fails During Commit

RussellSpitzer opened a new issue #2317:
URL: https://github.com/apache/iceberg/issues/2317


   The current logic for doing a commit is at a high level as follows
   
   1. Write all files including a new-metadata.json for the operation
   2. Acquiring a lock
   3. Swapping the pointer from old-metadata.json to new-metadata.json
   4. Releasing the lock
   
   If we fail during 3 we will always attempt to cleanup the files we created in step 1. This is a problem when step 3 (the swap) has been successful server side but the client has not received an acknowledgment. This leads to a state where 
   
   1. Catalog is pointing to new-metadata.json
   2. Our client is actively removing old-metadata.json and all files which were added for the operation
   
   Future clients which are able to contact the HMS will see the new-metdata.json location, attempt to read it and fail. 
   
   What's worse is that if there are multiple clients attempting to work with this table, there is a window of time in which a second client can read new-metadata.json before it is removed and can build a new-new-metadata.json based off of it. The new-new-metadata.json will now have references to files which are in the process of being removed by the first client.
   
   
   To avoid this we need to handle failures in stage 3 of table commits slightly differently, basically we need to group failures into two categories:
   
   1. Failures that are reports from the Catalog that it could not perform the operation for some reason
   2. Failures where the client has lost contact for some reason (basically everything else)
   
   Type 1 failures can be cleaned up, we know the commit did not succeed and the files we have currently generated are more or less useless.
   
   Type 2 failures must still be reported as failures, but cannot be retried and cannot be cleaned up. We do not know if our metadata.json is pointing to the new file or not and thus we cannot resolve the situation until communication with the HMS is restored. Since we are usually talking about a client perspective here, I believe the right thing to do is to fail and let the user know they need to manually check and clean up files if necessary.
   
   I haven't checked other catalog implementations to see if they have similar vulnerabilities so we should probably be checking those as well. The HMS code in question is [here](https://github.com/apache/iceberg/blob/c7ff6d51377eaae7f476e0c0730dea3b5a84fabb/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L214-L229)
   
   CC: @aokolnychyi , @karuppayya , @raptond 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary edited a comment on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
pvary edited a comment on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796568356


   As a Hive user I would hate if I have to handle Type 2 errors. Is this rare enough to leave unhandled?
   
   Just brainstorming:
   - If we want to avoid this situation we can acquire a shared lock for getting the snapshot as well. This would mean that during the swap operation we should wait until the write operation is either finished successfully or failed. Since ideally both swap and getting the snapshot is a very short operation this should not cause too much trouble. OTOH if there are plenty of reads the we can have problem getting the exclusive lock for writes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
zhangdove commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796395354


   We are using the Spark computing engine to read MySQL data and overwrite Iceberg (used by HadoopCatalog). When the timeout with MySQL is too long, the data file is cleaned up, while the file new-metadata.json still exists.
   
   I'm not sure if it's the same as this ISSUE, but it looks a bit like @RussellSpitzer 's analysis.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] marton-bod commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796636597


   When running into a Type 2 error,  I think our retry logic would need to be changed. When retrying for a Type 2, we shouldn't cleanup the files prematurely, but instead should first attempt to reconnect to the catalog to double-check if the earlier operation succeeded. 
   
   - If we get an answer, and our snapshot is in the history of the table, then we're essentially done and no need to do anything.
   - If we get an answer, and our snapshot is not in the table history, we proceed with the file cleanup and the retry. 
   - If we cannot get ahold of the catalog persistently, then we'll give up the retry operation but still don't do any file cleanup. That ensures that if the operation did succeed, then we're not messing things up. If it was unsuccessful, that could leave some dangling files temporarily but that should be cleaned up by the Cleaner eventually, IIUC.
   
   What do you think?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796287993


   @RussellSpitzer, will you submit a PR for this? I'll help reviewing.
   
   cc @pvary @rymurr @shardulm94 @rdblue too


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary edited a comment on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
pvary edited a comment on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796740977


   > Eventually the end user will need to run "remove orphan files" or something like the case of Type 2 Failure that is actually a failure and not just a network issue.
   
   Thanks for the info!
   So the tools are there, but manual intervention is needed.
   
   The number of these manual interventions could be minimized with @marton-bod's suggestion by checking the table status after a Type 2 failure and running cleanup automatically if the connection is available again and the commit was not successful. This solution looks good to me


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796354558


   Thanks @RussellSpitzer for filing this. We hit this issue at LinkedIn recently with HiveCatalog and were investigating the issue. Your analysis seems to line up with what we too observed. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796352164


   Yeah I can start digging into this soon, but I only am confident enough to handle the HMS use case here, I would suggest all the other catalog creators do a similar check to see if they have the same issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 edited a comment on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
shardulm94 edited a comment on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796354558


   Thanks @RussellSpitzer for filing this. We hit this issue at LinkedIn recently with HiveCatalog and were investigating the issue. Your analysis seems to line up with what we too observed.  cc: @omalley 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796568356


   As a Hive user I would hate if I have to handle Type 2 errors. Is this rare enough to leave unhandled?
   
   Just brainstorming:
   - If we want to avoid this situation we can acquire a shared lock for getting the snapshot as well. This would mean that during the swap operation we should wait until the write operation is either finished successfully or failed. Since ideally both swap and getting the snapshot is a very short operation this should not cause too much trouble.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796951503


   Posted a WIP ^ for feedback while I work on tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796737113


   Eventually the end user will need to run "remove orphan files" or something like the case of Type 2 Failure that is actually a failure and not just a network issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796722305


   @pvary If we do not handle them we basically corrupt the iceberg table when they happen and require manual restoration/
   
   I agree with @marton-bod, we can really do any amount of retries and reconnects but eventually we have to give up and at that point we must not clean up.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796735828


   > I agree with @marton-bod, we can really do any amount of retries and reconnects but eventually we have to give up and at that point we must not clean up.
   
   Is @marton-bod's assumption is correct, that if we give up and the commit was not successful then the dangling files will be cleaned up by the Cleaner eventually?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi closed issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
aokolnychyi closed issue #2317:
URL: https://github.com/apache/iceberg/issues/2317


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796287993


   @RussellSpitzer, will you submit a PR for this? I'll help reviewing.
   
   cc @pvary @rymurr @shardulm94 too


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796740977


   > Eventually the end user will need to run "remove orphan files" or something like the case of Type 2 Failure that is actually a failure and not just a network issue.
   
   Thanks for the info!
   So the tools are there, but manual intervention is needed.
   
   The number of these manual interventions could be minimized with @marton-bod's suggestion by checking the table status after a Type 2 failure and running cleanup automatically if the connection is available again and the commit was not successful.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rymurr commented on issue #2317: Corrupted Metadata when Catalog Fails During Commit

Posted by GitBox <gi...@apache.org>.
rymurr commented on issue #2317:
URL: https://github.com/apache/iceberg/issues/2317#issuecomment-796687147


   Re Hive: @marton-bod's logic sounds correct to me, including the case of multiple commiters. I guess its a matter of accurately identifying type 2 vs type 1
   
   Re other catalogs. Nessie is susceptible to this issue as well. I think the fix is straightforward - certain exceptions don't delete metadata. Retrying the commit again is safe as it will create an empty commit. Polluting the commit log is annoying but at least the data is safe. Will raise a PR for Nessie soon to fix this. Nessie avoids the branching behaviour described by @RussellSpitzer thanks to it forcing a linear history so its safe with multiple commiters too.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org