Posted to issues@hbase.apache.org by "Matteo Bertozzi (JIRA)" <ji...@apache.org> on 2012/12/03 18:05:58 UTC
[jira] [Commented] (HBASE-7245) Recovery on failed restore.

    [ https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508857#comment-13508857 ] 

Matteo Bertozzi commented on HBASE-7245:
----------------------------------------

The operations that have this kind of problem are:
 * create table: remove the table on failure (rollback); the user has already received the failure
 * delete table: finish removing the table (rollforward); restoring the table is impossible
 * clone table: remove the table on failure (rollback); same as create table
 * restore table: finish restoring the table (rollforward)
 * snapshot: remove the tmp folder (rollback)

One simple solution is to drop an "operation lock" file in the table folder. On master startup, if the file is present, read the serialized operation enum and execute the corresponding rollback or rollforward. (Note that if the master is not down, the recovery can be done by catching the exception.)
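
A minimal sketch of the "operation lock" idea, to make the startup check concrete. It is not HBase code: it uses java.nio on a local directory as a stand-in for the table folder, and the class name, the enum names, and the ".operation-lock" file name are all hypothetical. The rollback/rollforward mapping is the one from the list above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch: a marker file records which operation is in flight,
// so a restarted master can decide how to recover.
class OperationLockSketch {
    enum Recovery { ROLLBACK, ROLLFORWARD }

    // Operation -> recovery action, as listed in the comment above.
    enum TableOperation {
        CREATE_TABLE(Recovery.ROLLBACK),
        DELETE_TABLE(Recovery.ROLLFORWARD),
        CLONE_TABLE(Recovery.ROLLBACK),
        RESTORE_TABLE(Recovery.ROLLFORWARD),
        SNAPSHOT(Recovery.ROLLBACK);

        final Recovery recovery;

        TableOperation(Recovery recovery) {
            this.recovery = recovery;
        }
    }

    // Assumed file name; the original comment does not name the file.
    static final String LOCK_FILE = ".operation-lock";

    // Drop the lock file before the operation touches the fs or meta.
    static void begin(Path tableDir, TableOperation op) throws IOException {
        Files.createDirectories(tableDir);
        Files.write(tableDir.resolve(LOCK_FILE),
                op.name().getBytes(StandardCharsets.UTF_8));
    }

    // Remove the lock file once the operation has completed cleanly.
    static void finish(Path tableDir) throws IOException {
        Files.deleteIfExists(tableDir.resolve(LOCK_FILE));
    }

    // On startup: if a lock file is present, return the interrupted operation.
    static Optional<TableOperation> pending(Path tableDir) throws IOException {
        Path lock = tableDir.resolve(LOCK_FILE);
        if (!Files.exists(lock)) {
            return Optional.empty();
        }
        String name = new String(Files.readAllBytes(lock), StandardCharsets.UTF_8).trim();
        return Optional.of(TableOperation.valueOf(name));
    }

    public static void main(String[] args) throws IOException {
        Path tableDir = Files.createTempDirectory("table");
        begin(tableDir, TableOperation.RESTORE_TABLE);
        // Simulated crash: finish() is never called, so the lock file survives.
        pending(tableDir).ifPresent(op ->
                System.out.println(op + " -> " + op.recovery)); // RESTORE_TABLE -> ROLLFORWARD
        finish(tableDir);
    }
}
```

In this sketch the lock file only records which operation was interrupted; the actual rollback or rollforward work (removing the table, completing the meta edits, deleting the tmp folder) would have to be done by the caller before calling finish().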
                
> Recovery on failed restore.
> ---------------------------
>
>                 Key: HBASE-7245
>                 URL: https://issues.apache.org/jira/browse/HBASE-7245
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jonathan Hsieh
>            Assignee: Matteo Bertozzi
>             Fix For: hbase-6055, 0.96.0
>
>
> Restore will do updates to the file system and to meta.  It seems that an inopportune failure before meta is completely updated could result in an inconsistent state that would require hbck to fix.
> We should define what the semantics are for recovering from this.  Some suggestions:
> 1) Fail forward (see some log saying restore's meta edits not completed, then gather the information necessary to rebuild it all from fs, and complete the meta edits).
> 2) Fail backwards (see some log saying restore's meta edits not completed, then delete the incomplete snapshot region entries from meta).
> I think I prefer 1 -- if two processes somehow both end up updating (say the original master didn't die and a new one started up), the updates would be idempotent.  If we used 2, we could still have a race and still end up in a bad place.
