Posted to issues@hbase.apache.org by "Jesse Yates (JIRA)" <ji...@apache.org> on 2012/06/18 18:07:42 UTC

[jira] [Created] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Jesse Yates created HBASE-6230:
----------------------------------

             Summary: [brainstorm] "Restore" snapshots for HBase 0.96
                 Key: HBASE-6230
                 URL: https://issues.apache.org/jira/browse/HBASE-6230
             Project: HBase
          Issue Type: Brainstorming
            Reporter: Jesse Yates


Discussion ticket around the definitions and expectations of the different parts of snapshot restoration. This is complementary to, but separate from, the _how_ of taking a snapshot of a table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396125#comment-13396125 ] 

Jonathan Hsieh edited comment on HBASE-6230 at 6/19/12 1:37 AM:
----------------------------------------------------------------

A quick survey of how snapshots are taken and restored in other systems.

|| Database || Taking DB snapshot || Restoring a DB snapshot || Mechanism / requirements || Read-only snapshot mount? || Read-write snapshot mount? || What happens to previous after a restore? || Links ||
| MySQL | Take FS snapshot | Take FS snapshot + log recovery | FS snapshot | No | No | Gone | http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/ http://forge.mysql.com/w/images/c/c1/MySQL_Backups_using_File_System_Snapshots-2009-02-26.pdf |
| PostgreSQL | Take FS snapshot | Take FS snapshot + point-in-time log recovery | FS snapshot | No | No | Gone | https://blogs.oracle.com/jkshah/entry/snapshots_with_postgresql_and_amber http://www.postgresql.org/docs/9.0/static/backup-file.html http://www.postgresql.org/docs/8.3/static/continuous-archiving.html |
| Oracle | More like a consistent copytable to another database; different snapshot concept, requires secondary DB | It seems like a consistent copy table | ... | Yes | Need to copy | Nothing | http://www.dba-oracle.com/data_warehouse/table_replication.htm http://www.csee.umbc.edu/portal/help/oracle8/server.815/a67791/mview.htm http://www.careerride.com/Oracle-what-is-snapshot.aspx |
| MS SQL Server | Take, then originals saved to COW files as active is modified | Take new snapshot, then convert COW to active DB | Sparse files | Yes | Yes | Snapshot | http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/ http://msdn.microsoft.com/en-us/library/ms187054(SQL.90).aspx |
| Accumulo | Clone metadata | No need | Metadata clone | No need | Yes | No need | https://github.com/apache/accumulo/blob/trunk/server/src/main/java/org/apache/accumulo/server/master/tableOps/CloneTable.java |
                
> [brainstorm] "Restore" snapshots for HBase 0.96
> -----------------------------------------------
>
>                 Key: HBASE-6230
>                 URL: https://issues.apache.org/jira/browse/HBASE-6230
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Jesse Yates
>            Assignee: Matteo Bertozzi
>
> Discussion ticket around the definitions/expectations of different parts of snapshot restoration.  This is complementary, but separate from the _how_ of taking a snapshot of a table.


[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13395998#comment-13395998 ] 

Jesse Yates commented on HBASE-6230:
------------------------------------

Restore.

There seem to be five different things people want to do after taking a snapshot.

1) "Oh crap" recovery of a table. 
 - this is generally going to be enabled by just having the file references, table info and respective region infos to rebuild the table state. Currently, this is enabled by just having the snapshot on the fs.

2) Restore to read-only table
 - Take the existing snapshot and make it into a read only table

3) Restore to read/write table
 - take an existing snapshot and make it into a full-fledged table

4) Swap an existing table for the underlying snapshot
 - this should snapshot the existing table and then replace that table with the desired snapshot

5) Export snapshot to separate cluster and enable as read/write
 - do (3), after first copying over files onto another cluster.
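
As a rough illustration of use cases (1) to (3), here is a minimal Python sketch in which a snapshot is only metadata plus file references, and a table is rebuilt from that alone. All names here (`Snapshot`, `Table`, `take_snapshot`, `restore_as_table`) are hypothetical and are not HBase APIs; a real implementation would also have to deal with region infos, logs and the on-disk layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical snapshot: metadata and file references only, no data copies.
    name: str
    table_info: str       # stand-in for the serialized table descriptor
    region_files: tuple   # ((region name, (file name, ...)), ...)


@dataclass
class Table:
    name: str
    table_info: str
    region_files: dict    # region name -> list of file names
    read_only: bool = False

    def add_file(self, region, file_name):
        # Stand-in for a write that eventually produces a new store file.
        if self.read_only:
            raise PermissionError(f"table {self.name!r} is read-only")
        self.region_files.setdefault(region, []).append(file_name)


def take_snapshot(table, snapshot_name):
    # Record references to the table's current files; no data is copied.
    frozen = tuple((r, tuple(fs)) for r, fs in sorted(table.region_files.items()))
    return Snapshot(snapshot_name, table.table_info, frozen)


def restore_as_table(snapshot, table_name, read_only):
    # Use cases (2) and (3): rebuild table state from the snapshot alone.
    regions = {region: list(files) for region, files in snapshot.region_files}
    return Table(table_name, snapshot.table_info, regions, read_only)
```

In this model the only difference between (2) and (3) is the `read_only` flag, which is one argument for treating them as a single operation with an option rather than two separately named operations.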

The other major issue here is how we name these operations so that they make sense alongside the existing semantics in other databases: either we pick new words to separate the action from existing connotations, or we match the existing semantics with our implementation.

Let's talk about the high-level user/admin interface before getting down into the implementation. The latter should fall out once we have the former.
                

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400545#comment-13400545 ] 

Jonathan Hsieh commented on HBASE-6230:
---------------------------------------

It's been pointed out to me that Oracle has a feature called "flashback" (more specifically, a "flashback archive"), which seems functionally closer to the use cases I think we've been talking about.

http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_flashback.htm
http://www.oracle.com/technetwork/database/features/availability/flashback-overview-082751.html
http://www.orafaq.com/node/50
                

[jira] [Resolved] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates resolved HBASE-6230.
--------------------------------

    Resolution: Fixed

Committed to https://github.com/jyates/hbase/tree/snapshots. 
                
> [brainstorm] "Restore" snapshots for HBase 0.96
> -----------------------------------------------
>
>                 Key: HBASE-6230
>                 URL: https://issues.apache.org/jira/browse/HBASE-6230
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Jesse Yates
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-6230-v1.patch, HBASE-6230-v2.patch, HBASE-6230-v3.patch, SnapshotRestore-v0.pdf
>
>
> Discussion ticket around the definitions/expectations of different parts of snapshot restoration.  This is complementary, but separate from the _how_ of taking a snapshot of a table.


[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415705#comment-13415705 ] 

Jonathan Hsieh commented on HBASE-6230:
---------------------------------------

We need to agree on these definitions.

bq. How does this correspond to restoring a table from a snapshot when the table doesn't exist?

Do you mean "restoring" to a different table name in this case (which Matteo has named "clone"), or a situation where a user deleted their table and you want to restore it?

Restoring to the same name seems straightforward to me.

A bunch of work would need to be done if you are restoring a snapshot of a table to a different name than the original while maintaining all of the invariants in regions and their encoded names.  Ex: the region dir corresponds to a hash of the table name, start key, and region timestamp.  Changing the table name means potentially regenerating and moving a bunch of metadata.
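
To make the renaming cost concrete, here is a small Python sketch. It assumes the encoded name is an MD5 hex digest over a comma-joined (table name, start key, timestamp) string; that exact encoding is an assumption for illustration and may differ from what HBase really does. The helper names are hypothetical.

```python
import hashlib


def encoded_region_name(table_name, start_key, region_ts):
    # Assumption: hash the comma-joined (table, start key, timestamp) triple.
    # The real HBase encoding may differ in detail.
    region_name = f"{table_name},{start_key},{region_ts}"
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()


def plan_rename(regions, old_table, new_table):
    # Map each old region directory name to the name it would need under the
    # new table. `regions` is an iterable of (start_key, region_ts) pairs.
    return {
        encoded_region_name(old_table, start_key, ts):
            encoded_region_name(new_table, start_key, ts)
        for start_key, ts in regions
    }
```

Because the table name feeds into the hash, every region of the table ends up with a different directory name, so a restore-under-a-new-name has to touch every region's metadata, not just the table descriptor.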

Cloning and exporting are follow-on features that we'd like to make possible, but they don't need to be in the first cut that makes it into trunk.  From my point of view, we do need snapshot and restore (and some system tests) before we can consider committing to trunk.
                

[jira] [Comment Edited] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400545#comment-13400545 ] 

Jonathan Hsieh edited comment on HBASE-6230 at 6/25/12 3:41 PM:
----------------------------------------------------------------

It's been pointed out to me that Oracle has a feature called "flashback" (more specifically, a "flashback archive"), which seems functionally closer to the use cases I think we've been talking about.

http://docs.oracle.com/cd/B28359_01/appdev.111/b28424/adfns_flashback.htm
http://www.oracle.com/technetwork/database/features/availability/flashback-overview-082751.html
http://www.orafaq.com/node/50
                

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396173#comment-13396173 ] 

Matt Corgan commented on HBASE-6230:
------------------------------------

Amazon just released a hosted HBase service (based on 0.92).  Here's their backup documentation as an additional reference: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hbase-backup-restore.html
                

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497523#comment-13497523 ] 

Matteo Bertozzi commented on HBASE-6230:
----------------------------------------

Yes, the restore depends on the restore interface (HBASE-6777).
                

[jira] [Updated] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-6230:
-----------------------------------

    Attachment: SnapshotRestore-v0.pdf

Attached a document on how Restore/Rollback works, and an initial draft of the restore code: https://reviews.apache.org/r/5963/
                

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396179#comment-13396179 ] 

Matteo Bertozzi commented on HBASE-6230:
----------------------------------------

Another question is: do we need to disable the table during a restore?
Can we disable just some region servers? Only the ones that have changes?

Probably you're restoring due to a "corruption" (e.g. I've accidentally deleted from row X to row Y).
Some applications can tolerate data that is missing or "bad"; others cannot.
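
One way to think about "only the ones that have changes" is to compare the file set of each region in the snapshot with the current one. The Python sketch below is a simplified model under the assumption that store files are immutable and identified by name; the function name is hypothetical, and it ignores unflushed in-memory edits, which a real check would also have to consider.

```python
def regions_needing_restore(snapshot_files, current_files):
    """Return the sorted names of regions whose file set differs from the snapshot.

    Both arguments map region name -> iterable of store file names.
    Regions present on only one side count as changed.
    """
    changed = set()
    for region in set(snapshot_files) | set(current_files):
        before = set(snapshot_files.get(region, ()))
        after = set(current_files.get(region, ()))
        if before != after:
            changed.add(region)
    return sorted(changed)
```

Under this model, regions with an unchanged file set could in principle keep serving while only the changed ones are taken offline; whether that is safe depends on details this sketch leaves out.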

                

[jira] [Updated] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-6230:
-----------------------------------

    Attachment: HBASE-6230-v2.patch
    

[jira] [Assigned] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi reassigned HBASE-6230:
--------------------------------------

    Assignee: Matteo Bertozzi
    

[jira] [Updated] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-6230:
-----------------------------------

    Attachment: HBASE-6230-v3.patch
    

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396125#comment-13396125 ] 

Jonathan Hsieh commented on HBASE-6230:
---------------------------------------

A quick survey of how snapshots are taken and restored in other systems.

|| Database || Taking DB snapshot ||	Restoring a DB snapshot	|| mechanism / requirements ||	read-only snapshot mount? ||	read-write snapshot mount? ||	what happens to previous after a restore? ||	Links		||
| mysql	| Take FS snapshot	| Take FS snapshot + log recovery	| FS Snapshot |	No |	No |	gone	| http://www.mysqlperformanceblog.com/2006/08/21/using-lvm-for-mysql-backup-and-replication-setup/ 	http://forge.mysql.com/w/images/c/c1/MySQL_Backups_using_File_System_Snapshots-2009-02-26.pdf	|
| postgres |	Take FS snapshot |	Take FS snapshot + point in time log recovery |	FS Snapshot |	No |	No |	gone	| https://blogs.oracle.com/jkshah/entry/snapshots_with_postgresql_and_amber	http://www.postgresql.org/docs/9.0/static/backup-file.html	http://www.postgresql.org/docs/8.3/static/continuous-archiving.html |
|oracle	| More like a consistent copytable to another database, different snapshot concept, requires secondary db |	It seems like a consistent copy table |	... |	yes |	Need to copy	| nothing	| http://www.dba-oracle.com/data_warehouse/table_replication.htm	http://www.csee.umbc.edu/portal/help/oracle8/server.815/a67791/mview.htm	http://www.careerride.com/Oracle-what-is-snapshot.aspx |
|ms sqlserver |	Take, then originals saved to COW files as active is modified |	Take new snapshot, then convert COW to active db	| Sparse files |	yes |	yes |	snapshot |	http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/	http://msdn.microsoft.com/en-us/library/ms187054(SQL.90).aspx	|
| accumulo |	clone metadata |	No need | Metadata clone |	no need |	yes |	no need	|https://github.com/apache/accumulo/blob/trunk/server/src/main/java/org/apache/accumulo/server/master/tableOps/CloneTable.java		|
                

[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410682#comment-13410682 ] 

Jesse Yates commented on HBASE-6230:
------------------------------------

{quote}
 Restore Table

Given a "snapshot name" restore override the original table with the snapshot content.
Before restoring a new snapshot of the table is taken, just to avoid bad situations.
(If the table is not disabled we can keep serving reads)

This allows a full and quick rollback to a previous snapshot.
{quote}

+1 on the general design.

How does this correspond to restoring a table from a snapshot when the table doesn't exist? I feel like this should be a semantically different use case, though the underlying implementation will probably differ only in not taking a snapshot of the existing table, because there is no existing table. I'd propose that Restore -> Rollback, and that Restore then means just taking a snapshot and creating a table from it. This means the exported snapshot is then 'restored' on the remote cluster.
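
The proposed split can be sketched in a few lines of Python. This is only a toy model: tables and snapshots are plain dicts of row key to value, and the function names and the `-pre-rollback` naming are hypothetical, not an agreed interface.

```python
def restore(tables, snapshots, snapshot_name, table_name):
    # Proposed "Restore": create a table from a snapshot; the table must not exist.
    if table_name in tables:
        raise ValueError(f"table {table_name!r} exists; use rollback instead")
    tables[table_name] = dict(snapshots[snapshot_name])


def rollback(tables, snapshots, snapshot_name, table_name):
    # Proposed "Rollback": snapshot the live table first, then replace it.
    if table_name not in tables:
        raise ValueError(f"table {table_name!r} does not exist; use restore instead")
    safety_name = f"{table_name}-pre-rollback"
    snapshots[safety_name] = dict(tables[table_name])
    tables[table_name] = dict(snapshots[snapshot_name])
    return safety_name
```

The two functions share the "create table from snapshot" step and differ only in the precondition and the safety snapshot, which matches the expectation that the implementations will be nearly identical.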

{quote}
Clone Snapshot
{quote}

This could be very, very tricky in terms of multiple tables reading the same files. You would have to make sure that no other tables are using the current HFiles when a compaction comes around. Otherwise, when you archive the files, you will break the other table using those files. Maybe there is some niceness in HDFS that will blow up on you when trying to move a file someone else is currently reading, but that would take some investigation. I have a feeling there is also a bunch of code that assumes a certain layout for the files, which will make this hard. I'm not saying it's not doable, but it's not going to be trivial.
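
One possible way to guard against this is to track which tables reference each file and only archive a file once the last reference is gone. The Python sketch below shows that idea as a plain in-memory registry; the class and method names are hypothetical, and this is not a description of how HBase actually handles shared HFiles.

```python
class FileRegistry:
    """Tracks which tables reference each (immutable) store file."""

    def __init__(self):
        self._refs = {}       # file name -> set of table names
        self.archived = []    # files that were safe to archive, in order

    def add_reference(self, table, file_name):
        self._refs.setdefault(file_name, set()).add(table)

    def release(self, table, file_name):
        # Called when `table` compacts `file_name` away. The file is archived
        # only once no other table still references it. Raises KeyError for
        # a file that was never registered.
        holders = self._refs[file_name]
        holders.discard(table)
        if holders:
            return False
        del self._refs[file_name]
        self.archived.append(file_name)
        return True
```

With a clone in place, the original table's compaction releases its reference but the file stays put until the clone has released it too.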

{quote}
* To Restore only "individual items" (only some small range of data was lost from "current")
** MR job that scan the cloned table and update the data in the original one. (Partial restore of the data)
{quote}

This seems like a slightly more difficult proposal. I'm not averse to doing this, but it isn't a trivial operation and probably should be taken care of by a Map/Reduce job that exports to a 'small' (depending on data size), temporary table, so we can easily filter out the right ranges without having to stand up a special region or do a ton of compactions. This makes it an inherently slower operation, but it should be performant enough for recovering data, and it makes a lot of sense for recovering a very large chunk in terms of overall throughput (though you probably want to just restore a clone at that point).

This brings up another potential nicety: a snapshot-and-clone operation that takes a snapshot of the existing table and then stands up a clone of that data. It's a small addition to the interface and, to me, what a real 'clone' operation should do.
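
The core of such a partial restore is just a range-filtered copy from the clone back into the original. The Python sketch below stands in for the Map/Reduce job with a single loop over dicts of row key to value; the function name and the half-open `[start_row, stop_row)` range convention are assumptions for illustration.

```python
def partial_restore(clone_rows, original_rows, start_row, stop_row):
    """Copy rows in [start_row, stop_row) from the clone into the original.

    Both tables are modelled as dicts of row key -> value. Rows outside the
    range are left untouched. Returns the number of rows written.
    """
    restored = 0
    for row_key in sorted(clone_rows):
        if start_row <= row_key < stop_row:
            original_rows[row_key] = clone_rows[row_key]
            restored += 1
    return restored
```

Rows outside the range keep whatever value the live table has, which is the point of a partial restore: newer data elsewhere in the table is not rolled back.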

{quote}
Export Snapshot
{quote}
+1 Let the remote cluster restore the snapshot if they want to do it - don't force a table to be stood up immediately.
                

[jira] [Updated] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-6230:
----------------------------------

    Issue Type: Sub-task  (was: Brainstorming)
        Parent: HBASE-6055
    
> [brainstorm] "Restore" snapshots for HBase 0.96
> -----------------------------------------------
>
>                 Key: HBASE-6230
>                 URL: https://issues.apache.org/jira/browse/HBASE-6230
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jesse Yates
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-6230-v1.patch, HBASE-6230-v2.patch, HBASE-6230-v3.patch, SnapshotRestore-v0.pdf
>
>
> Discussion ticket around the definitions/expectations of different parts of snapshot restoration.  This is complementary, but separate from the _how_ of taking a snapshot of a table.


[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497522#comment-13497522 ] 

Jesse Yates commented on HBASE-6230:
------------------------------------

this needs the restore interface patch (HBASE-6777) first, right [~mbertozzi]?
                
> [brainstorm] "Restore" snapshots for HBase 0.96
> -----------------------------------------------
>
>                 Key: HBASE-6230
>                 URL: https://issues.apache.org/jira/browse/HBASE-6230
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Jesse Yates
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-6230-v1.patch, SnapshotRestore-v0.pdf
>
>
> Discussion ticket around the definitions/expectations of different parts of snapshot restoration.  This is complementary, but separate from the _how_ of taking a snapshot of a table.


[jira] [Commented] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407357#comment-13407357 ] 

Matteo Bertozzi commented on HBASE-6230:
----------------------------------------

I think that the 3 fundamental blocks for restore are:
 * Restore Table - restore <snapshot name> 
 * Clone Table - clone <snapshot name> <table name> [readonly]
 * Export snapshot - hbase o.a.h.h.mapreduce.ExportSnapshot <snapshot name> <address>

On top of these operations you can build whatever tools you want, for example to let the user restore just a piece of data, show the differences between snapshot A, snapshot B and the original table, and so on...

h5. Restore Table
Given a "snapshot name", restore overrides the original table with the snapshot content.
Before restoring, a new snapshot of the table is taken, just to avoid bad situations.
(If the table is not disabled we can keep serving reads)

This allows a full and quick rollback to a previous snapshot.
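
A minimal sketch of the intended restore semantics, using an in-memory map as a stand-in for the table. The class name and the "-pre-restore" naming of the safety snapshot are only assumptions for illustration, not part of any patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class RestoreSketch {
  // snapshot name -> copy of the table content at snapshot time
  private final Map<String, Map<String, String>> snapshots = new HashMap<>();
  private Map<String, String> table = new TreeMap<>();

  void put(String row, String value) { table.put(row, value); }

  String get(String row) { return table.get(row); }

  void snapshot(String name) { snapshots.put(name, new TreeMap<>(table)); }

  /**
   * Overrides the table with the content of the named snapshot.
   * A safety snapshot of the current state is taken first, and its
   * (hypothetical) name is returned so the restore can be undone.
   */
  String restore(String name) {
    Map<String, String> saved = snapshots.get(name);
    if (saved == null) {
      throw new IllegalArgumentException("unknown snapshot: " + name);
    }
    String safety = name + "-pre-restore";
    snapshot(safety);
    table = new TreeMap<>(saved);
    return safety;
  }
}
```

Restoring the returned safety snapshot brings back the state from just before the rollback.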

h5. Clone Table
Given a "snapshot name", a new table is created with the content of the specified snapshot.

This operation allows:
 * To have an old version of the table in parallel with the current one.
 ** Look at the snapshot side-by-side with the "current" table before deciding whether to roll back or not.
 * To restore only "individual items" (only some small range of data was lost from "current").
 ** An MR job that scans the cloned table and updates the data in the original one (partial restore of the data).
 * If the table is not marked as read-only:
 ** To add/remove data from this table without affecting the original one or the snapshot.
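
The isolation and the optional [readonly] flag could be sketched like this. It is an in-memory stand-in with hypothetical names; it copies the data, while a real clone would presumably reference the snapshot files instead of copying them.

```java
import java.util.Map;
import java.util.TreeMap;

class CloneSketch {
  /** Stand-in for a table created from a snapshot. */
  static final class Table {
    private final Map<String, String> rows;
    private final boolean readOnly;

    Table(Map<String, String> snapshotRows, boolean readOnly) {
      // Private copy: writes to the clone never reach the snapshot.
      this.rows = new TreeMap<>(snapshotRows);
      this.readOnly = readOnly;
    }

    String get(String row) { return rows.get(row); }

    void put(String row, String value) {
      if (readOnly) {
        throw new UnsupportedOperationException("table is read-only");
      }
      rows.put(row, value);
    }
  }

  /** clone <snapshot name> <table name> [readonly], reduced to its semantics. */
  static Table cloneSnapshot(Map<String, String> snapshotRows, boolean readOnly) {
    return new Table(snapshotRows, readOnly);
  }
}
```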

h5. Export Snapshot
Copy the "snapshot name" to the specified cluster. This will be more or less a distcp of the snapshot folder, and it allows you to copy a snapshotted table without going through CopyTable. After this operation the snapshot will be visible with "list_snapshot" and can be restored or cloned.
                


        

[jira] [Updated] (HBASE-6230) [brainstorm] "Restore" snapshots for HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-6230:
-----------------------------------

    Attachment: HBASE-6230-v1.patch

attached the latest patch available on review board. 
                
