Posted to issues@hbase.apache.org by "Jesse Yates (JIRA)" <ji...@apache.org> on 2012/05/19 01:17:08 UTC

[jira] [Created] (HBASE-6055) Snapshots in HBase 0.96

Jesse Yates created HBASE-6055:
----------------------------------

             Summary: Snapshots in HBase 0.96
                 Key: HBASE-6055
                 URL: https://issues.apache.org/jira/browse/HBASE-6055
             Project: HBase
          Issue Type: New Feature
          Components: client, master, regionserver, zookeeper
            Reporter: Jesse Yates
            Assignee: Jesse Yates
             Fix For: 0.96.0


Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414766#comment-13414766 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

@Ted -- FYI, I keep 5 screens open, each on a different page.  Then I can flip between them quickly and comment on each.
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503315#comment-13503315 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I agree with Matteo wrt Jon's comments. I expect the 'online scaffolding' will have to dramatically change too, though it depends on what Jon comes up with.

[~jmhsieh] feel free to close out the old jiras for things that you are replacing (e.g. the Three-Phase Commit Framework), if you feel up to it.
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463700#comment-13463700 ] 

Matteo Bertozzi commented on HBASE-6055:
----------------------------------------

When you're talking about hfiles, you are referring to the log files, right? I'm a bit confused reading your comment, because the log files are sequence files. Anyway...

The logs in /hbase/.logs are split (new files are created in region/recovered.edits), and if you look at HRegion.replayRecoveredEditsIfAny(), the content of recovered.edits is removed as soon as the edits are applied. Removed, not archived. This means that as soon as the table goes online, the snapshot has no way to read those files.

But as you've said, the original (full) log is still available during the split, and is moved to the archive (.oldlogs) as soon as the split is done.

This means that if you see files in recovered.edits, you should have the full logs in the /hbase/.logs folder, and you can keep a reference to them, as you do for the online snapshot.
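The check described above can be sketched roughly as follows. This is only an illustration using java.nio.file on a local filesystem as a stand-in for HDFS; the class name RecoveredEditsCheckSketch and both method names are hypothetical and are not part of HBase. The caller passes in the recovered-edits directory and the logs directory, so the sketch assumes nothing about the real layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: a local-filesystem stand-in, not HBase or HDFS code.
class RecoveredEditsCheckSketch {

    /** True if the given recovered-edits directory exists and holds at least one entry. */
    static boolean hasPendingEdits(Path recoveredEditsDir) throws IOException {
        if (!Files.isDirectory(recoveredEditsDir)) {
            return false;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(recoveredEditsDir)) {
            return entries.iterator().hasNext();
        }
    }

    /**
     * If edits are still pending for the region, return every full log file under
     * logsDir so the snapshot can keep references to them; otherwise return nothing.
     */
    static List<Path> logsToReference(Path recoveredEditsDir, Path logsDir) throws IOException {
        if (!hasPendingEdits(recoveredEditsDir) || !Files.isDirectory(logsDir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> walk = Files.walk(logsDir)) {
            // Walk recursively so logs nested in subdirectories are included.
            return walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }
}
```

The sketch references the full logs instead of the split edits because, as noted above, the split edits are removed once they are applied.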

Another semi-unrelated note: currently we keep full log files, and the restore needs to split them (see the restore code SnapshotLogSplitter, https://github.com/matteobertozzi/hbase/blob/snapshot-dev/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/restore/RestoreSnapshotHelper.java#L398)
Can we move this logic to the end of the take-snapshot operation and split the logs into .snapshot/region/recovered.edits?
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408713#comment-13408713 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

@Matteo I've done that already, looks like my diff-ing got messed up :/ Working on pushing up a new patch...
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502910#comment-13502910 ] 

Matteo Bertozzi commented on HBASE-6055:
----------------------------------------

+1 on separating offline and online, but maybe we can keep a root jira to track all the dependencies, plus a general design doc
{code}
 + Snapshot in HBase
 |-- HFile Archiver
 |-- Offline Snapshot
 |----- Offline Snapshot
 |----- Cleaner
 |----- Restore/Clone
 |----- Shell
 |----- ...
 |-- Online Snapshot
 |----- Procedure
 |----- Exception Framework
 |----- Timestamp Snapshot
 |----- ...
{code}
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284373#comment-13284373 ] 

ramkrishna.s.vasudevan commented on HBASE-6055:
-----------------------------------------------

Nice doc Jesse.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414689#comment-13414689 ] 

Zhihong Ted Yu commented on HBASE-6055:
---------------------------------------

Flipping through 5 pages on review board is slow. So I am putting down some notes here.

For HStore.java:
The license header doesn't look like the standard format.
Please add audience and stability annotations to this new interface.
{code}
+  FileStatus[] getStoreFiles() throws IOException;
+
+  List<StoreFile> getStorefiles();
{code}
Why do we need two methods which are spelled almost the same, yet returning different types ? When refactoring, we should make the code cleaner.
There're many methods which don't have javadoc. Please add javadoc for them.
{code}
+  public HStore getDelgate() {
{code}
Correct spelling for the above method.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408528#comment-13408528 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Looks like I'm just missing HBASE-6341 and HBASE-6283, neither of which really changes the patch. Keep in mind you will need to apply the latest patch from https://reviews.apache.org/r/4633/ (RB of latest for HBASE-5547, that code on trunk) before applying the patch - RB isn't obvious about having a parent diff.

Unless something has been committed to the svn that is significantly different and hasn't propagated to the git repo yet...
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280706#comment-13280706 ] 

Zhihong Yu commented on HBASE-6055:
-----------------------------------

The design document is very good.
Will get back to reviewing HBASE-5547 first.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437254#comment-13437254 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Hm.. previous should have been labeled HBASE-6570.
                

        

[jira] [Updated] (HBASE-6055) Offline Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-6055:
-------------------------------

    Summary: Offline Snapshots in HBase 0.96  (was: Snapshots in HBase 0.96)
    
> Offline Snapshots in HBase 0.96
> -------------------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>            Priority: Blocker
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436382#comment-13436382 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Looking at implementing concurrent compactions, there is an issue around allowing compactions and how to get a consistent view of the directory for each store. If a compaction is taking place and we 'ls' the directory for a store, the following may occur (or some semantically correct subset of the following):

* get the first set of HFiles in the directory
* compaction removes all the files
* compaction moves in its own files
* we get the next batch of files from the namenode for the original 'ls'

This leads to a munged (not necessarily incorrect) view of the hfiles that will require another compaction on restore to get reasonable performance. There are a couple of considerations here.

(1) The above situation occurs only when we have _more files in a store than the ls limit on the namenode_, which is 1000 by default - the unit of atomicity. As long as a single store doesn't have more than 1000 files, we can just ignore compactions entirely and snap away. However, once we breach 1000 files, this becomes a different problem, and potentially a far more complex one to reason about.

(2) We can block until the currently running compactions finish and then get a quick 'ls' before the next compaction starts. This is a bit more intrusive and will potentially hold up the compaction queue for a little bit. Also, as we have more files and a more active system, it becomes increasingly likely that a compaction will be running and will cause your snapshot to fail as it waits on the compaction to finish (since we time-bound snapshots to minimize impact on the system).

Personally, it seems unlikely that we are going to get more than 1000 files in a single store. However, if it's unlikely, that means it's probably going to happen :) Option (2) is far more intrusive and code intensive, potentially causing some lag in the system, but is sure to be safe once we get it right.
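
To make the race in consideration (1) concrete, here is a toy model of a listing that is fetched in batches. It is not HDFS client code: the class BatchedListingSketch and its methods are hypothetical, a sorted in-memory set stands in for the store directory, and the sketch assumes that each batch is atomic on its own and that the listing resumes after the last name it returned. A small batch size stands in for the 1000-entry limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical toy model; none of these names exist in HBase or HDFS.
class BatchedListingSketch {

    /**
     * Lists dir in batches of at most batchSize names. betweenBatches runs
     * whenever another batch is still needed, which is where a concurrent
     * compaction can change the directory under the listing.
     */
    static List<String> listInBatches(NavigableSet<String> dir, int batchSize, Runnable betweenBatches) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> seen = new ArrayList<>();
        String last = null;
        while (true) {
            // Resume strictly after the last name already returned.
            NavigableSet<String> rest = (last == null) ? dir : dir.tailSet(last, false);
            Iterator<String> it = rest.iterator();
            int taken = 0;
            while (it.hasNext() && taken < batchSize) {
                last = it.next();
                seen.add(last);
                taken++;
            }
            if (!it.hasNext()) {
                return seen; // nothing left at the time of this batch
            }
            betweenBatches.run();
        }
    }

    /** Runs one listing over four hfiles with a compaction after the first batch. */
    static List<String> demo(int batchSize) {
        NavigableSet<String> store = new TreeSet<>(
                Arrays.asList("hfile-00", "hfile-01", "hfile-02", "hfile-03"));
        boolean[] compacted = {false};
        return listInBatches(store, batchSize, () -> {
            if (!compacted[0]) {
                compacted[0] = true;
                store.clear();                // compaction removes all the files
                store.add("hfile-compacted"); // and moves in its own file
            }
        });
    }

    public static void main(String[] args) {
        System.out.println("batch size 2:    " + demo(2));
        System.out.println("batch size 1000: " + demo(1000));
    }
}
```

With a batch size of 2 the listing returns hfile-00, hfile-01 and hfile-compacted, a mix of pre- and post-compaction files. With a batch size of 1000 the whole store fits in one batch and the view is consistent.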

Thoughts?
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500578#comment-13500578 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Just force pushed the snapshot-dev-squash branch -- it didn't delete files that were supposed to be gone.

                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464283#comment-13464283 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

{quote}
Another semi-unrelated note: currently we keep full log files, and the restore needs to split them (see the restore code SnapshotLogSplitter, https://github.com/matteobertozzi/hbase/blob/snapshot-dev/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/restore/RestoreSnapshotHelper.java#L398)
Can we move this logic to the end of the take-snapshot operation and split the logs into .snapshot/region/recovered.edits?
{quote}

If we move it into the snapshot operation, then that will slow down the overall operation and make it more difficult to reason about how long a snapshot 'should' take. In particular, this becomes difficult because we want to give the client firm time bounds, but the log splitting is not time bounded (AFAIK).  

An alternative would be to have a background snapshot-log-splitter task that just goes through and splits logs for snapshots. It would basically comb through the snapshot directory, looking for snapshots. If it finds one it hasn't seen, it starts doing the current log splitting on that snapshot (which looks basically like the root directory of hbase - less the ROOT and META tables - so it should be almost, if not entirely, drop-in usable). When the logs are split, we would have to do a little extra checking to make sure that we don't restore a snapshot mid-split, or that if we do, the restore handles it properly.
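
A minimal sketch of such a background task might look like the following. It assumes a local filesystem via java.nio.file as a stand-in for HDFS. Every name in it (SnapshotLogSplitterSketch, SplitAction, the two marker file names) is hypothetical and chosen only for illustration. Using marker files to guard against a mid-split restore is one possible approach, not something decided in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a background snapshot log splitter; not HBase code.
class SnapshotLogSplitterSketch {

    /** Stand-in for the existing log-splitting logic, supplied by the caller. */
    interface SplitAction {
        void split(Path snapshotDir) throws IOException;
    }

    static final String IN_PROGRESS = ".split-in-progress";
    static final String DONE = ".split-done";

    private final Path snapshotRoot;
    private final Set<Path> handled = new HashSet<>();

    SnapshotLogSplitterSketch(Path snapshotRoot) {
        this.snapshotRoot = snapshotRoot;
    }

    /** One pass over the snapshot directory; returns how many snapshots were split. */
    int scanOnce(SplitAction splitter) throws IOException {
        int splitCount = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(snapshotRoot)) {
            for (Path snapshot : entries) {
                if (!Files.isDirectory(snapshot) || handled.contains(snapshot)) {
                    continue; // not a snapshot, or already seen by this task
                }
                if (!Files.exists(snapshot.resolve(DONE))) {
                    Path marker = snapshot.resolve(IN_PROGRESS);
                    if (!Files.exists(marker)) {
                        Files.createFile(marker); // flags the snapshot as mid-split
                    }
                    splitter.split(snapshot);
                    Files.move(marker, snapshot.resolve(DONE)); // split finished
                    splitCount++;
                }
                handled.add(snapshot);
            }
        }
        return splitCount;
    }

    /** A restore could check this before reading split logs from the snapshot. */
    static boolean isSplitComplete(Path snapshotDir) {
        return Files.exists(snapshotDir.resolve(DONE));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshots");
        Files.createDirectory(root.resolve("snapshot-a"));
        Files.createDirectory(root.resolve("snapshot-b"));
        SnapshotLogSplitterSketch task = new SnapshotLogSplitterSketch(root);
        System.out.println("first pass split:  " + task.scanOnce(dir -> { }));
        System.out.println("second pass split: " + task.scanOnce(dir -> { }));
    }
}
```

Because the splitting runs outside the snapshot operation itself, the snapshot keeps its time bound, and a restore only has to check the completion marker.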
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282221#comment-13282221 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Sorry, forgot to mention that the new "correct" branch is snapshots-r0. It compiles locally for me and gets most of the way through the test (TestSnapshotFromClient). Should solve most of the current issues :) That'll teach me to be overzealous with posting patches.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496795#comment-13496795 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

+1 on Jon's comments. I think most of this is almost there
                

[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-6055:
-------------------------------

    Attachment: Snapshots in HBase.docx

Adding updated documentation - realized I fudged a couple of things when doing the testing (thanks for the hints Matteo!)
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454168#comment-13454168 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Just setup my github repo for a snapshots development branch: https://github.com/jyates/hbase/tree/snapshots

We can make it such that any of the future patches for snapshots (HBASE-6765, HBASE-6353, HBASE-6571, HBASE-6573) all go into this branch, and then we just merge the branch into svn with 3 +1's from committers when it's ready (as per the discussion here: http://search-hadoop.com/m/asM982C5FkS1/hbase+branch+git&subj=Thoughts+about+large+feature+dev+branches).

All reviews still go through reviewboard and will receive the same scrutiny, but get committed over on github until we want to roll it into trunk.

Thoughts? 
                

[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-6055:
----------------------------------

    Comment: was deleted

    
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432228#comment-13432228 ] 

Andrew Purtell commented on HBASE-6055:
---------------------------------------

bq. Each of these abstractions/standalone pieces is going to be moved to another JIRA and given its own RB review.

[~jesse_yates] So maybe we can use this JIRA as an umbrella for subtasks?
                

        

[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-6055:
----------------------------------

    Priority: Blocker  (was: Major)
    

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408646#comment-13408646 ] 

Matteo Bertozzi commented on HBASE-6055:
----------------------------------------

@Jesse While working on HBASE-6353, I've also switched to protobuf (HMasterInterface was removed in HBASE-6039).
Maybe you can use that when rebasing on trunk (HBaseAdmin, MasterAdminProtocol, ...)
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288733#comment-13288733 ] 

Zhihong Ted Yu commented on HBASE-6055:
---------------------------------------

bq. The HLog will have edits from regions not relevant to the table's regions.
Over in HBASE-5699, each of the multiple WALs can be set up to receive edits from a single table.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289900#comment-13289900 ] 

gaojinchao commented on HBASE-6055:
-----------------------------------

Hi Jesse,
I am considering a solution that doesn't use the HLog. The idea is to handle only the memstore and asynchronously flush the memstore to an HFile. If the region server goes down, we can finish flushing the HFile by replaying the edit log. Do you think this is feasible?
If we can do this, there are several relatively large benefits:
1. Restoring the snapshot is easier.
2. We can achieve incremental backup via HFiles.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502554#comment-13502554 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

[~jesse_yates] [~mbertozzi] What do you guys think about closing this issue when offline makes it into trunk and having a separate umbrella issue for online snapshots and variants?  If we do this, we'd move some of the remaining subtasks to the new issue.

I'm also considering closing some of the existing subtasks and creating new issues with the simplified versions -- while they have similar purposes, the design and implementation details are somewhat different.  Comments, concerns?


                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408390#comment-13408390 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Pushed up initial review (based on HBASE-5547) on review board: https://reviews.apache.org/r/5817/
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464276#comment-13464276 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

bq. When you're talking about hfiles, you are referring to the log files, right? I'm a bit confused reading your comment, because the log files are sequence files. Anyway...

Oops, typing tired. Yeah, I mean hlogs the entire time.

{quote}
The logs in /hbase/.logs are split (new files are created in region/recovered.edits), and if you look at HRegion.replayRecoveredEditsIfAny(), the content of recovered.edits is removed as soon as the edits are applied. Removed, not archived. This means that as soon as the table goes online, the snapshot has no way to read those files.

But as you've said, the original (full) log is still available during the split, and is moved to the archive (.oldlogs) as soon as the split is done.

This means that if you see files in recovered.edits, you should have the full logs in the /hbase/.logs folder, and you can keep a reference to them, as you do for the online snapshot.
{quote}

Keeping all the logs in .oldlogs as well as .logs will cover a LOT more hlogs than are necessary to restore the table. Better would be to just reference all the files in the recovered.edits directory, but I worry that there will probably be some race conditions (especially in cases where a server is brought up and down multiple times). The easier option seems to be to remove the log file only when all the recovered.edits are finished. For instance, we could use the FileLink stuff Matteo is working on to ref-count that hlog and only delete it when the last 'reference' (or file derived from that hlog) is gone from the recovered.edits directory.
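
The ref-counting idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `HLogRefCounter` and its methods are hypothetical names, not part of HBase or of the FileLink work, and hlogs are identified by plain strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: counts the recovered.edits files still derived from each hlog. */
class HLogRefCounter {
    private final Map<String, Integer> refs = new HashMap<>();

    /** Record that one more recovered.edits file was split out of the given hlog. */
    synchronized void addReference(String hlog) {
        refs.merge(hlog, 1, Integer::sum);
    }

    /**
     * Record that one recovered.edits file derived from the hlog is gone.
     * Returns true when that was the last reference, i.e. the hlog may now be deleted.
     */
    synchronized boolean releaseReference(String hlog) {
        Integer count = refs.get(hlog);
        if (count == null) {
            throw new IllegalStateException("no references recorded for " + hlog);
        }
        if (count == 1) {
            refs.remove(hlog);
            return true;
        }
        refs.put(hlog, count - 1);
        return false;
    }

    /** Number of recovered.edits files that still reference the hlog. */
    synchronized int referenceCount(String hlog) {
        return refs.getOrDefault(hlog, 0);
    }

    public static void main(String[] args) {
        HLogRefCounter counter = new HLogRefCounter();
        counter.addReference("hlog.1"); // edits split out for region A
        counter.addReference("hlog.1"); // edits split out for region B
        System.out.println(counter.releaseReference("hlog.1")); // false: region B still pending
        System.out.println(counter.releaseReference("hlog.1")); // true: safe to delete hlog.1
    }
}
```

An in-memory counter like this would not survive a master or region server restart, so a real implementation would presumably need to derive the count from the files on disk.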
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496573#comment-13496573 ] 

Matteo Bertozzi commented on HBASE-6055:
----------------------------------------

Current offline snapshot status:
The code has been up for review for a while now, and everything has at least a +1.
We're missing a couple of reviews before we can merge it into the snapshot branch.

|| Jira || Description || Status || Review Link ||
| HBASE-5547 | HFile Archiver | trunk | |
| HBASE-6610 | HFileLink hardlink alternative | trunk | |
| HBASE-6571 | Error handling framework | snapshot branch | [review board|https://reviews.apache.org/r/6589/] |
| HBASE-6765 | Take a Snapshot Interface | snapshot branch | [review board|https://reviews.apache.org/r/7072/] |
| HBASE-6230 | Snapshot Reference Utils | snapshot branch | [review board|https://reviews.apache.org/r/7788/] |
| HBASE-6353 | Snapshot Shell | snapshot branch | [review board|https://reviews.apache.org/r/7583/] |
| HBASE-6863 | Offline Snapshot | review +2 | [review board|https://reviews.apache.org/r/7608/] |
| HBASE-6865 | Snapshot cleaner | review +2 | [review board|https://reviews.apache.org/r/7627] |
| HBASE-6777 | Restore Interface | review +1 | [review board|https://reviews.apache.org/r/7096] |
| HBASE-6230 | Restore Snapshot | review +1 | [review board|https://reviews.apache.org/r/5963/] |
| HBASE-6802 | Export Snapshot | review +1 | [review board|https://reviews.apache.org/r/7137/] |

The *reference snapshot branch* is: https://github.com/jyates/hbase/tree/snapshots
The "complete" dev branch with all the commits above is: https://github.com/matteobertozzi/hbase/commits/offline-snapshot-review-v3
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500557#comment-13500557 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I just +1'ed Matteo's restore changes, so I hope to roll that into the snapshot branch (https://github.com/jyates/hbase/tree/snapshots) in the next day or two, which would mean we have everything for offline snapshots.

The offline snapshots themselves are a self-contained bit of code and sizable enough to make it worthwhile to roll them into trunk on their own.

Let's keep the online work on a branch until it's wrapped up - I'll start taking a look at the online code when I have a chance, and look forward to code posted on RB. The progress sounds real sweet Jon!

TL;DR let's do what Jon suggests above, especially as offline snapshots are basically done and ready to merge into trunk.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433602#comment-13433602 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Created sub-tasks HBASE-6568, HBASE-6569, HBASE-6570, HBASE-6571, HBASE-6573 for each of the pieces for snapshots. Posting a new snapshots patch (based on all these patches - to apply the coming patch you will need to apply all the others first) to RB soon.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292952#comment-13292952 ] 

Matteo Bertozzi commented on HBASE-6055:
----------------------------------------

@Jon inline replies

{quote}
I issue a "snapshot" command at the shell/master.
* HBase creates a new .snapshot subdir, and it contains references to HLogs and HFiles. This is a "snapshot"
** This step is called: snapshotting, "taking a snapshot", and also materializing right?
{quote}

Yes, when you issue a "snapshot" command, HBase creates a new .snapshot subdir containing references to HLogs and HFiles.
This is "taking a snapshot" or snapshotting...
but not materialization. I think "materialization" is when you copy the HFiles/HLogs somewhere else...

{quote}
I currently have a snapshot. I want read-only access to its contents to compare with the current table.
* Does HBase know how to interpret the stuff in a .snapshot dir such that it acts like a read-only table?
* Do I, as an admin, need to execute some step to make it appear in HBase as a read-only table? (if so what is this called?)
{quote}
I think the first point is more like a snapshot scan... one that scans the HFiles + HLogs in the snapshot directory and shows you the result...
The second point seems more like a "restore on a different table" plus marking the table as read-only.

{quote}
I currently have a snapshot. Oops! I accidentally truncated the table I had snapshotted. I don't want the truncated version of the table anymore and I want to replace the table with the snapshot so I have read write access.
* This is called "restoring" the snapshot right? (and I do this by issuing something like a "restore" command at the shell?)
* Does HBase copy or move the data referred to in the snapshot?
{quote}
"Restore" is when you replace your current table with the snapshot version, and you do it with "restore snapshot-name".
Yeah, you need to copy the "old HFiles" to restore the snapshot (but maybe not all the HFiles have been removed from the current table).

{quote}
I currently have a snapshot. I want the current version, but I'd also like a clone of the snapshotted table that provides read/write access.
* Is/should this be supported?
* Is this called "restoring" or "exporting" the snapshot (to a new name)?
* For this to work I need to convert all references into actual copies of the HFiles and HLogs right? Is this conversion called exporting? (FYI, this is what I meant materializing to mean, but let's just stick to your definitions)
{quote}
Yeah, this is really easy with HardLink... some more work is needed to keep track of reference files.
This is "restore on a different table"; "export" is when you're copying the .snapshot/name folder to another cluster...
If you think in terms of HardLink, you don't need to copy the HFiles, just create a hard link... more code is needed to use reference files, but you can avoid the copy. (Note that the HLog needs to be replayed, so it is the only thing that needs to be copied.)

{quote}
I currently have a snapshot. I want to send a copy of the snapshot to a remote cluster so that it can provide read/write access to the data.
* Is/should this be supported?
* Do both HBase instances need to be up at the same time?
* This process would need to dereference the snapshot's references and copy them. What is it called? exporting?
{quote}
Yes, this is "Import/Export", which is basically a distcp of the .snapshot/name folder.
I think it is enough to have both HDFS instances up at the same time.
Yeah, in this case you need to physically copy the HFiles.
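
To make the terminology above concrete, here is a rough sketch of the difference between "taking a snapshot" (writing references only) and "export" (dereferencing and physically copying). It runs against the local filesystem with java.nio rather than HDFS, and everything in it is an assumption for illustration: the class, the method names, and the `.ref` file format are hypothetical and do not reflect the actual snapshot layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SnapshotReferenceSketch {
    /** "Taking a snapshot": write one small reference file per data file; no data is copied. */
    static void takeSnapshot(Path tableDir, Path snapshotDir) {
        try {
            Files.createDirectories(snapshotDir);
            try (DirectoryStream<Path> files = Files.newDirectoryStream(tableDir)) {
                for (Path dataFile : files) {
                    if (!Files.isRegularFile(dataFile)) {
                        continue; // skip subdirectories such as .snapshot itself
                    }
                    Path ref = snapshotDir.resolve(dataFile.getFileName() + ".ref");
                    String target = dataFile.toAbsolutePath().toString();
                    Files.write(ref, target.getBytes(StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** "Export": dereference every reference and physically copy the data file to targetDir. */
    static List<String> export(Path snapshotDir, Path targetDir) {
        List<String> copied = new ArrayList<>();
        try {
            Files.createDirectories(targetDir);
            try (DirectoryStream<Path> refs = Files.newDirectoryStream(snapshotDir, "*.ref")) {
                for (Path ref : refs) {
                    String target = new String(Files.readAllBytes(ref), StandardCharsets.UTF_8);
                    Path source = Paths.get(target);
                    Files.copy(source, targetDir.resolve(source.getFileName()));
                    copied.add(source.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(copied);
        return copied;
    }

    /** Builds a throwaway table directory, snapshots it, and exports the snapshot. */
    static List<String> runDemo() {
        try {
            Path table = Files.createTempDirectory("table");
            Files.write(table.resolve("hfile-1"), new byte[] {1});
            Files.write(table.resolve("hfile-2"), new byte[] {2});
            Path snapshot = table.resolve(".snapshot").resolve("snap1");
            takeSnapshot(table, snapshot);
            return export(snapshot, Files.createTempDirectory("remote"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // names of the files that were physically copied
    }
}
```

A "restore on a different table" would sit between the two: it would resolve the same references but link to the files instead of copying them.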
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286861#comment-13286861 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I've recently had an existential crisis, of sorts, over snapshots. Triggered by both Jon's questions and some from Ian Varley, I've started to rethink the goal of snapshot. Initially, it was to take a globally consistent view of a single table. The question that Ian raised is, "Why are we enforcing stricter guarantees for a snapshot than for a scan?" In fact, a globally consistent view is something HBase explicitly doesn't support (if you do a put to two different tables, you have no real, system level guarantees of consistency). 

So does it really matter if we have an actual point in time? Everything in HBase is timestamped, which is considered the source of truth for a given Mutation. If we are doing a scan for the state of the table as of 12:15:05, we don't know if RS1 is 2 seconds before RS2 - as far as we care, it's just the state at 12:15:05.

This starts to break down a little bit when doing a Get for the latest version on a table. If RS1 is two seconds behind RS2 and we snapshot at 12:15:05, then we actually might not see all the changes to RS1 in the snapshot. However, this doesn't really matter because you still wouldn't see that edit when looking at that "time". Things are happening so fast in HBase that the best we really need is just a "fuzzy" view of the state of the table.

The upside to this is we can do the snapshot _without taking any downtime_ on the table being snapshotted. I already discussed how to do this generally in the document, but it will have to be rewritten from the perspective of timestamp-based snapshots (I'll move it to a google doc until we get a more finalized version).

The only problem that has jumped out in multiple discussions of the timestamp-based approach is that if you are using the timestamp for something other than the time (a la Facebook Messages) you might not be able to make use of snapshots. At Salesforce, I was planning on abusing timestamps as well, so that consideration will be made in the implementation (I'll go over how in another post).

TL;DR global consistency doesn't matter for HBase since the timestamp is the source of truth - the only question is whether you believe the timestamp or not. I would posit that based on the design of HBase it has to be considered a source of truth.

I'll respond in a bit with a more detailed design of how timestamp-based snapshots differ from the point-in-time design, but in everything except how to deal with the memstore and WAL, it is _exactly the same_. The way to handle the memstore was suggested by Ian Varley in that we basically use the memstore snapshot stuff with some rejiggering to wait a certain amount of time; for the WAL we can just use the meta edits that Jon recommends and that I've at least talked about IRL (if not in text).
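
The "timestamp is the source of truth" argument above can be illustrated with a minimal sketch, assuming cells are simple (row, timestamp, value) triples. `Cell` and `viewAsOf` are hypothetical names for this example, not HBase classes. The point is that the view as of a timestamp depends only on the timestamps carried by the cells, not on which server wrote them or in what order they arrived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimestampSnapshotSketch {
    /** A single versioned value; illustrative only, not an HBase class. */
    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Latest value per row among cells whose timestamp is <= snapshotTs. */
    static Map<String, String> viewAsOf(List<Cell> cells, long snapshotTs) {
        Map<String, Cell> latest = new TreeMap<>();
        for (Cell c : cells) {
            if (c.timestamp > snapshotTs) {
                continue; // written "after" the snapshot, according to its own timestamp
            }
            Cell seen = latest.get(c.row);
            if (seen == null || c.timestamp > seen.timestamp) {
                latest.put(c.row, c);
            }
        }
        Map<String, String> view = new TreeMap<>();
        for (Map.Entry<String, Cell> e : latest.entrySet()) {
            view.put(e.getKey(), e.getValue().value);
        }
        return view;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r1", 100L, "old"));
        cells.add(new Cell("r1", 250L, "late")); // stamped after the snapshot time
        cells.add(new Cell("r2", 200L, "x"));
        System.out.println(viewAsOf(cells, 200L)); // prints {r1=old, r2=x}
    }
}
```

This ignores deletes, multiple column families, and the case where the timestamp field is used for something other than time, which is exactly the caveat raised above.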
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454173#comment-13454173 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

@Jon yeah, that's the pain of RB, but you can just do 'git checkout HEAD~1; git diff trunk' to generate that parent patch - not too much overhead.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283894#comment-13283894 ] 

gaojinchao commented on HBASE-6055:
-----------------------------------

This is a very useful feature. :0

                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288700#comment-13288700 ] 

Lars Hofhansl commented on HBASE-6055:
--------------------------------------

bq. IMO hardlinks with HBase snapshots is the way to go... HBASE-5547 is basically just a hack around hardlinks

Yep.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436380#comment-13436380 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Current patch is up on RB, but is missing (1) validation of the snapshot on the master, (2) concurrent flushes during a timestamp-consistent snapshot and (3) concurrent compactions during a timestamp-consistent snapshot. That being said, these are relatively minor elements that can be rolled in after a majority of reviews.

I've currently got (2) working with unit tests and am hoping to push up a new version early next week with all 3 elements (so a complete implementation).
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288414#comment-13288414 ] 

Lars Hofhansl commented on HBASE-6055:
--------------------------------------

Now that I started to get a bit more familiar with HDFS I am wondering whether HDFS hardlinks (HDFS-3370) or even HDFS snapshots (HDFS-233, HDFS-2802) are not a better avenue. We are looking for data consistency here, which would be better tackled at the data layer.
Now... Both features are some ways off in HDFS (although we can probably push these forward), so doing something in HBase first is probably needed, but IMHO it should be something quick.
Lastly, if we are considering this for backups, HBASE-5547 should be a better (simpler) solution.

Not trying to derail anything here, just making sure we do not invest a lot of time in vain.

                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288279#comment-13288279 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

One other place where bulk import can be expensive -- if we bulk import all into a single region, it would likely incur a compaction/split storm...
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437209#comment-13437209 ] 

Hudson commented on HBASE-6055:
-------------------------------

Integrated in HBase-TRUNK #3237 (See [https://builds.apache.org/job/HBase-TRUNK/3237/])
    HBASE-6055 Fix hfile/log cleaning delegate method naming (Revision 1374478)

     Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/BaseLogCleanerDelegate.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/FileCleanerDelegate.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/TimeToLiveHFileCleaner.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/CheckedArchivingHFileCleaner.java

                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463433#comment-13463433 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I was going through the offline snapshot code (https://github.com/jyates/hbase/tree/offline-snapshots) and noticed that apparently I wrote the following:
{code}
Path editsdir = HLog.getRegionDirRecoveredEditsDir(HRegion.getRegionDir(tdir,regionInfo.getEncodedName()));
WALReferenceTask op = new WALReferenceTask(snapshot, this.monitor, editsdir, conf, fs, "disabledTableSnapshot");
{code}

For referencing the current hfiles for a disabled table, this makes no sense. However, it got me thinking about dealing with recovered edits for a table. Even if a table is disabled, it may have recovered edits that haven't been applied to the table (a RS comes up, splits the logs, but then dies again before replaying the split log). 

If I'm reading the log-splitting code correctly, I think it archives the original HLog after splitting, but not before the edits are applied to the region. This would mean we also need to reference the recovered.edits directory under each region, if we keep the current implementation...right?

I was thinking that instead we can keep the hlogs around in the .logs directory until the recovered.edits files for that log file have been replayed. This way we can avoid another task for snapshotting (referencing all the recovered edits) and keep everything fairly simple. There would need to be some extra work to keep track of the source hlog: either an 'info' file for the source hlog that lists the written recovered.edits files, or special naming of the recovered.edits files that points back to the source file.
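
To make the "special naming" option concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the class name, the "-from-" separator and the zero-padded sequence id prefix are hypothetical, not what the log-splitting code does today.

```java
/** Hypothetical naming helper, not HBase code. */
final class RecoveredEditsNaming {
  // Hypothetical separator between the sequence id and the source hlog name.
  private static final String SEP = "-from-";

  private RecoveredEditsNaming() {}

  /** Builds a recovered.edits file name that carries the name of its source hlog. */
  static String editsFileName(long firstSeqId, String sourceHlog) {
    return String.format("%019d", firstSeqId) + SEP + sourceHlog;
  }

  /** Recovers the source hlog name from a file name built by editsFileName. */
  static String sourceHlogOf(String editsFileName) {
    int i = editsFileName.indexOf(SEP);
    if (i < 0) {
      throw new IllegalArgumentException("no source hlog in: " + editsFileName);
    }
    return editsFileName.substring(i + SEP.length());
  }
}
```

With something like this, a cleaner could list the recovered.edits directories and keep any hlog that is still named by an unreplayed edits file, without a separate 'info' file.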

Thoughts?
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-6055:
-------------------------------

    Attachment:     (was: Snapshots in HBase.docx)
    
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282067#comment-13282067 ] 

Zhihong Yu commented on HBASE-6055:
-----------------------------------

Some files, such as RegionSnapshotPool, don't have a license header.

For RegionSnapshotOperation.java:
{code}
  public void setStatusMonitor(RegionSnapshotOperationStatus monitor) {
    this.setStatus(new RegionSnapshotStatus(monitor));
  }
{code}
the method name doesn't seem to match its parameter.

Will post more comments later.
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408512#comment-13408512 ] 

Zhihong Ted Yu commented on HBASE-6055:
---------------------------------------

Can you rebase onto current trunk?
{code}
|--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
|+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
--------------------------
File to patch: ^C
{code}
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502896#comment-13502896 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

If I get some +1's or no comments after Monday,  I'll update it.
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Comment Edited] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286862#comment-13286862 ] 

Jonathan Hsieh edited comment on HBASE-6055 at 6/1/12 11:02 PM:
----------------------------------------------------------------

_(jon: I made a minor formatting tweak to make this easier to read the dir structure)_

But before a detailed description of how timestamp-based snapshots work internally, let's answer some comments!

@Jon: I'll add more info to the document to cover this stuff, but for the moment, let's just get it out there.

{quote}
What is the read mechanism for snapshots like? Does the snapshot act like a read-only table or is there some special external mechanism needed to read the data from a snapshot? You mention having to rebuild in-memory state by replaying wals – is this a recovery situation or needed in normal reads?
{quote}

It's almost, but not quite, like a table. Reading a snapshot is going to require an external tool, but after hooking up the snapshot via that tool, it should act just like a real table.

Snapshots are intended to happen as fast as possible, to minimize downtime for the table. To enable that, we are just creating reference files in the snapshot directory. My vision is that once you take a snapshot, at some point (maybe weekly), you export the snapshot to a backup area. In the export you actually do the copy of the referenced files - you do a direct scan of the HFiles (avoiding the top-level interface and going right to HDFS) and the WAL files. Then when you want to read the snapshot, you can just bulk-import the HFiles and replay the WAL files (with the WALPlayer this is relatively easy) to rebuild the state of the table at the time of the snapshot. It's not an exact copy (META isn't preserved), but all the actual data is there.

The caveat here is that, since everything is a reference, one of the WAL files you reference may not actually have been closed (and is therefore not readable). In the common case this won't happen, but if you snap and immediately export, it's possible. In that case, you need to roll the WAL on any RS that hasn't rolled it yet. However, this is in the export process, so a little latency there is tolerable, whereas avoiding this means adding latency to taking a snapshot - bad news bears.

Keep in mind that the log files and hfiles will get regularly cleaned up. The former will be moved to the .oldlogs directory and periodically cleaned up, and the latter get moved to the .archive directory (again with a parallel file hierarchy, as per HBASE-5547). If the snapshot goes to read the reference file, tracks it down to the original file, and doesn't find it, then it will need to look up the same file in its respective archive directory. If it's not there, then you are really hosed (except for the case mentioned in the doc about the WALs getting cleaned up by an aggressive log cleaner, which, as shown there, is not a problem).
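
The lookup order for hfiles described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class is hypothetical, a plain set of path strings stands in for the file system, and the /hbase/.archive prefix is taken from the description above rather than from working code.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical resolver, not HBase code: live location first, then the archive. */
final class ReferenceResolver {
  // Stand-in for the file system: the set of paths that currently exist.
  private final Set<String> existing;

  ReferenceResolver(Set<String> existingPaths) {
    this.existing = new HashSet<>(existingPaths);
  }

  /** Resolves a table-relative hfile path, or returns empty if the file is gone from both places. */
  Optional<String> resolve(String tableRelativePath) {
    String live = "/hbase/" + tableRelativePath;
    if (existing.contains(live)) {
      return Optional.of(live);
    }
    // Parallel hierarchy under the archive directory.
    String archived = "/hbase/.archive/" + tableRelativePath;
    if (existing.contains(archived)) {
      return Optional.of(archived);
    }
    return Optional.empty();
  }
}
```

An empty result corresponds to the "really hosed" case; WAL references would follow the same pattern against .oldlogs.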

Haven't gotten around to implementing this yet, but it seems reasonable to finish up (and I think Matteo was interested in working on that part).

{quote}
What does a representation of a snapshot look like in terms of META and file system contents?
{quote}

The way I see the implementation in the end is just a bunch of files in the /hbase/.snapshot directory. Like I mentioned above, the layout is very similar to the layout of a table. 

Let's look at an example of a table named "stuff" (snapshot names need to be valid directory names - same as a table or CF) that has column "column" and is hosted on servers rs-1 and rs-2. Originally, the file system will look something like this (with license taken on file names - it's not exact, I know, this is just an example):
{code}
/hbase/
	.logs/
		rs-1/
			WAL-rs1-1
			WAL-rs1-2
		rs-2/
			WAL-rs2-1
			WAL-rs2-2
	stuff/
		.tableinfo
		region1
			column
				region1-hfile-1
		region2
			column
				region2-hfile-1
{code}

The snapshot named "tuesday-at-nine", when completed, then just adds the following to the directory structure (or close enough):

{code}
	.snapshot/
		tuesday-at-nine/
			.tableinfo
			.snapshotinfo
			.logs/
				rs-1/
					WAL-rs1-1.reference
					WAL-rs1-2.reference
				rs-2/
					WAL-rs2-1.reference
					WAL-rs2-2.reference
			stuff/
				.tableinfo
				region1
					column
						region1-hfile-1.reference
				region2
					column
						region2-hfile-1.reference
{code}
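
As a rough illustration of how the snapshot layout mirrors the table layout, the mapping from a source file to its reference can be written as a pure path transformation. The class and method names are hypothetical, and the sketch ignores the .tableinfo and .snapshotinfo files, which are not references.

```java
/** Hypothetical helper, not HBase code: derives snapshot reference paths. */
final class SnapshotLayout {
  private static final String ROOT = "/hbase/";

  private SnapshotLayout() {}

  /** Maps a file under /hbase/ to the matching ".reference" path inside the snapshot directory. */
  static String referencePath(String snapshotName, String sourcePath) {
    if (!sourcePath.startsWith(ROOT)) {
      throw new IllegalArgumentException("not under " + ROOT + ": " + sourcePath);
    }
    // Keep the table-relative part so the snapshot mirrors the original hierarchy.
    String relative = sourcePath.substring(ROOT.length());
    return ROOT + ".snapshot/" + snapshotName + "/" + relative + ".reference";
  }
}
```

Because the hierarchy is preserved, going from a reference back to its source is the same transformation in reverse.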

The only file here that isn't a reference is the tableinfo: since it is a pretty small file (generally), a copy seemed more prudent than archiving on every change to the table info.

The original implementation updated META with file references to do hbase-level hard links for the HFiles. After getting the original implementation working, I'm going to be ripping this piece out in favor of just doing an HFile cleaner and cleaner delegates (similar to logs) and then having a snapshot cleaner that reads the file references off the FS.

{quote}
At some point we may get called upon to repair these, I want to make sure there are enough breadcrumbs for this to be possible.
{quote}

How could that happen - hbase never has problems! (sarcasm)

{quote}
 - hlog roll (which I believe does not trigger a flush) instead of special meta hlog marker (this might avoid write unavailability, seems simpler than the mechanism I suggested)
{quote}

The hlog marker is what I'm planning on doing for the timestamp-based snapshot, which is going to be far safer than doing an HLog roll and provide less latency. With the roll, you need to not take any writes to the memstore between the roll and the end of the snapshot (otherwise you will lose edits). Doing meta edits into the HLog allows you to keep edits and not worry about it.

{quote}
admin initiated snapshot and admin initiated restore operations as opposed to acting like a read only table. (not sure what happens to "newer" data after a restore, need to reread to see if it is in there, not sure about the cost to restore a snapshot)
{quote}

Yup, right now it's all handled from HBaseAdmin. Matteo was interested in working on the restore stuff, but depending on timing, I may end up picking up that work when I get the taking of a snapshot working. I think part of "snapshots" definitely includes getting back the state.

{quote}
I believe it also has an ability to read files directly from an MR job without having to go through HBase's get/put interface. Is that in scope for HBASE-6055?
{quote}

Absolutely in scope. It just didn't come up because I considered that part of the restore (in which Matteo expressed interest). If you had to go through the high-level interface, then you would just use the procedure Lars talks about in his blog: http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html

The other notable change is that I'm building to support multiple snapshots concurrently. It's really a trivial change, so I don't think it's too much feature creep, just a matter of using lists rather than a single item.

{quote}
How does this buy you more consistency? Aren't we still inconsistent at the prepare point now instead? Can we just write the special snapshotting hlog entry at initiation of prepare, allowing writes to continue, then adding data elsewhere (META) to mark success in commit? We could then have some compaction/flush time logic cleanup failed attempt markers?
{quote}

See the above comment about timestamp based vs. point in time and the former being all that's necessary for HBase. This means we don't take downtime and end up with a snapshot that is 'fuzzy' in terms of global consistency, but exact in terms of HBase delivered timestamps.

The problem point-in-time snapshots overcome is reaching distributed consensus while still trying to maintain availability and the ability to cross partitions. Since no one has figured out CAP and we are looking for consistency, we have to remove some availability to reach consensus. In this case, the agreement is over the state _of the entire table_, rather than per region server.

Yes, this is strictly against the contract that we have on a Scan, but it is also in line with expectations people have on what a snapshot means. Any writes that are pending before the snapshot are allowed to commit, but any writes that reach the RS after the snapshot time cannot be included in the snapshot. I got a little overzealous in my reading of HBASE-50 and took it to mean global state, but after review the only way it would work within the constraints (no downtime) is to make it timestamp based.

But why can't we get global consistency without taking downtime?

Let's take your example of using an HLog edit to mark the start (and for ease, let's say the end as well - as long as it's durable and recoverable, it doesn't matter if it's WAL or META).

Say we start a snapshot and send a message to all the RS (let's ignore ZK for the moment, to simplify things) that they should take a snapshot. So they write a marker into the HLog marking the start, create references as mentioned above, and then report to the master that they are done. When everyone is done, we then message each RS to commit the snapshot, which is just another entry into the WAL. Then in rebuilding the snapshot, they would just replay the WAL up to the start (assuming the end is found).
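
The master's side of that message flow amounts to waiting for every RS to report before sending the commit. A minimal single-threaded sketch, with hypothetical names and none of the ZK, timeout or failure handling a real implementation would need:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical master-side bookkeeping for the prepare/commit flow; not HBase code. */
final class SnapshotCoordinator {
  private final Set<String> expected;
  private final Set<String> prepared = new HashSet<>();

  SnapshotCoordinator(Set<String> regionServers) {
    this.expected = new HashSet<>(regionServers);
  }

  /**
   * Records that a region server has written its start marker and created its references.
   * Returns true once every expected server has reported, i.e. when commit can be sent.
   */
  boolean reportPrepared(String server) {
    if (!expected.contains(server)) {
      throw new IllegalArgumentException("unknown server: " + server);
    }
    prepared.add(server);
    return readyToCommit();
  }

  boolean readyToCommit() {
    return prepared.containsAll(expected);
  }
}
```

Note that this only tells the master when everyone has prepared; it says nothing about when each RS actually wrote its marker.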

How do we know though which writes arrived first on each RS if we just dump a write into the WAL? Ok, so then we need to wait for the MVCC read number to roll forward to when we got the snapshot notification _before_ we can write an edit to the log - totally reasonable.

However, the problem arises in attempting to get a global state of the table in a high-volume write environment. We have no guarantee that the "snapshot commit" notification reached each of the RS at the same time. And even if it did reach them at the same time, maybe there was some latency in getting the write number. Or the switch was a little wonky, or the server was just finishing up a GC (I could go on).

Then we have a case where we don't actually have the snapshot as of the commit, but rather "at commit, plus or minus a bit" - not a clean snapshot (if we don't care about being exact then we can do a much faster, lower potential latency solution, the discussion of which is still coming, I promise). In a system that can take millions of writes a second, that is still a non-trivial amount of data that can change in a few milliseconds, no longer a true 'point in time'.

The only way to get that global, consistent view is to remove the availability of the table for a short time so we know that the state is the same across all tables.

Say we start a snapshot and the start indication doesn't reach the servers and get started at _exactly the same time on all the servers_, which, as explained above, is very likely. Then we let the servers commit any outstanding writes, but they don't get to take any new writes for a short time. In this time while they are waiting for writes to commit, we can then do all the snapshot preparation (referencing, table info copying). Once we are ready for the snapshot, we report back to the master and wait for the commit step. In this time we are still not taking writes. The key here is that for that short time, none of the servers are taking writes and that allows us to get a single point in time at which no writes are committing (they do get buffered on the server, they just can't change the system state).

If we let writes commit, then how do we reach a state that we can agree on across all the servers? If you let the writes commit, you again don't have any assurances that the prepare or the commit message time is agreed to by all the servers. The table-level consistent state is somewhere between the prepare and commit, but it's not clear how one would find that point - I'm pretty sure we can't do this unless we have perfectly synchronized clocks, which is not really possible without a better understanding of quantum mechanics :)

"Block writes" is perhaps a bad phrase in this situation. In the current implementation, it buffers the writes as threads into the server, blocking on the updateLock. However, we can go with a "semi-blocking" version: writes still complete, but they aren't going to be visible until we roll forward to the snapshot MVCC number. This lets the writers complete (not affecting latency), but is going to affect read-modify-write and reader-to-writer comparison latency. However, as soon as we roll forward the MVCC, all those writes become visible, essentially catching back up to the current state. A slight modification to the WAL edits will need to be made to write the MVCC number so we can keep track of which writes are in/out of a snapshot, but that _shouldn't_ be too hard (famous last words). You don't even need to modify all the WAL edits, just those made during the snapshot window, so the over the wire cost is still kept essentially the same, when amortized over the life of a table (for the standard use case).
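
To illustrate the "semi-blocking" idea, here is a toy single-threaded model of MVCC visibility. It is a sketch under assumptions: the class and its methods are hypothetical and do not mirror the region server's actual MVCC code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of MVCC visibility; hypothetical, not the region server's implementation. */
final class MvccSketch {
  private static final class Write {
    final long writeNumber;
    final String value;

    Write(long writeNumber, String value) {
      this.writeNumber = writeNumber;
      this.value = value;
    }
  }

  private final List<Write> writes = new ArrayList<>();
  private long nextWriteNumber = 1;
  private long readPoint = 0;

  /** Completes immediately and returns the MVCC number that would be logged with the edit. */
  long write(String value) {
    long n = nextWriteNumber++;
    writes.add(new Write(n, value));
    return n;
  }

  /** Readers only see writes at or below the current read point. */
  List<String> visible() {
    List<String> out = new ArrayList<>();
    for (Write w : writes) {
      if (w.writeNumber <= readPoint) {
        out.add(w.value);
      }
    }
    return out;
  }

  /** Rolls the read point forward (never backward), making held-back writes visible. */
  void advanceReadPoint(long to) {
    if (to > readPoint) {
      readPoint = to;
    }
  }

  /** A logged edit belongs to the snapshot iff its number is at or below the snapshot's MVCC number. */
  static boolean inSnapshot(long writeNumber, long snapshotMvcc) {
    return writeNumber <= snapshotMvcc;
  }
}
```

Writers are never blocked in this model; only visibility is delayed until the read point is rolled forward, which is where the read-modify-write latency cost comes from.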

I'm looking at doing this once I get the simple version working - one step at a time. Moving to the timestamp based approach lets us keep taking writes but does so at the cost of global consistency in favor of local consistency and still uses the _exact same infrastructure_. The first patch I'll actually put on RB will be the timestamp based, but let me get the stop the world version going before going down a rabbit hole.

The only thing we don't capture is if a writer makes a request to the RS before the snapshot is taken (by another client), but the write doesn't reach the server until after the RS hits the start barrier. From the global client perspective, this write should be in the snapshot, but that requires a single client or client-side write coordination (via a timestamp oracle). However, this is even worse coordination and creates even more constraints on the system where we currently have no coordination between clients (and I'm against adding any). So yes, we miss that edit, but that would be the case in a single-server database anyways without an external timestamp manager (again, distributed coordination between the client and server, though it can be done in a non-blocking manner). I'll mention some of this external coordination in the timestamp explanation.
                
      was (Author: jesse_yates):
    But before a detailed description of how timestamp-based snapshots work internally, lets answer some comments!

@Jon: I'll add more info to the document to cover this stuff, but for the moment, lets just get it out there.

{quote}
What is the read mechanism for snapshots like? Does the snapshot act like a read-only table or is there some special external mechanism needed to read the data from a snapshot? You mention having to rebuild in-memory state by replaying wals – is this a recovery situation or needed in normal reads?
{quote}

Its almost, but not quite like a table. Read of a snapshot is going to require an external tool but after hooking up the snapshot via the external tool, it should act just like a real table. 

Snapshots are intended to happen as fast as possible, to minimize downtime for the table. To enable that, we are just creating reference files in the snapshot directory. My vision is that once you take a snapshot, at some point (maybe weekly), you export the snapshot to a backup area. In the export you actually do the copy of the referenced files - you do a direct scan of the HFile (avoiding the top-level interface and going right to HDFS) and the WAL files. Then when you want to read the snapshot, you can just bulk-import the HFIles and replay the WAL files (with the WALPlayer this is relatively easy) to rebuild the state of the table at the time of the snapshot. Its not an exact copy (META isn't preserved), but all the actual data is there.

The caveat here is since everything is references, one of the WAL files you reference may not actually have been closed (and therefore not readable). In the common case this won't happen, but if you snap and immediately export, its possible. In that case, you need to roll the WAL for the RS that haven't rolled them yet. However, this is in the export process, so a little latency there is tolerable, whereas avoiding this means adding latency to taking a snapshot  - bad news bears.

Keep in mind that the log files and hfiles will get regularly cleaned up. The former will be moved to the .oldlogs directory and periodically cleaned up and the latter get moved to the .archive directory (again with a parallel file hierarchy, as per HBASE-5547). If the snapshot goes to read the reference file, which tracks down to the original file and it doesn't find it, then it will need to lookup the same file in its respective archive directory. If its not there, then you are really hosed (except for the case mentioned in the doc about the WALs getting cleaned up by an aggressive log cleaner, which it is shown, is not a problem).

Haven't gotten around to implementing this yet, but it seems reasonable to finish up (and I think Matteo was interested in working on that part).

{quote}
What does a representation of a snapshot look like in terms of META and file system contents?
{quote}

The way I see the implementation in the end is just a bunch of files in the /hbase/.snapshot directory. Like I mentioned above, the layout is very similar to the layout of a table. 

Let's look at an example of a table named "stuff" (snapshot names need to be valid directory names - same as a table or CF) that has a column "column" and is hosted on servers rs-1 and rs-2. Originally, the file system will look something like this (with license taken on file names - it's not exact, I know, this is just an example):
/hbase/
	.logs/
		rs-1/
			WAL-rs1-1
			WAL-rs1-2
		rs-2/
			WAL-rs2-1
			WAL-rs2-2
	stuff/
		.tableinfo
		region1
			column
				region1-hfile-1
		region2
			column
				region2-hfile-1

The snapshot named "tuesday-at-nine", when completed, then just adds the following to the directory structure (or close enough):

	.snapshot/
		tuesday-at-nine/
			.tableinfo
			.snapshotinfo
			.logs/
				rs-1/
					WAL-rs1-1.reference
					WAL-rs1-2.reference
				rs-2/
					WAL-rs2-1.reference
					WAL-rs2-2.reference
			stuff/
				.tableinfo
				region1
					column
						region1-hfile-1.reference
				region2
					column
						region2-hfile-1.reference

The only file here that isn't a reference is the tableinfo: it is a pretty small file (generally), so a copy seemed more prudent than doing archiving on changes to the table info.
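The mapping from live files to reference files in the example can be sketched like this (Python, purely illustrative). The helper name is hypothetical; the ".reference" suffix and the /hbase/.snapshot location are taken from the example layout above:

```python
from pathlib import PurePosixPath


def reference_paths(snapshot, relative_files, root="/hbase"):
    """Map files (paths relative to `root`) to their snapshot reference paths."""
    snap_dir = PurePosixPath(root) / ".snapshot" / snapshot
    # Each reference keeps the original relative path and gains a suffix.
    return [str(snap_dir / (name + ".reference")) for name in relative_files]
```

For example, reference_paths("tuesday-at-nine", ["stuff/region1/column/region1-hfile-1"]) gives the region1 HFile reference shown in the tree.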

The original implementation updated META with file references to do hbase-level hard links for the HFiles. After getting the original implementation working, I'm going to be ripping this piece out in favor of just doing an HFile cleaner and cleaner delegates (similar to logs) and then have a snapshot cleaner that reads off the FS for file references. 

{quote}
At some point we may get called upon to repair these, I want to make sure there are enough breadcrumbs for this to be possible.
{quote}

How could that happen - hbase never has problems! (sarcasm)

{quote}
 - hlog roll (which I believe does not trigger a flush) instead of special meta hlog marker (this might avoid write unavailability, seems simpler than the mechanism I suggested)
{quote}

The hlog marker is what I'm planning on doing for the timestamp-based snapshot, which is going to be far safer than doing an HLog roll and will add less latency. With the roll, you need to not take any writes to the memstore between the roll and the end of the snapshot (otherwise you will lose edits). Doing meta edits into the HLog allows you to keep edits and not worry about it.

{quote}
admin initiated snapshot and admin initiated restore operations as opposed to acting like a read only table. (not sure what happens to "newer" data after a restore, need to reread to see if it is in there, not sure about the cost to restore a snapshot)
{quote}

Yup, right now it's all handled from HBaseAdmin. Matteo was interested in working on the restore stuff, but depending on timing, I may end up picking up that work when I get the taking of a snapshot working.  I think part of "snapshots" definitely includes getting back the state.

{quote}
I believe it also has an ability to read files directly from an MR job without having to go through HBase's get/put interface. Is that in scope for HBASE-6055?
{quote}

Absolutely in scope. It just didn't come up because I considered that part of the restore (which Matteo expressed interest in). If you had to go through the high-level interface, then you would just use the procedure Lars talks about in his blog: http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html

The other notable change is that I'm building to support multiple snapshots concurrently. It's really a trivial change, so I don't think it's too much feature creep, just a matter of using lists rather than a single item. 

{quote}
How does this buy you more consistency? Aren't we still inconsistent at the prepare point now instead? Can we just write the special snapshotting hlog entry at initiation of prepare, allowing writes to continue, then adding data elsewhere (META) to mark success in commit? We could then have some compaction/flush time logic cleanup failed attempt markers?
{quote}

See the above comment about timestamp based vs. point in time and the former being all that's necessary for HBase. This means we don't take downtime and end up with a snapshot that is 'fuzzy' in terms of global consistency, but exact in terms of HBase-delivered timestamps.

The problem point-in-time snapshots overcome is reaching distributed consensus while still trying to maintain availability and the ability to cross partitions. Since no one has figured out CAP and we are looking for consistency, we have to remove some availability to reach consensus. In this case, the agreement is over the state _of the entire table_, rather than per region server. 

Yes, this is strictly against the contract that we have on a Scan, but it is also in line with expectations people have on what a snapshot means. Any writes that are pending before the snapshot are allowed to commit, but any writes that reach the RS after the snapshot time cannot be included in the snapshot. I got a little overzealous in my reading of HBASE-50 and took it to mean global state, but after review the only way it would work within the constraints (no downtime) is to make it timestamp based.

But why can't we get global consistency without taking downtime?

Let's take your example of using an HLog edit to mark the start (and for ease, let's say the end as well - as long as it's durable and recoverable, it doesn't matter if it's WAL or META). 

Say we start a snapshot and send a message to all the RS (let's ignore ZK for the moment, to simplify things) that they should take a snapshot. So they write a marker into the HLog marking the start, create references as mentioned above, and then report to the master that they are done. When everyone is done, we then message each RS to commit the snapshot, which is just another entry into the WAL. Then in rebuilding the snapshot, they would just replay the WAL up to the start (assuming the end is found).
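The replay rule in that scheme can be sketched roughly as follows (Python, a toy model rather than the real HLog format). The tuple layout and the marker names "snapshot-start" and "snapshot-commit" are assumptions made up for this illustration:

```python
def edits_in_snapshot(wal, name):
    """Return the edits that belong to snapshot `name`, or None if it never committed.

    `wal` is an ordered list of tuples: ("edit", row, value) for data and
    ("snapshot-start", name) / ("snapshot-commit", name) for markers.
    """
    start = commit = None
    for index, entry in enumerate(wal):
        if entry == ("snapshot-start", name):
            start = index
        elif entry == ("snapshot-commit", name):
            commit = index
    # Without both markers (in order) the snapshot is not usable.
    if start is None or commit is None or commit < start:
        return None
    # Replay only the edits written before the start marker.
    return [entry for entry in wal[:start] if entry[0] == "edit"]
```

Edits that land between the start and commit markers stay in the log but are left out of the snapshot, which is what lets writes continue.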

How do we know though which writes arrived first on each RS if we just dump a write into the WAL? Ok, so then we need to wait for the MVCC read number to roll forward to when we got the snapshot notification _before_ we can write an edit to the log - totally reasonable.

However, the problem arises in attempting to get a global state of the table in a high-volume write environment. We have no guarantee that the "snapshot commit" notification reached each of the RS at the same time. And even if it did reach them at the same time, maybe there was some latency in getting the write number. Or the switch was a little wonky, or it was just finishing up a GC (I could go on). 

Then we have a case where we don't actually have the snapshot as of the commit, but rather "at commit, plus or minus a bit" - not a clean snapshot (if we don't care about being exact then we can do a much faster, lower potential latency solution, the discussion of which is still coming, I promise). In a system that can take millions of writes a second, that is still a non-trivial amount of data that can change in a few milliseconds, no longer a true 'point in time'.

The only way to get that global, consistent view is to remove the availability of the table for a short time so we know that the state is the same across all tables.

Say we start a snapshot and the start indication doesn't reach the servers and get started at _exactly the same time on all the servers_, which, as explained above, is very likely. Then we let the servers commit any outstanding writes, but they don't get to take any new writes for a short time. In this time while they are waiting for writes to commit, we can then do all the snapshot preparation (referencing, table info copying). Once we are ready for the snapshot, we report back to the master and wait for the commit step. In this time we are still not taking writes. The key here is that for that short time, none of the servers are taking writes and that allows us to get a single point in time that no writes are committing (but they do get buffered on the server, they just can't change the system state).
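A single-threaded toy model of that prepare/commit flow, in Python, might look like the following. The class and function names are hypothetical, there is no ZooKeeper or RPC here, and the "snapshot" is just a copy of the committed edits:

```python
class RegionServer:
    """Toy region server: buffers writes while the snapshot barrier is up."""

    def __init__(self, name):
        self.name = name
        self.committed = []   # writes that are part of the system state
        self.buffered = []    # writes received during the barrier
        self.blocked = False
        self.pending = None   # state captured at prepare time
        self.snapshot = None  # last committed snapshot

    def write(self, edit):
        (self.buffered if self.blocked else self.committed).append(edit)

    def prepare(self):
        # Stop changing state and capture what the snapshot will contain.
        self.blocked = True
        self.pending = list(self.committed)
        return True

    def commit(self):
        self.snapshot = self.pending
        self._release()

    def abort(self):
        self.pending = None
        self._release()

    def _release(self):
        # Drop the barrier and apply everything that was buffered.
        self.blocked = False
        self.committed.extend(self.buffered)
        self.buffered.clear()


def take_snapshot(servers):
    """Two-phase commit: commit only if every server prepared, else abort all."""
    if all(rs.prepare() for rs in servers):
        for rs in servers:
            rs.commit()
        return True
    for rs in servers:
        rs.abort()
    return False
```

A write that arrives between prepare() and commit() ends up in `committed` afterwards but not in `snapshot`, which is the buffering behavior described above.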

If we let writes commit, then how do we reach a state that we can agree on across all the servers? If you let the writes commit, you again don't have any assurances that the prepare or the commit message time is agreed to by all the servers. The table-level consistent state is somewhere between the prepare and commit, but it's not clear how one would find that point - I'm pretty sure we can't do this unless we have perfectly synchronized clocks, which is not really possible without a better understanding of quantum mechanics :)

"Block writes" is perhaps a bad phrase in this situation. In the current implementation, it buffers the writes as threads into the server, blocking on the updateLock. However, we can go with a "semi-blocking" version: writes still complete, but they aren't going to be visible until we roll forward to the snapshot MVCC number. This lets the writers complete (not affecting latency), but is going to affect read-modify-write and reader-to-writer comparison latency. However, as soon as we roll forward the MVCC, all those writes become visible, essentially catching back up to the current state. A slight modification to the WAL edits will need to be made to write the MVCC number so we can keep track of which writes are in/out of a snapshot, but that _shouldn't_ be too hard (famous last words). You don't even need to modify all the WAL edits, just those made during the snapshot window, so the over the wire cost is still kept essentially the same, when amortized over the life of a table (for the standard use case).
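The "semi-blocking" idea can be illustrated with a very small visibility model in Python. This is not how HBase's MVCC is actually implemented; the class, its fields and the begin/end methods are invented for this sketch:

```python
class Region:
    """Toy MVCC: reads only see writes numbered at or below the read point."""

    def __init__(self):
        self.next_write_number = 1
        self.read_point = 0
        self.held = False
        self.cells = []  # (write_number, key, value) in write order

    def put(self, key, value):
        number = self.next_write_number
        self.next_write_number += 1
        self.cells.append((number, key, value))
        # The write completes either way; it only becomes visible right
        # away if no snapshot is holding the read point back.
        if not self.held:
            self.read_point = number
        return number

    def get(self, key):
        visible = [v for n, k, v in self.cells if k == key and n <= self.read_point]
        return visible[-1] if visible else None

    def begin_snapshot(self):
        # Hold the read point; its value is the snapshot's MVCC number.
        self.held = True
        return self.read_point

    def end_snapshot(self):
        # Roll forward: every write made during the window becomes visible.
        self.held = False
        self.read_point = self.next_write_number - 1
```

Writers return immediately during the window, while readers keep seeing the pre-snapshot state until end_snapshot() rolls the read point forward.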

I'm looking at doing this once I get the simple version working - one step at a time. Moving to the timestamp based approach lets us keep taking writes but does so at the cost of global consistency in favor of local consistency and still uses the _exact same infrastructure_. The first patch I'll actually put on RB will be the timestamp based, but let me get the stop the world version going before going down a rabbit hole.

The only thing we don't capture is if a writer makes a request to the RS before the snapshot is taken (by another client), but the write doesn't reach the server until after the RS hits the start barrier. From the global client perspective, this write should be in the snapshot, but that requires a single client or client-side write coordination (via a timestamp oracle). However, this is even worse coordination and creates even more constraints on the system where we currently have no coordination between clients (and I'm against adding any). So yes, we miss that edit, but that would be the case in a single-server database anyways without an external timestamp manager (which again means distributed coordination between the client and server, though it can be done in a non-blocking manner). I'll mention some of this external coordination in the timestamp explanation.
                  

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501387#comment-13501387 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Here's a repo with the online branch merged to the snapshot branch with snapshot unit tests passing now.  

https://github.com/jmhsieh/hbase/tree/snapshots-online-merge
hash  d1299347c0c1afcc0264b14ee12beee170efc4c2

mvn clean
mvn test -Dtest=errorhandling/* -PlocalTests
mvn test -Dtest=Test*Procedure* -PlocalTests
mvn test -Dtest=Test*Snapshot*,snapshot/*,TestFSUtils -PlocalTests

It needs some cleanup (cleanup duplicate/commented code from merge) but patches of pieces will be coming out later today / tomorrow.

Likely pieces:  ( might break a few more of these down if they are excessive)
* External Exceptions + snapshot manager refactor
* Barrier Procedure
* online timestamp snapshots tasks
* online timestamp snapshots + snapshot manager

                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288696#comment-13288696 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

@Lars even with the HDFS patches, we should have a way to logically group/backup/restore snapshots. 

IMO hardlinks with HBase snapshots is the way to go, since we can do it with zero downtime, whereas HDFS snapshots require some (though admittedly small) downtime. HBASE-5547 is basically just a hack around hardlinks.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293657#comment-13293657 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Hey Lars,

Sorry if it seems like I'm going overboard -- I'm trying to tease out consistent common definitions, and get an explicit high-level understanding of how the feature is supposed to be used from a user/admin point of view.  

I'm also trying to understand what is in scope and not (ex: making the snapshot act like a read-only table could be in scope,  restoring/replacing the original table should be, but restoring to another name could be deferred until hardlinks, the import/export stuff could be done using existing means.)

                

[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-6055:
-------------------------------

    Attachment: Snapshots in HBase.docx

Attaching a longer document detailing snapshots. It has both a general overview and a walkthrough of implementation. It's complementary to the docs on HBASE-50 and the code on github.

In short, it's a distributed two-phase commit, where the prepare phase blocks all writes to all, enabled via a barrier node in zookeeper.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292852#comment-13292852 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

I'm still a bit confused -- still at the basic admin level.  I think it would help if we give the "restoring"/"export" parts some more attention and talk about usage as opposed to mechanism first.  I'm going to pose some use cases/examples/scenarios which hopefully will be easier to discuss.  

Let's say I am an admin, and we are pre hdfs hardlinks.  

I issue a "snapshot" command at the shell/master.  
* HBase creates a new .snapshot subdir, and it contains references to HLogs and HFiles.  This is a "snapshot"
** This step is called: snapshotting, "taking a snapshot", and also materializing right?

I currently have a snapshot.  I want read-only access to its contents to compare with the current table.
* Does HBase know how to interpret the stuff in a .snapshot dir such that it acts like a read-only table?
* Do I, as an admin, need to execute some step to make it appear in HBase as a read-only table? (if so what is this called?)

I currently have a snapshot.  Oops! I accidentally truncated the table I had snapshotted.  I don't want the truncated version of the table anymore and I want to replace the table with the snapshot so I have read write access.
* This is called "restoring" the snapshot right? (and I do this by issuing something like a "restore" command at the shell?)
* Does HBase copy or move the data referred to in the snapshot? 

I currently have a snapshot.  I want to keep the current version, but I'd also like a clone of the snapshotted table that provides read/write access.
* Is/should this be supported?
* Is this called "restoring" or "exporting" the snapshot (to a new name)?
* For this to work I need to convert all references into actual copies of the HFiles and HLogs right?  Is this conversion called exporting? (FYI, this is what I meant materializing to mean, but let's just stick to your definitions)

I currently have a snapshot.  I want to send a copy of the snapshot to a remote cluster so that it can provide read/write access to the data.  
* Is/should this be supported?
* Do both HBase instances need to be up at the same time? 
** This process would need to dereference the snapshot's references and copy them.  What is it called?  exporting?


----
Source of confusion

bq. Export is taking a snapshot from the .snapshot/ directory and possibly having a special snapshot distcp to somewhere. I would consider materialization as taking the exported snapshot and then 'hooking it back up' to another cluster (or the same) as a new table. You could throw materialization of the exported snapshot, but they are in fact distinct.

I think the first "materialization" is supposed to be "restoration" yeah?  I don't quite get the last sentence.

                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432220#comment-13432220 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

I've been working on a new version that abstracts out a lot of the different pieces in the snapshot that can be reused (both here and in general in HBase) and then reimplements everything based on those abstractions. Each of these abstractions/standalone pieces is going to be moved to another jira and given its own RB review. Hopefully this is going to make reviewers' lives easier.

I currently have tests passing and hopefully, I'll have a new version (and child tickets) up on RB tomorrow.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288715#comment-13288715 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

A couple of definitions going forward:
 - materialization: the end result of taking a single snapshot, on the same cluster. It ends up in the .snapshot/[snapshot_name] directory
 - export: sending the snapshot to another cluster or another part of the same cluster
 - restore: taking an exported snapshot and converting the snapshot into an active table.

{quote}
Hm.. how do you restore a snapshot from reference files if it hasn't been scan/copied yet? Require scan/copy "materialization" of the snapshot first? (which means slower restore, but probably would likely be simplest for a first cut)
{quote}

Right now, you would do an M/R job to distcp over the files to another cluster or a backup part of your cluster. Since we are just storing references, the actual file copying will be necessary. This will be helped by using the actual "Reference" class for the HFiles (which is currently being (mis)used for the WALs too, but I don't think we actually need to keep the WALs - I'll comment in the timestamp ticket). Since they are just reference files, you could just use the regular HFile reader to load them into another table.

{quote}
	Snapshot restore needs to be "transactional" like snapshotting right?
{quote}

Yeah, I guess. I don't really see this as a problem - just keep it to one restore at a time. But it would be all or nothing to get a table online.

{quote}
what is "export"? is this taking a snapshot or the materialization or the snapshot restore or something else?
{quote}

Export is taking a snapshot from the .snapshot/ directory and possibly having a special snapshot distcp to somewhere. I would consider materialization as taking the exported snapshot and then 'hooking it back up' to another cluster (or the same) as a new table. You could throw materialization of the exported snapshot, but they are in fact distinct.

{quote}
If we restore snapshots to the same hbase instance, in dir structure, you probably need .regioninfo files as well. (contains region startkey/endkey info necessary to reconstitute META later).
{quote}

+1 I'll make sure that gets in

{quote}
Is restoring to a separate instance in scope? If so bulk loads can be expensive – if regions don't line up there will be a bunch of splitting that happens. Again, keeping the regioninfos and the snapshot's splits may be worthwhile.
{quote}

I'd say restore is part of this. Should be solved by having the region info. -1 for split/compact storms.
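To show why the region info matters here, a small Python sketch (hypothetical helper names, not an HBase API) that checks whether a file's key range fits inside one region of the target table, given the regions' sorted start keys:

```python
from bisect import bisect_right


def region_index(start_keys, key):
    """Index of the region holding `key`; `start_keys` is sorted and starts with b""."""
    return bisect_right(start_keys, key) - 1


def needs_split(start_keys, first_key, last_key):
    """True if a file spanning [first_key, last_key] crosses a region boundary."""
    return region_index(start_keys, first_key) != region_index(start_keys, last_key)
```

If the target table is created with the same start keys as the snapshotted one, every file from the snapshot lands in a single region and needs_split() is False for all of them.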

{quote}
Where do the materialized versions of the snapshot reference files end up? in the snapshot dirs? elsewhere?
{quote}

What do you mean by materialized? After taking a snapshot, where do the snapshot files end up? In the .snapshot directory. See my earlier comments on the structure.

{quote}
This potentially gets a little trickier with markers as opposed to log rolls.
{quote}

If we do a log roll, it's probably going to take a bit longer. Also, it's not going to be applicable to the timestamp approach, since log rolling will necessitate doing some kind of locking, which we should avoid, whereas the markers will be much faster.

{quote}
The HLog will have edits from regions not relevant to the table's regions. Not a huge problem but maybe an optimization would be that the materialization step will do an "offline hlogsplit/flush" to just keep the data relevant to this table/region?
{quote}

+1, assuming we need the HLogs. I think there is a minimally impactful way to avoid this altogether.


                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288238#comment-13288238 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Jesse,

Thanks for answering the questions.  A strong +1 for doing the simplest hbase timestamp-based approach first, and then looking into the more complicated version as an option afterwards.  Maybe start a sub issue with the point-in-time approach to move discussion there? (I still have questions there, might be better to ask there)

The main use case I care about is the ability to quickly "snapshot" without downtime and quickly recover it (ideally with no downtime, but possibly with a short downtime window).  Although it is a "sloppy snapshot", conceptually it is pretty simple to define and I think the caveats are fairly well understood.  I don't expect something with stronger consistency guarantees than what hbase currently offers but do expect something better (cheaper/faster) than the current closest thing, which is a CopyTable.  

I have a bunch of new questions - some just asking for precision and some for clarification.  It might be helpful to define terms in the beginning of the doc so it stays consistent? 

- Hm.. how do you restore a snapshot from reference files if it hasn't been scan/copied yet?  Require scan/copy "materialization" of the snapshot first?  (which means slower restore, but probably would likely be simplest for a first cut)
- Snapshot restore needs to be "transactional" like snapshotting right?
- what is "export"? is this taking a snapshot or the materialization or the snapshot restore or something else?
- If we restore snapshots to the same hbase instance, in dir structure, you probably need .regioninfo files as well. (contains region startkey/endkey info necessary to reconstitute META later).  
- Is restoring to a separate instance in scope?  If so bulk loads can be expensive -- if regions don't line up there will be a bunch of splitting that happens.  Again, keeping the regioninfos and the snapshot's splits may be worthwhile.
- Where do the materialized versions of the snapshot reference files end up?  in the snapshot dirs? elsewhere?  
-- This potentially gets a little trickier with markers as opposed to log rolls.
-- The HLog will have edits from regions not relevant to the table's regions.  Not a huge problem but maybe an optimization would be that the materialization step will do an "offline hlogsplit/flush" to just keep the data relevant to this table/region?


                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500532#comment-13500532 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Branch to a squashed version. 
https://github.com/jmhsieh/hbase/tree/snapshot-dev-squash

                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jimmy Xiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281715#comment-13281715 ] 

Jimmy Xiang commented on HBASE-6055:
------------------------------------

I have a concern.  Why should we do two phases?  I think the prepare phase is not needed.  We have row-level atomicity.  We don't need every region server to be on the same page.  Since it is distributed, the meaning of point-in-time is arguable, which makes it hard to say whether a snapshot is a consistent or inconsistent point-in-time.

I think each region server can just try to create the snapshot first.  If any of them fails, the partial snapshot can simply be deleted.
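
To make the single-phase idea concrete, here is a rough sketch of "try everywhere, delete the partial result on any failure". It is only an illustration and not code from the branch: the class name SinglePhaseSnapshot, the callbacks takeLocalSnapshot and deleteLocalSnapshot, and the sequential loop (a real version would presumably run the region servers in parallel) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of a snapshot attempt with no prepare phase. */
final class SinglePhaseSnapshot {
    private SinglePhaseSnapshot() {}

    /**
     * Asks each server to take its local part of the snapshot.
     * If any server reports failure, the partial output of every server
     * attempted so far (including the failing one) is deleted.
     *
     * @return true if every server succeeded, false if the snapshot was abandoned
     */
    static boolean snapshot(List<String> servers,
                            Predicate<String> takeLocalSnapshot,
                            Consumer<String> deleteLocalSnapshot) {
        List<String> attempted = new ArrayList<>();
        for (String server : servers) {
            attempted.add(server);
            if (!takeLocalSnapshot.test(server)) {
                // Something went wrong: clean up whatever has been written so far.
                for (String s : attempted) {
                    deleteLocalSnapshot.accept(s);
                }
                return false;
            }
        }
        return true;
    }
}
```

Note that this gives all-or-nothing cleanup, but no shared point in time across servers, which is exactly the trade-off being argued above.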
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454169#comment-13454169 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

+1 Sounds good to me.  We might have to do incremental reviews (generating a parent patch and then the main patch) to send up to Review Board, but this should work.
                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500491#comment-13500491 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

I've now reached a pretty decent point on this branch in refactoring and simplifying the online snapshot code.  Most of the work is in the plumbing, and most of the complexity remains in the plumbing.  Here is a link:

https://github.com/jmhsieh/hbase/commits/snapshot-dev?

It isn't worth looking at the individual patches -- if interested, take a look at the code generally, one package at a time.  I'd suggest starting with errorhandling, then procedure, since this work is fairly isolated and stable now.  The rest will likely affect existing code, and the online snapshot code may change significantly during the merge process.

Error handling:  o.a.h.h.errorhandling.*
* Added concept of ExternalException (an exception from a separate thread or process).
* Removing generics by funneling everything through an ExternalException
* Simplified Exception Propagation by only having a Dispatcher, Listener, and Checker. (No Visitors, Orchestrators, some Factories)
* Made Exception Serialization static so that instances don't need to be passed around.
* Added more meaningful usage and motivation documentation.

Procedure framework: o.a.h.h.procedure.*
* Separated Coordinator side from Member side
* Reduced the number of constructors (and fewer objects threaded throughout).
* Added concept of Procedure and Subprocedure -- these maintain state on each host. (this replaces just using strings everywhere).
* Folded several threads that used latch ping-pong into single threads.
* Renamed methods from 2pc nomenclature to barrier nomenclature.
* Added more meaningful usage and motivation documentation.

Online Snapshots: o.a.h.h.snapshot.*
* Converted per-regionserver-only Procedures to a simpler Callable/Future fork-join implementation.
* Removed different *ErrorHandlers and moved into Subprocedures. (this may be further eliminated)
* Each Procedure contains an ExternalExceptionDispatcher
* ExternalExceptions go to the SnapshotManager to abort the Procedure.

I'm in the process of merging code into the offline snapshot branch.  It isn't clean, but I'll be working on that for the next few days (it clashes with updates in offline snapshot).  Once I get the snapshot branch compiling again, I'll start posting the ExternalException and Procedure stuff as a series of patches.

My suggestion for the overall effort is to get the main offline snapshot branch code committed to the branch and then start looking into merging with trunk and 0.94.  The online work I feel should remain a branch until its pieces are fleshed out.

                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279340#comment-13279340 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

Initial, very rough draft of the code is up on github: https://github.com/jyates/hbase/tree/snapshots. Latest commit message has the following description (which is a pretty good summary of the current status):
{quote}
This is based on the backup-hfile work in HBASE-5547, and is very different
from the actual proposal (and code samples) in HBASE-50. The overall layout is
more or less the same as the proposal, but the actual implementation has changed
due to changes in the actual code as well as the current conventions.

Currently, only 1 test is written and has not been tested (99% sure it's not
going to pass). However, the meat of the implementation is complete. There is
still some work around listing and deleting of online snapshots and taking of
snapshots for offlined tables, but this is trivial compared
to the taking of snapshots for an online table and naturally falls out of the
current implementation.

Further, the export/import functionality for snapshots has not been completed, but will
probably (again) be very similar to the work in HBASE-50. Currently Matteo Bertozzi is
interested in working on this functionality, so I'm leaving that to him for the moment.

NOTES:
- I'm not very happy with the monitoring infrastructure I've put in place around
keeping track of the different tasks and propagating errors from snapshots failing
locally (from any of the various async threads) back to the rest of the nodes running
the snapshot and vice-versa. It feels overly complex and seems to be refactored repeatedly.
It's on the list, but cleanup seemed not worth the effort versus having something up.
{quote}

Planning on doing a writeup this weekend so people can actually have a chance at spelunking through the changeset (which is non-trivial in size).
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290027#comment-13290027 ] 

gaojinchao commented on HBASE-6055:
-----------------------------------

Fine, Thanks, I will take some time for this feature.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496582#comment-13496582 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

I think we need to get all those committed to the branch, and then put up a doc about the features, what it provides, how it works, and why we chose particular semantics.  We also need to document its current caveats. 

We'll do some testing and probably need to do a little rebasing before we can consider a trunk merge.

We'll put a flag up here when we are ready but as a heads up, we'd like 1-2 more committers to review.  (Currently it is Me, Jesse, and Ted in some places).

                

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282047#comment-13282047 ] 

Zhihong Yu commented on HBASE-6055:
-----------------------------------

I checked out project from github:
{code}
# On branch snapshots
nothing to commit (working directory clean)
{code}
I got some compilation errors:
{code}
[ERROR] /Users/zhihyu/snapshots/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java:[102,0] org.apache.hadoop.hbase.master.MockRegionServer is not abstract and does not override abstract method getRootDir() in org.apache.hadoop.hbase.regionserver.RegionServerServices
[ERROR] 
[ERROR] /Users/zhihyu/snapshots/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionHFileArchiving.java:[340,12] cannot find symbol
[ERROR] symbol  : method waitForFlushesAndCompactions()
[ERROR] location: class org.apache.hadoop.hbase.regionserver.HRegion
[ERROR] 
[ERROR] /Users/zhihyu/snapshots/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionHFileArchiving.java:[426,12] cannot find symbol
[ERROR] symbol  : method waitForFlushesAndCompactions()
[ERROR] location: class org.apache.hadoop.hbase.regionserver.HRegion
[ERROR] 
[ERROR] /Users/zhihyu/snapshots/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionHFileArchiving.java:[507,12] cannot find symbol
[ERROR] symbol  : method waitForFlushesAndCompactions()
[ERROR] location: class org.apache.hadoop.hbase.regionserver.HRegion
[ERROR] 
[ERROR] /Users/zhihyu/snapshots/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java:[45,7] org.apache.hadoop.hbase.util.MockRegionServerServices is not abstract and does not override abstract method getRootDir() in org.apache.hadoop.hbase.regionserver.RegionServerServices
{code}

                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286429#comment-13286429 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Thanks Jesse,

I haven't looked at code yet but I got a chance to read through the writeup in HBASE-50.  One thing I like about it is that it breaks out the operations, pretty clearly describes the zk and hdfs layouts so that an admin, a tool, or someone with info at this particular level could use the feature and inspect the exposed zk and hdfs state for debugging. 

From the HBASE-50 doc (haven't looked at code yet), Li seems to have chosen or more clearly stated these design decisions:
- hlog roll (which I believe does not trigger a flush) instead of special meta hlog marker (this might avoid write unavailability, seems simpler than the mechanism I suggested) 
- admin initiated snapshot and admin initiated restore operations as opposed to acting like a read only table.  (not sure what happens to "newer" data after a restore, need to reread to see if it is in there, not sure about the cost to restore a snapshot)
- I believe it also has an ability to read files directly from an MR job without having to go through HBase's get/put interface.   Is that in scope for HBASE-6055?

Jon.

                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288429#comment-13288429 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

I agree that HDFS-3370 and HDFS-233 are likely a ways off.  Hardlinks would probably obviate the need for HBASE-5547.  We could probably take advantage of HDFS symlinks (HDFS-245), which are in Hadoop 2.x.x HDFS but may not be in Hadoop 1.x.x HDFS.

I think that HBASE-5547 is a prereq for either consistency approach (even if we use symlinks) until we have hdfs hardlinks.  I'll take a closer look into HBASE-5547.
                

        

[jira] [Updated] (HBASE-6055) Offline Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated HBASE-6055:
-------------------------------

    Attachment: Offline Snapshots.docx

Attaching doc that covers a range of information on how offline snapshots and recovery work. It starts out talking a bit about the high-level of each feature, and then does a walk-through of offline snapshots, restore, clone and export.

This doc lays out the general FS layouts, ordering of events, process ownership, and pending concerns.
                
> Offline Snapshots in HBase 0.96
> -------------------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>            Priority: Blocker
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Offline Snapshots.docx, Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-6055:
----------------------------------

    Fix Version/s: hbase-6055
    

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284366#comment-13284366 ] 

gaojinchao commented on HBASE-6055:
-----------------------------------

Hi Jesse, are you working on this feature? I am interested in it.  I will study your code.
One question: when we are creating snapshots, do we need to stop the balancer?
                

        


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286862#comment-13286862 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

But before a detailed description of how timestamp-based snapshots work internally, let's answer some comments!

@Jon: I'll add more info to the document to cover this stuff, but for the moment, let's just get it out there.

{quote}
What is the read mechanism for snapshots like? Does the snapshot act like a read-only table or is there some special external mechanism needed to read the data from a snapshot? You mention having to rebuild in-memory state by replaying wals – is this a recovery situation or needed in normal reads?
{quote}

It's almost, but not quite, like a table. Reading a snapshot is going to require an external tool, but after hooking up the snapshot via the external tool, it should act just like a real table. 

Snapshots are intended to happen as fast as possible, to minimize downtime for the table. To enable that, we just create reference files in the snapshot directory. My vision is that once you take a snapshot, at some point (maybe weekly), you export the snapshot to a backup area. In the export you actually copy the referenced files: you do a direct scan of the HFiles (avoiding the top-level interface and going right to HDFS) and the WAL files. Then when you want to read the snapshot, you can just bulk-import the HFiles and replay the WAL files (with the WALPlayer this is relatively easy) to rebuild the state of the table at the time of the snapshot. It's not an exact copy (META isn't preserved), but all the actual data is there.

The caveat here is that since everything is a reference, one of the WAL files you reference may not actually have been closed (and therefore may not be readable). In the common case this won't happen, but if you snap and immediately export, it's possible. In that case, you need to roll the WAL for the RSs that haven't rolled theirs yet. However, this is in the export process, so a little latency there is tolerable, whereas avoiding this means adding latency to taking a snapshot - bad news bears.

Keep in mind that the log files and hfiles will get regularly cleaned up. The former will be moved to the .oldlogs directory and periodically cleaned up, and the latter get moved to the .archive directory (again with a parallel file hierarchy, as per HBASE-5547). If the snapshot goes to read the reference file, tracks it down to the original file and doesn't find it there, then it will need to look up the same file in its respective archive directory. If it's not there, then you are really hosed (except for the case mentioned in the doc about the WALs getting cleaned up by an aggressive log cleaner, which, as shown there, is not a problem).
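
The lookup order in the previous paragraph (original location first, then the parallel archive location) could be sketched roughly like this. This is not the implementation, just an illustration with plain java.nio paths: the class name ReferenceResolver, the resolve method, and passing the live and archive roots as arguments are all made up for the example. The extra "exists" parameter is only there so the logic can be exercised without touching a real filesystem.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper: finds the file a snapshot reference currently points at. */
final class ReferenceResolver {
    private ReferenceResolver() {}

    /** Convenience overload that checks the local filesystem. */
    static Optional<Path> resolve(Path liveRoot, Path archiveRoot, Path relativeFile) {
        return resolve(liveRoot, archiveRoot, relativeFile, Files::exists);
    }

    /**
     * @param liveRoot     root of the live layout (e.g. /hbase)
     * @param archiveRoot  root of the parallel archive hierarchy (e.g. /hbase/.archive)
     * @param relativeFile path of the referenced file, relative to either root
     * @param exists       existence check, injectable for testing
     * @return the first location that exists, or empty if the file is gone from both
     */
    static Optional<Path> resolve(Path liveRoot, Path archiveRoot, Path relativeFile,
                                  Predicate<Path> exists) {
        Path live = liveRoot.resolve(relativeFile);
        if (exists.test(live)) {
            return Optional.of(live);          // common case: file has not been archived yet
        }
        Path archived = archiveRoot.resolve(relativeFile);
        if (exists.test(archived)) {
            return Optional.of(archived);      // file was moved by the cleaner/archiver
        }
        return Optional.empty();               // the "really hosed" case
    }
}
```

An empty result corresponds to the case where the referenced data is gone from both places.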

Haven't gotten around to implementing this yet, but it seems reasonable to finish up (and I think Matteo was interested in working on that part).

{quote}
What does a representation of a snapshot look like in terms of META and file system contents?
{quote}

The way I see the implementation in the end is just a bunch of files in the /hbase/.snapshot directory. Like I mentioned above, the layout is very similar to the layout of a table. 

Let's look at an example of a table named "stuff" (snapshot names need to be valid directory names, same as a table or CF) that has column "column" and is hosted on servers rs-1 and rs-2. Originally, the file system will look something like this (with some license taken on file names; it's not exact, I know, this is just an example):
/hbase/
	.logs/
		rs-1/
			WAL-rs1-1
			WAL-rs1-2
		rs-2/
			WAL-rs2-1
			WAL-rs2-2
	stuff/
		.tableinfo
		region1
			column
				region1-hfile-1
		region2
			column
				region2-hfile-1

The snapshot named "tuesday-at-nine", when completed, then just adds the following to the directory structure (or close enough):

	.snapshot/
		tuesday-at-nine/
			.tableinfo
			.snapshotinfo
			.logs/
				rs-1/
					WAL-rs1-1.reference
					WAL-rs1-2.reference
				rs-2/
					WAL-rs2-1.reference
					WAL-rs2-2.reference
			stuff/
				.tableinfo
				region1
					column
						region1-hfile-1.reference
				region2
					column
						region2-hfile-1.reference

The only file here that isn't a reference is the tableinfo: it is (generally) a pretty small file, so a copy seemed more prudent than archiving on every change to the table info.
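
The mapping from the live layout to the snapshot layout can be sketched as follows. This is a Python illustration rather than the actual implementation; the function name `snapshot_entries` and the "copy"/"reference" labels are made up for the example, and the snapshot-level .tableinfo and .snapshotinfo files are left out for brevity.

```python
import posixpath


def snapshot_entries(snapshot_name, table_files, wal_files):
    """Map live paths (relative to /hbase) to paths under .snapshot/<name>/.

    Returns a list of (snapshot_path, kind), where kind is "copy" for the
    table info file and "reference" for every WAL and HFile.
    """
    base = posixpath.join(".snapshot", snapshot_name)
    entries = []
    for path in wal_files + table_files:
        if posixpath.basename(path) == ".tableinfo":
            # Small file: copy it instead of referencing it.
            entries.append((posixpath.join(base, path), "copy"))
        else:
            entries.append((posixpath.join(base, path + ".reference"), "reference"))
    return entries


example = snapshot_entries(
    "tuesday-at-nine",
    ["stuff/.tableinfo", "stuff/region1/column/region1-hfile-1"],
    [".logs/rs-1/WAL-rs1-1"],
)
```

Each entry keeps the same relative path as the live file, which is what makes the snapshot directory look like a parallel copy of the table.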

The original implementation updated META with file references to do hbase-level hard links for the HFiles. After getting the original implementation working, I'm going to be ripping this piece out in favor of just doing an HFile cleaner and cleaner delegates (similar to logs) and then having a snapshot cleaner that reads the file references off the FS.

{quote}
At some point we may get called upon to repair these, I want to make sure there are enough breadcrumbs for this to be possible.
{quote}

How could that happen - hbase never has problems! (sarcasm)

{quote}
 - hlog roll (which I believe does not trigger a flush) instead of special meta hlog marker (this might avoid write unavailability, seems simpler than the mechanism I suggested)
{quote}

The hlog marker is what I'm planning on doing for the timestamp-based snapshot, which is going to be far safer than doing an HLog roll and add less latency. With the roll, you need to not take any writes to the memstore between the roll and the end of the snapshot (otherwise you will lose edits). Doing meta edits into the HLog allows you to keep edits and not worry about it.

{quote}
admin initiated snapshot and admin initiated restore operations as opposed to acting like a read only table. (not sure what happens to "newer" data after a restore, need to reread to see if it is in there, not sure about the cost to restore a snapshot)
{quote}

Yup, right now it's all handled from HBaseAdmin. Matteo was interested in working on the restore stuff, but depending on timing, I may end up picking up that work when I get the taking of a snapshot working. I think part of "snapshots" definitely includes getting back the state.

{quote}
I believe it also has an ability to read files directly from an MR job without having to go through HBase's get/put interface. Is that in scope for HBASE-6055?
{quote}

Absolutely in scope. It just didn't come up because I considered that part of the restore (in which Matteo expressed interest). If you had to go through the high-level interface, then you would just use the procedure Lars talks about in his blog: http://hadoop-hbase.blogspot.com/2012/04/timestamp-consistent-backups-in-hbase.html

The other notable change is that I'm building to support multiple snapshots concurrently. It's really a trivial change, so I don't think it's too much feature creep, just a matter of using lists rather than a single item.

{quote}
How does this buy you more consistency? Aren't we still inconsistent at the prepare point now instead? Can we just write the special snapshotting hlog entry at initiation of prepare, allowing writes to continue, then adding data elsewhere (META) to mark success in commit? We could then have some compaction/flush time logic cleanup failed attempt markers?
{quote}

See the above comment about timestamp-based vs. point-in-time snapshots, the former being all that's necessary for HBase. This means we don't take downtime and end up with a snapshot that is 'fuzzy' in terms of global consistency, but exact in terms of HBase-delivered timestamps.
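
As a rough illustration of what "exact in terms of timestamps" means, each region server can decide membership on its own from the cell timestamp alone. The Python sketch below is an assumption-laden toy: cells are plain `(row, value, timestamp)` tuples, the function name is invented, and treating the bound as inclusive is my choice, not something specified here.

```python
def cells_in_snapshot(cells, snapshot_ts):
    """Keep cells whose timestamp is at or before the snapshot timestamp.

    Each cell is a (row, value, timestamp) tuple. No coordination between
    servers is needed: the decision depends only on the cell itself.
    """
    return [cell for cell in cells if cell[2] <= snapshot_ts]


# Two region servers filter independently but agree on the same cut.
rs1_cells = [("a", "v1", 90), ("a", "v2", 110)]
rs2_cells = [("b", "v1", 100), ("b", "v2", 101)]
rs1_snapshot = cells_in_snapshot(rs1_cells, 100)
rs2_snapshot = cells_in_snapshot(rs2_cells, 100)
```

The fuzziness comes from wall-clock arrival order: a write stamped 110 may have arrived on rs-1 before a write stamped 100 arrived on rs-2, yet only the latter is in the snapshot.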

The problem point-in-time snapshots overcome is reaching distributed consensus while still trying to maintain availability and the ability to cross partitions. Since no one has figured out CAP and we are looking for consistency, we have to remove some availability to reach consensus. In this case, the agreement is over the state _of the entire table_, rather than per region server.

Yes, this is strictly against the contract that we have on a Scan, but it is also in line with expectations people have on what a snapshot means. Any writes that are pending before the snapshot are allowed to commit, but any writes that reach the RS after the snapshot time cannot be included in the snapshot. I got a little overzealous in my reading of HBASE-50 and took it to mean global state, but after review the only way it would work within the constraints (no downtime) is to make it timestamp based.

But why can't we get global consistency without taking downtime?

Let's take your example of using an HLog edit to mark the start (and for ease, let's say the end as well - as long as it's durable and recoverable, it doesn't matter if it's WAL or META).

Say we start a snapshot and send a message to all the RS (let's ignore ZK for the moment, to simplify things) that they should take a snapshot. So they write a marker into the HLog marking the start, create references as mentioned above, and then report to the master that they are done. When everyone is done, we then message each RS to commit the snapshot, which is just another entry into the WAL. Then in rebuilding the snapshot, they would just replay the WAL up to the start (assuming the end is found).
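
The marker-based flow just described can be sketched as a toy in Python. Everything here is hypothetical: the `RegionServer` class, the marker tuples, and the in-memory list standing in for the WAL are illustrative only, and ZK, failures, and reference creation are ignored.

```python
class RegionServer:
    """Toy region server whose WAL is an append-only in-memory list."""

    def __init__(self):
        self.wal = []  # entries are ("put", key, value) or marker tuples

    def put(self, key, value):
        self.wal.append(("put", key, value))

    def prepare(self, name):
        # Mark the start of the snapshot, then report "done" to the master.
        self.wal.append(("snapshot-start", name))
        return True

    def commit(self, name):
        self.wal.append(("snapshot-commit", name))


def take_snapshot(servers, name):
    """Two phases: every server prepares, then every server commits."""
    if all(rs.prepare(name) for rs in servers):
        for rs in servers:
            rs.commit(name)
        return True
    return False


def replay(wal, name):
    """Rebuild state up to the start marker, only if the commit marker exists."""
    if ("snapshot-commit", name) not in wal:
        return None
    state = {}
    for entry in wal:
        if entry == ("snapshot-start", name):
            break
        if entry[0] == "put":
            state[entry[1]] = entry[2]
    return state
```

For example, a put that lands after the start marker is not part of the rebuilt state:

```python
rs1, rs2 = RegionServer(), RegionServer()
rs1.put("a", 1)
take_snapshot([rs1, rs2], "s1")
rs1.put("a", 2)
replay(rs1.wal, "s1")  # {"a": 1}
```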

How do we know though which writes arrived first on each RS if we just dump a write into the WAL? Ok, so then we need to wait for the MVCC read number to roll forward to when we got the snapshot notification _before_ we can write an edit to the log - totally reasonable.

However, the problem arises in attempting to get a global state of the table in a high-volume write environment. We have no guarantee that the "snapshot commit" notification reached each of the RS at the same time. And even if it did reach them at the same time, maybe there was some latency in getting the write number. Or the switch was a little wonky, or it was just finishing up a GC (I could go on).

Then we have a case where we don't actually have the snapshot as of the commit, but rather "at commit, plus or minus a bit" - not a clean snapshot (if we don't care about being exact then we can do a much faster, lower potential latency solution, the discussion of which is still coming, I promise). In a system that can take millions of writes a second, that is still a non-trivial amount of data that can change in a few milliseconds, no longer a true 'point in time'.

The only way to get that global, consistent view is to remove the availability of the table for a short time so we know that the state is the same across all tables.

Say we start a snapshot and the start indication doesn't reach the servers and get started at _exactly the same time on all the servers_, which, as explained above, is very likely. Then we let the servers commit any outstanding writes, but they don't get to take any new writes for a short time. In this time while they are waiting for writes to commit, we can then do all the snapshot preparation (referencing, table info copying). Once we are ready for the snapshot, we report back to the master and wait for the commit step. In this time we are still not taking writes. The key here is that for that short time, none of the servers are taking writes and that allows us to get a single point in time that no writes are committing (but they do get buffered on the server, they just can't change the system state).

If we let writes commit, then how do we reach a state that we can agree on across all the servers? If you let the writes commit, you again don't have any assurances that the prepare or the commit message time is agreed to by all the servers. The table-level consistent state is somewhere between the prepare and commit, but it's not clear how one would find that point - I'm pretty sure we can't do this unless we have perfectly synchronized clocks, which is not really possible without a better understanding of quantum mechanics :)

"Block writes" is perhaps a bad phrase in this situation. In the current implementation, it buffers the writes as threads into the server, blocking on the updateLock. However, we can go with a "semi-blocking" version: writes still complete, but they aren't going to be visible until we roll forward to the snapshot MVCC number. This lets the writers complete (not affecting latency), but is going to affect read-modify-write and reader-to-writer comparison latency. However, as soon as we roll forward the MVCC, all those writes become visible, essentially catching back up to the current state. A slight modification to the WAL edits will need to be made to write the MVCC number so we can keep track of which writes are in/out of a snapshot, but that _shouldn't_ be too hard (famous last words). You don't even need to modify all the WAL edits, just those made during the snapshot window, so the over-the-wire cost is still kept essentially the same, when amortized over the life of a table (for the standard use case).
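
A single-threaded toy of the "semi-blocking" idea, in Python, might look like the following. It is a sketch under heavy assumptions: the `MvccStore` class and its method names are invented, there is no real concurrency, and HBase's actual MVCC bookkeeping is more involved than one frozen read point.

```python
class MvccStore:
    """Toy store where writes complete but stay invisible during a snapshot."""

    def __init__(self):
        self.edits = []       # (mvcc_number, key, value), in write order
        self.next_write = 1
        self.read_point = 0   # readers only see edits numbered <= read_point
        self.frozen = False   # True while a snapshot is in progress

    def put(self, key, value):
        number = self.next_write
        self.next_write += 1
        self.edits.append((number, key, value))  # the write itself completes
        if not self.frozen:
            self.read_point = number
        return number

    def get(self, key):
        value = None
        for number, k, v in self.edits:
            if k == key and number <= self.read_point:
                value = v
        return value

    def begin_snapshot(self):
        self.frozen = True
        return self.read_point  # the snapshot MVCC number

    def snapshot_edits(self, snapshot_number):
        # Edits at or below the snapshot number are "in" the snapshot.
        return [e for e in self.edits if e[0] <= snapshot_number]

    def end_snapshot(self):
        # Roll the read point forward; buffered writes become visible at once.
        self.frozen = False
        self.read_point = self.next_write - 1
```

Storing the MVCC number with each edit is what lets `snapshot_edits` separate in-snapshot writes from later ones, which mirrors the WAL change mentioned above.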

I'm looking at doing this once I get the simple version working - one step at a time. Moving to the timestamp based approach lets us keep taking writes but does so at the cost of global consistency in favor of local consistency and still uses the _exact same infrastructure_. The first patch I'll actually put on RB will be the timestamp based, but let me get the stop the world version going before going down a rabbit hole.

The only thing we don't capture is if a writer makes a request to the RS before the snapshot is taken (by another client), but the write doesn't reach the server until after the RS hits the start barrier. From the global client perspective, this write should be in the snapshot, but that requires a single client or client-side write coordination (via a timestamp oracle). However, this is even worse coordination and creates even more constraints on the system where we currently have no coordination between clients (and I'm against adding any). So yes, we miss that edit, but that would be the case in a single-server database anyways without an external timestamp manager (again, distributed coordination between the client and server, though it can be done in a non-blocking manner). I'll mention some of this external coordination in the timestamp explanation.
                

       

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285017#comment-13285017 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

@gaojinchao: I'm definitely still working on it - it's just been a busy week; what with hbasecon, the hackathon and the rebase, this has been on the back burner. This week I'm planning to have a working first cut. Keep in mind that the code on github is a rough preview - definitely not the finished version, so no guarantees on polish or even correctness. That said, any feedback is appreciated.

@Jon - I'm working on a thorough response, thanks for the questions.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415410#comment-13415410 ] 

Zhihong Ted Yu commented on HBASE-6055:
---------------------------------------

Thanks for the hint, Jon.
I thought of that approach.

I recently looked up related classes in the patch using vi directly.
It would be nice if we can reduce the number of classes: controller, monitor, manager, sentinel, etc. It is hard to follow :-)

I have gone through about 2.5 pages of diff.
I can see there is more work to be done for Global snapshot.
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293482#comment-13293482 ] 

Lars Hofhansl commented on HBASE-6055:
--------------------------------------

Let's try to avoid going overboard here.
In principle snapshot and backup/restore are different and independent.

A snapshot generates a consistent snapshot of the data that can subsequently be copied conveniently somewhere else - thus creating a backup.

Ideally we would not even prescribe the backup/restore semantics here, but just provide missing building blocks.

Just my $0.02.

Another thought here is: In principle an HFile resulting from a major compaction could be considered a baseline copy and additional HFiles would be incremental changes on top of that baseline. It might be worth considering if we can make use of this ability of HBase to overlay changes from many sources into a single view of the data (would probably be tricky as regions are flushed in sync, etc, just waving hands here).
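
Purely to illustrate the overlay idea (hand-waving included), here is a small Python sketch. It models an HFile as a dict from row key to `(timestamp, value)`, which is a drastic simplification; the function name `overlay` is made up, and deletes/tombstones are not handled.

```python
def overlay(baseline, increments):
    """Merge a baseline with incremental files; the newest timestamp wins.

    Each file is a dict mapping key -> (timestamp, value).
    """
    view = dict(baseline)
    for hfile in increments:
        for key, (ts, value) in hfile.items():
            if key not in view or ts >= view[key][0]:
                view[key] = (ts, value)
    return view


baseline = {"r1": (1, "a"), "r2": (1, "b")}         # from a major compaction
increments = [{"r1": (5, "c")}, {"r3": (3, "d")}]   # later flushes
merged = overlay(baseline, increments)
```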

                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502563#comment-13502563 ] 

Ted Yu commented on HBASE-6055:
-------------------------------

In that case the title of this JIRA should signify offline snapshots, right?
                
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.


[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jesse Yates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289907#comment-13289907 ] 

Jesse Yates commented on HBASE-6055:
------------------------------------

[~sunnygao] In the "stop the world" implementation, flushing the HFile is going to take too long. However, in the timestamp-based approach time doesn't play as big a role (oh the irony!), so we can actually flush the HFiles and do what you are talking about. I'm most of the way through a writeup for how this would work, but have been a bit too busy the last few days to post it - planning to have it up tomorrow in a sub-ticket (as Jon suggests).
                

        

[jira] [Commented] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284532#comment-13284532 ] 

Jonathan Hsieh commented on HBASE-6055:
---------------------------------------

Jesse,

Thanks for the writeup, I find having a single doc with the design summary really helpful and ideally something we do for all major new features. I've read through the document carefully, let it steep for a few days, and had some design-level questions.  I've skimmed HBASE-50 and will read more of the history more carefully later this evening.  

What is the read mechanism for snapshots like?  Does the snapshot act like a read-only table or is there some special external mechanism needed to read the data from a snapshot?  You mention having to rebuild in-memory state by replaying wals -- is this a recovery situation or needed in normal reads?

What does a representation of a snapshot look like in terms of META and file system contents?  At some point we may get called upon to repair these, I want to make sure there are enough breadcrumbs for this to be possible.

I'm still thinking about the two-phase part -- I think it is necessary for marking success or initiating failure recovery, but I'm skeptical at the moment about why barriering writes is necessary.  How does this buy you more consistency?  Aren't we still inconsistent at the prepare point now instead?   Can we just write the special snapshotting hlog entry at initiation of prepare, allowing writes to continue, then adding data elsewhere (META) to mark success in commit?  We could then have some compaction/flush time logic cleanup failed attempt markers?




                

        

[jira] [Updated] (HBASE-6055) Snapshots in HBase 0.96

Posted by "Jonathan Hsieh (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hsieh updated HBASE-6055:
----------------------------------

    Component/s: snapshots
    
> Snapshots in HBase 0.96
> -----------------------
>
>                 Key: HBASE-6055
>                 URL: https://issues.apache.org/jira/browse/HBASE-6055
>             Project: HBase
>          Issue Type: New Feature
>          Components: client, master, regionserver, snapshots, zookeeper
>            Reporter: Jesse Yates
>            Assignee: Jesse Yates
>             Fix For: 0.96.0
>
>         Attachments: Snapshots in HBase.docx
>
>
> Continuation of HBASE-50 for the current trunk. Since the implementation has drastically changed, opening as a new ticket.
