Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2012/12/02 03:01:59 UTC

[jira] [Comment Edited] (HBASE-7212) Globally Barriered Procedure mechanism

    [ https://issues.apache.org/jira/browse/HBASE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508128#comment-13508128 ] 

Andrew Purtell edited comment on HBASE-7212 at 12/2/12 2:00 AM:
----------------------------------------------------------------

bq. The main questions I had when I was initially understanding the previous implementation were "Is this 2pc?" and "Do we need 2pc?". The answers are: what we have implemented here has two phases but is not true two-phase commit. 2pc, as defined in the literature (http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the coordinator says something is committed, any failures at a member or coordinator must be recovered by failing forward and completing it. The key point here is that while we will need a global barrier for one of the snapshot flavors (global), we don't need full 2PC because 1) we don't need to undo work (like a log roll or flush) if some sub-part of the first phase (our acquire/2pc's prepare) fails, and because 2) we don't need to recover by failing forward if anything fails in the second phase (our release/2pc's commit). In the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and carry on with extra flushed/rolled hlogs.

+1 

This makes a good case. I like the "keep it as simple as possible and only do as much as we actually need to" approach.
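
To make the contrast with 2pc concrete, here is a minimal sketch of the failure policy described in the quoted text. It is not the code in the attached patch: BarrierMember, BestEffortCoordinator, and the journal strings are hypothetical names invented for illustration, and the members run sequentially in one process rather than across region servers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical member of a barriered procedure; not the HBase API. */
interface BarrierMember {
    void acquire() throws Exception;  // e.g. a flush or log roll; never undone
    void release() throws Exception;  // e.g. writing this member's part of the snapshot
}

/** Hypothetical coordinator illustrating the "fail and clean up" policy. */
final class BestEffortCoordinator {
    private final List<String> journal = new ArrayList<>();

    /** Returns true if all members passed both phases, false if the run was abandoned. */
    boolean run(List<BarrierMember> members) {
        // Phase 1 (acquire, 2pc's prepare): a failure aborts the run.
        // Work already done by other members is simply left in place.
        for (BarrierMember m : members) {
            try {
                m.acquire();
            } catch (Exception e) {
                journal.add("acquire failed: abort, nothing undone");
                return false;
            }
        }
        journal.add("barrier reached");

        // Phase 2 (release, 2pc's commit): a failure is not recovered by
        // failing forward. The run fails and temporary output is discarded.
        for (BarrierMember m : members) {
            try {
                m.release();
            } catch (Exception e) {
                journal.add("release failed: delete temporary output");
                return false;
            }
        }
        journal.add("completed");
        return true;
    }

    List<String> journal() {
        return journal;
    }
}
```

The point of the sketch is what is absent: there is no per-member log and no recovery path, which is exactly what full 2pc would require.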

Edit: Moved unrelated comment to HBASE-7254
                
      was (Author: apurtell):
    bq. The main questions I had when I was initially understanding the previous implementation were "Is this 2pc?" and "Do we need 2pc?". The answers are: what we have implemented here has two phases but is not true two-phase commit. 2pc, as defined in the literature (http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf), requires that once the coordinator says something is committed, any failures at a member or coordinator must be recovered by failing forward and completing it. The key point here is that while we will need a global barrier for one of the snapshot flavors (global), we don't need full 2PC because 1) we don't need to undo work (like a log roll or flush) if some sub-part of the first phase (our acquire/2pc's prepare) fails, and because 2) we don't need to recover by failing forward if anything fails in the second phase (our release/2pc's commit). In the latter case we just fail and delete .snapshot/.tmp remnants in the fs, and carry on with extra flushed/rolled hlogs.

+1 

This makes a good case. I like the "keep it as simple as possible and only do as much as we actually need to" approach.

I can see a use for this in security too. We could tighten up the permissions cache using a barrier for grant and revoke ops. In other words, replace the current ZK watcher based permissions cache "RPC via ZK" with the Procedure mechanism that provides much the same, but with the added benefit that we can fail the grant or revoke op if one or more RSes fail to ack the update.
                  
> Globally Barriered Procedure mechanism
> --------------------------------------
>
>                 Key: HBASE-7212
>                 URL: https://issues.apache.org/jira/browse/HBASE-7212
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>             Fix For: hbase-6055
>
>         Attachments: 121127-global-barrier-proc.pdf, hbase-7212.patch, pre-hbase-7212.patch
>
>
> This is a simplified version of what was proposed in HBASE-6573.  Instead of claiming to be a 2pc or 3pc implementation (which implies logging at each actor, and recovery operations), this just provides a best-effort global barrier mechanism called a Procedure.  
> Users need only implement methods to acquireBarrier, to act when insideBarrier, and to releaseBarrier that use the ExternalException cooperative error checking mechanism.
> Globally consistent snapshots require the ability to quiesce writes to a set of region servers before the snapshot operation is executed.  Also, if any node fails, the mechanism needs to be able to notify the other nodes so that they abort.
> The first cut of other online snapshots doesn't need the full barrier but may still use this for its error propagation mechanisms.
> This version removes the extra layer incurred in the previous implementation due to the use of generics, separates the coordinator and members, and reduces the amount of inheritance used in favor of composition.
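
As a rough illustration of the three hooks named in the description and of cooperative error checking, the sketch below shows a single member that quiesces writes while inside the barrier. The method names acquireBarrier, insideBarrier, and releaseBarrier come from the description; everything else (ErrorMonitor, BarrierHooks, QuiescingMember, BarrierDriver, and all signatures) is an assumption made for illustration and does not reflect the attached patch. ErrorMonitor is only a stand-in for the ExternalException-based checking.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical shared error holder; a stand-in for ExternalException-based checking. */
final class ErrorMonitor {
    private final AtomicReference<Exception> failure = new AtomicReference<>();

    /** Record the first failure reported by any participant. */
    void report(Exception e) {
        failure.compareAndSet(null, e);
    }

    /** Polled cooperatively between steps; throws if any participant has failed. */
    void rethrowIfFailed() throws Exception {
        Exception e = failure.get();
        if (e != null) {
            throw e;
        }
    }
}

/** Hypothetical user-implemented hooks; the signatures are assumptions. */
interface BarrierHooks {
    void acquireBarrier(ErrorMonitor monitor) throws Exception;
    void insideBarrier(ErrorMonitor monitor) throws Exception;
    void releaseBarrier();
}

/** Example member that blocks writes while the barrier is held. */
final class QuiescingMember implements BarrierHooks {
    boolean writesBlocked = false;
    boolean snapshotTaken = false;

    public void acquireBarrier(ErrorMonitor monitor) throws Exception {
        monitor.rethrowIfFailed();  // abort early if another node already failed
        writesBlocked = true;       // quiesce writes
    }

    public void insideBarrier(ErrorMonitor monitor) throws Exception {
        monitor.rethrowIfFailed();  // check again before doing the real work
        snapshotTaken = true;
    }

    public void releaseBarrier() {
        writesBlocked = false;      // always let writes resume
    }
}

/** Hypothetical driver for one member's pass through the barrier. */
final class BarrierDriver {
    static boolean runMember(BarrierHooks hooks, ErrorMonitor monitor) {
        try {
            hooks.acquireBarrier(monitor);
            hooks.insideBarrier(monitor);
            return true;
        } catch (Exception e) {
            monitor.report(e);      // propagate so other participants abort too
            return false;
        } finally {
            hooks.releaseBarrier();
        }
    }
}
```

In this sketch a member learns about a remote failure only when it polls the monitor, which is what makes the checking cooperative: nothing is interrupted, and each step decides for itself whether to continue.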
