Posted to commits@hbase.apache.org by jg...@apache.org on 2010/07/28 01:08:07 UTC
svn commit: r979909 - in /hbase/branches/0.90_master_rewrite:
BRANCH_NOTES.txt pom.xml
Author: jgray
Date: Tue Jul 27 23:08:07 2010
New Revision: 979909
URL: http://svn.apache.org/viewvc?rev=979909&view=rev
Log:
HBASE-2692 Added BRANCH_NOTES design doc file
Added:
hbase/branches/0.90_master_rewrite/BRANCH_NOTES.txt
Modified:
hbase/branches/0.90_master_rewrite/pom.xml
Added: hbase/branches/0.90_master_rewrite/BRANCH_NOTES.txt
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/BRANCH_NOTES.txt?rev=979909&view=auto
==============================================================================
--- hbase/branches/0.90_master_rewrite/BRANCH_NOTES.txt (added)
+++ hbase/branches/0.90_master_rewrite/BRANCH_NOTES.txt Tue Jul 27 23:08:07 2010
@@ -0,0 +1,303 @@
+--------------------------------------------------------------------------------
+Master Rewrite Notes
+--------------------------------------------------------------------------------
+
+Region Transitions
+
+* Regions only transition in a limited set of circumstances, outlined below.
+
+ 1. Cluster Startup
+
+ During cluster startup, the master will know that it is a cluster startup
+ and do a bulk assignment. This should take HDFS block locations into
+ account, though this will likely be left off the initial master rewrite.
+
+ - Master startup determines whether this is startup or failover by counting
+ the number of RS nodes in ZK.
+
+ - Master waits for the minimum number of RS to be available to be assigned
+ regions.
+
+ - Master clears out anything in the /unassigned directory in ZK.
+
+ - Master randomly assigns out ROOT and then META.
+
+ - Master determines a bulk assignment plan via the LoadBalancer.
+
+ - Master stores the plan in the RegionManager / MasterPlanner.
+
+ - Master creates OFFLINE ZK nodes in /unassigned for every region.
+
+ - Master sends RPCs to each RS, telling them to OPEN their regions.
+
+ All special cluster startup logic ends here. More detail on how RSs handle
+ OPEN and CLOSE is described in the other cases below.
+
+ So what can go wrong?
+
+ + We assume that the Master will not fail until after the OFFLINE nodes
+ have been created in ZK. RegionServers can fail at any time.
+
+ + If an RS fails at some point during this process, normal region
+ open/opening/opened handling will take care of it.
+
+ If the RS successfully opened a region, then it will be taken care of
+ in the normal RS failure handling.
+
+ If the RS did not successfully open a region, the RegionManager or
+ MasterPlanner will notice that the OFFLINE (or OPENING) node in ZK
+ has not been updated. This will trigger a re-assignment to a different
+ server. This logic is not special to startup, all assignments will
+ eventually time out if the destination server never proceeds.
+
+ + If the Master fails (after creating the ZK nodes), the failed-over
+ Master will see all of the regions in transition. It will handle it
+ in the same way any failed-over Master will handle existing regions in
+ transition.
+
+
+ 2. Load Balancing
+
+ Periodically, and when there are not any regions in transition, a load
+ balancer will run and move regions around to balance cluster load.
+
+ - Periodic timer expires, initiating a load balance.
+
+ - Load balancer blocks until there are no regions in transition.
+
+ - Master determines a balancing plan via the LoadBalancer.
+
+ - Master stores the plan in the RegionManager / MasterPlanner.
+
+ - Master sends RPCs to the source RSs, telling them to CLOSE the regions.
+
+ That is it for the initial part of the load balance. Further steps will
+ be executed following event-triggers from ZK or timeouts if closes run too
+ long. It's not clear what to do in the case of a long-running CLOSE
+ besides ask again.
+
+ - RS receives CLOSE RPC, changes to CLOSING, and begins closing the region.
+
+ - Master sees that region is now CLOSING but does nothing.
+
+ - RS closes region and changes ZK node to CLOSED.
+
+ - Master sees that region is now CLOSED.
+
+ - Master looks at the plan for the specified region to figure out the
+ desired destination server.
+
+ - Master sends an RPC to the destination RS telling it to OPEN the region.
+
+ - RS receives OPEN RPC, changes to OPENING, and begins opening the region.
+
+ - Master sees that region is now OPENING but does nothing.
+
+ - RS opens region and changes ZK node to OPENED.
+
+ - Master sees that region is now OPENED.
+
+ - Master removes the region from all in-memory structures.
+
+ - Master deletes the OPENED node from ZK.
+
+ The Master or RSs can fail during this process. There is nothing special
+ about handling regions in transition due to load balancing so consult the
+ descriptions below for how this is handled.
+
+
+ 3. Table Enable/Disable
+
+ Users can enable and disable tables manually. This is done to make config
+ changes to tables, drop tables, etc...
+
+ Because all failover logic is designed to ensure assignment of all regions
+ in transition, these operations will not properly ride over Master or
+ RegionServer failures. Since these are client-triggered operations, this
+ should be okay for the initial master design. Moving forward, a special
+ node could be put in ZK to denote that an enable/disable has been requested.
+ Another option is to persist region movement plans into ZK instead of just
+ in-memory. In that case, an empty destination would signal that the region
+ should not be reopened after being closed.
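The "empty destination" option could look like the following. This is a sketch of the proposed persisted-plan idea, not existing code; all names are invented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of region movement plans where an empty destination
// signals that a closed region must not be reopened (table disable).
class RegionPlanSketch {
    private final Map<String, Optional<String>> plans = new HashMap<>();

    void planMove(String region, String destinationServer) {
        plans.put(region, Optional.of(destinationServer));
    }

    void planClose(String region) {
        plans.put(region, Optional.empty());   // disable: close and stay closed
    }

    // Called when the master sees CLOSED in ZK for this region.
    boolean shouldReopen(String region) {
        return plans.getOrDefault(region, Optional.empty()).isPresent();
    }
}
```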
+
+ DISABLE
+
+ - Client sends Master an RPC to disable a table.
+
+ - Master finds all regions of the table.
+
+ - Master stores the plan (do not re-open the regions once closed).
+
+ - Master sends RPCs to RSs to close all the regions of the table.
+
+ - RS receives CLOSE RPC, creates ZK node in CLOSING state, and begins
+ closing the region.
+
+ - Master sees that region is now CLOSING but does nothing.
+
+ - RS closes region and changes ZK node to CLOSED.
+
+ - Master sees that region is now CLOSED.
+
+ - Master looks at the plan for the specified region and sees that it should
+ not reopen.
+
+ - Master deletes the unassigned znode. It is no longer responsible for
+ ensuring assignment/availability of this region.
+
+ ENABLE
+
+ - Client sends Master an RPC to enable a table.
+
+ - Master finds all regions of the table.
+
+ - Master creates an unassigned node in an OFFLINE state for each region.
+
+ - Master sends RPCs to RSs to open all the regions of the table.
+
+ - RS receives OPEN RPC, transitions ZK node to OPENING state, and begins
+ opening the region.
+
+ - Master sees that region is now OPENING but does nothing.
+
+ - RS opens region and changes ZK node to OPENED.
+
+ - Master sees that region is now OPENED.
+
+ - Master deletes the unassigned znode.
+
+
+ 4. RegionServer Failure
+
+ - Master is alerted via ZK that an RS ephemeral node is gone.
+
+ - Master begins RS failure process.
+
+ - Master determines which regions need to be handled.
+
+ - Master in-memory state shows all regions currently assigned to the dead
+ RS.
+
+ - Master in-memory plans show any regions that were in transition to the
+ dead RS.
+
+ - With list of regions, Master now forces assignment of all regions to
+ other RSs.
+
+ - Master creates or force updates all existing ZK unassigned nodes to be
+ OFFLINE.
+
+ - Master sends RPCs to RSs to open all the regions.
+
+ - Normal operations from here on.
+
+ There are some complexities here.
+
+ Regions in transition that were somehow involved with the dead RS could
+ be in any of the 5 states in ZK.
+
+ OFFLINE Generate a new assignment and send an OPEN RPC.
+
+ CLOSING If the failed RS is the source, we overwrite the state to
+ OFFLINE, generate a new assignment, and send an OPEN RPC.
+
+ If the failed RS is the destination, we overwrite the state to
+ OFFLINE and send an OPEN RPC to the original destination.
+
+ If for some reason we don't have an existing plan (concurrent
+ Master failure), generate a new assignment and send an OPEN RPC.
+
+ CLOSED If the failed RS is the source, we can safely ignore this.
+ The normal ZK event handling should deal with this.
+
+ If the failed RS is the destination, we generate a new
+ assignment and send an OPEN RPC.
+
+ OPENING If the failed RS was the original source, ignore.
+ or OPENED
+ If the failed RS is the destination, we overwrite the state to
+ OFFLINE, generate a new assignment, and send an OPEN RPC.
+
+ In all of these cases, it is important to note that the transitions on the
+ RS side ensure only a single RS ever successfully completes a transition.
+ This is done by reading the current state, verifying it is expected, and
+ then issuing the update with the version number of the read value. If
+ multiple RSs are attempting this operation, exactly one can succeed.
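The read-verify-update-with-version pattern above can be sketched with an in-memory stand-in for a znode. Real code would use ZooKeeper's setData with an expected version; the class below is purely illustrative.

```java
// Hypothetical in-memory stand-in for a ZK unassigned node, showing why
// exactly one RS can complete a transition: the versioned write only lands
// if nobody else updated the node since it was read.
class ZnodeCasSketch {
    private String state;
    private int version;

    ZnodeCasSketch(String initialState) {
        this.state = initialState;
        this.version = 0;
    }

    synchronized String read() { return state; }
    synchronized int readVersion() { return version; }

    // Mirrors ZooKeeper setData(path, data, expectedVersion).
    synchronized boolean transition(String expectedState, int expectedVersion,
                                    String newState) {
        if (!state.equals(expectedState) || version != expectedVersion) {
            return false;   // someone else got there first
        }
        state = newState;
        version++;
        return true;
    }
}
```

If two RSs read the same (state, version) and both attempt the update, the first succeeds and bumps the version, so the second's write is rejected.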
+
+ 5. Master Failover
+
+ - Master initializes and finds out that it is a failed-over Master.
+
+ - Before the Master starts up the normal handlers for region transitions,
+ it grabs all nodes in /unassigned.
+
+ - If no regions are in transition, failover is done and it continues.
+
+ - If regions are in transition, each will be handled according to the
+ current region state in ZK.
+
+ - Before processing the regions in transition, the normal handlers are
+ started so that we don't miss any transitions. The handling of opens on
+ the RS side ensures we don't double-assign even if things have changed
+ before we finish acting on them.
+
+ OFFLINE Generate a new assignment and send an OPEN RPC.
+
+ CLOSING Nothing to be done. Normal handlers take care of timeouts.
+
+ CLOSED Generate a new assignment and send an OPEN RPC.
+
+ OPENING Nothing to be done. Normal handlers take care of timeouts.
+
+ OPENED Delete the node from ZK. Region was successfully opened but
+ the previous Master did not acknowledge it.
+
+ - Once this is done, everything further is dealt with as normal by the
+ RegionManager.
+
+* Given these different circumstances where regions will transition, there
+ is a limited set of ZK unassigned node transitions that are legitimate (we
+ should not be using createOrUpdate all over the place).
+
+ MASTER
+
+ 1. Master creates an unassigned node as OFFLINE.
+
+ - Cluster startup and table enabling.
+
+ 2. Master forces an existing unassigned node to OFFLINE.
+
+ - RegionServer failure.
+
+ - Allows transitions from all states to OFFLINE.
+
+ 3. Master deletes an unassigned node that was in a OPENED state.
+
+ - Normal region transitions. Besides cluster startup, no other deletions
+ of unassigned nodes are allowed.
+
+ 4. Master deletes all unassigned nodes regardless of state.
+
+ - Cluster startup before any assignment happens.
+
+ REGIONSERVER
+
+ 1. RegionServer creates an unassigned node as CLOSING.
+
+ - All region closes will do this in response to a CLOSE RPC from Master.
+
+ - A node can never be transitioned to CLOSING, only created.
+
+ 2. RegionServer transitions an unassigned node from CLOSING to CLOSED.
+
+ - Normal region closes. CAS operation.
+
+ 3. RegionServer transitions an unassigned node from OFFLINE to OPENING.
+
+ - All region opens will do this in response to an OPEN RPC from the Master.
+
+ - Normal region opens. CAS operation.
+
+ 4. RegionServer transitions an unassigned node from OPENING to OPENED.
+
+ - Normal region opens. CAS operation.
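The legitimate operations enumerated above could be captured as a simple legality check. A sketch, not HBase code; the (actor, op, from, to) encoding is invented.

```java
// Hypothetical validator for the legitimate unassigned-node operations
// listed above: who may create, force, delete, or CAS-transition a node.
class TransitionRules {
    static boolean isLegal(String actor, String op, String from, String to) {
        if (actor.equals("master")) {
            return (op.equals("create") && to.equals("OFFLINE"))   // startup, enable
                || (op.equals("force") && to.equals("OFFLINE"))    // RS failure
                || (op.equals("delete") && from.equals("OPENED"))  // normal completion
                || op.equals("delete-all");                        // cluster startup
        }
        if (actor.equals("regionserver")) {
            return (op.equals("create") && to.equals("CLOSING"))   // CLOSE RPC; never a transition
                || (op.equals("cas") && from.equals("CLOSING") && to.equals("CLOSED"))
                || (op.equals("cas") && from.equals("OFFLINE") && to.equals("OPENING"))
                || (op.equals("cas") && from.equals("OPENING") && to.equals("OPENED"));
        }
        return false;
    }
}
```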
\ No newline at end of file
Modified: hbase/branches/0.90_master_rewrite/pom.xml
URL: http://svn.apache.org/viewvc/hbase/branches/0.90_master_rewrite/pom.xml?rev=979909&r1=979908&r2=979909&view=diff
==============================================================================
--- hbase/branches/0.90_master_rewrite/pom.xml (original)
+++ hbase/branches/0.90_master_rewrite/pom.xml Tue Jul 27 23:08:07 2010
@@ -3,7 +3,7 @@
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.apache.hbase</groupId>
- <artifactId>hbase</artifactId>
+ <artifactId>public-branch-master-rewrite</artifactId>
<packaging>jar</packaging>
<version>${hbase.version}</version>
<name>HBase</name>