You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hbase.apache.org by st...@apache.org on 2017/05/23 18:38:05 UTC

[28/28] hbase git commit: HBASE-18087 Fix unit tests in TestTableFavoredNodes

    HBASE-18087 Fix unit tests in TestTableFavoredNodes

    AssignmentManager call getFavoredNodes() on LoadBalancer and expects non-empty result. Adding getFavoredNodes() method to FavoredStochasticBalancer fixes broken unit tests.

    Signed-off-by: Michael Stack <st...@apache.org>

    Fix CatalogTracker. Make it use Procedures doing clean up of Region
    data on split/merge. Without these changes, ITBLL was failing at
    larger scale (3-4hours 5B rows) because we were splitting split
    Regions.

    Added a bunch of doc. on Procedure primitives.

    Added new region-based state machine base class. Moved region-based
    state machines on to it.

    Found bugs in the way procedure locking was doing in a few of the
    region-based Procedures. Having them all have same subclass helps here.

    Added isSplittable and isMergeable to the Region Interface.

    Master would split/merge even though the Regions still had
    references. Fixed it so Master asks RegionServer if Region
    is splittable.

    Messing more w/ logging. Made all procedures log the same and report
    the state the same; helps when logging is regular.

    Rewrote TestCatalogTracker. Enabled TestMergeTableRegionProcedure.

    Added more functionality to MockMasterServices so can use it doing
    standalone testing of Procedures (made TestCatalogTracker use it
    instead of its own version).

    Trying to find who sets server and regionState to null around
    servercrashprocedure add DEBUG. Ditto for why we do a suspend
    though we have not done dispatch (on a retry....)

    Add to MasterServices ability to wait on Master being up -- makes
    it so can Mock Master and start to implement standalone split testing.
    Start in on a Split region standalone test in TestAM.

    Fix bug where a Split can fail because it comes in in the middle of
    a Move (by holding lock for duration of a Move).

        HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
        Move to a new AssignmentManager, one that describes Assignment using
        a State Machine built on top of ProcedureV2 facility.

        Includes four patches from Matteos' repository and then fix up to get it all to
        pass, filling in some missing functionality, fix of findbugs, fixing bugs, etc..

        This doc. keeps state on where we are at w/ the new AM:
        https://docs.google.com/document/d/1eVKa7FHdeoJ1-9o8yZcOTAQbv0u0bblBlCCzVSIn69g/edit#heading=h.vfdoxqut9lqn
        Includes list of tests disabled by this patch with reasons why.

        I applied the two patches in one go because applying each independently puts
        hbase in a non-working state.

        1. HBASE-14616 Procedure v2 - Replace the old AM with the new AM
        The basis comes from Matteo's repo here:
          https://github.com/matteobertozzi/hbase/commit/689227fcbfe8e6588433dbcdabf4526e3d478b2e

        Patch replaces old AM with the new under subpackage master.assignment.
        Mostly just updating classes to use new AM -- import changes -- rather
        than the old. It also removes old AM and supporting classes.
        See below for more detail.

        2. HBASE-14614 Procedure v2 - Core Assignment Manager (Matteo Bertozzi)
        Adds running of remote procedure. Adds batching of remote calls.
        Adds support for assign/unassign in procedures. Adds version info
        reporting in rpc. Adds start of an AMv2.

        3. and 4. are fixes around merge and split.

        This work mostly comes from:
        https://github.com/matteobertozzi/hbase/commit/3622cba4e331d2fc7bfc1932abb4c9cbf5802efa

        Reporting of remote RS version is from here:
        https://github.com/matteobertozzi/hbase/commit/ddb4df3964e8298c88c0210e83493aa91ac0942d.patch

        And remote dispatch of procedures is from:
        https://github.com/matteobertozzi/hbase/commit/186b9e7c4dae61a79509a6c3aad7f80ec61345e5

        The split merge patches from here are also melded in:
        https://github.com/matteobertozzi/hbase/commit/9a3a95a2c2974842a4849d1ad867e70764e7f707
        and https://github.com/matteobertozzi/hbase/commit/d6289307a02a777299f65238238a2a8af3253067

        Adds testing util for new AM and new sets of tests.

        Does a bunch of fixup on logging so its possible to follow a procedures'
        narrative by grepping procedure id. We spewed loads of log too on big
        transitions such as master fail; fixed.

        Details:

        M hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
         Takes List of regionstates on construction rather than a Set.
         NOTE!!!!! This is a change in a public class.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
         Add utility getShortNameToLog

        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
        M hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
         Add support for dispatching assign, split and merge processes.

        M hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
         Purge old overlapping states: PENDING_OPEN, PENDING_CLOSE, etc.

        A hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/RemoteProcedureDispatcher.java
         Dispatch remote procedures every 150ms or 32 items -- which ever
         happens first (configurable). Runs a timeout thread. This facility is
         not on yet; will come in as part of a later fix. Currently works a
         region at a time. This class carries notion of a remote procedure and of a buffer full of these.
         "hbase.procedure.remote.dispatcher.threadpool.size" with default = 128
         "hbase.procedure.remote.dispatcher.delay.msec" with default = 150ms
         "hbase.procedure.remote.dispatcher.max.queue.size" with default = 32

        M hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
         Add in support for merge. Remove no-longer used methods.

        M hbase-protocol-shaded/src/main/protobuf/Admin.proto b/hbase-protocol-shaded/src/main/protobuf/Admin.proto
         Add execute procedures call ExecuteProcedures.

        M hbase-protocol-shaded/src/main/protobuf/MasterProcedure.proto
         Add assign and unassign state support for procedures.

        M hbase-server/src/main/java/org/apache/hadoop/hbase/client/VersionInfoUtil.java
         Adds getting RS version out of RPC
         Examples: (1.3.4 is 0x0103004, 2.1.0 is 0x0201000)

        M hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
         Remove periodic metrics chore. This is done over in new AM now.
         Replace AM with the new.

        M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterMetaBootstrap.java
         Have AMv2 handle assigning meta.

        M hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java
         Extract version number of the server making rpc.

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignProcedure.java
         Add new assign procedure. Runs assign via Procedure Dispatch.
         There can only be one RegionTransitionProcedure per region running at the time,
         since each procedure takes a lock on the region.

        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignCallable.java
        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java
        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/GeneralBulkAssigner.java

        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStateStore.java
        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
        D hbase-server/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java
        A procedure-based AM (AMv2).

        TODO
         - handle region migration
         - handle meta assignment first
         - handle sys table assignment first (e.g. acl, namespace)
         - handle table priorities
          "hbase.assignment.bootstrap.thread.pool.size"; default size is 16.
          "hbase.assignment.dispatch.wait.msec"; default wait is 150
          "hbase.assignment.dispatch.wait.queue.max.size"; wait max default is 100
          "hbase.assignment.rit.chore.interval.msec"; default is 5 * 1000;
          "hbase.assignment.maximum.attempts"; default is 10;

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/MoveRegionProcedure.java
         Procedure that runs subprocedure to unassign and then assign to new location

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java
         Manage store of region state (in hbase:meta by default).

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java
         In-memory state of all regions. Used by AMv2.

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
         Base RIT procedure for Assign and Unassign.

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
         Unassign procedure.

        A hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java
         Run region assignement in a manner that pays attention to target server version.
         Adds "hbase.regionserver.rpc.startup.waittime"; defaults 60 seconds.


Project: http://git-wip-us.apache.org/repos/asf/hbase/repo
Commit: http://git-wip-us.apache.org/repos/asf/hbase/commit/4143c017
Tree: http://git-wip-us.apache.org/repos/asf/hbase/tree/4143c017
Diff: http://git-wip-us.apache.org/repos/asf/hbase/diff/4143c017

Branch: refs/heads/HBASE-14614
Commit: 4143c0176ffd46030f89aad5730b72f8f7627ae6
Parents: ebe92c8
Author: Michael Stack <st...@apache.org>
Authored: Tue May 23 11:30:19 2017 -0700
Committer: Michael Stack <st...@apache.org>
Committed: Tue May 23 11:30:19 2017 -0700

----------------------------------------------------------------------
 .../org/apache/hadoop/hbase/ClusterStatus.java  |     8 +-
 .../org/apache/hadoop/hbase/HRegionInfo.java    |    14 +
 .../apache/hadoop/hbase/MetaTableAccessor.java  |    15 +-
 .../hbase/client/ConnectionImplementation.java  |    12 +
 .../client/ShortCircuitMasterConnection.java    |    12 +
 .../hadoop/hbase/ipc/NettyRpcDuplexHandler.java |     4 +-
 .../apache/hadoop/hbase/ipc/RpcConnection.java  |     6 +-
 .../apache/hadoop/hbase/master/RegionState.java |    22 +-
 .../hbase/shaded/protobuf/ProtobufUtil.java     |   122 +-
 .../hbase/shaded/protobuf/RequestConverter.java |    16 +-
 .../shaded/protobuf/ResponseConverter.java      |    13 -
 .../hbase/zookeeper/MetaTableLocator.java       |     7 +-
 .../org/apache/hadoop/hbase/ProcedureInfo.java  |     6 +-
 .../master/MetricsAssignmentManagerSource.java  |    20 +-
 .../MetricsAssignmentManagerSourceImpl.java     |    36 +-
 .../procedure2/AbstractProcedureScheduler.java  |    40 +-
 .../hadoop/hbase/procedure2/Procedure.java      |   205 +-
 .../hadoop/hbase/procedure2/ProcedureEvent.java |     2 +-
 .../hbase/procedure2/ProcedureExecutor.java     |   219 +-
 .../procedure2/ProcedureInMemoryChore.java      |     2 +-
 .../hbase/procedure2/ProcedureScheduler.java    |     3 +-
 .../procedure2/RemoteProcedureDispatcher.java   |   375 +
 .../hbase/procedure2/SequentialProcedure.java   |     9 +-
 .../hbase/procedure2/StateMachineProcedure.java |    28 +-
 .../procedure2/store/NoopProcedureStore.java    |     4 +-
 .../hbase/procedure2/store/ProcedureStore.java  |     3 +-
 .../procedure2/store/wal/ProcedureWALFile.java  |    18 +-
 .../store/wal/ProcedureWALFormatReader.java     |    38 +-
 .../procedure2/store/wal/WALProcedureStore.java |    55 +-
 .../hbase/procedure2/util/DelayedUtil.java      |     6 +-
 .../protobuf/generated/AccessControlProtos.java |   102 +-
 .../shaded/protobuf/generated/AdminProtos.java  | 17815 ++++++++++-------
 .../generated/MasterProcedureProtos.java        |  9343 +++++++--
 .../shaded/protobuf/generated/MasterProtos.java |  7786 ++++---
 .../shaded/protobuf/generated/QuotaProtos.java  |   134 +-
 .../generated/RegionServerStatusProtos.java     |  1679 +-
 .../protobuf/generated/SnapshotProtos.java      |    22 +-
 .../src/main/protobuf/Admin.proto               |    51 +-
 .../src/main/protobuf/Master.proto              |    37 +
 .../src/main/protobuf/MasterProcedure.proto     |   117 +-
 .../src/main/protobuf/RegionServerStatus.proto  |    26 -
 .../hbase/rsgroup/RSGroupAdminServer.java       |    13 +-
 .../hbase/rsgroup/RSGroupBasedLoadBalancer.java |     9 +-
 .../balancer/TestRSGroupBasedLoadBalancer.java  |     2 +-
 .../hadoop/hbase/rsgroup/TestRSGroups.java      |    16 +-
 .../hbase/rsgroup/TestRSGroupsOfflineMode.java  |     3 +-
 .../master/AssignmentManagerStatusTmpl.jamon    |    51 +-
 .../hbase/tmpl/master/MasterStatusTmpl.jamon    |     2 +-
 .../hadoop/hbase/RegionStateListener.java       |     7 +-
 .../org/apache/hadoop/hbase/SplitLogTask.java   |     4 +
 .../hadoop/hbase/backup/HFileArchiver.java      |    15 +-
 .../hadoop/hbase/client/VersionInfoUtil.java    |    81 +-
 .../hbase/coprocessor/RegionObserver.java       |    22 +-
 .../org/apache/hadoop/hbase/ipc/CallRunner.java |     9 +-
 .../apache/hadoop/hbase/ipc/RpcExecutor.java    |     5 +-
 .../hadoop/hbase/ipc/SimpleRpcServer.java       |    16 +-
 .../hadoop/hbase/master/AssignCallable.java     |    49 -
 .../hadoop/hbase/master/AssignmentManager.java  |  3053 ---
 .../hadoop/hbase/master/BulkAssigner.java       |   122 -
 .../apache/hadoop/hbase/master/BulkReOpen.java  |   136 -
 .../hadoop/hbase/master/CatalogJanitor.java     |   101 +-
 .../apache/hadoop/hbase/master/DeadServer.java  |     6 +-
 .../hbase/master/GeneralBulkAssigner.java       |   213 -
 .../org/apache/hadoop/hbase/master/HMaster.java |   209 +-
 .../hadoop/hbase/master/LoadBalancer.java       |     4 +-
 .../hbase/master/MasterCoprocessorHost.java     |    22 +
 .../hadoop/hbase/master/MasterDumpServlet.java  |     8 +-
 .../hbase/master/MasterMetaBootstrap.java       |    61 +-
 .../hadoop/hbase/master/MasterRpcServices.java  |   109 +-
 .../hadoop/hbase/master/MasterServices.java     |    30 +
 .../hadoop/hbase/master/MasterWalManager.java   |    17 +-
 .../hbase/master/MetricsAssignmentManager.java  |    39 +-
 .../hbase/master/NoSuchProcedureException.java  |    33 +
 .../apache/hadoop/hbase/master/RegionPlan.java  |     4 +-
 .../hadoop/hbase/master/RegionStateStore.java   |   268 -
 .../hadoop/hbase/master/RegionStates.java       |  1170 --
 .../hadoop/hbase/master/ServerManager.java      |    85 +-
 .../hadoop/hbase/master/SplitLogManager.java    |     2 +-
 .../hbase/master/TableNamespaceManager.java     |     5 +-
 .../hadoop/hbase/master/TableStateManager.java  |     3 +-
 .../hadoop/hbase/master/UnAssignCallable.java   |    47 -
 .../master/assignment/AssignProcedure.java      |   338 +
 .../master/assignment/AssignmentManager.java    |  1814 ++
 .../FailedRemoteDispatchException.java          |    33 +
 .../assignment/GCMergedRegionsProcedure.java    |   170 +
 .../master/assignment/GCRegionProcedure.java    |   154 +
 .../assignment/MergeTableRegionsProcedure.java  |   776 +
 .../master/assignment/MoveRegionProcedure.java  |   145 +
 .../master/assignment/RegionStateStore.java     |   327 +
 .../hbase/master/assignment/RegionStates.java   |   969 +
 .../assignment/RegionTransitionProcedure.java   |   381 +
 .../assignment/SplitTableRegionProcedure.java   |   733 +
 .../master/assignment/UnassignProcedure.java    |   247 +
 .../hadoop/hbase/master/assignment/Util.java    |    60 +
 .../hbase/master/balancer/BaseLoadBalancer.java |    33 +-
 .../balancer/FavoredStochasticBalancer.java     |    11 +-
 .../master/balancer/RegionLocationFinder.java   |    14 +-
 .../master/balancer/SimpleLoadBalancer.java     |     9 +-
 .../master/balancer/StochasticLoadBalancer.java |     8 +-
 .../AbstractStateMachineNamespaceProcedure.java |     3 +-
 .../AbstractStateMachineRegionProcedure.java    |   118 +
 .../AbstractStateMachineTableProcedure.java     |    14 +-
 .../procedure/AddColumnFamilyProcedure.java     |    31 +-
 .../procedure/CloneSnapshotProcedure.java       |     4 +-
 .../master/procedure/CreateTableProcedure.java  |    41 +-
 .../procedure/DeleteColumnFamilyProcedure.java  |    31 +-
 .../master/procedure/DeleteTableProcedure.java  |    12 +-
 .../master/procedure/DisableTableProcedure.java |   156 +-
 .../DispatchMergingRegionsProcedure.java        |   584 +
 .../master/procedure/EnableTableProcedure.java  |   172 +-
 .../procedure/MasterDDLOperationHelper.java     |    93 +-
 .../procedure/MasterProcedureConstants.java     |     2 +-
 .../master/procedure/MasterProcedureEnv.java    |    30 +-
 .../procedure/MasterProcedureScheduler.java     |    19 +-
 .../procedure/MergeTableRegionsProcedure.java   |   906 -
 .../procedure/ModifyColumnFamilyProcedure.java  |    30 +-
 .../master/procedure/ModifyTableProcedure.java  |    30 +-
 .../master/procedure/ProcedureSyncWait.java     |   146 +-
 .../master/procedure/RSProcedureDispatcher.java |   541 +
 .../procedure/RestoreSnapshotProcedure.java     |    27 +-
 .../master/procedure/ServerCrashException.java  |    46 +
 .../master/procedure/ServerCrashProcedure.java  |   587 +-
 .../procedure/SplitTableRegionProcedure.java    |   785 -
 .../procedure/TableProcedureInterface.java      |     3 +-
 .../procedure/TruncateTableProcedure.java       |     6 +-
 .../apache/hadoop/hbase/mob/MobFileCache.java   |     4 +-
 .../hbase/namespace/NamespaceAuditor.java       |    10 +-
 .../hbase/namespace/NamespaceStateManager.java  |     5 +-
 .../hadoop/hbase/quotas/MasterQuotaManager.java |     8 +-
 .../hadoop/hbase/regionserver/CompactSplit.java |   723 +
 .../hbase/regionserver/CompactSplitThread.java  |   695 -
 .../regionserver/CompactedHFilesDischarger.java |    77 +-
 .../hadoop/hbase/regionserver/HRegion.java      |    92 +-
 .../hbase/regionserver/HRegionFileSystem.java   |     4 +-
 .../hbase/regionserver/HRegionServer.java       |   134 +-
 .../hbase/regionserver/RSRpcServices.java       |   120 +-
 .../hadoop/hbase/regionserver/Region.java       |     8 +
 .../hbase/regionserver/RegionMergeRequest.java  |   108 +
 .../regionserver/RegionServerServices.java      |    10 -
 .../hbase/regionserver/RegionUnassigner.java    |     5 +-
 .../hadoop/hbase/regionserver/SplitRequest.java |    91 +-
 .../handler/CloseRegionHandler.java             |     2 +-
 .../org/apache/hadoop/hbase/util/HBaseFsck.java |     2 +-
 .../hadoop/hbase/util/ModifyRegionUtils.java    |    24 +-
 .../apache/hadoop/hbase/wal/WALSplitter.java    |    16 +-
 .../hbase/zookeeper/RegionServerTracker.java    |     4 +-
 .../hadoop/hbase/HBaseTestingUtility.java       |    16 +-
 .../hadoop/hbase/MockRegionServerServices.java  |    10 -
 .../hadoop/hbase/TestRegionRebalancing.java     |    16 +-
 .../apache/hadoop/hbase/client/TestAdmin1.java  |    20 +-
 .../apache/hadoop/hbase/client/TestAdmin2.java  |     4 +-
 .../hbase/client/TestAsyncRegionAdminApi.java   |    51 +-
 .../client/TestAsyncTableGetMultiThreaded.java  |     2 +
 ...ableGetMultiThreadedWithBasicCompaction.java |     6 +-
 ...ableGetMultiThreadedWithEagerCompaction.java |     6 +-
 .../client/TestBlockEvictionFromClient.java     |     2 +
 .../hadoop/hbase/client/TestEnableTable.java    |    34 +-
 .../hbase/client/TestFromClientSide3.java       |     2 +
 .../org/apache/hadoop/hbase/client/TestHCM.java |   142 +-
 .../hbase/client/TestMetaWithReplicas.java      |    37 +-
 .../client/TestScannersFromClientSide.java      |    30 +-
 .../hbase/client/TestServerBusyException.java   |   234 +
 .../client/TestSnapshotCloneIndependence.java   |     2 +-
 .../hbase/client/TestSplitOrMergeStatus.java    |   119 +-
 .../hbase/client/TestTableFavoredNodes.java     |    43 +-
 .../coprocessor/TestIncrementTimeRange.java     |     5 +-
 .../hbase/ipc/TestSimpleRpcScheduler.java       |     7 +-
 .../mapreduce/TestLoadIncrementalHFiles.java    |     2 +-
 .../hbase/master/MockNoopMasterServices.java    |    22 +-
 .../hadoop/hbase/master/MockRegionServer.java   |    37 +-
 .../hbase/master/TestAssignmentListener.java    |     1 +
 .../master/TestAssignmentManagerOnCluster.java  |  1403 --
 .../hadoop/hbase/master/TestCatalogJanitor.java |   591 +-
 .../master/TestDistributedLogSplitting.java     |     1 +
 .../apache/hadoop/hbase/master/TestMaster.java  |     1 +
 .../master/TestMasterBalanceThrottling.java     |    11 +-
 .../hadoop/hbase/master/TestMasterFailover.java |    31 +-
 .../hadoop/hbase/master/TestMasterMetrics.java  |     8 +-
 .../TestMasterOperationsForRegionReplicas.java  |    10 +-
 .../hbase/master/TestMasterStatusServlet.java   |    58 +-
 .../hbase/master/TestMasterWalManager.java      |     2 +-
 .../hbase/master/TestMetaShutdownHandler.java   |     1 +
 .../hadoop/hbase/master/TestRegionState.java    |    17 +-
 .../hadoop/hbase/master/TestRegionStates.java   |   144 -
 .../hadoop/hbase/master/TestRestartCluster.java |     8 +-
 .../hadoop/hbase/master/TestWarmupRegion.java   |    14 +-
 .../assignment/AssignmentTestingUtil.java       |   125 +
 .../master/assignment/MockMasterServices.java   |   358 +
 .../assignment/TestAssignmentManager.java       |   750 +
 .../assignment/TestAssignmentOnRSCrash.java     |   185 +
 .../TestMergeTableRegionsProcedure.java         |   260 +
 .../master/assignment/TestRegionStates.java     |   224 +
 .../TestSplitTableRegionProcedure.java          |   428 +
 .../TestFavoredStochasticLoadBalancer.java      |    27 +-
 .../TestSimpleRegionNormalizerOnCluster.java    |     3 +-
 ...ProcedureSchedulerPerformanceEvaluation.java |     2 +-
 .../MasterProcedureTestingUtility.java          |    67 +-
 .../procedure/TestAddColumnFamilyProcedure.java |    34 +-
 .../procedure/TestCloneSnapshotProcedure.java   |     8 +-
 .../procedure/TestCreateNamespaceProcedure.java |     4 +-
 .../procedure/TestCreateTableProcedure.java     |    46 +-
 .../TestDeleteColumnFamilyProcedure.java        |    31 +-
 .../procedure/TestDeleteNamespaceProcedure.java |     4 +-
 .../procedure/TestDeleteTableProcedure.java     |    21 +-
 .../procedure/TestDisableTableProcedure.java    |    24 +-
 .../procedure/TestEnableTableProcedure.java     |    24 +-
 .../TestMasterFailoverWithProcedures.java       |    23 +-
 .../procedure/TestMasterProcedureEvents.java    |     2 +-
 .../procedure/TestMasterProcedureScheduler.java |    20 +-
 .../TestModifyColumnFamilyProcedure.java        |     9 +-
 .../procedure/TestModifyNamespaceProcedure.java |     4 +-
 .../procedure/TestModifyTableProcedure.java     |    18 +-
 .../master/procedure/TestProcedureAdmin.java    |    12 +-
 .../procedure/TestRestoreSnapshotProcedure.java |    12 +-
 .../procedure/TestServerCrashProcedure.java     |   115 +-
 .../TestSplitTableRegionProcedure.java          |   420 -
 .../procedure/TestTableDDLProcedureBase.java    |     7 +-
 .../procedure/TestTruncateTableProcedure.java   |    11 +-
 .../hbase/namespace/TestNamespaceAuditor.java   |   184 +-
 .../procedure/SimpleMasterProcedureManager.java |     2 +-
 .../regionserver/TestCompactSplitThread.java    |    21 +-
 .../hbase/regionserver/TestCompaction.java      |    10 +-
 .../TestCorruptedRegionStoreFile.java           |     5 +
 .../regionserver/TestHRegionFileSystem.java     |     6 +-
 .../TestRegionMergeTransactionOnCluster.java    |    54 +-
 .../regionserver/TestRegionServerMetrics.java   |    40 +-
 .../TestSplitTransactionOnCluster.java          |   140 +-
 .../hbase/regionserver/wal/TestLogRolling.java  |     5 +
 .../wal/TestSecureAsyncWALReplay.java           |     5 +
 .../hbase/regionserver/wal/TestWALReplay.java   |     8 +-
 .../security/access/TestAccessController3.java  |    14 +-
 .../hadoop/hbase/util/BaseTestHBaseFsck.java    |     4 +-
 .../hadoop/hbase/util/TestHBaseFsckMOB.java     |     2 +-
 .../hadoop/hbase/util/TestHBaseFsckOneRS.java   |   118 +-
 .../hbase/util/TestHBaseFsckReplicas.java       |     4 +-
 .../hadoop/hbase/util/TestHBaseFsckTwoRS.java   |    23 +-
 .../util/hbck/TestOfflineMetaRebuildBase.java   |     3 +-
 .../util/hbck/TestOfflineMetaRebuildHole.java   |     2 +
 .../hbck/TestOfflineMetaRebuildOverlap.java     |     2 +
 239 files changed, 38178 insertions(+), 26166 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
index c51a437..95d77a2 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
@@ -22,7 +22,7 @@ package org.apache.hadoop.hbase;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
-import java.util.Set;
+import java.util.List;
 import java.util.Map;
 
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
@@ -67,7 +67,7 @@ public class ClusterStatus extends VersionedWritable {
   private Collection<ServerName> deadServers;
   private ServerName master;
   private Collection<ServerName> backupMasters;
-  private Set<RegionState> intransition;
+  private List<RegionState> intransition;
   private String clusterId;
   private String[] masterCoprocessors;
   private Boolean balancerOn;
@@ -77,7 +77,7 @@ public class ClusterStatus extends VersionedWritable {
       final Collection<ServerName> deadServers,
       final ServerName master,
       final Collection<ServerName> backupMasters,
-      final Set<RegionState> rit,
+      final List<RegionState> rit,
       final String[] masterCoprocessors,
       final Boolean balancerOn) {
     this.hbaseVersion = hbaseVersion;
@@ -248,7 +248,7 @@ public class ClusterStatus extends VersionedWritable {
   }
 
   @InterfaceAudience.Private
-  public Set<RegionState> getRegionsInTransition() {
+  public List<RegionState> getRegionsInTransition() {
     return this.intransition;
   }
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
index bc93cc6..d470ffa 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
@@ -23,6 +23,7 @@ import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.stream.Collectors;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
@@ -167,6 +168,19 @@ public class HRegionInfo implements Comparable<HRegionInfo> {
     return prettyPrint(this.getEncodedName());
   }
 
+  public static String getShortNameToLog(HRegionInfo...hris) {
+    return getShortNameToLog(Arrays.asList(hris));
+  }
+
+  /**
+   * @return Return a String of short, printable names for <code>hris</code>
+   * (usually encoded name) for us logging.
+   */
+  public static String getShortNameToLog(final List<HRegionInfo> hris) {
+    return hris.stream().map(hri -> hri.getShortNameToLog()).
+        collect(Collectors.toList()).toString();
+  }
+
   /**
    * Use logging.
    * @param encodedRegionName The encoded regionname.

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
index 15bc132..9eb5111 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
@@ -1663,8 +1663,11 @@ public class MetaTableAccessor {
       Delete deleteA = makeDeleteFromRegionInfo(regionA, time);
       Delete deleteB = makeDeleteFromRegionInfo(regionB, time);
 
-      // The merged is a new region, openSeqNum = 1 is fine.
-      addLocation(putOfMerged, sn, 1, -1, mergedRegion.getReplicaId());
+      // The merged is a new region, openSeqNum = 1 is fine. ServerName may be null
+      // if crash after merge happened but before we got to here.. means in-memory
+      // locations of offlined merged, now-closed, regions is lost. Should be ok. We
+      // assign the merged region later.
+      if (sn != null) addLocation(putOfMerged, sn, 1, -1, mergedRegion.getReplicaId());
 
       // Add empty locations for region replicas of the merged region so that number of replicas can
       // be cached whenever the primary region is looked up from meta
@@ -1966,8 +1969,8 @@ public class MetaTableAccessor {
    * @param regionsInfo list of regions to be deleted from META
    * @throws IOException
    */
-  public static void deleteRegions(Connection connection,
-                                   List<HRegionInfo> regionsInfo, long ts) throws IOException {
+  public static void deleteRegions(Connection connection, List<HRegionInfo> regionsInfo, long ts)
+  throws IOException {
     List<Delete> deletes = new ArrayList<>(regionsInfo.size());
     for (HRegionInfo hri: regionsInfo) {
       Delete e = new Delete(hri.getRegionName());
@@ -2002,10 +2005,10 @@ public class MetaTableAccessor {
     }
     mutateMetaTable(connection, mutation);
     if (regionsToRemove != null && regionsToRemove.size() > 0) {
-      LOG.debug("Deleted " + regionsToRemove);
+      LOG.debug("Deleted " + HRegionInfo.getShortNameToLog(regionsToRemove));
     }
     if (regionsToAdd != null && regionsToAdd.size() > 0) {
-      LOG.debug("Added " + regionsToAdd);
+      LOG.debug("Added " + HRegionInfo.getShortNameToLog(regionsToAdd));
     }
   }
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
index e5f5694..4ed28ec 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
@@ -1339,6 +1339,12 @@ class ConnectionImplementation implements ClusterConnection, Closeable {
         return stub.mergeTableRegions(controller, request);
       }
 
+      public MasterProtos.DispatchMergingRegionsResponse dispatchMergingRegions(
+          RpcController controller, MasterProtos.DispatchMergingRegionsRequest request)
+          throws ServiceException {
+        return stub.dispatchMergingRegions(controller, request);
+      }
+
       @Override
       public MasterProtos.AssignRegionResponse assignRegion(RpcController controller,
           MasterProtos.AssignRegionRequest request) throws ServiceException {
@@ -1358,6 +1364,12 @@ class ConnectionImplementation implements ClusterConnection, Closeable {
       }
 
       @Override
+      public MasterProtos.SplitTableRegionResponse splitRegion(RpcController controller,
+          MasterProtos.SplitTableRegionRequest request) throws ServiceException {
+        return stub.splitRegion(controller, request);
+      }
+
+      @Override
       public MasterProtos.DeleteTableResponse deleteTable(RpcController controller,
           MasterProtos.DeleteTableRequest request) throws ServiceException {
         return stub.deleteTable(controller, request);

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
index bea578c..6d75446 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ShortCircuitMasterConnection.java
@@ -499,4 +499,16 @@ public class ShortCircuitMasterConnection implements MasterKeepAliveConnection {
       GetQuotaStatesRequest request) throws ServiceException {
     return stub.getQuotaStates(controller, request);
   }
+
+  @Override
+  public SplitTableRegionResponse splitRegion(RpcController controller, SplitTableRegionRequest request)
+      throws ServiceException {
+    return stub.splitRegion(controller, request);
+  }
+
+  @Override
+  public DispatchMergingRegionsResponse dispatchMergingRegions(RpcController controller,
+      DispatchMergingRegionsRequest request) throws ServiceException {
+    return stub.dispatchMergingRegions(controller, request);
+  }
 }

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcDuplexHandler.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcDuplexHandler.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcDuplexHandler.java
index e69b42d..08533b4 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcDuplexHandler.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcDuplexHandler.java
@@ -226,8 +226,8 @@ class NettyRpcDuplexHandler extends ChannelDuplexHandler {
       switch (idleEvt.state()) {
         case WRITER_IDLE:
           if (id2Call.isEmpty()) {
-            if (LOG.isDebugEnabled()) {
-              LOG.debug("shutdown connection to " + conn.remoteId().address
+            if (LOG.isTraceEnabled()) {
+              LOG.trace("shutdown connection to " + conn.remoteId().address
                   + " because idle for a long time");
             }
             // It may happen that there are still some pending calls in the event loop queue and

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcConnection.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcConnection.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcConnection.java
index b5a7959..98d2256 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcConnection.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcConnection.java
@@ -129,7 +129,11 @@ abstract class RpcConnection {
       authMethod = AuthMethod.KERBEROS;
     }
 
-    if (LOG.isDebugEnabled()) {
+    // Log if debug AND non-default auth, else if trace enabled.
+    // No point logging obvious.
+    if ((LOG.isDebugEnabled() && !authMethod.equals(AuthMethod.SIMPLE)) ||
+        LOG.isTraceEnabled()) {
+      // Only log if not default auth.
       LOG.debug("Use " + authMethod + " authentication for service " + remoteId.serviceName
           + ", sasl=" + useSasl);
     }

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
index 0e12ef6..7116763 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
@@ -36,10 +36,8 @@ public class RegionState {
   @InterfaceStability.Evolving
   public enum State {
     OFFLINE,        // region is in an offline state
-    PENDING_OPEN,   // same as OPENING, to be removed
     OPENING,        // server has begun to open but not yet done
     OPEN,           // server opened region and updated meta
-    PENDING_CLOSE,  // same as CLOSING, to be removed
     CLOSING,        // server has begun to close but not yet done
     CLOSED,         // server closed region and updated meta
     SPLITTING,      // server started split of a region
@@ -64,18 +62,12 @@ public class RegionState {
       case OFFLINE:
         rs = ClusterStatusProtos.RegionState.State.OFFLINE;
         break;
-      case PENDING_OPEN:
-        rs = ClusterStatusProtos.RegionState.State.PENDING_OPEN;
-        break;
       case OPENING:
         rs = ClusterStatusProtos.RegionState.State.OPENING;
         break;
       case OPEN:
         rs = ClusterStatusProtos.RegionState.State.OPEN;
         break;
-      case PENDING_CLOSE:
-        rs = ClusterStatusProtos.RegionState.State.PENDING_CLOSE;
-        break;
       case CLOSING:
         rs = ClusterStatusProtos.RegionState.State.CLOSING;
         break;
@@ -124,8 +116,6 @@ public class RegionState {
         state = OFFLINE;
         break;
       case PENDING_OPEN:
-        state = PENDING_OPEN;
-        break;
       case OPENING:
         state = OPENING;
         break;
@@ -133,8 +123,6 @@ public class RegionState {
         state = OPEN;
         break;
       case PENDING_CLOSE:
-        state = PENDING_CLOSE;
-        break;
       case CLOSING:
         state = CLOSING;
         break;
@@ -231,22 +219,16 @@ public class RegionState {
     this.ritDuration += (this.stamp - previousStamp);
   }
 
-  /**
-   * PENDING_CLOSE (to be removed) is the same as CLOSING
-   */
   public boolean isClosing() {
-    return state == State.PENDING_CLOSE || state == State.CLOSING;
+    return state == State.CLOSING;
   }
 
   public boolean isClosed() {
     return state == State.CLOSED;
   }
 
-  /**
-   * PENDING_OPEN (to be removed) is the same as OPENING
-   */
   public boolean isOpening() {
-    return state == State.PENDING_OPEN || state == State.OPENING;
+    return state == State.OPENING;
   }
 
   public boolean isOpened() {

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
index 108646a..eca050f 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ProtobufUtil.java
@@ -20,19 +20,19 @@ package org.apache.hadoop.hbase.shaded.protobuf;
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
+import java.io.InterruptedIOException;
 import java.lang.reflect.Constructor;
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
+import java.security.PrivilegedExceptionAction;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
-import java.util.HashSet;
 import java.util.List;
 import java.util.Locale;
 import java.util.Map;
 import java.util.Map.Entry;
 import java.util.NavigableSet;
-import java.util.Set;
 import java.util.concurrent.Callable;
 import java.util.concurrent.TimeUnit;
 
@@ -89,12 +89,14 @@ import org.apache.hadoop.hbase.io.TimeRange;
 import org.apache.hadoop.hbase.master.RegionState;
 import org.apache.hadoop.hbase.procedure2.LockInfo;
 import org.apache.hadoop.hbase.protobuf.ProtobufMagic;
+import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.MergeRegionsRequest;
 import org.apache.hadoop.hbase.quotas.QuotaScope;
 import org.apache.hadoop.hbase.quotas.QuotaType;
 import org.apache.hadoop.hbase.quotas.SpaceViolationPolicy;
 import org.apache.hadoop.hbase.quotas.ThrottleType;
 import org.apache.hadoop.hbase.replication.ReplicationLoadSink;
 import org.apache.hadoop.hbase.replication.ReplicationLoadSource;
+import org.apache.hadoop.hbase.security.User;
 import org.apache.hadoop.hbase.security.visibility.Authorizations;
 import org.apache.hadoop.hbase.security.visibility.CellVisibility;
 import org.apache.hadoop.hbase.shaded.com.google.protobuf.ByteString;
@@ -108,8 +110,6 @@ import org.apache.hadoop.hbase.shaded.com.google.protobuf.ServiceException;
 import org.apache.hadoop.hbase.shaded.com.google.protobuf.TextFormat;
 import org.apache.hadoop.hbase.shaded.com.google.protobuf.UnsafeByteOperations;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.AdminService;
-import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionForSplitOrMergeRequest;
-import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionForSplitOrMergeResponse;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionRequest;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionResponse;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.GetOnlineRegionRequest;
@@ -177,6 +177,7 @@ import org.apache.hadoop.hbase.shaded.protobuf.generated.ZooKeeperProtos;
 import org.apache.hadoop.hbase.util.Addressing;
 import org.apache.hadoop.hbase.util.Bytes;
 import org.apache.hadoop.hbase.util.DynamicClassLoader;
+import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
 import org.apache.hadoop.hbase.util.ExceptionUtil;
 import org.apache.hadoop.hbase.util.ForeignExceptionUtil;
 import org.apache.hadoop.hbase.util.Methods;
@@ -1842,33 +1843,6 @@ public final class ProtobufUtil {
   }
 
   /**
-   * A helper to close a region for split or merge
-   * using admin protocol.
-   *
-   * @param controller RPC controller
-   * @param admin Admin service
-   * @param server the RS that hosts the target region
-   * @param regionInfo the target region info
-   * @return true if the region is closed
-   * @throws IOException
-   */
-  public static boolean closeRegionForSplitOrMerge(
-      final RpcController controller,
-      final AdminService.BlockingInterface admin,
-      final ServerName server,
-      final HRegionInfo... regionInfo) throws IOException {
-    CloseRegionForSplitOrMergeRequest closeRegionForRequest =
-        ProtobufUtil.buildCloseRegionForSplitOrMergeRequest(server, regionInfo);
-    try {
-      CloseRegionForSplitOrMergeResponse response =
-          admin.closeRegionForSplitOrMerge(controller, closeRegionForRequest);
-      return ResponseConverter.isClosed(response);
-    } catch (ServiceException se) {
-      throw getRemoteException(se);
-    }
-  }
-
-  /**
    * A helper to warmup a region given a region name
    * using admin protocol
    *
@@ -2020,6 +1994,46 @@ public final class ProtobufUtil {
     }
   }
 
+  /**
+   * A helper to merge regions using admin protocol. Send request to
+   * regionserver.
+   * @param admin
+   * @param region_a
+   * @param region_b
+   * @param forcible true if do a compulsory merge, otherwise we will only merge
+   *          two adjacent regions
+   * @param user effective user
+   * @throws IOException
+   */
+  public static void mergeRegions(final RpcController controller,
+      final AdminService.BlockingInterface admin,
+      final HRegionInfo region_a, final HRegionInfo region_b,
+      final boolean forcible, final User user) throws IOException {
+    final MergeRegionsRequest request = ProtobufUtil.buildMergeRegionsRequest(
+        region_a.getRegionName(), region_b.getRegionName(),forcible);
+    if (user != null) {
+      try {
+        user.runAs(new PrivilegedExceptionAction<Void>() {
+          @Override
+          public Void run() throws Exception {
+            admin.mergeRegions(controller, request);
+            return null;
+          }
+        });
+      } catch (InterruptedException ie) {
+        InterruptedIOException iioe = new InterruptedIOException();
+        iioe.initCause(ie);
+        throw iioe;
+      }
+    } else {
+      try {
+        admin.mergeRegions(controller, request);
+      } catch (ServiceException se) {
+        throw ProtobufUtil.getRemoteException(se);
+      }
+    }
+  }
+
 // End helpers for Admin
 
   /*
@@ -3103,8 +3117,8 @@ public final class ProtobufUtil {
       backupMasters.add(ProtobufUtil.toServerName(sn));
     }
 
-    Set<RegionState> rit = null;
-    rit = new HashSet<>(proto.getRegionsInTransitionList().size());
+    List<RegionState> rit =
+      new ArrayList<>(proto.getRegionsInTransitionList().size());
     for (RegionInTransition region : proto.getRegionsInTransitionList()) {
       RegionState value = RegionState.convert(region.getRegionState());
       rit.add(value);
@@ -3263,26 +3277,6 @@ public final class ProtobufUtil {
   }
 
   /**
-   * Create a CloseRegionForSplitOrMergeRequest for given regions
-   *
-   * @param server the RS server that hosts the region
-   * @param regionsToClose the info of the regions to close
-   * @return a CloseRegionForSplitRequest
-   */
-  public static CloseRegionForSplitOrMergeRequest buildCloseRegionForSplitOrMergeRequest(
-      final ServerName server,
-      final HRegionInfo... regionsToClose) {
-    CloseRegionForSplitOrMergeRequest.Builder builder =
-        CloseRegionForSplitOrMergeRequest.newBuilder();
-    for(int i = 0; i < regionsToClose.length; i++) {
-        RegionSpecifier regionToClose = RequestConverter.buildRegionSpecifier(
-          RegionSpecifierType.REGION_NAME, regionsToClose[i].getRegionName());
-        builder.addRegion(regionToClose);
-    }
-    return builder.build();
-  }
-
-  /**
     * Create a CloseRegionRequest for a given encoded region name
     *
     * @param encodedRegionName the name of the region to close
@@ -3331,6 +3325,28 @@ public final class ProtobufUtil {
     return builder.build();
   }
 
+   /**
+    * Create a MergeRegionsRequest for the given regions
+    * @param regionA name of region a
+    * @param regionB name of region b
+    * @param forcible true if it is a compulsory merge
+    * @return a MergeRegionsRequest
+    */
+   public static MergeRegionsRequest buildMergeRegionsRequest(
+       final byte[] regionA, final byte[] regionB, final boolean forcible) {
+     MergeRegionsRequest.Builder builder = MergeRegionsRequest.newBuilder();
+     RegionSpecifier regionASpecifier = RequestConverter.buildRegionSpecifier(
+         RegionSpecifierType.REGION_NAME, regionA);
+     RegionSpecifier regionBSpecifier = RequestConverter.buildRegionSpecifier(
+         RegionSpecifierType.REGION_NAME, regionB);
+     builder.setRegionA(regionASpecifier);
+     builder.setRegionB(regionBSpecifier);
+     builder.setForcible(forcible);
+     // send the master's wall clock time as well, so that the RS can refer to it
+     builder.setMasterSystemTime(EnvironmentEdgeManager.currentTime());
+     return builder.build();
+   }
+
   /**
    * Get a ServerName from the passed in data bytes.
    * @param data Data with a serialize server name in it; can handle the old style

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
index 4d34334..134c319 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java
@@ -123,7 +123,6 @@ import org.apache.hadoop.hbase.shaded.protobuf.generated.QuotaProtos.GetQuotaSta
 import org.apache.hadoop.hbase.shaded.protobuf.generated.QuotaProtos.GetSpaceQuotaRegionSizesRequest;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.QuotaProtos.GetSpaceQuotaSnapshotsRequest;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos.GetLastFlushedSequenceIdRequest;
-import org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos.SplitTableRegionRequest;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.ReplicationProtos;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.ReplicationProtos.AddReplicationPeerRequest;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.ReplicationProtos.DisableReplicationPeerRequest;
@@ -1120,19 +1119,6 @@ public final class RequestConverter {
     return builder.build();
   }
 
-  public static SplitTableRegionRequest buildSplitTableRegionRequest(
-      final HRegionInfo regionInfo,
-      final byte[] splitPoint,
-      final long nonceGroup,
-      final long nonce) {
-    SplitTableRegionRequest.Builder builder = SplitTableRegionRequest.newBuilder();
-    builder.setRegionInfo(HRegionInfo.convert(regionInfo));
-    builder.setSplitRow(UnsafeByteOperations.unsafeWrap(splitPoint));
-    builder.setNonceGroup(nonceGroup);
-    builder.setNonce(nonce);
-    return builder.build();
-  }
-
   /**
    * Create a protocol buffer AssignRegionRequest
    *
@@ -1515,7 +1501,7 @@ public final class RequestConverter {
   /**
    * Create a RegionOpenInfo based on given region info and version of offline node
    */
-  private static RegionOpenInfo buildRegionOpenInfo(
+  public static RegionOpenInfo buildRegionOpenInfo(
       final HRegionInfo region,
       final List<ServerName> favoredNodes, Boolean openForReplay) {
     RegionOpenInfo.Builder builder = RegionOpenInfo.newBuilder();

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ResponseConverter.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ResponseConverter.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ResponseConverter.java
index ecadbbc..c489628 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ResponseConverter.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/ResponseConverter.java
@@ -34,7 +34,6 @@ import org.apache.hadoop.hbase.classification.InterfaceAudience;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.client.SingleResponse;
 import org.apache.hadoop.hbase.ipc.ServerRpcController;
-import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionForSplitOrMergeResponse;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.CloseRegionResponse;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.GetOnlineRegionResponse;
 import org.apache.hadoop.hbase.shaded.protobuf.generated.AdminProtos.GetServerInfoResponse;
@@ -254,18 +253,6 @@ public final class ResponseConverter {
   }
 
   /**
-   * Check if the region is closed from a CloseRegionForSplitResponse
-   *
-   * @param proto the CloseRegionForSplitResponse
-   * @return the region close state
-   */
-  public static boolean isClosed
-      (final CloseRegionForSplitOrMergeResponse proto) {
-    if (proto == null || !proto.hasClosed()) return false;
-    return proto.getClosed();
-  }
-
-  /**
    * A utility to build a GetServerInfoResponse.
    *
    * @param serverName

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java
----------------------------------------------------------------------
diff --git a/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java b/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java
index afab54a..c11d896 100644
--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaTableLocator.java
@@ -439,6 +439,10 @@ public class MetaTableLocator {
    */
   public static void setMetaLocation(ZooKeeperWatcher zookeeper,
       ServerName serverName, int replicaId, RegionState.State state) throws KeeperException {
+    if (serverName == null) {
+      LOG.warn("Tried to set null ServerName in hbase:meta; skipping -- ServerName required");
+      return;
+    }
     LOG.info("Setting hbase:meta region location in ZooKeeper as " + serverName);
     // Make the MetaRegionServer pb and then get its bytes and save this as
     // the znode content.
@@ -448,7 +452,8 @@ public class MetaTableLocator {
       .setState(state.convert()).build();
     byte[] data = ProtobufUtil.prependPBMagic(pbrsr.toByteArray());
     try {
-      ZKUtil.setData(zookeeper, zookeeper.znodePaths.getZNodeForReplica(replicaId), data);
+      ZKUtil.setData(zookeeper,
+          zookeeper.znodePaths.getZNodeForReplica(replicaId), data);
     } catch(KeeperException.NoNodeException nne) {
       if (replicaId == HRegionInfo.DEFAULT_REPLICA_ID) {
         LOG.debug("META region location doesn't exist, create it");

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java
----------------------------------------------------------------------
diff --git a/hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java b/hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java
index 6104c22..36dabdd 100644
--- a/hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java
+++ b/hbase-common/src/main/java/org/apache/hadoop/hbase/ProcedureInfo.java
@@ -80,12 +80,11 @@ public class ProcedureInfo implements Cloneable {
   @Override
   public String toString() {
     StringBuilder sb = new StringBuilder();
-    sb.append("Procedure=");
     sb.append(procName);
-    sb.append(" (id=");
+    sb.append(" pid=");
     sb.append(procId);
     if (hasParentId()) {
-      sb.append(", parent=");
+      sb.append(", ppid=");
       sb.append(parentId);
     }
     if (hasOwner()) {
@@ -107,7 +106,6 @@ public class ProcedureInfo implements Cloneable {
       sb.append(this.exception.getMessage());
       sb.append("\"");
     }
-    sb.append(")");
     return sb.toString();
   }
 

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java
----------------------------------------------------------------------
diff --git a/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java b/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java
index fa7bbec..2ebf8c9 100644
--- a/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java
+++ b/hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSource.java
@@ -47,6 +47,7 @@ public interface MetricsAssignmentManagerSource extends BaseSource {
   String RIT_OLDEST_AGE_NAME = "ritOldestAge";
   String RIT_DURATION_NAME = "ritDuration";
   String ASSIGN_TIME_NAME = "assign";
+  String UNASSIGN_TIME_NAME = "unassign";
   String BULK_ASSIGN_TIME_NAME = "bulkAssign";
 
   String RIT_COUNT_DESC = "Current number of Regions In Transition (Gauge).";
@@ -56,9 +57,7 @@ public interface MetricsAssignmentManagerSource extends BaseSource {
   String RIT_DURATION_DESC =
       "Total durations in milliseconds for all Regions in Transition (Histogram).";
 
-  void updateAssignmentTime(long time);
-
-  void updateBulkAssignTime(long time);
+  String OPERATION_COUNT_NAME = "operationCount";
 
   /**
    * Set the number of regions in transition.
@@ -82,4 +81,19 @@ public interface MetricsAssignmentManagerSource extends BaseSource {
   void setRITOldestAge(long age);
 
   void updateRitDuration(long duration);
+
+  /**
+   * Increment the count of assignment operation (assign/unassign).
+   */
+  void incrementOperationCounter();
+
+  /**
+   * Add the time took to perform the last assign operation
+   */
+  void updateAssignTime(long time);
+
+  /**
+   * Add the time took to perform the last unassign operation
+   */
+  void updateUnassignTime(long time);
 }

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSourceImpl.java
----------------------------------------------------------------------
diff --git a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSourceImpl.java b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSourceImpl.java
index faae044..14b7e71 100644
--- a/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSourceImpl.java
+++ b/hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/MetricsAssignmentManagerSourceImpl.java
@@ -21,6 +21,7 @@ package org.apache.hadoop.hbase.master;
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
 import org.apache.hadoop.hbase.metrics.BaseSourceImpl;
 import org.apache.hadoop.metrics2.MetricHistogram;
+import org.apache.hadoop.metrics2.lib.MutableFastCounter;
 import org.apache.hadoop.metrics2.lib.MutableGaugeLong;
 
 @InterfaceAudience.Private
@@ -32,8 +33,10 @@ public class MetricsAssignmentManagerSourceImpl
   private MutableGaugeLong ritCountOverThresholdGauge;
   private MutableGaugeLong ritOldestAgeGauge;
   private MetricHistogram ritDurationHisto;
+
+  private MutableFastCounter operationCounter;
   private MetricHistogram assignTimeHisto;
-  private MetricHistogram bulkAssignTimeHisto;
+  private MetricHistogram unassignTimeHisto;
 
   public MetricsAssignmentManagerSourceImpl() {
     this(METRICS_NAME, METRICS_DESCRIPTION, METRICS_CONTEXT, METRICS_JMX_CONTEXT);
@@ -51,30 +54,39 @@ public class MetricsAssignmentManagerSourceImpl
         RIT_COUNT_OVER_THRESHOLD_DESC,0l);
     ritOldestAgeGauge = metricsRegistry.newGauge(RIT_OLDEST_AGE_NAME, RIT_OLDEST_AGE_DESC, 0l);
     assignTimeHisto = metricsRegistry.newTimeHistogram(ASSIGN_TIME_NAME);
-    bulkAssignTimeHisto = metricsRegistry.newTimeHistogram(BULK_ASSIGN_TIME_NAME);
+    unassignTimeHisto = metricsRegistry.newTimeHistogram(UNASSIGN_TIME_NAME);
     ritDurationHisto = metricsRegistry.newTimeHistogram(RIT_DURATION_NAME, RIT_DURATION_DESC);
+    operationCounter = metricsRegistry.getCounter(OPERATION_COUNT_NAME, 0l);
   }
 
   @Override
-  public void updateAssignmentTime(long time) {
-    assignTimeHisto.add(time);
+  public void setRIT(final int ritCount) {
+    ritGauge.set(ritCount);
   }
 
   @Override
-  public void updateBulkAssignTime(long time) {
-    bulkAssignTimeHisto.add(time);
+  public void setRITCountOverThreshold(final int ritCount) {
+    ritCountOverThresholdGauge.set(ritCount);
   }
 
-  public void setRIT(int ritCount) {
-    ritGauge.set(ritCount);
+  @Override
+  public void setRITOldestAge(final long ritCount) {
+    ritOldestAgeGauge.set(ritCount);
   }
 
-  public void setRITCountOverThreshold(int ritCount) {
-    ritCountOverThresholdGauge.set(ritCount);
+  @Override
+  public void incrementOperationCounter() {
+    operationCounter.incr();
   }
 
-  public void setRITOldestAge(long ritCount) {
-    ritOldestAgeGauge.set(ritCount);
+  @Override
+  public void updateAssignTime(final long time) {
+    assignTimeHisto.add(time);
+  }
+
+  @Override
+  public void updateUnassignTime(final long time) {
+    unassignTimeHisto.add(time);
   }
 
   @Override

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
----------------------------------------------------------------------
diff --git a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
index fbb066c..64c3e53 100644
--- a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
+++ b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/AbstractProcedureScheduler.java
@@ -29,8 +29,8 @@ import org.apache.hadoop.hbase.classification.InterfaceAudience;
 @InterfaceAudience.Private
 public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
   private static final Log LOG = LogFactory.getLog(AbstractProcedureScheduler.class);
-  private final ReentrantLock schedLock = new ReentrantLock();
-  private final Condition schedWaitCond = schedLock.newCondition();
+  private final ReentrantLock schedulerLock = new ReentrantLock();
+  private final Condition schedWaitCond = schedulerLock.newCondition();
   private boolean running = false;
 
   // TODO: metrics
@@ -88,14 +88,14 @@ public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
   }
 
   protected void push(final Procedure procedure, final boolean addFront, final boolean notify) {
-    schedLock.lock();
+    schedulerLock.lock();
     try {
       enqueue(procedure, addFront);
       if (notify) {
         schedWaitCond.signal();
       }
     } finally {
-      schedLock.unlock();
+      schedulerLock.unlock();
     }
   }
 
@@ -219,11 +219,11 @@ public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
 
   @Override
   public void suspendEvent(final ProcedureEvent event) {
-    final boolean isTraceEnabled = LOG.isTraceEnabled();
+    final boolean traceEnabled = LOG.isTraceEnabled();
     synchronized (event) {
       event.setReady(false);
-      if (isTraceEnabled) {
-        LOG.trace("Suspend event " + event);
+      if (traceEnabled) {
+        LOG.trace("Suspend " + event);
       }
     }
   }
@@ -235,18 +235,29 @@ public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
 
   @Override
   public void wakeEvents(final int count, final ProcedureEvent... events) {
-    final boolean isTraceEnabled = LOG.isTraceEnabled();
+    final boolean traceEnabled = LOG.isTraceEnabled();
     schedLock();
     try {
       int waitingCount = 0;
       for (int i = 0; i < count; ++i) {
         final ProcedureEvent event = events[i];
         synchronized (event) {
-          event.setReady(true);
-          if (isTraceEnabled) {
-            LOG.trace("Wake event " + event);
+          if (!event.isReady()) {
+            // Only set ready if we were not ready; i.e. suspended. Otherwise, we double-wake
+            // on this event and down in wakeWaitingProcedures, we double decrement this
+            // finish which messes up child procedure accounting.
+            event.setReady(true);
+            if (traceEnabled) {
+              LOG.trace("Unsuspend " + event);
+            }
+            waitingCount += wakeWaitingProcedures(event.getSuspendedProcedures());
+          } else {
+            ProcedureDeque q = event.getSuspendedProcedures();
+            if (q != null && !q.isEmpty()) {
+              LOG.warn("Q is not empty! size=" + q.size() + "; PROCESSING...");
+              waitingCount += wakeWaitingProcedures(event.getSuspendedProcedures());
+            }
           }
-          waitingCount += wakeWaitingProcedures(event.getSuspendedProcedures());
         }
       }
       wakePollIfNeeded(waitingCount);
@@ -275,6 +286,7 @@ public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
   }
 
   protected void wakeProcedure(final Procedure procedure) {
+    if (LOG.isTraceEnabled()) LOG.trace("Wake " + procedure);
     push(procedure, /* addFront= */ true, /* notify= */false);
   }
 
@@ -282,11 +294,11 @@ public abstract class AbstractProcedureScheduler implements ProcedureScheduler {
   //  Internal helpers
   // ==========================================================================
   protected void schedLock() {
-    schedLock.lock();
+    schedulerLock.lock();
   }
 
   protected void schedUnlock() {
-    schedLock.unlock();
+    schedulerLock.unlock();
   }
 
   protected void wakePollIfNeeded(final int waitingCount) {

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
----------------------------------------------------------------------
diff --git a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
index 591c0d0..5ce1dd0 100644
--- a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
+++ b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/Procedure.java
@@ -25,6 +25,8 @@ import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
 
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
 import org.apache.hadoop.hbase.classification.InterfaceAudience;
 import org.apache.hadoop.hbase.classification.InterfaceStability;
 import org.apache.hadoop.hbase.exceptions.TimeoutIOException;
@@ -37,37 +39,66 @@ import org.apache.hadoop.hbase.util.NonceKey;
 import com.google.common.annotations.VisibleForTesting;
 
 /**
- * Base Procedure class responsible to handle the Procedure Metadata
- * e.g. state, submittedTime, lastUpdate, stack-indexes, ...
+ * Base Procedure class responsible for Procedure Metadata;
+ * e.g. state, submittedTime, lastUpdate, stack-indexes, etc.
  *
- * execute() is called each time the procedure is executed.
- * it may be called multiple times in case of failure and restart, so the
- * code must be idempotent.
- * the return is a set of sub-procedures or null in case the procedure doesn't
- * have sub-procedures. Once the sub-procedures are successfully completed
- * the execute() method is called again, you should think at it as a stack:
- *  -&gt; step 1
- *  ---&gt; step 2
- *  -&gt; step 1
+ * <p>Procedures are run by a {@link ProcedureExecutor} instance. They are submitted and then
+ * the ProcedureExecutor keeps calling {@link #execute(Object)} until the Procedure is done.
+ * Execute may be called multiple times in the case of failure or a restart, so code must be
+ * idempotent. The return from an execute call is either: null to indicate we are done;
+ * ourself if there is more to do; or, a set of sub-procedures that need to
+ * be run to completion before the framework resumes our execution.
  *
- * rollback() is called when the procedure or one of the sub-procedures is failed.
- * the rollback step is supposed to cleanup the resources created during the
- * execute() step. in case of failure and restart rollback() may be called
- * multiple times, so the code must be idempotent.
+ * <p>The ProcedureExecutor keeps its
+ * notion of Procedure State in the Procedure itself; e.g. it stamps the Procedure as INITIALIZING,
+ * RUNNABLE, SUCCESS, etc. Here are some of the States defined in the ProcedureState enum from
+ * protos:
+ *<ul>
+ * <li>{@link #isFailed()} A procedure has executed at least once and has failed. The procedure
+ * may or may not have rolled back yet. Any procedure in FAILED state will be eventually moved
+ * to ROLLEDBACK state.</li>
+ *
+ * <li>{@link #isSuccess()} A procedure is completed successfully without exception.</li>
+ *
+ * <li>{@link #isFinished()} As a procedure in FAILED state will be tried forever for rollback, only
+ * condition when scheduler/ executor will drop procedure from further processing is when procedure
+ * state is ROLLEDBACK or isSuccess() returns true. This is a terminal state of the procedure.</li>
+ *
+ * <li>{@link #isWaiting()} - Procedure is in one of the two waiting states
+ * ({@link ProcedureState#WAITING}, {@link ProcedureState#WAITING_TIMEOUT}).</li>
+ *</ul>
+ * NOTE: This states are of the ProcedureExecutor. Procedure implementations in turn can keep
+ * their own state. This can lead to confusion. Try to keep the two distinct.
+ *
+ * <p>rollback() is called when the procedure or one of the sub-procedures
+ * has failed. The rollback step is supposed to cleanup the resources created
+ * during the execute() step. In case of failure and restart, rollback() may be
+ * called multiple times, so again the code must be idempotent.
+ *
+ * <p>Procedure can be made respect a locking regime. It has acqure/release methods as
+ * well as an {@link #hasLock(Object)}. The lock implementation is up to the implementor.
+ * If an entity needs to be locked for the life of a procedure -- not just the calls to
+ * execute -- then implementations should say so with the {@link #holdLock(Object)}
+ * method.
+ *
+ * <p>There are hooks for collecting metrics on submit of the procedure and on finish.
+ * See {@link #updateMetricsOnSubmit(Object)} and
+ * {@link #updateMetricsOnFinish(Object, long, boolean)}.
  */
 @InterfaceAudience.Private
 @InterfaceStability.Evolving
-public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
+public abstract class Procedure<TEnvironment> implements Comparable<Procedure<?>> {
+  private static final Log LOG = LogFactory.getLog(Procedure.class);
   public static final long NO_PROC_ID = -1;
   protected static final int NO_TIMEOUT = -1;
 
   public enum LockState {
-    LOCK_ACQUIRED,       // lock acquired and ready to execute
-    LOCK_YIELD_WAIT,     // lock not acquired, framework needs to yield
-    LOCK_EVENT_WAIT,     // lock not acquired, an event will yield the procedure
+    LOCK_ACQUIRED,       // Lock acquired and ready to execute
+    LOCK_YIELD_WAIT,     // Lock not acquired, framework needs to yield
+    LOCK_EVENT_WAIT,     // Lock not acquired, an event will yield the procedure
   }
 
-  // unchanged after initialization
+  // Unchanged after initialization
   private NonceKey nonceKey = null;
   private String owner = null;
   private long parentProcId = NO_PROC_ID;
@@ -75,7 +106,7 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   private long procId = NO_PROC_ID;
   private long submittedTime;
 
-  // runtime state, updated every operation
+  // Runtime state, updated every operation
   private ProcedureState state = ProcedureState.INITIALIZING;
   private RemoteProcedureException exception = null;
   private int[] stackIndexes = null;
@@ -88,19 +119,22 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
 
   /**
    * The main code of the procedure. It must be idempotent since execute()
-   * may be called multiple time in case of machine failure in the middle
+   * may be called multiple times in case of machine failure in the middle
    * of the execution.
    * @param env the environment passed to the ProcedureExecutor
-   * @return a set of sub-procedures or null if there is nothing else to execute.
-   * @throws ProcedureYieldException the procedure will be added back to the queue and retried later
-   * @throws InterruptedException the procedure will be added back to the queue and retried later
-   */
-  protected abstract Procedure[] execute(TEnvironment env)
+   * @return a set of sub-procedures to run or ourselves if there is more work to do or null if the
+   * procedure is done.
+   * @throws ProcedureYieldException the procedure will be added back to the queue and retried later.
+   * @throws InterruptedException the procedure will be added back to the queue and retried later.
+   * @throws ProcedureSuspendedException Signal to the executor that Procedure has suspended itself and
+   * has set itself up waiting for an external event to wake it back up again.
+   */
+  protected abstract Procedure<?>[] execute(TEnvironment env)
     throws ProcedureYieldException, ProcedureSuspendedException, InterruptedException;
 
   /**
-   * The code to undo what done by the execute() code.
-   * It is called when the procedure or one of the sub-procedure failed or an
+   * The code to undo what was done by the execute() code.
+   * It is called when the procedure or one of the sub-procedures failed or an
    * abort was requested. It should cleanup all the resources created by
    * the execute() call. The implementation must be idempotent since rollback()
    * may be called multiple time in case of machine failure in the middle
@@ -114,21 +148,21 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
 
   /**
    * The abort() call is asynchronous and each procedure must decide how to deal
-   * with that, if they want to be abortable. The simplest implementation
+   * with it, if they want to be abortable. The simplest implementation
    * is to have an AtomicBoolean set in the abort() method and then the execute()
    * will check if the abort flag is set or not.
    * abort() may be called multiple times from the client, so the implementation
    * must be idempotent.
    *
-   * NOTE: abort() is not like Thread.interrupt() it is just a notification
-   * that allows the procedure implementor where to abort to avoid leak and
-   * have a better control on what was executed and what not.
+   * <p>NOTE: abort() is not like Thread.interrupt(). It is just a notification
+   * that allows the procedure implementor abort.
    */
   protected abstract boolean abort(TEnvironment env);
 
   /**
    * The user-level code of the procedure may have some state to
-   * persist (e.g. input arguments) to be able to resume on failure.
+   * persist (e.g. input arguments or current position in the processing state) to
+   * be able to resume on failure.
    * @param stream the stream that will contain the user serialized data
    */
   protected abstract void serializeStateData(final OutputStream stream)
@@ -143,11 +177,17 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
     throws IOException;
 
   /**
-   * The user should override this method, and try to take a lock if necessary.
-   * A lock can be anything, and it is up to the implementor.
+   * The user should override this method if they need a lock on an Entity.
+   * A lock can be anything, and it is up to the implementor. The Procedure
+   * Framework will call this method just before it invokes {@link #execute(Object)}.
+   * It calls {@link #releaseLock(Object)} after the call to execute.
+   * 
+   * <p>If you need to hold the lock for the life of the Procdure -- i.e. you do not
+   * want any other Procedure interfering while this Procedure is running, see
+   * {@link #holdLock(Object)}.
    *
    * <p>Example: in our Master we can execute request in parallel for different tables.
-   * We can create t1 and create t2 and this can be executed at the same time.
+   * We can create t1 and create t2 and these creates can be executed at the same time.
    * Anything else on t1/t2 is queued waiting that specific table create to happen.
    *
    * <p>There are 3 LockState:
@@ -173,6 +213,9 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
 
   /**
    * Used to keep the procedure lock even when the procedure is yielding or suspended.
+   * Must implement {@link #hasLock(Object)} if you want to hold the lock for life
+   * of the Procedure.
+   * @see #hasLock(Object)
    * @return true if the procedure should hold on the lock until completionCleanup()
    */
   protected boolean holdLock(final TEnvironment env) {
@@ -180,8 +223,11 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   }
 
   /**
-   * This is used in conjuction with holdLock(). If holdLock() is true
-   * the procedure executor will not call acquireLock() if hasLock() is true.
+   * This is used in conjunction with {@link #holdLock(Object)}. If {@link #holdLock(Object)}
+   * returns true, the procedure executor will call acquireLock() once and thereafter
+   * not call {@link #releaseLock(Object)} until the Procedure is done (Normally, it calls
+   * release/acquire around each invocation of {@link #execute(Object)}.
+   * @see #holdLock(Object)
    * @return true if the procedure has the lock, false otherwise.
    */
   protected boolean hasLock(final TEnvironment env) {
@@ -209,14 +255,15 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   /**
    * Called when the procedure is marked as completed (success or rollback).
    * The procedure implementor may use this method to cleanup in-memory states.
-   * This operation will not be retried on failure.
+   * This operation will not be retried on failure. If a procedure took a lock,
+   * it will have been released when this method runs.
    */
   protected void completionCleanup(final TEnvironment env) {
     // no-op
   }
 
   /**
-   * By default, the executor will try to run procedures start to finish.
+   * By default, the procedure framework/executor will try to run procedures start to finish.
    * Return true to make the executor yield between each execution step to
    * give other procedures a chance to run.
    * @param env the environment passed to the ProcedureExecutor
@@ -275,27 +322,29 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   protected StringBuilder toStringSimpleSB() {
     final StringBuilder sb = new StringBuilder();
 
-    sb.append("procId=");
+    sb.append("pid=");
     sb.append(getProcId());
 
     if (hasParent()) {
-      sb.append(", parentProcId=");
+      sb.append(", ppid=");
       sb.append(getParentProcId());
     }
 
+    /**
+     * Enable later when this is being used.
     if (hasOwner()) {
       sb.append(", owner=");
       sb.append(getOwner());
-    }
+    }*/
 
-    sb.append(", state=");
+    sb.append(", state="); // pState for Procedure State as opposed to any other kind.
     toStringState(sb);
 
     if (hasException()) {
       sb.append(", exception=" + getException());
     }
 
-    sb.append(", ");
+    sb.append("; ");
     toStringClassDetails(sb);
 
     return sb;
@@ -311,7 +360,7 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
     sb.append(" submittedTime=");
     sb.append(getSubmittedTime());
 
-    sb.append(" lastUpdate=");
+    sb.append(", lastUpdate=");
     sb.append(getLastUpdate());
 
     final int[] stackIndices = getStackIndexes();
@@ -331,7 +380,8 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   }
 
   /**
-   * Called from {@link #toString()} when interpolating {@link Procedure} state
+   * Called from {@link #toString()} when interpolating {@link Procedure} State.
+   * Allows decorating generic Procedure State with Procedure particulars.
    * @param builder Append current {@link ProcedureState}
    */
   protected void toStringState(StringBuilder builder) {
@@ -527,25 +577,6 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   // ==============================================================================================
 
   /**
-   * Procedure has states which are defined in proto file. At some places in the code, we
-   * need to determine more about those states. Following Methods help determine:
-   *
-   * {@link #isFailed()} - A procedure has executed at least once and has failed. The procedure
-   *                       may or may not have rolled back yet. Any procedure in FAILED state
-   *                       will be eventually moved to ROLLEDBACK state.
-   *
-   * {@link #isSuccess()} - A procedure is completed successfully without any exception.
-   *
-   * {@link #isFinished()} - As a procedure in FAILED state will be tried forever for rollback, only
-   *                         condition when scheduler/ executor will drop procedure from further
-   *                         processing is when procedure state is ROLLEDBACK or isSuccess()
-   *                         returns true. This is a terminal state of the procedure.
-   *
-   * {@link #isWaiting()} - Procedure is in one of the two waiting states ({@link
-   *                        ProcedureState#WAITING}, {@link ProcedureState#WAITING_TIMEOUT}).
-   */
-
-  /**
    * @return true if the procedure is in a RUNNABLE state.
    */
   protected synchronized boolean isRunnable() {
@@ -648,6 +679,10 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   @InterfaceAudience.Private
   protected synchronized void setChildrenLatch(final int numChildren) {
     this.childrenLatch = numChildren;
+    if (LOG.isTraceEnabled()) {
+      LOG.info("CHILD LATCH INCREMENT SET " +
+          this.childrenLatch, new Throwable(this.toString()));
+    }
   }
 
   /**
@@ -657,15 +692,34 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   protected synchronized void incChildrenLatch() {
     // TODO: can this be inferred from the stack? I think so...
     this.childrenLatch++;
+    if (LOG.isTraceEnabled()) {
+      LOG.info("CHILD LATCH INCREMENT " + this.childrenLatch, new Throwable(this.toString()));
+    }
   }
 
   /**
    * Called by the ProcedureExecutor to notify that one of the sub-procedures has completed.
    */
   @InterfaceAudience.Private
-  protected synchronized boolean childrenCountDown() {
+  private synchronized boolean childrenCountDown() {
     assert childrenLatch > 0: this;
-    return --childrenLatch == 0;
+    boolean b = --childrenLatch == 0;
+    if (LOG.isTraceEnabled()) {
+      LOG.info("CHILD LATCH DECREMENT " + childrenLatch, new Throwable(this.toString()));
+    }
+    return b;
+  }
+
+  /**
+   * Try to set this procedure into RUNNABLE state.
+   * Succeeds if all subprocedures/children are done.
+   * @return True if we were able to move procedure to RUNNABLE state.
+   */
+  synchronized boolean tryRunnable() {
+    // Don't use isWaiting in the below; it returns true for WAITING and WAITING_TIMEOUT
+    boolean b = getState() == ProcedureState.WAITING && childrenCountDown();
+    if (b) setState(ProcedureState.RUNNABLE);
+    return b;
   }
 
   @InterfaceAudience.Private
@@ -732,9 +786,11 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
 
   /**
    * Internal method called by the ProcedureExecutor that starts the user-level code execute().
+   * @throws ProcedureSuspendedException This is used when procedure wants to halt processing and
+   * skip out without changing states or releasing any locks held.
    */
   @InterfaceAudience.Private
-  protected Procedure[] doExecute(final TEnvironment env)
+  protected Procedure<?>[] doExecute(final TEnvironment env)
       throws ProcedureYieldException, ProcedureSuspendedException, InterruptedException {
     try {
       updateTimestamp();
@@ -775,7 +831,7 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
   }
 
   @Override
-  public int compareTo(final Procedure other) {
+  public int compareTo(final Procedure<?> other) {
     return Long.compare(getProcId(), other.getProcId());
   }
 
@@ -801,7 +857,8 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
    * Helper to lookup the root Procedure ID given a specified procedure.
    */
   @InterfaceAudience.Private
-  protected static Long getRootProcedureId(final Map<Long, Procedure> procedures, Procedure proc) {
+  protected static Long getRootProcedureId(final Map<Long, Procedure> procedures,
+      Procedure<?> proc) {
     while (proc.hasParent()) {
       proc = procedures.get(proc.getParentProcId());
       if (proc == null) return null;
@@ -814,10 +871,10 @@ public abstract class Procedure<TEnvironment> implements Comparable<Procedure> {
    * @param b the second procedure to be compared.
    * @return true if the two procedures have the same parent
    */
-  public static boolean haveSameParent(final Procedure a, final Procedure b) {
+  public static boolean haveSameParent(final Procedure<?> a, final Procedure<?> b) {
     if (a.hasParent() && b.hasParent()) {
       return a.getParentProcId() == b.getParentProcId();
     }
     return false;
   }
-}
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/hbase/blob/4143c017/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
----------------------------------------------------------------------
diff --git a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
index 43cce3a..adb27a8 100644
--- a/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
+++ b/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureEvent.java
@@ -50,6 +50,6 @@ public class ProcedureEvent<T> {
   @Override
   public String toString() {
     return getClass().getSimpleName() + " for " + object + ", ready=" + isReady() +
-        ", suspended procedures count=" + getSuspendedProcedures().size();
+        ", " + getSuspendedProcedures();
   }
 }
\ No newline at end of file