Posted to commits@hbase.apache.org by st...@apache.org on 2019/09/13 21:28:12 UTC

[hbase-operator-tools] branch master updated: HBASE-23021 [hbase-operator-tools] README edits in prep for release

This is an automated email from the ASF dual-hosted git repository.

stack pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hbase-operator-tools.git


The following commit(s) were added to refs/heads/master by this push:
     new cee1464  HBASE-23021 [hbase-operator-tools] README edits in prep for release
cee1464 is described below

commit cee1464097b0b4b86904d8c7037a37d1dc7d4110
Author: stack <st...@apache.org>
AuthorDate: Fri Sep 13 09:01:37 2019 -0700

    HBASE-23021 [hbase-operator-tools] README edits in prep for release
    
    Main changes:
    
     * Underline how hbck1 differs from hbck2 and hbck2 philosophy.
     * Make entry and section transitions more palatable.
     * Try to talk up repair process. Explain fix hbase:meta first and then
     everything else. Bulk up general rules.
---
 hbase-hbck2/README.md                              | 333 ++++++++++++---------
 .../src/main/java/org/apache/hbase/HBCK2.java      |  72 ++---
 2 files changed, 233 insertions(+), 172 deletions(-)

diff --git a/hbase-hbck2/README.md b/hbase-hbck2/README.md
index fceebd5..5d71358 100644
--- a/hbase-hbck2/README.md
+++ b/hbase-hbck2/README.md
@@ -18,38 +18,60 @@
 
 # Apache HBase HBCK2 Tool
 
-HBCK2 is the successor to [hbck](https://hbase.apache.org/book.html#hbck.in.depth),
-the hbase-1.x fixup tool (A.K.A _hbck1_). Use it in place of _hbck1_ making repairs
-against hbase-2.x installs.
+_HBCK2_ is the repair tool for Apache HBase clusters.
 
-_HBCK2_ differs from _hbck1_ philosophically. Each run performs a discrete task rather than
-presume the tool can  repair 'all problems'. It is more of the vein of
-[`plumbing` than `porecelain`](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain).
+Problems in operation are bugs.
+An _HBCK2_ fix is meant as a workaround until the bug is fixed and
+deployed in a new hbase version.
 
-## _hbck1_
-The _hbck_ tool that ships with hbase-1.x (A.K.A _hbck1_) should not be run against an
-hbase-2.x cluster. It may do damage. While _hbck1_ is still bundled inside hbase-2.x
--- to minimize surprise -- it's write-facility (`-fix`) has been removed. It can report
-on the state of an hbase-2.x cluster but its assessments are likely inaccurate since it
-does not understand the internal workings of an hbase-2.x.
+## _HBCK2_ vs _hbck1_
+HBCK2 is the successor to [hbck](https://hbase.apache.org/book.html#hbck.in.depth),
+the repair tool that shipped with _hbase-1.x_ (A.K.A _hbck1_).  Use _HBCK2_ in place of
+_hbck1_ when making repairs against hbase-2.x clusters. _hbck1_ should not be run against an
+hbase-2.x install. It may do damage. While _hbck1_ is still bundled inside hbase-2.x
+-- to minimize surprise -- it is deprecated, to be removed in _hbase-3.x_. Its
+write-facility (`-fix`) has been removed. It can report on the state of an hbase-2.x
+cluster but its assessments will be inaccurate since it does not understand the internal
+workings of hbase-2.x.
+
+_HBCK2_ does not work the way _hbck1_ used to. See the next section for how.
+
+## Philosophy
+_HBCK2_ performs a single discrete 'fix' task each time it is run. It does not presume
+a tool can analyze everything about the running cluster and then repair 'all problems' found, as
+_hbck1_ used to suggest. _HBCK2_ is a tool that is more in the vein of
+[`plumbing` than `porcelain`](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain).
+
+The _HBCK2_ tool makes fixes. For listings of inconsistencies or blockages in the running cluster,
+you go elsewhere, to the logs and UI of the running cluster Master. Once an issue has been identified,
+you use the _HBCK2_ tool to ask the Master to effect fixes or to skip-over bad state. Asking the
+Master for problems and to make fixes rather than try and effect the repair locally in a fix-it
+tool's context is another important difference between _HBCK2_ and _hbck1_. More on how this
+interactive fix-it process works and on _HBCK2_ workings can be found in sections that follow.
+
+## Obtaining _HBCK2_
+Releases can be found under the HBase distribution directory. See the
+[HBASE Downloads Page](http://hbase.apache.org/downloads.html).
 
 ## Building _HBCK2_
 
 Run:
 ```
-mvn install
+$ mvn install
 ```
 The built _HBCK2_ jar will be in the `target` sub-directory.
 
 ## Running _HBCK2_
 The _HBCK2_ jar does not include dependencies; it is not built as a 'fat' jar.
 Dependencies must be `provided`. Building, adjusting the target hbase version in the
-top-level pom to match your deploy will make for the smoothest operation (See
-the parent pom.xml `hbase-operator-tools` for the
+top-level pom to match your deploy will make for the smoothest operation when run
+against your deploy (See the parent pom.xml `hbase-operator-tools` for the
 [hbase.version to set](https://github.com/apache/hbase-operator-tools/blob/master/pom.xml#L126)).
-Where this can get interesting is at runtime when _HBCK2_ is in advance of your hbase
-deploy such that your hbase does not support all APIs in current _HBCK2_. Where
-_HBCK2_ does not have needed server-side support it should fail gracefully.
+
+Where runtime interaction between _HBCK2_ and the running cluster can get interesting is
+when _HBCK2_ is in advance of your hbase deploy such that your hbase does not support
+all APIs in current _HBCK2_. Where _HBCK2_ does not have needed server-side support
+it should fail gracefully. Use an older release or upgrade your cluster (if you can).
 
 The easiest means of 'providing' _HBCK2_ its dependencies is by launching
 _HBCK2_ via the `$HBASE_HOME/bin/hbase` script. The `bin/hbase` script natively
@@ -80,30 +102,28 @@ Command:
  addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
    Options:
     -d,--force_disable aborts fix for table if disable fails.
-   To be used in scenarios where some regions may be missing in META,
-   but there's still a valid 'regioninfo' metadata file on HDFS.
-   This is a lighter version of 'OfflineMetaRepair tool commonly used for
-   similar issues on 1.x release line.
-   This command needs META to be online. For each table name passed as
-   parameter, it performs a diff between regions available in META,
-   against existing regions dirs on HDFS. Then, for region dirs with
-   no matches in META, it reads regioninfo metadata file and
-   re-creates given region in META. Regions are re-created in 'CLOSED'
-   state at META table only, but not in Masters' cache, and are not
-   assigned either. To get these regions online, run HBCK2 'assigns'command
-   printed at the end of this command results for convenience.
-
+   To be used when some regions may be missing from hbase:meta
+   but their directories are present in HDFS. This is a 'lighter'
+   version of 'OfflineMetaRepair' tool commonly used for similar
+   issues in hbase-1.x. This command needs hbase:meta to be online.
+   For each table name passed as parameter, it performs a diff
+   between regions available in hbase:meta and region dirs on HDFS.
+   Then for dirs with no hbase:meta matches, it reads the 'regioninfo'
+   metadata file and re-creates given region in hbase:meta. Regions are
+   re-created in 'CLOSED' state in the hbase:meta table, but not in the
+   Masters' cache, and they are not assigned either. To get these
+   regions online, run the HBCK2 'assigns' command printed when this
+   command-run completes.
    NOTE: If using hbase releases older than 2.3.0, a rolling restart of
    HMasters is needed prior to executing the provided 'assigns' command.
-
-   An example adding missing regions for tables 'tbl_1' on default
-   namespace, 'tbl_2' on namespace 'n1' and for all tables from
+   An example adding missing regions for tables 'tbl_1' in the default
+   namespace, 'tbl_2' in namespace 'n1' and for all tables from
    namespace 'n2':
      $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
-   Returns HBCK2 'assigns' command with all re-inserted regions.
+   Returns an HBCK2 'assigns' command with all re-inserted regions.
    SEE ALSO: reportMissingRegionsInMeta
 
- assigns [OPTIONS] <ENCODED_REGIONNAME>...
+    assigns [OPTIONS] <ENCODED_REGIONNAME>...
    Options:
     -o,--override  override ownership by another procedure
    A 'raw' assign that can be used even during Master initialization (if
@@ -127,26 +147,40 @@ Command:
    to finish parent and children. This is SLOW, and dangerous so use
    selectively. Does not always work.
 
- filesystem [OPTIONS] [<TABLENAME...]
+ filesystem [OPTIONS] [<TABLENAME>...]
+   Options:
+    -f, --fix    sideline corrupt hfiles, bad links, and references.
+   Report on corrupt hfiles, references, broken links, and integrity.
+   Pass '--fix' to sideline corrupt files and links. '--fix' does NOT
+   fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or
+   more tablenames to narrow checkup. Default checks all tables and
+   restores 'hbase.version' if missing. Interacts with the filesystem
+   only! Modified regions need to be reopened to pick-up changes.
+
+ fixMeta
+   Do a server-side fixing of bad or inconsistent state in hbase:meta.
+   Repairs 'holes' and 'overlaps' in hbase:meta.
+   SEE ALSO: reportMissingRegionsInMeta
+
+ replication [OPTIONS] [<TABLENAME>...]
    Options:
-    -f, --fix    sideline corrupt hfiles, bad links and references.
-   Report corrupt hfiles and broken links. Pass '--fix' to sideline
-   corrupt files and links. Pass one or more tablenames to narrow the
-   checkup. Default checks all tables. Modified regions will need to be
-   reopened to pick-up changes.
+    -f, --fix    fix any replication issues found.
+   Looks for undeleted replication queues and deletes them if passed the
+   '--fix' option. Pass a table name to check for replication barrier and
+   purge if '--fix'.
 
  reportMissingRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
-   To be used in scenarios where some regions may be missing in META,
-   but there's still a valid 'regioninfo metadata file on HDFS.
-   This is a checking only method, designed for reporting purposes and
-   doesn't perform any fixes, providing a view of which regions (if any)
-   would get re-added to meta, grouped by respective table/namespace.
-   To effectively re-add regions in meta, addFsRegionsMissingInMeta should be executed.
-   This command needs META to be online. For each namespace/table passed
-   as parameter, it performs a diff between regions available in META,
-   against existing regions dirs on HDFS. Region dirs with no matches
-   are printed grouped under its related table name. Tables with no
-   missing regions will show a 'no missing regions' message. If no
+   To be used when some regions may be missing from hbase:meta
+   but their directories are present in HDFS. This is a checking only
+   method, designed for reporting purposes and doesn't perform any
+   fixes, providing a view of which regions (if any) would get re-added
+   to meta, grouped by respective table/namespace. To effectively
+   re-add regions in meta, run addFsRegionsMissingInMeta.
+   This command needs hbase:meta to be online. For each namespace/table
+   passed as parameter, it performs a diff between regions available in
+   hbase:meta against existing regions dirs on HDFS. Region dirs with no
+   matches are printed grouped under its related table name. Tables with
+   no missing regions will show a 'no missing regions' message. If no
    namespace or table is specified, it will verify all existing regions.
    It accepts a combination of multiple namespace and tables. Table names
    should include the namespace portion, even for tables in the default
@@ -188,14 +222,14 @@ Command:
      $ HBCK2 setTableState users ENABLED
    Returns whatever the previous table state was.
 
- scheduleRecovery <SERVERNAME>...
+ scheduleRecoveries <SERVERNAME>...
    Schedule ServerCrashProcedure(SCP) for list of RegionServers. Format
    server name as '<HOSTNAME>,<PORT>,<STARTCODE>' (See HBase UI/logs).
    Example using RegionServer 'a.example.org,29100,1540348649479':
-     $ HBCK2 scheduleRecovery a.example.org,29100,1540348649479
+     $ HBCK2 scheduleRecoveries a.example.org,29100,1540348649479
    Returns the pid(s) of the created ServerCrashProcedure(s) or -1 if
    no procedure created (see master logs for why not).
-   Command only supported in hbase versions 2.0.3, 2.1.2, 2.2.0 (or newer).
+   Command support added in hbase versions 2.0.3, 2.1.2, 2.2.0 or newer.
 
  unassigns <ENCODED_REGIONNAME>...
    Options:
@@ -211,7 +245,6 @@ Command:
    SEE ALSO, org.apache.hbase.hbck1.OfflineMetaRepair, the offline
    hbase:meta tool. See the HBCK2 README for how to use.
 ```
-
 Note that when you pass `bin/hbase` the `hbck` argument, it will by
 default use the shaded client to get to the targeted hbase cluster.
 This is sufficient for most _HBCK2_ usage. If you run into complaints
@@ -233,38 +266,38 @@ Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
 ... it is because the HDFS jars are not on the CLASSPATH. The default is NOT
 to bundle HDFS jars on the CLASSPATH when running `hbck` via `bin/hbase`. Define
 `HADOOP_HOME` in the environment so `bin/hbase` can find your local hadoop
-install and load its HDFS jars.
+install and then it will load its HDFS jars.
 
 ## _HBCK2_ Overview
 _HBCK2_ is currently a simple tool that does one thing at a time only.
 
-In hbase-2.x, the Master is the final arbiter of all state, so a general principal for most of
+In hbase-2.x, the Master is the final arbiter of all state, so a general principle for most
 _HBCK2_ commands is that it asks the Master to effect all repair. This means a Master must be
-up before you can run (most) _HBCK2_ commands.
+up before you can run _HBCK2_ commands.
 
-_HBCK2_ implementation approach is to make use of an
+The _HBCK2_ implementation approach is to make use of an
 `HbckService` hosted on the Master. The Service publishes a few methods for the _HBCK2_ tool to
 pull on. Therefore, for _HBCK2_ commands relying on Master's `HbckService` facade,
 first thing _HBCK2_ does is poke the cluster to ensure the service is available.
 This will fail if the remote Server does not publish the Service or if the
-`HbckService` is lacking the requested method.
+`HbckService` is lacking the requested method. For the latter case, if you can,
+update your cluster to obtain more fix facility.
 
 _HBCK2_ versions should be able to work across multiple hbase-2 releases. It will
 fail with a complaint if it is unable to run. There is no `HbckService` in versions
 of hbase before 2.0.3 and 2.1.1. _HBCK2_ will not work against these versions.
 
-As _HBCK2_ evolves independently from _HBase_ main project, there will be eventually the need to
-define new fix methods with client side implementations (at least until a related one can be added
-on Master's `HbckService` facade), so that _HBCK2_ can operate on such _HBase_ releases without
-requiring a cluster upgrade. One example of such methods is the _setRegionState_.
+Next we look at how you 'find' issues in your running cluster, followed by
+a section on how you 'fix' the problems found.
 
 ## Finding Problems
 
 While _hbck1_ performed analysis reporting your cluster GOOD or BAD, _HBCK2_
 is less presumptuous. In hbase-2.x, the operator figures what needs fixing and
-then uses tooling including _HBCK2_ to do fixup.
+then uses tooling including _HBCK2_ to do fixup. The operator may have to go
+a few rounds of back and forth running _HBCK2_ then checking cluster state.
 
-To figure issues in assignment, make use of the following utilities.
+To figure out cluster issues, make use of the following utilities and emissions.
 
 ### Diagnosis Tooling
 
@@ -283,7 +316,7 @@ Procedure's various stages to finish. Some Procedures spawn sub-procedures,
 wait on their Children, and then themselves finish. Each child logs
 its _pid_ but also its _ppid_; its parent's _pid_.
 
-Generally all runs problem free but if some unforeseen circumstance
+Generally all run problem free but if some unforeseen circumstance
 arises, the assignment framework may sustain damage requiring
 operator intervention.  Below we will discuss some such scenarios
 but they can manifest in the Master log as a Region being _STUCK_ or
@@ -297,8 +330,7 @@ _STUCK_ Procedures look like this:
 2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=va1001.example.org,22101,1536173230599, table=IntegrationTestBigLinkedList_20180626110336, region=dbdb56242f17610c46ea044f7a42895b
 ```
 
-
-#### /master-status#tables
+#### Master UI: /master-status#tables
 
 This section about midway down in Master UI home-page shows a list of tables
 with columns for whether the table is _ENABLED_, _ENABLING_, _DISABLING_, or
@@ -310,7 +342,7 @@ _ENABLED_ and there are Regions that are not in the _OPEN_ state
 and the Master Log is silent about any ongoing assigns, then
 something is amiss.
 
-#### Procedures & Locks
+#### Master UI: 'Procedures & Locks'
 
 This page off the Master UI home page under the
 _Procedures & Locks_ menu item in the page heading lists all ongoing
@@ -331,7 +363,7 @@ $ echo "list_locks"| hbase shell &> /tmp/locks.txt
 $ echo "list_procedures"| hbase shell &> /tmp/procedures.txt
 ```
 
-#### The 'HBCK Report'
+#### Master UI: The 'HBCK Report'
 An `HBCK Report` page was added to the Master in versions hbase 2.3.0/2.1.6/2.2.1
 at `/hbck.jsp`
 which shows output from two inspections run by the master on an interval; one
@@ -391,15 +423,22 @@ queue a new Assign Procedure (watch the Master logs to see the
 Assign run). If many Regions to assign, use the _HBCK2_ tool. It
 can do bulk assigning.
 
-## Fixing
+## Fixing Problems
 
-General principals include a Region can not be assigned if
-it is in _CLOSING_ state (or the inverse, unassigned if in
-_OPENING_ state) without first transitioning via _CLOSED_:
-Regions must always move from _CLOSED_, to _OPENING_, to _OPEN_,
-and then to _CLOSING_, _CLOSED_.
+### Some General Principles
+When making repairs, make sure hbase:meta is consistent
+before you go about fixing any other issue type such as a filesystem
+deviance. Deviance in the filesystem or problems with assign should
+be addressed after hbase:meta has been put in order. If hbase:meta
+is out of whack, the Master cannot make proper placements when adopting orphan
+filesystem data or making region assignments.
 
-When making repair, do fixup a table at a time.
+Other general principles to keep in mind: a Region cannot be assigned if
+it is in _CLOSING_ state (or the inverse, unassigned if in _OPENING_ state) without
+first transitioning via _CLOSED_. Regions must always move from _CLOSED_, to _OPENING_,
+to _OPEN_, and then to _CLOSING_, _CLOSED_.
+
+When making repairs, fix up one table at a time.
 
 Also, if a table is _DISABLED_, you cannot assign a Region.
 In the Master logs, you will see that the Master will report
@@ -413,25 +452,9 @@ assign, and then set it back again after the unassign.
 _HBCK2_ has facility to allow you do this. See the
 _HBCK2_ usage output.
 
-### Start-over
-
-At an extreme, if the Master is distraught and all attempts at fixup only
-turn up undoable locks or Procedures that won't finish, and/or
-the set of MasterProcWALs is growing without bound, it is
-possible to wipe the Master state clean. Just move aside the
-_/hbase/MasterProcWALs/_ directory under your hbase install and
-restart the Master process. It will come back as a `tabula rasa` without
-memory of the bad times past.
-
-If at the time of the erasure, all Regions were happily
-assigned or offlined, then on Master restart, the Master should
-pick up and continue as though nothing happened. But if there were Regions-In-Transition
-at the time, then the operator may have to intervene to bring outstanding
-assigns/unassigns to their terminal point. Read the _hbase:meta_
-_info:state_ columns as described above to figure what needs
-assigning/unassigning. Having erased all history moving aside
-the _MasterProcWALs_, none of the entities should be locked so
-you are free to bulk assign/unassign.
+What follows is a mix of notes and prescriptions that come of experience running hbase-2.x so far.
+The root issues that brought on the states described below have been fixed in later versions of hbase,
+so upgrade if you can to avoid the scenarios described.
 
 ### Assigning/Unassigning
 
@@ -456,7 +479,6 @@ _hbase:meta_ (or _hbase:namespace_). To inject one, use the _HBCK2_ tool:
 ```
 HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 assigns -skip 1588230740
 ```
-
 ...where 1588230740 is the encoded name of the _hbase:meta_ Region. Pass the '-skip' option to
 stop HBCK2 doing a version check against the remote master. If the remote master is not up,
 the version check will prompt a 'Master is initializing response' or 'PleaseHoldException'
@@ -468,43 +490,50 @@ encoded Region name of the _hbase:namespace_ Region and do similar to
 what we did for _hbase:meta_. In this latter case, the Master actually
 prints out a helpful message that looks like the following:
 
-```2019-07-09 22:08:38,966 WARN  [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562733904278.9559cf72b8e81e1291c626a8e781a6ae. is NOT online; state={9559cf72b8e81e1291c626a8e781a6ae state=CLOSED, ts=1562735318897, server=null}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.```
+```
+2019-07-09 22:08:38,966 WARN  [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562733904278.9559cf72b8e81e1291c626a8e781a6ae. is NOT online; state={9559cf72b8e81e1291c626a8e781a6ae state=CLOSED, ts=1562735318897, server=null}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
+```
 
 To schedule an assign for the hbase:namespace table noted in the above log line, you would do:
-```HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 -skip assigns 9559cf72b8e81e1291c626a8e781a6ae```
+```
+ $ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ./hbase-hbck2-1.0.0-SNAPSHOT.jar -skip assigns 9559cf72b8e81e1291c626a8e781a6ae
+```
 ... passing the encoded name for the namespace region (the encoded name will differ per deploy).
 
-### Missing Regions in META - hbase:meta region/table restore/rebuild
+### Missing Regions in hbase:meta - region/table restore/rebuild
 
-There's been some extra-ordinary cases where table regions are removed from META table.
-Some triage on such cases revealed those were operator-induced, after execution
-attempts of the obsolete *hbck1* _OfflineMetaRepair_ tool. _OfflineMetaRepair_ is a well known tool
-for fixing META table related issues on HBase 1.x versions. The original version is not compatible
-with HBase 2.x or higher versions, and it has undergone some adjustments to be now run within hbck2.
+There have been some unusual cases where table regions have been removed from the hbase:meta table.
+Some triage on such cases revealed these were operator-induced. Users had run the obsolete
+*hbck1* _OfflineMetaRepair_ tool against an hbase-2.x cluster. _OfflineMetaRepair_ is a well
+known tool for fixing hbase:meta table related issues on HBase 1.x versions. The original
+version is not compatible with HBase 2.x or higher versions, and it has undergone some
+adjustments so that, in the extreme, it can now be run via _HBCK2_.
 
-In most of these cases, regions may be missing in meta at random, but hbase may still be
-operational. In such situations, problem can be addressed with master online, using _addFsRegionsMissingInMeta_ command.
-This command is less disruptive to hbase than the full meta rebuild covered later, and it can be used even for
-recovering _namespace_ table region.
+In most of these cases, regions end up missing in hbase:meta at random, but hbase may still be
+operational. In such situations, the problem can be addressed with the Master online,
+using the _addFsRegionsMissingInMeta_ command in _HBCK2_. This command is less disruptive to
+hbase than the full hbase:meta rebuild covered later, and it can be used even for
+recovering the _namespace_ table region.
 
-#### Online meta rebuild recipe
+#### Online hbase:meta rebuild recipe
 
-If meta corruption is not too critical, hbase would still be able to bring it online. Even if namespace region
-is among the missing ones in meta, it will still be possible to scan meta in the initialization period,
-where master will be waiting for namespace to be assigned. To verify on this, a meta scan command can be executed
-as below. If it does not time out or show any errors, _meta_ is online:
+If hbase:meta corruption is not too critical, hbase will still be able to bring it online. Even if the namespace region
+is among the missing regions, it will still be possible to scan hbase:meta during the
+initialization period, when the Master is waiting for namespace to be assigned.
+To verify this situation, an hbase:meta scan command can be executed
+as below. If it does not time out or show any errors, _hbase:meta_ is online:
 
 ```
 echo "scan 'hbase:meta', {COLUMN=>'info:regioninfo'}" | hbase shell
 ```
 
-HBCK2 _addFsRegionsMissingInMeta_ can be used if the above does not show any errors. It reads region
-metadata info available on the FS region dirs, in order to re-create regions in META. Since it can
-run with hbase partially operational, it attempts to disable online tables that are affected by the
-reported problem and is gonna have regions re-added to _meta_.
-It can check for specific tables/namespaces, or all tables
-from all namespaces. An example adding missing regions for tables 'tbl_1' on default namespace,
-'tbl_2' on namespace 'n1' and for all tables from namespace 'n2':
+_HBCK2_ _addFsRegionsMissingInMeta_ can be used if the above does not show any errors. It reads region
+metadata info available in the FS region directories in order to recreate regions
+in hbase:meta. Since it can run with hbase partially operational, it attempts to disable online tables
+that are affected by the reported problem and then re-adds regions to _hbase:meta_.
+It can check for specific tables/namespaces, or all tables from all namespaces.
+The example below adds missing regions for tables 'tbl_1' in the default namespace,
+'tbl_2' in namespace 'n1', and for all tables from namespace 'n2':
 
 ```
 $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
@@ -519,38 +548,50 @@ command needs to be executed later, so copy and save it for convenience.
 2. For HBase versions prior to 2.3.0, after _addFsRegionsMissingInMeta_ finished successfully and output has been saved,
 restart all running HBase Masters.
 
-3. Once Master's are restarted and META is already online (check if Web UI is accessible), run
+3. Once the Masters are restarted and hbase:meta is online (check if Web UI is accessible), run
 _assigns_ command from _addFsRegionsMissingInMeta_ output saved per instructions from #1.
 
-NOTE: If _namespace_ region is among the missing ones, you will need to add _--skip_ flag at the
+NOTE: If the _namespace_ region is among the missing regions, you will need to add the _--skip_ flag at the
 beginning of _assigns_ command returned.
 
-
-Should a cluster suffer a catastrophic loss of the `hbase:meta` region, a rough rebuild is possible following the below recipe.
-In outline: stop the cluster; run the _OfflineMetaRepair_ tool which reads directories and metadata dropped into the filesystem making a best effort at reconstructing a viable _hbase:meta_ table; restart your cluster; inject an assign to bring the system namespace table online; and then finally, re-assign userspace tables you'd like enabled (the rebuilt _hbase:meta_ creates a table with all tables offline and no regions assigned).
+Should a cluster suffer a catastrophic loss of the `hbase:meta` table, a rough rebuild is possible using the following recipe.
+In outline, we stop the cluster; run the _HBCK2_ _OfflineMetaRepair_ tool which reads directories and metadata dropped into the filesystem
+making a best effort at reconstructing a viable _hbase:meta_ table; restart your cluster; inject an assign to bring the system
+namespace table online; and then finally, re-assign userspace tables you'd like enabled (the rebuilt _hbase:meta_ creates a table with all tables offline and no regions assigned).
 
 #### Detailed rebuild recipe
 Stop the cluster.
 
 Run the rebuild _hbase:meta_ command from _HBCK2_. This will move aside the original _hbase:meta_ and put in place a newly rebuilt one. Below is an example of how to run the tool.  It adds the `-details` flag so the tool dumps info on the regions its found in hdfs:
-```$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.hbck1.OfflineMetaRepair -details```
+```
+$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.hbck1.OfflineMetaRepair -details
+```
 
 Start the cluster up. It won’t come up fully. It will be stuck because the _namespace_ table is not online and there is no assign procedure in the procedure store for this contingency. The hbase master log will show this state. Here is an example of what it will log:
-```2019-07-10 18:30:51,090 WARN  [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562808216225.725a0fe6c2c869d3d0a9ed82bfa80fa3. is NOT online; state={725a0fe6c2c869d3d0a9ed82bfa80fa3 state=CLOSED, ts=1562808619952, server=null}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.```
+```
+2019-07-10 18:30:51,090 WARN  [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562808216225.725a0fe6c2c869d3d0a9ed82bfa80fa3. is NOT online; state={725a0fe6c2c869d3d0a9ed82bfa80fa3 state=CLOSED, ts=1562808619952, server=null}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
+```
 
 To assign the namespace table region, you cannot use the shell. If you use the shell, it will fail with a `PleaseHoldException` because the master is not yet up (it is waiting for the namespace table to come online before it declares itself ‘up’). You have to use the `HBCK2` _assigns_ command. To assign, you will need the namespace encoded name. It shows in the log quoted above: i.e. _725a0fe6c2c869d3d0a9ed82bfa80fa3_ in this case. You will also have to pass the -skip command to ‘skip’ th [...]
-```$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3```
+```
+$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
+```
 
 If the invocation comes back with ‘Connection refused’, is the Master up? The Master will shut down after a while if it can’t initialize itself. Just restart the cluster/master and rerun the above assigns command.
 
 When the assigns runs successfully, you’ll see it emit the likes of the following. The ‘48’ on the end is the pid of the assign procedure scheduled. If the pid returned is ‘-1’, then the master startup has not progressed sufficiently… retry. Or, the encoded regionname is incorrect. Check.
-{{{$  HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
+```
+$  HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
 18:40:43.817 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 18:40:44.315 [main] INFO  org.apache.hbase.HBCK2 - hbck support check skipped
-[48]}}}
+[48]
+```
 
 Check the master logs. The master should have come up. You’ll see successful completion of pid=48. Look for a line like this to verify successful master launch:
-```master.HMaster: Master has completed initialization 132.515sec``` It might take a while to appear.
+```
+master.HMaster: Master has completed initialization 132.515sec
+```
+It might take a while to appear.
 
 The rebuild of _hbase:meta_ adds the user tables in _DISABLED_ state and the regions in _CLOSED_ mode. Reenable tables via the shell to bring all table regions back online.
 Do it one-at-a-time or see the `enable_all ".*"` command to enable all tables in one shot.
@@ -559,7 +600,27 @@ The rebuild meta will likely be missing edits and may need subsequent repair and
 
 ### Dropped reference files, missing hbase.version file, and corrupted hfiles
 
-HBCK2 can check for hanging references and corrupt hfiles. You can ask it to sideline bad files which may be needed to get over humps where regions won't online or reads are failing. See the _filesystem_ command in the HBCK2 listing. Pass one or more tablename (or 'none' to check all tables). It will report bad files. Pass the _--fix_ option to effect repairs.
+_HBCK2_ can check for hanging references and corrupt hfiles. You can ask it to sideline bad files, which may be needed to get over humps where regions won't come online or reads are failing. See the _filesystem_ command in the _HBCK2_ listing. Pass one or more table names (or 'none' to check all tables). It will report bad files. Pass the _--fix_ option to effect repairs.
+
+### Procedure Start-over
+
+At an extreme, as a last resort, if the Master is distraught and all
+attempts at fixup only turn up locks that cannot be released or Procedures that won't finish, and/or
+the set of MasterProcWALs is growing without bound, it is
+possible to wipe the Master state clean. Just move aside the
+_/hbase/MasterProcWALs/_ directory under your hbase install and
+restart the Master process. It will come back as a `tabula rasa` without
+memory of the bad times past.
+
+If at the time of the erasure, all Regions were happily
+assigned or offlined, then on Master restart, the Master should
+pick up and continue as though nothing happened. But if there were Regions-In-Transition
+at the time, then the operator will have to intervene to bring outstanding
+assigns/unassigns to their terminal point. Read the _hbase:meta_
+_info:state_ columns as described above to figure out what needs
+assigning/unassigning. Having erased all history by moving aside
+the _MasterProcWALs_, none of the entities should be locked so
+you are free to bulk assign/unassign.
 
 ### Adopting 'Orphan' Data
 For how to fix `orphan` regions reported by the 'HBCK Chore',
diff --git a/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java b/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java
index 62693ba..067e7bb 100644
--- a/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java
+++ b/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java
@@ -371,28 +371,26 @@ public class HBCK2 extends Configured implements org.apache.hadoop.util.Tool {
       + "NAMESPACE:TABLENAME>...");
     writer.println("   Options:");
     writer.println("    -d,--force_disable aborts fix for table if disable fails.");
-    writer.println("   To be used in scenarios where some regions may be missing in META,");
-    writer.println("   but there's still a valid 'regioninfo' metadata file on HDFS. ");
-    writer.println("   This is a lighter version of 'OfflineMetaRepair tool commonly used for ");
-    writer.println("   similar issues on 1.x release line. ");
-    writer.println("   This command needs META to be online. For each table name passed as");
-    writer.println("   parameter, it performs a diff between regions available in META, ");
-    writer.println("   against existing regions dirs on HDFS. Then, for region dirs with ");
-    writer.println("   no matches in META, it reads regioninfo metadata file and ");
-    writer.println("   re-creates given region in META. Regions are re-created in 'CLOSED' ");
-    writer.println("   state at META table only, but not in Masters' cache, and are not ");
-    writer.println("   assigned either. To get these regions online, run HBCK2 'assigns'command ");
-    writer.println("   printed at the end of this command results for convenience.");
-    writer.println();
-    writer.println("   NOTE: If using hbase releases older than 2.3.0, a rolling restart of ");
-    writer.println("   HMasters is needed prior to executing the provided 'assigns' command. ");
-    writer.println();
-    writer.println("   An example adding missing regions for tables 'tbl_1' on default ");
-    writer.println("   namespace, 'tbl_2' on namespace 'n1' and for all tables from ");
-    writer.println("   namespace 'n2': ");
+    writer.println("   To be used when some regions may be missing from hbase:meta");
+    writer.println("   but their directories are present in HDFS. This is a 'lighter'");
+    writer.println("   version of 'OfflineMetaRepair' tool commonly used for similar");
+    writer.println("   issues in hbase-1.x. This command needs hbase:meta to be online.");
+    writer.println("   For each table name passed as parameter, it performs a diff");
+    writer.println("   between regions available in hbase:meta and region dirs on HDFS.");
+    writer.println("   Then for dirs with no hbase:meta matches, it reads the 'regioninfo'");
+    writer.println("   metadata file and re-creates given region in hbase:meta. Regions are");
+    writer.println("   re-created in 'CLOSED' state in the hbase:meta table, but not in the");
+    writer.println("   Masters' cache, and they are not assigned either. To get these");
+    writer.println("   regions online, run the HBCK2 'assigns' command printed when this");
+    writer.println("   command-run completes.");
+    writer.println("   NOTE: If using hbase releases older than 2.3.0, a rolling restart of");
+    writer.println("   HMasters is needed prior to executing the provided 'assigns' command.");
+    writer.println("   An example adding missing regions for tables 'tbl_1' in the default");
+    writer.println("   namespace, 'tbl_2' in namespace 'n1' and for all tables from");
+    writer.println("   namespace 'n2':");
     writer.println("     $ HBCK2 " + ADD_MISSING_REGIONS_IN_META_FOR_TABLES +
-      " default:tbl_1 n1:tbl_2 n2 ");
-    writer.println("   Returns HBCK2 'assigns' command with all re-inserted regions.");
+      " default:tbl_1 n1:tbl_2 n2");
+    writer.println("   Returns an HBCK2 'assigns' command with all re-inserted regions.");
     writer.println("   SEE ALSO: " + REPORT_MISSING_REGIONS_IN_META);
     writer.println();
     writer.println(" " + ASSIGNS + " [OPTIONS] <ENCODED_REGIONNAME>...");
@@ -437,7 +435,9 @@ public class HBCK2 extends Configured implements org.apache.hadoop.util.Tool {
     writer.println("   only! Modified regions need to be reopened to pick-up changes.");
     writer.println();
     writer.println(" " + FIX_META);
-    writer.println("   Do a server-side fixing of bad or inconsistent state in hbase:meta");
+    writer.println("   Do a server-side fixing of bad or inconsistent state in hbase:meta.");
+    writer.println("   Repairs 'holes' and 'overlaps' in hbase:meta.");
+    writer.println("   SEE ALSO: " + REPORT_MISSING_REGIONS_IN_META);
     writer.println();
     writer.println(" " + REPLICATION + " [OPTIONS] [<TABLENAME>...]");
     writer.println("   Options:");
@@ -448,18 +448,18 @@ public class HBCK2 extends Configured implements org.apache.hadoop.util.Tool {
     writer.println();
     writer.println(" " + REPORT_MISSING_REGIONS_IN_META + " <NAMESPACE|"
       + "NAMESPACE:TABLENAME>...");
-    writer.println("   To be used in scenarios where some regions may be missing in META,");
-    writer.println("   but there's still a valid 'regioninfo metadata file on HDFS. ");
-    writer.println("   This is a checking only method, designed for reporting purposes and");
-    writer.println("   doesn't perform any fixes, providing a view of which regions (if any) ");
-    writer.println("   would get re-added to meta, grouped by respective table/namespace. ");
-    writer.println("   To effectively re-add regions in meta, "
-      + ADD_MISSING_REGIONS_IN_META_FOR_TABLES + " should be executed. ");
-    writer.println("   This command needs META to be online. For each namespace/table passed");
-    writer.println("   as parameter, it performs a diff between regions available in META, ");
-    writer.println("   against existing regions dirs on HDFS. Region dirs with no matches");
-    writer.println("   are printed grouped under its related table name. Tables with no");
-    writer.println("   missing regions will show a 'no missing regions' message. If no");
+    writer.println("   To be used when some regions may be missing from hbase:meta");
+    writer.println("   but their directories are present in HDFS. This is a checking only");
+    writer.println("   method, designed for reporting purposes and doesn't perform any");
+    writer.println("   fixes, providing a view of which regions (if any) would get re-added");
+    writer.println("   to meta, grouped by respective table/namespace. To actually");
+    writer.println("   re-add regions in meta, run " + ADD_MISSING_REGIONS_IN_META_FOR_TABLES +
+        ".");
+    writer.println("   This command needs hbase:meta to be online. For each namespace/table");
+    writer.println("   passed as parameter, it performs a diff between regions available in");
+    writer.println("   hbase:meta and existing region dirs on HDFS. Region dirs with no");
+    writer.println("   matches are printed grouped under their related table name. Tables with");
+    writer.println("   no missing regions will show a 'no missing regions' message. If no");
     writer.println("   namespace or table is specified, it will verify all existing regions.");
     writer.println("   It accepts a combination of multiple namespace and tables. Table names");
     writer.println("   should include the namespace portion, even for tables in the default");
@@ -470,7 +470,7 @@ public class HBCK2 extends Configured implements org.apache.hadoop.util.Tool {
     writer.println("   An example triggering missing regions report for table 'table_1'");
     writer.println("   under default namespace, and for all tables from namespace 'ns1':");
     writer.println("     $ HBCK2 reportMissingRegionsInMeta default:table_1 ns1");
-    writer.println("   Returns list of missing regions for each table passed as parameter, or ");
+    writer.println("   Returns list of missing regions for each table passed as parameter, or");
     writer.println("   for each table on namespaces specified as parameter.");
     writer.println();
     writer.println(" " + SET_REGION_STATE + " <ENCODED_REGIONNAME> <STATE>");
@@ -494,7 +494,7 @@ public class HBCK2 extends Configured implements org.apache.hadoop.util.Tool {
     writer.println(" " + SET_TABLE_STATE + " <TABLENAME> <STATE>");
     writer.println("   Possible table states: " + Arrays.stream(TableState.State.values()).
         map(Enum::toString).collect(Collectors.joining(", ")));
-    writer.println("   To read current table state, in the hbase shell run: ");
+    writer.println("   To read current table state, in the hbase shell run:");
     writer.println("     hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'");
     writer.println("   A value of \\x08\\x00 == ENABLED, \\x08\\x01 == DISABLED, etc.");
     writer.println("   Can also run a 'describe \"<TABLENAME>\"' at the shell prompt.");