Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/06/12 21:54:17 UTC

[GitHub] [hbase] HorizonNet commented on a change in pull request #1894: HBASE-21405 [DOC] Add Details about Output of "status 'replication'"

HorizonNet commented on a change in pull request #1894:
URL: https://github.com/apache/hbase/pull/1894#discussion_r439656217



##########
File path: src/main/asciidoc/_chapters/ops_mgt.adoc
##########
@@ -2629,6 +2629,91 @@ You can use the HBase Shell command `status 'replication'` to monitor the replic
 * `status 'replication', 'source'` -- prints the status for each replication source, sorted by hostname.
 * `status 'replication', 'sink'` -- prints the status for each replication sink, sorted by hostname.
 
+==== Understanding the output
+
+The command output will vary according to the state of replication. For example right after a restart
+and if destination peer is not reachable, no replication source threads would be running,
+so no metrics would get displayed:
+
+----
+hbase01.home:
+SOURCE: PeerID=1
+Normal Queue: 1
+No Reader/Shipper threads runnning yet.
+SINK: TimeStampStarted=1591985197350, Waiting for OPs...
+----
+
+Under normal circumstances, a healthy, active-active replication deployment would
+show the following:
+
+----
+    hbase01.home:
+      SOURCE: PeerID=1
+         Normal Queue: 1
+           AgeOfLastShippedOp=0, TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 BST 2020, SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, Replication Lag=0
+      SINK: TimeStampStarted=1591983663458, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Fri Jun 12 18:57:18 BST 2020
+----
+
+The definition for each of these metrics is detailed below:
+
+[cols="1,1,1", options="header"]
+|===
+| Type
+| Metric Name
+| Description
+
+| Source
+| AgeOfLastShippedOp
+| How long last successfully shipped edit took to effectively get replicated on target.
+
+| Source
+| TimeStampOfLastShippedOp
+| The actual date of last successful edit shipment.
+
+| Source
+| `

Review comment:
       Shouldn't this be `SizeOfLogQueue`?

##########
File path: src/main/asciidoc/_chapters/ops_mgt.adoc
##########
@@ -2629,6 +2629,91 @@ You can use the HBase Shell command `status 'replication'` to monitor the replic
 * `status 'replication', 'source'` -- prints the status for each replication source, sorted by hostname.
 * `status 'replication', 'sink'` -- prints the status for each replication sink, sorted by hostname.
 
+==== Understanding the output
+
+The command output will vary according to the state of replication. For example right after a restart
+and if destination peer is not reachable, no replication source threads would be running,
+so no metrics would get displayed:
+
+----
+hbase01.home:
+SOURCE: PeerID=1
+Normal Queue: 1
+No Reader/Shipper threads runnning yet.
+SINK: TimeStampStarted=1591985197350, Waiting for OPs...
+----
+
+Under normal circumstances, a healthy, active-active replication deployment would
+show the following:
+
+----
+    hbase01.home:
+      SOURCE: PeerID=1
+         Normal Queue: 1
+           AgeOfLastShippedOp=0, TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 BST 2020, SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, Replication Lag=0
+      SINK: TimeStampStarted=1591983663458, AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Fri Jun 12 18:57:18 BST 2020
+----
+
+The definition for each of these metrics is detailed below:
+
+[cols="1,1,1", options="header"]
+|===
+| Type
+| Metric Name
+| Description
+
+| Source
+| AgeOfLastShippedOp
+| How long last successfully shipped edit took to effectively get replicated on target.
+
+| Source
+| TimeStampOfLastShippedOp
+| The actual date of last successful edit shipment.
+
+| Source
+| `
+| Number of wal files on this given queue.
+
+| Source
+| EditsReadFromLogQueue
+| How many edits have been read from this given queue since this source thread started.
+
+| Source
+| OpsShippedToTarget
+| How many edits have been shipped to target since this source thread started.
+
+| Source
+| TimeStampOfNextToReplicate
+| Date of the current edit been attempted to replicate.
+
+| Source
+| Replication Lag
+| The elapsed time (in millis), since the last edit to replicate was read by this source
+thread and effectively replicated to target
+
+| Sink
+| TimeStampStarted
+| Date (in millis) of when this Sink thread started.
+
+| Sink
+| AgeOfLastAppliedOp
+| How long it took to apply the last successful shipped edit.
+
+| Sink
+| TimeStampsOfLastAppliedOp
+| Date of last successful applied edit.
+
+|===
+
+Growing values for `Source.TimeStampsOfLastAppliedOp` and/or
+`Source.Replication Lag` would indicate replication delays. If those numbers keep going
+up, while `Source.TimeStampOfLastShippedOp`, `Source.EditsReadFromLogQueue`,
+`Source.OpsShippedToTarget` or `Source.TimeStampOfNextToReplicate` do not change at all,
+ then replication flow is failing to progress, and there might be problems within
+clusters communication. This could also happen if replication is manually paused
+(via hbase shell `disable_peer`command, for example), but date keeps getting ingested

Review comment:
       There seems to be a space missing before "command".
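
The stall check described in the last paragraph of this hunk could be sketched roughly as below. This is only an illustration, not part of HBase or of this PR: the function names `parse_source_metrics` and `looks_stalled` are hypothetical, the input is assumed to be the `Key=Value, Key=Value` source line exactly as printed in the sample output, and treating "none of the progress metrics changed" as the stall condition is one reading of the text.

```python
# Hypothetical helpers; only the metric names are taken from the sample output.
PROGRESS_KEYS = (
    "TimeStampOfLastShippedOp",
    "EditsReadFromLogQueue",
    "OpsShippedToTarget",
    "TimeStampOfNextToReplicate",
)


def parse_source_metrics(line):
    """Split a 'Key=Value, Key=Value' metrics line into a dict of strings."""
    metrics = {}
    for part in line.strip().split(", "):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that carry no '='
            metrics[key.strip()] = value.strip()
    return metrics


def looks_stalled(previous, current):
    """Compare two samples taken some time apart from the same source queue.

    Returns True when 'Replication Lag' grew while none of the progress
    metrics changed (assumption: all of them must be unchanged).
    """
    lag_grew = int(current["Replication Lag"]) > int(previous["Replication Lag"])
    no_progress = all(previous.get(k) == current.get(k) for k in PROGRESS_KEYS)
    return lag_grew and no_progress


if __name__ == "__main__":
    sample = (
        "AgeOfLastShippedOp=0, "
        "TimeStampOfLastShippedOp=Fri Jun 12 18:49:23 BST 2020, "
        "SizeOfLogQueue=1, EditsReadFromLogQueue=1, OpsShippedToTarget=1, "
        "TimeStampOfNextToReplicate=Fri Jun 12 18:49:23 BST 2020, "
        "Replication Lag=0"
    )
    first = parse_source_metrics(sample)
    later = dict(first)
    later["Replication Lag"] = "5000"  # made-up later sample for illustration
    print(looks_stalled(first, later))
```

As the quoted text notes, a manually paused peer (`disable_peer`) would produce the same pattern, so a positive result from a check like this only says that the flow is not progressing, not why.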



