Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2014/06/25 03:34:24 UTC

[jira] [Commented] (HBASE-9531) a command line (hbase shell) interface to retreive the replication metrics and show replication lag

    [ https://issues.apache.org/jira/browse/HBASE-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042936#comment-14042936 ] 

Andrew Purtell commented on HBASE-9531:
---------------------------------------

This looks plausible to me and I like the clean extension of the existing 'status' command. 

{code}
+  // A default Load for the case of no replication
+  private static final HashMap<String, String> noReplicationLoadMap = new HashMap<String, String>();
+  static {
+    noReplicationLoadMap.put(REPLICATIONLOADSOURCE, "Replication disabled");
+    noReplicationLoadMap.put(REPLICATIONLOADSINK, "Replication disabled");
+  }
+
{code}

Can we just not set the new fields in ClusterStatus if replication is not active?
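
As a rough illustration of that alternative (a sketch only; ReplicationStatusSketch, Load, and describe are hypothetical names, not part of the patch or of the ClusterStatus API), the fields could simply stay unset and the formatter could skip them, instead of carrying a "Replication disabled" placeholder map:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the status formatting path; not the real ClusterStatus API.
public final class ReplicationStatusSketch {

  private ReplicationStatusSketch() {}

  /** Hypothetical holder: both fields stay null when replication is not active. */
  public static final class Load {
    private final String sourceLoad; // null when there is no replication source
    private final String sinkLoad;   // null when there is no replication sink

    public Load(String sourceLoad, String sinkLoad) {
      this.sourceLoad = sourceLoad;
      this.sinkLoad = sinkLoad;
    }
  }

  /** Emits only the lines for fields that were actually set. */
  public static List<String> describe(Load load) {
    List<String> lines = new ArrayList<String>();
    if (load.sourceLoad != null) {
      lines.add("SOURCE:" + load.sourceLoad);
    }
    if (load.sinkLoad != null) {
      lines.add("SINK  :" + load.sinkLoad);
    }
    return lines;
  }
}
```

With this shape, a server without replication contributes no replication lines at all, so no default map is needed.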

In ReplicationLoad.java, please don't start method names with capital letters.

The default status command is 'summary', so we shouldn't dump all of the source and sink information by default. That's not a summary by definition.

{code}
diff --git hbase-shell/src/main/ruby/shell/commands/status.rb hbase-shell/src/main/ruby/shell/commands/status.rb
index f72c13c..4654b4a 100644
--- hbase-shell/src/main/ruby/shell/commands/status.rb
+++ hbase-shell/src/main/ruby/shell/commands/status.rb
@@ -22,18 +22,22 @@ module Shell
     class Status < Command
       def help
         return <<-EOF
-Show cluster status. Can be 'summary', 'simple', or 'detailed'. The
+Show cluster status. Can be 'summary', 'simple', 'detailed', or 'replication'. The
 default is 'summary'. Examples:
 
   hbase> status
   hbase> status 'simple'
   hbase> status 'summary'
   hbase> status 'detailed'
+  hbase> status 'replication'
+  hbase> status 'replication', 'source'
+  hbase> status 'replication', 'sink'
+
 EOF
       end
 
-      def command(format = 'summary')
-        admin.status(format)
+      def command(format = 'summary',type = 'both')
+        admin.status(format,type)
       end
     end
   end
{code}


> a command line (hbase shell) interface to retreive the replication metrics and show replication lag
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9531
>                 URL: https://issues.apache.org/jira/browse/HBASE-9531
>             Project: HBase
>          Issue Type: New Feature
>          Components: Replication
>    Affects Versions: 0.99.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.99.0, 0.98.4
>
>         Attachments: HBASE-9531-trunk-v0.patch, HBASE-9531-trunk-v0.patch
>
>
> This jira provides a command line (hbase shell) interface to retrieve replication metrics such as ageOfLastShippedOp, timeStampsOfLastShippedOp, sizeOfLogQueue, ageOfLastAppliedOp, and timeStampsOfLastAppliedOp. It also provides point-in-time information on the replication lag (source only).
> I understand that HBase uses Hadoop metrics (http://hbase.apache.org/metrics.html), which is a common way to monitor metric info. This jira is meant to serve as a light-weight client interface, compared to a complete (certainly better, but heavier) GUI monitoring package. I have the code working on 0.94.9 now, and would like to use this jira to get opinions about whether the feature is valuable to other users/workshop. If so, I will build a trunk patch.
> All input is greatly appreciated. Thank you!
> The overall design is to reuse the existing logic that supports the hbase shell command 'status', and to add a new module called ReplicationLoad. In HRegionServer.buildServerLoad(), the local replication service objects are used to get their loads, which can be wrapped in a ReplicationLoad object and then simply passed to the ServerLoad. In ReplicationSourceMetrics and ReplicationSinkMetrics, a few getters and setters will be created, and Replication will be asked to build a "ReplicationLoad". (Many thanks to Jean-Daniel for his kind suggestions on the dev email list.)
> The replication lag will be calculated for the source only, using this formula:
> {code:title=Replication lag|borderStyle=solid}
> 	if sizeOfLogQueue != 0 then lag = max(ageOfLastShippedOp, (current time - timeStampsOfLastShippedOp)) // err on the large side
> 	else if (current time - timeStampsOfLastShippedOp) < 2 * ageOfLastShippedOp then lag = ageOfLastShippedOp // last shipment happened recently
> 	else lag = 0 // last shipment may have happened last night, so NO real lag although ageOfLastShippedOp is non-zero
> {code}
> The external output will look something like:
> {code:title=status 'replication'|borderStyle=solid}
> hbase(main):001:0> status 'replication'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):002:0> status 'replication','source'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=14, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:49:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest015.svl.ibm.com:
>         SOURCE:PeerID=1, ageOfLastShippedOp=0, sizeOfLogQueue=0, timeStampsOfLastShippedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):003:0> status 'replication','sink'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
>     hdtest018.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=14, TimeStampsOfLastAppliedOp=Wed Sep 04 14:50:59 PDT 2013
>     hdtest015.svl.ibm.com:
>         SINK  :AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Wed Sep 04 14:48:48 PDT 2013
> hbase(main):004:0> status 'replication','lag'
> version 0.94.9
> 3 live servers
>     hdtest017.svl.ibm.com: lag = 0
>     hdtest018.svl.ibm.com: lag = 14
>     hdtest015.svl.ibm.com: lag = 0
> {code}
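
The lag formula in the quoted description can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the current time is passed in as a parameter, and all times are assumed to be in milliseconds, which the description does not state.

```java
// Sketch of the lag formula from the issue description.
// ReplicationLagSketch and computeLag are hypothetical names, not from the patch.
public final class ReplicationLagSketch {

  private ReplicationLagSketch() {}

  /**
   * @param sizeOfLogQueue number of logs still queued on the source
   * @param ageOfLastShippedOp age of the last shipped edit (assumed ms)
   * @param timeStampOfLastShippedOp time of the last shipment (assumed epoch ms)
   * @param now current time (assumed epoch ms)
   * @return the estimated replication lag for this source
   */
  public static long computeLag(int sizeOfLogQueue, long ageOfLastShippedOp,
      long timeStampOfLastShippedOp, long now) {
    long sinceLastShipment = now - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // Logs are still queued: err on the large side.
      return Math.max(ageOfLastShippedOp, sinceLastShipment);
    }
    if (sinceLastShipment < 2 * ageOfLastShippedOp) {
      // The last shipment happened recently, so its age is still representative.
      return ageOfLastShippedOp;
    }
    // Empty queue and an old last shipment: no real lag,
    // even though ageOfLastShippedOp is non-zero.
    return 0L;
  }
}
```

For example, under these assumptions computeLag(0, 14, 1000, 1020) takes the second branch and returns 14, while the same call with now = 5000 falls through to the last branch and returns 0.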


