Posted to issues@hbase.apache.org by "Jeremy Carroll (JIRA)" <ji...@apache.org> on 2012/06/05 00:33:23 UTC

[jira] [Commented] (HBASE-4393) Implement a canary monitoring program

    [ https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288967#comment-13288967 ] 

Jeremy Carroll commented on HBASE-4393:
---------------------------------------

Just wanted to add a few operational comments. We have a version of this Canary script hooked up to our current HBase cluster for monitoring. It works well for determining whether your cluster is responding to RPCs in a healthy amount of time. But it does not work well for measuring overall request latency, because the getStartKey request becomes cached. Requesting the same key over and over again is basically cache warming, so it returns in <1ms every time after a few iterations.
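
One way to see this warm-up effect is to record the latency of every iteration instead of a single number. The following is only a rough sketch: ProbeTimer and timeProbe are hypothetical names, and the probe passed in is assumed to wrap whatever request the Canary issues against a region's start key.

```java
final class ProbeTimer {
    /**
     * Runs the probe the given number of times and returns the latency
     * of each call in nanoseconds, in call order.
     * The probe is a hypothetical stand-in for the Canary's request.
     */
    static long[] timeProbe(Runnable probe, int iterations) {
        long[] latenciesNanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            probe.run();
            latenciesNanos[i] = System.nanoTime() - start;
        }
        return latenciesNanos;
    }
}
```

If the first few entries are much larger than the rest, the later calls are most likely being served from cache, so they say little about uncached request latency.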

We played around with the idea of requesting a random key within the RegionServer to get non-cached latency responses. In this scenario we are basically testing our disk latency. IMHO the intention of the Canary is not to test my disk response but the overall response / health of the HBase RegionServer. We took the approach of using the fsLatency histogram metrics (99th and 99.9th percentile) in a separate check, in addition to the Canary, for overall health status.
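
To illustrate how the two checks might be combined, here is a minimal sketch. Everything in it is an assumption made for illustration: the class name, the status levels, and the threshold values are hypothetical, and the percentile inputs are assumed to have been read from the fsLatency histogram metrics by some other means.

```java
final class ClusterHealthCheck {
    enum Status { OK, DEGRADED, DOWN }

    // Hypothetical thresholds; tune them for your own cluster.
    static final long CANARY_TIMEOUT_MS = 1000;
    static final double P99_LIMIT_MS = 50.0;
    static final double P999_LIMIT_MS = 200.0;

    /**
     * canaryMs: Canary round-trip time in ms, or a negative value if it got no response.
     * p99Ms, p999Ms: 99th and 99.9th percentile filesystem latency in ms.
     */
    static Status evaluate(long canaryMs, double p99Ms, double p999Ms) {
        // The Canary answers "is the RegionServer responding to RPCs at all?"
        if (canaryMs < 0 || canaryMs > CANARY_TIMEOUT_MS) {
            return Status.DOWN;
        }
        // The histogram percentiles cover the uncached (disk) latency side.
        if (p99Ms > P99_LIMIT_MS || p999Ms > P999_LIMIT_MS) {
            return Status.DEGRADED;
        }
        return Status.OK;
    }
}
```

Keeping the two inputs separate means a cached Canary response of <1ms cannot hide a slow disk, and a slow disk is not reported as an unresponsive RegionServer.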
                
> Implement a canary monitoring program
> -------------------------------------
>
>                 Key: HBASE-4393
>                 URL: https://issues.apache.org/jira/browse/HBASE-4393
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Assignee: Matteo Bertozzi
>             Fix For: 0.94.0, 0.96.0
>
>         Attachments: Canary-v0.java, HBASE-4393-v0.patch, HBaseCanary.java
>
>
> This JIRA is to implement a standalone program that can be used to do "canary monitoring" of a running HBase cluster. This program would gather a list of the regions in the cluster, then iterate over them doing lightweight operations (eg short scans) to provide metrics about latency as well as alert on availability issues.
