Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2010/07/29 02:17:16 UTC

[jira] Created: (HBASE-2888) Review all our metrics

Review all our metrics
----------------------

                 Key: HBASE-2888
                 URL: https://issues.apache.org/jira/browse/HBASE-2888
             Project: HBase
          Issue Type: Improvement
          Components: master
            Reporter: Jean-Daniel Cryans
             Fix For: 0.90.0


HBase publishes a bunch of metrics, some useful, some wasteful, that should be improved to deliver a better ops experience. Examples:

 - Block cache hit ratio converges at some point and stops moving
 - fsReadLatency goes down when compactions are running
 - storefileIndexSizeMB is the exact same number once a system is serving production load

We could use new metrics too.
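
One plausible reading of the first example is that the hit ratio is computed from counters accumulated since startup, so each new period barely moves it. The sketch below contrasts that with a ratio computed per reporting period. The class and method names are hypothetical and are not HBase's block cache code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not HBase's actual block cache statistics class. */
final class WindowedHitRatio {
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong requests = new AtomicLong();
    // Counter values captured at the previous report.
    private long lastHits;
    private long lastRequests;

    /** Records one cache lookup. */
    void record(boolean hit) {
        requests.incrementAndGet();
        if (hit) {
            hits.incrementAndGet();
        }
    }

    /** Ratio over all lookups since startup; flattens out as the totals grow. */
    double cumulativeRatio() {
        long r = requests.get();
        return r == 0 ? 0.0 : (double) hits.get() / r;
    }

    /**
     * Ratio over lookups seen since the previous call. Intended to be called
     * once per reporting period; the result is approximate if lookups are
     * recorded concurrently with the call.
     */
    synchronized double ratioSinceLastReport() {
        long h = hits.get();
        long r = requests.get();
        long deltaHits = h - lastHits;
        long deltaRequests = r - lastRequests;
        lastHits = h;
        lastRequests = r;
        return deltaRequests == 0 ? 0.0 : (double) deltaHits / deltaRequests;
    }
}
```

With this shape, a burst of misses shows up in the next report instead of being averaged into the lifetime total.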

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2888) Review all our metrics

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893641#action_12893641 ] 

Kannan Muthukkaruppan commented on HBASE-2888:
----------------------------------------------

Actually, per-CF stats for put/get are easier said than done. A single put can have data for multiple CFs, and they share a common sync operation, so it'll be hard to separate this correctly. Thoughts?


[jira] Commented: (HBASE-2888) Review all our metrics

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907096#action_12907096 ] 

stack commented on HBASE-2888:
------------------------------

Setting hbase.period in hadoop-metrics.properties doesn't seem to have an effect; counts are off.  Here's what I noticed digging in code:

'hadoop-metrics.properties' gets read up into a metrics attributes map but nothing seems to be done w/ them subsequently. Reading up in hadoop, in branch-0.20/src/core/org/apache/hadoop/metrics/package.html, it seems to imply that we need to getAttribute and set them after we make a metrics Context; i.e. in this case, call setPeriod in RegionServerMetrics, etc.?

More broadly, we need to make sure settings in hadoop-metrics.properties take effect when changed.
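
A minimal sketch of the idea described above: read the attributes, then explicitly apply the period to the context after it is created. `SketchContext` is a hypothetical stand-in written for illustration; it is not the Hadoop metrics context class, and the default of 5 seconds is an assumption of this sketch only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical stand-in for a metrics context; not the Hadoop class. */
final class SketchContext {
    private final String name;
    private int periodSeconds = 5; // assumed default, for this sketch only

    SketchContext(String name) {
        this.name = name;
    }

    /** Parses properties text such as the contents of hadoop-metrics.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Applies "<name>.period" if present; without this call the setting is ignored. */
    void applyAttributes(Properties attrs) {
        String raw = attrs.getProperty(name + ".period");
        if (raw == null) {
            return; // nothing configured, keep the default
        }
        int parsed;
        try {
            parsed = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid period: " + raw, e);
        }
        if (parsed <= 0) {
            throw new IllegalArgumentException("Period must be positive: " + raw);
        }
        periodSeconds = parsed;
    }

    int getPeriodSeconds() {
        return periodSeconds;
    }
}
```

For example, `new SketchContext("hbase")` followed by `applyAttributes(SketchContext.parse("hbase.period=10"))` leaves the period at 10 seconds; skipping the `applyAttributes` call reproduces the symptom of a setting that is read but never takes effect.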


[jira] Commented: (HBASE-2888) Review all our metrics

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893851#action_12893851 ] 

ryan rawson commented on HBASE-2888:
------------------------------------

Here are some of the things I've identified as issues:

- HFile stats, e.g. fsReadLatency, are in milliseconds and should really be in microseconds.
- We should generate 99th and 95th percentiles for many of the stats (e.g. fsReadLatency) and publish them. Perhaps a 1- and/or 5-minute rolling 99th percentile.
- The HFile metrics integration is a little weak: we use some volatiles and scrape them. For the enhanced 99th/95th percentile stats we'll need access to the richer stats classes. HFile depends on Hadoop and hbase.util, so with a little moving of things around, hopefully it'll be possible to make better stats w/o having HFile depend on HRS (for example).
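
As a rough illustration of the percentile idea, the sketch below keeps the most recent N latency samples in microseconds and computes nearest-rank percentiles over them. It is a simplification: the window is bounded by sample count rather than by the 1 or 5 minutes suggested above, and `LatencyWindow` is a hypothetical name, not an existing HBase class.

```java
import java.util.Arrays;

/** Hypothetical sketch: fixed-size sample window with nearest-rank percentiles. */
final class LatencyWindow {
    private final long[] samplesMicros;
    private int count; // number of valid samples, at most samplesMicros.length
    private int next;  // index that the next sample overwrites

    LatencyWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        samplesMicros = new long[capacity];
    }

    /** Records one operation measured with System.nanoTime(), stored as microseconds. */
    synchronized void recordNanos(long elapsedNanos) {
        samplesMicros[next] = elapsedNanos / 1_000L;
        next = (next + 1) % samplesMicros.length;
        if (count < samplesMicros.length) {
            count++;
        }
    }

    /** Nearest-rank percentile (0 < p <= 100) over the current window, in microseconds. */
    synchronized long percentileMicros(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        if (count == 0) {
            return 0L; // no samples yet
        }
        // Until the buffer wraps, the valid samples occupy indices 0..count-1.
        long[] sorted = Arrays.copyOf(samplesMicros, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0);
        return sorted[rank - 1];
    }
}
```

Sorting a copy on every read keeps the sketch short; a real implementation would likely want a cheaper structure if percentiles are published often.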


[jira] Commented: (HBASE-2888) Review all our metrics

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893639#action_12893639 ] 

Kannan Muthukkaruppan commented on HBASE-2888:
----------------------------------------------

Thanks for creating this JIRA.

Some additional comments:

* If the stats, at least the RPC (API-level) stats, could be on a per Table.ColumnFamily basis, that would be really helpful.
* For non-counter stats, can we always put the "unit" in the name? For example, storefileIndexSizeMB has MB in its name (which is really useful). But fsReadLatency doesn't have the units (ms? secs?).
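
Both suggestions can be sketched together: key each stat by table and column family, and append the unit to the metric name. Everything here is hypothetical, including the `table.family.metricUnit` key layout and names such as `fsReadLatencyMs`; it does not address how a put spanning several CFs would be split.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of per-Table.ColumnFamily stats with unit-suffixed names. */
final class FamilyMetrics {
    private final Map<String, LongAdder> values = new ConcurrentHashMap<>();

    /**
     * Builds "table.family.metricUnit", e.g. "usertable.cf1.fsReadLatencyMs".
     * Pass an empty unit for plain counters.
     */
    static String key(String table, String family, String metric, String unit) {
        return table + "." + family + "." + metric + unit;
    }

    /** Adds delta to the stat for the given table and column family. */
    void add(String table, String family, String metric, String unit, long delta) {
        values.computeIfAbsent(key(table, family, metric, unit), k -> new LongAdder())
              .add(delta);
    }

    /** Returns the current total for a full key, or 0 if it was never recorded. */
    long get(String fullKey) {
        LongAdder adder = values.get(fullKey);
        return adder == null ? 0L : adder.sum();
    }
}
```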



[jira] Updated: (HBASE-2888) Review all our metrics

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2888:
--------------------------------------

    Fix Version/s:     (was: 0.90.0)
                   0.92.0

Punt to 0.92.0
