Posted to common-issues@hadoop.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2012/08/16 02:48:38 UTC

[jira] [Created] (HADOOP-8706) Provide rate metrics based on counter value

Ming Ma created HADOOP-8706:
-------------------------------

             Summary: Provide rate metrics based on counter value
                 Key: HADOOP-8706
                 URL: https://issues.apache.org/jira/browse/HADOOP-8706
             Project: Hadoop Common
          Issue Type: Improvement
          Components: metrics
            Reporter: Ming Ma


In production clusters, ops / sec is more useful than an ever-increasing counter value. Take NameNodeMetrics.getBlockLocations as an example: its current type is MutableCounterLong, so its value only ever increases. Quite often the number of getBlockLocations calls per second is more interesting for analysis. Furthermore, I found that most of the MutableCounterLong metrics in NameNodeMetrics and DataNodeMetrics would be more useful if they were expressed in terms of ops / sec.

I looked at all the metrics objects provided in metrics 2.0 and couldn't find such a type.

FYI, hbase has its own MetricsRate object based on metrics 1.0 for this purpose.
   

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8706) Provide rate metrics based on counter value

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436554#comment-13436554 ] 

Ming Ma commented on HADOOP-8706:
---------------------------------

@Aaron, it seems MutableRate is used to collect things like latency. It generates a bunch of derived metrics such as mean and standard deviation. It doesn't seem to target ops / sec, although the name has "rate" in it. MutableRate seems to be similar to MetricsTimeVaryingRate in metrics 1.0.
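
To make the distinction concrete, here is a minimal sketch of a latency-style aggregator of the kind described above: it folds per-operation samples into a count, a mean, and a standard deviation, and never divides by wall-clock time, so it does not yield ops / sec. The class name SampleStatSketch and its methods are hypothetical; this is not Hadoop's MutableRate code, only an illustration under that assumption.

```java
/**
 * Hypothetical illustration only; not the Hadoop metrics2 implementation.
 * Aggregates per-operation samples (for example latencies in ms) using
 * Welford's online algorithm.
 */
final class SampleStatSketch {
    private long n;       // number of samples seen
    private double mean;  // running mean of the samples
    private double m2;    // running sum of squared deviations from the mean

    /** Folds one sample into the running statistics. */
    void add(double sample) {
        n++;
        double delta = sample - mean;
        mean += delta / n;
        m2 += delta * (sample - mean);
    }

    long numSamples() { return n; }

    double mean() { return mean; }

    /** Sample standard deviation; 0.0 when fewer than two samples exist. */
    double stddev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }

    public static void main(String[] args) {
        SampleStatSketch latencyMs = new SampleStatSketch();
        for (double ms : new double[] {10, 20, 30}) {
            latencyMs.add(ms);
        }
        System.out.println(latencyMs.numSamples() + " samples, mean="
                + latencyMs.mean() + ", stddev=" + latencyMs.stddev());
    }
}
```

Note that nothing here depends on how much time elapsed between samples, which is why such a metric says little about request throughput.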

@Andy, we use ganglia. Do you know if there is a way to make ganglia calculate the derivative of a given metric, or something similar?

Some more background. Before we applied this fix to MetricsTimeVaryingLong in metrics 1.0 in our internal branch, we found it hard to understand why the NN in a shared cluster sometimes became really slow. Later we understood that some bad client code was doing lots of NN operations; this isn't obvious from the monotonic metric of NN file operations in ganglia. After the fix was applied to the production cluster, root cause analysis became much easier.
                


        

[jira] [Commented] (HADOOP-8706) Provide rate metrics based on counter value

Posted by "Aaron T. Myers (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436314#comment-13436314 ] 

Aaron T. Myers commented on HADOOP-8706:
----------------------------------------

Hi Ming Ma, it seems like MutableRate in metrics 2 might be appropriate here?
                


        

[jira] [Updated] (HADOOP-8706) Provide rate metrics based on counter value

Posted by "Ming Ma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated HADOOP-8706:
----------------------------

    Attachment: HADOOP-8706.patch

Here is the patch. It defines a new metrics type MutableCounterLongRate.

For most use cases of MutableCounterLong in HDFS, it seems more useful to express the value in terms of ops / sec. We can change those metrics to use the new type MutableCounterLongRate.

Alternatively, we can have MutableCounterLong push two values to the metrics sink: the current counter value and the ops / sec rate.
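
For readers without the attachment, the idea of deriving a rate inside the metrics system can be sketched roughly as follows. This is not the code in HADOOP-8706.patch and does not use the metrics2 API; the class CounterRateSketch, its method names, and the caller-supplied nanosecond clock are all assumptions made for illustration. The sketch keeps the raw counter and, at each snapshot, reports the increase since the previous snapshot divided by the elapsed time, so both values could be pushed to a sink.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch; not the class from the attached patch and not a
 * Hadoop API. Timestamps are passed in by the caller (in nanoseconds).
 */
final class CounterRateSketch {
    private final AtomicLong count = new AtomicLong();
    private long lastCount;  // counter value at the previous snapshot
    private long lastNanos;  // timestamp of the previous snapshot

    CounterRateSketch(long startNanos) {
        this.lastNanos = startNanos;
    }

    void incr() { count.incrementAndGet(); }

    void incr(long delta) { count.addAndGet(delta); }

    /** The monotonic counter value, as a plain counter would report it. */
    long value() { return count.get(); }

    /**
     * Returns ops / sec since the previous snapshot and moves the baseline
     * forward. Returns 0.0 if no time has elapsed.
     */
    synchronized double snapshotRate(long nowNanos) {
        long current = count.get();
        long elapsedNanos = nowNanos - lastNanos;
        double rate = elapsedNanos > 0
                ? (current - lastCount) * 1e9 / elapsedNanos
                : 0.0;
        lastCount = current;
        lastNanos = nowNanos;
        return rate;
    }

    public static void main(String[] args) {
        CounterRateSketch ops = new CounterRateSketch(0L);
        ops.incr(50);
        // 50 operations over a 10-second interval
        System.out.println(ops.value() + " total, "
                + ops.snapshotRate(10_000_000_000L) + " ops/sec");
    }
}
```

In this sketch the averaging period is simply whatever interval separates two snapshots, so the rate follows the sink's publishing period rather than a fixed one second.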
                


        

[jira] [Commented] (HADOOP-8706) Provide rate metrics based on counter value

Posted by "Andy Isaacson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436363#comment-13436363 ] 

Andy Isaacson commented on HADOOP-8706:
---------------------------------------

It's very easy using an external tool to derive an ops/sec rate by sampling the monotonic metric at your chosen frequency and subtracting. What is the benefit of hard-coding a single period (one second) within the metrics system rather than allowing the external sampling application to choose a period that is appropriate for its application?

I don't see a benefit to this approach that justifies the added complexity.
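
The external approach described above, sampling the monotonic metric and subtracting, amounts to the following arithmetic. This is only a sketch: ExternalRateSketch, Sample, and opsPerSecond are hypothetical names, and treating a decrease as a counter reset (for example after a process restart) is an assumption of the sketch, not something stated in this thread.

```java
/** Hypothetical sketch of deriving ops / sec outside the metrics system. */
final class ExternalRateSketch {

    /** One observation of a monotonic counter at a point in time. */
    static final class Sample {
        final long epochMillis;
        final long value;

        Sample(long epochMillis, long value) {
            this.epochMillis = epochMillis;
            this.value = value;
        }
    }

    /** Rate between two samples; the sampling period is the caller's choice. */
    static double opsPerSecond(Sample earlier, Sample later) {
        long dtMillis = later.epochMillis - earlier.epochMillis;
        if (dtMillis <= 0) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        long dv = later.value - earlier.value;
        if (dv < 0) {
            // Assumption: the counter restarted from zero, so count only
            // what has accumulated since the reset.
            dv = later.value;
        }
        return dv * 1000.0 / dtMillis;
    }

    public static void main(String[] args) {
        Sample first = new Sample(0L, 100L);
        Sample second = new Sample(60_000L, 700L); // sampled one minute later
        System.out.println(opsPerSecond(first, second) + " ops/sec");
    }
}
```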
                
