Posted to mapreduce-issues@hadoop.apache.org by "Andrew Wang (JIRA)" <ji...@apache.org> on 2013/02/25 19:58:13 UTC

[jira] [Commented] (MAPREDUCE-5026) For shortening the time of TaskTracker heartbeat, decouple the statics collection operations

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586136#comment-13586136 ] 

Andrew Wang commented on MAPREDUCE-5026:
----------------------------------------

Hi Sam,

Thanks for the patch. I moved your issue to MAPREDUCE, since the TaskTracker isn't a component of HDFS.

A few minor comments:

* Please rename "Statics" to "Statistics" in the code.
* Could you provide some performance numbers to quantify the improvement, before and after the patch?
                
> For shortening the time of TaskTracker heartbeat, decouple the statics collection operations
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5026
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5026
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, tasktracker
>    Affects Versions: 1.1.1
>            Reporter: sam liu
>              Labels: patch
>             Fix For: 1.1.1
>
>         Attachments: HDFS-4527.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In each heartbeat, the TaskTracker calculates several system statistics, such as free disk space, available virtual/physical memory, and CPU usage. However, it is not necessary to recalculate all of these statistics in every heartbeat; doing so consumes significant system resources and impacts the performance of the TaskTracker heartbeat. Furthermore, the system properties (disk, memory, CPU) have different characteristics, so it is better to collect their statistics at different intervals.
> To reduce the latency of the TaskTracker heartbeat, one solution is to decouple all system statistics collection from it and start separate threads to do the collection when the TaskTracker starts. There could be three threads: the first collects CPU-related statistics at a short interval; the second collects memory-related statistics at a normal interval; the third collects disk-related statistics at a long interval. All the intervals could be customized through the parameter "mapred.stats.collection.interval" in mapred-site.xml. Finally, the heartbeat could read the values of the system statistics directly from memory.
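
For readers following the description above, here is a minimal sketch of the decoupling idea. It is not the code from the attached patch. The class name StatsCollectorSketch, its methods, and the interval values are all hypothetical, and plain JDK calls stand in for the TaskTracker's real resource probes. It assumes Java 8 or later. Three background tasks sample at different intervals and cache the latest values, so a heartbeat only has to read fields from memory.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch only; not the TaskTracker code from the patch. */
public class StatsCollectorSketch {
    // Latest sampled values. volatile so the heartbeat thread sees fresh writes.
    // -1 means "not sampled yet".
    private volatile double cpuLoadAverage = -1.0;
    private volatile long freeMemoryBytes = -1L;
    private volatile long freeDiskBytes = -1L;

    // One pool with three threads, one per statistic family.
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(3);
    private final long cpuIntervalMs;
    private final long memoryIntervalMs;
    private final long diskIntervalMs;

    public StatsCollectorSketch(long cpuIntervalMs, long memoryIntervalMs,
                                long diskIntervalMs) {
        this.cpuIntervalMs = cpuIntervalMs;
        this.memoryIntervalMs = memoryIntervalMs;
        this.diskIntervalMs = diskIntervalMs;
    }

    /** Starts the three periodic samplers (short, normal, long interval). */
    public void start() {
        scheduler.scheduleWithFixedDelay(this::sampleCpu, 0,
                cpuIntervalMs, TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(this::sampleMemory, 0,
                memoryIntervalMs, TimeUnit.MILLISECONDS);
        scheduler.scheduleWithFixedDelay(this::sampleDisk, 0,
                diskIntervalMs, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    // Stand-in probe: system load average (may be negative if unavailable).
    private void sampleCpu() {
        cpuLoadAverage = ManagementFactory.getOperatingSystemMXBean()
                .getSystemLoadAverage();
    }

    // Stand-in probe: free JVM heap, not the node's physical memory.
    private void sampleMemory() {
        freeMemoryBytes = Runtime.getRuntime().freeMemory();
    }

    // Stand-in probe: usable space on the partition of the working directory.
    private void sampleDisk() {
        freeDiskBytes = new File(".").getUsableSpace();
    }

    // What a heartbeat would call: cheap reads of cached values.
    public double getCpuLoadAverage() { return cpuLoadAverage; }
    public long getFreeMemoryBytes() { return freeMemoryBytes; }
    public long getFreeDiskBytes() { return freeDiskBytes; }

    public static void main(String[] args) throws InterruptedException {
        // Illustrative intervals only; real values would come from configuration.
        StatsCollectorSketch collector = new StatsCollectorSketch(100, 500, 2000);
        collector.start();
        Thread.sleep(300); // give each sampler time to run at least once
        System.out.println("cpu load average: " + collector.getCpuLoadAverage());
        System.out.println("free memory bytes: " + collector.getFreeMemoryBytes());
        System.out.println("free disk bytes: " + collector.getFreeDiskBytes());
        collector.stop();
    }
}
```

With this design the heartbeat may see values that are up to one interval old. That trade-off is the point of the proposal, and it is also why before/after measurements would help in choosing the intervals.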

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira