Posted to mapreduce-issues@hadoop.apache.org by "MengWang (JIRA)" <ji...@apache.org> on 2011/02/25 06:07:38 UTC

[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999218#comment-12999218 ] 

MengWang commented on MAPREDUCE-2345:
-------------------------------------

The jobtracker's memory is mainly consumed by TaskInProgress objects. We submitted a job with 100,087 tasks; the JT's memory usage was as follows:
org.apache.hadoop.mapred.TaskInProgress
  Objects:       100,087
  Shallow size:  29,625,752 bytes
  Retained size: 325,065,944 bytes (96%)

Our optimizations are as follows:
(1) Reduce duplicated strings
   The jobtracker stores many duplicated strings, for example: the split class name, split locations, counter group names, counter names, display names, the jtIdentifier of JobID, and the jobdir of MapOutputFile.
   Interning these through a StringCache reduced memory usage by nearly 15% (see the first sketch after this list).
(2) Initialize Counters lazily
   TIPs with no task attempt assigned should not create Counters objects (see the second sketch after this list).
(3) Reconstruct completed TIPs' counters
   When a task completes, its TIP grows because of its counters. To speed up counter updates and lookups, Counters uses a HashMap plus a cache, which costs too much memory. So we separated the counter values from the Counters structure: all tasks share one CounterMap object, which maps <CounterGroupName, CounterName> to an index into a long array, and each TIP stores only the array of its counter values (see the third sketch after this list).
   With this method, the JT's memory usage was reduced by nearly 50%.
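
A minimal sketch of the string-interning idea in (1), assuming a ConcurrentHashMap-backed cache; the class name StringCache comes from the comment above, but this implementation is illustrative, not the attached patch:

import java.util.concurrent.ConcurrentHashMap;

public class StringCache {
  private final ConcurrentHashMap<String, String> cache =
      new ConcurrentHashMap<String, String>();

  // Return the canonical copy of s, adding it on first sight, so every
  // TaskInProgress ends up pointing at the same String instance.
  public String intern(String s) {
    if (s == null) {
      return null;
    }
    String previous = cache.putIfAbsent(s, s);
    return previous == null ? s : previous;
  }
}

A TIP would then store stringCache.intern(splitClass) instead of splitClass, and likewise for split locations, counter names, and the other repeated strings listed above.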
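
The delayed initialization in (2) could be as simple as a lazy getter; this holder class is an illustrative assumption, not the patch's actual change to TaskInProgress:

import org.apache.hadoop.mapred.Counters;

public class TaskCountersHolder {
  private Counters counters;  // stays null until a task attempt actually reports values

  // Create the Counters object only on first access.
  public synchronized Counters getCounters() {
    if (counters == null) {
      counters = new Counters();
    }
    return counters;
  }

  // Lets aggregation code skip TIPs that never had an attempt assigned.
  public synchronized boolean hasCounters() {
    return counters != null;
  }
}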
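
The shared-map layout in (3) might look roughly like this; the class names, the tab-separated key, and the growable long[] are illustrative assumptions, not the actual patch:

import java.util.HashMap;
import java.util.Map;

// Shared by all TIPs: maps (group name, counter name) to a stable slot index.
public class CounterIndexMap {
  private final Map<String, Integer> indexByKey = new HashMap<String, Integer>();

  // Assign a new slot the first time a (group, counter) pair is seen.
  public synchronized int indexOf(String groupName, String counterName) {
    String key = groupName + "\t" + counterName;
    Integer idx = indexByKey.get(key);
    if (idx == null) {
      idx = Integer.valueOf(indexByKey.size());
      indexByKey.put(key, idx);
    }
    return idx.intValue();
  }
}

// Stored per completed TIP: only the numeric values, no name strings or maps.
class TipCounterValues {
  private long[] values = new long[0];

  public void set(int index, long value) {
    if (index >= values.length) {
      long[] grown = new long[index + 1];
      System.arraycopy(values, 0, grown, 0, values.length);
      values = grown;
    }
    values[index] = value;
  }

  public long get(int index) {
    return index < values.length ? values[index] : 0L;
  }
}

To record a value, a TIP would call values.set(indexMap.indexOf(group, name), value); a full Counters object only needs to be rebuilt on demand, e.g. for the web UI.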

> Optimize jobtracker's  memory usage  
> -------------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks will eat up a considerable amount of the JobTracker's heap space. According to our observation, a 50GB heap can support up to 5,000,000 tasks, so we should optimize the jobtracker's memory usage to support more jobs and tasks. A YourKit Java profile shows that counters, duplicate strings, and Task objects waste too much memory. Our optimizations around these three points reduced the jobtracker's memory usage to 1/3 of its original size.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira