Posted to mapreduce-issues@hadoop.apache.org by "MengWang (JIRA)" <ji...@apache.org> on 2011/02/25 04:26:38 UTC

[jira] Created: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Optimize jobtracker's  memory usage  
-------------------------------------

                 Key: MAPREDUCE-2345
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker
    Affects Versions: 0.21.0
            Reporter: MengWang
             Fix For: 0.23.0


Too many tasks will eat up a considerable amount of the JobTracker's heap space. According to our observation, a 50GB heap can support up to 5,000,000 tasks, so we should optimize the JobTracker's memory usage to support more jobs and tasks. A YourKit Java profile shows that counters, duplicate strings, and Task objects waste too much memory. Our optimizations around these three points reduced the JobTracker's memory usage to 1/3.
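As a rough back-of-the-envelope check of these figures, here is a small sketch. It assumes that "50GB" means 50 GiB and that heap usage scales linearly with the number of tasks; the report states neither. Under those assumptions the budget is about 10.7 KB of heap per task. Cutting per-task memory to one third would let the same heap hold roughly three times as many tasks, about 15 million.

```java
// Back-of-the-envelope sketch; the GiB reading and linear scaling are assumptions.
public class HeapBudget {
    // Assumption: "50GB" is read as 50 GiB (50 * 2^30 bytes).
    static final long HEAP_BYTES = 50L * 1024 * 1024 * 1024;
    // Task count reported in the issue description.
    static final long TASKS = 5_000_000L;

    /** Average heap bytes per task, assuming memory grows linearly with tasks. */
    static long bytesPerTask() {
        return HEAP_BYTES / TASKS;
    }

    /** Tasks the same heap could hold if per-task memory shrank to 1/3. */
    static long tasksAfterOptimization() {
        long reducedPerTask = bytesPerTask() / 3;
        return HEAP_BYTES / reducedPerTask;
    }

    public static void main(String[] args) {
        System.out.println("bytes per task: " + bytesPerTask());
        System.out.println("tasks after 1/3 reduction: " + tasksAfterOptimization());
    }
}
```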

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008597#comment-13008597 ] 

Allen Wittenauer commented on MAPREDUCE-2345:
---------------------------------------------

> But how about a running job with tens of thousands of tasks? We see that big running 
> jobs use much memory in the cluster. 

This is almost always a sign that the data being read is not laid out efficiently (e.g., too small a block size), that one needs to use CombineFileInputFormat, or that there are just too many reducers in play.  There is almost never a reason to have jobs with tens of thousands of tasks unless the dataset is Just That Big.
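The small-files point can be made concrete with a simplified sketch. This is not Hadoop's implementation. It compares two approaches:

* Creating one split per block of every file, roughly what the default FileInputFormat does when the split size equals the block size.
* Greedily packing whole files into splits up to a maximum size, the basic idea behind CombineFileInputFormat.

The real CombineFileInputFormat also groups blocks by node and rack locality, which this sketch ignores. The file sizes and limits are made-up examples.

```java
import java.util.Collections;
import java.util.List;

// Simplified model for illustration only: the real CombineFileInputFormat also
// groups blocks by node and rack locality, which is ignored here.
public class SplitCount {

    /** One split per block of every file; even a tiny file costs one map task. */
    static long perFileSplits(List<Long> fileSizes, long blockSize) {
        long splits = 0;
        for (long size : fileSizes) {
            splits += Math.max(1, (size + blockSize - 1) / blockSize);
        }
        return splits;
    }

    /** Greedily packs whole files into splits of at most maxSplitSize bytes. */
    static long combinedSplits(List<Long> fileSizes, long maxSplitSize) {
        long splits = 0;
        long current = 0; // bytes in the split currently being filled
        for (long size : fileSizes) {
            if (current > 0 && current + size > maxSplitSize) {
                splits++; // the next file does not fit: close the current split
                current = 0;
            }
            current += size;
            while (current > maxSplitSize) { // an oversized file is cut into full splits
                splits++;
                current -= maxSplitSize;
            }
        }
        return current > 0 ? splits + 1 : splits;
    }

    public static void main(String[] args) {
        // Hypothetical input: 10,000 files of 1 MB each.
        List<Long> files = Collections.nCopies(10_000, 1L << 20);
        System.out.println(perFileSplits(files, 64L << 20));   // 10000
        System.out.println(combinedSplits(files, 256L << 20)); // 40
    }
}
```

With 10,000 one-megabyte files, the per-file approach yields 10,000 map tasks, while packing into 256 MB splits yields 40.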

> Optimize jobtracker's  memory usage  
> -------------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks will eat up a considerable amount of the JobTracker's heap space. According to our observation, a 50GB heap can support up to 5,000,000 tasks, so we should optimize the JobTracker's memory usage to support more jobs and tasks. A YourKit Java profile shows that counters, duplicate strings, and Task objects waste too much memory. Our optimizations around these three points reduced the JobTracker's memory usage to 1/3.


[jira] Updated: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "MengWang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MengWang updated MAPREDUCE-2345:
--------------------------------

    Description: Too many tasks will eat up a considerable amount of JobTracker's heap space. According to our observation, 50GB heap size can support to 5,000,000 tasks, so we should optimize jobtracker's memory usage for more jobs and tasks. Yourkit java profile show that counters, duplicate strings, task waste too much memory. Our optimization around these three points reduced jobtracker's memory to 1/3.   (was: To many tasks will eat up a considerable amount of JobTracker's heap space. According to our observation, 50GB heap size can support to 5,000,000 tasks, so we should optimize jobtracker's memory usage for more jobs and tasks. Yourkit java profile show that counters, duplicate strings, Task waste too much memory. Our optimization around these three points reduced jobtracker's memory to 1/3. )


[jira] Updated: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "MengWang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

MengWang updated MAPREDUCE-2345:
--------------------------------

    Attachment: jt-memory-useage.bmp

The JobTracker's memory is mainly used by TaskInProgress objects. We submitted a job with 100,087 tasks; the JT's memory usage is shown in the attached jt-memory-useage.bmp.



[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "MengWang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999218#comment-12999218 ] 

MengWang commented on MAPREDUCE-2345:
-------------------------------------

The JobTracker's memory is mainly used by TaskInProgress objects. We submitted a job with 100,087 tasks; the JT's memory usage was as follows:
Class:          org.apache.hadoop.mapred.TaskInProgress
Objects:        100,087
Shallow size:   29,625,752 bytes
Retained size:  325,065,944 bytes (96%)
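Dividing the profiler totals above by the object count shows where the memory sits. Each TaskInProgress is about 296 bytes on its own but retains about 3.2 KB. That means roughly nine tenths of the retained size lives in objects reachable from the TIP (counters, strings, and so on) rather than in the TIP itself. The sketch below works through that arithmetic, treating the profiler sizes as bytes.

```java
// Arithmetic on the YourKit figures quoted above (sizes assumed to be in bytes).
public class TipFootprint {
    static final long OBJECTS = 100_087L;
    static final long SHALLOW_BYTES = 29_625_752L;
    static final long RETAINED_BYTES = 325_065_944L;

    /** Bytes held by each TaskInProgress object itself. */
    static long shallowPerTip() {
        return SHALLOW_BYTES / OBJECTS;
    }

    /** Bytes kept alive by each TaskInProgress, including referenced objects. */
    static long retainedPerTip() {
        return RETAINED_BYTES / OBJECTS;
    }

    /** Share of the retained size held by referenced objects, not the TIPs themselves. */
    static double referencedFraction() {
        return (double) (RETAINED_BYTES - SHALLOW_BYTES) / RETAINED_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("shallow per TIP:  " + shallowPerTip());  // 296
        System.out.println("retained per TIP: " + retainedPerTip()); // 3247
        System.out.printf("held by referenced objects: %.3f%n", referencedFraction());
    }
}
```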

Our optimization work is as follows:
(1) Reduce duplicated strings
   The JobTracker stores too many duplicated strings, for example: split class names, split locations, counter group names, counter names, display names, the jtIdentifier of JobID, and the jobdir of MapOutputFile.
   Using a StringCache reduced memory usage by nearly 15%.
(2) Initialize Counters lazily
   TIPs with no task attempt assigned should not create Counters.
(3) Restructure completed TIPs' counters
   When a task completes, its TIP grows larger because of its counters. To speed up counter updates and lookups, Counters uses a HashMap and a cache, which costs too much memory. So we separated the counter values from the Counters structure: all tasks share a single CounterMap object, which maps <CounterGroupName, CounterName> to an index in a long array, and every TIP stores an array of its own counter values.
   Using this method, the JT's memory usage was reduced by nearly 50%.
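The three ideas above can be sketched in Java. This sketch assumes a single-threaded caller and ignores serialization. It is not the patch from this issue. The names StringCache and CounterMap come from the comment, but their contents here are guesses. TipCounterValues is a hypothetical stand-in for the per-TIP storage.

The sketch works as follows:

* Counter names pass through a string cache, so equal names share one instance.
* A shared CounterMap assigns each <group, counter> pair a slot once.
* Each TIP keeps only a long[], which is not allocated until its first counter update. This approximates point (2).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the ideas in the comment; not the actual MAPREDUCE-2345 patch.
public class CompactCounters {

    /** Hands out one shared instance per distinct string value. */
    static final class StringCache {
        private final Map<String, String> cache = new HashMap<>();

        String intern(String s) {
            String existing = cache.putIfAbsent(s, s);
            return existing == null ? s : existing;
        }
    }

    /** Shared by all TIPs: maps <group, counter> to an index in a long[]. */
    static final class CounterMap {
        private final StringCache names = new StringCache();
        private final Map<String, Map<String, Integer>> slots = new HashMap<>();
        private int size = 0;

        /** Returns the slot for this counter, assigning a new one on first use. */
        int indexOf(String group, String counter) {
            Map<String, Integer> byName =
                slots.computeIfAbsent(names.intern(group), g -> new HashMap<>());
            Integer idx = byName.get(counter);
            if (idx == null) {
                idx = size++;
                byName.put(names.intern(counter), idx);
            }
            return idx;
        }

        /** Returns the slot for this counter, or -1 if it has never been seen. */
        int find(String group, String counter) {
            Map<String, Integer> byName = slots.get(group);
            Integer idx = byName == null ? null : byName.get(counter);
            return idx == null ? -1 : idx;
        }

        int size() {
            return size;
        }
    }

    /** Hypothetical per-TIP storage: only a long[] of values, no per-TIP maps. */
    static final class TipCounterValues {
        private final CounterMap map;
        private long[] values; // stays null until the first counter update (lazy init)

        TipCounterValues(CounterMap map) {
            this.map = map;
        }

        void increment(String group, String counter, long amount) {
            int idx = map.indexOf(group, counter);
            if (values == null) {
                values = new long[map.size()];
            } else if (idx >= values.length) {
                // Other TIPs may have registered new counters since this array was sized.
                values = Arrays.copyOf(values, map.size());
            }
            values[idx] += amount;
        }

        long get(String group, String counter) {
            int idx = map.find(group, counter);
            return (values == null || idx < 0 || idx >= values.length) ? 0L : values[idx];
        }

        boolean isAllocated() {
            return values != null;
        }
    }

    public static void main(String[] args) {
        CounterMap shared = new CounterMap();
        TipCounterValues tip = new TipCounterValues(shared);
        tip.increment("io", "bytesRead", 100);
        tip.increment("io", "bytesRead", 50);
        System.out.println(tip.get("io", "bytesRead")); // 150
    }
}
```

The point of the design is that the name-to-index mapping is paid for once per JobTracker rather than once per TIP. Each completed TIP then costs little more than one long per distinct counter.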


[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "MengWang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000651#comment-13000651 ] 

MengWang commented on MAPREDUCE-2345:
-------------------------------------

Thanks Kang, you got it.


[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "Kang Xiao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000637#comment-13000637 ] 

Kang Xiao commented on MAPREDUCE-2345:
--------------------------------------

Thanks Arun, retiring completed jobs from memory is a really good solution, and that functionality is in trunk. But how about a running job with tens of thousands of tasks? We see that big running jobs use much memory in the cluster.


[jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000457#comment-13000457 ] 

Arun C Murthy commented on MAPREDUCE-2345:
------------------------------------------

Meng, interesting analysis, thanks!

To be perfectly honest, I'm surprised you guys are seeing this many *memory* issues with the JT... what version of Hadoop Map-Reduce are you running? A simple solution we have deployed at Yahoo! for a long while now is to aggressively cut down the number of completed jobs kept in memory, which has helped a *lot*. Something for you guys to consider.
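The policy of keeping only a bounded number of completed jobs in memory can be illustrated with a LinkedHashMap that evicts its eldest entry once a limit is exceeded. This is a hypothetical sketch of the policy only, not the JobTracker's actual job-retirement code. The class name, the limit, and the value type are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: keeps at most maxCompleted finished jobs; older ones are
// "retired" (dropped from memory). Not the JobTracker's real retirement logic.
public class CompletedJobCache<V> extends LinkedHashMap<String, V> {
    private final int maxCompleted;
    private int retired = 0;

    public CompletedJobCache(int maxCompleted) {
        super(16, 0.75f, false); // insertion order: the earliest completion is evicted first
        this.maxCompleted = maxCompleted;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        boolean evict = size() > maxCompleted;
        if (evict) {
            retired++;
        }
        return evict;
    }

    public int retiredCount() {
        return retired;
    }

    public static void main(String[] args) {
        CompletedJobCache<String> done = new CompletedJobCache<>(2);
        done.put("job_1", "SUCCEEDED");
        done.put("job_2", "FAILED");
        done.put("job_3", "SUCCEEDED"); // job_1 is retired here
        System.out.println(done.keySet()); // [job_2, job_3]
    }
}
```

A count-based cap like this bounds memory held by finished jobs. As noted in the discussion, it does nothing for a single large job that is still running.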
