Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2015/05/15 19:52:03 UTC

[jira] [Resolved] (MAPREDUCE-2872) Optimize TaskTracker memory usage

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-2872.
-----------------------------------------
    Resolution: Won't Fix

Closing this as won't fix, given development on hadoop-1 has effectively stopped.

> Optimize TaskTracker memory usage
> ---------------------------------
>
>                 Key: MAPREDUCE-2872
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>              Labels: memory, optimization
>
> We observe high memory usage by framework-level components on slave nodes, mainly the TaskTracker and Child JVMs, especially on large clusters. To be clear up front: large jobs with 10,000-100,000 map tasks and more than 10,000 reduce tasks are very common in our offline cluster, and will very likely continue to grow. This is reasonable because the number of map and reduce slots is in the same range, and it is impractical for users to reduce their jobs' task counts without an execution-time penalty.
> High memory consumption will:
> * Limit the memory available to upper-level applications;
> * Reduce page cache space, which plays an important role in spill, merge, shuffle, and even HDFS performance;
> * Increase the probability of slave-node OOM, which may affect the storage layer (HDFS) too.
> A stable TT with predictable memory behavior is desired; the same applies to the Child JVM.
> This issue focuses on TaskTracker memory optimization. On our cluster, the TaskTracker uses 600 MB+ of memory and 300%+ CPU (3+ cores) at peak, and 300 MB+ of memory with much less CPU on average, so we have to set -Xmx to 1000 MB to prevent OOM; the TT's memory footprint then ranges from 200 MB to 1200 MB, averaging about 800 MB.
> Here are some ideas:
> * Jetty HTTP connections use a lot of memory when there are many requests in the queue. We should limit the queue length, combine multiple requests into one, or use Netty as MR2 does.
> * TaskCompletionEvents also use a lot of memory if a job has a large number of map tasks. This won't be a problem in MR2, but it can be optimized: a typical TaskCompletionEvent object uses 296 bytes, so a job with 100,000 maps uses about 30 MB, and the problem shows up when a TaskTracker holds several big RunningJobs. More memory-efficient implementations of TaskCompletionEvent are possible.
> * IndexCache: its memory grows linearly with the number of reduces. On a large cluster 10 MB of IndexCache is not enough, so we set it to 100 MB. Here too, using a primitive long[] instead of IndexRecord[] can save about 50% of the memory.
> Although some of the above are not problems in MRv2, MRv1 is still widely used, so I think these optimizations are worthwhile.
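The TaskCompletionEvent idea above can be sketched as a column-oriented store: instead of one ~296-byte object per event, parallel primitive arrays hold just the fields served back to reduces. This is a hypothetical sketch, not Hadoop's actual implementation; the class and field names are assumptions.

```java
// Hypothetical column-oriented store for task completion events.
// One object per job instead of one object per event; per-event cost
// drops to a few bytes across parallel primitive arrays.
public class CompactCompletionEvents {
    private int[] eventIds = new int[16];
    private byte[] statuses = new byte[16];   // ordinal of a Status enum (assumed)
    private int[] taskIndexes = new int[16];  // map task index within the job
    private int size;

    public void add(int eventId, byte status, int taskIndex) {
        if (size == eventIds.length) {
            grow();
        }
        eventIds[size] = eventId;
        statuses[size] = status;
        taskIndexes[size] = taskIndex;
        size++;
    }

    private void grow() {
        int n = eventIds.length * 2;
        eventIds = java.util.Arrays.copyOf(eventIds, n);
        statuses = java.util.Arrays.copyOf(statuses, n);
        taskIndexes = java.util.Arrays.copyOf(taskIndexes, n);
    }

    public int size() { return size; }
    public int eventId(int i) { return eventIds[i]; }
    public byte status(int i) { return statuses[i]; }
    public int taskIndex(int i) { return taskIndexes[i]; }
}
```

The trade-off is losing per-event object identity, which is acceptable here because events are append-only and read back by index range.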
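The IndexCache idea can be sketched similarly: a spill-index record is three longs (start offset, raw length, part length), so a flat long[] with three slots per reduce partition avoids the per-object header and reference overhead of an IndexRecord[]. A minimal sketch, with assumed names, not the actual Hadoop IndexCache:

```java
// Hypothetical flat spill index: three longs per reduce partition in one
// array, instead of one IndexRecord object per partition.
public class FlatSpillIndex {
    // layout per partition: [startOffset, rawLength, partLength]
    private final long[] records;

    public FlatSpillIndex(int numPartitions) {
        this.records = new long[numPartitions * 3];
    }

    public void put(int partition, long startOffset, long rawLength, long partLength) {
        int base = partition * 3;
        records[base] = startOffset;
        records[base + 1] = rawLength;
        records[base + 2] = partLength;
    }

    public long startOffset(int partition) { return records[partition * 3]; }
    public long rawLength(int partition)   { return records[partition * 3 + 1]; }
    public long partLength(int partition)  { return records[partition * 3 + 2]; }
}
```

With many reduces per job, the single long[] also improves locality when the TaskTracker scans index records while serving shuffle requests.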



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)