You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Binglin Chang (JIRA)" <ji...@apache.org> on 2011/08/23 08:51:29 UTC

[jira] [Created] (MAPREDUCE-2872) Optimize TaskTracker memory usage

Optimize TaskTracker memory usage
---------------------------------

                 Key: MAPREDUCE-2872
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.20.203.0
            Reporter: Binglin Chang
            Assignee: Binglin Chang


We observe high memory usage of framework level components on slave node, mainly TaskTracker & Child, especially for large clusters. To be clear at first, large jobs with 10000-100000 map and >10000 reduce tasks are very common in our offline cluster, and will very likely continue to grow. This is reasonable because the number of map & reduce slots are in the same range, and it's impractical for users to reduce their job's task number without execution time penalty. 

High memory consumption will:
* Limit the memory used by up level application; 
* Reduce page cache space, which plays a  important role in spill, merge, shuffle and even HDFS performance; 
* Increase the probability of slave node OOM, which may affect storage layer(HDFS) too. 

A stable TT with predictable memory behavior is desired, this also applies to Child JVM.

This issue focuses on TaskTracker memory optimization, on our cluster, TaskTracker use 600M+ memory & 300%+(3core+) CPU at peak, and 300M+ memory & much less CPU in average, so we need to set -Xmx to 1000M for TT to prevent OOM, then the TT memory is in 200M-1200M range, and 800M in average. 

Here are some ideas:  

Jetty http connection use a lot memory when these are many requests in queue, we need to limit the length of the queue, combine multiple requests into one request, or use netty just like MR2

TaskCompletionEvents use a lot memory too if a job have large number of map task, this won't be a problem in MR2, but can be optimized, A typical TaskCompletionEvent object use 296 bytes memory, a job with 100000 map will use about 30M memory, problem will appear if there are some big RunningJob in a TaskTracker. There are more memory efficient implementations for TaskCompletionEvent.

IndexCache: memory of indexcache varies directly as reduce number, on large cluster 10MB of indexcache is not enough, 
we set it to 100MB, again use primitive long[] instead of IndexRecord[] can save 50% of memory.

Although some of the above won't be a problem in MR-v2, since MR-v1 is still widely used, I think optimizations are needed.





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2872) Optimize TaskTracker memory usage

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092828#comment-13092828 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-2872:
----------------------------------------------------

Hi, without discounting the performance improvements you are proposing, what I sense probably a more pressing issue for you is the memory usage of the child tasks. With jobs like streaming and pipes, the child tasks have wildly varying usages. This had been a significant issue in the past, see HADOOP-3581. Have you tried enabling the memory-monitoring of tasks? (http://hadoop.apache.org/common/docs/current/cluster_setup.html#Memory+monitoring)



> Optimize TaskTracker memory usage
> ---------------------------------
>
>                 Key: MAPREDUCE-2872
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>              Labels: memory, optimization
>
> We observe high memory usage of framework level components on slave node, mainly TaskTracker & Child, especially for large clusters. To be clear at first, large jobs with 10000-100000 map and >10000 reduce tasks are very common in our offline cluster, and will very likely continue to grow. This is reasonable because the number of map & reduce slots are in the same range, and it's impractical for users to reduce their job's task number without execution time penalty. 
> High memory consumption will:
> * Limit the memory used by up level application; 
> * Reduce page cache space, which plays a  important role in spill, merge, shuffle and even HDFS performance; 
> * Increase the probability of slave node OOM, which may affect storage layer(HDFS) too. 
> A stable TT with predictable memory behavior is desired, this also applies to Child JVM.
> This issue focuses on TaskTracker memory optimization, on our cluster, TaskTracker use 600M+ memory & 300%+(3core+) CPU at peak, and 300M+ memory & much less CPU in average, so we need to set -Xmx to 1000M for TT to prevent OOM, then the TT memory is in 200M-1200M range, and 800M in average. 
> Here are some ideas:  
> Jetty http connection use a lot memory when these are many requests in queue, we need to limit the length of the queue, combine multiple requests into one request, or use netty just like MR2
> TaskCompletionEvents use a lot memory too if a job have large number of map task, this won't be a problem in MR2, but can be optimized, A typical TaskCompletionEvent object use 296 bytes memory, a job with 100000 map will use about 30M memory, problem will appear if there are some big RunningJob in a TaskTracker. There are more memory efficient implementations for TaskCompletionEvent.
> IndexCache: memory of indexcache varies directly as reduce number, on large cluster 10MB of indexcache is not enough, 
> we set it to 100MB, again use primitive long[] instead of IndexRecord[] can save 50% of memory.
> Although some of the above won't be a problem in MR-v2, since MR-v1 is still widely used, I think optimizations are needed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2872) Optimize TaskTracker memory usage

Posted by "Binglin Chang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093558#comment-13093558 ] 

Binglin Chang commented on MAPREDUCE-2872:
------------------------------------------

bq. what I sense probably a more pressing issue for you is the memory usage of the child tasks.
Yes, statistics of memory consumption for a whole cluster shows that large proportion of memory is used by map/reduce tasks, especially reduce tasks. 
We don't use memory-monitoring feature cause our cluster don't have this feature... very old verison:( 
And I notice that many memory related bugs has already been fixed in newer versions. Kill tasks is a solution, but we also can do some optimizations. Most high memory child task is caused by big key/values, especially for streaming/pipes.
As MRv1 will stay for a long time in our company internally, we made these changes:
1. Add cache for taskTrackerHttp for TaskCompletionEvents
2. Add cache in TaskID to JobID reference, 
   So average memory of TaskCompletionEvent object drop from 296bytes to some tens of bytes.
3. Use 12byte to record a IndexRecord rather than 24byte(long[3]) 
4. Try to replace Jetty with Netty, using code from 0.23
5. Resource isolaton using cgroup.

As these changes are not good coding practice or require OS support, and TT will not exist in MRv2, I think I should close this issue.


> Optimize TaskTracker memory usage
> ---------------------------------
>
>                 Key: MAPREDUCE-2872
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2872
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.20.203.0
>            Reporter: Binglin Chang
>            Assignee: Binglin Chang
>              Labels: memory, optimization
>
> We observe high memory usage of framework level components on slave node, mainly TaskTracker & Child, especially for large clusters. To be clear at first, large jobs with 10000-100000 map and >10000 reduce tasks are very common in our offline cluster, and will very likely continue to grow. This is reasonable because the number of map & reduce slots are in the same range, and it's impractical for users to reduce their job's task number without execution time penalty. 
> High memory consumption will:
> * Limit the memory used by up level application; 
> * Reduce page cache space, which plays a  important role in spill, merge, shuffle and even HDFS performance; 
> * Increase the probability of slave node OOM, which may affect storage layer(HDFS) too. 
> A stable TT with predictable memory behavior is desired, this also applies to Child JVM.
> This issue focuses on TaskTracker memory optimization, on our cluster, TaskTracker use 600M+ memory & 300%+(3core+) CPU at peak, and 300M+ memory & much less CPU in average, so we need to set -Xmx to 1000M for TT to prevent OOM, then the TT memory is in 200M-1200M range, and 800M in average. 
> Here are some ideas:  
> Jetty http connection use a lot memory when these are many requests in queue, we need to limit the length of the queue, combine multiple requests into one request, or use netty just like MR2
> TaskCompletionEvents use a lot memory too if a job have large number of map task, this won't be a problem in MR2, but can be optimized, A typical TaskCompletionEvent object use 296 bytes memory, a job with 100000 map will use about 30M memory, problem will appear if there are some big RunningJob in a TaskTracker. There are more memory efficient implementations for TaskCompletionEvent.
> IndexCache: memory of indexcache varies directly as reduce number, on large cluster 10MB of indexcache is not enough, 
> we set it to 100MB, again use primitive long[] instead of IndexRecord[] can save 50% of memory.
> Although some of the above won't be a problem in MR-v2, since MR-v1 is still widely used, I think optimizations are needed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira