Posted to mapreduce-issues@hadoop.apache.org by "Haibo Chen (JIRA)" <ji...@apache.org> on 2016/07/21 03:46:20 UTC
[jira] [Resolved] (MAPREDUCE-6131) Integer overflow in RMContainerAllocator results in starvation of applications
[ https://issues.apache.org/jira/browse/MAPREDUCE-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haibo Chen resolved MAPREDUCE-6131.
-----------------------------------
Resolution: Invalid
> Integer overflow in RMContainerAllocator results in starvation of applications
> ------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6131
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Kamal Kc
> Attachments: MAPREDUCE-6131-2.2.0.patch
>
>
> When processing large datasets, Hadoop encounters a scenario where all
> containers run reduce tasks and no map tasks are scheduled. The
> application does not fail but rather remains in this state without making
> any forward progress. It then has to be manually terminated.
> This bug is due to integer overflow in scheduleReduces() of
> RMContainerAllocator. The variable netScheduledMapMem overflows for
> large data sizes and takes a negative value, which results in a large
> finalReduceMemLimit and a large rampup value. In almost all cases, this
> large rampup value is greater than the total number of reduce tasks.
> Therefore, the AM tries to assign all reduce tasks. If the total number
> of reduce tasks is greater than the total number of container slots, then
> all slots are taken up by reduce tasks, leaving none for maps.
> With a 128 MB block size and a 2 GB map container size, overflow occurs at a data size of 128 TB. An example scenario that reproduces it:
> - Input data size = 32 TB, block size = 128 MB, map container size = 10 GB,
> reduce container size = 10 GB, #reducers = 50, cluster mem capacity = 7 x 40 GB, slowstart = 0.0
> A better resolution might be to change the variables used in
> RMContainerAllocator from int to long. A simpler fix would be to
> change only the local variables of scheduleReduces() to long.
> A patch is attached for 2.2.0.
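
The arithmetic in the description can be checked with a small standalone sketch. It is not the RMContainerAllocator code and not the attached patch. It assumes that netScheduledMapMem is roughly (number of scheduled maps) x (map container memory in MB), that one map task is created per block, and that sizes use binary units (1 TB = 1024 x 1024 MB). The class and method names are hypothetical.

```java
public class NetScheduledMapMemSketch {

    // Assumption: one map task per input block, binary units (1 TB = 1024 * 1024 MB).
    static int mapCount(long inputTb, long blockMb) {
        return (int) (inputTb * 1024L * 1024L / blockMb);
    }

    // Hypothetical stand-in for the int arithmetic described in the report.
    // The product wraps silently once it exceeds Integer.MAX_VALUE.
    static int netScheduledMapMemInt(int scheduledMaps, int mapMemMb) {
        return scheduledMaps * mapMemMb;
    }

    // Same product with the multiplication widened to long, in the spirit of
    // the "change the local variables to long" suggestion.
    static long netScheduledMapMemLong(int scheduledMaps, int mapMemMb) {
        return (long) scheduledMaps * mapMemMb;
    }

    public static void main(String[] args) {
        // Reproduction scenario: 32 TB input, 128 MB blocks, 10 GB map containers.
        int maps = mapCount(32, 128);
        int mapMemMb = 10 * 1024;
        System.out.println("maps = " + maps);
        System.out.println("int  = " + netScheduledMapMemInt(maps, mapMemMb));
        System.out.println("long = " + netScheduledMapMemLong(maps, mapMemMb));

        // Boundary case: 128 TB input, 128 MB blocks, 2 GB map containers.
        int boundaryMaps = mapCount(128, 128);
        System.out.println("boundary int  = " + netScheduledMapMemInt(boundaryMaps, 2048));
        System.out.println("boundary long = " + netScheduledMapMemLong(boundaryMaps, 2048));
    }
}
```

Under these assumptions the 32 TB scenario yields 262144 maps and a true product of 2684354560 MB, which wraps to -1610612736 as an int. The 128 TB case yields exactly 2147483648 MB (2^31), one more than Integer.MAX_VALUE, which matches the threshold stated in the description.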