Posted to mapreduce-issues@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2012/05/31 22:44:24 UTC

[jira] [Commented] (MAPREDUCE-4299) Terasort hangs with MR2 FifoScheduler

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286915#comment-13286915 ] 

Tom White commented on MAPREDUCE-4299:
--------------------------------------

The code in RMContainerAllocator is meant to handle this case by ramping up the number of reducers as maps finish. However, there seems to be something fishy about the total amount of memory available to the job. Compare

2012-05-24 16:47:25,803 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.3 totalMemLimit:63488 finalMapMemLimit:44442 finalReduceMemLimit:19046 netScheduledMapMem:117760 netScheduledReduceMem:15360

to

2012-05-24 16:47:07,521 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:30 ScheduledMaps:160 ScheduledReduces:0 AssignedMaps:0 AssignedReduces:0 completedMaps:0 completedReduces:0 containersAllocated:0 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:0 availableResources(headroom):memory: 32768

The first says that there is 63488 MB of memory available to the job, the second 32768 MB (both numbers stay the same throughout the job). So what could be happening is that the allocator slowly ramps up the number of reducers until they use up the whole 32768 MB (32 slots at 1024 MB apiece), thinking that there is still memory available when there isn't. The code confuses the terms 'available resource', 'headroom', and 'cluster resource' - i.e. it's not clear whether the available resource is a fixed total or just what's not currently in use. RMContainerAllocator.getMemLimit() suggests the latter, while the FifoScheduler has the line {{application.setHeadroom(clusterResource)}}, which suggests it's a fixed total.
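
To make the ambiguity concrete, here's a rough sketch of the arithmetic - a paraphrase of what getMemLimit() appears to assume, not the actual class, and the 30720 MB "assigned" figure below is just the hypothetical number that makes the two log lines line up:

{code:java}
// Sketch only: illustrates the two readings of "headroom"; not the real Hadoop code.
public class HeadroomSketch {

  // The allocator's job-wide limit, roughly what RMContainerAllocator.getMemLimit()
  // appears to assume: headroom plus memory already assigned to the job.
  // This is only correct if "headroom" means memory that is still free.
  static int allocatorMemLimit(int headroomMB, int assignedMapMB, int assignedReduceMB) {
    return headroomMB + assignedMapMB + assignedReduceMB;
  }

  public static void main(String[] args) {
    int clusterMB = 32768; // the fixed headroom reported in the second log line

    // If the FifoScheduler sets headroom to the whole cluster
    // (application.setHeadroom(clusterResource)), then once the job holds,
    // say, 30720 MB of containers (hypothetical: 63488 - 32768), the limit
    // double-counts that memory:
    System.out.println(allocatorMemLimit(clusterMB, 30720, 0)); // 63488, as in the first log line

    // If headroom instead shrank as containers were assigned, the limit
    // would stay at the real cluster size:
    System.out.println(allocatorMemLimit(clusterMB - 30720, 30720, 0)); // 32768
  }
}
{code}

If that reading is right, it would also explain why the same job succeeds with the CapacityScheduler, since presumably its headroom does shrink as containers are handed out.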
                
> Terasort hangs with MR2 FifoScheduler
> -------------------------------------
>
>                 Key: MAPREDUCE-4299
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4299
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.0-alpha
>            Reporter: Tom White
>
> What happens is that the number of reducers ramps up until they occupy all of the job's containers, at which point the maps no longer make any progress and the job hangs.
> When the same job is run with the CapacityScheduler it succeeds, so this looks like a FifoScheduler bug.
