Posted to mapreduce-issues@hadoop.apache.org by "Anubhav Dhoot (JIRA)" <ji...@apache.org> on 2015/09/30 21:22:06 UTC

[jira] [Commented] (MAPREDUCE-6302) Incorrect headroom can lead to a deadlock between map and reduce allocations

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14938321#comment-14938321 ] 

Anubhav Dhoot commented on MAPREDUCE-6302:
------------------------------------------

The patch looks mostly good.

Why does availableResourceForMap no longer consider assignedRequests.maps after the patch?

The earlier comments had some more description that would be useful to preserve, perhaps as a heading for both sets of values describing when preemption kicks in. For example, the earlier description: "The threshold in terms of seconds after which an unsatisfied mapper request triggers reducer preemption to free space."

Would UNCONDITIONAL be a better name than FORCE, since it's not as though the other preemption is optional once it kicks in?
Consider renaming:
- duration -> allocationDelayThresholdMs
- forcePreemptThreshold -> forcePreemptThresholdSec
- reducerPreemptionHoldMs -> reducerNoHeadroomPreemptionMs

resourceLimit is an odd name for the headroom in the Allocation. Consider filing a separate JIRA to fix that.
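To make the two thresholds above concrete, here is a minimal sketch of how they might interact in the AM's preemption decision. This is illustrative only, not the actual RMContainerAllocator code: the field names (allocationDelayThresholdMs, reducerNoHeadroomPreemptionMs) mirror the rename suggestions in this comment, and the method and its parameters are assumptions.

```java
// Hypothetical sketch of the two reducer-preemption paths discussed above.
// Not Hadoop code; names and signatures are assumptions for illustration.
public class PreemptionSketch {
    long allocationDelayThresholdMs = 60_000;    // unconditional ("FORCE") threshold
    long reducerNoHeadroomPreemptionMs = 30_000; // hold period when headroom is zero

    /** Decide whether a pending map request should trigger reducer preemption. */
    boolean shouldPreemptReducers(int pendingMaps, long headroomMemoryMb,
                                  long mapRequestAgeMs) {
        if (pendingMaps == 0) {
            return false; // no starved maps, nothing to free
        }
        // Unconditional preemption: the map request has waited too long,
        // regardless of what the (possibly stale) headroom reports.
        if (mapRequestAgeMs >= allocationDelayThresholdMs) {
            return true;
        }
        // Headroom-based preemption: the RM reports no free space, so waiting
        // reducers can deadlock the job; preempt after a shorter hold period.
        return headroomMemoryMb <= 0
            && mapRequestAgeMs >= reducerNoHeadroomPreemptionMs;
    }
}
```

Under this sketch, a zero headroom plus a map waiting past the shorter hold period breaks the deadlock early, while the longer threshold acts as a backstop even when the reported headroom looks positive.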


> Incorrect headroom can lead to a deadlock between map and reduce allocations 
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6302
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: mai shurong
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: AM_log_head100000.txt.gz, AM_log_tail100000.txt.gz, log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-prelim.patch, queue_with_max163cores.png, queue_with_max263cores.png, queue_with_max333cores.png
>
>
> I submitted a big job, which has 500 maps and 350 reduces, to a queue (FairScheduler) with a maximum of 300 cores. When the job was running 100% of its maps, 300 reduces had occupied all 300 cores in the queue. Then a map failed and retried, waiting for a core, while the 300 reduces were waiting for the failed map to finish. So a deadlock occurred: the job was blocked, and later jobs in the queue could not run because there were no available cores in the queue.
> I think there is a similar issue for the memory of a queue.
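The reported scenario can be reduced to a small piece of arithmetic: with every core held by a reducer, the queue's headroom is zero, so the retried map can never be scheduled while the reducers wait on it. A tiny sketch using the numbers from the report (the variable names are illustrative, not Hadoop APIs):

```java
// Numeric illustration of the reported deadlock; values from the issue above.
public class DeadlockScenario {
    public static void main(String[] args) {
        int queueMaxCores = 300;
        int runningReducers = 300;  // each reducer holds one core
        int pendingMapRetries = 1;  // the failed map waiting to re-run

        int headroom = queueMaxCores - runningReducers; // 0: nothing free
        // The retried map cannot start, and the reducers will not finish
        // until every map completes, so without preemption the job is stuck.
        boolean deadlocked = pendingMapRetries > 0 && headroom <= 0;
        System.out.println("headroom=" + headroom + " deadlocked=" + deadlocked);
    }
}
```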



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)