Posted to mapreduce-issues@hadoop.apache.org by "Rohith (JIRA)" <ji...@apache.org> on 2014/01/31 11:24:10 UTC

[jira] [Commented] (MAPREDUCE-5734) Reducer preemption does not happen if node is blacklisted, in turn job gets hung.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887610#comment-13887610 ] 

Rohith commented on MAPREDUCE-5734:
-----------------------------------

Currently, the availableResource value that the ResourceManager sends in the heartbeat includes the free memory of blacklisted nodes. The ApplicationMaster bases several decisions on the availableResource it receives from the ResourceManager, such as scheduling requests and reducer preemption.
So the ResourceManager should send the correct available resources (excluding the free memory of blacklisted nodes) to the ApplicationMaster.
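
To make the difference concrete, here is a rough sketch of the two headroom values and how they change the preemption decision. It is not the actual YARN scheduler or MRAppMaster code: the class HeadroomSketch, its method names, the per-node free-memory map, the 1GB map container size, and the assumption that the 3GB not held by reducers all sits on NM-4 are hypothetical and only illustrate the scenario from the description.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; not the real ResourceManager or MRAppMaster logic. */
final class HeadroomSketch {

    /** Free memory summed over every node, blacklisted or not (the current behaviour). */
    static long reportedHeadroomMb(Map<String, Long> freeMbByNode) {
        long total = 0;
        for (long free : freeMbByNode.values()) {
            total += free;
        }
        return total;
    }

    /** Free memory summed only over nodes the application has not blacklisted (the proposal). */
    static long usableHeadroomMb(Map<String, Long> freeMbByNode, Set<String> blacklisted) {
        long total = 0;
        for (Map.Entry<String, Long> e : freeMbByNode.entrySet()) {
            if (!blacklisted.contains(e.getKey())) {
                total += e.getValue();
            }
        }
        return total;
    }

    /** Simplified AM-side check: preempt reducers when a pending map cannot fit in the headroom. */
    static boolean shouldPreemptReducers(int pendingMaps, long mapMb, long headroomMb) {
        return pendingMaps > 0 && headroomMb < mapMb;
    }

    public static void main(String[] args) {
        // Assumed layout: 4 x 8GB nodes, reducers hold 29GB, the free 3GB is all on NM-4.
        Map<String, Long> free = new LinkedHashMap<>();
        free.put("NM-1", 0L);
        free.put("NM-2", 0L);
        free.put("NM-3", 0L);
        free.put("NM-4", 3072L);
        Set<String> blacklisted = Collections.singleton("NM-4");
        long mapMb = 1024L; // assumed map container size

        long reported = reportedHeadroomMb(free);            // 3072
        long usable = usableHeadroomMb(free, blacklisted);   // 0

        // With the reported value the AM sees room for a map and does not preempt (false).
        System.out.println(shouldPreemptReducers(1, mapMb, reported));
        // With the corrected value the AM sees no room and preempts a reducer (true).
        System.out.println(shouldPreemptReducers(1, mapMb, usable));
    }
}
```

Under these assumptions the AM keeps waiting for containers that can only come from NM-4, which it has blacklisted, so the pending maps never run.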

> Reducer preemption does not happen if node is blacklisted, in turn job gets hung.
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5734
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5734
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>         Environment: SuSE 11 SP2 + Hadoop-2.3 
>            Reporter: Rohith
>
> There are 4 NodeManagers with 8GB each, so the total cluster capacity is 32GB. Cluster slow start is set to 1.
> A job is running, and its reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) becomes unstable (3 maps got killed), so the MRAppMaster blacklists the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
> The MRAppMaster does not preempt the reducers because the headroom used in the reducer preemption calculation includes the blacklisted node's memory. This makes the job hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResource it returns still counts the cluster's free memory).


