Posted to mapreduce-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2013/10/03 02:19:42 UTC

[jira] [Updated] (MAPREDUCE-5489) MR jobs hangs as it does not use the node-blacklisting feature in RM requests

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated MAPREDUCE-5489:
-----------------------------------

    Attachment: MAPREDUCE-5489.1.patch

I've created a patch to make the AM send blacklisted nodes to the RM. The logic is as follows:

1. Add blacklistAdditions and blacklistRemovals to record the nodes added to or removed from the blacklist between two allocate calls. Both collections are sent to the RM in the next allocate call.

2. Whenever a container fails on a host, the host is blacklisted, and it is added to blacklistAdditions if the blacklist is not being ignored.

3. When switching from not ignoring the blacklist to ignoring it, all the blacklisted nodes are added to blacklistRemovals.

4. When switching from ignoring the blacklist to not ignoring it, all the blacklisted nodes are added to blacklistAdditions.

5. Switching between ignoring and not ignoring the blacklist does not take effect until the next allocate call, but it does take effect eventually.
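
The five steps above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class BlacklistTracker, its method names, and the Delta holder are hypothetical, and only blacklistAdditions and blacklistRemovals are names taken from the description. Clearing the opposite pending set on a switch is an assumption made here so that a node never appears in both collections of the same allocate call.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the bookkeeping described in steps 1-5.
class BlacklistTracker {
    private final Set<String> blacklistedNodes = new HashSet<>();
    // Step 1: changes accumulated between two allocate calls.
    private final Set<String> blacklistAdditions = new HashSet<>();
    private final Set<String> blacklistRemovals = new HashSet<>();
    private boolean ignoreBlacklisting = false;

    // The two collections that would accompany one allocate call.
    static final class Delta {
        final Set<String> additions;
        final Set<String> removals;
        Delta(Set<String> additions, Set<String> removals) {
            this.additions = additions;
            this.removals = removals;
        }
    }

    // Step 2: a failed container blacklists its host; the RM is only
    // told about it if the blacklist is not currently ignored.
    void containerFailedOnHost(String host) {
        boolean newlyBlacklisted = blacklistedNodes.add(host);
        if (newlyBlacklisted && !ignoreBlacklisting) {
            blacklistAdditions.add(host);
        }
    }

    // Steps 3 and 4: switching modes queues every blacklisted node
    // for removal (now ignoring) or for addition (no longer ignoring).
    void setIgnoreBlacklisting(boolean ignore) {
        if (ignore == ignoreBlacklisting) {
            return;
        }
        ignoreBlacklisting = ignore;
        if (ignore) {
            blacklistAdditions.clear(); // assumption: drop unsent additions
            blacklistRemovals.addAll(blacklistedNodes);
        } else {
            blacklistRemovals.clear(); // assumption: drop unsent removals
            blacklistAdditions.addAll(blacklistedNodes);
        }
    }

    // Step 5: nothing reaches the RM until the next allocate call, which
    // takes a snapshot of the pending changes and resets them.
    Delta takeDeltaForAllocate() {
        Delta delta = new Delta(new HashSet<>(blacklistAdditions),
                                new HashSet<>(blacklistRemovals));
        blacklistAdditions.clear();
        blacklistRemovals.clear();
        return delta;
    }

    public static void main(String[] args) {
        BlacklistTracker tracker = new BlacklistTracker();
        tracker.containerFailedOnHost("host1");
        Delta delta = tracker.takeDeltaForAllocate();
        System.out.println("additions=" + delta.additions
                + " removals=" + delta.removals);
    }
}
```

In this sketch a mode switch changes only the pending collections, so the RM learns about it when takeDeltaForAllocate() is next called, which mirrors the delay noted in step 5.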

Test cases have been modified to test whether the RM is aware of the blacklisted nodes.

> MR jobs hangs as it does not use the node-blacklisting feature in RM requests
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Zhijie Shen
>         Attachments: MAPREDUCE-5489.1.patch
>
>
> When the RM restarted, one NM went bad (bad disk) during the restart. The NM got blacklisted by the AM, but the RM kept giving containers on the same node even though the AM didn't want them there.
> The AM needs to be changed to explicitly blacklist the node in its RM requests.



--
This message was sent by Atlassian JIRA
(v6.1#6144)