Posted to mapreduce-issues@hadoop.apache.org by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org> on 2011/10/15 01:48:12 UTC
[jira] [Commented] (MAPREDUCE-2693) NPE in AM causes it to lose containers which are never returned back to RM

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127994#comment-13127994 ] 

Hadoop QA commented on MAPREDUCE-2693:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12499113/MR-2693.1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1030//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1030//console

This message is automatically generated.
                
> NPE in AM causes it to lose containers which are never returned back to RM
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2693
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2693
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Amol Kekre
>            Assignee: Hitesh Shah
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MR-2693.1.patch
>
>
> The following exception in the AM of an application at the top of the queue causes this. Once this happens, the AM keeps
> obtaining containers from the RM and simply loses them. Eventually, on a cluster with multiple jobs, no more scheduling
> happens because of these lost containers.
> It happens when there are nodes blacklisted at the app level in the AM. A bug in the AM
> (RMContainerRequestor.containerFailedOnHost(hostName)) causes this: nodes are simply removed from the
> request table. We should make sure the RM also knows about this update.
> ========================================================================
> 11/06/17 06:11:18 INFO rm.RMContainerAllocator: Assigned based on host match 98.138.163.34
> 11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4978 #asks=5
> 11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=4977 #asks=5
> 11/06/17 06:11:18 INFO rm.RMContainerRequestor: BEFORE decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1540 #asks=5
> 11/06/17 06:11:18 INFO rm.RMContainerRequestor: AFTER decResourceRequest: applicationId=30 priority=20 resourceName=... numContainers=1539 #asks=6
> 11/06/17 06:11:18 ERROR rm.RMContainerAllocator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decResourceRequest(RMContainerRequestor.java:246)
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.decContainerReq(RMContainerRequestor.java:198)
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:523)
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$200(RMContainerAllocator.java:433)
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:151)
>         at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:220)
>         at java.lang.Thread.run(Thread.java:619)
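
The failure mode described above can be illustrated with a minimal sketch. It is not the actual RMContainerRequestor code, which is not shown in this message. It assumes the request table is a plain map from resource name to outstanding container count, and that the NPE comes from looking up an entry that containerFailedOnHost already removed. The class name RequestTableSketch and all of its methods are hypothetical. The sketch covers only the local null guard, not the second half of the report (telling the RM about the update).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the AM-side request table.
// Assumption: one map from resource name (host) to outstanding container count.
public class RequestTableSketch {
    private final Map<String, Integer> remoteRequests = new HashMap<>();

    // Record one more outstanding container request for the given resource.
    public void addResourceRequest(String resourceName) {
        remoteRequests.merge(resourceName, 1, Integer::sum);
    }

    // Mirrors the reported behavior: a blacklisted host is simply dropped from the table.
    public void containerFailedOnHost(String hostName) {
        remoteRequests.remove(hostName);
    }

    // Unguarded decrement: unboxing the null returned by get() throws a NullPointerException
    // if the entry was removed earlier.
    public void decResourceRequestUnguarded(String resourceName) {
        int remaining = remoteRequests.get(resourceName) - 1;
        if (remaining <= 0) {
            remoteRequests.remove(resourceName);
        } else {
            remoteRequests.put(resourceName, remaining);
        }
    }

    // Guarded decrement: returns false instead of throwing when the entry is already gone.
    public boolean decResourceRequestGuarded(String resourceName) {
        Integer current = remoteRequests.get(resourceName);
        if (current == null) {
            return false; // entry was removed, e.g. because the host was blacklisted
        }
        if (current <= 1) {
            remoteRequests.remove(resourceName);
        } else {
            remoteRequests.put(resourceName, current - 1);
        }
        return true;
    }

    // Outstanding count for a resource, or 0 if it has no entry.
    public int outstanding(String resourceName) {
        return remoteRequests.getOrDefault(resourceName, 0);
    }

    public static void main(String[] args) {
        RequestTableSketch table = new RequestTableSketch();
        table.addResourceRequest("host-a");
        table.containerFailedOnHost("host-a"); // host blacklisted, entry removed

        try {
            table.decResourceRequestUnguarded("host-a");
        } catch (NullPointerException e) {
            System.out.println("unguarded decrement threw NullPointerException");
        }
        System.out.println("guarded decrement applied: " + table.decResourceRequestGuarded("host-a"));
    }
}
```

In this sketch the guarded variant only keeps the caller from dying mid-assignment. Whether the attached patch takes this approach is not stated in this message.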
