Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/11/13 19:43:11 UTC

[jira] [Commented] (YARN-4355) NPE while processing localizer heartbeat

    [ https://issues.apache.org/jira/browse/YARN-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004480#comment-15004480 ] 

Jason Lowe commented on YARN-4355:
----------------------------------

Stacktrace:
{noformat}
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1089)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1054)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:681)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:330)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server.call(Server.java:2297)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:654)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:621)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1680)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2247)
{noformat}

The nodemanager was in the process of tearing down, so applications were being cleaned up.  Looks like localizer heartbeats can still come in during teardown, and we can lose the localizer tracker just as the localizer heartbeat tries to use it.
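
To illustrate the kind of race suspected here, below is a minimal sketch. It is not the actual ResourceLocalizationService code and not a proposed patch: the class HeartbeatRaceSketch, the AppTracker type, the path layout, and all method names are hypothetical stand-ins. It assumes the per-application trackers live in a concurrent map that the cleanup path removes entries from while heartbeats are still being served.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; none of these names come from the NodeManager source.
final class HeartbeatRaceSketch {

    /** Hypothetical stand-in for a per-application local resource tracker. */
    static final class AppTracker {
        private final String appId;

        AppTracker(String appId) {
            this.appId = appId;
        }

        String pathFor(String resource) {
            return "/local/" + appId + "/" + resource;
        }
    }

    // Assumption: trackers are keyed by application id in a concurrent map.
    private final ConcurrentMap<String, AppTracker> trackers = new ConcurrentHashMap<>();

    void registerApp(String appId) {
        trackers.put(appId, new AppTracker(appId));
    }

    /** Cleanup path; may run on another thread while a heartbeat is in flight. */
    void cleanupApp(String appId) {
        trackers.remove(appId);
    }

    /** Unguarded lookup: throws NullPointerException if cleanup removed the tracker first. */
    String unguardedHeartbeat(String appId, String resource) {
        return trackers.get(appId).pathFor(resource);
    }

    /** Guarded lookup: reads the tracker once into a local and checks it before use. */
    String guardedHeartbeat(String appId, String resource) {
        AppTracker tracker = trackers.get(appId);
        if (tracker == null) {
            // The application was already cleaned up; the caller decides how to respond.
            return null;
        }
        return tracker.pathFor(resource);
    }
}
```

In this sketch the guarded variant simply returns null once the application is gone. What the real heartbeat handler should do in that case (for example, telling the localizer to stop) is a separate design question that the sketch does not try to answer.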

> NPE while processing localizer heartbeat
> ----------------------------------------
>
>                 Key: YARN-4355
>                 URL: https://issues.apache.org/jira/browse/YARN-4355
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.2
>            Reporter: Jason Lowe
>
> While analyzing YARN-4354 I noticed a nodemanager was getting NPEs while processing a private localizer heartbeat.  I think there's a race where we can clean up resources for an application, and therefore remove the app local resource tracker, just as we are trying to handle the localizer heartbeat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)