You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Mit Desai (JIRA)" <ji...@apache.org> on 2014/08/06 21:08:12 UTC
[jira] [Created] (YARN-2387) Resource Manager crashes with NPE due
to lack of synchronization
Mit Desai created YARN-2387:
-------------------------------
Summary: Resource Manager crashes with NPE due to lack of synchronization
Key: YARN-2387
URL: https://issues.apache.org/jira/browse/YARN-2387
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
We recently came across a 0.23 RM crashing with an NPE. Here is the stacktrace for it.
{noformat}
2014-08-06 05:56:52,165 [ResourceManager Event Processor] FATAL
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in
handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToBuilder(ContainerStatusPBImpl.java:61)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.mergeLocalToProto(ContainerStatusPBImpl.java:68)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:53)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerStatusPBImpl.getProto(ContainerStatusPBImpl.java:34)
at
org.apache.hadoop.yarn.api.records.ProtoBase.toString(ProtoBase.java:55)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at
org.apache.hadoop.yarn.api.records.impl.pb.ContainerPBImpl.toString(ContainerPBImpl.java:353)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1405)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainer(CapacityScheduler.java:790)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:602)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:688)
at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:82)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:339)
at java.lang.Thread.run(Thread.java:722)
2014-08-06 05:56:52,166 [ResourceManager Event Processor] INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{noformat}
On investigating a on the issue we found that the ContainerStatusPBImpl has methods that are called by different threads and are not synchronized. Even the 2.X code looks alike.
We need to make these methods synchronized so that we do not encounter this problem in future.
--
This message was sent by Atlassian JIRA
(v6.2#6252)