You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-dev@hadoop.apache.org by "Zephyr Guo (JIRA)" <ji...@apache.org> on 2016/02/27 10:13:18 UTC
[jira] [Created] (YARN-4743) ResourceManager crash because TimSort
Zephyr Guo created YARN-4743:
--------------------------------
Summary: ResourceManager crash because TimSort
Key: YARN-4743
URL: https://issues.apache.org/jira/browse/YARN-4743
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.6.4
Reporter: Zephyr Guo
{code}
2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeHi(TimSort.java:868)
at java.util.TimSort.mergeAt(TimSort.java:485)
at java.util.TimSort.mergeCollapse(TimSort.java:410)
at java.util.TimSort.sort(TimSort.java:214)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
at java.lang.Thread.run(Thread.java:745)
2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}
Actually, this issue found in 2.6.0-cdh5.4.7.
I think the cause is that we modify {{Resouce}} while we are sorting {{runnableApps}}.
{code:title=FSLeafQueue.java}
Comparator<Schedulable> comparator = policy.getComparator();
writeLock.lock();
try {
Collections.sort(runnableApps, comparator);
} finally {
writeLock.unlock();
}
readLock.lock();
{code}
{code:title=FairShareComparator}
public int compare(Schedulable s1, Schedulable s2) {
......
s1.getResourceUsage(), minShare1);
boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
s2.getResourceUsage(), minShare2);
minShareRatio1 = (double) s1.getResourceUsage().getMemory()
/ Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemory();
minShareRatio2 = (double) s2.getResourceUsage().getMemory()
/ Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemory();
......
{code}
{{getResourceUsage}} will return current Resource. The current Resource is unstable.
{code:title=FSAppAttempt.java}
@Override
public Resource getResourceUsage() {
// Here the getPreemptedResources() always return zero, except in
// a preemption round
return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
}
{code}
{code:title=SchedulerApplicationAttempt}
public Resource getCurrentConsumption() {
return currentConsumption;
}
// This method may modify current Resource.
public synchronized void recoverContainer(RMContainer rmContainer) {
......
Resources.addTo(currentConsumption, rmContainer.getContainer()
.getResource());
......
}
{code}
I suggest that use stable Resource in comparator.
Is there something i think wrong?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)