Posted to yarn-issues@hadoop.apache.org by "Benedict Jin (JIRA)" <ji...@apache.org> on 2016/09/07 06:59:20 UTC
[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort
[ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469789#comment-15469789 ]
Benedict Jin commented on YARN-4743:
------------------------------------
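[Group owner] 南京-小金 2016/9/7 14:54:17
Which JDK version are you running? This looks like an issue in the JDK itself: if the two values being compared in the comparator are both null, the order in which they are passed in may determine the return value, which breaks transitivity. @南京-It_Ds_N.cpp
[Group owner] 南京-小金 2016/9/7 14:54:34
JDK-6804124 : (coll) Replace "modified mergesort" in java.util.Arrays.sort with timsort
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6804124
[Group owner] 南京-小金 2016/9/7 14:55:16
Try setting java.util.Arrays.useLegacyMergeSort=true in the JVM and see whether it helps. @南京-It_Ds_N.cpp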
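To illustrate the null-handling concern raised above, here is a minimal sketch, not taken from Hadoop or the JDK. The class NullSafeOrdering, both comparators, and the helper isAntisymmetric are hypothetical names introduced only for this example. It contrasts a comparator whose result for two nulls depends on argument position with one that treats nulls consistently. Note that the suggested property is normally passed on the command line as -Djava.util.Arrays.useLegacyMergeSort=true, and that the legacy merge sort merely does not detect a broken comparator; it does not repair one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; none of these names exist in Hadoop or the JDK.
final class NullSafeOrdering {

    // Broken: compare(null, null) returns -1 no matter which null comes first.
    static final Comparator<Integer> ORDER_DEPENDENT = new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) {
            if (a == null) return -1; // also taken when b is null
            if (b == null) return 1;
            return a.compareTo(b);
        }
    };

    // Consistent: nulls sort first and two nulls compare as equal.
    static final Comparator<Integer> NULLS_FIRST = new Comparator<Integer>() {
        @Override
        public int compare(Integer a, Integer b) {
            if (a == null && b == null) return 0;
            if (a == null) return -1;
            if (b == null) return 1;
            return a.compareTo(b);
        }
    };

    // Checks sgn(compare(x, y)) == -sgn(compare(y, x)) for every pair of values.
    static <T> boolean isAntisymmetric(Comparator<T> c, List<T> values) {
        for (T x : values) {
            for (T y : values) {
                int forward = Integer.signum(c.compare(x, y));
                int backward = Integer.signum(c.compare(y, x));
                if (forward != -backward) {
                    return false;
                }
            }
        }
        return true;
    }

    // Sorts a copy of the input with the consistent comparator.
    static List<Integer> sortedCopy(List<Integer> in) {
        List<Integer> copy = new ArrayList<Integer>(in);
        Collections.sort(copy, NULLS_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(null, 3, null, 1);
        System.out.println("order-dependent ok? " + isAntisymmetric(ORDER_DEPENDENT, sample));
        System.out.println("nulls-first ok?     " + isAntisymmetric(NULLS_FIRST, sample));
        System.out.println("sorted: " + sortedCopy(sample));
    }
}
```

A check like isAntisymmetric only covers one clause of the Comparator contract (sign symmetry), so passing it does not prove a comparator is safe for TimSort; it is meant as a quick way to catch the specific null case described in the chat.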
> ResourceManager crash because TimSort
> -------------------------------------
>
> Key: YARN-4743
> URL: https://issues.apache.org/jira/browse/YARN-4743
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.6.4
> Reporter: Zephyr Guo
> Assignee: Yufei Gu
> Attachments: YARN-4743-cdh5.4.7.patch
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
> at java.util.TimSort.mergeHi(TimSort.java:868)
> at java.util.TimSort.mergeAt(TimSort.java:485)
> at java.util.TimSort.mergeCollapse(TimSort.java:410)
> at java.util.TimSort.sort(TimSort.java:214)
> at java.util.TimSort.sort(TimSort.java:173)
> at java.util.Arrays.sort(Arrays.java:659)
> at java.util.Collections.sort(Collections.java:217)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
> at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue was found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resource}} while we are sorting {{runnableApps}}.
> {code:title=FSLeafQueue.java}
> Comparator<Schedulable> comparator = policy.getComparator();
> writeLock.lock();
> try {
> Collections.sort(runnableApps, comparator);
> } finally {
> writeLock.unlock();
> }
> readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ......
> s1.getResourceUsage(), minShare1);
> boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
> s2.getResourceUsage(), minShare2);
> minShareRatio1 = (double) s1.getResourceUsage().getMemory()
> / Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemory();
> minShareRatio2 = (double) s2.getResourceUsage().getMemory()
> / Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemory();
> ......
> {code}
> {{getResourceUsage}} returns the current {{Resource}}, which is not stable: its value can change while the sort is in progress.
> {code:title=FSAppAttempt.java}
> @Override
> public Resource getResourceUsage() {
> // Here the getPreemptedResources() always return zero, except in
> // a preemption round
> return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
> }
> {code}
> {code:title=SchedulerApplicationAttempt}
> public Resource getCurrentConsumption() {
> return currentConsumption;
> }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ......
> Resources.addTo(currentConsumption, rmContainer.getContainer()
> .getResource());
> ......
> }
> {code}
> I suggest using a stable {{Resource}} in the comparator.
> Is there anything wrong with my reasoning?
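The "stable Resource" suggestion in the quoted description can be sketched as a snapshot-then-sort pattern. This is only an illustration under stated assumptions and not necessarily what the attached patch does: the classes SnapshotSort, App, and Keyed and the field usedMemory are hypothetical stand-ins for the scheduler types, and memory usage is reduced to a single long. Each sort key is read exactly once before sorting, so the comparator only ever sees immutable values even if another thread updates the live usage in the meantime.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; these types are not part of Hadoop.
final class SnapshotSort {

    // Stand-in for a schedulable whose usage may be changed by other threads.
    static final class App {
        final String name;
        volatile long usedMemory;

        App(String name, long usedMemory) {
            this.name = name;
            this.usedMemory = usedMemory;
        }
    }

    // Immutable pair of an app and the usage value captured for this sort.
    static final class Keyed {
        final App app;
        final long usage;

        Keyed(App app, long usage) {
            this.app = app;
            this.usage = usage;
        }
    }

    // Returns a new list ordered by the usage observed when the snapshot was taken.
    static List<App> sortByUsageSnapshot(List<App> apps) {
        List<Keyed> keyed = new ArrayList<Keyed>(apps.size());
        for (App a : apps) {
            keyed.add(new Keyed(a, a.usedMemory)); // read the mutable value once
        }
        Collections.sort(keyed, new Comparator<Keyed>() {
            @Override
            public int compare(Keyed k1, Keyed k2) {
                return Long.compare(k1.usage, k2.usage); // compares frozen keys only
            }
        });
        List<App> sorted = new ArrayList<App>(keyed.size());
        for (Keyed k : keyed) {
            sorted.add(k.app);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<App>();
        apps.add(new App("a", 30));
        apps.add(new App("b", 10));
        apps.add(new App("c", 20));
        for (App app : sortByUsageSnapshot(apps)) {
            System.out.println(app.name);
        }
    }
}
```

The trade-off is one extra allocation per element and an ordering that may be slightly stale by the time it is used, in exchange for a comparator that cannot observe a value changing between two comparisons.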