You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Benedict Jin (JIRA)" <ji...@apache.org> on 2016/09/07 06:59:20 UTC

[jira] [Commented] (YARN-4743) ResourceManager crash because TimSort

    [ https://issues.apache.org/jira/browse/YARN-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469789#comment-15469789 ] 

Benedict Jin commented on YARN-4743:
------------------------------------

				14:54:17
				【群主】南京-小金 2016/9/7 14:54:17
				你们 jdk的版本到多少了,感觉这是一个 jdk本身的漏洞,比较器里面 相比较的两个值 如果同时为空的话,传入的顺序可能决定了返回值 的结果,破坏了 传递性 @南京-It_Ds_N.cpp 

				【群主】南京-小金 2016/9/7 14:54:34
				JDK-6804124 : (coll) Replace "modified mergesort" in java.util.Arrays.sort with timsort
				http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6804124

				【群主】南京-小金 2016/9/7 14:55:16
				试试在 jvm中配置 java.util.Arrays.useLegacyMergeSort=true,看看有没有效果 @南京-It_Ds_N.cpp 

> ResourceManager crash because TimSort
> -------------------------------------
>
>                 Key: YARN-4743
>                 URL: https://issues.apache.org/jira/browse/YARN-4743
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: fairscheduler
>    Affects Versions: 2.6.4
>            Reporter: Zephyr Guo
>            Assignee: Yufei Gu
>         Attachments: YARN-4743-cdh5.4.7.patch
>
>
> {code}
> 2016-02-26 14:08:50,821 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: Comparison method violates its general contract!
>          at java.util.TimSort.mergeHi(TimSort.java:868)
>          at java.util.TimSort.mergeAt(TimSort.java:485)
>          at java.util.TimSort.mergeCollapse(TimSort.java:410)
>          at java.util.TimSort.sort(TimSort.java:214)
>          at java.util.TimSort.sort(TimSort.java:173)
>          at java.util.Arrays.sort(Arrays.java:659)
>          at java.util.Collections.sort(Collections.java:217)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:316)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:240)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1091)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:989)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1185)
>          at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
>          at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:684)
>          at java.lang.Thread.run(Thread.java:745)
> 2016-02-26 14:08:50,822 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
> {code}
> Actually, this issue found in 2.6.0-cdh5.4.7.
> I think the cause is that we modify {{Resouce}} while we are sorting {{runnableApps}}.
> {code:title=FSLeafQueue.java}
>     Comparator<Schedulable> comparator = policy.getComparator();
>     writeLock.lock();
>     try {
>       Collections.sort(runnableApps, comparator);
>     } finally {
>       writeLock.unlock();
>     }
>     readLock.lock();
> {code}
> {code:title=FairShareComparator}
> public int compare(Schedulable s1, Schedulable s2) {
> ......
>           s1.getResourceUsage(), minShare1);
>       boolean s2Needy = Resources.lessThan(RESOURCE_CALCULATOR, null,
>           s2.getResourceUsage(), minShare2);
>       minShareRatio1 = (double) s1.getResourceUsage().getMemory()
>           / Resources.max(RESOURCE_CALCULATOR, null, minShare1, ONE).getMemory();
>       minShareRatio2 = (double) s2.getResourceUsage().getMemory()
>           / Resources.max(RESOURCE_CALCULATOR, null, minShare2, ONE).getMemory();
> ......
> {code}
> {{getResourceUsage}} will return current Resource. The current Resource is unstable. 
> {code:title=FSAppAttempt.java}
> @Override
>   public Resource getResourceUsage() {
>     // Here the getPreemptedResources() always return zero, except in
>     // a preemption round
>     return Resources.subtract(getCurrentConsumption(), getPreemptedResources());
>   }
> {code}
> {code:title=SchedulerApplicationAttempt}
>  public Resource getCurrentConsumption() {
>     return currentConsumption;
>   }
> // This method may modify current Resource.
> public synchronized void recoverContainer(RMContainer rmContainer) {
> ......
>     Resources.addTo(currentConsumption, rmContainer.getContainer()
>       .getResource());
> ......
>   }
> {code}
> I suggest that use stable Resource in comparator.
> Is there something i think wrong?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org