Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2021/09/07 14:39:00 UTC

[jira] [Comment Edited] (YARN-10934) activateApplications NPL

    [ https://issues.apache.org/jira/browse/YARN-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17411281#comment-17411281 ] 

Szilard Nemeth edited comment on YARN-10934 at 9/7/21, 2:38 PM:
----------------------------------------------------------------

Hi [~luoyuan],
Can you attach the full yarn-site.xml config file here? Something other than the DominantResourceCalculator is probably also in play.
If the file contains sensitive information such as queue names, you may mask it or replace it with dummy values.

A question: what does "NPL" in the title mean? Did you mean NPE (NullPointerException) or something else?
Thanks.


was (Author: snemeth):
Hi [~luoyuan],
Can you attach the full yarn-site.xml config file here? Something other than the DominantResourceCalculator is probably also in play.
If the file contains sensitive information such as queue names, you may mask it or replace it with dummy values.

> activateApplications NPL
> ------------------------
>
>                 Key: YARN-10934
>                 URL: https://issues.apache.org/jira/browse/YARN-10934
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>    Affects Versions: 3.3.1
>            Reporter: Yuan LUO
>            Priority: Major
>
> Our production YARN cluster runs Hadoop 3.3.1. We changed the resource calculator from DefaultResourceCalculator to DominantResourceCalculator and restarted the RM, after which the RM crashed with the exception stack below. I think this is a serious bug and hope someone can follow up and fix it.
> 2021-08-30 21:00:59,114 ERROR event.EventDispatcher (MarkerIgnoringBase.java:error(159)) - Error in handling event type APP_ATTEMPT_REMOVED to the Event Dispatcher
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.activateApplications(LeafQueue.java:868)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.removeApplicationAttempt(LeafQueue.java:1014)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.finishApplicationAttempt(LeafQueue.java:972)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:1188)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1904)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
>         at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:79)
>         at java.base/java.lang.Thread.run(Thread.java:834)
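
For reference, a calculator switch like the one described in the report is normally made through the yarn.scheduler.capacity.resource-calculator property. The fragment below is only a sketch: it assumes the cluster uses the CapacityScheduler (as the stack trace suggests) and that the property is set in capacity-scheduler.xml. The reporter's actual configuration was not attached, so the real file may differ.

```xml
<!-- Hypothetical excerpt of capacity-scheduler.xml; not the reporter's actual file. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <!-- Before the change described in the report, this would have been
         org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator -->
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```

DefaultResourceCalculator compares resources by memory only, while DominantResourceCalculator also takes other resource types such as vcores into account, so queue and resource settings elsewhere in the configuration can interact with the switch.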


