Posted to yarn-issues@hadoop.apache.org by "Sergey Shelukhin (JIRA)" <ji...@apache.org> on 2019/02/20 21:41:00 UTC

[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

     [ https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated YARN-9320:
-----------------------------------
    Description: 
We are running a snapshot of the 2.9 branch; unfortunately, I'm not sure off the top of my head which version it corresponds to. I can look it up if that's important, but I haven't found an existing bug like this, so I suspect it also affects current versions unless it was fixed by accident.

If it helps, the cluster is very large (1000s of NMs), so we expect frequent node failures; also, some apps may specify misconfigured node labels, so node-label-related code may hit corner cases. Still, a user-supplied parameter should not be able to cause this.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
	at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of the 2.9 branch; unfortunately, I'm not sure off the top of my head which version it corresponds to. I can look it up if that's important, but I haven't found an existing bug like this, so I suspect it also affects current versions unless it was fixed by accident.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
	at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9320
>                 URL: https://issues.apache.org/jira/browse/YARN-9320
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.9.3
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> We are running a snapshot of the 2.9 branch; unfortunately, I'm not sure off the top of my head which version it corresponds to. I can look it up if that's important, but I haven't found an existing bug like this, so I suspect it also affects current versions unless it was fixed by accident.
> If it helps, the cluster is very large (1000s of NMs), so we expect frequent node failures; also, some apps may specify misconfigured node labels, so node-label-related code may hit corner cases. Still, a user-supplied parameter should not be able to cause this.
> {noformat}
> 2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: queueCapacities.getNodePartitionsSet() changed 
> java.util.ConcurrentModificationException
> 	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
> 	at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
> 	at org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)
> {noformat}
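
For readers unfamiliar with the failure mode in the trace above: HashMap key-set iterators are fail-fast, so any structural change to the map while its key set is being iterated makes the next call to next() throw ConcurrentModificationException. The sketch below reproduces only that generic behaviour in a single thread and shows one common mitigation, iterating over a copy of the keys. It is not code from CSQueueUtils and says nothing about the actual root cause of this issue; the class name, the map, and the partition names ("gpu", "ssd", "late-label") are all hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from the YARN code base.
public class PartitionIterationSketch {

    // Hypothetical stand-in for a per-partition capacity map.
    public static Map<String, Float> newCapacities() {
        Map<String, Float> capacities = new HashMap<>();
        capacities.put("", 1.0f);     // default partition
        capacities.put("gpu", 0.5f);
        capacities.put("ssd", 0.25f);
        return capacities;
    }

    // Iterates the live key set and adds a new key during the first step.
    // The iterator detects the structural change on its next call to next().
    public static boolean liveIterationFails() {
        Map<String, Float> capacities = newCapacities();
        try {
            for (String partition : capacities.keySet()) {
                capacities.put("late-label", 0.0f);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates a snapshot of the keys, so changes to the map made during
    // the loop do not affect the iteration. Returns the number of keys visited.
    public static int snapshotIterationVisits() {
        Map<String, Float> capacities = newCapacities();
        List<String> snapshot = new ArrayList<>(capacities.keySet());
        int visited = 0;
        for (String partition : snapshot) {
            capacities.put("late-label", 0.0f);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("live iteration fails: " + liveIterationFails());
        System.out.println("snapshot visits: " + snapshotIterationVisits());
    }
}
```

Note that when the modification comes from another thread, taking the snapshot is itself an iteration over the live map, so the copy would have to be made under the same lock that guards writers (or the map replaced with a concurrent collection) for this approach to help.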



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
