Posted to yarn-issues@hadoop.apache.org by "chan (Jira)" <ji...@apache.org> on 2020/10/30 08:48:00 UTC
[jira] [Commented] (YARN-10440) resource manager hangs,and i cannot
submit any new jobs,but rm and nm processes are normal
[ https://issues.apache.org/jira/browse/YARN-10440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223495#comment-17223495 ]
chan commented on YARN-10440:
-----------------------------
@[~Jufeng] I ran into this problem before and set the following configuration. Hope it helps!
{code:xml}
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
  <value>false</value>
</property>
{code}
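To double-check that both properties from the snippet above are actually present in a scheduler config file, a small script along these lines may help. It is only a sketch: the `SAMPLE_XML` content and the helper names `read_properties` and `find_mismatches` are made up for illustration, and it assumes the file uses the usual Hadoop `<configuration>`/`<property>` layout.

```python
import xml.etree.ElementTree as ET

# Property names and values are the ones suggested in the comment above.
EXPECTED = {
    "yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments": "1",
    "yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled": "false",
}

# Hypothetical file content for illustration; a real file would be read from disk.
SAMPLE_XML = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
    <value>false</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element in the config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def find_mismatches(xml_text, expected=EXPECTED):
    """Return {name: (expected, actual)} for settings that are missing or differ."""
    actual = read_properties(xml_text)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    print(find_mismatches(SAMPLE_XML) or "both settings present")
```

An empty result from `find_mismatches` means both settings are present with the suggested values. It says nothing about whether the ResourceManager has already picked them up.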
> resource manager hangs,and i cannot submit any new jobs,but rm and nm processes are normal
> ------------------------------------------------------------------------------------------
>
> Key: YARN-10440
> URL: https://issues.apache.org/jira/browse/YARN-10440
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 3.1.1
> Reporter: jufeng li
> Priority: Blocker
> Attachments: rm_2020-09-26-2.dump
>
>
> The RM hangs and I cannot submit any new jobs, but the RM and NM processes are normal. I can open xxxxx:8088/cluster/apps/RUNNING but not xxxxx:8088/cluster/scheduler. Apps that were already submitted never finish, and new apps cannot be submitted. Everything hangs, yet the RM and NM servers themselves stay up. How can I fix this? Please help!
>
> Here is the log:
> {code:java}
> ttempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> 2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
> 2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
> {code}
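To get a rough sense of how often the scheduler repeats the pattern in the quoted log, the lines can be tallied with a short script. This is a sketch only: `SAMPLE_LOG` holds a few lines copied from the excerpt, the `summarize` helper is a made-up name, and the counts describe the symptom without identifying a root cause.

```python
import re
from collections import Counter

# A few lines copied from the quoted log excerpt, used here as sample input.
SAMPLE_LOG = """\
2020-09-17 00:22:25,679 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,679 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
2020-09-17 00:22:25,680 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2906)) - Failed to accept allocation proposal
2020-09-17 00:22:25,680 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1600074574138_66297_000001 container=null queue=tianqiwang clusterResource=<memory:10240000, vCores:4800> type=NODE_LOCAL requestedPartition=
"""

FAILED_MSG = "Failed to accept allocation proposal"
# Matches assignedContainer lines whose container field is null.
ATTEMPT_RE = re.compile(r"application attempt=(appattempt_\d+_\d+_\d+) container=null")


def summarize(log_text):
    """Count rejected proposals and null-container assignments per app attempt."""
    failed = 0
    null_containers = Counter()
    for line in log_text.splitlines():
        if FAILED_MSG in line:
            failed += 1
        match = ATTEMPT_RE.search(line)
        if match:
            null_containers[match.group(1)] += 1
    return failed, null_containers


if __name__ == "__main__":
    failed, per_attempt = summarize(SAMPLE_LOG)
    print("rejected proposals:", failed)
    for attempt, count in per_attempt.most_common():
        print(attempt, count)
```

A single application attempt dominating the tally within the same millisecond range, as in the excerpt, suggests the scheduler is retrying the same proposal in a tight loop.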
--
This message was sent by Atlassian Jira
(v8.3.4#803005)