Posted to yarn-issues@hadoop.apache.org by "Chen Yufei (JIRA)" <ji...@apache.org> on 2018/07/16 02:40:00 UTC

[jira] [Comment Edited] (YARN-8513) CapacityScheduler infinite loop when queue is near fully utilized

    [ https://issues.apache.org/jira/browse/YARN-8513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544742#comment-16544742 ] 

Chen Yufei edited comment on YARN-8513 at 7/16/18 2:39 AM:
-----------------------------------------------------------

[~yuanbo] I've uploaded the jstack and top logs from when the problem appeared yesterday.

The jstack output was captured 5 times, hence the 5 log files.

[^top-during-lock.log] was captured while the RM was not responding to requests.

[^top-when-normal.log] was captured today, while the RM is running normally.
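
For reference, a minimal sketch of how these captures could be scripted (this is only an illustration: it assumes jstack and top are on PATH and that the RM pid is passed on the command line; the output file names simply mirror the attachments):

{code}
#!/usr/bin/env python3
# Illustrative capture helper: takes several jstack dumps of the ResourceManager
# process plus one top snapshot in threads mode. Assumes the RM pid is argv[1].
import subprocess
import sys
import time

def capture(rm_pid: str, dumps: int = 5, interval_sec: int = 5) -> None:
    for i in range(1, dumps + 1):
        with open(f"jstack-{i}.log", "w") as out:
            subprocess.run(["jstack", rm_pid], stdout=out, check=False)
        time.sleep(interval_sec)
    # Batch-mode top with per-thread view, limited to the RM process.
    with open("top-during-lock.log", "w") as out:
        subprocess.run(["top", "-b", "-H", "-n", "1", "-p", rm_pid],
                       stdout=out, check=False)

if __name__ == "__main__":
    capture(sys.argv[1])
{code}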


was (Author: cyfdecyf):
[~yuanbo] I've uploaded jstack and top log when the problem appears yesterday.

jstack log are captured for 5 times thus 5 log files.

[^top-during-lock.log] is captured when RM is not responding to requests.

[^top-when-normal.log] is captured today and RM is running normally.

> CapacityScheduler infinite loop when queue is near fully utilized
> -----------------------------------------------------------------
>
>                 Key: YARN-8513
>                 URL: https://issues.apache.org/jira/browse/YARN-8513
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, yarn
>    Affects Versions: 2.9.1
>         Environment: Ubuntu 14.04.5
> YARN is configured with one label and 5 queues.
>            Reporter: Chen Yufei
>            Priority: Major
>         Attachments: jstack-1.log, jstack-2.log, jstack-3.log, jstack-4.log, jstack-5.log, top-during-lock.log, top-when-normal.log
>
>
> Sometimes the ResourceManager stops responding to any request when a queue is nearly fully utilized. Sending SIGTERM won't stop the RM; only SIGKILL can. After a restart, the RM can recover running jobs and start accepting new ones.
>  
> The CapacityScheduler seems to be stuck in an infinite loop, printing the following log messages at a rate of more than 25,000 lines per second (a log-rate check sketch follows the quoted description):
>  
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.99816763 absoluteUsedCapacity=0.99816763 used=<memory:16170624, vCores:1577> cluster=<memory:29441544, vCores:5792>}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal}}
> {{2018-07-10 17:16:29,227 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1530619767030_1652_000001 container=null queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@14420943 clusterResource=<memory:29441544, vCores:5792> type=NODE_LOCAL requestedPartition=}}
>  
> I have encountered this problem several times since upgrading to YARN 2.9.1, while the same configuration worked fine under version 2.7.3.
>  
> YARN-4477 is an infinite-loop bug in the FairScheduler; I'm not sure whether this is a similar problem.
>  
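
To quantify the loop from the ResourceManager log, a small script along these lines could count how often the "Failed to accept allocation proposal" message appears per timestamped second (the log file name below is only a placeholder):

{code}
#!/usr/bin/env python3
# Rough rate check for the runaway logging described above: counts occurrences
# of the "Failed to accept allocation proposal" message per second of log time.
from collections import Counter

MARKER = "Failed to accept allocation proposal"
LOG_PATH = "resourcemanager.log"  # placeholder path; point this at the real RM log

def rate_per_second(path: str) -> Counter:
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            if MARKER in line:
                # "2018-07-10 17:16:29,227 INFO ..." -> bucket by "2018-07-10 17:16:29"
                counts[line[:19]] += 1
    return counts

if __name__ == "__main__":
    for second, n in rate_per_second(LOG_PATH).most_common(10):
        print(f"{second}  {n} occurrences")
{code}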



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org