Posted to yarn-issues@hadoop.apache.org by "Tao Yang (JIRA)" <ji...@apache.org> on 2018/07/25 08:40:00 UTC
[jira] [Created] (YARN-8575) CapacityScheduler should check node state before committing reserve/allocate proposals
Tao Yang created YARN-8575:
------------------------------
Summary: CapacityScheduler should check node state before committing reserve/allocate proposals
Key: YARN-8575
URL: https://issues.apache.org/jira/browse/YARN-8575
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.0, 3.1.2
Reporter: Tao Yang
Assignee: Tao Yang
Recently we found a new error as follows:
{noformat}
ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: node to unreserve doesn't exist, nodeid: host1:45454
{noformat}
Steps to reproduce:
(1) Create a reserve proposal for app1 on node1.
(2) node1 is successfully decommissioned and removed from the node tracker.
(3) Try to commit this now-outdated reserve proposal; it is accepted and applied.
This error can occur after some NMs are decommissioned. An application that logs this error will always have a reserved container on a nonexistent (decommissioned) NM, and its pending request will never be satisfied.
To solve this problem, the scheduler should check the node state in FiCaSchedulerApp#accept so that outdated proposals on unusable nodes are not committed.
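
The sequence above and the proposed check can be illustrated with a minimal, self-contained sketch. It does not use the real YARN classes: NodeTracker, Proposal, acceptWithoutNodeCheck and acceptWithNodeCheck are hypothetical stand-ins invented for this illustration, and the actual change in FiCaSchedulerApp#accept may look quite different.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for illustration only; these are NOT the real YARN classes.
class StaleProposalCheck {

    /** Minimal stand-in for the scheduler's node tracker (assumed name). */
    static final class NodeTracker {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void addNode(String nodeId) { nodes.add(nodeId); }
        void removeNode(String nodeId) { nodes.remove(nodeId); }
        boolean isTracked(String nodeId) { return nodes.contains(nodeId); }
    }

    /** A reserve/allocate proposal created earlier against a given node. */
    static final class Proposal {
        final String appId;
        final String nodeId;

        Proposal(String appId, String nodeId) {
            this.appId = appId;
            this.nodeId = nodeId;
        }
    }

    /** Models the reported behaviour: the node is never consulted, so the proposal is accepted. */
    static boolean acceptWithoutNodeCheck(Proposal proposal, NodeTracker tracker) {
        return true;
    }

    /** Models the suggested behaviour: reject if the target node is no longer tracked. */
    static boolean acceptWithNodeCheck(Proposal proposal, NodeTracker tracker) {
        return tracker.isTracked(proposal.nodeId);
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.addNode("host1:45454");

        // (1) A reserve proposal is created for app1 on the node.
        Proposal proposal = new Proposal("app1", "host1:45454");

        // (2) The node is decommissioned and removed from the tracker.
        tracker.removeNode("host1:45454");

        // (3) The outdated proposal reaches the commit phase.
        System.out.println("without check: " + acceptWithoutNodeCheck(proposal, tracker)); // true
        System.out.println("with check:    " + acceptWithNodeCheck(proposal, tracker));    // false
    }
}
```

In this sketch a rejected proposal is simply dropped, so the application would not be left holding a reservation on a node that no longer exists.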
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)