Posted to yarn-issues@hadoop.apache.org by "Jun Gong (JIRA)" <ji...@apache.org> on 2015/09/11 16:04:46 UTC
[jira] [Created] (YARN-4148) When killing app, RM releases app's
resource before they are released by NM
Jun Gong created YARN-4148:
------------------------------
Summary: When killing app, RM releases app's resource before they are released by NM
Key: YARN-4148
URL: https://issues.apache.org/jira/browse/YARN-4148
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
When an app is killed, the RM scheduler releases the app's resources right away and may then allocate them to new requests. However, the NM has not necessarily released those resources at that point.
The problem was found when we added support for GPU as a resource (YARN-4122). Test environment: an NM had 6 GPUs, app A was using all 6, and app B was requesting 3. When app A was killed, the RM released A's 6 GPUs and allocated 3 of them to B. But when B tried to start its container on the NM, the NM found that it did not have 3 GPUs to allocate, because it had not yet released A's GPUs.
I think the same problem exists for CPU and memory. If the RM allocates memory that the NM has not yet reclaimed from the killed app, the node can be overcommitted, which might cause OOM.
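To make the sequence of events concrete, here is a rough toy model of the two views of the node. It is only a sketch: the classes and method names (ResourceManager, NodeManager, kill_app, finish_cleanup, reproduce) are hypothetical and are not YARN APIs. The only assumption is that the RM and the NM each keep their own count of the node's 6 GPUs and that the RM updates its count on kill without waiting for the NM.

```python
# Toy model only: these classes are hypothetical and are not YARN APIs.

class NodeManager:
    """Tracks GPUs that are physically in use on the node."""

    def __init__(self, total_gpus):
        self.total_gpus = total_gpus
        self.in_use = {}  # app id -> GPUs still held by its containers

    def free_gpus(self):
        return self.total_gpus - sum(self.in_use.values())

    def start_container(self, app, gpus):
        # The NM can only launch if the devices are actually free.
        if gpus > self.free_gpus():
            return False
        self.in_use[app] = self.in_use.get(app, 0) + gpus
        return True

    def finish_cleanup(self, app):
        # Called once the killed app's containers have really exited.
        self.in_use.pop(app, None)


class ResourceManager:
    """Tracks the scheduler's view of the same node."""

    def __init__(self, total_gpus):
        self.total_gpus = total_gpus
        self.allocated = {}  # app id -> GPUs the scheduler has granted

    def available(self):
        return self.total_gpus - sum(self.allocated.values())

    def allocate(self, app, gpus):
        if gpus > self.available():
            return False
        self.allocated[app] = self.allocated.get(app, 0) + gpus
        return True

    def kill_app(self, app):
        # Releases the app's resources at once, without waiting for the NM.
        self.allocated.pop(app, None)


def reproduce():
    rm, nm = ResourceManager(6), NodeManager(6)
    rm.allocate("A", 6)
    nm.start_container("A", 6)

    rm.kill_app("A")                            # RM view: 6 GPUs free
    granted = rm.allocate("B", 3)               # scheduler gives 3 GPUs to B
    started_early = nm.start_container("B", 3)  # NM still holds A's GPUs

    nm.finish_cleanup("A")                      # NM finally releases A's GPUs
    started_late = nm.start_container("B", 3)
    return granted, started_early, started_late


print(reproduce())  # (True, False, True)
```

In this model the RM grants B's request, B's container fails to start while A's GPUs are still held, and the same launch succeeds once the NM has finished cleaning up A. The same mismatch would apply to any resource that the two sides count separately.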