Posted to yarn-issues@hadoop.apache.org by "Jun Gong (JIRA)" <ji...@apache.org> on 2015/09/11 16:04:46 UTC

[jira] [Created] (YARN-4148) When killing app, RM releases app's resource before they are released by NM

Jun Gong created YARN-4148:
------------------------------

             Summary: When killing app, RM releases app's resource before they are released by NM
                 Key: YARN-4148
                 URL: https://issues.apache.org/jira/browse/YARN-4148
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Jun Gong
            Assignee: Jun Gong


When an app is killed, the RM scheduler releases the app's resources immediately and may then allocate them to new requests, even though the NM has not yet released them.

The problem was found while we were adding support for GPU as a resource (YARN-4122). Test environment: an NM had 6 GPUs, app A was using all 6 GPUs, and app B was requesting 3 GPUs. We killed app A; the RM then released A's 6 GPUs and allocated 3 of them to B. But when B tried to start its container on the NM, the NM found it did not have 3 GPUs to allocate, because it had not yet released A's GPUs.
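
The sequence above can be sketched as a small toy model. This is not YARN code: the class names (NodeManagerModel, SchedulerModel) and the reproduce() helper are hypothetical and only mimic the bookkeeping described, with the scheduler and the node each keeping their own view of the 6 GPUs.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerModel:
    """Hypothetical stand-in for an NM: tracks GPUs still held on the node."""
    total_gpus: int
    held: dict = field(default_factory=dict)  # app id -> GPUs held on the node

    def free_gpus(self) -> int:
        return self.total_gpus - sum(self.held.values())

    def start_container(self, app: str, gpus: int) -> bool:
        # The node refuses to start a container it cannot back with real GPUs.
        if gpus > self.free_gpus():
            return False
        self.held[app] = self.held.get(app, 0) + gpus
        return True

    def finish_cleanup(self, app: str) -> None:
        # Called once the killed app's containers have actually exited.
        self.held.pop(app, None)


@dataclass
class SchedulerModel:
    """Hypothetical stand-in for the RM scheduler's view of the same node."""
    total_gpus: int
    allocated: dict = field(default_factory=dict)  # app id -> GPUs allocated

    def free_gpus(self) -> int:
        return self.total_gpus - sum(self.allocated.values())

    def allocate(self, app: str, gpus: int) -> bool:
        if gpus > self.free_gpus():
            return False
        self.allocated[app] = self.allocated.get(app, 0) + gpus
        return True

    def kill_app(self, app: str) -> None:
        # Frees the resources in the scheduler's view right away,
        # without waiting for the node to confirm the release.
        self.allocated.pop(app, None)


def reproduce():
    nm = NodeManagerModel(total_gpus=6)
    rm = SchedulerModel(total_gpus=6)

    # App A uses all 6 GPUs.
    rm.allocate("A", 6)
    nm.start_container("A", 6)

    # Kill A: the scheduler frees 6 GPUs, the node has not cleaned up yet.
    rm.kill_app("A")
    granted_to_b = rm.allocate("B", 3)

    # B's container start fails on the node, then succeeds after cleanup.
    started_before_cleanup = nm.start_container("B", 3)
    nm.finish_cleanup("A")
    started_after_cleanup = nm.start_container("B", 3)
    return granted_to_b, started_before_cleanup, started_after_cleanup
```

In this model the scheduler grants B's request while the node still rejects the container start; the start only succeeds once the node has finished cleaning up A.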

I think the problem also exists for CPU/memory. It might cause OOM if memory is allocated to a new container while the killed app's containers are still holding it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)