Posted to yarn-dev@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2016/06/22 19:48:16 UTC

[jira] [Created] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

Jason Lowe created YARN-5290:
--------------------------------

             Summary: ResourceManager can place more containers on a node than the node size allows
                 Key: YARN-5290
                 URL: https://issues.apache.org/jira/browse/YARN-5290
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Jason Lowe


When the ResourceManager or an ApplicationMaster kills a container, the RM scheduler immediately treats the container as dead and frees its resources in the scheduler bookkeeping.  However, that container can still be running on the node until the node heartbeats back into the RM and is told to kill the container.  If the RM allocates the space associated with the released container and gives it to an AM quickly enough, the AM can launch a new container while the old container is still running on the NM.  That leads to a scenario where the containers running on the node together use more resources than the node advertised to the RM.
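
The race described above can be illustrated with a toy model. This is only a sketch in Python, not YARN code: the `Node` and `Scheduler` classes, their methods, and the 8192 MB node size are all hypothetical names and values chosen for illustration. It assumes the scheduler frees resources the moment a kill is requested, while the node stops the container only at its next heartbeat.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical NM side: what is actually running on the machine."""
    capacity_mb: int
    running: dict = field(default_factory=dict)  # container id -> MB in use

    def launch(self, cid, mb):
        self.running[cid] = mb

    def used_mb(self):
        return sum(self.running.values())


@dataclass
class Scheduler:
    """Hypothetical RM side: bookkeeping only, no view of live processes."""
    node: Node
    allocated: dict = field(default_factory=dict)      # container id -> MB
    pending_kills: list = field(default_factory=list)  # sent at next heartbeat

    def available_mb(self):
        return self.node.capacity_mb - sum(self.allocated.values())

    def allocate(self, cid, mb):
        if mb > self.available_mb():
            return False
        self.allocated[cid] = mb
        return True

    def kill(self, cid):
        # Assumption being illustrated: the bookkeeping is released
        # immediately, before the node has been told to stop the container.
        del self.allocated[cid]
        self.pending_kills.append(cid)

    def heartbeat(self):
        # Only now does the node learn about the kill and stop the container.
        for cid in self.pending_kills:
            self.node.running.pop(cid, None)
        self.pending_kills.clear()


def simulate():
    node = Node(capacity_mb=8192)
    rm = Scheduler(node)

    assert rm.allocate("c1", 8192)
    node.launch("c1", 8192)

    rm.kill("c1")                    # freed in bookkeeping, still running
    assert rm.allocate("c2", 8192)   # scheduler believes the node is empty
    node.launch("c2", 8192)          # AM launches before the next heartbeat

    before = node.used_mb()          # both containers are alive here
    rm.heartbeat()
    after = node.used_mb()           # old container finally stopped
    return before, after
```

In this model, `simulate()` reports the node's actual usage before and after the heartbeat. Between the kill and the heartbeat the node runs both containers, so its usage is twice what it advertised.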



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org


Re: [jira] [Created] (YARN-5290) ResourceManager can place more containers on a node than the node size allows

Posted by Joep Rottinghuis <jr...@gmail.com>.
This seems to be another in a series of bugs rooted in a mismatch of state between the NMs and the RM.
Rather than playing whack-a-mole, is it possible to make a more structural / architectural fix?

