Posted to yarn-dev@hadoop.apache.org by Sandy Ryza <sa...@cloudera.com> on 2013/01/08 02:39:43 UTC

multiple requests for

I've come across an NPE in AppSchedulingInfo, so I looked around to try to
determine the cause, and I think I came across a problem with how
containers are scheduled.  It seems like somebody should have run into this
already,
so I wanted to ask about it before I filed a JIRA.  Am I just
misunderstanding how things work?

When requesting a node-local container, YARN schedulers expect three
ResourceRequests - one at the node-level, one at the rack level, and one at
the "*" level.  For each application and priority, these requests are
stored by the RM as a map of location strings to ResourceRequests.
Schedulers try to schedule requests node-locally first, but fall back to
rack-local, and then off-switch, after a given number of heartbeats pass.
When a node-local container is allocated, the number of outstanding
containers is decremented at each level.  When a rack-local container is
allocated, only the numbers of outstanding rack-local and "*" requests are
decremented.
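
To make the bookkeeping above concrete, here is a minimal sketch in Java.
It is a toy model, not the actual AppSchedulingInfo code: the class name,
the method names, and the choice to drop an entry once its count reaches
zero are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-priority request bookkeeping. All names here are
// hypothetical; this is not the real YARN implementation.
class LocalityBookkeeping {
    static final String ANY = "*";

    // Location string (host, rack, or "*") -> outstanding container count.
    final Map<String, Integer> outstanding = new HashMap<>();

    void request(String location, int numContainers) {
        outstanding.merge(location, numContainers, Integer::sum);
    }

    // Decrement one location and drop the entry once it reaches zero
    // (assumed behaviour). Unboxing a missing entry throws
    // NullPointerException.
    private void decrement(String location) {
        int remaining = outstanding.get(location) - 1;
        if (remaining == 0) {
            outstanding.remove(location);
        } else {
            outstanding.put(location, remaining);
        }
    }

    // A node-local allocation decrements all three levels.
    void allocateNodeLocal(String host, String rack) {
        decrement(host);
        decrement(rack);
        decrement(ANY);
    }

    // A rack-local allocation leaves the node-level entry untouched.
    void allocateRackLocal(String rack) {
        decrement(rack);
        decrement(ANY);
    }

    public static void main(String[] args) {
        LocalityBookkeeping app = new LocalityBookkeeping();
        app.request("host1", 1);
        app.request("rack1", 1);
        app.request(ANY, 1);
        app.allocateRackLocal("rack1");
        System.out.println(app.outstanding); // prints {host1=1}
    }
}
```

In this model, a rack-local allocation leaves the node-level entry for
host1 in the map even though the rack and "*" entries are gone.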

This means that if a rack-local container is allocated, the node-local
request will still be around, and when the scheduler tries to allocate a
container for it, the scheduler should run into an NPE, as there will be no
rack-local ResourceRequest to decrement.

What would be the best way to deal with this?  It seems like node-local
ResourceRequests need to be tied to rack-local ResourceRequests, so that
node-local requests can be removed when their corresponding rack-local
requests are, but the current AllocateRequest is a list of independent
resource requests.
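
The sequence described above, and one possible shape for the "tying" idea,
can be sketched in a toy model.  Everything here is hypothetical: the class,
the hostsByRack map, and the tieNodesToRack flag are invented for
illustration and are not part of YARN or of any proposed patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy reproduction of the stale node-local request, plus a hypothetical
// way of tying node-level entries to their rack. Not the real YARN code.
class StaleNodeRequestDemo {
    static final String ANY = "*";

    final Map<String, Integer> outstanding = new HashMap<>();
    // Hypothetical link from a rack to the hosts requested under it.
    final Map<String, Set<String>> hostsByRack = new HashMap<>();
    final boolean tieNodesToRack;

    StaleNodeRequestDemo(boolean tieNodesToRack) {
        this.tieNodesToRack = tieNodesToRack;
    }

    // One node-local container: three entries, one per level.
    void requestNodeLocal(String host, String rack) {
        outstanding.merge(host, 1, Integer::sum);
        outstanding.merge(rack, 1, Integer::sum);
        outstanding.merge(ANY, 1, Integer::sum);
        hostsByRack.computeIfAbsent(rack, r -> new HashSet<>()).add(host);
    }

    // Returns true if the entry reached zero and was removed.
    private boolean decrement(String location) {
        int remaining = outstanding.get(location) - 1; // NPE if absent
        if (remaining == 0) {
            outstanding.remove(location);
            return true;
        }
        outstanding.put(location, remaining);
        return false;
    }

    void allocateNodeLocal(String host, String rack) {
        decrement(host);
        decrement(rack);
        decrement(ANY);
    }

    void allocateRackLocal(String rack) {
        boolean rackDone = decrement(rack);
        decrement(ANY);
        if (rackDone && tieNodesToRack) {
            // Drop node-level entries whose rack-level entry is gone.
            Set<String> hosts = hostsByRack.remove(rack);
            if (hosts != null) {
                outstanding.keySet().removeAll(hosts);
            }
        }
    }

    // Runs the sequence; returns true if it ends in an NPE.
    static boolean hitsNpe(boolean tieNodesToRack) {
        StaleNodeRequestDemo app = new StaleNodeRequestDemo(tieNodesToRack);
        app.requestNodeLocal("host1", "rack1");
        app.allocateRackLocal("rack1"); // satisfied elsewhere in rack1
        if (!app.outstanding.containsKey("host1")) {
            return false; // nothing left to schedule on host1
        }
        try {
            app.allocateNodeLocal("host1", "rack1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("untied: NPE = " + hitsNpe(false));
        System.out.println("tied:   NPE = " + hitsNpe(true));
    }
}
```

In the untied variant the leftover host1 entry leads the model to decrement
a rack entry that no longer exists.  In the tied variant the host1 entry is
dropped together with the rack entry, so the node-local path is never taken.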

thanks for any guidance,
Sandy

Re: multiple requests for

Posted by Sandy Ryza <sa...@cloudera.com>.

I was able to reproduce it with the attached test.  It also comes up
consistently when running a test suite.  So far we've only tested with the
fair scheduler, so it's possible it's specific to that, but what I
described in my original message should apply to all schedulers.

On Mon, Jan 7, 2013 at 6:05 PM, Bikas Saha <bi...@hortonworks.com> wrote:

> Do you have a repro for this?
>
> Bikas

RE: multiple requests for

Posted by Bikas Saha <bi...@hortonworks.com>.

Do you have a repro for this?

Bikas
