Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/06/12 17:03:39 UTC

container allocation

If I request a container on a node, and that node is busy, will the request fail, or will it give me a container on a different node?  In other words, is the node name a requirement or a hint?
Thanks
John


Re: container allocation

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi John,

At this time, releasing containers is the preferred way to be strict about
your locality requirements.  It is not included in a release yet, but
https://issues.apache.org/jira/browse/YARN-392 allows expressing hard
locality constraints on requests, so you can tell the scheduler to never
give you a container on a node that you didn't ask for.

-Sandy


On Fri, Jun 14, 2013 at 12:41 PM, John Lilley <jo...@redpoint.net> wrote:

> Thanks, that is good to know.  Is there any way to say “please fail if I
> don’t get the node I want?”  Do I just release the container and try
> again?
>
> I’d like to understand the implications of this policy.  Suppose I have
> 1000 data splits and cluster capacity of 100 containers.  If I try to
> schedule 200 tasks, requesting a local data node for each one, how do I
> ensure the highest chance that the tasks run against local data?  Do I just
> ask for all 200 at once?  Should I ask for 100 at a time and then re-target
> the remainder as containers come open?
>
> Or am I thinking about this all wrong… perhaps I should ask for
> containers, see what nodes they are on, and then assign the data splits to
> them once I see the set of available containers?
>
> john
>
> From: Arun C Murthy [mailto:acm@hortonworks.com]
> Sent: Thursday, June 13, 2013 12:27 AM
> To: user@hadoop.apache.org
> Subject: Re: container allocation
>
> By default, the ResourceManager will try to give you a container on that
> node, rack or anywhere (in that order).
>
> We recently added the ability to whitelist or blacklist nodes to allow for
> more control.
>
> Arun
>
> On Jun 12, 2013, at 8:03 AM, John Lilley wrote:
>
> If I request a container on a node, and that node is busy, will the
> request fail, or will it give me a container on a different node?  In other
> words, is the node name a requirement or a hint?
>
> Thanks
>
> John
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
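
The release-and-retry approach described in this reply can be sketched roughly as follows. This is a toy model in Python, not the YARN client API: `allocate_strict`, `grant`, and `max_attempts` are hypothetical names, and `grant` merely stands in for whatever the scheduler hands back for a request.

```python
def allocate_strict(wanted_node, grant, max_attempts=10):
    """Keep requesting until a container lands on wanted_node.

    `grant` is a hypothetical stand-in for the scheduler: it takes the
    preferred node name and returns the node the container was placed on.
    Containers placed on any other node are recorded as released.
    """
    released = []
    for _ in range(max_attempts):
        node = grant(wanted_node)
        if node == wanted_node:
            return node, released
        # Off-node container: hand it back and ask again.
        released.append(node)
    # Give up; the caller decides whether to relax locality.
    return None, released


# A toy scheduler that places the first two requests on other nodes.
answers = iter(["node7", "node3", "node1"])
node, released = allocate_strict("node1", lambda wanted: next(answers))
print(node, released)
```

The attempt cap is an assumption added here so the loop cannot spin forever on a busy node; a real application master would also have to weigh how long it is willing to wait for locality.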


RE: container allocation

Posted by John Lilley <jo...@redpoint.net>.
Thanks, that is good to know.  Is there any way to say "please fail if I don't get the node I want?"  Do I just release the container and try again?
I'd like to understand the implications of this policy.  Suppose I have 1000 data splits and cluster capacity of 100 containers.  If I try to schedule 200 tasks, requesting a local data node for each one, how do I ensure the highest chance that the tasks run against local data?  Do I just ask for all 200 at once?  Should I ask for 100 at a time and then re-target the remainder as containers come open?
Or am I thinking about this all wrong... perhaps I should ask for containers, see what nodes they are on, and then assign the data splits to them once I see the set of available containers?
john

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Thursday, June 13, 2013 12:27 AM
To: user@hadoop.apache.org
Subject: Re: container allocation

By default, the ResourceManager will try to give you a container on that node, rack or anywhere (in that order).

We recently added the ability to whitelist or blacklist nodes to allow for more control.

Arun

On Jun 12, 2013, at 8:03 AM, John Lilley wrote:

If I request a container on a node, and that node is busy, will the request fail, or will it give me a container on a different node?  In other words, is the node name a requirement or a hint?
Thanks
John


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
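
The last idea in this message (take whatever containers arrive, then decide which split each one processes) might look something like the sketch below. It is only an illustration in plain Python: `bind_splits`, `containers`, and `split_hosts` are made-up names, and the greedy two-pass matching is one possible policy, not something the thread prescribes.

```python
def bind_splits(containers, split_hosts):
    """Assign one pending split to each granted container, preferring local data.

    containers  -- list of node names, one entry per granted container
    split_hosts -- dict mapping split id -> set of nodes holding a replica
    Returns (assignments, remaining), where assignments is a list of
    (node, split_id, is_local) tuples and remaining lists unassigned splits.
    """
    pending = dict(split_hosts)  # copy; dicts keep insertion order
    assignments = []
    unmatched = []

    # First pass: give each container a split whose data lives on its node.
    for node in containers:
        local = next((s for s, hosts in pending.items() if node in hosts), None)
        if local is not None:
            del pending[local]
            assignments.append((node, local, True))
        else:
            unmatched.append(node)

    # Second pass: containers with no local split take any leftover split.
    for node in unmatched:
        if not pending:
            break
        split = next(iter(pending))
        del pending[split]
        assignments.append((node, split, False))

    return assignments, list(pending)


containers = ["n1", "n2", "n3"]
split_hosts = {"s0": {"n2", "n4"}, "s1": {"n1"}, "s2": {"n5"}, "s3": {"n1", "n2"}}
assignments, remaining = bind_splits(containers, split_hosts)
print(assignments, remaining)
```

With far more splits than containers (1000 versus 100 in the example above), the leftover splits would simply be matched the same way as containers finish and are reused or re-granted.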


Re: container allocation

Posted by Arun C Murthy <ac...@hortonworks.com>.
By default, the ResourceManager will try to give you a container on that node, rack or anywhere (in that order).

We recently added the ability to whitelist or blacklist nodes to allow for more control.

Arun

On Jun 12, 2013, at 8:03 AM, John Lilley wrote:

> If I request a container on a node, and that node is busy, will the request fail, or will it give me a container on a different node?  In other words, is the node name a requirement or a hint?
> Thanks
> John
>  

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
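
To make the "node, rack or anywhere (in that order)" fallback concrete, here is a small toy model in Python. It is not ResourceManager code: `place`, `free_slots`, `rack_of`, and the `relax_locality` flag are illustrative assumptions, and the sketch ignores timing (how long a scheduler waits before falling back) and any whitelist or blacklist handling.

```python
def place(wanted_node, free_slots, rack_of, relax_locality=True):
    """Pick a node for one container request.

    free_slots     -- dict mapping node -> number of free container slots
    rack_of        -- dict mapping node -> rack name
    relax_locality -- hypothetical flag; False models a strict request
    Returns (node, locality_level), or (None, "unsatisfied").
    """
    # 1. The requested node itself.
    if free_slots.get(wanted_node, 0) > 0:
        return wanted_node, "node-local"
    if not relax_locality:
        return None, "unsatisfied"

    # 2. Another node on the same rack.
    rack = rack_of[wanted_node]
    for node in sorted(free_slots):
        if node != wanted_node and rack_of[node] == rack and free_slots[node] > 0:
            return node, "rack-local"

    # 3. Anywhere in the cluster.
    for node in sorted(free_slots):
        if free_slots[node] > 0:
            return node, "off-rack"
    return None, "unsatisfied"


rack_of = {"a1": "rackA", "a2": "rackA", "b1": "rackB"}
print(place("a1", {"a1": 0, "a2": 1, "b1": 2}, rack_of))
print(place("a1", {"a1": 0, "a2": 0, "b1": 2}, rack_of))
print(place("a1", {"a1": 0, "a2": 1, "b1": 2}, rack_of, relax_locality=False))
```

In this model a busy node never makes the request fail outright unless the caller opts into strictness, which matches the "hint, not requirement" reading of the default behaviour.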


