
Posted to yarn-dev@hadoop.apache.org by Bikas Saha <bi...@hortonworks.com> on 2013/01/02 20:04:54 UTC

RE: scheduler satisfying heterogeneous resource requests at same priority

Reading the code seems to suggest that AppSchedulingInfo is not preferring
the larger request. It's simply returning the last request for that
priority and hostname. So it could be that in your case, the larger
request is the second request. You could try making it the first request
and check whether you get the same results.

Wrt your ResourceRequest question, having a single Resource capability
simplifies ResourceRequest operations. The API does allow heterogeneous
resources: submit multiple ResourceRequests with different Resource
capabilities. See the RMContainerRequestor code in the MR YARN app. Given
the above, it looks like the Resource heterogeneity is lost inside
AppSchedulingInfo, and that may be a bug or a conscious decision. I'm
looking to folks experienced in that code for an answer. How is
everything working despite this? Perhaps because the applications are not
issuing heterogeneous requests for a given priority and location.
Secondly, the * (any location) catch-all is always around to save the day.

Let me know if this makes sense. I may have missed stuff.

-----Original Message-----
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Friday, December 28, 2012 4:46 PM
To: yarn-dev@hadoop.apache.org
Subject: scheduler satisfying heterogeneous resource requests at same
priority

I am trying to understand how YARN schedulers are able to satisfy smaller
requests while larger requests are outstanding (per YARN-289).

Consider the following situation:
An application submits two requests - one for a container with 1024 MB and
one for a container with 2048 MB.  1024 MB frees up on a node.  The
scheduler should (or might wish to) place the smaller container on the
node, instead of placing a reservation for the larger one.

However, currently, if I understand correctly, the larger request is
always serviced first.  AppSchedulingInfo, which is used by all the
schedulers to find a container request when space becomes available,
stores a map of priorities to maps of node/rack/* to ResourceRequests.  A
ResourceRequest contains a single Resource (capability), and the number of
containers.  Why does a ResourceRequest not allow for heterogeneous
containers?  Is this just not supported yet because it hasn't been needed
yet?  Or is there a more fundamental reason I'm missing about why it
doesn't make sense?

many thanks for any guidance,
Sandy
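
To make the bookkeeping described above concrete, here is a minimal Java sketch. It assumes, as Sandy describes, a map from priority to a map from location (host, rack or "*") to a single request carrying one capability. The names LastRequestWins, Request, update, get and memoryMb are hypothetical and are not AppSchedulingInfo's real API; memory stands in for the full Resource capability. Because the inner map is keyed only by location, the 2048 MB request replaces the 1024 MB one, which is the "last request wins" behaviour Bikas describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of per-application request bookkeeping; not YARN's actual class. */
public class LastRequestWins {

    /** One outstanding request: a single capability (memory only) plus a container count. */
    static final class Request {
        final int memoryMb;
        final int numContainers;

        Request(int memoryMb, int numContainers) {
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // priority -> location ("host", "/rack" or "*") -> the one request stored there
    private final Map<Integer, Map<String, Request>> requests = new TreeMap<>();

    /** Stores a request; a later request at the same priority and location replaces the earlier one. */
    void update(int priority, String location, Request request) {
        requests.computeIfAbsent(priority, p -> new HashMap<>()).put(location, request);
    }

    /** Returns the stored request, or null if nothing is stored at that priority and location. */
    Request get(int priority, String location) {
        Map<String, Request> byLocation = requests.get(priority);
        return byLocation == null ? null : byLocation.get(location);
    }

    public static void main(String[] args) {
        LastRequestWins info = new LastRequestWins();
        info.update(1, "*", new Request(1024, 1));
        info.update(1, "*", new Request(2048, 1));
        // Only the 2048 MB request survives, so a node with 1024 MB free finds no match.
        System.out.println(info.get(1, "*").memoryMb); // prints 2048
    }
}
```

Swapping the two update calls would leave the 1024 MB request stored instead, which mirrors the experiment Bikas suggests.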

RE: scheduler satisfying heterogeneous resource requests at same priority

Posted by Bikas Saha <bi...@hortonworks.com>.

To be clear, I think there is no issue in requesting resources of
different sizes (at the same priority and location). It's possible to
express that in the protocol. The issue is that AppSchedulingInfo loses
that information by storing ResourceRequests only by priority and location
instead of by priority, location and size.

Bikas
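
A minimal Java sketch of the change described above: outstanding requests keyed by priority, location and size, so differently sized requests at the same priority and location coexist. All names are hypothetical, memory again stands in for the full Resource capability, and choosing the largest request that fits the free space is only one possible policy; the thread does not specify one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical bookkeeping keyed by priority, location AND container size,
 * so requests of different sizes at the same priority and location coexist.
 */
public class SizeAwareRequests {

    // priority -> location -> memory (MB) -> outstanding container count
    private final Map<Integer, Map<String, TreeMap<Integer, Integer>>> asks = new TreeMap<>();

    /** Adds count containers of the given size; sizes already present accumulate. */
    void add(int priority, String location, int memoryMb, int count) {
        asks.computeIfAbsent(priority, p -> new HashMap<>())
            .computeIfAbsent(location, l -> new TreeMap<>())
            .merge(memoryMb, count, Integer::sum);
    }

    /**
     * Returns the largest outstanding size at this priority and location that
     * fits in freeMb, or -1 if none fits (one possible matching policy).
     */
    int largestFit(int priority, String location, int freeMb) {
        Map<String, TreeMap<Integer, Integer>> byLocation = asks.get(priority);
        if (byLocation == null || !byLocation.containsKey(location)) {
            return -1;
        }
        Integer size = byLocation.get(location).floorKey(freeMb); // greatest size <= freeMb
        return size == null ? -1 : size;
    }

    public static void main(String[] args) {
        SizeAwareRequests info = new SizeAwareRequests();
        info.add(1, "*", 1024, 1);
        info.add(1, "*", 2048, 1);
        System.out.println(info.largestFit(1, "*", 1024)); // prints 1024
        System.out.println(info.largestFit(1, "*", 4096)); // prints 2048
        System.out.println(info.largestFit(1, "*", 512));  // prints -1
    }
}
```

With both sizes retained, the 1024 MB slot from Sandy's scenario can be filled directly instead of reserving the node for the 2048 MB request.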

-----Original Message-----
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Thursday, January 03, 2013 11:39 PM
To: yarn-dev@hadoop.apache.org
Subject: Re: scheduler satisfying heterogeneous resource requests at same
priority

Thanks all for the guidance.  I filed
https://issues.apache.org/jira/browse/YARN-314 to add the ability to
request containers with different resource requirements at the same
priority level.  While it doesn't look like it's needed for apps
currently, and it can be circumvented by specifying different priorities
if absolutely necessary, it seems to me that it should be there for the
future and for completeness' sake.

-Sandy

Re: scheduler satisfying heterogeneous resource requests at same priority

Posted by Sandy Ryza <sa...@cloudera.com>.

Thanks all for the guidance.  I filed
https://issues.apache.org/jira/browse/YARN-314 to add the ability to
request containers with different resource requirements at the same
priority level.  While it doesn't look like it's needed for apps
currently, and it can be circumvented by specifying different priorities if
absolutely necessary, it seems to me that it should be there for the future
and for completeness' sake.

-Sandy

On Thu, Jan 3, 2013 at 8:14 PM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

> The answer is a combination of what Robert and Bikas mentioned above.
>
>  - Priorities are used to order the scheduling requests.
>  - At a given priority, if you have requests of different sizes, the scheduler
> could end up looking only at the last request. We can clarify this in the docs.
>
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/

Re: scheduler satisfying heterogeneous resource requests at same priority

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

The answer is a combination of what Robert and Bikas mentioned above.

 - Priorities are used to order the scheduling requests.
 - At a given priority, if you have requests of different sizes, the scheduler could end up looking only at the last request. We can clarify this in the docs.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
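
A minimal Java sketch of the two points above, plus the workaround of giving each container size its own priority. Assumptions, all hypothetical: a numerically lower value is treated as higher priority and visited first, location is omitted, and when the request at a priority does not fit, the walk simply moves on to the next priority (real schedulers may instead reserve the node for that request).

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical scheduler walk: priorities are visited in ascending order and,
 * at each priority, only the single stored request is considered.
 */
public class PriorityWalk {

    // priority -> memory (MB) of the one request stored at that priority
    private final TreeMap<Integer, Integer> requestByPriority = new TreeMap<>();

    void update(int priority, int memoryMb) {
        requestByPriority.put(priority, memoryMb); // last request at a priority wins
    }

    /** Returns the priority whose request is assigned to a node with freeMb free, or -1. */
    int assign(int freeMb) {
        for (Map.Entry<Integer, Integer> e : requestByPriority.entrySet()) { // ascending keys
            if (e.getValue() <= freeMb) {
                return e.getKey();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same priority: the 1024 MB request is overwritten, so 1024 MB free matches nothing.
        PriorityWalk samePriority = new PriorityWalk();
        samePriority.update(1, 1024);
        samePriority.update(1, 2048);
        System.out.println(samePriority.assign(1024)); // prints -1

        // Workaround: one priority per size, so both requests are kept.
        PriorityWalk distinct = new PriorityWalk();
        distinct.update(1, 2048);
        distinct.update(2, 1024);
        System.out.println(distinct.assign(1024)); // prints 2
        System.out.println(distinct.assign(4096)); // prints 1
    }
}
```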

On Jan 3, 2013, at 3:15 AM, Tsuyoshi OZAWA wrote:

> Sandy, it also depends on the timing. For instance, in MapReduce's case,
> MRAppMaster requests the containers for each task separately. Could you
> explain the timing when you issue each request?

Re: scheduler satisfying heterogeneous resource requests at same priority

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.

Sandy, it also depends on the timing. For instance, in MapReduce's case,
MRAppMaster requests the containers for each task separately. Could you
explain the timing when you issue each request?


On Thu, Jan 3, 2013 at 5:52 AM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Mappers and reducers are requested at different priorities.  Reducers have
> a higher priority. But the AM does not request all of the reducers at
> once. It waits and will request some at a time until all of the mappers
> have been satisfied at which point it then requests the rest of the
> reducers.
>
> --Bobby

-- 
OZAWA Tsuyoshi

RE: scheduler satisfying heterogeneous resource requests at same priority

Posted by Bikas Saha <bi...@hortonworks.com>.

Most likely because mappers and reducers are scheduled at different
priorities.

To summarize, the issue seems to be that AppSchedulingInfo does not
maintain ResourceRequests by Resource capability. The alternative would be
to have ResourceRequest itself contain multiple capabilities, but IMO that
would be hard to work with and would also require major surgery on the code
base.

-----Original Message-----
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Wednesday, January 02, 2013 12:47 PM
To: yarn-dev@hadoop.apache.org
Subject: Re: scheduler satisfying heterogeneous resource requests at same
priority

Thanks for looking into it Bikas.  What you wrote makes sense to me.
You're right that it's the last request, not the largest.  Otherwise, you
summarize my confusion well - why doesn't AppSchedulingInfo hold a list of
ResourceRequests for each node/priority?

I also don't understand why this hasn't caused a problem already for
mapreduce when mappers and reducers request different amounts of memory.
It must be either because reduces are requested after all map containers
are completed? Or because they're requested at non-overlapping locations?

Re: scheduler satisfying heterogeneous resource requests at same priority

Posted by Robert Evans <ev...@yahoo-inc.com>.

Mappers and reducers are requested at different priorities.  Reducers have
a higher priority. But the AM does not request all of the reducers at
once. It waits and requests some at a time until all of the mappers have
been satisfied, at which point it requests the rest of the reducers.

--Bobby
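
A minimal Java sketch of the ramp-up described above. The method name and the rampUpPercent cap are hypothetical illustrations of "some at a time"; the MR AM's actual reducer scheduling logic is more involved and is not reproduced here.

```java
/**
 * Hypothetical reducer ramp-up: while maps are still unsatisfied, only a capped
 * share of the reducers may be requested; once every map has been satisfied,
 * the remaining reducers are requested. Not the MR AM's actual logic.
 */
public class ReducerRampUp {

    /**
     * @param totalReducers     reducers the job needs
     * @param reducersRequested reducers already requested
     * @param mapsUnsatisfied   map tasks still waiting for a container
     * @param rampUpPercent     hypothetical cap (0-100) on reducers requested early
     * @return how many additional reducers to request now
     */
    static int reducersToRequest(int totalReducers, int reducersRequested,
                                 int mapsUnsatisfied, int rampUpPercent) {
        int remaining = totalReducers - reducersRequested;
        if (mapsUnsatisfied == 0) {
            return remaining; // all mappers satisfied: ask for the rest
        }
        int cap = totalReducers * rampUpPercent / 100; // integer division
        return Math.max(0, Math.min(remaining, cap - reducersRequested));
    }

    public static void main(String[] args) {
        System.out.println(reducersToRequest(10, 0, 50, 30)); // prints 3
        System.out.println(reducersToRequest(10, 3, 20, 30)); // prints 0
        System.out.println(reducersToRequest(10, 3, 0, 30));  // prints 7
    }
}
```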

On 1/2/13 2:47 PM, "Sandy Ryza" <sa...@cloudera.com> wrote:

>Thanks for looking into it Bikas.  What you wrote makes sense to me.
>You're right that it's the last request, not the largest.  Otherwise, you
>summarize my confusion well - why doesn't AppSchedulingInfo hold a list of
>ResourceRequests for each node/priority?
>
>I also don't understand why this hasn't caused a problem already for
>mapreduce when mappers and reducers request different amounts of memory.
>It must be either because reduces are requested after all map containers
>are completed? Or because they're requested at non-overlapping locations?

Re: scheduler satisfying heterogeneous resource requests at same priority

Posted by Sandy Ryza <sa...@cloudera.com>.

Thanks for looking into it Bikas.  What you wrote makes sense to me. You're
right that it's the last request, not the largest.  Otherwise, you summarize
my confusion well - why doesn't AppSchedulingInfo hold a list of
ResourceRequests for each node/priority?

I also don't understand why this hasn't caused a problem already for
mapreduce when mappers and reducers request different amounts of memory.
It must be either because reduces are requested after all map containers
are completed? Or because they're requested at non-overlapping locations?
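
A minimal Java sketch of why non-overlapping locations alone would not keep differently sized requests apart. It assumes, following the node/rack/* structure described in this thread, that each container ask is recorded at its host, its rack and the "*" level, and that each (priority, location) slot keeps only the last capability written to it. All names are hypothetical. Two asks at the same priority on different hosts and racks still meet at "*".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model: each ask is recorded at host, rack and "*" levels, and
 * each (priority, location) slot keeps only the last capability written to it.
 */
public class LocationExpansion {

    static final class Ask {
        final int priority;
        final String host;
        final String rack;
        final int memoryMb;

        Ask(int priority, String host, String rack, int memoryMb) {
            this.priority = priority;
            this.host = host;
            this.rack = rack;
            this.memoryMb = memoryMb;
        }
    }

    // priority -> location -> memory (MB) of the last ask recorded there
    static Map<Integer, Map<String, Integer>> record(List<Ask> asks) {
        Map<Integer, Map<String, Integer>> table = new TreeMap<>();
        for (Ask ask : asks) {
            Map<String, Integer> byLocation = table.computeIfAbsent(ask.priority, p -> new HashMap<>());
            for (String location : List.of(ask.host, ask.rack, "*")) {
                byLocation.put(location, ask.memoryMb); // last writer wins
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Two asks at the same priority on non-overlapping hosts and racks.
        Map<String, Integer> p20 = record(List.of(
                new Ask(20, "host1", "/rack1", 1024),
                new Ask(20, "host2", "/rack2", 2048))).get(20);
        System.out.println(p20.get("host1")); // prints 1024
        System.out.println(p20.get("*"));     // prints 2048: the "*" slot is shared
    }
}
```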

On Wed, Jan 2, 2013 at 11:04 AM, Bikas Saha <bi...@hortonworks.com> wrote:

> Reading the code seems to suggest that AppSchedulingInfo is not preferring
> the larger request. It's simply returning the last request for that
> priority and hostname. So it could be that in your case, the larger
> request is the second request. You could try making it the first request
> and check whether you get the same results.
>
> Wrt your ResourceRequest question, having a single Resource capability
> simplifies ResourceRequest operations. The API does allow heterogeneous
> resources: submit multiple ResourceRequests with different Resource
> capabilities. See the RMContainerRequestor code in the MR YARN app. Given
> the above, it looks like the Resource heterogeneity is lost inside
> AppSchedulingInfo, and that may be a bug or a conscious decision. I'm
> looking to folks experienced in that code for an answer. How is
> everything working despite this? Perhaps because the applications are not
> issuing heterogeneous requests for a given priority and location.
> Secondly, the * (any location) catch-all is always around to save the day.
>
> Let me know if this makes sense. I may have missed stuff.