Posted to yarn-dev@hadoop.apache.org by Carlo Curino <cc...@microsoft.com> on 2014/09/26 20:50:14 UTC

Calling a merge vote for YARN-1051

(Apologies if it is delivered twice.)

YARN Devs,

We propose to merge YARN-1051 development branch into trunk.

Key Idea:
This work adds support for reservations to the YARN RM. The key idea is to allow users to request dedicated access to resources (a reservation) ahead of time.
For example, I can ask for "10 containers for 1 hour sometime between 4pm and 9pm today". The RM keeps track of accepted reservations by means of
a Plan (think of it as an agenda of how the cluster resources will be used), and performs admission control to guarantee that, if a reservation is accepted, enough
resources are set aside to satisfy it. We enforce the reservation promises by dynamically creating/resizing/removing queues at the right time. This allows us
to leverage the existing schedulers for the actual container assignment and tracking. The key benefit is to expose allocation flexibility to the scheduler, while
guaranteeing users predictable resource allocation.
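The reservation flow described above (a Plan acting as an agenda, admission control at request time, and enforcement by resizing queues) can be illustrated with a toy model. This is a minimal sketch in Python, not the YARN-1051 code or API: `ReservationRequest`, `Plan`, `admit` and `queue_sizes` are hypothetical names, time is assumed to be discretized into hourly slots, and placement uses a naive earliest-fit search rather than the reservation-placement algorithms described in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ReservationRequest:
    """Hypothetical request: hold `containers` containers for `duration`
    consecutive slots, placed anywhere inside [earliest_start, deadline)."""
    name: str
    containers: int
    duration: int
    earliest_start: int
    deadline: int


@dataclass
class Plan:
    """Toy 'agenda' of how cluster capacity is promised, in discrete time slots."""
    capacity: int  # containers the plan may hand out in any single slot
    horizon: int   # number of slots the plan tracks (e.g. 24 hourly slots)
    # reservation name -> (start slot, end slot exclusive, containers)
    booked: Dict[str, Tuple[int, int, int]] = field(default_factory=dict)

    def used(self, t: int) -> int:
        """Containers already promised to accepted reservations during slot t."""
        return sum(c for (s, e, c) in self.booked.values() if s <= t < e)

    def admit(self, r: ReservationRequest) -> Optional[int]:
        """Admission control with a naive earliest-fit placement: accept the
        request only if some placement fits in the remaining capacity of every
        slot it covers. Returns the chosen start slot, or None if rejected."""
        last_start = min(r.deadline, self.horizon) - r.duration
        for start in range(r.earliest_start, last_start + 1):
            slots = range(start, start + r.duration)
            if all(self.used(t) + r.containers <= self.capacity for t in slots):
                self.booked[r.name] = (start, start + r.duration, r.containers)
                return start
        return None  # no placement fits without breaking an earlier promise

    def queue_sizes(self, t: int) -> Dict[str, int]:
        """What a queue-resizing follower would enforce at slot t: one queue per
        active reservation sized to its promise, the rest left to 'default'."""
        sizes = {name: c for name, (s, e, c) in self.booked.items() if s <= t < e}
        sizes["default"] = self.capacity - sum(sizes.values())
        return sizes


if __name__ == "__main__":
    plan = Plan(capacity=20, horizon=24)  # hourly slots, 20-container cluster
    # "10 containers for 1 hour sometime between 4pm and 9pm today"
    print(plan.admit(ReservationRequest("a", 10, 1, 16, 21)))  # 16
    print(plan.admit(ReservationRequest("b", 15, 1, 16, 21)))  # 17 (slot 16 is too full)
    print(plan.admit(ReservationRequest("c", 15, 5, 16, 21)))  # None
    print(plan.queue_sizes(17))  # {'b': 15, 'default': 5}
```

Because admission control runs when the request arrives, the third request is rejected instead of overbooking slot 16, which is what lets the plan keep the promises it has already made.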

Status

*         The work has been "broken down" into 14 subtasks (+3 patches already committed to trunk for move/kill of apps). All the issues have been resolved.

*         Jenkins gave the patch a +1 (with the exception of one test failure, which we did not introduce and which is tracked here: https://issues.apache.org/jira/browse/MAPREDUCE-6094)

*         Simple integration with MapReduce: https://issues.apache.org/jira/browse/MAPREDUCE-6103

*         The broken-down patches have been reviewed and +1ed by Vinod Kumar Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and Chris Douglas. Thanks to all of you for the thorough reviews!

*         The current version has been rather thoroughly tested by running it on our 250-machine research cluster for months (the first prototype was operational about a year ago), by:

o   Running hundreds of thousands of jobs generated by a modified version of GridMix that exercises the reservation mechanism side-by-side with normal queues.

o   Supporting our integration with the resource estimation framework Perforator (http://research.microsoft.com/pubs/178971/perforator.pdf). Kaushik and Dharmesh have been pounding on the reservation system for their research for 3-4 months now, and have helped us spot a few bugs and iron them out.

o   Having the code inspected/extended by 4-5 other researchers who are exploring integration with other systems and extensions of our algorithms for "reservation placement".

*         We have a few ideas for follow-up extensions/improvements, which are tracked by the umbrella JIRA https://issues.apache.org/jira/browse/YARN-2572

Documents and Deliverables

*         This work was accepted for publication at SoCC 2014 (pre-camera-ready version of the paper here): https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf

*         Shorter design doc: https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf

*         Overall patch: https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch

*         Per Karthik's request, we are preparing a small how-to document and example code/configuration, tracked by https://issues.apache.org/jira/browse/YARN-2609


Credits
Subru and I did lots of the coding (hence the flow of patches from us), but this is a group effort that would not have been possible without the ideas and hard work of many other
folks in our research group (Microsoft-CISL). Major kudos to: Chris Douglas, Sriram Rao, Raghu Ramakrishnan, and our intern Djellel Difallah. Also, big thanks to the many folks in the community (Arun, Vinod, Alejandro, Bikas, Karthik, Sandy, Hitesh, Jakob, Mohammad, Mayank, Jason, Bobby, and many more) who helped us shape our ideas and code with very insightful feedback and comments.

We expect the vote to run for the usual 7 days; it will expire at 12pm PDT on Oct 3. Please feel free to reach out to us if you have any questions or doubts.

Cheers,
Carlo & Subru


Re: Calling a merge vote for YARN-1051

Posted by Anubhav Dhoot <ad...@cloudera.com>.
+1 (non-binding).
I misread the deadline, so mine may not count. This adds a relevant and
important dimension to the scheduling features in YARN.

On Fri, Oct 3, 2014 at 2:23 PM, Carlo Curino <cc...@microsoft.com> wrote:

> Thanks everyone for voting, if I count right we have:
>  4 +1 binding,
>  5 +1 non-binding (including ourselves)
>
> So we are proceeding with merge to trunk (via Chris Douglas),
> and per Vinod's and Karthik's suggestions, we will get a couple
> of clean builds / jenkins runs, and repeat our usual suite of
> runs on clusters and then commit to branch-2 and branch-2.6.
>
> Thanks,
> Carlo & Subru

Re: Calling a merge vote for YARN-1051

Posted by Carlo Curino <cc...@microsoft.com>.
Thanks everyone for voting. If I count right, we have:
 4 +1 binding,
 5 +1 non-binding (including ourselves)

So we are proceeding with the merge to trunk (via Chris Douglas).
Per Vinod's and Karthik's suggestions, we will get a couple
of clean builds/Jenkins runs, repeat our usual suite of
runs on clusters, and then commit to branch-2 and branch-2.6.

Thanks,
Carlo & Subru

On 10/2/14 4:17 PM, "Karthik Kambatla" <ka...@cloudera.com> wrote:

>If this vote is meant for all branches:
>
>+1 to merge to trunk
>+1 to merge to branch-2
>+1 to merge to branch-2.6, provided we "label" this feature
>experimental/alpha until the follow-up items are addressed.
>-0 to unconditional merge to branch-2.6.
>
>PS: We should decide how to communicate the stability of a feature.
>Maybe the new-feature notes in the release documentation should have
>this label?


Re: Calling a merge vote for YARN-1051

Posted by Karthik Kambatla <ka...@cloudera.com>.
If this vote is meant for all branches:

+1 to merge to trunk
+1 to merge to branch-2
+1 to merge to branch-2.6, provided we "label" this feature
experimental/alpha until the follow-up items are addressed.
-0 to unconditional merge to branch-2.6.

PS: We should decide how to communicate the stability of a feature.
Maybe the new-feature notes in the release documentation should have this
label?



On Wed, Oct 1, 2014 at 6:23 PM, Karthik Kambatla <ka...@cloudera.com> wrote:

> +1. Nicely done, Subru and Carlo.
>
> I have been partially involved with the work, and have reviewed some of
> the patches. With some help from Subru and documentation from Carlo
> (thanks!), I was able to play with the reservation system. Verified the
> following:
> 1. Reservations can be made only for the amount of resources available for
> that queue.
> 2. Jobs submitted against a reservation run in the corresponding
> "reservation" queue, and jobs submitted to the same higher-level queue but
> not against a reservation run in the corresponding "default" queue.
> 3. The web-ui shows the reserved resources in a queue even when there are
> no apps running.
>
> There are a few follow-up items towards feature completeness, and I am
> okay with working on them post merge to trunk as planned.
> 1. Support for FairScheduler
> 2. Recover reservations on RM restart/failover
> 3. CLI and/or REST APIs to make reservations - this is very useful for
> testing
> 4. Documentation in the usual apt.vm format.
>
> Cheers!
> Karthik
>
>
>
>
> On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <wh...@gmail.com> wrote:
>
>> +1 (non-binding),
>> Reviewed several patches related to scheduler side changes. As Jian
>> mentioned, this will not affect existing behavior.
>> Looking forward to this feature being used by more people. Thanks, Carlo
>> and Subru!
>>
>> Thanks,
>> Wangda
>>
>> On Wed, Oct 1, 2014 at 1:21 PM, Jian He <jh...@hortonworks.com> wrote:
>>
>> > +1,
>> >
>> > Carlo and Subru,  great job !  thanks for your contribution !
>> > I reviewed a couple of CapacityScheduler-related patches; they are in good
>> > shape. At a minimum, they do not affect existing behavior. Should be safe
>> > to merge.
>> >
>> > Jian
>> >
>> >
>> > On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <tj...@apache.org>
>> > wrote:
>> >
>> > > +1 (non-binding)
>> > > Thanks for adding this, really useful feature.
>> > >
>> > > On 30 September 2014 19:40, Chris Douglas <cd...@apache.org> wrote:
>> > >
>> > > > +1
>> > > >
>> > > > Excellent work, Carlo and Subru. -C
>> > > >

Re: Calling a merge vote for YARN-1051

Posted by Karthik Kambatla <ka...@cloudera.com>.
+1. Nicely done, Subru and Carlo.

I have been partially involved with the work, and have reviewed some of the
patches. With some help from Subru and documentation from Carlo (thanks!),
I was able to play with the reservation system. I verified the following:
1. Reservations can be made only for the amount of resources available for
that queue.
2. Jobs submitted against a reservation run in the corresponding
"reservation" queue, and jobs submitted to the same higher-level queue but
not against a reservation run in the corresponding "default" queue.
3. The web UI shows the reserved resources in a queue even when there are
no apps running.
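The first two checks above (admission control bounded by the queue's resources, and routing of jobs either to a reservation queue or to the default queue) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the YARN-1051 code or API: time is split into fixed steps, a queue has a fixed container capacity per step, placement is a simple greedy earliest-fit scan chosen here for brevity, and the class name, function names and "parent.reservation" / "parent.default" queue naming are all hypothetical.

```python
from typing import Dict, List, Optional


class PlanSketch:
    """Hypothetical toy model of a reservation Plan for a single queue."""

    def __init__(self, capacity: int, horizon: int) -> None:
        self.capacity = capacity                 # containers available per time step
        self.used: List[int] = [0] * horizon     # containers already promised per step
        self.reservations: Dict[str, int] = {}   # accepted reservation id -> start step

    def try_reserve(self, res_id: str, containers: int, duration: int,
                    earliest: int, deadline: int) -> Optional[int]:
        """Admission control: accept the request only if `containers` fit for
        `duration` consecutive steps inside the window [earliest, deadline).

        Returns the chosen start step, or None if the request is rejected.
        """
        last_start = min(deadline, len(self.used)) - duration
        for start in range(max(earliest, 0), last_start + 1):  # greedy earliest fit
            window = range(start, start + duration)
            if all(self.used[t] + containers <= self.capacity for t in window):
                for t in window:
                    self.used[t] += containers  # set the resources aside
                self.reservations[res_id] = start
                return start
        return None  # no feasible placement in the window: reject


def target_queue(parent: str, reservation_id: Optional[str],
                 plan: PlanSketch) -> str:
    """Route a job: an accepted reservation id maps to its own queue,
    anything else goes to the parent's default queue (hypothetical naming)."""
    if reservation_id is not None and reservation_id in plan.reservations:
        return f"{parent}.{reservation_id}"
    return f"{parent}.default"


if __name__ == "__main__":
    plan = PlanSketch(capacity=10, horizon=24)    # one step = one hour
    print(plan.try_reserve("r1", 10, 1, 16, 21))  # 16
    print(plan.try_reserve("r2", 10, 1, 16, 21))  # 17
    print(plan.try_reserve("r3", 11, 1, 0, 24))   # None
    print(target_queue("research", "r1", plan))   # research.r1
    print(target_queue("research", None, plan))   # research.default
```

With hourly steps, the proposal's example of 10 containers for 1 hour between 4pm and 9pm (the window [16, 21)) is placed at step 16; a second identical request lands at step 17, and a request for 11 containers is rejected because it exceeds the queue's capacity at every step. A real planner also has to cope with changing cluster capacity, reservation updates and RM failover, which this sketch ignores.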

There are a few follow-up items towards feature completeness, and I am okay
with working on them post-merge to trunk, as planned.
1. Support for FairScheduler
2. Recover reservations on RM restart/failover
3. CLI and/or REST APIs to make reservations; these would be very useful for
testing
4. Documentation in the usual apt.vm format.

Cheers!
Karthik




On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <wh...@gmail.com> wrote:

> +1 (non-binding).
> I reviewed several patches related to the scheduler-side changes. As Jian
> mentioned, this will not affect existing behavior.
> Looking forward to this feature being used by more people. Thanks to Carlo
> and Subru!
>
> Thanks,
> Wangda

Re: Calling a merge vote for YARN-1051

Posted by Wangda Tan <wh...@gmail.com>.
+1 (non-binding).
I reviewed several patches related to the scheduler-side changes. As Jian
mentioned, this will not affect existing behavior.
Looking forward to this feature being used by more people. Thanks to Carlo
and Subru!

Thanks,
Wangda

On Wed, Oct 1, 2014 at 1:21 PM, Jian He <jh...@hortonworks.com> wrote:

> +1.
>
> Carlo and Subru, great job! Thanks for your contribution!
> I reviewed a couple of CapacityScheduler-related patches; they are in good
> shape. At a minimum, they do not affect existing behavior, so this should be
> safe to merge.
>
> Jian

Re: Calling a merge vote for YARN-1051

Posted by Jian He <jh...@hortonworks.com>.
+1.

Carlo and Subru, great job! Thanks for your contribution!
I reviewed a couple of CapacityScheduler-related patches; they are in good
shape. At a minimum, they do not affect existing behavior, so this should be
safe to merge.

Jian


On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <tj...@apache.org>
wrote:

> +1 (non-binding)
> Thanks for adding this; it's a really useful feature.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Calling a merge vote for YARN-1051

Posted by Thomas Jungblut <tj...@apache.org>.
+1 (non-binding)
Thanks for adding this; it's a really useful feature.

On 30 September 2014 19:40, Chris Douglas <cd...@apache.org> wrote:

> +1
>
> Excellent work, Carlo and Subru. -C

Re: Calling a merge vote for YARN-1051

Posted by Vinod Kumar Vavilapalli <vi...@apache.org>.
Thanks Subru, Carlo, Dharmesh and Chris!

Overall, a great feature and a great contribution. I've been involved in various design discussions and did a bunch of reviews. The APIs are great, and so is the implementation, barring a few last-rush items like persistence. We can do these as follow-ups.

+1 (binding).

Since this is a vote for trunk, let's merge it to trunk first and get a couple of clean builds and test runs before we move it to branch-2.

This will be a great alpha feature for many users! Thanks again!

+Vinod

On Sep 30, 2014, at 11:40 AM, Chris Douglas <cd...@apache.org> wrote:

> +1
> 
> Excellent work, Carlo and Subru. -C


Re: Calling a merge vote for YARN-1051

Posted by Chris Douglas <cd...@apache.org>.
+1

Excellent work, Carlo and Subru. -C


Re: Calling a merge vote for YARN-1051

Posted by Dharmesh Kakadia <dh...@gmail.com>.
+1

We have been working on resource estimation for YARN jobs. We have
estimation in place for MapReduce, Hive and Oozie workflows through
PerfOrator (more details below); in the past we have also done estimation
for research frameworks such as DryadLINQ
(http://research.microsoft.com/apps/pubs/default.aspx?id=178971). We have
been using the reservation system introduced in YARN-1051 to enforce SLAs.
Some of these jobs are complex and have 10 or more stages. We have been
able to express their reservation requests in the language provided, and
YARN-1051 has been doing a good job of enforcing them. We have been using
YARN-1051 for a few months now and have found it quite stable.
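To make "admitting" a multi-stage request a bit more concrete, here is a toy sketch of admission control against a Plan, in the spirit of the proposal's description (a reservation is accepted only if enough resources can be set aside for it). Everything here is a hypothetical illustration and not the YARN-1051 API or its placement logic: the Plan class, admit(), a single resource counted in containers, discrete hourly slots, and the earliest-fit placement are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Toy Plan (hypothetical): containers in use per discrete time slot."""
    capacity: int  # containers the plan may hand out in any one slot
    horizon: int   # number of slots tracked, e.g. 24 hourly slots
    used: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.used = [0] * self.horizon

    def fits(self, start: int, containers: int, duration: int) -> bool:
        # A stage fits if every slot it covers stays within capacity.
        return all(self.used[t] + containers <= self.capacity
                   for t in range(start, start + duration))

    def admit(self, stages, arrival, deadline):
        """Admission control for an ordered, multi-stage request.

        stages: list of (containers, duration) pairs; stage i+1 may start
        only after stage i has finished. Every stage must lie within
        [arrival, deadline). Returns the chosen start slots and commits
        them to the plan, or returns None and leaves the plan unchanged.
        """
        deadline = min(deadline, self.horizon)
        starts, cursor = [], arrival
        for containers, duration in stages:
            start = cursor
            # Earliest-fit (assumed heuristic): slide right until it fits.
            while (start + duration <= deadline
                   and not self.fits(start, containers, duration)):
                start += 1
            if start + duration > deadline:
                return None  # reject; nothing has been committed yet
            starts.append(start)
            cursor = start + duration
        # Stages of one request never overlap in time, so commit them all.
        for (containers, duration), start in zip(stages, starts):
            for t in range(start, start + duration):
                self.used[t] += containers
        return starts


# "10 containers for 1 hour sometime between 4pm and 9pm", hourly slots.
plan = Plan(capacity=20, horizon=24)
print(plan.admit([(10, 1)], arrival=16, deadline=21))  # [16]
print(plan.admit([(10, 1)], arrival=16, deadline=21))  # [16]
print(plan.admit([(10, 1)], arrival=16, deadline=21))  # [17]
```

Because a later stage only has to start after the previous one ends, placing each stage as early as possible never rules out a placement that a later start would have allowed; a real planner would also weigh fairness, preemption and multiple resource types.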

PerfOrator:
PerfOrator uses performance models (tailored to big data jobs) to
estimate the resource requirements of a job. It takes the available
resources as input and predicts a "skyline" for the job: a sequence of
(#containers, time) pairs. A prediction {(x1, t1), (x2, t2), ...} means
that x1 containers are needed for time t1, followed by x2 containers for
time t2, and so on. The reservation language provided by YARN-1051 is a
perfect match for such pipeline jobs.
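A small sketch of a skyline as data, assuming it is a list of (containers, duration) pairs laid out back to back as described above. The function names and the numbers are hypothetical and are not part of PerfOrator or YARN-1051. The comparison with a single flat, peak-sized request illustrates one reason a step-wise request can suit such jobs better.

```python
def skyline_to_stages(skyline):
    """Lay out a skyline [(containers, duration), ...] back to back.

    Returns (offset, containers, duration) triples, where offset is when
    the step starts relative to the job start (no gaps between steps).
    """
    stages, offset = [], 0
    for containers, duration in skyline:
        stages.append((offset, containers, duration))
        offset += duration
    return stages


def container_time(skyline):
    # Area under the skyline: total container-time the job actually needs.
    return sum(containers * duration for containers, duration in skyline)


def flat_box(skyline):
    # Container-time of one flat request sized at the peak for the whole run.
    peak = max(containers for containers, _ in skyline)
    length = sum(duration for _, duration in skyline)
    return peak * length


skyline = [(10, 2), (4, 3)]  # 10 containers for 2 units, then 4 for 3
print(skyline_to_stages(skyline))  # [(0, 10, 2), (2, 4, 3)]
print(container_time(skyline), flat_box(skyline))  # 32 50
```

With these made-up numbers, describing each step separately asks for 32 container-time units, while a single request sized at the peak would hold 50 for the same overall duration.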

We have prior experience building such estimators for other distributed
runtimes as well, but none of those runtimes had features to reserve
resources and enforce such reservations. We are very happy to see this in
YARN.

Kudos to you guys.

Thanks
Kaushik and Dharmesh
