Posted to yarn-dev@hadoop.apache.org by Subru Krishnan <su...@apache.org> on 2017/06/20 23:28:15 UTC

[DISCUSS] merging YARN-2915 (Federation) to trunk

Hi all,

We would like to open a discussion on merging the YARN Federation
(YARN-2915) [1] feature to trunk.  We have been developing the feature in a
feature branch (YARN-2915 [2]) for a while, and we are reasonably confident
that the state of the feature meets the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to
datacenter-sized clusters by partitioning nodes across multiple sub-clusters
(each running a YARN cluster of a few thousand nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.
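
As a rough illustration of the indirection layer described above, here is a
minimal Python sketch. It is not the Federation code: SubClusterRM,
FederationProxy, and the greedy "fill the first RM, then the next" order are
all hypothetical names and choices made up for this example. The point is
only that the application issues one request and never sees which
sub-cluster served it.

```python
from dataclasses import dataclass


@dataclass
class SubClusterRM:
    """Hypothetical stand-in for one sub-cluster's Resource Manager."""
    name: str
    free_containers: int

    def allocate(self, count: int) -> int:
        """Grant up to `count` containers and return how many were granted."""
        granted = min(count, self.free_containers)
        self.free_containers -= granted
        return granted


class FederationProxy:
    """Hypothetical indirection layer: the app calls allocate() once."""

    def __init__(self, rms):
        self.rms = list(rms)

    def allocate(self, count: int) -> dict:
        """Ask each RM in turn until the request is met or the RMs run out."""
        grants = {}
        remaining = count
        for rm in self.rms:
            if remaining == 0:
                break
            granted = rm.allocate(remaining)
            if granted:
                grants[rm.name] = granted
                remaining -= granted
        return grants
```

In this sketch a request larger than any single sub-cluster's free capacity
is simply split across several of them; a real negotiation would also have
to deal with locality, queues, and failures, none of which is modeled here.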


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near-linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.
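
To make the scale-out argument concrete, the following Python sketch shows
the arithmetic and one possible routing rule. Both are assumptions for
illustration: the 4,000-node bound per RM is an invented figure standing in
for "a few thousand", and the "fewest running applications" rule is a
hypothetical policy, not one of the policies shipped in the branch.

```python
import math


def subclusters_needed(total_nodes: int, max_nodes_per_rm: int) -> int:
    """Number of sub-clusters required if each RM manages a bounded node count."""
    return math.ceil(total_nodes / max_nodes_per_rm)


def pick_home_subcluster(running_apps: dict) -> str:
    """Pick the sub-cluster with the fewest running apps (ties go to the
    alphabetically first name). Hypothetical policy for illustration only."""
    return min(sorted(running_apps), key=lambda name: running_apps[name])
```

Under these assumptions, growing the cluster means raising the result of
subclusters_needed() rather than raising the load on any one RM, and keeping
each application in a single home sub-cluster keeps cross-RM traffic the
exception rather than the rule.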

Status:

   - The version we would like to merge to trunk is termed "MVP" (minimal
   viable product). The feature will have a complete end-to-end application
   execution flow with the ability to span a single application across
   multiple YARN (sub) clusters.
   - There were 50+ sub-tasks that were completed as part of this
   effort. Every patch has been reviewed and +1ed by a committer. Thanks to
   Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
   - Federation is designed to be built around YARN and consequently has
   minimal code changes to core YARN. The relevant JIRAs that modify the
   existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
   close attention to ensure that if federation is disabled there is zero
   impact to existing functionality (it is disabled by default).
   - We found a few bugs as we went along which we fixed directly upstream
   in trunk and/or branch-2.
   - We have been continuously rebasing the feature branch [2] so the merge
   should be a straightforward cherry-pick.
   - The current version has been rather thoroughly tested and is currently
   deployed in a *10,000+ node federated YARN cluster that's running
   upwards of 50k jobs daily with a reliability of 99.9%*.
   - We have a few ideas for follow-up extensions/improvements which are
   tracked in the umbrella JIRA YARN-5597[3].


Documentation:

   - Quick start guide (maven site) - YARN-6484[4].
   - The overall design doc [5] and the slide deck [6] we used for our talk
   at Hadoop Summit 2016 are available in the umbrella JIRA, YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks, and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also big
thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

We plan to start the merge vote in the next week or so. The branch is close
to complete (~5 patches before one can kick the tires on a running
deployment). Please look through the branch; feedback is welcome. Thanks!

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Subru Krishnan <su...@apache.org>.
Thanks everyone for the thoughtful comments. Yesterday we completed the last
patch (YARN-3659) required for testing the YARN-2915 branch e2e. Since no
blockers were raised, I'll go ahead and start a vote thread as soon as the
validations are done.

Cheers,
Subru & Carlo

On Fri, Jun 23, 2017 at 5:11 PM, Wangda Tan <wh...@gmail.com> wrote:

> Thanks all for working on the feature, I'm in favor of moving forward as
> well.
>
> Best,
> Wangda
>
> On Fri, Jun 23, 2017 at 2:44 PM, Sangjin Lee <sj...@gmail.com> wrote:
>
>> Thanks for the clarification Subru. I am in favor of moving forward.
>>
>>
>> Sangjin
>>
>> On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla <
>> kasha@cloudera.com> wrote:
>>
>> > Given RTC and the amount of production testing this feature has
>> > received, I am totally in favor of this merge.

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Kiran Kumar Pulamolu <ki...@sasi.ac.in>.
Dear all,

Good morning,

This is Kiran Kumar Pulamolu. I am doing research on resource optimization in
Hadoop by dapit resource sharing with fairness policies. I would like to
contribute to this group. Please let me know how to proceed.

Thank you
Kiran Kumar Pulamolu
+919492400797

On 24 Jun 2017 5:42 a.m., "Wangda Tan" <wh...@gmail.com> wrote:

> Thanks all for working on the feature, I'm in favor of moving forward as
> well.
>
> Best,
> Wangda
>
> On Fri, Jun 23, 2017 at 2:44 PM, Sangjin Lee <sj...@gmail.com> wrote:
>
> > Thanks for the clarification Subru. I am in favor of moving forward.
> >
> >
> > Sangjin
> >
> > On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla <
> > kasha@cloudera.com> wrote:
> >
> > > Given RTC and the amount of production testing this feature has
> > > received, I am totally in favor of this merge.

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Wangda Tan <wh...@gmail.com>.
Thanks all for working on the feature, I'm in favor of moving forward as
well.

Best,
Wangda

On Fri, Jun 23, 2017 at 2:44 PM, Sangjin Lee <sj...@gmail.com> wrote:

> Thanks for the clarification Subru. I am in favor of moving forward.
>
>
> Sangjin
>
> On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla <
> kasha@cloudera.com> wrote:
>
> > Given RTC and the amount of production testing this feature has
> > received, I am totally in favor of this merge.

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Sangjin Lee <sj...@gmail.com>.
Thanks for the clarification Subru. I am in favor of moving forward.


Sangjin

On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla <
kasha@cloudera.com> wrote:

> Given RTC and the amount of production testing this feature has received, I
> am totally in favor of this merge.

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Karthik Shashank Kambatla <ka...@cloudera.com>.
Given RTC and the amount of production testing this feature has received, I
am totally in favor of this merge.



On Tue, Jun 20, 2017 at 4:28 PM, Subru Krishnan <su...@apache.org> wrote:

> Hi all,
>
> We would like to open a discussion on merging the YARN Federation
> (YARN-2915) [1] feature to trunk.  We have been developing the feature in a
> feature branch (YARN-2915 [2]) for a while, and we are reasonably confident
> that the state of the feature meets the criteria to be merged onto trunk.
>
> *Key Ideas*:
>
> YARN’s centralized design allows strict enforcement of scheduling
> invariants and effective resource sharing, but becomes a scalability
> bottleneck (in number of jobs and nodes) well before reaching the scale of
> our clusters (e.g., 20k-50k nodes).
>
>
> To address these limitations, we developed a scale-out, federation-based
> solution (YARN-2915). Our architecture scales near-linearly to datacenter
> sized clusters, by partitioning nodes across multiple sub-clusters (each
> running a YARN cluster of few thousands nodes). Applications can span
> multiple sub-clusters *transparently (i.e. no code change or recompilation
> of existing apps)*, thanks to a layer of indirection that negotiates with
> multiple sub-clusters' Resource Managers on behalf of the application.
>
>
> This design is structurally scalable, as it bounds the number of nodes each
> RM is responsible for. Appropriate policies ensure that the majority of
> applications reside within a single sub-cluster, thus further controlling
> the load on each RM. This provides near linear scale-out by simply adding
> more sub-clusters. The same mechanism enables pooling of resources from
> clusters owned and operated by different teams.
>
> Status:
>
>    - The version we would like to merge to trunk is termed "MVP" (minimal
>    viable product). The feature will have a complete end-to-end application
>    execution flow with the ability to span a single application across
>    multiple YARN (sub) clusters.
>    - There were 50+ sub-tasks that were that were completed as part of this
>    effort. Every patch has been reviewed and +1ed by a committer. Thanks to
>    Jian, Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
>    - Federation is designed to be built around YARN and consequently has
>    minimal code changes to core YARN. The relevant JIRAs that modify
> existing
>    YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid close
>    attention to ensure that if federation is disabled there is zero impact
> to
>    existing functionality (disabled by default).
>    - We found a few bugs as we went along which we fixed directly upstream
>    in trunk and/or branch-2.
>    - We have continuously rebasing the feature branch [2] so the merge
>    should be a straightforward cherry-pick.
>    - The current version has been rather thoroughly tested and is currently
>    deployed in a *10,000+ node federated YARN cluster that's running
>    upwards of 50k jobs daily with a reliability of 99.9%*.
>    - We have few ideas for follow-up extensions/improvements which are
>    tracked in the umbrella JIRA YARN-5597[3].
>
>
> Documentation:
>
>    - Quick start guide (maven site) - YARN-6484[4].
>    - Overall design doc[5] and the slide-deck [6] we used for our talk at
>    Hadoop Summit 2016 is available in the umbrella jira - YARN-2915.
>
>
> Credits:
>
> This is a group effort that could have not been possible without the ideas
> and hard work of many other folks and we would like to specifically call
> out Giovanni, Botong & Ellen for their invaluable contributions. Also big
> thanks to the many folks in community  (Sriram, Kishore, Sarvesh, Jian,
> Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
> many more) that helped us shape our ideas and code with very insightful
> feedback and comments.
>
> We plan to start the merge vote in the next week or so. The branch is close
> to complete (~5 patches before one can kick the tires on a running
> deployment). Please look through the branch; feedback is welcome. Thanks!
>
> Cheers,
> Subru & Carlo
>
> [1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
> [2] https://github.com/apache/hadoop/tree/YARN-2915
> [3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
> [4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
> [5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
> [6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
> [7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
> [8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
>

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Subru Krishnan <su...@apache.org>.
Thanks Arun for the vote of confidence.

Sangjin,

Your question is very pertinent and that's why we called it out
specifically in the mail. The short answer is yes, there's absolutely no
impact when it is kept turned off.

The more detailed answer is that with Federation there's only *one*
integration point with YARN, i.e. the RMs heartbeat their "membership" to
the federated cluster (*YARN-3671*). The heartbeat thread is enabled only
when Federation is turned on. We made a few minor changes in core YARN;
every single one was either a refactoring (to reuse code or make something
configurable) or a bug fix. There is zero change in either behavior or
data structures in the RM (or Scheduler) to support Federation. In fact
(except for the heartbeat) the RM & NM are totally unaware of Federation.
In addition, we are also transparently rolling out Federation in our
clusters, i.e. moving from multiple disjoint YARN clusters to federated
ones independently of the apps, and we have not observed any issues so far.
I suggest looking at *YARN-3671*, as that'll give you a clear picture. And
yes, it is manageable (40KB), most of which is the new heartbeat thread and
tests :).
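
To make the "enabled only when Federation is turned on" point concrete,
here is a minimal sketch of a config-gated heartbeat. It is only an
illustration of the pattern, not the code from YARN-3671: the class name
MembershipHeartbeatSketch, the sketch.federation.enabled key and the beat()
stand-in are all hypothetical, and a plain Map stands in for the real
configuration object.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a heartbeat that exists only when a flag is on. */
final class MembershipHeartbeatSketch {
    // Assumed key name, used for illustration only; absent means disabled.
    static final String ENABLED_KEY = "sketch.federation.enabled";

    private final AtomicLong beats = new AtomicLong();
    private ScheduledExecutorService scheduler; // stays null when disabled

    /** Starts the heartbeat thread only if the flag is set; returns whether it started. */
    boolean start(Map<String, String> conf, long intervalMillis) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
        if (!enabled) {
            return false; // disabled: no thread is created, no state is added
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "membership-heartbeat");
            t.setDaemon(true); // never keeps the process alive on its own
            return t;
        });
        scheduler.scheduleAtFixedRate(this::beat, 0L, intervalMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Stand-in for reporting this RM's membership to the federated cluster. */
    void beat() {
        beats.incrementAndGet();
    }

    long beatCount() {
        return beats.get();
    }

    boolean isRunning() {
        return scheduler != null && !scheduler.isShutdown();
    }

    void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Default configuration: the flag is absent, so nothing is started.
        MembershipHeartbeatSketch off = new MembershipHeartbeatSketch();
        System.out.println("started when disabled: "
            + off.start(Collections.emptyMap(), 1000L));

        // Flag set explicitly: the heartbeat thread is created and scheduled.
        MembershipHeartbeatSketch on = new MembershipHeartbeatSketch();
        System.out.println("started when enabled: "
            + on.start(Collections.singletonMap(ENABLED_KEY, "true"), 1000L));
        on.stop();
    }
}
```

The point of the shape above is that the disabled path returns before any
thread or executor is allocated, which is the property that matters for
those who keep the feature off.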

Hope this addresses your concern.


On Thu, Jun 22, 2017 at 11:17 AM, Sangjin Lee <sj...@apache.org> wrote:

> Thanks much Subru, Carlo, and others for working on this tirelessly and
> bringing it to this point!
>
> I haven't spent enough time on the federation code itself to have a strong
> opinion of the quality. I trust the reviews that folks did for individual
> commits.
>
> One ask I have: could you comment on whether it is turned off cleanly
> without any impact whatsoever? There are many who won't turn on federation
> (now) and it is imperative that there is as little impact as possible when
> this is merged. I'm thinking of behavior, memory footprint of the RM, and
> so on, when federation is turned off.
>
> Sangjin

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Sangjin Lee <sj...@apache.org>.
Thanks much Subru, Carlo, and others for working on this tirelessly and
bringing it to this point!

I haven't spent enough time on the federation code itself to have a strong
opinion of the quality. I trust the reviews that folks did for individual
commits.

One ask I have: could you comment on whether it is turned off cleanly
without any impact whatsoever? There are many who won't turn on federation
(now) and it is imperative that there is as little impact as possible when
this is merged. I'm thinking of behavior, memory footprint of the RM, and
so on, when federation is turned off.

Sangjin

On Thu, Jun 22, 2017 at 10:59 AM, Arun Suresh <as...@apache.org> wrote:

> Thanks for all the work on this Subru, Carlo et al.
> I think we should proceed with a merge vote.
>
> Cheers
> -Arun

Re: [DISCUSS] merging YARN-2915 (Federation) to trunk

Posted by Arun Suresh <as...@apache.org>.
Thanks for all the work on this Subru, Carlo et al.
I think we should proceed with a merge vote.

Cheers
-Arun
