Posted to yarn-dev@hadoop.apache.org by Daniel Templeton <da...@cloudera.com> on 2017/10/19 19:53:02 UTC

[DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0

After much offline discussion with Wangda, Sunil, Varun V., and Andrew 
we've agreed that it would make sense to pull resource types into 
branch-3.0 ahead of the Hadoop 3.0 RC0.  Resource types has already been 
merged into trunk/3.1.  Now I'd like to open a discussion about getting it 
into 3.0 GA.  Here's the run-down:

Feature Details
---------------
Resource types replaces the two primitives that tracked CPU and memory 
with an array of objects to track an arbitrary set of resources (that 
must always include CPU and memory).  The resource manager reads the 
master list of supported resources from its configs.  The node managers 
read their resource values from their configs and report them to the 
resource manager in their heartbeats.  The clients read the supported 
resource types from their configs (or an RM service) and specify them in 
the application submission.  At a high level, nothing else changes.
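
As a rough illustration of the model described above (not the actual YARN 
classes), a resource can be thought of as a set of named values that must 
always include CPU and memory, with "fit" checked across every requested 
type.  The class name, the resource names ("memory-mb", "vcores", "gpu"), 
and the fits_in helper are all assumptions made for this sketch:

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical names for illustration only; this is not the YARN Resource API.
MANDATORY = ("memory-mb", "vcores")


@dataclass(frozen=True)
class Resource:
    """An arbitrary set of named resource values that always includes CPU and memory."""
    values: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self):
        missing = [name for name in MANDATORY if name not in self.values]
        if missing:
            raise ValueError(f"missing mandatory resource types: {missing}")

    def get(self, name: str) -> int:
        # A resource type that this object does not mention counts as zero.
        return self.values.get(name, 0)


def fits_in(request: Resource, available: Resource) -> bool:
    """True if every requested resource type is available in sufficient quantity."""
    return all(amount <= available.get(name)
               for name, amount in request.values.items())


# A node reporting an extra resource type, and two requests against it.
node = Resource({"memory-mb": 8192, "vcores": 8, "gpu": 2})
small = Resource({"memory-mb": 1024, "vcores": 1})
gpu_job = Resource({"memory-mb": 2048, "vcores": 2, "gpu": 4})

print(fits_in(small, node))    # True
print(fits_in(gpu_job, node))  # False
```

A request that names only CPU and memory behaves exactly as it would with 
the two original primitives, which is the sense in which nothing else 
changes at a high level.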

The Resource object is a core construct in the resource manager and 
scheduler.  All application operations end up touching Resource objects 
as we determine fit or share-based priority for applications, queues, 
and nodes.  As this feature replaces the core of how Resource objects 
work, resource types impacts almost every aspect of the resource 
manager's operation.  The change is pervasive, but not radical.

The resource types patches as merged into trunk/3.1 include an 
additional feature called resource profiles.  Resource profiles are 
actually independent of resource types, and either is useful without the 
other.  The resource profiles code is still in a bit of flux, so the 
current plan is to pull only the resource types code into branch-3.0.  I 
have backported only the resource types patches into the resource-types 
branch.  Unit tests are passing, and I don't see any significant risk 
from the split.  The diff between the resource-types branch and 
branch-3.0 is available as a branch-3.0 patch on YARN-7013[1].

Justification for 3.0
---------------------
Resource types (leaving out resource profiles) is in a stable state and 
is well tested with unit tests, performance tests, and functional tests 
with both the fair scheduler and the capacity scheduler.  Tests were run 
on both the resource-types branch and the original YARN-3926 branch. 
There is some additional work to do, but none of it's critical (except 
maybe improving the docs).  Our confidence level in the feature is good.

Resource types doesn't introduce incompatible changes to any Public and 
Stable APIs.  There are some incompatible changes to Public and Unstable 
APIs, but that's what a major release is for.  The Resource object proto 
retains the CPU and memory fields and adds a new field for any 
additional resource types to retain wire compatibility.  Other proto 
changes are all additive.
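
The wire-compatibility argument can be sketched as follows.  This uses 
plain dicts as stand-ins for protobuf messages, and the field names 
("memory", "virtual_cores", "extra_resources") are assumptions for 
illustration, not a statement of what the real proto looks like:

```python
# Stand-in dicts instead of real protobuf messages; field names are hypothetical.
LEGACY_FIELDS = ("memory", "virtual_cores")


def encode(resources):
    """New-style writer: the legacy fields stay exactly where old readers expect them."""
    extras = {k: v for k, v in resources.items() if k not in LEGACY_FIELDS}
    return {
        "memory": resources["memory"],
        "virtual_cores": resources["virtual_cores"],
        "extra_resources": extras,  # the additive field
    }


def decode_old(message):
    """Reader that predates resource types: it only knows the two legacy fields."""
    return {name: message[name] for name in LEGACY_FIELDS}


def decode_new(message):
    """Reader with resource types: legacy fields plus extras, tolerating old messages."""
    decoded = decode_old(message)
    decoded.update(message.get("extra_resources", {}))
    return decoded


new_message = encode({"memory": 4096, "virtual_cores": 4, "gpu": 1})
old_message = {"memory": 2048, "virtual_cores": 2}  # written by an old sender

print(decode_old(new_message))  # {'memory': 4096, 'virtual_cores': 4}
print(decode_new(new_message))  # {'memory': 4096, 'virtual_cores': 4, 'gpu': 1}
print(decode_new(old_message))  # {'memory': 2048, 'virtual_cores': 2}
```

An old reader simply never looks at the added field, and a new reader 
treats its absence as "no additional resource types", so either side can 
talk to the other.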

While it's not possible to turn resource types off per se, if the user 
does not activate the feature, the operation of YARN will be unchanged.  
Getting this feature into Hadoop 3.0 gives us the required groundwork to 
make progress on tidying up the usage details without having to drag in 
a large set of invasive changes into 3.1.

If we don't pull resource types into 3.0, it will open a persistent 
channel through which failures can be introduced through backporting.  
The differences introduced by resource types are significant enough that 
it will be an issue for scheduler and resource manager patches between 
3.1 and 3.0.

From the other side, resource types is a pervasive change, and there's 
no turning it off.  Users will be impacted by it regardless of whether 
they choose to use it or not.  While we've tested it, the feature 
represents a large number of changes to core code that's critical to the 
resource manager's operation.  If we're going to introduce a large 
change like this, no matter how well tested, we should do it in 3.0 
where users already expect some bumps in the road.  Bringing in a large 
change like this in a 3.1 release, when users expect the release to have 
stabilized, sounds like a bad idea.


What do folks think about pulling resource types back into branch-3.0 in 
time for RC0?  Any concerns?

Thanks to Varun Vasudev, Sunil Govind, Wangda Tan, Yufei Gu, Grant Sohn, 
Jason Lowe, Arun Suresh, Karthik Kambatla, Vinod Vavilapalli, and Andrew 
Wang for their work on getting the resource types work done, backported, 
tested, and on track for 3.0.

[1]: https://issues.apache.org/jira/secure/attachment/12892456/YARN-7013.branch-3.0.002.patch



Re: [DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0

Posted by Wangda Tan <wh...@gmail.com>.
Hi Daniel,

Thanks for starting the thread and working on the branch-3.0 merge effort.

I'm in favor of bringing resource types into branch-3.0.

Could you please share the tests you have run and the performance numbers
comparing branch-3.0 with branch-3.0 + resource types patches? I will +1
the merge if we see similar performance after applying the resource types
patches, compared to trunk.

- Wangda



Re: [DISCUSS] Merge Resource Types (YARN-3926) to branch-3.0

Posted by Andrew Wang <an...@cloudera.com>.
+0, as Daniel said we discussed this a lot off-list.

Let's make sure the docs are up to snuff, and we update the site release
notes to have a blurb on resource types.

Hoping we can get a merge VOTE kicked off ASAP (tomorrow?) since we're down
to the wire for the proposed RC0 schedule.
