Posted to hdfs-dev@hadoop.apache.org by Andrew Wang <an...@cloudera.com> on 2017/10/06 20:31:18 UTC

2017-10-06 Hadoop 3 release status update

https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-06

The beta1 RC0 vote passed, and beta1 is out! Now tracking GA features.

Highlights:

   - 3.0.0-beta1 has been released!
   - Router-based federation merge vote should be about to pass
   - API-based scheduler configuration merge vote is out and has the
   needed votes so far

Red flags:

   - Still need to nail down whether we're going to try and merge resource
   profiles. I've been emailing with Wangda and Daniel about this, we need to
   reach a decision ASAP (might already be too late).
   - Still waiting on Allen to review YARN native services feature.

Previously tracked GA blockers that have been resolved or dropped:

   - YARN-7134
   <https://issues.apache.org/jira/browse/YARN-7134?src=confmacro> -
   AppSchedulingInfo has a dependency on capacity scheduler OPEN: Wangda
   downgraded this to "Major", dropping from list.

GA blockers:

   - YARN-6623
   <https://issues.apache.org/jira/browse/YARN-6623?src=confmacro> - Add
   support to turn off launching privileged containers in the
   container-executor PATCH AVAILABLE: Actively being reviewed
   - Change of ExecutionType
      - YARN-7275
      <https://issues.apache.org/jira/browse/YARN-7275?src=confmacro> - NM
      Statestore cleanup for Container updates PATCH AVAILABLE: Kartheek has
      posted a patch, waiting for review
      - YARN-7178
      <https://issues.apache.org/jira/browse/YARN-7178?src=confmacro> - Add
      documentation for Container Update API OPEN: No update from Arun,
      though it's just a docs patch
   - ReservationSystem
      - YARN-4859
      <https://issues.apache.org/jira/browse/YARN-4859?src=confmacro> - [Bug]
      Unable to submit a job to a reservation when using FairScheduler OPEN:
      Yufei has picked this up
      - YARN-4827
      <https://issues.apache.org/jira/browse/YARN-4827?src=confmacro> -
      Document configuration of ReservationSystem for FairScheduler OPEN:
      Yufei has picked this up, just a docs patch
   - Rolling upgrade
      - YARN-6142
      <https://issues.apache.org/jira/browse/YARN-6142?src=confmacro> - Support
      rolling upgrade between 2.x and 3.x OPEN: Ray is still going through
      JACC and proto output
      - HDFS-11096
      <https://issues.apache.org/jira/browse/HDFS-11096?src=confmacro> -
      Support rolling upgrade between 2.x and 3.x PATCH AVAILABLE: Sean has
      revved the patch and is waiting on reviews from Ray, Allen

Features merged for GA:

   - Erasure coding
      - Continued bug reporting and fixing based on testing at Cloudera.
      - Still need to finish the 3.0 must-do's
   - Classpath isolation (HADOOP-11656)
      - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714
   <https://issues.apache.org/jira/browse/HADOOP-13714>)
      - Synced with Daniel, he plans to wrap up the remaining stuff next
      week
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

Unmerged features:

   - Resource types / profiles (YARN-3926
   <https://issues.apache.org/jira/browse/YARN-3926> and YARN-7069
   <https://issues.apache.org/jira/browse/YARN-7069>) (Wangda Tan)
      - This has been merged for 3.1.0, YARN-7069 tracks follow on work
      - Wangda said that he's okay waiting for 3.1.0 for this, we're
      waiting on Daniel. I synced with Daniel earlier this week, and he
      wants to try and get some of it into 3.0.0. Waiting on an update.
      - I still need a JIRA query for tracking the state of this.
   - HDFS router-based federation (HDFS-10467
   <https://issues.apache.org/jira/browse/HDFS-10467>) (Inigo Goiri and
   Chris Douglas)
      - Merge vote should close any minute now
   - API-based scheduler configuration (Jonathan Hung)
      - Merge vote is out, will close next week
   - YARN native services (YARN-5079
   <https://issues.apache.org/jira/browse/YARN-5079>) (Jian He)
      - Subtasks were filed to address Allen's review comments from the
      previous merge vote, only one pending
      - We need to confirm with Allen that this is ready to go, he hasn't
      been reviewing

Re: 2017-10-06 Hadoop 3 release status update

Posted by Andrew Wang <an...@cloudera.com>.
Thanks for the update Allen, appreciate your continued help reviewing this
feature.

Looking at the calendar, we are three weeks out from when we want to have
GA RC0 out for vote. We're already dipping into code freeze time by landing
HDFS router-based federation and API-based scheduler configuration next
week. If we want to get any more features in, it means slipping the GA date.

So, my current thinking is that we should draw a line after these pending
branches merge. Like before, I'm willing to bend on this if there are
strong arguments, but the quality bar is even higher than it was for beta1,
and we've still got plenty of other blockers/criticals to work on for GA.

If you feel differently, please reach out, I can make myself very available
next week for a call.

Best,
Andrew

On Fri, Oct 6, 2017 at 3:12 PM, Allen Wittenauer <aw...@effectivemachines.com>
wrote:

>
> > On Oct 6, 2017, at 1:31 PM, Andrew Wang <an...@cloudera.com>
> wrote:
> >
> >   - Still waiting on Allen to review YARN native services feature.
>
>         Fake news.
>
>         I’m still -1 on it, at least prior to a patch that posted late
> yesterday. I’ll probably have a chance to play with it early next week.
>
>
> Key problems:
>
>         * still haven’t been able to bring up dns daemon due to lacking
> documentation
>
>         * it really needs better naming and command structures.  When put
> into the larger YARN context, it’s very problematic:
>
> $ yarn --daemon start resourcemanager
>
>                 vs.
>
> $ yarn --daemon start apiserver
>
>                 if you awoke from a deep sleep from inside a cave, which
> one would you expect to “start YARN”?     Made worse that the feature is
> called “YARN services” all over the place.
>
> $ yarn service foo
>
>                 … what does this even mean?
>
>         It would be great if other outsiders really looked hard at this
> branch to give the team feedback.   Once it gets released, it’s gonna be
> too late to change it….
>
> As a sidenote:
>
>         It’d be great if the folks working on YARN spent some time
> consolidating daemons.  With this branch, it now feels like we’re
> approaching the double digit area of daemons to turn on all the features.
> It’s well past ridiculous, especially considering we still haven’t replaced
> the MRJHS’s feature set to the point we can turn it off.
>
>

Re: YARN native services Re: 2017-10-06 Hadoop 3 release status update

Posted by Jian He <jh...@hortonworks.com>.
Allen,

> I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

The goal of this feature is to support container-based services on YARN. The team started by merging Slider but built many new things, such as the REST service and the DNS, which don’t exist in Slider, and also rewrote a bunch of the core.
This thread was meant for release updates. Let’s move the feature discussion to the JIRA YARN-7127 <https://issues.apache.org/jira/browse/YARN-7127>.

Thanks,
Jian


On Oct 9, 2017, at 5:51 PM, Allen Wittenauer <aw...@effectivemachines.com> wrote:

> > On Oct 6, 2017, at 5:51 PM, Eric Yang <ey...@hortonworks.com> wrote:
> > yarn application -deploy -f spec.json
> > yarn application -stop <service-name>
> > yarn application -restart <service-name>
> > yarn application -remove <service-name>
> >
> > and
> >
> > yarn application -list will display both application list from RM as well as docker services?
>
> IMO, that makes much more sense. [*] I’m trying to think of a reason why I’d care if something was using this API or not.  It’s not like users can’t run whatever they want as part of their job now.  The break out is really only necessary so I have an idea if something is running that is using the REST API daemon. But more on that later….
>
> > I think the development team was concerned about command structure overload between batch applications and long-running services.  In my view, there is no difference, they are all applications.  The only distinction is the launching and shutdown of services may be different from batch jobs.  I think users can get used to these command structures without creating additional command grouping.
>
> I pretty much agree.  In fact, I’d love to see ‘yarn application’ even replace ‘yarn jar’. One Interface To Rule Them All.
>
> I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:
>
> ---
> function yarn_subcommand_service
> {
>   # pass every argument of the subcommand straight through to slider
>   exec slider "$@"
> }
> ---
>
> (or whatever). Plus doing it this way, one gets the added benefit of the SIGNIFICANTLY better documentation. (Seriously: well done that team)
>
> From an outside perspective, the extra daemon for running the REST API seems like when it should have clicked that the project is going off the rails and missing the whole “integration” aspect. Integrating the REST API into the RM from day one and the command separation would have also stuck out. If the RM runs the REST API, it now becomes a problem of “how does a user launch more than just a jar easily?” A problem that Hadoop has had since nearly day one.  Redefining the “application” subcommand sounds like a reasonable way to move forward on that problem while also dropping the generic sounding "service" subcommand.
>
> But all that said, it feels like direct integration was avoided from the beginning and I’m unclear as to why. Take this line from the quick start documentation:
>
> “Start all the hadoop components HDFS, YARN as usual.”
>
> a) This sentence is pretty much a declaration that this feature set isn’t part of “YARN”.
> b) Minimally, this should link to ClusterSetup.
>
> Anyway, yes, please work on removing all of these extra adoption barriers and increased workload on admin teams with Yet Another Daemon to monitor and collect metrics.
>
> Thanks!
>
> [*] - I’m reminded of a conversation I had with a PMC member a year or three ago about HDFS. They proudly, almost defiantly, stated that the HDFS command structure is such because it resembles the protocols and that was great. Guess what: users don’t care about how something is implemented, much less the protocols that are used to drive it. They care about consistency, EOU, and all those feel-good things that make applications a joy to use. They have more important stuff to do. Copying the protocols onto the command line only helps the person who wrote it and no one else. It’s hard not to walk away from playing with YARN in this branch feeling that it exhibits those same anti-user behaviors.




YARN native services Re: 2017-10-06 Hadoop 3 release status update

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Oct 6, 2017, at 5:51 PM, Eric Yang <ey...@hortonworks.com> wrote:
> yarn application -deploy -f spec.json
> yarn application -stop <service-name>
> yarn application -restart <service-name>
> yarn application -remove <service-name>
> 
> and
> 
> yarn application -list will display both application list from RM as well as docker services?
	
	IMO, that makes much more sense. [*] I’m trying to think of a reason why I’d care if something was using this API or not.  It’s not like users can’t run whatever they want as part of their job now.  The break out is really only necessary so I have an idea if something is running that is using the REST API daemon. But more on that later….

> I think the development team was concerned about command structure overload between batch applications and long-running services.  In my view, there is no difference, they are all applications.  The only distinction is the launching and shutdown of services may be different from batch jobs.  I think users can get used to these command structures without creating additional command grouping.

	I pretty much agree.  In fact, I’d love to see ‘yarn application’ even replace ‘yarn jar’. One Interface To Rule Them All.

	I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

---
function yarn_subcommand_service
{
   # pass every argument of the subcommand straight through to slider
   exec slider "$@"
}
---

(or whatever). Plus doing it this way, one gets the added benefit of the SIGNIFICANTLY better documentation. (Seriously: well done that team)
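
For readers unfamiliar with the naming convention used in the snippet above, here is a minimal, self-contained sketch of how a function named after a subcommand could be dispatched. It is not the real yarn script: `toy_yarn` and the stub `slider` function are hypothetical names introduced only so the example runs without a Hadoop or Slider install, and the stub is called directly rather than through `exec`, since `exec` only works on external commands.

```shell
# Hypothetical stand-in for the real slider binary (assumption: the real
# CLI would be on PATH; here we only echo what it would receive).
slider() {
  echo "slider invoked with: $*"
}

# Profile-style hook: the function name encodes the subcommand it handles.
yarn_subcommand_service() {
  slider "$@"
}

# Toy dispatcher, NOT the real yarn script: route "<subcommand> args..."
# to a function named yarn_subcommand_<subcommand> when one is defined.
toy_yarn() {
  subcmd="${1:-}"
  [ "$#" -gt 0 ] && shift
  handler="yarn_subcommand_${subcmd}"
  if [ -n "${subcmd}" ] && command -v "${handler}" >/dev/null 2>&1; then
    "${handler}" "$@"
  else
    echo "unknown subcommand: ${subcmd}" >&2
    return 1
  fi
}

toy_yarn service create foo   # prints "slider invoked with: create foo"
```

The point of the sketch is only that such a hook adds a name, not behavior: everything after the subcommand is handed to the underlying tool unchanged, which is the crux of the "not much different than calling slider directly" remark.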

	From an outside perspective, the extra daemon for running the REST API seems like when it should have clicked that the project is going off the rails and missing the whole “integration” aspect. Integrating the REST API into the RM from day one and the command separation would have also stuck out. If the RM runs the REST API, it now becomes a problem of “how does a user launch more than just a jar easily?” A problem that Hadoop has had since nearly day one.  Redefining the “application” subcommand sounds like a reasonable way to move forward on that problem while also dropping the generic sounding "service" subcommand. 

	But all that said, it feels like direct integration was avoided from the beginning and I’m unclear as to why. Take this line from the quick start documentation:

	“Start all the hadoop components HDFS, YARN as usual.”

		a) This sentence is pretty much a declaration that this feature set isn’t part of “YARN”. 
		b) Minimally, this should link to ClusterSetup. 

	Anyway, yes, please work on removing all of these extra adoption barriers and increased workload on admin teams with Yet Another Daemon to monitor and collect metrics. 

	Thanks!

[*] - I’m reminded of a conversation I had with a PMC member a year or three ago about HDFS. They proudly, almost defiantly, stated that the HDFS command structure is such because it resembles the protocols and that was great. Guess what: users don’t care about how something is implemented, much less the protocols that are used to drive it. They care about consistency, EOU, and all those feel-good things that make applications a joy to use. They have more important stuff to do. Copying the protocols onto the command line only helps the person who wrote it and no one else. It’s hard not to walk away from playing with YARN in this branch feeling that it exhibits those same anti-user behaviors.



---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-dev-help@hadoop.apache.org


YARN native services Re: 2017-10-06 Hadoop 3 release status update

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Oct 6, 2017, at 5:51 PM, Eric Yang <ey...@hortonworks.com> wrote:
> yarn application -deploy –f spec.json
> yarn application -stop <service-name>
> yarn application -restart <service-name>
> yarn application -remove <service-name>
> 
> and
> 
> yarn application –list will display both application list from RM as well as docker services?
	
	IMO, that makes much more sense. [*] I’m trying think of a reason why I’d care if something was using this API or not.  It’s not like users can’t run whatever they want as part of their job now.  The break out is really only necessary so I have an idea if something is running that is using the REST API daemon. But more on that later….

> I think the development team was concerned that command structure overload between batch applications and long running services.  In my view, there is no difference, they are all applications.  The only distinction is the launching and shutdown of services may be different from batch jobs.  I think user can get used to these command structures without creating additional command grouping.

	I pretty much agree.  In fact, I’d love to see ‘yarn application’ even replace ‘yarn jar’. One Interface To Rule Them All.

	I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

---
function yarn_subcommand_service
{
   exec slider “$@“
}
----

(or whatever). Plus doing it this way, one gets the added benefit of the SIGNIFICANTLY better documentation. (Seriously: well done that team)

	From an outside perspective, the extra daemon for running the REST API seems like when it should have clicked that the project is going off the rails and missing the whole “integration” aspect. Integrating the REST API into the RM from day one and the command separation would have also stuck out. If the RM runs the REST API, it now becomes a problem of “how does a user launch more than just a jar easily?” A problem that Hadoop has had since nearly day one.  Redefining the “application” subcommand sounds like a reasonable way to move forward on that problem while also dropping the generic sounding "service" subcommand. 

	But all that said, it feels like direct integration was avoided from the beginning and I’m unclear as to why. Take this line from the quick start documentation: 

	"Start all the hadoop components HDFS, YARN as usual.”

		a) This sentence is pretty much a declaration that this feature set isn’t part of “YARN”. 
		b) Minimally, this should link to ClusterSetup. 

	Anyway, yes, please work on removing all of these extra adoption barriers and increased workload on admin teams with Yet Another Daemon to monitor and collect metrics. 

	Thanks!

[*] - I’m reminded of a conversation I had with a PMC member year or three ago about HDFS. They proudly almost defiantly stated that the HDFS command structure is such because it resembles the protocols and that was great. Guess what: users’ don’t care about how something is implemented, much less the protocols that are used to drive it. They care about consistency, EOU, and all those feel good things that make applications a joy to use. They have more important stuff to do. Copying the protocols onto the command line only help the person who wrote it and no one else. It’s hard not to walk away from playing with YARN in this branch as exhibiting those same anti-user behaviors.



---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-dev-help@hadoop.apache.org


YARN native services Re: 2017-10-06 Hadoop 3 release status update

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Oct 6, 2017, at 5:51 PM, Eric Yang <ey...@hortonworks.com> wrote:
> yarn application -deploy –f spec.json
> yarn application -stop <service-name>
> yarn application -restart <service-name>
> yarn application -remove <service-name>
> 
> and
> 
> yarn application –list will display both application list from RM as well as docker services?
	
	IMO, that makes much more sense. [*] I’m trying think of a reason why I’d care if something was using this API or not.  It’s not like users can’t run whatever they want as part of their job now.  The break out is really only necessary so I have an idea if something is running that is using the REST API daemon. But more on that later….

> I think the development team was concerned that command structure overload between batch applications and long running services.  In my view, there is no difference, they are all applications.  The only distinction is the launching and shutdown of services may be different from batch jobs.  I think user can get used to these command structures without creating additional command grouping.

	I pretty much agree.  In fact, I’d love to see ‘yarn application’ even replace ‘yarn jar’. One Interface To Rule Them All.

	I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

---
function yarn_subcommand_service
{
   exec slider “$@“
}
----

(or whatever). Plus doing it this way, one gets the added benefit of the SIGNIFICANTLY better documentation. (Seriously: well done that team)

	From an outside perspective, the extra daemon for running the REST API seems like when it should have clicked that the project is going off the rails and missing the whole “integration” aspect. Integrating the REST API into the RM from day one and the command separation would have also stuck out. If the RM runs the REST API, it now becomes a problem of “how does a user launch more than just a jar easily?” A problem that Hadoop has had since nearly day one.  Redefining the “application” subcommand sounds like a reasonable way to move forward on that problem while also dropping the generic sounding "service" subcommand. 

	But all that said, it feels like direct integration was avoided from the beginning and I’m unclear as to why. Take this line from the quick start documentation: 

	"Start all the hadoop components HDFS, YARN as usual.”

		a) This sentence is pretty much a declaration that this feature set isn’t part of “YARN”. 
		b) Minimally, this should link to ClusterSetup. 

	Anyway, yes, please work on removing all of these extra adoption barriers and increased workload on admin teams with Yet Another Daemon to monitor and collect metrics. 

	Thanks!

[*] - I’m reminded of a conversation I had with a PMC member year or three ago about HDFS. They proudly almost defiantly stated that the HDFS command structure is such because it resembles the protocols and that was great. Guess what: users’ don’t care about how something is implemented, much less the protocols that are used to drive it. They care about consistency, EOU, and all those feel good things that make applications a joy to use. They have more important stuff to do. Copying the protocols onto the command line only help the person who wrote it and no one else. It’s hard not to walk away from playing with YARN in this branch as exhibiting those same anti-user behaviors.



---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org


Re: 2017-10-06 Hadoop 3 release status update

Posted by Eric Yang <ey...@hortonworks.com>.
Hi Allen,

What if the commands are:

yarn application -deploy -f spec.json
yarn application -stop <service-name>
yarn application -restart <service-name>
yarn application -remove <service-name>

and

yarn application -list will display both the application list from the RM and the docker services?

I think the development team was concerned about overloading the command structure between batch applications and long-running services.  In my view, there is no difference; they are all applications.  The only distinction is that launching and shutting down services may differ from batch jobs.  I think users can get used to these command structures without creating an additional command grouping.  Your input is valuable to us.
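
As a rough illustration of this proposal, the sketch below maps the suggested "yarn application" flags onto the "yarn service" commands quoted further down in this thread. The function name yarn_application_shim is hypothetical, and the function only prints the command it would run rather than calling yarn. Only -deploy and -stop are mapped, because the thread names no "yarn service" counterpart for -restart or -remove.

```shell
# Hypothetical dry-run shim (not part of Hadoop): translate the proposed
# "yarn application" flags into the "yarn service" commands quoted in this
# thread. It prints the command instead of executing it.
yarn_application_shim() {
  case "${1:-}" in
    -deploy)
      # Proposed form: yarn application -deploy -f spec.json
      if [ "${2:-}" != "-f" ] || [ -z "${3:-}" ]; then
        echo "usage: -deploy -f <spec.json>" >&2
        return 2
      fi
      echo "yarn service create -f $3"
      ;;
    -stop)
      # Proposed form: yarn application -stop <service-name>
      if [ -z "${2:-}" ]; then
        echo "usage: -stop <service-name>" >&2
        return 2
      fi
      echo "yarn service stop $2"
      ;;
    *)
      # -restart, -remove and -list have no counterpart named in the thread.
      echo "no mapping named in the thread for: ${1:-<none>}" >&2
      return 1
      ;;
  esac
}

yarn_application_shim -deploy -f spec.json   # prints: yarn service create -f spec.json
yarn_application_shim -stop sleeper          # prints: yarn service stop sleeper
```

Keeping the shim as a pure translation layer reflects the point above: if batch jobs and services are all "applications", the user-facing verbs can stay the same while the launch and shutdown paths differ underneath.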

Feedback from others can also help to improve the current work.  Thank you.

Regards,
Eric

On 10/6/17, 4:27 PM, "Jian He" <jh...@hortonworks.com> wrote:

    Hi Allen,
    
    Thanks for spending the time reviewing it.
    A new patch was uploaded yesterday on YARN-7198 to address the documentation of missing config, you might want to check.
    The api-server is basically a REST server which accepts user requests to deploy services, it now has an option to be run as part of RM, which eliminates one separate daemon.
    
    We are open to naming suggestions. So far we have used the ‘service’ keyword to indicate this feature. E.g. the
    “yarn service” sub-command is used to manage services deployed on YARN, such as:
    
    yarn service create -f service-spec.json
    yarn service stop <service-name>
    
    Jian
    
    > On Oct 6, 2017, at 3:12 PM, Allen Wittenauer <aw...@effectivemachines.com> wrote:
    > 
    > 
    >> On Oct 6, 2017, at 1:31 PM, Andrew Wang <an...@cloudera.com> wrote:
    >> 
    >>  - Still waiting on Allen to review YARN native services feature.
    > 
    > 	Fake news.  
    > 
    > 	I’m still -1 on it, at least prior to a patch that posted late yesterday. I’ll probably have a chance to play with it early next week.
    > 
    > 
    > Key problems:
    > 
    > 	* still haven’t been able to bring up dns daemon due to lacking documentation
    > 
    > 	* it really needs better naming and command structures.  When put into the larger YARN context, it’s very problematic:
    > 
    > $ yarn --daemon start resourcemanager
    > 
    > 		vs.
    > 
    > $ yarn --daemon start apiserver
    > 
    > 		if you awoke from a deep sleep from inside a cave, which one would you expect to “start YARN”?     Made worse that the feature is called “YARN services” all over the place.
    > 
    > $ yarn service foo
    > 
    > 		… what does this even mean?
    > 
    > 	It would be great if other outsiders really looked hard at this branch to give the team feedback.   Once it gets released, it’s gonna be too late to change it….
    > 
    > As a sidenote:
    > 
    > 	It’d be great if the folks working on YARN spent some time consolidating daemons.  With this branch, it now feels like we’re approaching the double digit area of daemons to turn on all the features.  It’s well past ridiculous, especially considering we still haven’t replaced the MRJHS’s feature set to the point we can turn it off.


Re: 2017-10-06 Hadoop 3 release status update

Posted by Jian He <jh...@hortonworks.com>.
Hi Allen,

Thanks for spending the time reviewing it.
A new patch was uploaded yesterday on YARN-7198 to address the missing config documentation; you might want to check it.
The api-server is basically a REST server that accepts user requests to deploy services. It now has an option to run as part of the RM, which eliminates one separate daemon.

We are open to naming suggestions. So far we have used the ‘service’ keyword to indicate this feature. E.g. the
“yarn service” sub-command is used to manage services deployed on YARN, such as:

yarn service create -f service-spec.json
yarn service stop <service-name>

Jian

> On Oct 6, 2017, at 3:12 PM, Allen Wittenauer <aw...@effectivemachines.com> wrote:
> 
> 
>> On Oct 6, 2017, at 1:31 PM, Andrew Wang <an...@cloudera.com> wrote:
>> 
>>  - Still waiting on Allen to review YARN native services feature.
> 
> 	Fake news.  
> 
> 	I’m still -1 on it, at least prior to a patch that posted late yesterday. I’ll probably have a chance to play with it early next week.
> 
> 
> Key problems:
> 
> 	* still haven’t been able to bring up dns daemon due to lacking documentation
> 
> 	* it really needs better naming and command structures.  When put into the larger YARN context, it’s very problematic:
> 
> $ yarn --daemon start resourcemanager
> 
> 		vs.
> 
> $ yarn --daemon start apiserver
> 
> 		if you awoke from a deep sleep from inside a cave, which one would you expect to “start YARN”?     Made worse that the feature is called “YARN services” all over the place.
> 
> $ yarn service foo
> 
> 		… what does this even mean?
> 
> 	It would be great if other outsiders really looked hard at this branch to give the team feedback.   Once it gets released, it’s gonna be too late to change it….
> 
> As a sidenote:
> 
> 	It’d be great if the folks working on YARN spent some time consolidating daemons.  With this branch, it now feels like we’re approaching the double digit area of daemons to turn on all the features.  It’s well past ridiculous, especially considering we still haven’t replaced the MRJHS’s feature set to the point we can turn it off.


Re: 2017-10-06 Hadoop 3 release status update

Posted by Andrew Wang <an...@cloudera.com>.
Thanks for the update, Allen. I appreciate your continued help reviewing this
feature.

Looking at the calendar, we are three weeks from when we want to have GA
RC0 out for vote. We're already dipping into code freeze time by landing HDFS
router-based federation and API-based scheduler configuration next week. If
we want to get any more features in, it means slipping the GA date.

So, my current thinking is that we should draw a line after these pending
branches merge. Like before, I'm willing to bend on this if there are
strong arguments, but the quality bar is even higher than it was for beta1,
and we've still got plenty of other blockers/criticals to work on for GA.

If you feel differently, please reach out; I can make myself very available
next week for a call.

Best,
Andrew

On Fri, Oct 6, 2017 at 3:12 PM, Allen Wittenauer <aw...@effectivemachines.com>
wrote:

>
> > On Oct 6, 2017, at 1:31 PM, Andrew Wang <an...@cloudera.com>
> wrote:
> >
> >   - Still waiting on Allen to review YARN native services feature.
>
>         Fake news.
>
>         I’m still -1 on it, at least prior to a patch that posted late
> yesterday. I’ll probably have a chance to play with it early next week.
>
>
> Key problems:
>
>         * still haven’t been able to bring up dns daemon due to lacking
> documentation
>
>         * it really needs better naming and command structures.  When put
> into the larger YARN context, it’s very problematic:
>
> $ yarn --daemon start resourcemanager
>
>                 vs.
>
> $ yarn --daemon start apiserver
>
>                 if you awoke from a deep sleep from inside a cave, which
> one would you expect to “start YARN”?     Made worse that the feature is
> called “YARN services” all over the place.
>
> $ yarn service foo
>
>                 … what does this even mean?
>
>         It would be great if other outsiders really looked hard at this
> branch to give the team feedback.   Once it gets released, it’s gonna be
> too late to change it….
>
> As a sidenote:
>
>         It’d be great if the folks working on YARN spent some time
> consolidating daemons.  With this branch, it now feels like we’re
> approaching the double digit area of daemons to turn on all the features.
> It’s well past ridiculous, especially considering we still haven’t replaced
> the MRJHS’s feature set to the point we can turn it off.
>
>

Re: 2017-10-06 Hadoop 3 release status update

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Oct 6, 2017, at 1:31 PM, Andrew Wang <an...@cloudera.com> wrote:
> 
>   - Still waiting on Allen to review YARN native services feature.

	Fake news.  

	I’m still -1 on it, at least prior to a patch that was posted late yesterday. I’ll probably have a chance to play with it early next week.


Key problems:

	* still haven’t been able to bring up the DNS daemon due to lacking documentation

	* it really needs better naming and command structures.  When put into the larger YARN context, it’s very problematic:

$ yarn --daemon start resourcemanager

		vs.

$ yarn --daemon start apiserver

		If you awoke from a deep sleep inside a cave, which one would you expect to “start YARN”?  This is made worse by the feature being called “YARN services” all over the place.

$ yarn service foo

		… what does this even mean?

	It would be great if other outsiders really looked hard at this branch to give the team feedback.   Once it gets released, it’s gonna be too late to change it….

As a sidenote:

	It’d be great if the folks working on YARN spent some time consolidating daemons.  With this branch, it now feels like we’re approaching a double-digit number of daemons needed to turn on all the features.  It’s well past ridiculous, especially considering we still haven’t replaced the MRJHS’s feature set to the point where we can turn it off.

