Posted to hdfs-dev@hadoop.apache.org by Allen Wittenauer <aw...@effectivemachines.com> on 2017/10/10 00:51:05 UTC

YARN native services Re: 2017-10-06 Hadoop 3 release status update

> On Oct 6, 2017, at 5:51 PM, Eric Yang <ey...@hortonworks.com> wrote:
> yarn application -deploy -f spec.json
> yarn application -stop <service-name>
> yarn application -restart <service-name>
> yarn application -remove <service-name>
> 
> and
> 
> yarn application -list will display both the application list from the RM as well as docker services?
	
	IMO, that makes much more sense. [*] I’m trying to think of a reason why I’d care whether something was using this API or not.  It’s not like users can’t run whatever they want as part of their job now.  The breakout is really only necessary so I have an idea if something is running that is using the REST API daemon. But more on that later….

> I think the development team was concerned about command structure overload between batch applications and long-running services.  In my view, there is no difference; they are all applications.  The only distinction is that the launching and shutdown of services may be different from batch jobs.  I think users can get used to these command structures without creating an additional command grouping.

	I pretty much agree.  In fact, I’d love to see ‘yarn application’ even replace ‘yarn jar’. One Interface To Rule Them All.

	I was under the impression (and maybe this was my misunderstanding; if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think that goal has been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

---
function yarn_subcommand_service
{
   # hand every argument straight through to the slider CLI
   exec slider "$@"
}
---

(or whatever). Plus, doing it this way, one gets the added benefit of Slider's SIGNIFICANTLY better documentation. (Seriously: well done, that team.)

	From an outside perspective, the extra daemon for running the REST API seems like the point at which it should have clicked that the project was going off the rails and missing the whole “integration” aspect. Had the REST API been integrated into the RM from day one, the command separation would have stuck out as well. If the RM runs the REST API, it now becomes a problem of “how does a user launch more than just a jar easily?”, a problem that Hadoop has had since nearly day one.  Redefining the “application” subcommand sounds like a reasonable way to move forward on that problem while also dropping the generic-sounding "service" subcommand.

	But all that said, it feels like direct integration was avoided from the beginning and I’m unclear as to why. Take this line from the quick start documentation:

	"Start all the hadoop components HDFS, YARN as usual."

		a) This sentence is pretty much a declaration that this feature set isn’t part of “YARN”. 
		b) Minimally, this should link to ClusterSetup. 

	Anyway, yes, please work on removing all of these extra adoption barriers and increased workload on admin teams with Yet Another Daemon to monitor and collect metrics. 

	Thanks!

[*] - I’m reminded of a conversation I had with a PMC member a year or three ago about HDFS. They proudly, almost defiantly, stated that the HDFS command structure is the way it is because it resembles the protocols, and that this was great. Guess what: users don’t care about how something is implemented, much less the protocols that are used to drive it. They care about consistency, EOU, and all those feel-good things that make applications a joy to use. They have more important stuff to do. Copying the protocols onto the command line only helps the person who wrote it and no one else. It’s hard to walk away from playing with YARN in this branch without seeing it as exhibiting those same anti-user behaviors.





Re: YARN native services Re: 2017-10-06 Hadoop 3 release status update

Posted by Jian He <jh...@hortonworks.com>.
Allen,

> I was under the impression (and, maybe this was my misunderstanding. if so, sorry) that “the goal” for this first pass was to integrate the existing Apache Slider functionality into YARN.  As it stands, I don’t think those goals have been met.  It doesn’t seem to be much different than just writing a shell profile to call slider directly:

The goal of this feature is to support container-based services on YARN. The team started by merging Slider, but also built a lot of new things that don’t exist in Slider, such as the REST service and the DNS, and rewrote a bunch of the core.
This thread was supposed to be for the release update. Let’s move the feature discussion to the JIRA YARN-7127<https://issues.apache.org/jira/browse/YARN-7127>.

Thanks,
Jian





