Posted to general@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2012/05/18 01:47:18 UTC

Oozie vs. YARN "application"

Hadoop 0.23 (MRv2 or YARN) provides the concept of an "application", which is described as either a classic single MR job *or* a DAG of such jobs...which at a glance appears redundant with Oozie's primary purpose.

How do they differ?  Does 0.23 effectively obviate the service that Oozie provides or is Oozie more powerful than a YARN "application"...or simply more different from an application than I have conceptualized to the point that they don't necessarily step on one another's toes?

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
                                           --  Mark Twain
________________________________________________________________________________


Re: Oozie vs. YARN "application"

Posted by Keith Wiley <kw...@keithwiley.com>.
Thank you very much.  I appreciate the clarification.

Cheers!

On May 17, 2012, at 22:11 , Robert Evans wrote:

> Keith,
> 
> The documentation is incorrect.  [etc.]


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
                                           --  Yoda
________________________________________________________________________________


Re: Oozie vs. YARN "application"

Posted by Robert Evans <ev...@yahoo-inc.com>.
Keith,

The documentation is incorrect.  The plan is to support such a thing, but it has not been implemented yet.  I would like to see it be part of core map/reduce when it does happen, because several different projects, like Oozie, Pig and Hive, could share this functionality.  When it does show up, it is likely to be a superset of the functionality supported by Oozie, minus the functionality that Arun mentioned, like triggering jobs on data availability and on a regular time interval.  Hopefully Oozie would eventually also move to use it.  It would also allow such projects to potentially share DAG-level optimizations, like reducing or even eliminating the writing of temporary output to HDFS between small jobs, similar to what Spark does.

A DAGApplicationMaster would probably not be a DAG of generic applications; it would probably be a DAG of mapreduce jobs with a few other things, like what Oozie supports in its DAG definitions.  The reason is that for the DAG Application Master to be truly generic, it would need to launch other Application Masters in separate containers, whereas if we limit it to just a subset of AMs, we would not have to launch the separate processes and we could provide the MR-specific DAG-level optimizations I mentioned above.  We could still support launching other AMs for completeness' sake, but I see that as a lower priority.
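
To make "a DAG of mapreduce jobs" concrete, here is a minimal sketch of the ordering logic such a component would need: run each job only after the jobs it depends on. It is an illustration under stated assumptions, not YARN or Oozie code; the function `run_job_dag`, the `run_job` callback, and the job names are all hypothetical, and a real Application Master would also handle containers, failures, and parallelism.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_job_dag(dependencies, run_job):
    """Run each job once, after all of the jobs it depends on.

    dependencies maps a job name to the set of job names it depends on.
    run_job is a callable invoked with each job name in a valid order.
    Returns the order in which the jobs were run.
    """
    order = []
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for job in TopologicalSorter(dependencies).static_order():
        run_job(job)  # hypothetical hook: a real system would submit an MR job here
        order.append(job)
    return order


# Hypothetical three-job pipeline: "join" needs both cleaning jobs first.
example_dag = {
    "clean_logs": set(),
    "clean_users": set(),
    "join": {"clean_logs", "clean_users"},
}
executed = run_job_dag(example_dag, run_job=lambda name: None)
```

The two cleaning jobs have no mutual dependency, so either may run first; only "join" is constrained to run last. Keeping all jobs inside one process like this is what would make the DAG-level optimizations mentioned above possible.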

--Bobby Evans


On 5/17/12 9:29 PM, "Keith Wiley" <kw...@keithwiley.com> wrote:

On May 17, 2012, at 17:49 , Arun C Murthy wrote:

> Currently YARN doesn't offer anything to manage a DAG of applications.

Well, there is the following webpage:
http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/YARN.html

which suggests that YARN supports a dag of MR jobs within a YARN application (second paragraph, last sentence).  True, it is a dag of jobs within an application, not a dag of applications, but that wasn't really my original question.  My question was how the dag structure offered by YARN differs from that offered by Oozie.

It doesn't seem like the responses to my question so far have adequately reconciled Oozie's dag of jobs with YARN's dag of jobs.  To the contrary, the only response I've gotten so far seems to suggest that the webpage above is simply wrong and that YARN offers no form of multi-job dag at all; no response in this thread has confirmed the webpage's description, for example.

> It's fairly easy to implement a DAGApplicationMaster to manage a set of applications (whether MR or others).

Right, but that applies to whole applications.  Isn't a dag *of* jobs within an application rather analogous to what Oozie does?  Bear in mind, that is the entire premise of my original question (the degree of similarity between these two multi-job dag coordination systems).  The distinction between jobs and applications is only relevant after the relationship to Oozie has been established, since that was my original question.

I'm really sorry about the apparent misunderstanding.  I didn't intend any confusion on the matter.  I simply read the webpage and was immediately curious about its implications for Oozie, that's all.

> Arun
>
> PS: Please use mapreduce-dev@ for technical discussions, general@ is used for project discussions/announcements. Thanks.


Oof, sorry about that.  It's hard to move a thread mid-discussion of course since that messes up the archives and I still don't feel that the text on the webpage quoted above, which clearly describes YARN's dag of jobs, has been addressed, so I'm carrying on for the sake of "the historical record", but I apologize for not targeting my question at the most relevant mailing list.  A mailing list named "general" struck me as, well, general, but I must have misinterpreted it.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________



Re: Oozie vs. YARN "application"

Posted by Keith Wiley <kw...@keithwiley.com>.
On May 17, 2012, at 17:49 , Arun C Murthy wrote:

> Currently YARN doesn't offer anything to manage a DAG of applications.

Well, there is the following webpage:
http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/YARN.html

which suggests that YARN supports a dag of MR jobs within a YARN application (second paragraph, last sentence).  True, it is a dag of jobs within an application, not a dag of applications, but that wasn't really my original question.  My question was how the dag structure offered by YARN differs from that offered by Oozie.

It doesn't seem like the responses to my question so far have adequately reconciled Oozie's dag of jobs with YARN's dag of jobs.  To the contrary, the only response I've gotten so far seems to suggest that the webpage above is simply wrong and that YARN offers no form of multi-job dag at all; no response in this thread has confirmed the webpage's description, for example.

> It's fairly easy to implement a DAGApplicationMaster to manage a set of applications (whether MR or others).

Right, but that applies to whole applications.  Isn't a dag *of* jobs within an application rather analogous to what Oozie does?  Bear in mind, that is the entire premise of my original question (the degree of similarity between these two multi-job dag coordination systems).  The distinction between jobs and applications is only relevant after the relationship to Oozie has been established, since that was my original question.

I'm really sorry about the apparent misunderstanding.  I didn't intend any confusion on the matter.  I simply read the webpage and was immediately curious about its implications for Oozie, that's all.

> Arun
> 
> PS: Please use mapreduce-dev@ for technical discussions, general@ is used for project discussions/announcements. Thanks.


Oof, sorry about that.  It's hard to move a thread mid-discussion of course since that messes up the archives and I still don't feel that the text on the webpage quoted above, which clearly describes YARN's dag of jobs, has been addressed, so I'm carrying on for the sake of "the historical record", but I apologize for not targeting my question at the most relevant mailing list.  A mailing list named "general" struck me as, well, general, but I must have misinterpreted it.

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
                                           --  Edwin A. Abbott, Flatland
________________________________________________________________________________


Re: Oozie vs. YARN "application"

Posted by Arun C Murthy <ac...@hortonworks.com>.
Currently YARN doesn't offer anything to manage a DAG of applications.

It's fairly easy to implement a DAGApplicationMaster to manage a set of applications (whether MR or others).

Arun

PS: Please use mapreduce-dev@ for technical discussions, general@ is used for project discussions/announcements. Thanks.

On May 17, 2012, at 5:41 PM, Keith Wiley wrote:

> Just to be clear, YARN does offer something described as dag management, but perhaps you mean it doesn't offer a dag *of* applications, only of MR jobs *within* an application. Is that what you mean?
> ________________________________________
> Sent from my phone, please excuse my brevity.
> Keith Wiley, kwiley@keithwiley.com, http://keithwiley.com
> 
> 
> Arun C Murthy <ac...@hortonworks.com> wrote:
> 
> YARN doesn't yet have a DAGApplicationMaster which can handle a DAG of jobs.
> 
> Conceivably, Oozie could offload the DAG management once it's available.
> 
> OTOH, Oozie provides much more than just DAG management - it provides time & data-availability based scheduling of workflows.
> 
> hth,
> Arun
> 
> On May 17, 2012, at 4:47 PM, Keith Wiley wrote:
> 
>> Hadoop 0.23 (MRv2 or YARN) provides the concept of an "application", which is described as either a classic single MR job *or* a DAG of such jobs...which at a glance appears redundant with Oozie's primary purpose.
>> 
>> How do they differ? Does 0.23 effectively obviate the service that Oozie provides or is Oozie more powerful than a YARN "application"...or simply more different from an application than I have conceptualized to the point that they don't necessarily step on one another's toes?
>> 
>> _____________________________________________
> 
>> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>> 
>> "The easy confidence with which I know another man's religion is folly teaches
>> me to suspect that my own is also."
>> -- Mark Twain
>> _____________________________________________
> 
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Oozie vs. YARN "application"

Posted by Keith Wiley <kw...@keithwiley.com>.
Just to be clear, YARN does offer something described as dag management, but perhaps you mean it doesn't offer a dag *of* applications, only of MR jobs *within* an application. Is that what you mean?
________________________________________
Sent from my phone, please excuse my brevity.
Keith Wiley, kwiley@keithwiley.com, http://keithwiley.com


Arun C Murthy <ac...@hortonworks.com> wrote:

YARN doesn't yet have a DAGApplicationMaster which can handle a DAG of jobs.

Conceivably, Oozie could offload the DAG management once it's available.

OTOH, Oozie provides much more than just DAG management - it provides time & data-availability based scheduling of workflows.

hth,
Arun

On May 17, 2012, at 4:47 PM, Keith Wiley wrote:

> Hadoop 0.23 (MRv2 or YARN) provides the concept of an "application", which is described as either a classic single MR job *or* a DAG of such jobs...which at a glance appears redundant with Oozie's primary purpose.
> 
> How do they differ? Does 0.23 effectively obviate the service that Oozie provides or is Oozie more powerful than a YARN "application"...or simply more different from an application than I have conceptualized to the point that they don't necessarily step on one another's toes?
> 
>_____________________________________________

> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
> 
> "The easy confidence with which I know another man's religion is folly teaches
> me to suspect that my own is also."
> -- Mark Twain
>_____________________________________________

> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Oozie vs. YARN "application"

Posted by Arun C Murthy <ac...@hortonworks.com>.
YARN doesn't yet have a DAGApplicationMaster which can handle a DAG of jobs.

Conceivably, Oozie could offload the DAG management once it's available.

OTOH, Oozie provides much more than just DAG management - it provides time & data-availability based scheduling of workflows.
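
As a rough illustration of what "time & data-availability based scheduling" adds on top of DAG management, the sketch below computes when periodic runs are due and holds each run back until its inputs exist. It is a sketch of the idea only, not Oozie's configuration format or API; the function names `due_runs` and `ready_to_launch` and the input path are hypothetical.

```python
from datetime import datetime, timedelta


def due_runs(start, frequency, now):
    """Return the nominal times of all periodic runs scheduled in [start, now]."""
    times = []
    t = start
    while t <= now:
        times.append(t)
        t += frequency
    return times


def ready_to_launch(nominal_time, required_inputs, available_inputs, now):
    """A run launches only when its time has arrived and all of its inputs exist."""
    return now >= nominal_time and set(required_inputs) <= set(available_inputs)


# Hypothetical daily workflow that started on 2012-05-17.
start = datetime(2012, 5, 17)
now = datetime(2012, 5, 19, 12, 0)
runs = due_runs(start, timedelta(days=1), now)

# The first run is due, but it waits until its (hypothetical) input directory appears.
blocked = ready_to_launch(runs[0], {"/data/2012-05-17"}, set(), now)
launched = ready_to_launch(runs[0], {"/data/2012-05-17"}, {"/data/2012-05-17"}, now)
```

A DAG manager alone answers "in what order do these jobs run"; the two checks here answer "when does the whole workflow start", which is the part a DAGApplicationMaster would not replace.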

hth,
Arun

On May 17, 2012, at 4:47 PM, Keith Wiley wrote:

> Hadoop 0.23 (MRv2 or YARN) provides the concept of an "application", which is described as either a classic single MR job *or* a DAG of such jobs...which at a glance appears redundant with Oozie's primary purpose.
> 
> How do they differ?  Does 0.23 effectively obviate the service that Oozie provides or is Oozie more powerful than a YARN "application"...or simply more different from an application than I have conceptualized to the point that they don't necessarily step on one another's toes?
> 
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com
> 
> "The easy confidence with which I know another man's religion is folly teaches
> me to suspect that my own is also."
>                                           --  Mark Twain
> ________________________________________________________________________________
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/