Posted to user@oozie.apache.org by Alexey Yakubovich <al...@gmail.com> on 2012/02/27 00:41:36 UTC

Oozie map launcher and multiple MR jobs on Hadoop

This is a rather academic question; I am just trying to understand how
Oozie and Hadoop work together.

In the Apache Incubator documentation, there is a picture and
description of the Map launcher:
http://incubator.apache.org/oozie/overview.html#Launcher-Mapper
That picture and description show multiple map-reduce jobs coordinated
from the map launcher (one instance?).

Does it mean that the map launcher is a kind of daemon that runs all
the time and processes multiple requests from the Oozie server
(delivered through the Job Tracker)?

Or does it mean that one request from the Oozie server can actually
trigger multiple map-reduce jobs?

Or does it mean something else? Or is this an error in the doc?

Also: is this Map launcher used only for Oozie-originated jobs? What
about jobs from Oozie that come from Java actions? Which Java classes
in Hadoop (Oozie) implement the Map launcher?

Thanks

Alexey

Re: Oozie map launcher and multiple MR jobs on Hadoop

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Alexey,

The Java action is the basic case of a launcher job: you specify the
Java Main class you want to execute, and it is executed in the
launcher job JVM. I say 'basic case' because for
MR/Pig/Hive/Sqoop/DistCp actions Oozie uses built-in Main classes.
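
As a rough illustration of "specifying the Java Main class", here is a
small Python sketch that embeds a minimal workflow definition with one
Java action and reads back the class the launcher job would run. The
workflow name, the node names, the schema version in the namespace and
the class org.example.MyMain are all placeholders chosen for this
example, not values from a real deployment.

```python
import xml.etree.ElementTree as ET

# Minimal workflow with a single Java action. All names here
# (java-demo-wf, java-node, org.example.MyMain) are hypothetical.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-demo-wf">
  <start to="java-node"/>
  <action name="java-node">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <main-class>org.example.MyMain</main-class>
      <arg>input-dir</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Java action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

# Namespace assumed for this example; match it to your workflow file.
NS = {"wf": "uri:oozie:workflow:0.2"}


def java_main_classes(xml_text):
    """Map each Java action name to the Main class it asks to run."""
    root = ET.fromstring(xml_text)
    return {
        action.get("name"): action.find("wf:java/wf:main-class", NS).text
        for action in root.findall("wf:action", NS)
        if action.find("wf:java", NS) is not None
    }


if __name__ == "__main__":
    print(java_main_classes(WORKFLOW_XML))
```

The class named in main-class is the one that runs inside the launcher
job JVM; for the other action types Oozie supplies the Main class
itself.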

Hope this clarifies.

Alejandro

On Mon, Feb 27, 2012 at 1:18 PM, Alexey Yakubovich <al...@gmail.com> wrote:
> Mohammad,
>
> Thanks for the answer; it makes things clear.
>
> Can I ask you to elaborate a little more on the Java action? You said:
> "For a Java action, Oozie executes the Java code and control passes to
> the user code".
>
> But a Java action still executes the specified code on the Hadoop
> cluster (not in the Oozie server process on the edge node), and
> MR-Admin shows it as an MR job with one mapper. So does this process
> go through the Job Tracker, as with an Oozie MR action?
>
> Thanks
>
> Alexey

Re: Oozie map launcher and multiple MR jobs on Hadoop

Posted by Alexey Yakubovich <al...@gmail.com>.
Mohammad,

Thanks for the answer; it makes things clear.

Can I ask you to elaborate a little more on the Java action? You said:
"For a Java action, Oozie executes the Java code and control passes to
the user code".

But a Java action still executes the specified code on the Hadoop
cluster (not in the Oozie server process on the edge node), and
MR-Admin shows it as an MR job with one mapper. So does this process
go through the Job Tracker, as with an Oozie MR action?

Thanks

Alexey

On 2/26/12, Mohammad Islam <mi...@yahoo.com> wrote:
> Hi Alexey,
> Thanks for pointing this out.
> The doc is indeed confusing.
>
> The example applies only to the Pig action, where Pig can submit
> multiple MR jobs. We will clarify this in the doc.
>
> General steps are:
>
> 1. For each action, Oozie submits a launcher MR job (map-only).
> 2. The launcher mapper in turn executes the action-specific task as
>    follows:
>       a. For a Java action, Oozie executes the Java code and control
>          passes to the user code.
>       b. For a Pig/Hive action, it calls the Pig/Hive API to execute
>          the script. Control moves to the Pig/Hive API.
>       c. For an MR action, it submits the MR job using the Hadoop
>          JobClient API and the launcher mapper returns right away.
>     Note: All action types (except MR) wait for the execution to
>     finish before completing the launcher MR job.
>
>
> Addressing your comments:
>
>>Does it mean that the map launcher is a kind of daemon that runs all
>>the time and processes multiple requests from the Oozie server
>>(delivered through the Job Tracker)?
>
> No. Oozie submits a new launcher job for each action.
>
>
>>Or does it mean that one request from the Oozie server can actually
>>trigger multiple map-reduce jobs?
>
>
> No, it can't. For Pig and Hive it can happen, but those jobs are not
> under the Oozie launcher's control.
>
>>Or does it mean something else? Or is this an error in the doc?
>
>
> The doc doesn't explain this well.
>
>>Also: is this Map launcher used only for Oozie-originated jobs? What
>>about jobs from Oozie that come from Java actions? Which Java classes
>>in Hadoop (Oozie) implement the Map launcher?
>
>
> Oozie only controls the jobs submitted directly through it. For
> example, if Java code submits another set of MR jobs, Oozie will not
> know about them; it only waits for the Java code to return.
>
>
> Please let me know if anything is unclear.
>
> Regards,
> Mohammad

Re: Oozie map launcher and multiple MR jobs on Hadoop

Posted by Mohammad Islam <mi...@yahoo.com>.
Hi Alexey,
Thanks for pointing this out.
The doc is indeed confusing.

The example applies only to the Pig action, where Pig can submit multiple MR jobs. We will clarify this in the doc.

General steps are:

1. For each action, Oozie submits a launcher MR job (map-only).
2. The launcher mapper in turn executes the action-specific task as follows:
      a. For a Java action, Oozie executes the Java code and control passes to the user code.
      b. For a Pig/Hive action, it calls the Pig/Hive API to execute the script. Control moves to the Pig/Hive API.
      c. For an MR action, it submits the MR job using the Hadoop JobClient API and the launcher mapper returns right away.
    Note: All action types (except MR) wait for the execution to finish before completing the launcher MR job.
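
The steps above can be sketched as a toy Python model. This is not Oozie code: Cluster, run_launcher and the job names are invented for illustration, and the "cluster" simply records what gets submitted. It only shows which submissions happen per action type and whether the launcher waits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy stand-in for the Job Tracker: records submitted job names."""
    jobs: List[str] = field(default_factory=list)

    def submit(self, name: str) -> None:
        self.jobs.append(name)


def run_launcher(action_type: str, cluster: Cluster, child_jobs: int = 1) -> dict:
    """Model one launcher job for one workflow action (hypothetical helper)."""
    # Step 1: one map-only launcher job is submitted per action.
    cluster.submit(f"launcher:{action_type}")

    if action_type == "java":
        # Step 2a: user code runs inside the launcher mapper. Anything it
        # submits on its own is not tracked, so nothing is recorded here.
        return {"waits_for_completion": True, "tracked_children": 0}
    if action_type in ("pig", "hive"):
        # Step 2b: the Pig/Hive API takes over and may start several MR jobs.
        for i in range(child_jobs):
            cluster.submit(f"{action_type}-job-{i}")
        return {"waits_for_completion": True, "tracked_children": child_jobs}
    if action_type == "map-reduce":
        # Step 2c: the MR job is submitted and the launcher returns at once.
        cluster.submit("mr-job")
        return {"waits_for_completion": False, "tracked_children": 1}
    raise ValueError(f"unknown action type: {action_type}")


if __name__ == "__main__":
    cluster = Cluster()
    print(run_launcher("pig", cluster, child_jobs=3))
    print(cluster.jobs)
```

In this model a Pig action with three child jobs produces four submissions in total (one launcher plus three MR jobs), which is the situation the picture in the doc is meant to show.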


Addressing your comments:

>Does it mean that the map launcher is a kind of daemon that runs all
>the time and processes multiple requests from the Oozie server
>(delivered through the Job Tracker)?

No. Oozie submits a new launcher job for each action.


>Or does it mean that one request from the Oozie server can actually
>trigger multiple map-reduce jobs?


No, it can't. For Pig and Hive it can happen, but those jobs are not under the Oozie launcher's control.

>Or does it mean something else? Or is this an error in the doc?


The doc doesn't explain this well.

>Also: is this Map launcher used only for Oozie-originated jobs? What
>about jobs from Oozie that come from Java actions? Which Java classes
>in Hadoop (Oozie) implement the Map launcher?


Oozie only controls the jobs submitted directly through it. For example, if Java code submits another set of MR jobs, Oozie will not know about them; it only waits for the Java code to return.


Please let me know if anything is unclear.

Regards,
Mohammad  



