Posted to dev@oozie.apache.org by Tina Samuel <ti...@gmail.com> on 2014/05/21 05:49:15 UTC

Modifying Oozie code

I would like to modify the Oozie code to introduce a new scheduling pattern
in Hadoop. I am new to Oozie. I read that there is a file called
workflow.xml which lists the actions that are to be performed by Hadoop. I
want to introduce a new field for the job, something like a JOB_TYPE. For
example, if a job belongs to TYPE_1, it should be replicated on all the
worker nodes. If a job belongs to TYPE_2, it should be replicated on
only a fraction of the nodes. Is it possible to modify the Oozie parser
that parses workflow.xml? Please help.

-- 
Tina

Re: Modifying Oozie code

Posted by Tina Samuel <ti...@gmail.com>.
Hi,
I want to develop a result verification system in which jobs are
replicated on multiple nodes. Once the jobs complete, I want a
callback-style mechanism that returns a hash value computed over each
result, so I can verify that the hash values of the replicated jobs are the
same. I thought Oozie was responsible for scheduling this. So, to meet this
requirement, what should I actually be doing? Should I be changing the
scheduler? Can you give me some guidance on where to modify the code?

Thanks,
Tina


On Fri, May 23, 2014 at 12:49 PM, Harsh J <ha...@cloudera.com> wrote:

> Are you looking to pass information onto Hadoop by detecting a specific
> type configuration, or are you looking to control the job's execution?
>
> I should also mention that Oozie is not a Hadoop job scheduler: it is a
> workflow scheduler and works at a higher level, above Hadoop. Once an
> Oozie-submitted launcher or MR job hits Hadoop, the real scheduling of the
> tasks that the job needs is handled by Hadoop's scheduler (and not by Oozie).
>
> In other words, Oozie has no notion of a "cluster" and its "nodes". It submits
> packaged and configured jobs to Hadoop, and lets Hadoop's scheduler
> handle its execution, distribution, etc.
>
> If you are looking to control actual execution of a Hadoop job, then Oozie
> isn't the right place to do it.
>
> --
> Harsh J
>



-- 
Tina
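
[Editorial sketch] The verification step described in this message (hash each replica's result, then check that the hashes agree) could look roughly like the following. This is a minimal sketch under stated assumptions, not something Oozie or Hadoop provides: the function names, the choice of SHA-256, and the idea of treating the most common digest as the reference are all illustrative choices that the thread does not specify.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def result_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one replica's output."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(outputs: Dict[str, bytes]) -> Tuple[bool, List[str]]:
    """Compare digests across replicas (hypothetical helper).

    Returns (all_agree, dissenters), where dissenters are the replica ids
    whose digest differs from the most common digest. Using the most common
    digest as the reference is an assumption made for this sketch.
    """
    if not outputs:
        raise ValueError("no replica outputs to verify")
    digests = {rid: result_digest(data) for rid, data in outputs.items()}
    reference, _count = Counter(digests.values()).most_common(1)[0]
    dissenters = sorted(rid for rid, d in digests.items() if d != reference)
    return (not dissenters, dissenters)


if __name__ == "__main__":
    # Made-up outputs from three replicas of the same job.
    ok, bad = verify_replicas({"r1": b"abc", "r2": b"abc", "r3": b"abd"})
    print(ok, bad)
```

In practice each replica would report only its digest (for example through whatever callback mechanism is chosen), so the comparison step never needs the full results.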

Re: Modifying Oozie code

Posted by Harsh J <ha...@cloudera.com>.
Are you looking to pass information on to Hadoop by detecting a specific
type configuration, or are you looking to control the job's execution?

I should also mention that Oozie is not a Hadoop job scheduler: it is a
workflow scheduler and works at a higher level, above Hadoop. Once an
Oozie-submitted launcher or MR job hits Hadoop, the real scheduling of the
tasks that the job needs is handled by Hadoop's scheduler (and not by Oozie).

In other words, Oozie has no notion of a "cluster" and its "nodes". It submits
packaged and configured jobs to Hadoop, and lets Hadoop's scheduler
handle its execution, distribution, etc.

If you are looking to control actual execution of a Hadoop job, then Oozie
isn't the right place to do it.





-- 
Harsh J
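
[Editorial sketch] On the first option above, passing information along with the job: as far as I know, an action's <configuration> block in workflow.xml can already carry name/value properties that end up in the Hadoop job configuration, so a job-type marker may not need a parser change just to be carried along. Acting on it would still be up to Hadoop-side code. The sketch below only reads such a property back out of a workflow definition. The property name `job.type`, its value, the workflow and action names, and the helper `action_properties` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow definition. `job.type` is invented for illustration
# and has no built-in meaning to Oozie or Hadoop.
WORKFLOW_XML = """\
<workflow-app xmlns="uri:oozie:workflow:0.4" name="replicated-wf">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>job.type</name>
          <value>TYPE_1</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Map-reduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

# Namespace of the workflow schema version used in the sample above.
NS = {"wf": "uri:oozie:workflow:0.4"}


def action_properties(xml_text: str, action_name: str) -> dict:
    """Collect the configuration properties of one named action."""
    root = ET.fromstring(xml_text)
    props = {}
    for action in root.findall("wf:action", NS):
        if action.get("name") != action_name:
            continue
        for prop in action.findall(".//wf:property", NS):
            name = prop.findtext("wf:name", namespaces=NS)
            value = prop.findtext("wf:value", namespaces=NS)
            props[name] = value
    return props


if __name__ == "__main__":
    print(action_properties(WORKFLOW_XML, "mr-node"))
```

This keeps the marker as ordinary configuration, which fits the point that Oozie only submits configured jobs and leaves execution decisions to Hadoop.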

Re: Modifying Oozie code

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Tina,

https://cwiki.apache.org/confluence/display/OOZIE/How+To+Contribute

You will need the Maven integration plugin (m2e) in Eclipse to import the
Oozie Maven project:
http://www.eclipse.org/m2e/download/

The latest plugin version is known to cause some problems during Maven
compilation. If they persist, try an older version, 0.12 or so.

--
Mona


On 5/20/14, 9:23 PM, "Tina Samuel" <ti...@gmail.com> wrote:

>Yes. 'Fraction' is based on the number of live nodes. What did you mean
>by the Queue approach? Can you clarify?
>Thanks,
>Tina
>-- 
>Tina


Re: Modifying Oozie code

Posted by Tina Samuel <ti...@gmail.com>.
Yes. 'Fraction' is based on the number of live nodes. What did you mean
by the Queue approach? Can you clarify?
Thanks,
Tina


On Wed, May 21, 2014 at 9:46 AM, Jagat Singh <ja...@gmail.com> wrote:

> What do you mean by replicated on a fraction of nodes? Is it related to
> the capacity of the cluster in any way? Have you looked at the Queue approach?



-- 
Tina

Re: Modifying Oozie code

Posted by Jagat Singh <ja...@gmail.com>.
What do you mean by replicated on a fraction of nodes? Is it related to
the capacity of the cluster in any way? Have you looked at the Queue approach?

