Posted to user@oozie.apache.org by Tina Samuel <ti...@gmail.com> on 2014/05/19 12:08:28 UTC

Modifying Oozie code

I would like to modify the Oozie code to introduce a new scheduling pattern
in Hadoop. I am new to Oozie. I read that there is a file called
workflow.xml which lists the actions that are to be performed by Hadoop. I
want to introduce a new field for the job, something like a JOB_TYPE. For
example, if a job belongs to TYPE_1, then it should be replicated on all the
worker nodes. If a job belongs to TYPE_2, then it should be replicated on
only a fraction of the nodes. Is it possible to modify the Oozie parser
that parses the workflow.xml? Please do help.

-- 
Tina

Re: Modifying Oozie code

Posted by Tina Samuel <ti...@gmail.com>.
Hi,
Is it possible to import and modify the Oozie code in Eclipse? Can you give
me the steps to do so? I could not find proper documentation for this.
Please do help.

Thanks,
Tina


On Tue, May 20, 2014 at 9:38 AM, Tina Samuel <ti...@gmail.com> wrote:

> Hi,
> I want to build a result verification system for MapReduce based on
> replication, for which I am employing two kinds of tasks: quizzes and
> MapReduce tasks. Basically, if the submitted job is a quiz, I want to
> replicate it to all the worker nodes, whereas a MapReduce task should be
> replicated to only a fraction of the worker nodes. For this, I thought of
> introducing a new field, jobtype, in the workflow.xml. When parsed, it
> would produce a set of new workflow.xml files that replicate the task on
> the worker nodes, with the number of replicas based on the job type. I am
> new to Oozie, so I really have no idea where the code should be modified,
> or even whether the workflow.xml is parsed at all. Could you tell me if
> it's possible to modify the Oozie code to implement this concept?
> Thanks,
> Tina


-- 
Tina

Re: Modifying Oozie code

Posted by Tina Samuel <ti...@gmail.com>.
Hi,
I want to build a result verification system for MapReduce based on
replication, for which I am employing two kinds of tasks: quizzes and
MapReduce tasks. Basically, if the submitted job is a quiz, I want to
replicate it to all the worker nodes, whereas a MapReduce task should be
replicated to only a fraction of the worker nodes. For this, I thought of
introducing a new field, jobtype, in the workflow.xml. When parsed, it
would produce a set of new workflow.xml files that replicate the task on
the worker nodes, with the number of replicas based on the job type. I am
new to Oozie, so I really have no idea where the code should be modified,
or even whether the workflow.xml is parsed at all. Could you tell me if
it's possible to modify the Oozie code to implement this concept?
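
A minimal sketch of the replica-count rule described above, written in
Python purely for illustration. The job type names "quiz" and "mapreduce",
the fraction parameter, and both function names are hypothetical and are not
part of Oozie. The sketch only decides how many copies of a job to submit;
it says nothing about which nodes Hadoop would run them on.

```python
import math

# Hypothetical job types; these names do not exist in Oozie or Hadoop.
QUIZ = "quiz"
MAPREDUCE = "mapreduce"


def replica_count(job_type: str, num_workers: int, fraction: float = 0.25) -> int:
    """Return how many replicas a job of the given type should get."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if job_type == QUIZ:
        # Quizzes are replicated once per worker node.
        return num_workers
    if job_type == MAPREDUCE:
        if not 0 < fraction <= 1:
            raise ValueError("fraction must be in (0, 1]")
        # Ordinary tasks go to a fraction of the workers, at least one.
        return max(1, math.ceil(fraction * num_workers))
    raise ValueError(f"unknown job type: {job_type!r}")


def replica_names(job_name: str, job_type: str, num_workers: int,
                  fraction: float = 0.25) -> list[str]:
    """Return one generated job name per replica."""
    count = replica_count(job_type, num_workers, fraction)
    return [f"{job_name}-replica-{i}" for i in range(count)]
```

Each generated name could then be used for one submitted copy of the
workflow; comparing the results of the copies would be a separate step.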
Thanks,
Tina


On Mon, May 19, 2014 at 10:53 PM, Mona Chitnis <ch...@yahoo-inc.com> wrote:

> Hi Tina,
>
> Oozie is not currently meant to influence resource management. It
> coordinates and tracks workflows, but the decision about the number of M-R
> tasks (that is, the number of nodes executing the workflow in parallel)
> rests with Hadoop. Of course, we can pass MapReduce configuration
> parameters through Oozie, such as split size and the desired number of map
> and reduce tasks, to influence resource management, but in principle that
> is done on a best-effort basis.
>
> Can you provide us with more details of your use case? It sounds
> interesting, but I am not sure if Oozie would be the place for this kind
> of logic.
>
> --
> Mona


-- 
Tina

Re: Modifying Oozie code

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Tina,

Oozie is not currently meant to influence resource management. It
coordinates and tracks workflows, but the decision about the number of M-R
tasks (that is, the number of nodes executing the workflow in parallel)
rests with Hadoop. Of course, we can pass MapReduce configuration
parameters through Oozie, such as split size and the desired number of map
and reduce tasks, to influence resource management, but in principle that
is done on a best-effort basis.
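
As a rough illustration of passing MapReduce configuration parameters
through Oozie, the Python sketch below generates a workflow.xml containing
one map-reduce action with a configuration block. The function name, the
workflow and node names, the schema version 0.4, and the property keys
mapred.reduce.tasks and mapred.max.split.size are assumptions that depend on
the Oozie and Hadoop versions in use. A real action would also need mapper,
reducer, and input/output settings.

```python
import xml.etree.ElementTree as ET

# Assumed workflow schema version; adjust to the Oozie release in use.
WORKFLOW_NS = "uri:oozie:workflow:0.4"


def build_workflow(name: str, mr_properties: dict) -> str:
    """Return workflow.xml text with one map-reduce action.

    mr_properties maps Hadoop configuration keys to values; Hadoop treats
    several of them (such as the number of map tasks) only as hints.
    """
    app = ET.Element("workflow-app", {"xmlns": WORKFLOW_NS, "name": name})
    ET.SubElement(app, "start", {"to": "mr-node"})

    action = ET.SubElement(app, "action", {"name": "mr-node"})
    mr = ET.SubElement(action, "map-reduce")
    # These ${...} values are resolved by Oozie from job.properties.
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"

    # One <property> element per configuration key.
    conf = ET.SubElement(mr, "configuration")
    for key, value in mr_properties.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = str(value)

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Map-reduce action failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


if __name__ == "__main__":
    print(build_workflow("demo-wf", {
        "mapred.reduce.tasks": 4,               # desired number of reducers
        "mapred.max.split.size": 134217728,     # split size in bytes
    }))
```

The values only express a preference; how many tasks actually run, and on
which nodes, is still decided by Hadoop.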

Can you provide us with more details of your use case? It sounds
interesting, but I am not sure if Oozie would be the place for this kind
of logic.

--
Mona
