Posted to user@oozie.apache.org by Inder Pall <in...@gmail.com> on 2011/12/02 15:08:14 UTC

Clarifications on oozie usage

People,

I need some clarifications, as I am planning to use Oozie.

1. If I set up a coordinator to execute a workflow every minute, will Oozie
ensure that, if the previous instance took more than one minute, the new
one waits in a queue? If multiple instances overlap because of some
infrequent timeout issue, the resulting race condition will cause
havoc.
2. What if I don't give an end time? Will it run indefinitely?
3. Do I need to do anything special to get my log4j connected to Oozie, or
will the logs show up in the job log section of the Oozie console?
4. What is the recommended mechanism for getting alerted if a job didn't get
triggered or Oozie went down?
5. Right now I have installed Oozie on one box, which is shared with a data
node. How can I run two servers, and will the failover be hot?

Thanks,
- Inder
Tech Platforms @Inmobi.
Linkedin - http://goo.gl/eR4Ub

Re: Clarifications on oozie usage

Posted by Inder Pall <in...@gmail.com>.
Alejandro,

# It is not clear how you will determine which files are closed and which
ones are not.
>> The producer program maintains a symlink to the file currently being
written; everything else is closed.

I agree that we don't have a complex workflow use case that justifies using
Oozie to trigger the job every minute. I will also evaluate Java's thread
executor service. But we'd still need Oozie to trigger the consumer workflow
once the DONE file is created.
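
A minimal sketch of the thread-executor option, assuming the mover runs inside a single JVM. With `scheduleWithFixedDelay` the delay is counted from the end of one run to the start of the next, so a slow run postpones the following one instead of overlapping with it. The class name `NonOverlappingScheduler` and the task passed to it are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a task repeatedly on one thread so that two runs
// can never execute at the same time.
final class NonOverlappingScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Starts the task immediately, then waits `delay` after each run finishes
    // before starting the next one.
    ScheduledFuture<?> start(Runnable task, long delay, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report it
                // and keep the schedule alive.
                System.err.println("run failed: " + e);
            }
        };
        return executor.scheduleWithFixedDelay(guarded, 0, delay, unit);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

For a one-minute cadence this would be called as `start(task, 1, TimeUnit.MINUTES)`. It does not replace the second coordinator: something still has to notice the DONE file and start the consumer.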

Thanks,
- inder

On Mon, Dec 5, 2011 at 11:11 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> Inder,
>
> #1, Again, every minute may overload Oozie. You'll have to do some load
> testing to ensure things work as expected.
>
> #2, It is not clear how you will determine which files are closed and which
> ones are not.
>
> Regards.
>
> Alejandro

Re: Clarifications on oozie usage

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Inder,

#1, Again, every minute may overload Oozie. You'll have to do some load
testing to ensure things work as expected.

#2, It is not clear how you will determine which files are closed and which
ones are not.

Regards.

Alejandro

On Sat, Dec 3, 2011 at 1:38 AM, Inder Pall <in...@gmail.com> wrote:

> Alejandro,
>
> Thanks. Let me elaborate on my use case:
>
> 1. We'd receive big data in HDFS in a raw directory structure.
> 2. Every minute we want to check for currently available files, move them
> within HDFS into a more structured directory layout, and write a DONE file
> in the timestamped directory to mark the transaction commit for consumers.
> This is where I was looking to use the cron-like coordinator.
> 3. Then we'll have another coordinator that triggers on creation of the
> DONE file so that the consumer can start on it.
>
> So basically we won't have an M/R job running in Oozie; I was planning
> a multi-threaded Java action that works with the HDFS Java APIs.
> Is this a suitable use case for Oozie?
> What about a new workflow instance interfering with an old instance that
> is still running? Can this happen?
>
> - Inder

Re: Clarifications on oozie usage

Posted by Inder Pall <in...@gmail.com>.
Alejandro,

Thanks. Let me elaborate on my use case:

1. We'd receive big data in HDFS in a raw directory structure.
2. Every minute we want to check for currently available files, move them
within HDFS into a more structured directory layout, and write a DONE file
in the timestamped directory to mark the transaction commit for consumers.
This is where I was looking to use the cron-like coordinator.
3. Then we'll have another coordinator that triggers on creation of the
DONE file so that the consumer can start on it.

So basically we won't have an M/R job running in Oozie; I was planning
a multi-threaded Java action that works with the HDFS Java APIs.
Is this a suitable use case for Oozie?
What about a new workflow instance interfering with an old instance that
is still running? Can this happen?
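
The move-then-commit step in point 2 could be sketched roughly as below. This is a local-filesystem stand-in that uses `java.nio.file` rather than the HDFS Java API, and it assumes the producer keeps a symlink to the file still being written, as mentioned elsewhere in this thread. The class name `DoneMarkerMover` and the link name `current` are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch on the local filesystem; on HDFS the same steps would go
// through the Hadoop FileSystem API instead.
final class DoneMarkerMover {
    // Moves every closed file from rawDir into targetDir and writes DONE last,
    // so a consumer that waits for DONE never sees a half-filled directory.
    static int moveClosedFiles(Path rawDir, Path targetDir) throws IOException {
        // Assumption: "current" is a symlink to the file still being written.
        Path link = rawDir.resolve("current");
        Path open = Files.isSymbolicLink(link)
                ? rawDir.resolve(Files.readSymbolicLink(link)).normalize()
                : null;

        // Collect first, move afterwards, so the listing is not changed mid-scan.
        List<Path> closed = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(rawDir)) {
            for (Path entry : entries) {
                boolean plainFile = !Files.isSymbolicLink(entry) && Files.isRegularFile(entry);
                if (plainFile && !entry.normalize().equals(open)) {
                    closed.add(entry);
                }
            }
        }

        Files.createDirectories(targetDir);
        for (Path file : closed) {
            Files.move(file, targetDir.resolve(file.getFileName()));
        }
        // createFile fails if DONE already exists, which also stops a second run
        // from committing into a directory that was already published.
        Files.createFile(targetDir.resolve("DONE"));
        return closed.size();
    }
}
```

Writing the marker last is what makes it usable as a commit signal; the sketch does nothing to stop two movers from scanning the same raw directory at once, so that still has to be ruled out by whatever schedules it.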

- Inder




On Fri, Dec 2, 2011 at 11:12 PM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> Hi Inder,
>
> #1, Oozie's minimum coordinator job frequency is 5 minutes. You could tweak
> the conf to allow 1 minute, but it may overload things depending on your
> load. Hadoop is a batch processing system, and Oozie was designed with that
> in mind. A 1-minute frequency looks more like a near-real-time requirement.
>
> #2, Currently you have to specify an end time; it can be 100 years in the
> future.
>
> #3, You can get Oozie job logs via the Oozie web console without any log4j
> config.
>
> #4, You have to implement your own monitoring.
>
> #5, Currently Oozie does not support HOT-HOT. The design accounts for it,
> and it has been prototyped, but it has not been implemented.
>
> Thanks.
>
> Alejandro

Re: Clarifications on oozie usage

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Hi Inder,

#1, Oozie's minimum coordinator job frequency is 5 minutes. You could tweak
the conf to allow 1 minute, but it may overload things depending on your
load. Hadoop is a batch processing system, and Oozie was designed with that
in mind. A 1-minute frequency looks more like a near-real-time requirement.

#2, Currently you have to specify an end time; it can be 100 years in the
future.

#3, You can get Oozie job logs via the Oozie web console without any log4j
config.

#4, You have to implement your own monitoring.

#5, Currently Oozie does not support HOT-HOT. The design accounts for it,
and it has been prototyped, but it has not been implemented.
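
For #4, one possible shape for a home-grown check is sketched below: flag a coordinator as overdue when its latest run is older than its frequency plus a grace period. The class name `TriggerWatchdog` and the grace period are assumptions, and how the timestamp of the last run is obtained (Oozie job info, a marker file, a database row) is deliberately left open.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical watchdog check, not part of Oozie.
final class TriggerWatchdog {
    // Returns true when `now` is later than lastRun + frequency + grace,
    // i.e. the next run should already have happened but has not been seen.
    static boolean isOverdue(Instant lastRun, Duration frequency, Duration grace, Instant now) {
        Instant deadline = lastRun.plus(frequency).plus(grace);
        return now.isAfter(deadline);
    }
}
```

Run from outside the Oozie box (cron on another host, for example), a check like this covers both cases in the question, since a server that is down also stops producing new runs.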

Thanks.

Alejandro
