
Posted to user@oozie.apache.org by Evan Pollan <Ev...@bazaarvoice.com> on 2012/01/10 15:57:48 UTC

Oozie and whirr interoperability

All,

I'm new to oozie, but have been using whirr to process data on ephemeral EC2 hadoop clusters.  I'd like to be able to leverage the scheduling and overall job management capabilities of oozie, but I'd like to use whirr to spin up a cluster as needed (and, correspondingly, destroy the cluster when not needed).

My question has to do with the amount of hadoop configuration loaded by the oozie server at startup time, and how an oozie server would handle having a cluster destroyed and subsequently re-created by whirr while the oozie server was up and running.

Architecturally, I'm envisioning the oozie server running on a machine on which whirr is also installed — ideally, oozie would leverage the alternatives-driven cluster configuration that whirr is designed to drive.

Is this a reasonable approach, or is the oozie server too stateful WRT hadoop configuration to have a cluster come and go between workflows?


Thanks,
Evan

Re: Oozie and whirr interoperability

Posted by Alejandro Abdelnur <tu...@cloudera.com>.

Evan,

Oozie does not keep any Hadoop cluster configuration in its own config. Each
workflow job points to the Hadoop JobTracker (JT) and NameNode (NN) it wants
to use, so from that perspective you are fine.
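Because the JT/NN addresses travel with each job rather than with the Oozie server, a freshly created Whirr cluster only needs a new set of job properties. The Python sketch below is illustrative only. It assumes that Whirr has written a client-side `hadoop-site.xml` (by default under `~/.whirr/<cluster-name>/`; check your Whirr version) and that the cluster uses the Hadoop 1.x keys `fs.default.name` and `mapred.job.tracker`. It also assumes that `workflow.xml` refers to `${nameNode}` and `${jobTracker}`, as the Oozie examples do. The function names, host names and application path are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_site(xml_text):
    """Return {name: value} for each <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def job_properties(site, app_path):
    """Build Oozie job.properties text pointing a workflow at the cluster in `site`.

    Assumption: Hadoop 1.x keys (fs.default.name, mapred.job.tracker) and the
    nameNode/jobTracker parameter names used in the Oozie examples.
    """
    lines = [
        "nameNode=" + site["fs.default.name"],
        "jobTracker=" + site["mapred.job.tracker"],
        "oozie.wf.application.path=" + app_path,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hadoop-site.xml content for a Whirr-launched cluster.
    sample = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://ec2-host:8020/</value></property>
  <property><name>mapred.job.tracker</name><value>ec2-host:8021</value></property>
</configuration>"""
    print(job_properties(read_hadoop_site(sample), "hdfs://ec2-host:8020/user/evan/my-wf"), end="")
```

You could regenerate this file after each `whirr launch-cluster` and pass it to `oozie job -config job.properties -run`. The Oozie server's own configuration never changes.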

Regarding how Oozie would behave if the Hadoop cluster comes and goes:
Oozie itself will be fine. The issue is with workflow jobs that are running
when the cluster goes away; they may fail if it disappears in the middle.
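The risk lies with jobs that are in flight when the cluster is destroyed. One option is to ask Oozie for running workflows before calling `whirr destroy-cluster`. The Python sketch below assumes Oozie's v1 web-services `jobs` endpoint with `jobtype=wf` and `filter=status=RUNNING`. It also assumes a JSON response whose `workflows` entries carry `id` and `status` fields. Verify both against the Web Services API docs for your Oozie version. The sketch only looks at RUNNING workflows; SUSPENDED ones may need handling too. The function names and host are hypothetical.

```python
import json
from urllib.parse import urlencode


def running_jobs_url(oozie_base):
    """URL that lists RUNNING workflow jobs (assumed Oozie v1 web-services API)."""
    query = urlencode({"jobtype": "wf", "filter": "status=RUNNING"})
    return oozie_base.rstrip("/") + "/v1/jobs?" + query


def running_workflow_ids(response_text):
    """Return ids of workflows whose status is RUNNING in a jobs JSON response."""
    data = json.loads(response_text)
    # Re-check status client-side in case the server ignores the filter.
    return [wf["id"] for wf in data.get("workflows", []) if wf.get("status") == "RUNNING"]


def safe_to_destroy(response_text):
    """True when the response lists no RUNNING workflows."""
    return not running_workflow_ids(response_text)


if __name__ == "__main__":
    # In practice the body might come from
    # urllib.request.urlopen(running_jobs_url("http://oozie-host:11000/oozie")).read().decode()
    sample = '{"total": 1, "workflows": [{"id": "0000001-120110-oozie-W", "status": "RUNNING"}]}'
    print(running_jobs_url("http://oozie-host:11000/oozie"))
    print(safe_to_destroy(sample))
```

The status check is kept as a pure function over the response text so it can be tested without a live Oozie server.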

Hope this answers your question.

Thanks.

Alejandro

> *From:* Evan Pollan <Ev...@bazaarvoice.com>
> *To:* "oozie-users@incubator.apache.org" <oo...@incubator.apache.org>
> *Sent:* Tuesday, January 10, 2012 6:57 AM
> *Subject:* Oozie and whirr interoperability