Posted to user@oozie.apache.org by mdk-swandha <di...@gmail.com> on 2016/12/01 20:29:16 UTC

Change NN and JobTracker dynamically during runtime

Hi,

I have the following use case: in an environment with multiple Hadoop
clusters, I would like to send a job/Oozie workflow to a chosen cluster
at runtime. How can this be done?

I see that the JavaActionExecutor class reads the NN and JobTracker in its
createBaseHadoopConf method.

All the Hadoop action executors are derived from JavaActionExecutor, so this
seems to be a place where I can insert my code. How can I add my hook without
disrupting the original flow?

One option is to derive my own JavaActionExecutor, override the
createBaseHadoopConf method, and then derive all ActionExecutors from my new
JavaActionExecutor. That doesn't seem elegant to me, so I thought I would ask
here.

Any input will be useful.

Thanks.
-Dipesh

Re: Change NN and JobTracker dynamically during runtime

Posted by Andras Piros <an...@cloudera.com>.
Hi Dipesh,

I'd go for a load balancer that supports calling another service when
deciding which backend is next. The LB address can then be provided in
job.properties.

Have you seen *Varnish <http://varnish-cache.org/trac/wiki/LoadBalancing>*?
It's a highly configurable LB option that can allegedly *call another script
<https://www.varnish-cache.org/lists/pipermail/varnish-misc/2012-February/021690.html>*.

Andras

--
Andras PIROS
Software Engineer
<http://www.cloudera.com/>

Re: Change NN and JobTracker dynamically during runtime

Posted by mdk-swandha <di...@gmail.com>.
@Andras

I hope my use case above is clear. I would appreciate it if you could shed
some more light on using a load balancer to route jobs while keeping that
load balancer external. Are you suggesting enabling the load balancer in
the RM or NN, or implementing it in my service? Please let me know if
anything in my use case is unclear.

Thanks.

Re: Change NN and JobTracker dynamically during runtime

Posted by mdk-swandha <di...@gmail.com>.
Do you mean I have to set env variables for each job/workflow execution,
which Oozie will then pick up? And should I set them in my service (the
service that finds the best cluster)?

For example, let's say I have 3 clusters:
- When a job is sent via Oozie/Hue/Zeppelin/Livy etc., the caller is mapped
to one cluster and its jobs always go there. Let's call this the default
cluster.
- I have a service which determines the best cluster for a given job,
considering various attributes (availability, data locality, network
bandwidth etc.).
- This service exposes an API: the caller just passes the required
parameters (job/input/output/queue etc.) and the service returns the best
available cluster.

Given the above, I feel that keeping the calling code in the caller
(Oozie/Zeppelin/any application) is the way to go, as it keeps things simple
and isolates the JT's default behavior. It also avoids introducing new
settings that could disrupt existing jobs running on these clusters. Maybe
I'm missing how you advise creating a load balancer setting in the JT and
configuring it at runtime. Can you please tell me more about how this can be
done?

Thanks.
-Dipesh
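
For illustration only, the caller-side flow described in this message could
be sketched as below. Everything here is an assumption made for the example:
the Cluster fields, the 0.7/0.3 weights, the host names, and the helpers
pick_cluster and submission_overrides are hypothetical and are not part of
Oozie or of the service discussed in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical description of one Hadoop cluster."""
    name: str
    name_node: str
    job_tracker: str
    available: bool
    locality: float   # assumed: fraction of input data on this cluster (0..1)
    bandwidth: float  # assumed: relative network bandwidth score (0..1)


def pick_cluster(clusters, default):
    """Return the best available cluster, or the default if none is available."""
    candidates = [c for c in clusters if c.available]
    if not candidates:
        return default
    # Hypothetical scoring: weight data locality above bandwidth.
    return max(candidates, key=lambda c: 0.7 * c.locality + 0.3 * c.bandwidth)


def submission_overrides(cluster):
    """Properties the caller would supply with the workflow submission."""
    return {"nameNode": cluster.name_node, "jobTracker": cluster.job_tracker}


if __name__ == "__main__":
    clusters = [
        Cluster("a", "hdfs://nn-a.example.com:8020", "jt-a.example.com:8032", True, 0.2, 0.9),
        Cluster("b", "hdfs://nn-b.example.com:8020", "jt-b.example.com:8032", True, 0.8, 0.5),
        Cluster("c", "hdfs://nn-c.example.com:8020", "jt-c.example.com:8032", False, 1.0, 1.0),
    ]
    best = pick_cluster(clusters, default=clusters[0])
    print(best.name, submission_overrides(best))
```

The point of the sketch is that the decision stays in the caller: the
clusters themselves need no new settings, and the selected addresses are
simply handed over as ordinary submission properties.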



Re: Change NN and JobTracker dynamically during runtime

Posted by Andras Piros <an...@cloudera.com>.
Hi Dipesh,

During workflow / job submission you can define variables inside
job.properties (coming e.g. from env vars) that are used in workflow.xml. So
much for the flexibility.

Can you tell me a use case where runtime routing to different JT / NN
instances via Oozie (and not e.g. via a load balancer setting configured at
runtime) is better?

Thanks,

Andras
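
As a rough sketch of the job.properties approach mentioned above, the
properties could be generated from environment variables just before
submission. The variable names TARGET_NN and TARGET_JT, the placeholder
addresses, the application path, and the helper render_job_properties are
assumptions for this example; nameNode and jobTracker are the property names
conventionally referenced as ${nameNode} and ${jobTracker} in workflow.xml,
which the thread itself does not spell out.

```python
import os


def render_job_properties(env, app_path="/user/example/apps/demo-wf"):
    """Build job.properties text from environment variables.

    TARGET_NN / TARGET_JT are hypothetical variable names; the fallbacks
    are placeholder addresses, not real hosts.
    """
    name_node = env.get("TARGET_NN", "hdfs://default-nn.example.com:8020")
    job_tracker = env.get("TARGET_JT", "default-jt.example.com:8032")
    lines = [
        f"nameNode={name_node}",
        f"jobTracker={job_tracker}",
        # workflow.xml can then refer to ${nameNode} and ${jobTracker}.
        f"oozie.wf.application.path={name_node}{app_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the generated file content; redirect it to job.properties as needed.
    print(render_job_properties(os.environ), end="")
```

The resulting file would then be passed to the Oozie CLI in the usual way,
for example with oozie job -config job.properties -run.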


Re: Change NN and JobTracker dynamically during runtime

Posted by mdk-swandha <di...@gmail.com>.
Hi Alex,

The idea is to call this external service, which will find the best cluster
and inform the caller. Today this caller is Oozie; tomorrow it will be
Zeppelin or any other application.

How can I provide multiple JT and NN addresses in job.properties? Do you mean
during job/workflow creation? Would I still need to overwrite job.properties
or provide these values dynamically somewhere?

Thanks.
-Dipesh

Re: Change NN and JobTracker dynamically during runtime

Posted by Andras Piros <an...@cloudera.com>.
Hi Dipesh,

It seems like a bad idea to programmatically change the job-tracker or
name-node properties; it's just not Oozie's task to determine the exact JT or
NN instances it should use.

Instead, I'd rather set up one load balancer for the JT and another for the NN,
and provide those addresses in Oozie's job.properties. That way we separate
concerns: the load balancer can choose the JT or NN node at runtime,
e.g. on a round-robin basis.

Regards,

Andras
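
To make the round-robin idea above concrete, here is a minimal sketch of the
selection policy only. A real load balancer sits on the network in front of
the JT or NN endpoints; the RoundRobinBalancer class and the backend
addresses below are hypothetical and serve purely as an illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend addresses in rotation (selection policy only)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        """Return the next backend address in the rotation."""
        return next(self._cycle)


if __name__ == "__main__":
    # Placeholder JT addresses; Oozie would only ever see the LB address.
    lb = RoundRobinBalancer(["jt-a.example.com:8032", "jt-b.example.com:8032"])
    for _ in range(4):
        print(lb.next_backend())
```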

