Posted to user@oozie.apache.org by Chris White <ch...@gmail.com> on 2013/07/29 19:26:34 UTC

Overriding Hadoop Configuration Defaults for Oozie jobs only

Is there a simple way to override values in the Hadoop mapred-site.xml for
all jobs run via the oozie server (rather than on a per workflow basis)?

By default my cluster sets the child JVM opts (mapred.child.java.opts) with a
specific maximum heap size (-Xmx2G), and I want to reduce this, but only for
jobs run via Oozie.

Is there some way to start Oozie with an alternative HADOOP_CONF directory?
I've not been able to find an obvious place in my instance where the conf
directory is configured (so that I could override it).

(oozie-2.3.0, CDH3u1 if that makes any difference)

Thanks

Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
I believe config-default.xml can also handle properties like

<name>mapred.child.java.opts</name>
<value>-Xmx2G</value>

and apply them across all Oozie workflows, though this is a theoretical
claim; I haven't tried it myself.
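
For reference, a config-default.xml might look roughly like the sketch below. As far as I know the file sits next to workflow.xml in a workflow application directory, so it would need to be present in each application. The -Xmx1G value is only an assumed example (something lower than the cluster's -Xmx2G), and whether a property defined this way reaches the map-reduce tasks without being referenced in the action configuration is untested here.

```xml
<!-- config-default.xml: sketch only; placed next to workflow.xml in the workflow application directory -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- -Xmx1G is an assumed example value, lower than the cluster default of -Xmx2G -->
    <value>-Xmx1G</value>
  </property>
</configuration>
```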

Glad to know you were able to backport the action conf part and get it working.

Regards,
Mona

On 7/30/13 9:06 AM, "Chris White" <ch...@gmail.com> wrote:

>I'm assuming that config-default.xml will work for variable expansion, but
>won't help me if i don't already have the following in each and every
>map-reduce action config for every workflow:
>
>    <property>
>      <name>mapred.map.child.java.opts</name>
>      <value>-${defaultMapChildJavaOpts}</value>
>    </property>
>
>To reiterate, I don't have the above block in any of my current workflows;
>I want a way to define a default MR configuration entry for any mapreduce
>action.
>
>As a side note i've managed to back port the default action configuration
>code from the HadoopAccessorService and JavaActionExecutor (from oozie
>3.2.0) to suit my needs.
>
>Thanks

Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Chris White <ch...@gmail.com>.
I'm assuming that config-default.xml will work for variable expansion, but
won't help me if I don't already have the following in each and every
map-reduce action config for every workflow:

    <property>
      <name>mapred.map.child.java.opts</name>
      <value>-${defaultMapChildJavaOpts}</value>
    </property>

To reiterate, I don't have the above block in any of my current workflows; I
want a way to define a default MR configuration entry for any mapreduce
action.

As a side note, I've managed to backport the default action configuration
code from the HadoopAccessorService and JavaActionExecutor (from Oozie
3.2.0) to suit my needs.
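
For anyone on a release that already ships this feature, the server-side default file looks roughly like the sketch below. This is based on my understanding of the later Oozie documentation rather than on anything confirmed in this thread: the directory name action-conf, the file name default.xml and the oozie-site.xml property mentioned in the comment should be checked against the docs for your release, and the -Xmx1G value is just an assumed example.

```xml
<!-- conf/action-conf/default.xml: sketch of defaults applied by the Oozie server to every action.
     Assumes a release with default action configuration support (3.2.0 per this thread)
     and the oozie-site.xml mapping
     oozie.service.HadoopAccessorService.action.configurations = *=action-conf -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- -Xmx1G is an assumed example value -->
    <value>-Xmx1G</value>
  </property>
</configuration>
```

Since the file lives in the Oozie server's conf directory, jobs submitted outside Oozie should keep the cluster default.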

Thanks


On Mon, Jul 29, 2013 at 2:33 PM, Mona Chitnis <ch...@yahoo-inc.com> wrote:

> The 'global' file you are referring to would be a 'config-default.xml'. In
> terms of precedence, the order from least to most is
>
> config-default.xml < job.properties < workflow.xml config property
>
> Can you give that a try?

Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
The 'global' file you are referring to would be 'config-default.xml'. In
terms of precedence, the order from lowest to highest is

config-default.xml < job.properties < workflow.xml config property

Can you give that a try?
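
To make the ordering concrete, here is a rough sketch of an action configuration fragment that takes its value from a workflow parameter. The parameter name childJavaOpts is made up for illustration. A default for it could be defined in config-default.xml, a value supplied in job.properties at submission (e.g. childJavaOpts=-Xmx1G) would take precedence over that default, and a literal value written directly into workflow.xml would not be affected by either.

```xml
<!-- workflow.xml action fragment: sketch only; childJavaOpts is a hypothetical parameter name -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- resolved from job.properties if set there, otherwise from config-default.xml -->
    <value>${childJavaOpts}</value>
  </property>
</configuration>
```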


On 7/29/13 11:25 AM, "Chris White" <ch...@gmail.com> wrote:

>It's certainly something that's present in 3.2.0, but I'm not seeing the
>relevant code in HadoopAccessorService for 2.3.0 that would pick this up.
>
>As for using workflow.xml, I was trying to avoid this as I have a good
>number of coordinators, workflows and ultimately MR actions that I'd need
>to update (unless I'm misunderstanding you: is there a 'global'
>workflow.xml file I can set some default values in? I don't think there is,
>as this wouldn't make sense).
>
>Thanks
>
>Chris


Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Chris White <ch...@gmail.com>.
It's certainly something that's present in 3.2.0, but I'm not seeing the
relevant code in HadoopAccessorService for 2.3.0 that would pick this up.

As for using workflow.xml, I was trying to avoid this as I have a good
number of coordinators, workflows and ultimately MR actions that I'd need
to update (unless I'm misunderstanding you: is there a 'global'
workflow.xml file I can set some default values in? I don't think there is,
as this wouldn't make sense).

Thanks

Chris


On Mon, Jul 29, 2013 at 2:12 PM, Mona Chitnis <ch...@yahoo-inc.com> wrote:

> Not entirely sure about 2.3.0 having that change. Tucu, Robert do you know
> what that change to introduce 'hadoop-conf' dir was called so it can be
> mapped to one of the release-log entries?
>
> In case, this approach won't apply to your version, you can always pass on
> configuration properties through your Oozie workflow.xml for mapred child
> JVM opts
>
> E.g.
> <action>
> ...
> <configuration>
> <property>
> <name>mapred.child.java.opts</name>
> <value>-Xmx2G</value>
> </property>
> ...
> </configuration>
> ...
> </action>
>
> NOTE: If you wish to increase the JVM size for Pig launcher job too,
> prepend "oozie.launcher." to the above property name.

Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Not entirely sure about 2.3.0 having that change. Tucu, Robert, do you know
what the change that introduced the 'hadoop-conf' dir was called, so it can be
mapped to one of the release-log entries?

In case this approach doesn't apply to your version, you can always pass
configuration properties for the mapred child JVM opts through your Oozie
workflow.xml.

E.g.
<action>
...
<configuration>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2G</value>
</property>
...
</configuration>
...
</action>

NOTE: If you wish to increase the JVM size for the Pig launcher job too,
prepend "oozie.launcher." to the above property name.
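
Applied to the property above, the prefixed form would look like the sketch below. It is shown as a bare property for the configuration element of a pig action, and the -Xmx2G value is simply carried over from the example above.

```xml
<!-- Fragment for the <configuration> element of a pig action: sketch only -->
<property>
  <!-- the "oozie.launcher." prefix targets the launcher job rather than the jobs it submits -->
  <name>oozie.launcher.mapred.child.java.opts</name>
  <value>-Xmx2G</value>
</property>
```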

On 7/29/13 10:43 AM, "Chris White" <ch...@gmail.com> wrote:

>Mona,
>
>Is this true for 2.3.0 with a war distro? I can't find any reference to
>"hadoop-conf" while grepping the src folder
>
>Chris


Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Chris White <ch...@gmail.com>.
Mona,

Is this true for 2.3.0 with a war distro? I can't find any reference to
"hadoop-conf" while grepping the src folder.

Chris


On Mon, Jul 29, 2013 at 1:36 PM, Mona Chitnis <ch...@yahoo-inc.com> wrote:

> Hi Chris,
>
> Oozie distro dir is currently built in a way that it has a 'hadoop-conf'
> dir underneath 'conf', which sits alongside other Oozie config files e.g.
> oozie-site.xml and so on. You can put your custom hadoop *-site.xml files
> inside this 'hadoop-conf'.
>
> Regards,
> Mona

Re: Overriding Hadoop Configuration Defaults for Oozie jobs only

Posted by Mona Chitnis <ch...@yahoo-inc.com>.
Hi Chris,

The Oozie distro dir is currently built so that it has a 'hadoop-conf'
dir underneath 'conf', which sits alongside other Oozie config files, e.g.
oozie-site.xml and so on. You can put your custom Hadoop *-site.xml files
inside this 'hadoop-conf'.
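
As a sketch of where that directory is wired up: in releases that support it, my understanding is that oozie-site.xml carries a mapping like the one below. The property name and value are taken from the later Oozie documentation as I remember it, not from this thread, so please verify them against your release. Older versions such as 2.3.0 may not have the setting at all.

```xml
<!-- oozie-site.xml fragment: sketch for releases that support a hadoop-conf directory -->
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <!-- "*" matches any cluster; a relative path is looked up under the Oozie conf directory -->
  <value>*=hadoop-conf</value>
</property>
```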

Regards,
Mona

On 7/29/13 10:26 AM, "Chris White" <ch...@gmail.com> wrote:

>Is there a simple way to override values in the Hadoop mapred-site.xml for
>all jobs run via the oozie server (rather than on a per workflow basis)?
>
>By default my cluster has JVM child.opts with a specific maximum JVM size
>(-Xmx2G) and i want to reduce this, but only for jobs run via oozie.
>
>Is there some way to start oozie with an alternative HADOOP_CONF directory?
>- i've not been able to find an obvious place in my instance where the conf
>directory is configured (to override)
>
>(oozie-2.3.0, CDH3u1 if that makes any difference)
>
>Thanks