Posted to user@oozie.apache.org by Morgrim Murdargent <mo...@gmail.com> on 2016/04/26 17:22:27 UTC

How to use tez in a pig action in oozie workflow ?

Hello!

I'm trying to use Tez in a Pig action of an Oozie workflow.

The workflow itself is pretty simple:
###
<workflow-app xmlns="uri:oozie:workflow:0.5" name="TEST_PIG_ACTION">
    <start to="init-pig"/>

    <action name="init-pig">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="/tmp/test/output.txt"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>pig.pig</script>
            <file>pig.pig#pig.pig</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end"/>
</workflow-app>
###

The properties:
###
# ------------------------------------------------------------------------------
# Environment
# ------------------------------------------------------------------------------
nameNode=hdfs://<NN FQDN>:8020
jobTracker=<RM FQDN>:8050
kerberosRealm=<REALM>
queueName=<QUEUE>

# ------------------------------------------------------------------------------
# Application
# ------------------------------------------------------------------------------
appRoot=${nameNode}/tmp/test
oozie.wf.application.path=${appRoot}/pig.xml

# ------------------------------------------------------------------------------
# Oozie
# ------------------------------------------------------------------------------
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
###

The Pig script:
###
data = LOAD '/tmp/test/data.txt' USING PigStorage(',') AS (user, age, salary);
filtered_data = FILTER data BY age > 11;
ordered_data = ORDER filtered_data BY salary;
final_data = FOREACH ordered_data GENERATE (user, age, salary);
STORE final_data INTO '/tmp/test/output.txt' USING PigStorage();
###


This workflow works when launched with MapReduce.
But as soon as I try to execute it with Tez, it fails.
I tried adding the following properties to the configuration to make it run
on Tez, but it fails:
###
                    <property>
                        <name>exectype</name>
                        <value>tez</value>
                    </property>
                    <property>
                        <name>tez.queue.name</name>
                        <value>${queueName}</value>
                    </property>
###

When I added these properties, it failed because it does not take the
property tez.queue.name into account.

So my first question is: how can I specify that I want to use Tez instead
of MR as the execution engine in a Pig action of an Oozie workflow?

And then, how can I specify the queue I want to use?

BR.

Morgrim.

Re: How to use tez in a pig action in oozie workflow ?

Posted by Morgrim Murdargent <mo...@gmail.com>.
Hello Satish.

Thank you for your answer.

I added the argument, but I also had to make two other modifications.

Here they are:

1. First, in the configuration tag of my XML file, I added two properties:

<property>
<name>tez.lib.uris</name>
<value>/hdp/apps/2.3.2.0-2950/tez/tez.tar.gz</value>
</property>
<property>
<name>mapreduce.job.queuename</name>
<value>${queueName}</value>
</property>

But it was not enough. I was still getting errors about classes not being
found.

2. Second, in the pig folder of the Oozie sharelib

I added the Tez libs to the pig folder of the Oozie sharelib, and then
updated the sharelib.
I'm using HDP as my distribution, so my Tez libs are in /usr/hdp/<hdp.version>.
###
# From any node of the cluster
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs
hdfs dfs -put /usr/hdp/2.3.2.0-2950/tez/lib/* /user/oozie/share/lib/lib_*/pig/
hdfs dfs -put /usr/hdp/2.3.2.0-2950/tez/*.jar /user/oozie/share/lib/lib_*/pig/
hdfs dfs -chown -R oozie /user/oozie/share/lib/lib_<timestamp>/pig
hdfs dfs -chmod -R 755 /user/oozie/share/lib/lib_<timestamp>/pig

# On the host with oozie-server
kinit -kt /etc/security/keytabs/oozie.service.keytab oozie/`hostname -f`@<REALM>
oozie admin -oozie http://<fqdn oozie-server>:11000/oozie -sharelibupdate
###

And then I was able to run my workflow successfully.
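
For reference, here is a sketch of what the pig action might look like once
the pieces from this thread are put together: the -x tez arguments suggested
by Satish, plus the tez.lib.uris and mapreduce.job.queuename properties
described above. The placement of the argument tags (after script, before
file) is an assumption based on the element order of the workflow schema.
The tez.tar.gz path is the one from this HDP 2.3.2.0-2950 cluster and will
differ elsewhere. This is an illustration only, not a tested configuration.
###
<action name="init-pig">
    <pig>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="/tmp/test/output.txt"/>
        </prepare>
        <configuration>
            <!-- Location of the Tez archive on HDFS (cluster specific) -->
            <property>
                <name>tez.lib.uris</name>
                <value>/hdp/apps/2.3.2.0-2950/tez/tez.tar.gz</value>
            </property>
            <!-- Queue selection that worked here, instead of tez.queue.name -->
            <property>
                <name>mapreduce.job.queuename</name>
                <value>${queueName}</value>
            </property>
        </configuration>
        <script>pig.pig</script>
        <!-- Assumed placement: argument tags follow script and precede file -->
        <argument>-x</argument>
        <argument>tez</argument>
        <file>pig.pig#pig.pig</file>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
</action>
###
The sharelib changes described in step 2 were still needed on this cluster
for the action to find the Tez classes.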

I'm really wondering why I have to use mapreduce.job.queuename for my Tez
job instead of tez.queue.name.
And I wonder why the Tez libs were not already in the pig folder of the
Oozie sharelib.

Anyway, thank you again for your help.

BR.

Morgrim.



On Thu, Apr 28, 2016 at 7:33 PM, satish saley <sa...@gmail.com> wrote:

> You can specify execution engine using argument tag -
>
> <argument>-x</argument>
> <argument>tez</argument>

Re: How to use tez in a pig action in oozie workflow ?

Posted by satish saley <sa...@gmail.com>.
You can specify the execution engine using the argument tag:

<argument>-x</argument>
<argument>tez</argument>
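
As a sketch of where these tags would sit in the pig action from the
original post: the fragment below assumes the element order of the workflow
schema, in which argument comes after script and before file. It has not
been verified against a cluster.
###
<script>pig.pig</script>
<!-- Assumed placement: each token of "-x tez" gets its own argument tag -->
<argument>-x</argument>
<argument>tez</argument>
<file>pig.pig#pig.pig</file>
###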



On Thu, Apr 28, 2016 at 8:26 AM, Morgrim Murdargent <morgrim.oozie@gmail.com> wrote:

> Has anyone any idea about this please ?
>
> BR.
>
> Morgrim.

Re: How to use tez in a pig action in oozie workflow ?

Posted by Morgrim Murdargent <mo...@gmail.com>.
Does anyone have any idea about this, please?

BR.

Morgrim.
