Posted to user@pig.apache.org by Miguel Angel Martin junquera <mi...@gmail.com> on 2013/11/28 13:56:13 UTC

Pig-cassandra Scripts and Oozie

Hi all,

What is the best way to integrate the Cassandra Pig extension with Oozie?

Can Oozie be configured to use pig-cassandra instead of pig?

Some ideas I am considering:

Launching a shell job that runs ./pig-cassandra script.pig,
or changing the values of environment variables,
or changing the original pig script to include the pig-cassandra code, etc.

Thanks and regards

Re: Pig-cassandra Scripts and Oozie

Posted by Jeremy Hanna <je...@gmail.com>.
I believe that when I set up Oozie with the setup script (where you specify the version of Hadoop and such), I also added extra jars there: the Cassandra jars and some of their dependencies, plus cassandra.yaml, cassandra-env.sh and potentially the topology properties file.  Then, with the configuration outlined on the Cassandra wiki page that you posted, I just used the built-in Pig support and it worked fine.  You might try a simple test case that reads from and writes to Cassandra, and look for errors either in the job setup (the one-mapper job that Oozie creates to initialize the job) or in the job itself.

The specific jars from Cassandra that I added as additional jars were:
cassandra-all
cassandra-thrift
guava
high-scale-lib
lib-thrift
log4j
snake-yaml
commons-io
plus the cassandra.yaml, cassandra-env.sh, and cassandra-topology.properties files (the last one only if using the property file snitch)

I reference those jars in the environment variable LIBEXT_JARS and then execute:
bin/oozie-setup.sh prepare-war -jars $LIBEXT_JARS -extjs ./ext-2.2.zip
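
As a rough sketch of how LIBEXT_JARS might be assembled before the prepare-war call above: the snippet below joins every jar found in one directory into a single string. The join_jars helper, the staging directory in the usage comment, and the colon separator are assumptions made for illustration and are not stated in this thread; check the usage text of your own oozie-setup.sh for the separator it expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oozie): print all *.jar files in a
# directory as one colon-separated string. The colon is an assumption.
join_jars() {
    dir=$1
    out=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        # Append with a separator only when something is already collected.
        out="${out:+$out:}$jar"
    done
    printf '%s\n' "$out"
}

# Possible usage, with an assumed staging directory:
# LIBEXT_JARS=$(join_jars /opt/oozie/cassandra-libs)
# bin/oozie-setup.sh prepare-war -jars "$LIBEXT_JARS" -extjs ./ext-2.2.zip
```

Collecting the jars from one directory keeps the list in a single place, so adding or removing a dependency does not mean editing the variable by hand.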

Hopefully that helps,

Jeremy

On 28 Nov 2013, at 15:31, Miguel Angel Martin junquera <mi...@gmail.com> wrote:

> Hi Jeremy,
> 
> I have not tested it yet; so far I have only run the Pig examples from the
> Oozie project, without Cassandra.
> 
> *pig-cassandra* adds the Cassandra Pig library jars to the PIG_CLASSPATH
> environment variable and then calls the original *pig* shell script at
> PIG_HOME/bin/pig. Up to now, I have launched Pig scripts with pig_cassandra
> directly.
> 
> I do not know, and have not seen, how Oozie launches Pig; I suppose that
> Oozie runs PIG_HOME/bin/pig.
> 
> If you are using this configuration and your Pig scripts that use Cassandra
> work fine, I suppose the trick is to put the Cassandra jar dependencies, and
> any other UDFs or libraries used by the Pig scripts, into the Oozie sharelib
> or into the lib folder of the job.
> 
> On the other hand, I do not know whether I have to configure something like
> this:
> 
> http://wiki.apache.org/cassandra/HadoopSupport#Oozie
> 
> I am using Cassandra 1.2.10, Oozie 4.0.0 and Pig 0.11.1.
> 
> I will try these options and see if they work.
> 
> Thanks in advance
> 
> 2013/11/28 Jeremy Hanna <je...@gmail.com>
> 
>> If I remember correctly, when I configured Pig, Cassandra, and Oozie to
>> work together, I just used vanilla Pig but gave it the jars it needed.
>> 
>> What problem are you experiencing that prevents you from doing this?
>> 
>> Jeremy
>> 
>> On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera <
>> mianmarjun.mailinglist@gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> What is the best way to integrate the Cassandra Pig extension with Oozie?
>>> 
>>> Can Oozie be configured to use pig-cassandra instead of pig?
>>> 
>>> Some ideas I am considering:
>>> 
>>> Launching a shell job that runs ./pig-cassandra script.pig,
>>> or changing the values of environment variables,
>>> or changing the original pig script to include the pig-cassandra code, etc.
>>> 
>>> Thanks and regards
>> 
>> 


Re: Pig-cassandra Scripts and Oozie

Posted by Miguel Angel Martin junquera <mi...@gmail.com>.
Hi Jeremy,

I have not tested it yet; so far I have only run the Pig examples from the
Oozie project, without Cassandra.

*pig-cassandra* adds the Cassandra Pig library jars to the PIG_CLASSPATH
environment variable and then calls the original *pig* shell script at
PIG_HOME/bin/pig. Up to now, I have launched Pig scripts with pig_cassandra
directly.
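
To make the wrapper behaviour described above concrete, here is a minimal sketch of a script in that style. It is not the actual pig-cassandra script: the cassandra_pig_classpath function name and the CASSANDRA_HOME/lib location in the usage comment are assumptions for illustration, and the real script may order or select the jars differently.

```shell
#!/bin/sh
# Sketch of a pig-cassandra style wrapper (hypothetical, for illustration).
# Prints the current PIG_CLASSPATH with every jar from the given directory
# appended, colon-separated.
cassandra_pig_classpath() {
    lib_dir=$1
    classpath=${PIG_CLASSPATH:-}
    for jar in "$lib_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory holds no jars.
        [ -e "$jar" ] || continue
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# A wrapper would then export the result and hand over to the stock launcher:
# PIG_CLASSPATH=$(cassandra_pig_classpath "$CASSANDRA_HOME/lib")
# export PIG_CLASSPATH
# exec "$PIG_HOME/bin/pig" "$@"
```

Seen this way, the wrapper only prepares the classpath, which is why supplying the same jars to Oozie by other means can stand in for it.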

I do not know, and have not seen, how Oozie launches Pig; I suppose that
Oozie runs PIG_HOME/bin/pig.

If you are using this configuration and your Pig scripts that use Cassandra
work fine, I suppose the trick is to put the Cassandra jar dependencies, and
any other UDFs or libraries used by the Pig scripts, into the Oozie sharelib
or into the lib folder of the job.
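
For the "lib folder of the job" option, a minimal sketch of staging the jars next to the workflow definition could look like the following. The stage_workflow_libs helper and all paths in the usage comment are hypothetical; the only convention relied on is that a workflow application can carry its own jars in a lib/ directory beside workflow.xml.

```shell
#!/bin/sh
# Hypothetical helper: copy every jar from a source directory into the
# lib/ subdirectory of a local copy of the workflow application.
stage_workflow_libs() {
    src=$1
    app=$2
    mkdir -p "$app/lib"
    for jar in "$src"/*.jar; do
        # Skip the unexpanded pattern when the source holds no jars.
        [ -e "$jar" ] || continue
        cp "$jar" "$app/lib/"
    done
}

# Possible usage, with assumed paths:
# stage_workflow_libs /opt/cassandra/lib ./my-pig-app
# hadoop fs -put ./my-pig-app /user/"$USER"/apps/
```

Staging locally first makes it easy to inspect exactly which jars will ship with the job before anything is uploaded.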


On the other hand, I do not know if  i have to configure some thing  like
this.

http://wiki.apache.org/cassandra/HadoopSupport#Oozie

I am using Cassandra 1.2.10, Oozie 4.0.0 adn pig 0.11.1.

I try to test these options and see if it works-

Thanks in advance











2013/11/28 Jeremy Hanna <je...@gmail.com>

> If I remember correctly, when I configured Pig, Cassandra, and Oozie to
> work together, I just used vanilla Pig but gave it the jars it needed.
>
> What problem are you experiencing that prevents you from doing this?
>
> Jeremy
>
> On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera <
> mianmarjun.mailinglist@gmail.com> wrote:
>
> > Hi all,
> >
> > What is the best way to integrate the Cassandra Pig extension with Oozie?
> >
> > Can Oozie be configured to use pig-cassandra instead of pig?
> >
> > Some ideas I am considering:
> >
> > Launching a shell job that runs ./pig-cassandra script.pig,
> > or changing the values of environment variables,
> > or changing the original pig script to include the pig-cassandra code, etc.
> >
> > Thanks and regards
>
>

Re: Pig-cassandra Scripts and Oozie

Posted by Jeremy Hanna <je...@gmail.com>.
If I remember correctly, when I configured Pig, Cassandra, and Oozie to work together, I just used vanilla Pig but gave it the jars it needed.

What problem are you experiencing that prevents you from doing this?

Jeremy

On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera <mi...@gmail.com> wrote:

> Hi all,
> 
> What is the best way to integrate the Cassandra Pig extension with Oozie?
> 
> Can Oozie be configured to use pig-cassandra instead of pig?
> 
> Some ideas I am considering:
> 
> Launching a shell job that runs ./pig-cassandra script.pig,
> or changing the values of environment variables,
> or changing the original pig script to include the pig-cassandra code, etc.
> 
> Thanks and regards

