Posted to user@pig.apache.org by Salabhanjika S <sa...@gmail.com> on 2014/04/26 21:01:51 UTC

pig maven integration

Hi,

I'm new to Pig scripting. Please provide me some pointers on the following.

1. How can we just *compile (compile only)* Pig scripts? I went through
the documentation, and *"-check"* provides a syntax check. But it requires
every parameter used in the script to be set. I'm looking for something
more generic, so that I can run a quick sanity check on my scripts.

2. Also, what is the clean way to handle library dependencies of a Pig
script? The current way of registering jars by path looks very odd to me.
It requires changes to the script/code whenever a library is upgraded.


-S

Re: pig maven integration

Posted by Salabhanjika S <sa...@gmail.com>.
"compile only"
PigServer/PigUnit require parameter substitution. To be more specific:
to compile a Java app, "javac" doesn't need to know the internals of the
app.

"registering dependencies"
The same applies to dependencies. Copy-pasting and hard-coding
libraries into the script introduces a new set of problems whenever a
library version is upgraded. In the Java world, no one wants to
hard-code jar names inside a Java file.

I hope this clarifies my requirement.

On Sun, Apr 27, 2014 at 4:29 AM, Jay Vyas <ja...@gmail.com> wrote:
> Ah, I guess from the "pig maven integration" subject heading i assumed you
> wanted tight maven integration of your pig scripts.  In any case, here is
> what i was suggesting.
>
> 1) How can we just *compile (compile only)* pig scripts?
>
> See
> https://github.com/apache/bigtop/blob/master/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/PigCSVCleaner.java.
>  That class will run a pig script by creating an instance of the
> "PigServer" object.  This allows you to craft your own pig tests in java
> that can validate your data flow and pig script in a java app.
> Alternatively, use PigUnit to test your pig script.  Again, although these
> do more than compilation, the effect is essentially the same:  They
> validate your script without running in a real cluster.
>
> 2) Also, what is the clean way to handle library dependencies of a Pig
> script?
>
> Sorry, about this, I thought by the subject "pig maven integration" you
> were referring to the various dependencies you need in a maven project to
> run pig code with real API calls.  For that, you can paste the pom
> dependencies.
>
> If you mean external libraries, just use the pig "register" command in your
> pig script.

Re: pig maven integration

Posted by Jay Vyas <ja...@gmail.com>.
Ah, I guess from the "pig maven integration" subject heading I assumed you
wanted tight Maven integration of your Pig scripts.  In any case, here is
what I was suggesting.

1) How can we just *compile (compile only)* pig scripts?

See
https://github.com/apache/bigtop/blob/master/bigtop-bigpetstore/src/main/java/org/apache/bigtop/bigpetstore/etl/PigCSVCleaner.java.
That class runs a Pig script by creating an instance of the
"PigServer" object.  This allows you to craft your own Pig tests in Java
that validate your data flow and Pig script from a Java app.
Alternatively, use PigUnit to test your Pig script.  Again, although these
do more than compilation, the effect is essentially the same: they
validate your script without running on a real cluster.
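
If unset parameters are the only thing keeping "-check" from being a quick sanity check, one possible workaround (a sketch, not a built-in Pig feature) is to scan the script for $name placeholders and write a throwaway parameter file to pass along with -check via -param_file. The class name PigParamStub, the "dummy" value, and the sample script are made up for illustration. The sketch assumes parameters are written as $name; positional fields such as $0 and escaped \$ are skipped.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects $name placeholders from a Pig script so that
// a stub parameter file can be generated for a syntax-only check.
class PigParamStub {
    // Matches $name, but not an escaped \$name and not positional fields like $0.
    private static final Pattern PARAM =
            Pattern.compile("(?<!\\\\)\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Returns the distinct parameter names in order of first use. */
    public static Set<String> findParams(String script) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = PARAM.matcher(script);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Builds parameter-file content: one name=dummy entry per line. */
    public static String stubParamFile(String script) {
        StringBuilder sb = new StringBuilder();
        for (String name : findParams(script)) {
            sb.append(name).append("=dummy\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample script, invented for this example.
        String script =
                "raw = LOAD '$input' AS (id:int, name:chararray);\n"
              + "ids = FOREACH raw GENERATE $0;\n"
              + "STORE ids INTO '$output';\n";
        System.out.print(stubParamFile(script));
    }
}
```

The generated text can be saved to a file and handed to Pig together with the script. A plain string stub will not suit every parameter (for example one used where a number is expected), and names that already have a %default in the script are stubbed as well, so treat this as a starting point.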

2) Also, what is the clean way to handle library dependencies of a Pig
script?

Sorry about this; I thought by the subject "pig maven integration" you
were referring to the various dependencies you need in a Maven project to
run Pig code with real API calls.  For that, you can paste the pom
dependencies.

If you mean external libraries, just use the pig "register" command in your
pig script.
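
To keep jar names and versions out of the script itself, the register lines could be generated from whatever classpath the build resolves (for example the output of Maven's dependency:build-classpath goal) and prepended to the script before it runs. The following is only a sketch of that idea; the class name PigRegisterPrelude and the jar paths are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: turns a resolved classpath into Pig REGISTER lines,
// so library versions live in the build file rather than in the script.
class PigRegisterPrelude {
    /** Returns one REGISTER statement per jar entry in the classpath string. */
    public static List<String> registerStatements(String classpath, String separator) {
        List<String> lines = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String path = entry.trim();
            if (path.endsWith(".jar")) { // skip blanks and class directories
                lines.add("REGISTER " + path + ";");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up paths standing in for a build-generated classpath.
        String cp = "/repo/joda-time-2.3.jar" + File.pathSeparator + "/repo/guava-15.0.jar";
        for (String line : registerStatements(cp, File.pathSeparator)) {
            System.out.println(line);
        }
    }
}
```

With this approach a library upgrade changes only the pom; the register lines are regenerated on the next build.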

Re: pig maven integration

Posted by Salabhanjika S <sa...@gmail.com>.
Thanks for the quick response, Jay.

I went through the bigpetstore pom. However, it seems to me that the pig
profile in bigpetstore doesn't solve either of the two problems I
posted. It's quite possible that I have missed something. Can you please
elaborate on this?


On Sun, Apr 27, 2014 at 1:32 AM, Jay Vyas <ja...@gmail.com> wrote:

> Hi S:
>
> I would suggest you look into apache bigtop's bigpetstore project and
> borrow from the pig profile in the pom.xml file there.  It does essentially
> what you want, and also has all the pig libraries necessary for running
> the whole thing in a maven task.
>
> For an intro to the bigpetstore project's goals you can watch the youtube
> demo: https://www.youtube.com/watch?v=OVB3nEKN94k, which also shows how we
> test the pig portion locally and then how we run the same thing in a
> cluster.
>
> To test it locally, run: mvn clean verify -P pig.  You can easily adapt
> the TestPig*IT.java class for your own integration testing needs.
>
> To run the same thing on the cluster, you run the corresponding pig class
> in a hadoop job.
>
> See
> https://github.com/apache/bigtop/blob/master/bigtop-bigpetstore/README.md for
> details.

Re: pig maven integration

Posted by Jay Vyas <ja...@gmail.com>.
Hi S:

I would suggest you look into Apache Bigtop's bigpetstore project and
borrow from the pig profile in the pom.xml file there.  It does essentially
what you want, and also has all the Pig libraries necessary for running
the whole thing in a Maven task.

For an intro to the bigpetstore project's goals you can watch the youtube
demo: https://www.youtube.com/watch?v=OVB3nEKN94k, which also shows how we
test the pig portion locally and then how we run the same thing in a
cluster.

To test it locally, run: mvn clean verify -P pig.  You can easily adapt
the TestPig*IT.java class for your own integration testing needs.

To run the same thing on the cluster, you run the corresponding pig class
in a hadoop job.

See
https://github.com/apache/bigtop/blob/master/bigtop-bigpetstore/README.md for
details.


-- 
Jay Vyas
http://jayunit100.blogspot.com
