Posted to user@pig.apache.org by Christopher Petrino <cj...@yesware.com> on 2014/03/05 16:33:16 UTC

Managing Large Pig Scripts

Hi all, what is everyone's approach for managing a Pig script that has
become very long? What is your best way to break it up into smaller pieces?

Re: Managing Large Pig Scripts

Posted by Russell Jurney <ru...@gmail.com>.
Use macros for code that appears more than once. Split the script into
multiple scripts and schedule them with dependencies in Azkaban.
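
As a rough illustration of the split-and-schedule idea, here is a minimal
sketch that writes two Azkaban job definitions, one depending on the other.
It assumes Azkaban's plain key=value .job files with the command job type;
the script names (clean.pig, aggregate.pig), the job names, and the flow
directory are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical directory holding the job files for one flow.
flow_dir="${FLOW_DIR:-flow}"
mkdir -p "$flow_dir"

# First stage: no dependencies.
cat > "$flow_dir/clean.job" <<'EOF'
type=command
command=pig -f clean.pig
EOF

# Second stage: declared to depend on the "clean" job above.
cat > "$flow_dir/aggregate.job" <<'EOF'
type=command
dependencies=clean
command=pig -f aggregate.pig
EOF

ls "$flow_dir"
```

Each split-out Pig script becomes one job, and the dependencies line is
what tells the scheduler to run them in order.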

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Managing Large Pig Scripts

Posted by Christopher Petrino <cj...@yesware.com>.
Thank you Dan and Jacob. I am currently on 0.11.1 but open to upgrading.
Over the last few weeks I developed a Pig script that has grown to a little
over 150 lines, and I was hoping I could find a way to modularize it. I was
going to follow something like the approach mentioned in this link:
http://stackoverflow.com/questions/7557528/how-to-call-a-pig-script-within-another-pig-script
but was curious what the community has been doing. Thank you for your
input!

-Chris



Re: Managing Large Pig Scripts

Posted by Jacob Perkins <ja...@gmail.com>.
Christopher,

You might consider breaking it into one or more reusable macros. What version of Pig are you using?

For complicated scripts, especially ones you didn't write, you might want to take a look at Lipstick: https://github.com/Netflix/Lipstick
It lets you visualize the DAG and clearly shows which logical operators map to which MapReduce jobs. It could at least be a starting point for managing complexity.
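
The macro suggestion can be sketched roughly as follows. This is only an illustration under assumptions: the shell snippet lays out a hypothetical project with a macro file (macros.pig) and a main script (main.pig) that imports it, and the relation, field, and path names are made up. The Pig code uses the DEFINE ... RETURNS and IMPORT syntax available since Pig 0.9.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical project directory for the split-up script.
proj="${PROJ_DIR:-pig_project}"
mkdir -p "$proj"

# macros.pig: logic that would otherwise be repeated in the long script.
# The quoted 'EOF' keeps the shell from expanding the $-prefixed macro parameters.
cat > "$proj/macros.pig" <<'EOF'
DEFINE group_and_count(rel, group_key) RETURNS counts {
    grouped = GROUP $rel BY $group_key;
    $counts = FOREACH grouped GENERATE group, COUNT($rel);
};
EOF

# main.pig: imports the macro file and calls the same macro twice.
cat > "$proj/main.pig" <<'EOF'
IMPORT 'macros.pig';

events    = LOAD 'input/events.tsv' AS (user_id:chararray, action:chararray);
by_user   = group_and_count(events, user_id);
by_action = group_and_count(events, action);

STORE by_user   INTO 'output/by_user';
STORE by_action INTO 'output/by_action';
EOF

echo "wrote $proj/macros.pig and $proj/main.pig"
# To try it locally (requires Pig on the PATH):
#   (cd "$proj" && pig -x local -f main.pig)
```

The main script stays short because the repeated GROUP/COUNT logic lives in one place; further macros can go in the same file or in separate imported files.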

--jacob
@thedatachef



Re: Managing Large Pig Scripts

Posted by "Dan DeCapria, CivicScience" <da...@civicscience.com>.
I think it really depends on your script and your environment.

A good approach may be to split the script into logical code blocks
(jobs), then execute those jobs in series via a bash script. I have also
found it helpful to persist the data from these jobs (not the intermediate
data) to a persistent data store; if something goes wrong, you don't have
to rerun the earlier computations, only the jobs from the last failed one
onward (at the cost of additional loads). This modular approach has been
helpful in development: you still get the Pig optimization benefits within
each module, and it leaves room for future expansion, such as running jobs
concurrently on your cluster and making better use of cluster capacity.
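
A minimal sketch of such a driver, assuming each stage is its own Pig
script that STOREs its output somewhere durable. The function name, the
RUNNER and CHECKPOINT_DIR variables, and the ".done" marker files are
hypothetical choices for this illustration; the markers only record which
stages finished so that a rerun resumes from the last failed one.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run each stage in order; skip stages that already left a "done" marker,
# so a rerun picks up at the stage that failed last time.
# RUNNER and CHECKPOINT_DIR are hypothetical knobs for this sketch.
run_pipeline() {
    local runner="${RUNNER:-pig -f}"
    local ckpt="${CHECKPOINT_DIR:-.checkpoints}"
    local stage marker
    mkdir -p "$ckpt"
    for stage in "$@"; do
        marker="$ckpt/$(basename "$stage").done"
        if [ -e "$marker" ]; then
            echo "skip  $stage"
            continue
        fi
        echo "run   $stage"
        # $runner is intentionally unquoted so "pig -f" splits into words.
        if ! $runner "$stage"; then
            echo "fail  $stage" >&2
            return 1
        fi
        touch "$marker"
    done
}

# Example (script names are placeholders):
#   run_pipeline 01_clean.pig 02_join.pig 03_report.pig
```

Deleting a stage's marker file forces that stage to run again on the next
invocation.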

Hope this helps,

-Dan


