Posted to user@pig.apache.org by Eric Czech <ec...@gmail.com> on 2013/01/22 16:17:17 UTC

Shared script commands

Hi everyone,

I'm trying to determine the best way for all of my scripts to share
initialization statements such as jar REGISTER commands, default
variable declarations, etc.

Is it possible to create a script that does all of these boilerplate
things and then use the "exec" command from all of my other scripts to
call that "initialization" script?

For example, here's a somewhat abstracted version of what all of my
scripts start with:

myscript.pig -->

------------- Boilerplate declarations necessary in ALL scripts -------------

%DECLARE USERNAME `echo \$USER`
REGISTER /home/$USERNAME/apps/hadoop/build/share/myudfs.jar
%DEFAULT SCRIPT_MODE 'development'
%DEFAULT BUILD_ID '0'
SET pig.build.id '$BUILD_ID'

--------------------------------------------------------------------------------------

-- pig code to do useful things


<-- end myscript.pig


Or is there a better way to use shared pig code like this (macros
don't allow a lot of the statements I need)?

Thank you!

Re: Shared script commands

Posted by Jonathan Coveney <jc...@gmail.com>.
At Twitter, we have a lightweight framework that handles stitching code
together, so I think that with Pig, stitching scripts together in some
organized way is the current best practice.
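
To make the "stitching" idea concrete, here is a minimal sketch of a wrapper that prepends a shared header file to a script before handing the result to Pig. This is not the framework mentioned above, only an illustration: the function name stitch_pig_script and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prepend a shared header to a Pig script.
# Usage: stitch_pig_script HEADER SCRIPT OUTPUT
stitch_pig_script() {
    header=$1
    script=$2
    output=$3
    # Fail early if either input file is missing.
    [ -f "$header" ] || { echo "missing header: $header" >&2; return 1; }
    [ -f "$script" ] || { echo "missing script: $script" >&2; return 1; }
    # Header first, so its %DECLARE/%DEFAULT/REGISTER/SET lines come
    # before the statements that rely on them.
    cat "$header" "$script" > "$output"
}

# Example (file names are placeholders):
#   stitch_pig_script common_header.pig myscript.pig /tmp/myscript.stitched.pig \
#       && pig /tmp/myscript.stitched.pig
```

Because the header is pasted in as plain text, it can contain any statement a normal script can, which sidesteps the restrictions on macros raised in the original question.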


Re: Shared script commands

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi Eric,

You can move REGISTER and SET to a properties file and DECLARE and DEFAULT
to a param file. Then, you can create an alias of pig like "pig -m <param
file> -P <property file>" (-m is short for -param_file; -p takes a single
name=value pair rather than a file).

This is the best that I can think of. I am wondering if anyone has a better
suggestion.

Thanks,
Cheolsoo
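
A rough sketch of the param-file plus properties-file setup might look like the following. It uses the long option names -param_file and -propertyFile to avoid confusion between the short flags. Everything else is an assumption for illustration: the directory ~/.pig-shared, the file names, the function names write_pig_shared_files and pigc, and the use of the pig.additional.jars property as a stand-in for REGISTER (check that your Pig version honors it, and that properties set this way reach your jobs the same way SET does).

```shell
#!/bin/sh
# Hypothetical location for the shared files; adjust to taste.
PIG_SHARED_DIR=${PIG_SHARED_DIR:-"$HOME/.pig-shared"}

write_pig_shared_files() {
    mkdir -p "$PIG_SHARED_DIR" || return 1
    # Parameter file: stands in for the %DEFAULT lines.
    cat > "$PIG_SHARED_DIR/common.params" <<'EOF'
# Shared parameters (name=value, one per line)
SCRIPT_MODE=development
BUILD_ID=0
EOF
    # Properties file: stands in for SET and REGISTER.
    # Properties files get no Pig parameter substitution, so the user
    # name is expanded by the shell when the file is written.
    cat > "$PIG_SHARED_DIR/common.properties" <<EOF
pig.build.id=0
pig.additional.jars=/home/$USER/apps/hadoop/build/share/myudfs.jar
EOF
}

# A function instead of an alias, so extra arguments pass through.
# USERNAME replaces the %DECLARE line from the original script.
pigc() {
    pig -param_file "$PIG_SHARED_DIR/common.params" \
        -propertyFile "$PIG_SHARED_DIR/common.properties" \
        -param USERNAME="$USER" "$@"
}
```

After running write_pig_shared_files once, "pigc myscript.pig" would run a script with the shared settings applied. One limitation of this sketch: pig.build.id is fixed in the properties file and no longer follows the BUILD_ID parameter as it did with SET in the script.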

