Posted to user@pig.apache.org by Danfeng Li <dl...@operasolutions.com> on 2012/12/20 19:01:49 UTC

pig ship tar files

I read a lot about how Pig can ship a tar file and untar it before execution. However, I couldn't find any example. Can someone provide one?

What I would like to do is ship a Python module, such as nltk, for use in my streaming script.

Thanks.

Dan



RE: pig ship tar files

Posted by Danfeng Li <dl...@operasolutions.com>.
Thanks, but I'm still not quite clear on how to do it.

"...One way to work around this limitation is to tar all the dependencies into a tar file that accurately reflects the structure needed on the compute nodes, then have a wrapper for your script that un-tars the dependencies prior to execution."

Can you show an example of how to do it?

Thanks.
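A wrapper of the kind the quoted documentation describes could be a small Python script that unpacks the shipped archive into the task's working directory and then runs the real streaming script. This is only a sketch under stated assumptions: the file names `run_wrapper.py`, `nltk.tar` and `my_stream.py` are hypothetical, and it assumes all three end up in the task's working directory via something like DEFINE my_cmd `python run_wrapper.py nltk.tar my_stream.py` SHIP('run_wrapper.py', 'nltk.tar', 'my_stream.py'); (untested).

```python
#!/usr/bin/env python
"""Hypothetical run_wrapper.py: unpack dependencies, then run the real script.

Assumed usage: python run_wrapper.py ARCHIVE SCRIPT [ARGS...]
"""
import os
import subprocess
import sys
import tarfile


def unpack(archive, dest="."):
    """Extract `archive` into `dest` (the task's working directory by default)."""
    with tarfile.open(archive) as tar:
        tar.extractall(dest)


def run_with_deps(archive, argv):
    """Unpack `archive`, then run `argv` with stdin/stdout inherited.

    The current directory is prepended to PYTHONPATH so that a package
    unpacked here (e.g. an `nltk/` directory) is importable by the child.
    Returns the child's exit status.
    """
    unpack(archive)
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        p for p in [os.getcwd(), env.get("PYTHONPATH", "")] if p)
    return subprocess.call(argv, env=env)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.exit("usage: run_wrapper.py ARCHIVE SCRIPT [ARGS...]")
    archive, script = sys.argv[1], sys.argv[2]
    # Run the real script with the same interpreter; Pig's stdin/stdout pass through.
    sys.exit(run_with_deps(archive, [sys.executable, script] + sys.argv[3:]))
```

Running the real script as a child process keeps the wrapper generic: the same wrapper can unpack any archive in front of any command, and the records Pig streams in and out pass straight through to the child.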

-----Original Message-----
From: Alan Gates [mailto:gates@hortonworks.com] 
Sent: Thursday, December 20, 2012 10:57 AM
To: user@pig.apache.org
Subject: Re: pig ship tar files

See http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs, especially the section on SHIP.

Alan.



Re: pig ship tar files

Posted by Alan Gates <ga...@hortonworks.com>.
See http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs, especially the section on SHIP.

Alan.



Re: pig ship tar files

Posted by Rohini Palaniswamy <ro...@gmail.com>.
You can also use -Dmapred.cache.archives=<hdfs:///your tar file path> to
ship the tar file using the distributed cache. Hadoop will take care of
untarring the file and putting it in the current directory if the extension
is one of .zip, .tar, .tgz or .tar.gz. This is a feature of
Hadoop's distributed cache.
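Once the distributed cache has unpacked the archive, the streaming script still has to make the unpacked package importable. A minimal Python sketch, assuming the job is started with something like pig -Dmapred.cache.archives=hdfs:///user/dan/nltk.tgz#nltk myscript.pig (the HDFS path and the `#nltk` link name are hypothetical; on older Hadoop releases -Dmapred.create.symlink=yes may also be needed for the link to appear), so that the unpacked archive shows up as a directory named `nltk` in the task's working directory. Because a tar may hold either a top-level `nltk/` folder or the package contents directly, the hypothetical helper `add_unpacked_archive_to_path` checks both layouts before touching sys.path.

```python
import os
import sys


def add_unpacked_archive_to_path(link_name, package, base_dir=None):
    """Put an unpacked distributed-cache archive on sys.path (sketch).

    link_name: name under which the archive appears in the working directory
               (assumed to be the fragment after '#', e.g. "nltk").
    package:   top-level package expected inside it (e.g. "nltk").
    base_dir:  working directory to search; defaults to the current one.
    Returns the directory that was added to sys.path.
    """
    base_dir = os.path.abspath(base_dir or os.getcwd())
    root = os.path.join(base_dir, link_name)
    if os.path.isdir(os.path.join(root, package)):
        # Layout 1: the tar held a folder, i.e. <link>/<package>/__init__.py
        entry = root
    elif os.path.isfile(os.path.join(root, "__init__.py")) and link_name == package:
        # Layout 2: the tar held the package contents directly and the link
        # itself is named after the package, so its parent goes on sys.path.
        entry = base_dir
    else:
        raise ImportError("cannot find package %r under %r" % (package, root))
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return entry
```

In the streaming script this would be called before the import, e.g. add_unpacked_archive_to_path("nltk", "nltk") followed by import nltk.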

Regards,
Rohini


On Fri, Dec 21, 2012 at 2:25 AM, Thomas Bach
<th...@students.uni-mainz.de> wrote:

> On Thu, Dec 20, 2012 at 01:01:49PM -0500, Danfeng Li wrote:
> > I read a lot about how Pig can ship a tar file and untar it before
> > execution. However, I couldn't find any example. Can someone provide
> > an example?
>
> The trick is to use the `SH' statement to untar the file.

Re: pig ship tar files

Posted by Thomas Bach <th...@students.uni-mainz.de>.
On Thu, Dec 20, 2012 at 01:01:49PM -0500, Danfeng Li wrote:
> I read a lot about how Pig can ship a tar file and untar it before
> execution. However, I couldn't find any example. Can someone provide
> an example?

The trick is to use the `SH' statement to untar the file.
 
> What I would like to do is to ship a python module, such as nltk,
> for my streaming.

Try something like (untested)

DEFINE my_cmd `relative/path/to/my_cmd/in/tar/file.py`
SHIP('nltk.tar');

SH tar xf nltk.tar

Does this help/work?

Regards,
	Thomas.
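One caveat about the suggestion above: Pig's `sh` command runs on the machine that submits the script (the Pig client), not on the task nodes, so a tar shipped with SHIP most likely has to be unpacked by the streamed command itself. A minimal sketch of such a self-unpacking streaming script, assuming it is shipped together with `nltk.tar` (for example DEFINE my_cmd `python my_stream.py` SHIP('my_stream.py', 'nltk.tar'); untested) and that the tar holds an `nltk/` package directory at its top level. `my_stream.py`, `ensure_unpacked` and `process` are hypothetical names; the script reads tab-separated records from stdin, which is Pig streaming's default format, and uses NLTK's `wordpunct_tokenize` only as an example of calling into the shipped module.

```python
#!/usr/bin/env python
"""Hypothetical my_stream.py: unpack a shipped nltk.tar, then stream records."""
import os
import sys
import tarfile


def ensure_unpacked(archive, package, dest):
    """Extract `archive` into `dest` unless `dest/package` already exists,
    then make `dest` importable. Returns True if extraction happened."""
    extracted = False
    if not os.path.isdir(os.path.join(dest, package)):
        with tarfile.open(archive) as tar:
            tar.extractall(dest)
        extracted = True
    if dest not in sys.path:
        sys.path.insert(0, dest)
    return extracted


def process(stdin, stdout, tokenize):
    """Tokenize the first field of each tab-separated record and
    write one (field, token) line per token."""
    for line in stdin:
        fields = line.rstrip("\n").split("\t")
        for token in tokenize(fields[0]):
            stdout.write("%s\t%s\n" % (fields[0], token))


if __name__ == "__main__":
    cwd = os.getcwd()  # SHIP places files in the task's working directory
    ensure_unpacked(os.path.join(cwd, "nltk.tar"), "nltk", cwd)
    # Imported only after unpacking, so it resolves to the shipped copy.
    from nltk.tokenize import wordpunct_tokenize
    process(sys.stdin, sys.stdout, wordpunct_tokenize)
```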