Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2014/07/20 23:14:55 UTC

Load a file in a Python UDF

I need to load a file and loop through it during the execution of a python
UDF. Is this possible? How?

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Load a file in a Python UDF

Posted by Russell Jurney <ru...@gmail.com>.
Thanks for the tip, it looks like this will work.

On Thursday, July 24, 2014, Jeremy Karn <jk...@mortardata.com> wrote:

> Hi Russell,
>
> This might be a bit late, but here's an example of how you can load a file
> in python and pass the results back to Pig:
> https://github.com/mortarcode/python-files
>
> It's a Mortar project but the pig script (
> https://github.com/mortarcode/python-files/blob/master/pigscripts/python-files.pig)
> and python udf file (
> https://github.com/mortarcode/python-files/blob/master/udfs/python/python-files.py)
> should work fine without Mortar as long as you explicitly set the AWS key
> parameters in the Pig script and have boto installed.
>
> This example uses a small file - if you want to read a larger file you'll
> need to handle boto/s3 issues with downloading large files or have Python
> read directly from hdfs.  I've found s3 actually works pretty well though
> for small files like this.  Reading larger files in Python doesn't work
> very well because you have to worry about running out of memory when
> passing everything back from Python to Java.
>
>   Jeremy Karn / Lead Developer
> MORTAR DATA / 519 277 4391 / www.mortardata.com
>
>
> On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <russell.jurney@gmail.com>
> wrote:
>
> > I need to load a file and loop through it during the execution of a python
> > UDF. Is this possible? How?
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > datasyndrome.com
> >
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: Load a file in a Python UDF

Posted by Jeremy Karn <jk...@mortardata.com>.
Hi Russell,

This might be a bit late, but here's an example of how you can load a file
in Python and pass the results back to Pig:
https://github.com/mortarcode/python-files

It's a Mortar project, but the Pig script (
https://github.com/mortarcode/python-files/blob/master/pigscripts/python-files.pig)
and Python UDF file (
https://github.com/mortarcode/python-files/blob/master/udfs/python/python-files.py)
should work fine without Mortar as long as you explicitly set the AWS key
parameters in the Pig script and have boto installed.
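
For readers who cannot open the linked project, here is a minimal sketch of the general idea, not the code from that repository. It assumes the file is already on the local disk of each task (for example, shipped alongside the job) rather than fetched from S3 with boto. The names LOOKUP_PATH, load_lines and matching_lines are made up for illustration, and the fallback decorator exists only so the module can be imported outside Pig.

```python
import os

try:
    # pig_util ships with Pig for CPython (streaming_python) UDFs.
    from pig_util import outputSchema
except ImportError:
    # Fallback for running outside Pig: just record the schema string.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator

# Hypothetical default path; override with the LOOKUP_FILE environment variable.
LOOKUP_PATH = os.environ.get("LOOKUP_FILE", "lookup.txt")

# Cache keyed by path so each file is read once per task, not once per record.
_cache = {}


def load_lines(path):
    """Read the file on first use and return its non-empty lines."""
    if path not in _cache:
        with open(path) as handle:
            _cache[path] = [line.rstrip("\n") for line in handle if line.strip()]
    return _cache[path]


@outputSchema("matches:bag{t:(line:chararray)}")
def matching_lines(needle, path=LOOKUP_PATH):
    """Loop through the file and return the lines containing `needle`.

    The result is a list of one-field tuples, which Pig treats as a bag.
    """
    if needle is None:
        return []
    return [(line,) for line in load_lines(path) if needle in line]
```

The module-level cache matters because Pig calls the UDF once per input record; without it the file would be reopened and reread for every record.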

This example uses a small file. If you want to read a larger file, you'll
need to handle boto/S3 issues with downloading large files or have Python
read directly from HDFS. I've found S3 actually works pretty well, though,
for small files like this. Reading larger files in Python doesn't work
very well because you have to worry about running out of memory when
passing everything back from Python to Java.
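
One way to limit the memory concern, sketched here under the assumption that the file can be opened as a local path, is to iterate over it line by line and hand back only a small result instead of the whole contents. The helper names iter_matches and count_matches are hypothetical. Reading from S3 or HDFS would need a different way of opening the file, which this sketch does not cover.

```python
def iter_matches(needle, path):
    """Yield matching lines one at a time without loading the whole file."""
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if needle in line:
                yield line


def count_matches(needle, path):
    """Return only a count, so very little data crosses back to Pig."""
    return sum(1 for _ in iter_matches(needle, path))
```

A UDF built on count_matches returns a single number per call, so the amount of data passed from Python back to Java stays constant regardless of file size.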

  Jeremy Karn / Lead Developer
MORTAR DATA / 519 277 4391 / www.mortardata.com


On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <ru...@gmail.com>
wrote:

> I need to load a file and loop through it during the execution of a python
> UDF. Is this possible? How?
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> datasyndrome.com
>