Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2014/08/11 23:58:13 UTC

Re: Load a file in a Python UDF

Thanks for the tip, it looks like this will work.
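
For readers of the archive: the general pattern from the quoted message
below (load a file inside the Python UDF, then loop through it) can be
sketched roughly as follows. This is not the code from the linked
repository. It assumes the file is already on the local disk of each task,
however it gets there, rather than being fetched from S3 with boto. The
names LOOKUP_PATH, lookup.tsv, load_lookup, and lookup_label are
hypothetical, and the fallback decorator exists only so the module can also
be imported outside Pig.

```python
import os

try:
    # Pig makes pig_util available to Python UDFs at run time.
    from pig_util import outputSchema
except ImportError:
    # Fallback (an assumption for local testing): a no-op decorator so the
    # module can be imported and exercised without Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# Hypothetical default file name; override with the LOOKUP_PATH env variable.
DEFAULT_LOOKUP_PATH = "lookup.tsv"

# Cache so the file is read once per Python process, not once per record.
_lookup = None


def load_lookup(path):
    """Loop through a tab-separated file and build a key -> value dict."""
    table = {}
    handle = open(path)
    try:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            key, _, value = line.partition("\t")
            table[key] = value
    finally:
        handle.close()
    return table


@outputSchema("label:chararray")
def lookup_label(key):
    """Return the value stored for key, or None when the key is absent."""
    global _lookup
    if _lookup is None:
        path = os.environ.get("LOOKUP_PATH", DEFAULT_LOOKUP_PATH)
        _lookup = load_lookup(path)
    return _lookup.get(key)
```

Caching the parsed file in a module-level variable keeps the per-record
cost to a dictionary lookup. It only suits files small enough to hold in
memory, which matches the caveat about large files in the quoted message.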

On Thursday, July 24, 2014, Jeremy Karn <jk...@mortardata.com> wrote:

> Hi Russell,
>
> This might be a bit late, but here's an example of how you can load a file
> in Python and pass the results back to Pig:
> https://github.com/mortarcode/python-files
>
> It's a Mortar project, but the Pig script
> (https://github.com/mortarcode/python-files/blob/master/pigscripts/python-files.pig)
> and Python UDF file
> (https://github.com/mortarcode/python-files/blob/master/udfs/python/python-files.py)
> should work fine without Mortar as long as you explicitly set the AWS key
> parameters in the Pig script and have boto installed.
>
> This example uses a small file. If you want to read a larger file, you'll
> need to handle boto/S3 issues with downloading large files or have Python
> read directly from HDFS. I've found S3 actually works pretty well for
> small files like this, though. Reading larger files in Python doesn't work
> very well because you have to worry about running out of memory when
> passing everything back from Python to Java.
>
>   Jeremy Karn / Lead Developer
> MORTAR DATA / 519 277 4391 / www.mortardata.com
>
>
> On Sun, Jul 20, 2014 at 5:14 PM, Russell Jurney <russell.jurney@gmail.com>
> wrote:
>
> > I need to load a file and loop through it during the execution of a
> > python UDF. Is this possible? How?
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > datasyndrome.com
> >
>


-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com