Posted to common-user@hadoop.apache.org by Mayra Mendoza <ar...@gmail.com> on 2010/01/14 14:11:27 UTC

Help with hadoop Streaming

I need to open and save images using Hadoop and Python. I have tried two ways
to do this:

1. Using WholeFileInputFormat.class
for infile in sys.stdin:
   data = str(infile)
   data = StringIO.StringIO(data).getvalue()
   image = Image.frombuffer("RGB", (128,128), str(data), "raw","RGB",0,1)
   image = image.convert('L')
   print '%s\t' % (data)

ERROR
File "/usr/lib/python2.5/site-packages/PIL/Image.py", line 576, in fromstring
raise ValueError("not enough image data")
ValueError: not enough image data
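
A likely explanation for this error, offered as a sketch rather than a confirmed diagnosis: Image.frombuffer with the "raw" decoder expects uncompressed pixels, so a 128x128 RGB image needs 128*128*3 bytes, while a loop over sys.stdin yields only one newline-terminated line at a time. The file listed further down is 1556016 bytes, far more than that, so "not enough image data" suggests the record reaching PIL was cut short. The snippet below is Python 3 (the original post uses Python 2.5) and uses made-up bytes, not a real image, to show both effects without PIL or Hadoop.

```python
import io

def raw_rgb_size(width, height):
    """Bytes needed for uncompressed 8-bit RGB pixels (3 bytes per pixel)."""
    return width * height * 3

# Hypothetical stand-in for a compressed image file: binary data that
# happens to contain newline bytes (0x0A), as compressed data usually does.
fake_file = b"\xff\xd8\xff\xe0" + b"ab\ncd\nef" + b"\xff\xd9"

# Iterating a binary stream line by line cuts the data at every 0x0A byte,
# which is what a `for line in sys.stdin` loop does to its input.
chunks = list(io.BytesIO(fake_file))

needed = raw_rgb_size(128, 128)
print("raw RGB 128x128 needs", needed, "bytes")
print("first chunk has", len(chunks[0]), "bytes;", len(chunks), "chunks in total")
```

If the bytes do arrive intact, an encoded file such as a JPEG is normally decoded by passing a file-like object (for example a StringIO buffer in Python 2) to Image.open, rather than by calling frombuffer with the "raw" decoder.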

2. Using the map_input_file environment variable
for infile in sys.stdin:
   pathIn = os.getenv('map_input_file')
   image = Image.open(pathIn)

ERROR
IOError: [Errno 2] No such file or directory: 'hdfs://localhost:8022/user/training/input/img10.jpg'

But this file exists:
training@training-vm:~$ hadoop fs -lsr /user/training/input/
-rw-r--r-- 1 training supergroup 1556016 2010-01-13 08:22 /user/training/input/img10.jpg

Can you help me? What am I doing wrong? I'm using Hadoop 0.20.1, Python, and PIL.
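
On the second attempt: Image.open looks up its argument on the local filesystem, and an hdfs:// URI is not a local path, which would explain the IOError even though the file exists in HDFS. One possible workaround, sketched here in Python 3, is to stream the file through `hadoop fs -cat` and give PIL the bytes from memory. The helper names are hypothetical, and the sketch assumes the `hadoop` command is on the PATH of the streaming task.

```python
import io
import subprocess

def hdfs_cat_command(uri):
    """Build the `hadoop fs -cat` command line for one HDFS file."""
    return ["hadoop", "fs", "-cat", uri]

def read_hdfs_bytes(uri, run=subprocess.check_output):
    """Return the file's contents as bytes.

    `run` is injectable so the helper can be exercised without a cluster;
    by default it executes the command and captures its standard output.
    """
    return run(hdfs_cat_command(uri))

def open_buffer(data):
    """Wrap bytes in a file-like object, the form Image.open accepts."""
    return io.BytesIO(data)

# Intended use inside a mapper (requires Hadoop and PIL, so left as comments):
#   uri = os.getenv("map_input_file")
#   image = Image.open(open_buffer(read_hdfs_bytes(uri)))
```

Reading the whole file into memory keeps the sketch short; for a 1.5 MB image that is reasonable, but much larger files might be better copied to a local temporary file first.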

Re: Help with hadoop Streaming

Posted by Zak Stone <zs...@gmail.com>.
I recommend that you use Dumbo and TypedBytes to work with binary data
from Python with Hadoop streaming:

http://wiki.github.com/klbostee/dumbo/
http://dumbotics.com/2009/02/24/hadoop-1722-and-typed-bytes/

All of the TypedBytes patches should now be included in all of the
Cloudera Hadoop distributions.

Zak
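
To illustrate why typed bytes suit binary data such as images: each value is prefixed with a type code and, for byte sequences and strings, a length, so newline bytes inside the payload no longer split the record. The Python 3 sketch below is a hand-rolled reader and writer for just two type codes (0 for a byte sequence, 7 for a UTF-8 string), based on my understanding of the layout described in the Hadoop typedbytes package documentation. The function names are hypothetical; a real job would normally rely on Dumbo or the typedbytes Python module instead.

```python
import io
import struct

BYTES_CODE = 0   # typed bytes type code for a raw byte sequence
STRING_CODE = 7  # typed bytes type code for a UTF-8 string

def write_typed(out, code, payload):
    """Write one value: 1-byte type code, 4-byte big-endian length, payload."""
    out.write(struct.pack(">bi", code, len(payload)))
    out.write(payload)

def read_typed(inp):
    """Read one value written by write_typed.

    Returns (code, payload), or None when the stream is exhausted.
    """
    header = inp.read(5)
    if len(header) < 5:
        return None
    code, length = struct.unpack(">bi", header)
    return code, inp.read(length)

# A key/value pair: file name as a string, contents as bytes with newlines.
stream = io.BytesIO()
write_typed(stream, STRING_CODE, "img10.jpg".encode("utf-8"))
write_typed(stream, BYTES_CODE, b"\xff\xd8\n\n\xff\xd9")

stream.seek(0)
key = read_typed(stream)
value = read_typed(stream)
print(key, value)
```

Because the reader takes exactly as many bytes as the length field announces, the two newline bytes in the value come back untouched.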

