Posted to common-user@hadoop.apache.org by openresearch <Qi...@openresearchinc.com> on 2009/06/03 23:18:16 UTC

streaming a binary processing file

Hi all,

I have an urgent question about processing binary (image) data with
Hadoop streaming.
I am looking for the simplest solution, preferably one that requires no
changes to Hadoop or the streaming package.

I got some hints from this mailing list, including using a custom
InputFormat or SequenceFileInputFormat, but none of them has solved my
problem. Here is my situation:

1. A lot of binary (image) files are stored on HDFS.
2. A standalone executable takes a binary (e.g., image) filename as input
(key) and outputs small metadata as the value (e.g., the size of the image).

How can we pass this standalone program to streaming as a mapper so that
the images are processed across all nodes, given that streaming currently
feeds the mapper only through stdin by default?

Thanks. 

-Qiming


-- 
View this message in context: http://www.nabble.com/streaming-a-binary-processing-file-tp23859645p23859645.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: streaming a binary processing file

Posted by Zak Stone <zs...@gmail.com>.
One simple solution is to use Dumbo, a Python interface to Hadoop that
supports binary streaming:

http://wiki.github.com/klbostee/dumbo

Zak
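
For readers who want to stay with plain streaming, one possible route (not
something proposed in this thread, just a sketch) is to give the job a text
file listing the HDFS image paths, one per line, and let a small wrapper
mapper fetch each file and run the executable on a local copy. The name
./image_tool is hypothetical and stands for the standalone program, and the
sketch assumes the "hadoop" command is on the PATH of every task node.

```python
"""Core of a streaming mapper whose input lines are HDFS paths."""
import os
import shutil
import subprocess
import tempfile

# Hypothetical name for the standalone executable shipped with the job.
TOOL = "./image_tool"


def fetch_from_hdfs(hdfs_path, local_path):
    # Copy one file out of HDFS onto the task's local disk.
    subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_path])


def run_tool(local_path):
    # Run the executable on the local copy and flatten its output to one line,
    # so it cannot break the tab/newline framing of streaming records.
    out = subprocess.check_output([TOOL, local_path])
    return " ".join(out.decode("utf-8", "replace").split())


def map_paths(lines, fetch=fetch_from_hdfs, run=run_tool):
    """Yield 'hdfs_path<TAB>metadata' for each non-empty input line."""
    for line in lines:
        # Keep only the last tab-separated field, in case a key precedes the path.
        hdfs_path = line.rstrip("\n").split("\t")[-1].strip()
        if not hdfs_path:
            continue
        workdir = tempfile.mkdtemp()
        local_path = os.path.join(workdir, os.path.basename(hdfs_path))
        try:
            fetch(hdfs_path, local_path)
            yield "%s\t%s" % (hdfs_path, run(local_path))
        finally:
            # Remove the local copy whether or not the tool succeeded.
            shutil.rmtree(workdir, ignore_errors=True)
```

To use it as the mapper script, the file would end by printing every record
from map_paths(sys.stdin), and the job would be submitted with the streaming
options -mapper and -file (for both the script and the executable), plus
-numReduceTasks 0 if no reduce step is needed. Since each input line is only
a path, the image bytes themselves never pass through stdin.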

