Posted to general@hadoop.apache.org by Aravinth Bheemaraj <b....@gmail.com> on 2010/11/29 20:56:35 UTC

Image as input to M-R in Hadoop

Hi,

I am a beginner to Hadoop and I am looking for some help in implementing the
Mapper with an image as input. Is there any predefined Writable class for
processing images? If so, how do I use it?

Also, I have read somewhere that compressed formats cannot be processed in
Hadoop. If this is true, does it follow that JPEG images (which are also a
compressed format) cannot be processed by Hadoop?
Please correct me if I have misunderstood this concept.

Thanks,
-- 
Aravinth

Re: Image as input to M-R in Hadoop

Posted by daniel sikar <ds...@gmail.com>.
Hi Aravinth

This is probably doable with Hadoop Streaming.

Imagine you have copied a bunch of image files to HDFS and now you
want to feed them to, say, an executable. Odds are that such an executable
already exists, with command-line options that take, among
other things, the path of the image you would like to process.

Hadoop Streaming makes a number of environment variables available at
runtime, for instance "map_input_file", which gives you the name
of the file being processed, and so forth. My guess is that there is
also an environment variable that gives you the file path in the
local filesystem.

You need to code that in, plus add a -file parameter to ship your
executable with the job. If you are using Amazon's EMR, you will need to put your
code and executable into an S3 bucket, then pass the bucket location to
Hadoop Streaming.
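A minimal sketch of such a streaming mapper, written in Java to match the Hadoop context of this thread. It assumes the job input is a plain-text file listing one image path per line, and that each path can be opened from the task's local filesystem (for example files shipped with -file or a shared mount; an HDFS path would first have to be copied locally). The class name ImageSizeMapper and its WIDTHxHEIGHT output are hypothetical: it stands in for the existing executable described above and only shows the contract Streaming expects, records in on stdin and tab-separated key/value lines out on stdout.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import javax.imageio.ImageIO;

/**
 * Hypothetical Hadoop Streaming mapper: each stdin line is assumed to be a
 * locally readable image path; emits "path<TAB>WIDTHxHEIGHT" on stdout.
 */
public class ImageSizeMapper {

    static void run(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue; // ignore blank lines in the path list
            }
            BufferedImage img;
            try {
                img = ImageIO.read(new File(path));
            } catch (IOException e) {
                img = null; // missing or unreadable file
            }
            if (img == null) { // null also means no installed reader understands the format
                System.err.println("skipping unreadable image: " + path);
                continue;
            }
            // Streaming treats everything before the first tab as the key.
            out.println(path + "\t" + img.getWidth() + "x" + img.getHeight());
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Streaming exports job settings as environment variables with '.' replaced
        // by '_', e.g. map.input.file -> map_input_file. Null when run outside Hadoop.
        System.err.println("map_input_file=" + System.getenv("map_input_file"));
        run(new BufferedReader(new InputStreamReader(System.in)), new PrintWriter(System.out));
    }
}
```

To try it outside Hadoop, pipe a list of local image paths into the class; map_input_file then simply prints as null.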

Good luck

Daniel

On 29 November 2010 22:49, Shrijeet Paliwal <sh...@rocketfuel.com> wrote:

Re: Image as input to M-R in Hadoop

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
This gentleman (see the link below) is doing some Hadoop Streaming magic and
seems to be working with image features in a MapReduce way. It's not
using Hadoop's Java API, though, so no help there.
Still, you can check whether the article gives you some clues:
http://techportal.ibuildings.com/2009/11/02/precision-color-searching-with-gmagick-and-amazon-elastic-mapreduce/

PS: Pardon if the motivation in the article is orthogonal to yours.

-Shrijeet

On Mon, Nov 29, 2010 at 2:13 PM, Aravinth Bheemaraj
<b....@gmail.com> wrote:

Re: Image as input to M-R in Hadoop

Posted by Aravinth Bheemaraj <b....@gmail.com>.
Michael, thanks a lot for your reply.

I have to compare the images based on pixels. So is it possible to process
the image based on pixel values rather than XML records?

I have read somewhere that the class "InputFormat" can be customized to
handle images by extending "InputSplit" and "RecordReader". But I am unsure
which methods have to be overridden so that I can access the pixels of the
image. Is there any way you can help me with this?

Regarding the note, I am reading in a directory with multiple image files.
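For a directory of separate image files, one approach is an InputFormat that never splits a file and hands each whole file to the mapper as a single record. In Hadoop's newer org.apache.hadoop.mapreduce API that means subclassing FileInputFormat, overriding isSplitable(JobContext, Path) to return false, and overriding createRecordReader(...) to return a RecordReader, whose abstract methods are initialize, nextKeyValue, getCurrentKey, getCurrentValue, getProgress and close. The sketch below is standalone on purpose: it mirrors those method names but uses plain JDK types (Path, String, byte[]) in place of InputSplit, Text and BytesWritable, so it compiles without Hadoop on the classpath. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Standalone sketch of a "whole file = one record" reader. Method names mirror
 * org.apache.hadoop.mapreduce.RecordReader, but plain JDK types replace the
 * Hadoop ones so the class compiles on its own.
 */
public class WholeFileRecordReaderSketch {
    private Path file;
    private byte[] value;
    private boolean processed = false;

    /** In Hadoop: initialize(InputSplit, TaskAttemptContext); the path would come from the FileSplit. */
    public void initialize(Path file) {
        this.file = file;
        this.processed = false;
    }

    /** Returns true exactly once, after loading the entire file as the record value. */
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        value = Files.readAllBytes(file); // the whole image; no line splitting
        processed = true;
        return true;
    }

    /** Key: the file name (in Hadoop this might be a Text, or NullWritable). */
    public String getCurrentKey() {
        return file.getFileName().toString();
    }

    /** Value: the raw, still-compressed image bytes (in Hadoop, e.g. a BytesWritable). */
    public byte[] getCurrentValue() {
        return value;
    }

    /** 0 before the single record is read, 1 after. */
    public float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    public void close() {
        // Nothing to release: Files.readAllBytes opens and closes the stream itself.
    }
}
```

Because each file is read whole, the fact that JPEG data is compressed does not matter to Hadoop here; the bytes are only decoded later, inside the mapper.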

On Mon, Nov 29, 2010 at 4:08 PM, Michael Segel <mi...@hotmail.com> wrote:

-- 
Aravinth Bheemaraj
University of Florida

RE: Image as input to M-R in Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
Hi,
The short answer is yes, you can process images in Hadoop.
Think of the image as a binary byte stream rather than as lines of text.
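To get from that byte stream to pixels, the mapper can decode the bytes itself. A minimal sketch, assuming the value is the complete contents of one image file and that the format is one the JDK's javax.imageio readers support, which includes JPEG and PNG. This is also where JPEG decompression happens; Hadoop only moves the compressed bytes around. The class PixelCompare and its mean-absolute-difference metric are illustrative choices, not a standard Hadoop facility.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper a mapper could call on the raw bytes of one image file. */
public class PixelCompare {

    /** Decodes compressed image bytes (JPEG, PNG, ...) into pixels. */
    public static BufferedImage decode(byte[] bytes) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(bytes));
        if (img == null) {
            throw new IOException("no ImageIO reader for this format");
        }
        return img;
    }

    /** Mean absolute difference over the R, G and B channels (0 = identical, 255 = maximal). */
    public static double meanAbsDiff(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images differ in size");
        }
        long total = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y), q = b.getRGB(x, y); // packed ARGB ints
                for (int shift = 0; shift <= 16; shift += 8) { // blue, green, red
                    total += Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
                }
            }
        }
        return total / (3.0 * a.getWidth() * a.getHeight());
    }
}
```

Since JPEG is lossy, two separately encoded JPEGs of the same picture will usually not score exactly 0, so a threshold is more practical than an equality test.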

As for an existing class, I don't believe one exists, but it shouldn't be too difficult to cobble together.
(If you can read in XML records for processing, you should be able to read in a file containing a series of images.)
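For the case of one file holding a series of images, the file needs explicit record boundaries, because image bytes can contain newline characters. Hadoop's own SequenceFile (for example with the file name as key and the image bytes as value) is the usual container for this. The standalone sketch below instead uses a hypothetical length-prefixed layout, [name][length][bytes] repeated, only to show the idea with java.io.DataOutputStream and DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical container format: repeated [UTF name][int length][length bytes]. */
public class ImagePack {

    public static byte[] pack(Map<String, byte[]> images) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (Map.Entry<String, byte[]> e : images.entrySet()) {
                out.writeUTF(e.getKey());          // record name
                out.writeInt(e.getValue().length); // explicit length, so no delimiter is needed
                out.write(e.getValue());           // raw image bytes
            }
        }
        return bos.toByteArray();
    }

    public static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (true) {
                String name;
                try {
                    name = in.readUTF();
                } catch (EOFException end) {
                    break; // clean end of the container
                }
                byte[] data = new byte[in.readInt()];
                in.readFully(data); // a truncated record throws EOFException here
                images.put(name, data);
            }
        }
        return images;
    }
}
```

Unlike SequenceFile, this layout has no sync markers, so a reader could not start in the middle of the file; it would have to be treated as unsplittable.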

Note: I'm assuming that you're either reading in a directory with multiple image files, or an image file with multiple images. Otherwise you probably don't want to use Hadoop.


> Date: Mon, 29 Nov 2010 14:56:35 -0500
> Subject: Image as input to M-R in Hadoop
> From: b.aravinth@gmail.com
> To: general@hadoop.apache.org