Posted to common-user@hadoop.apache.org by ch...@students.iiit.ac.in on 2008/05/08 10:02:50 UTC

How to handle tif image files in hadoop

Hi,

I want to process the information in TIFF images using Hadoop. For this, a
BufferedImage object has to be created. For JPEG images, ImageIO is used
along with a ByteArrayOutputStream that holds the image's byte data. But for
TIFF images this doesn't work. Is there any way to handle this problem?

  Also, can conventional JAI library methods be used to directly access
TIFF files in HDFS?

Thank you.



Re: How to handle tif image files in hadoop

Posted by Ted Dunning <td...@veoh.com>.
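Your read loop has a bug in it: it always writes all 300 bytes of the buffer,
even when read() returned fewer, so every short read pads the image data with
zero-valued bytes and corrupts it. It also allocates far more garbage than
necessary by creating a new buffer on every pass, and the small buffer size
will slow things down somewhat.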

Try this instead:

      byte[] buffer = new byte[100000];
      int readBytes = instream.read(buffer);
      while (readBytes > 0) {
         fimb.write(buffer, 0, readBytes);
         readBytes = instream.read(buffer);
      }
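A self-contained version of the corrected loop might look like the sketch below. `StreamBytes.readAll` is a hypothetical helper, not part of Hadoop or the JDK. It assumes the caller opens the stream; with HDFS that would be `fs.open(inFile)`, whose `FSDataInputStream` is an ordinary `java.io.InputStream`. It reuses one buffer and writes only the bytes each `read()` actually returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class StreamBytes {
    // Copies everything from the stream into a byte array, reusing a single buffer.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        int n;
        // read() returns -1 at end of stream; a short read is normal, not an error
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the n bytes actually read
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[150_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length + " " + Arrays.equals(data, copy)); // prints "150000 true"
    }
}
```

On Java 9 and later, `InputStream.readAllBytes()` does the same job.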

But even if you manage to read the data correctly, why are you doing all the
work of reading the data into a buffer and then reading it into an image?

Why not just replace everything from line 3 on with this:

      BufferedImage picture = ImageIO.read(instream);
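The one-line suggestion can be fleshed out as in this sketch with a hypothetical `decode` helper. It relies on two documented behaviors of `ImageIO.read(InputStream)`: it returns `null` (rather than throwing) when no registered reader recognizes the data, and it does not close the stream. The `main` method round-trips a small in-memory PNG so the sketch runs without a cluster.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class DecodeFromStream {
    // Decodes an image straight from any InputStream (e.g. an HDFS FSDataInputStream).
    // ImageIO.read returns null when no registered reader claims the data,
    // so turn that into an explicit error instead of a later NullPointerException.
    static BufferedImage decode(InputStream in) throws IOException {
        BufferedImage img = ImageIO.read(in);
        if (img == null) {
            throw new IOException("no ImageIO reader for this data "
                    + "(TIFF needs Java 9+ or a plugin such as JAI Image I/O)");
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file opened from HDFS: a 4x3 PNG encoded in memory.
        BufferedImage src = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        src.setRGB(1, 1, 0xFF0000);
        ByteArrayOutputStream png = new ByteArrayOutputStream();
        ImageIO.write(src, "png", png);
        BufferedImage img = decode(new ByteArrayInputStream(png.toByteArray()));
        System.out.println(img.getWidth() + "x" + img.getHeight()); // prints "4x3"
    }
}
```

With HDFS you would call `decode(fs.open(inFile))` inside a try-with-resources block, since `ImageIO.read` leaves closing the stream to the caller.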


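What you are doing here will work reasonably well if you have an input path
name (for instance, because your map input contains file names), but it loses
all locality: each map task is scheduled near the split holding the file
names, not near the blocks of the images it then opens.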

It would be better to copy or reinvent some of the archive code so that you
can put all of your images in a few files. All you need from your envelope
is a byte count for each image. Thus, if your input file has a 4-byte
integer containing the size of the following image, you can build a very
simple input format that will read images and pass them to your mapper.
Doing that allows Hadoop to position the compute tasks near your data, which
will improve your performance dramatically.
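The 4-byte length envelope described above might look like the following sketch. The class and method names are hypothetical, and the layout (a big-endian `int` from `DataOutputStream.writeInt`, then the image bytes) is one assumed way to lay out the file, not an existing Hadoop format. Reading uses `readFully`, so a truncated last record surfaces as an `EOFException` rather than a silently short image.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ImagePack {
    // Appends one record: 4-byte big-endian length, then the raw image bytes.
    static void writeRecord(DataOutputStream out, byte[] image) throws IOException {
        out.writeInt(image.length);
        out.write(image);
    }

    // Reads every record until end of stream; a truncated record throws EOFException.
    static List<byte[]> readRecords(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        List<byte[]> images = new ArrayList<>();
        int first;
        while ((first = in.read()) != -1) { // -1 here means a clean end between records
            // Rebuild the big-endian length from the byte already read plus three more.
            int len = (first << 24) | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
            if (len < 0) {
                throw new IOException("corrupt record length " + len);
            }
            byte[] image = new byte[len];
            in.readFully(image);
            images.add(image);
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            writeRecord(out, new byte[] {1, 2, 3});
            writeRecord(out, new byte[] {4, 5});
        }
        List<byte[]> images = readRecords(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(images.size() + " records, " + buf.size() + " bytes"); // prints "2 records, 13 bytes"
    }
}
```

Because a split boundary can fall in the middle of a record, an input format built on this layout would either read each container file in a single task (for example by returning `false` from `isSplitable`) or need sync markers. Hadoop's `SequenceFile`, with one `BytesWritable` value per image, is a ready-made container that already handles this.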


On 5/9/08 6:22 AM, "charan@students.iiit.ac.in" <ch...@students.iiit.ac.in>
wrote:

> Hi,
> 
>   Thank you, sir, for telling me about one more aspect of Hadoop.
> But I used JAI and processed our files by reading them as bytes from HDFS
> and passing them to the JAI library for TIFF, and it worked :)
> 
> For those who want to work with TIFF files in HDFS, here is one way:
> 
>           Path inFile = new Path(infilename);
>           FSDataInputStream instream = fs.open(inFile);
>           ByteArrayOutputStream fimb = new ByteArrayOutputStream();
>           byte[] buffer = new byte[300];
>           int readBytes = 0;
>           while((readBytes = instream.read(buffer)) > 0)
>           {
>               fimb.write(buffer,0,300);
>               buffer = new byte[300];
>           }
>           byte[] formattedImageBytes;
>           formattedImageBytes = fimb.toByteArray();
>           BufferedImage picture = ImageIO.read(
>                   new ByteArrayInputStream(formattedImageBytes));
> 
> Once we have a BufferedImage, it is easy to process, since it is an
> ordinary Java object.
> 
> Thank you.


Re: How to handle tif image files in hadoop

Posted by ch...@students.iiit.ac.in.
Hi,

Thank you, sir, for telling me about one more aspect of Hadoop.
But I used JAI and processed our files by reading them as bytes from HDFS
and passing them to the JAI library for TIFF, and it worked :)

For those who want to work with TIFF files in HDFS, here is one way:

          Path inFile = new Path(infilename);
          FSDataInputStream instream = fs.open(inFile);
          ByteArrayOutputStream fimb = new ByteArrayOutputStream();
          byte[] buffer = new byte[300];
          int readBytes = 0;
          while((readBytes = instream.read(buffer)) > 0)
          {
              fimb.write(buffer,0,300);
              buffer = new byte[300];
          }
          byte[] formattedImageBytes;
          formattedImageBytes = fimb.toByteArray();
          BufferedImage picture = ImageIO.read(
                  new ByteArrayInputStream(formattedImageBytes));

Once we have a BufferedImage, it is easy to process, since it is an
ordinary Java object.
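As an illustration of what such processing could look like, the sketch below computes one number per image: the mean of (r + g + b) / 3 over all pixels. The class name and the brightness measure are hypothetical choices for this example; in a map task the result might be emitted keyed by file name.

```java
import java.awt.image.BufferedImage;

public class MeanBrightness {
    // Mean of (r + g + b) / 3 over all pixels, on a 0..255 scale.
    static double meanBrightness(BufferedImage img) {
        long sum = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int argb = img.getRGB(x, y); // pixel converted to the default sRGB ARGB model
                sum += ((argb >> 16) & 0xFF) + ((argb >> 8) & 0xFF) + (argb & 0xFF);
            }
        }
        return sum / (3.0 * img.getWidth() * img.getHeight());
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB); // starts all black
        img.setRGB(0, 0, 0xFFFFFF); // one white pixel
        System.out.println(meanBrightness(img)); // prints 63.75
    }
}
```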

Thank you.


Re: How to handle tif image files in hadoop

Posted by Peeyush Bishnoi <pe...@yahoo-inc.com>.
Hello,

It's better that you write your own InputFormat for processing the TIFF
images. For more information, see the InputFormat API documentation:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html
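This suggestion can be combined with the length-prefixed layout Ted Dunning describes in this thread. In Hadoop's newer `org.apache.hadoop.mapreduce` API, the framework drives a `RecordReader` through `initialize`, `nextKeyValue`, `getCurrentKey`, `getCurrentValue`, `getProgress` and `close`; the older `org.apache.hadoop.mapred` API linked above uses `next(key, value)` instead. The sketch below only mimics the newer contract in plain Java so it runs without Hadoop on the classpath; the class name and the byte-offset key are assumptions. In a real job this logic would live in a `RecordReader` subclass returned by a `FileInputFormat` whose `isSplitable` returns `false`, with the key and value wrapped in `LongWritable` and `BytesWritable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Plain-Java stand-in for a Hadoop RecordReader over length-prefixed images.
// Method names mirror org.apache.hadoop.mapreduce.RecordReader; no Hadoop classes are used.
public class LengthPrefixedRecordReader {
    private final DataInputStream in;
    private long pos = 0;          // byte offset where the next record starts
    private long currentKey = -1;  // byte offset of the current record
    private byte[] currentValue;   // raw bytes of the current image

    public LengthPrefixedRecordReader(InputStream raw) {
        this.in = new DataInputStream(raw);
    }

    // Advances to the next record; returns false at a clean end of stream.
    public boolean nextKeyValue() throws IOException {
        int first = in.read();
        if (first == -1) {
            currentValue = null;
            return false;
        }
        int len = (first << 24) | (in.readUnsignedByte() << 16)
                | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
        if (len < 0) {
            throw new IOException("corrupt record length " + len);
        }
        byte[] value = new byte[len];
        in.readFully(value); // EOFException if the record is truncated
        currentKey = pos;
        currentValue = value;
        pos += 4L + len;
        return true;
    }

    public long getCurrentKey() { return currentKey; }
    public byte[] getCurrentValue() { return currentValue; }
    public void close() throws IOException { in.close(); }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(3); out.write(new byte[] {1, 2, 3});
            out.writeInt(2); out.write(new byte[] {4, 5});
        }
        LengthPrefixedRecordReader reader =
                new LengthPrefixedRecordReader(new ByteArrayInputStream(buf.toByteArray()));
        while (reader.nextKeyValue()) {
            // prints "offset 0: 3 bytes" then "offset 7: 2 bytes"
            System.out.println("offset " + reader.getCurrentKey() + ": "
                    + reader.getCurrentValue().length + " bytes");
        }
        reader.close();
    }
}
```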

---
Peeyush

RE: How to handle tif image files in hadoop

Posted by Ted Dunning <td...@veoh.com>.
Since you get an InputStream from HDFS, you should be able to use any standard I/O package, including all of the image I/O stuff.

How do you normally read TIFF files?
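A likely reason TIFF "doesn't work" where JPEG does is that `ImageIO.read` returns `null` when no registered reader recognizes the data, and the JDK only gained a built-in TIFF reader in Java 9. Older JDKs need a plugin such as JAI Image I/O Tools on the classpath (in a Hadoop job, on every task's classpath). The hypothetical helper below checks which formats the running JVM can decode, using `ImageIO.getImageReadersByFormatName` and `ImageIO.getReaderFormatNames`; its output depends on the JDK and installed plugins.

```java
import java.util.Arrays;
import java.util.TreeSet;
import javax.imageio.ImageIO;

public class TiffSupport {
    // True if at least one registered ImageIO plugin can read the named format.
    static boolean canRead(String formatName) {
        return ImageIO.getImageReadersByFormatName(formatName).hasNext();
    }

    public static void main(String[] args) {
        // Every format name the currently registered readers advertise, sorted.
        System.out.println("readers: " + new TreeSet<>(Arrays.asList(ImageIO.getReaderFormatNames())));
        System.out.println("tiff readable: " + canRead("tiff"));
    }
}
```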

-----Original Message-----
From: charan@students.iiit.ac.in [mailto:charan@students.iiit.ac.in]
Sent: Thu 5/8/2008 1:02 AM
To: core-user@hadoop.apache.org
Subject: How to handle tif image files in hadoop
 
Hi,

 I want to process the information in tif images using hadoop. For this, a
BufferedImage object has to be created. For JPEG images, ImageIO is used
alongwith the ByteArrayOutputStream which contains byte data of the
image. But  for TIFF image,this doesn't work. Is there any way to handle
this problem?

  Also, can conventional JAI library methods be used to directly access
TIFF files in HDFS?

Thank you.