Posted to common-user@hadoop.apache.org by Sameer Tilak <sa...@gmail.com> on 2009/04/10 23:19:54 UTC

Hadoop and Image analysis question

Hi everyone,
I would like to use Hadoop to analyze tens of thousands of images.
Ideally, each mapper would get a few hundred images to process, and I would
have a few hundred mappers. However, I want each mapper to run on the machine
where its images are stored. How can I achieve that? With text data, creating
splits and exploiting locality seems easy.

One option would be to make the input to the map function a text file in
which each line contains the name of an image to be processed. The mapper
would then parse the file, read each image file name, and process that image.
Unfortunately, one drawback of this scheme is that the image file itself
might be stored on a different machine from the one running the mapper.
Copying the file over the network would be quite inefficient. Any
help on this would be great.
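
For concreteness, here is a rough sketch of the list-of-names mapper described
above, written as a Hadoop Streaming-style script in Python (reads lines on
stdin, writes tab-separated key/value pairs on stdout). The names run_mapper,
open_local and process_image are hypothetical, and process_image is only a
placeholder for the real analysis. It assumes each image name is a path the
mapper can open directly, which is exactly where the locality problem shows up.

```python
import sys


def process_image(data):
    """Hypothetical stand-in for the real analysis: returns the byte count."""
    return len(data)


def open_local(name):
    """Assumption: the image name is a path readable from the mapper's machine."""
    return open(name, "rb")


def run_mapper(lines, open_image, out=sys.stdout):
    """Treat each input line as one image name and emit 'name<TAB>result'."""
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines in the list file
        # This read is the step that may cross the network when the image
        # is not stored on the machine running this mapper.
        with open_image(name) as f:
            result = process_image(f.read())
        out.write("%s\t%s\n" % (name, result))


if __name__ == "__main__":
    run_mapper(sys.stdin, open_local)
```

The opener is passed in as a parameter only so the file access can be swapped
out; nothing in this sketch makes the read local.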