Posted to common-user@hadoop.apache.org by ni...@btconnect.com on 2011/03/03 08:23:06 UTC
Hadoop and image processing?
How applicable would Hadoop be to the processing of thousands of large
(60-100MB) 3D image files accessible via NFS, using a 100+ machine cluster?
Does the idea have any merit at all?
If so, perhaps there is a description or use-case example kicking around
somewhere that I could read as a first step?
Thanks, buk.
Re: Hadoop and image processing?
Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Mar 3, 2011 at 10:00 AM, Tom Deutsch <td...@us.ibm.com> wrote:
> Along with Brian I'd also suggest it depends on what you are doing with
> the images, but we used Hadoop specifically for this purpose in several
> solutions we built to do advanced image processing. Both the scale-out
> to large data volumes and (in our case) the compute to do the image
> classification were well suited to Hadoop.
It can't be done.
http://open.blogs.nytimes.com/2008/05/21/the-new-york-times-archives-amazon-web-services-timesmachine/
Just kidding :)
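The TimesMachine link above is a useful reference point: the recurring pattern for image inputs on Hadoop is to hand each whole file to a single map task rather than letting the framework split it. A quick sketch of the block arithmetic behind that choice follows; the 64 MB block size is an assumption (it was the Hadoop default of that era), and the file sizes are taken from the question.

```java
public class BlockSizing {
    static final long MB = 1024L * 1024L;

    // Number of HDFS blocks a file of fileBytes occupies at the given
    // block size (ceiling division).
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long block = 64 * MB; // assumed: Hadoop's default block size circa 2011
        System.out.println(blocksFor(60 * MB, block));  // a 60 MB image fits in 1 block
        System.out.println(blocksFor(100 * MB, block)); // a 100 MB image straddles 2 blocks
    }
}
```

Because a 100 MB image straddles two blocks, the usual approach is a FileInputFormat whose isSplitable() returns false, so each image reaches exactly one mapper intact instead of being cut mid-file.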
Re: Hadoop and image processing?
Posted by Tom Deutsch <td...@us.ibm.com>.
Along with Brian I'd also suggest it depends on what you are doing with
the images, but we used Hadoop specifically for this purpose in several
solutions we built to do advanced image processing. Both the scale-out
to large data volumes and (in our case) the compute to do the image
classification were well suited to Hadoop.
------------------------------------------------
Tom Deutsch
Program Director
CTO Office: Information Management
Hadoop Product Manager / Customer Exec
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeutsch@us.ibm.com
Re: Hadoop and image processing?
Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Mar 3, 2011, at 1:23 AM, nigelsandever@btconnect.com wrote:
> How applicable would Hadoop be to the processing of thousands of large (60-100MB) 3D image files accessible via NFS, using a 100+ machine cluster?
>
> Does the idea have any merit at all?
>
It may be a good idea. If you think the above is a viable architecture for data processing, then you likely don't "need" Hadoop because your problem is small enough, or you spent way too much money on your NFS server.
Whether or not you "need" Hadoop for data scalability - petabytes of data moved at gigabytes a second - is a small aspect of the question.
Hadoop is a good data processing platform in its own right. Traditional batch systems tend to have very Unix-friendly APIs for data processing (you'll find yourself writing perl scripts that create text submit files, shell scripts, and C code), but appear clumsy to "modern developers" (this is speaking as someone who lives and breathes batch systems). Hadoop has "nice" Java APIs and is Java-developer friendly, has a lot of data processing concepts built in compared to batch systems, and extends OK to other languages.
If you write your image processing in Java, it would be silly not to consider Hadoop. If you currently run a bag full of shell scripts and C++ code, it's a tougher decision to make.
Brian
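Brian's "your problem is small enough" point is easy to make concrete. A back-of-envelope sketch follows; the file count and average size are assumptions extrapolated from the question's "thousands" of 60-100 MB files, not numbers from the thread.

```java
public class BackOfEnvelope {
    // Total input size for a job of `files` files averaging `avgBytes` each.
    static long totalBytes(long files, long avgBytes) {
        return files * avgBytes;
    }

    // Data each node would handle if the work spread evenly across machines.
    static double perNodeGB(long totalBytes, int machines) {
        return totalBytes / (1024.0 * 1024 * 1024) / machines;
    }

    public static void main(String[] args) {
        // Assumed: 5,000 files at an 80 MB average, on 100 machines.
        long total = totalBytes(5_000, 80L * 1024 * 1024);
        System.out.printf("total ~%.0f GB, ~%.1f GB per node%n",
                total / (1024.0 * 1024 * 1024), perNodeGB(total, 100));
    }
}
```

Under those assumptions the whole corpus is a few hundred GB and each node sees only a few GB, which is why the deciding factor here is the programming model (Java APIs, per-file scheduling, retries) rather than raw data scalability.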