Posted to common-user@hadoop.apache.org by Robert Krüger <kr...@signal7.de> on 2008/05/16 10:22:19 UTC

Making the case for Hadoop

Hi,

I'm currently trying to make the case for using Hadoop (or more 
precisely HDFS) as part of a storage architecture for a large media 
asset repository. HDFS will be used for storing up to a total of 1 PB of 
high-resolution video (average file size will be > 1GB). We think HDFS 
is a perfect match for these requirements but have to convince a rather 
conservative IT executive who is worried about

- Hadoop not being mature enough (reference customers or case studies 
for similar projects, e.g. TV stations running their video archives 
using HDFS would probably help)
- Hadoop knowledge being not widely available should our customer choose 
to change their hosting partner or systems integrator (a list of 
consulting/hosting firms having Hadoop expertise would help)

Thanks in advance for any pointers which help me make the case.
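
For reference, this is roughly the usage pattern we have in mind, sketched
with the Hadoop Java FileSystem API (NameNode address and paths are
placeholders, not our actual setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IngestVideo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address, for illustration only.
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        FileSystem fs = FileSystem.get(conf);

        // Copy one multi-gigabyte master file into the repository; HDFS
        // splits it into blocks and replicates each block across DataNodes.
        Path local = new Path("/media/incoming/clip-0001.mov");
        Path remote = new Path("/archive/video/clip-0001.mov");
        fs.copyFromLocalFile(local, remote);

        System.out.println("Stored " + remote + ", "
                + fs.getFileStatus(remote).getLen() + " bytes");
        fs.close();
    }
}

The same ingest could just as well be scripted with the hadoop fs shell
commands; the point is only that the write path is a plain file copy.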

Robert





Re: Making the case for Hadoop

Posted by Ted Dunning <td...@veoh.com>.
Nothing is the best case!


On 5/16/08 7:00 AM, "Edward Capriolo" <ed...@gmail.com> wrote:

> So Hadoop is a fact. My advice for convincing IT executives: ask them
> to present their alternative (usually it's nothing).


Re: Making the case for Hadoop

Posted by Edward Capriolo <ed...@gmail.com>.
Conservative IT executive... Sounds like you're working at my last job. :)

Yahoo uses Hadoop for a very large cluster:
http://developer.yahoo.com/blogs/hadoop/

And after all, Hadoop is a work-alike of the Google File System, which
Google uses for all types of satellite data.

The New York Times is using Hadoop:
http://open.blogs.nytimes.com/tag/hadoop/

Rackspace, a big hosting provider, uses Hadoop too:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

So Hadoop is a fact. My advice for convincing IT executives: ask them
to present their alternative (usually it's nothing).

On Fri, May 16, 2008 at 4:22 AM, Robert Krüger <kr...@signal7.de> wrote:
>
> Hi,
>
> I'm currently trying to make the case for using Hadoop (or more precisely
> HDFS) as part of a storage architecture for a large media asset repository.
> HDFS will be used for storing up to a total of 1 PB of high-resolution video
> (average file size will be > 1GB). We think HDFS is a perfect match for
> these requirements but have to convince a rather conservative IT executive
> who is worried about
>
> - Hadoop not being mature enough (reference customers or case studies for
> similar projects, e.g. TV stations running their video archives using HDFS
> would probably help)
> - Hadoop knowledge being not widely available should our customer choose to
> change their hosting partner or systems integrator (a list of
> consulting/hosting firms having Hadoop expertise would help)
>
> Thanks in advance for any pointers which help me make the case.
>
> Robert
>
>
>
>
>

Re: Making the case for Hadoop

Posted by Ted Dunning <td...@veoh.com>.
Here at Veoh, we have committed to this style of file system in a very big
way.  We currently have around a billion files that we manage using
replicated file storage.

We didn't go with HDFS for this, but the reasons probably do not apply in
your case.  In our case, we have lots (as in LOTS) of files, many of which
are relatively small.  The sheer number of files made it better for us to go
with an alternative (MogileFS, heavily patched).  That choice had issues as
well since Mogile was not very well engineered for scaling to true web
scale.  We felt then, and I think that this is still true, that putting the
effort into Mogile to stabilize it was a bit easier than putting the effort
into Hadoop to scale it.

We also run Hadoop for log processing and are very happy.

Our experience with the replicated file store lifestyle has been excellent.
Both Mogile and HDFS have given us reliable file storage and nearly 100%
uptime.

For your application, I would be pretty confident that HDFS would be a very
effective solution and would probably be much better than Mogile.  The
advantage over Mogile would largely be due to the fact that HDFS breaks
files across servers so the quantum of file management would be much smaller
for your application.
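
As a small illustration of that point (a sketch with a placeholder path and
default configuration, not code from our systems), the FileSystem API will
tell you which DataNodes hold each block of a stored file:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLayout {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Placeholder path; any large file in HDFS works the same way.
        FileStatus status =
                fs.getFileStatus(new Path("/archive/video/clip-0001.mov"));

        // One BlockLocation per block; each block is stored on several
        // DataNodes (the replication factor), so no single server failure
        // can take a file offline.
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());
        for (int i = 0; i < blocks.length; i++) {
            System.out.println("block " + i
                    + " offset=" + blocks[i].getOffset()
                    + " hosts=" + Arrays.toString(blocks[i].getHosts()));
        }
        fs.close();
    }
}

Because a multi-gigabyte video becomes a series of independently replicated
blocks, recovering from a lost DataNode means re-replicating blocks, not
copying whole files around.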

Hiring Hadoop-experienced engineers is still difficult, but we have found
that it takes very little time for engineers to come up to speed on using
Hadoop, and HDFS concepts take even less time.  With recent versions having
file system level interfaces integrated, it should be even easier to manage
these systems.  We are beginning to see Hadoop experience on resumes, but I
would guess that there will be a lag in Europe before you begin to see that.
There will also be some bias in our favor because we have a relatively high
profile as "cool" place to work.

On 5/16/08 1:22 AM, "Robert Krüger" <kr...@signal7.de> wrote:

> 
> Hi,
> 
> I'm currently trying to make the case for using Hadoop (or more
> precisely HDFS) as part of a storage architecture for a large media
> asset repository. HDFS will be used for storing up to a total of 1 PB of
> high-resolution video (average file size will be > 1GB). We think HDFS
> is a perfect match for these requirements but have to convince a rather
> conservative IT executive who is worried about
> 
> - Hadoop not being mature enough (reference customers or case studies
> for similar projects, e.g. TV stations running their video archives
> using HDFS would probably help)
> - Hadoop knowledge being not widely available should our customer choose
> to change their hosting partner or systems integrator (a list of
> consulting/hosting firms having Hadoop expertise would help)
> 
> Thanks in advance for any pointers which help me make the case.
> 
> Robert
> 
> 
> 
>