Posted to hdfs-dev@hadoop.apache.org by Jeff Zhang <zj...@gmail.com> on 2010/02/21 16:49:22 UTC

What is the biggest problem of extremely large hadoop cluster ?

Hi,

I am curious to know what the biggest problem with an extremely large Hadoop
cluster is. The one I can imagine now is the memory cost of the HDFS metadata
held in the NameNode. One solution I can think of is to store the metadata in
another storage implementation, such as a database, although that has a
performance cost. Are there any other solutions, or any other problems with
extremely large Hadoop clusters?
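A back-of-the-envelope sketch of the NameNode memory concern. It assumes the frequently quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (each file, directory, and block); that constant, the function name, and the 100-million-file workload are hypothetical illustrations, not measured figures.

```python
# Rough NameNode heap estimate.
# ASSUMPTION: each namespace object (file, directory, or block) costs about
# 150 bytes of heap. This is a commonly quoted rule of thumb, not a measurement.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(files: int, blocks_per_file: float, directories: int = 0) -> float:
    """Estimate the heap needed to keep the whole namespace in memory."""
    blocks = files * blocks_per_file
    objects = files + blocks + directories
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical namespace: 100 million files, one block each.
    heap = namenode_heap_bytes(files=100_000_000, blocks_per_file=1)
    print(f"{heap / 1e9:.0f} GB")
```

With one block per file, 100 million files already mean about 200 million objects, or roughly 30 GB of heap under this assumption, which is why many small files strain the NameNode more than a few large ones holding the same data.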



-- 
Best Regards

Jeff Zhang

Re: What is the biggest problem of extremely large hadoop cluster ?

Posted by Jason Venner <ja...@gmail.com>.
Underlying network bandwidth and rack locality, as well as the
operational overhead of managing the machines. Beyond a certain scale,
at least one machine will almost always be failing.
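The "at least one machine failing" point can be made concrete with a simple probability model. This sketch assumes machines fail independently, each with the same probability p over the period of interest; both assumptions are simplifications (real failures are often correlated, e.g. a whole rack losing power), and the value of p below is hypothetical.

```python
# Probability that at least one of n machines is down during some period,
# ASSUMING independent failures with an identical per-machine probability p.
def p_at_least_one_failure(n: int, p: float) -> float:
    # Complement of "every machine survives": 1 - (1 - p)^n
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-machine daily failure probability
    for n in (100, 1_000, 4_000):
        print(n, round(p_at_least_one_failure(n, p), 3))
```

With p = 0.001, the chance of at least one failure in a day is roughly 10% for 100 machines, 63% for 1,000, and 98% for 4,000, so at large scale failure handling has to be routine rather than exceptional.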


On Sun, Feb 21, 2010 at 7:54 AM, Jeff Zhang <zj...@gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Jeff Zhang <zj...@gmail.com>
> Date: Sun, Feb 21, 2010 at 7:49 AM
> Subject: What is the biggest problem of extremely large hadoop cluster ?
> To: hdfs-dev@hadoop.apache.org
>
>
> Hi,
>
> I am curious to know what the biggest problem with an extremely large Hadoop
> cluster is. The one I can imagine now is the memory cost of the HDFS metadata
> held in the NameNode. One solution I can think of is to store the metadata in
> another storage implementation, such as a database, although that has a
> performance cost. Are there any other solutions, or any other problems with
> extremely large Hadoop clusters?
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Fwd: What is the biggest problem of extremely large hadoop cluster ?

Posted by Jeff Zhang <zj...@gmail.com>.
---------- Forwarded message ----------
From: Jeff Zhang <zj...@gmail.com>
Date: Sun, Feb 21, 2010 at 7:49 AM
Subject: What is the biggest problem of extremely large hadoop cluster ?
To: hdfs-dev@hadoop.apache.org


Hi,

I am curious to know what the biggest problem with an extremely large Hadoop
cluster is. The one I can imagine now is the memory cost of the HDFS metadata
held in the NameNode. One solution I can think of is to store the metadata in
another storage implementation, such as a database, although that has a
performance cost. Are there any other solutions, or any other problems with
extremely large Hadoop clusters?



-- 
Best Regards

Jeff Zhang



-- 
Best Regards

Jeff Zhang

Re: What is the biggest problem of extremely large hadoop cluster ?

Posted by Scott Chen <cy...@facebook.com>.
Another problem is that TaskTracker (TT) heartbeats slow down on a large
cluster, which makes job latency worse.
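One way to see why heartbeats slow down as the cluster grows is a simple rate model. This sketch assumes the master can handle at most a fixed number of heartbeats per second, so with n TaskTrackers the interval must be at least n divided by that rate, plus a fixed minimum interval. The constants (100 heartbeats per second, a 3-second floor) and the function names are illustrative assumptions, not values taken from a specific Hadoop release.

```python
import math

# ASSUMED constants, for illustration only.
HEARTBEATS_PER_SECOND = 100  # heartbeats the master is assumed to handle per second
MIN_INTERVAL_MS = 3_000      # assumed floor on the heartbeat interval


def heartbeat_interval_ms(tasktrackers: int) -> int:
    """Interval each TaskTracker waits between heartbeats under the rate cap."""
    # n trackers each heartbeating once per interval T gives n / T heartbeats
    # per second; keeping that <= the cap requires T >= n / cap seconds.
    seconds = math.ceil(tasktrackers / HEARTBEATS_PER_SECOND)
    return max(1000 * seconds, MIN_INTERVAL_MS)


def mean_report_delay_ms(tasktrackers: int) -> float:
    """Average wait between a task finishing and the master hearing about it,
    assuming completions are spread uniformly within a heartbeat interval."""
    return heartbeat_interval_ms(tasktrackers) / 2


if __name__ == "__main__":
    for n in (100, 1_000, 4_000):
        print(n, heartbeat_interval_ms(n), mean_report_delay_ms(n))
```

Under these assumptions a 4,000-node cluster heartbeats every 40 seconds, so a finished task waits about 20 seconds on average before the master learns of it and can hand out new work, which adds directly to the latency of short jobs.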


On 2/21/10 7:49 AM, "Jeff Zhang" <zj...@gmail.com> wrote:

> Hi,
> 
> I am curious to know what the biggest problem with an extremely large Hadoop
> cluster is. The one I can imagine now is the memory cost of the HDFS metadata
> held in the NameNode. One solution I can think of is to store the metadata in
> another storage implementation, such as a database, although that has a
> performance cost. Are there any other solutions, or any other problems with
> extremely large Hadoop clusters?
> 
>