Posted to common-user@hadoop.apache.org by Xiaobo Gu <gu...@gmail.com> on 2011/04/26 15:55:05 UTC

Cluster hardware question

Hi,

 People say a balanced server configuration is as follows:

 2 x 4-core CPUs, 24 GB RAM, 4 x 1 TB SATA disks

But we are used to using storage servers with 24 x 1 TB SATA disks, and
we are wondering whether Hadoop will be CPU-bound if this kind of server
is used. Does anybody have experience with Hadoop running on servers
with so many disks?

Regards,

Xiaobo Gu

Re: Cluster hardware question

Posted by Xiaobo Gu <gu...@gmail.com>.
On Wed, Apr 27, 2011 at 7:07 PM, Steve Loughran <st...@apache.org> wrote:
> On 26/04/11 14:55, Xiaobo Gu wrote:
>>
>> Hi,
>>
>>  People say a balanced server configuration is as follows:
>>
>>  2 x 4-core CPUs, 24 GB RAM, 4 x 1 TB SATA disks
>>
>> But we are used to using storage servers with 24 x 1 TB SATA disks, and
>> we are wondering whether Hadoop will be CPU-bound if this kind of server
>> is used. Does anybody have experience with Hadoop running on servers
>> with so many disks?
>
> Some of the new clusters are running one or two 6-core CPUs with 12 x 2 TB
> 3.5" HDDs for storage, as this gives maximum storage density (it fits in a
> 1U). The exact ratio of CPU:RAM:disk depends on the application.
>
> What you get with the big servers is
>  -a higher probability of local access
>  -great IO bandwidth, especially if you set up the mapred.temp.dir value to
> include all the drives.

Do you mean mapred.local.dir? Another question: how does MapReduce write
to mapred.local.dir, in round-robin? And is mixing mapred.local.dir and
dfs.data.dir a common practice?
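
For reference: both properties take a comma-separated list of directories,
and Hadoop spreads writes across the listed directories (roughly
round-robin), so listing one directory per physical drive spreads the I/O.
Putting mapred.local.dir and dfs.data.dir on the same drives, in separate
directories, is common practice. A minimal sketch, assuming the drives are
mounted at the hypothetical paths /data/1 ... /data/4:

  <!-- mapred-site.xml: spread intermediate map output over all drives -->
  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local,/data/3/mapred/local,/data/4/mapred/local</value>
  </property>

  <!-- hdfs-site.xml: HDFS block storage on the same drives -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data,/data/4/dfs/data</value>
  </property>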


>  -fewer servers means fewer network ports on the switches, so you can save
> some money in the network fabric, and in the time/effort of cabling
> everything up.
>
> What do you lose?
>  -in a small cluster, loss of a single machine matters
>  -in a large cluster, loss of a single machine can generate up to 24TB of
> replication traffic (more once 3TB HDDs become affordable)
>  -in a large cluster, loss of a rack (or switch) can generate a very large
> amount of traffic.
>
> If you were building a large (multi-PB) cluster, this design is good for
> storage density -you could get a petabyte in a couple of racks, though the
> replication costs of a Top of Rack switch failure might push you towards
> 2xToR switches and bonded NICs, which introduce a whole new set of problems.
>
> For smaller installations? I don't know.
>
> -Steve
>

Re: Cluster hardware question

Posted by Steve Loughran <st...@apache.org>.
On 26/04/11 14:55, Xiaobo Gu wrote:
> Hi,
>
>   People say a balanced server configuration is as follows:
>
>   2 x 4-core CPUs, 24 GB RAM, 4 x 1 TB SATA disks
>
> But we are used to using storage servers with 24 x 1 TB SATA disks, and
> we are wondering whether Hadoop will be CPU-bound if this kind of server
> is used. Does anybody have experience with Hadoop running on servers
> with so many disks?

Some of the new clusters are running one or two 6-core CPUs with 12 x 2 TB
3.5" HDDs for storage, as this gives maximum storage density (it fits in
a 1U). The exact ratio of CPU:RAM:disk depends on the application.

What you get with the big servers is
  -a higher probability of local access
  -great IO bandwidth, especially if you set up the mapred.temp.dir 
value to include all the drives.
  -fewer servers means fewer network ports on the switches, so you can
save some money in the network fabric, and in the time/effort of cabling
everything up.

What do you lose?
  -in a small cluster, loss of a single machine matters
  -in a large cluster, loss of a single machine can generate up to 24TB 
of replication traffic (more once 3TB HDDs become affordable)
  -in a large cluster, loss of a rack (or switch) can generate a very 
large amount of traffic.
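
To put rough numbers on the single-machine case: a nearly full node with
24 x 1 TB drives holds on the order of 24 TB of block data, all of which
HDFS must re-replicate from the surviving replicas elsewhere. Assuming,
hypothetically, that the cluster can sustain 10 Gbit/s of aggregate
re-replication bandwidth:

  24 TB / 1.25 GB/s (10 Gbit/s) = ~19,200 s, i.e. roughly 5.3 hours

of background replication traffic before full redundancy is restored.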

If you were building a large (multi-PB) cluster, this design is good for
storage density -you could get a petabyte in a couple of racks, though
the replication costs of a Top of Rack switch failure might push you
towards 2xToR switches and bonded NICs, which introduce a whole new set
of problems.

For smaller installations? I don't know.

-Steve

Re: Cluster hardware question

Posted by Xiaobo Gu <gu...@gmail.com>.
On Tue, Apr 26, 2011 at 11:30 PM, Michel Segel
<mi...@hotmail.com> wrote:
> Hi,
> Actually, if you have 2 x 4-core Xeon CPUs... you will become I/O-bound with 4 drives.
> The rule of thumb tends to be 2 disks per core, so you would want 16 drives per node... at least in theory.

> 24 1TB drives would be interesting, but I'm not sure what sort of problems you could expect to encounter until you had to expand the cluster...
So I can change the CPU configuration to 2 x 6-core Xeons.

What problems do you mean we will encounter when expanding the cluster?
Too much data to move?
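
(One concrete expansion problem: HDFS does not move existing blocks onto
new nodes by itself; only newly written data lands there. You have to run
the stock balancer to even out utilization, and with ~24 TB per node there
is a lot to move. A sketch, assuming a 10% utilization threshold:

  hadoop balancer -threshold 10

The balancer deliberately throttles its bandwidth, so rebalancing a dense
cluster can take a long time.)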

> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Apr 26, 2011, at 8:55 AM, Xiaobo Gu <gu...@gmail.com> wrote:
>
>> Hi,
>>
>> People say a balanced server configuration is as follows:
>>
>> 2 x 4-core CPUs, 24 GB RAM, 4 x 1 TB SATA disks
>>
>> But we are used to using storage servers with 24 x 1 TB SATA disks, and
>> we are wondering whether Hadoop will be CPU-bound if this kind of server
>> is used. Does anybody have experience with Hadoop running on servers
>> with so many disks?
>>
>> Regards,
>>
>> Xiaobo Gu
>>
>

Re: Cluster hardware question

Posted by Michel Segel <mi...@hotmail.com>.
Hi,
Actually, if you have 2 x 4-core Xeon CPUs... you will become I/O-bound with 4 drives.
The rule of thumb tends to be 2 disks per core, so you would want 16 drives per node... at least in theory.
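
(Worked through: 24 drives at 2 drives per core calls for 24 / 2 = 12
cores, i.e. two 6-core CPUs rather than two 4-core ones.)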

24 1TB drives would be interesting, but I'm not sure what sort of problems you could expect to encounter until you had to expand the cluster...


Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 26, 2011, at 8:55 AM, Xiaobo Gu <gu...@gmail.com> wrote:

> Hi,
> 
> People say a balanced server configuration is as follows:
> 
> 2 x 4-core CPUs, 24 GB RAM, 4 x 1 TB SATA disks
> 
> But we are used to using storage servers with 24 x 1 TB SATA disks, and
> we are wondering whether Hadoop will be CPU-bound if this kind of server
> is used. Does anybody have experience with Hadoop running on servers
> with so many disks?
> 
> Regards,
> 
> Xiaobo Gu
>