Posted to common-user@hadoop.apache.org by Justin Becker <be...@gmail.com> on 2010/02/04 02:14:32 UTC

Hardware inquiry

My organization has decided to make a substantial investment in hardware for
processing Hadoop jobs.  Our cluster will be used by multiple groups, so it's
hard to classify the problems as IO-, memory-, or CPU-bound.  Would others be
willing to share their hardware profiles coupled with the problem types
(memory, CPU, etc.)?  Our current setup for the existing cluster is made up
of the following machines:

Poweredge 1655
2x2 Intel Xeon 1.4GHz
2GB RAM
72GB local HD

Poweredge 1855
2x2 Intel Xeon 3.2GHz
8GB RAM
146GB local HD

Poweredge 1955
2x2 Intel Xeon 3.0GHz
4GB RAM
72GB local HD

Obviously, we would like to increase local disk space, memory, and the
number of cores.  The not-so-obvious decision is whether to select high-end
equipment (fewer machines) or lower-class hardware.  We're trying to balance
"how commodity" against the administration costs.  I've read the machine
scaling material on the Hadoop wiki.  Any additional real-world advice would
be awesome.


Thanks,

Justin

Re: Hardware inquiry

Posted by Konstantin Boudnik <co...@yahoo-inc.com>.
Oh, you might consider a colo, which would cost you 1/4 of EC2 :)

On Fri, Feb 05, 2010 at 10:59AM, Sirota, Peter wrote:
> [quoted message snipped]

RE: Hardware inquiry

Posted by "Sirota, Peter" <si...@amazon.com>.
Hi Justin,

Have you guys considered running inside Amazon Elastic MapReduce?  With this service you don't have to choose one hardware type for all jobs, but can instead pick from the 7 hardware types we have available.  Also, you don't have to pay capital upfront, but can instead scale with your needs.

Let me know if we can help you to get started with Amazon Elastic MapReduce.   http://aws.amazon.com/elasticmapreduce/ 
Regards,
Peter Sirota
GM, Amazon Elastic MapReduce

-----Original Message-----
From: Justin Becker [mailto:becker.justin@gmail.com] 
Sent: Wednesday, February 03, 2010 5:15 PM
To: common-user@hadoop.apache.org
Subject: Hardware inquiry

[original message snipped]

Re: Hardware inquiry

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Justin,

The best overall balanced machine in my experience, if you're buying
today, is a dual quad-core recent processor, 4-6x 1TB 7200 RPM SATA
disks, and 16-24GB RAM. It sounds beefy, but it hits a good
price/performance/power sweet spot - you can still get this in 1U and
hence pack a lot of punch in each rack.

Depending on your problem and data, you can bump up RAM, disk, or CPU.
For example, some people are running archival clusters with 12 or even
24 disks per node, or using 1.5TB/2TB disks on each node.
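To make the fewer-beefy-nodes vs. more-commodity-nodes tradeoff concrete, here is a rough back-of-the-envelope sizing sketch. All node counts, disk sizes, and the overhead figure are illustrative assumptions, not figures from this thread; the replication factor of 3 is HDFS's default.

```python
# Back-of-the-envelope cluster sizing: raw vs. usable HDFS capacity.
# Node configurations below are illustrative assumptions only.

def usable_hdfs_tb(nodes, disks_per_node, disk_tb,
                   replication=3, non_dfs_overhead=0.25):
    """Approximate usable HDFS capacity in TB.

    non_dfs_overhead reserves a fraction of each disk for MapReduce
    intermediate output, logs, and the OS (25% is a guess, not a rule).
    """
    raw = nodes * disks_per_node * disk_tb
    return raw * (1 - non_dfs_overhead) / replication

# Two hypothetical clusters with the same total raw disk:
beefy = usable_hdfs_tb(nodes=10, disks_per_node=12, disk_tb=1.0)
commodity = usable_hdfs_tb(nodes=30, disks_per_node=4, disk_tb=1.0)

print(f"10 dense nodes:     {beefy:.1f} TB usable")   # -> 30.0 TB
print(f"30 commodity nodes: {commodity:.1f} TB usable")  # -> 30.0 TB
```

Note that storage comes out the same either way; the real difference is that the 30-node layout gives you more aggregate network bandwidth and task slots per TB, while the 10-node layout gives you a third as many machines to rack, power, and administer.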

-Todd

On Wed, Feb 3, 2010 at 5:14 PM, Justin Becker <be...@gmail.com> wrote:
> [quoted message snipped]