Posted to common-user@hadoop.apache.org by Cussol <pi...@cnes.fr> on 2011/12/16 12:50:49 UTC

Hadoop and hardware


In my company, we intend to set up a Hadoop cluster to run analytics
applications. This cluster would have about 120 data nodes on dual-socket
servers with a gigabit interconnect. We are also exploring a solution with 60
quad-socket servers. How do quad-socket and dual-socket servers compare in a
Hadoop cluster?

Any help?

pierre
-- 
View this message in context: http://old.nabble.com/Hadoop-and-hardware-tp32987374p32987374.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Hadoop and Ubuntu / Java

Posted by madhu phatak <ph...@gmail.com>.
As per Oracle, going forward OpenJDK will be the official Oracle JDK for
Linux, which means OpenJDK will be the same as the official one.

On Tue, Dec 20, 2011 at 9:12 PM, hadoopman <ha...@gmail.com> wrote:

>
> http://www.omgubuntu.co.uk/2011/12/java-to-be-removed-from-ubuntu-uninstalled-from-user-machines/
>
> I'm curious what this will mean for Hadoop on Ubuntu systems moving
> forward.  I tried OpenJDK nearly two years ago with Hadoop.  Needless to
> say, it was a real problem.
>
> Hopefully we can still download it from the Sun/Oracle web site and still
> use it.  Won't be the same though :/
>



-- 
https://github.com/zinnia-phatak-dev/Nectar

Hadoop and Ubuntu / Java

Posted by hadoopman <ha...@gmail.com>.
http://www.omgubuntu.co.uk/2011/12/java-to-be-removed-from-ubuntu-uninstalled-from-user-machines/

I'm curious what this will mean for Hadoop on Ubuntu systems moving
forward.  I tried OpenJDK nearly two years ago with Hadoop.  Needless to
say, it was a real problem.

Hopefully we can still download it from the Sun/Oracle web site and 
still use it.  Won't be the same though :/

Re: Hadoop and hardware

Posted by Michel Segel <mi...@hotmail.com>.
Uhm... If I may add something...

Joep is correct. There are a lot of factors that will affect your cluster design.
And there have been a lot of threads on this topic, because hardware prices change frequently, technology keeps advancing, and there are non-commodity solutions aimed at niche spaces.
Plus, this is the biggest decision that you can't easily change once it's made; you are forced to live with it...
(I think there's a potential blog in this ...)

Going from memory at 4:30 am is not a good thing to do, but I believe a standard rack has 42 1U spaces, so you can fit 20 2U boxes in your rack and still have room for your ToR switch. Power and cooling may take up some of that space too...
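
To make that arithmetic concrete, a rough Python sketch (the reserved-unit count is an assumption; adjust for your own switch, PDU and cooling gear):

    # Rough rack-capacity arithmetic (unit counts are assumptions).
    RACK_UNITS = 42        # standard full-height rack
    RESERVED_UNITS = 2     # ToR switch + power distribution (assumption)
    NODE_HEIGHT_U = 2      # 2U data nodes

    nodes_per_rack = (RACK_UNITS - RESERVED_UNITS) // NODE_HEIGHT_U
    print(nodes_per_rack)  # -> 20 nodes per rack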

The one common question that no one seems to ask is: "What are your constraints?"
For some it may be physical space, for others budget... power... hardware availability...
That one question will have a big impact on your cluster design, and nobody asks it.

With respect to quad socket vs dual socket...

There was a post on Cloudera's site which recommended 2 drives per core, so with 16 cores you would have 32 spindles. To maximize your data density, you will want 3.5" drives. I don't think you can fit 16 3.5" drives in a 2U box, let alone 32...  Note that I haven't even considered 24 cores...
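
A quick sketch of that spindle math (the core count and the 2U drive-bay count are assumptions about typical hardware, not figures from the Cloudera post):

    # Spindles implied by the "2 drives per core" rule of thumb, versus
    # the 3.5" bays a typical 2U chassis actually has (assumed values).
    cores_per_node = 16          # e.g. dual-socket, 8 cores per socket
    drives_per_core = 2          # rule of thumb quoted above
    bays_per_2u_chassis = 12     # common 3.5" bay count for a 2U box

    spindles_wanted = cores_per_node * drives_per_core
    print(f"{spindles_wanted} spindles wanted vs {bays_per_2u_chassis} bays in a 2U chassis")
    # -> 32 spindles wanted vs 12 bays in a 2U chassis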

And as Joep points out, with this much disk even port-bonded 1GbE isn't going to cut it...
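
A back-of-the-envelope comparison of aggregate disk throughput against a single gigabit link (the per-drive and per-link figures are assumptions for 7200 rpm SATA drives and 1 GbE):

    # Why gigabit Ethernet becomes the bottleneck with a dozen spindles.
    disks_per_node = 12
    disk_mb_per_s = 100                      # ~sequential throughput per drive (assumption)
    nic_mb_per_s = 1000 / 8 * 0.9            # 1 Gb/s link, ~90% usable

    aggregate_disk_mb_per_s = disks_per_node * disk_mb_per_s
    print(f"{aggregate_disk_mb_per_s} MB/s of disk vs ~{nic_mb_per_s:.0f} MB/s of network")
    # -> 1200 MB/s of disk vs ~112 MB/s of network; even bonded links fall well short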

Lots of things to think about...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 16, 2011, at 10:47 AM, "J. Rottinghuis" <jr...@gmail.com> wrote:

> Pierre,
> 
> As discussed in recent other threads, it depends.
> The most sensible thing for Hadoop nodes is to find a sweet spot for
> price/performance.
> In general that will mean keeping a balance between compute power, disks,
> and network bandwidth, and factoring in racks, space, operating costs, etc.
> 
> How much storage capacity are you thinking of when you target "about 120
> data nodes"?
> 
> If you had, for example, 60 quad-core nodes with 12 * 2 TB disks (or more),
> I would suspect you would be bottlenecked on your 1 GbE network connections.
> 
> Another thing to consider is how many nodes per rack. If these 60 nodes
> were 2U and you fit 20 nodes in a rack, then losing one top-of-rack
> switch means losing 1/3 of the capacity of your cluster.
> 
> Yet another consideration is how easily you want to be able to expand your
> cluster incrementally. Until you run Hadoop 0.23, you probably want all your
> nodes to be roughly similar in capacity.
> 
> Cheers,
> 
> Joep
> 
> On Fri, Dec 16, 2011 at 3:50 AM, Cussol <pi...@cnes.fr> wrote:
> 
>> 
>> 
>> In my company, we intend to set up a Hadoop cluster to run analytics
>> applications. This cluster would have about 120 data nodes on dual-socket
>> servers with a gigabit interconnect. We are also exploring a solution with
>> 60 quad-socket servers. How do quad-socket and dual-socket servers compare
>> in a Hadoop cluster?
>> 
>> Any help?
>> 
>> pierre
>> --
>> View this message in context:
>> http://old.nabble.com/Hadoop-and-hardware-tp32987374p32987374.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>> 
>> 

Re: Hadoop and hardware

Posted by "J. Rottinghuis" <jr...@gmail.com>.
Pierre,

As discussed in recent other threads, it depends.
The most sensible thing for Hadoop nodes is to find a sweet spot for
price/performance.
In general that will mean keeping a balance between compute power, disks,
and network bandwidth, and factoring in racks, space, operating costs, etc.

How much storage capacity are you thinking of when you target "about 120
data nodes"?

If you had, for example, 60 quad-core nodes with 12 * 2 TB disks (or more),
I would suspect you would be bottlenecked on your 1 GbE network connections.

Another thing to consider is how many nodes per rack. If these 60 nodes
were 2U and you fit 20 nodes in a rack, then losing one top-of-rack switch
means losing 1/3 of the capacity of your cluster.
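
To put a number on that failure domain, a small sketch using the counts from the example above:

    # Impact of losing one top-of-rack switch, for the layout discussed here
    # (node and rack counts taken from the example in this thread).
    total_nodes = 60
    nodes_per_rack = 20

    racks = total_nodes // nodes_per_rack            # 3 racks
    fraction_lost = nodes_per_rack / total_nodes     # one rack offline
    print(f"{racks} racks; one ToR switch failure takes out {fraction_lost:.0%} of capacity")
    # -> 3 racks; one ToR switch failure takes out 33% of capacity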

Yet another consideration is how easily you want to be able to expand your
cluster incrementally. Until you run Hadoop 0.23, you probably want all your
nodes to be roughly similar in capacity.

Cheers,

Joep

On Fri, Dec 16, 2011 at 3:50 AM, Cussol <pi...@cnes.fr> wrote:

>
>
> In my company, we intend to set up a Hadoop cluster to run analytics
> applications. This cluster would have about 120 data nodes on dual-socket
> servers with a gigabit interconnect. We are also exploring a solution with
> 60 quad-socket servers. How do quad-socket and dual-socket servers compare
> in a Hadoop cluster?
>
> Any help?
>
> pierre
> --
> View this message in context:
> http://old.nabble.com/Hadoop-and-hardware-tp32987374p32987374.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>