Posted to mapreduce-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2013/05/05 18:41:52 UTC

Re: Hardware Selection for Hadoop

IMHO, 64 GB looks a bit high for a DN. 24 GB should be good enough.


On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
Patai.Sangbutsarakum@turn.com> wrote:

>  2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
>
>  my 2 cents
>
>
>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
>
>      Hi,
>
> I have to propose some hardware requirements in my company for a Proof of
> Concept with Hadoop. I was reading Hadoop Operations and also saw the
> Cloudera website, but just wanted to know from the group - what are the
> requirements if I have to plan for a 5-node cluster? I don't know at this
> time the amount of data that needs to be processed for the Proof of
> Concept. So - can you suggest something to me?
>
> Regards,
> Raj
>
>
>

Re: Hardware Selection for Hadoop

Posted by Sambit Tripathy <sa...@gmail.com>.
I understand.

But sometimes there is a lock-in with a particular vendor, and you are not
allowed to put servers inside the corporate data center if they were
procured from another vendor.

The idea is to start small and then grow. Can you give me some numbers in
$ if you have them? ;) I know sometimes there are no correct answers.

I got a quote of $4200 for 6 x 2 TB hard disks (JBOD), 2 quad-core CPUs, and
24-48 GB RAM. Vendor: HP. Does this sound OK for this configuration?


On Tue, Aug 13, 2013 at 6:15 AM, Chris Embree <ce...@gmail.com> wrote:

> As we always say in Technology... it depends!
>
> What country are you in?  That makes a difference.
> How much buying power do you have?  I work for a Fortune 100 Company and
> we -- absurdly -- pay about 60% off retail when we buy servers.
> Are you buying a bunch at once?
>
> Your best bet is to contact 3 or 4 VARs to get quotes.  They'll offer you
> add-on services, like racking, cabling, configuring servers, etc.  You can
> decide if it's worth it.
>
> The bottom line, there is no correct answer to your question. ;)
>
>
> On Mon, Aug 12, 2013 at 8:30 PM, Sambit Tripathy <sa...@gmail.com> wrote:
>
>> Any rough ideas how much this would cost? Actually I kinda require budget
>> approval and need to put some rough figures in $ on paper. Help!
>>
>> 1. 6 x 2 TB hard disks (JBOD), 2 quad-core CPUs, 24-48 GB RAM
>> 2. 1 rack-mount unit
>> 3. 1 GbE switch for the rack
>> 4. 10 GbE switch for the network
>>
>> Regards,
>> Sambit Tripathy.
>>
>>
>> On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com> wrote:
>>
>>>
>>> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <michael_segel@hotmail.com
>>> > wrote:
>>>
>>>> While we have a rough metric on spindles to cores, you end up putting a
>>>> stress on the disk controllers. YMMV.
>>>>
>>>
>>> This is an important comment.
>>>
>>> Some controllers fold when you start pushing too much data.  Testing
>>> nodes independently before installation is important.
>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Chris Embree <ce...@gmail.com>.
As we always say in Technology... it depends!

What country are you in?  That makes a difference.
How much buying power do you have?  I work for a Fortune 100 Company and we
-- absurdly -- pay about 60% off retail when we buy servers.
Are you buying a bunch at once?

Your best bet is to contact 3 or 4 VARs to get quotes.  They'll offer you
add-on services, like racking, cabling, configuring servers, etc.  You can
decide if it's worth it.

The bottom line, there is no correct answer to your question. ;)


On Mon, Aug 12, 2013 at 8:30 PM, Sambit Tripathy <sa...@gmail.com> wrote:

> Any rough ideas how much this would cost? Actually I kinda require budget
> approval and need to put some rough figures in $ on paper. Help!
>
> 1. 6 x 2 TB hard disks (JBOD), 2 quad-core CPUs, 24-48 GB RAM
> 2. 1 rack-mount unit
> 3. 1 GbE switch for the rack
> 4. 10 GbE switch for the network
>
> Regards,
> Sambit Tripathy.
>
>
> On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com> wrote:
>
>>
>> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com> wrote:
>>
>>> While we have a rough metric on spindles to cores, you end up putting a
>>> stress on the disk controllers. YMMV.
>>>
>>
>> This is an important comment.
>>
>> Some controllers fold when you start pushing too much data.  Testing
>> nodes independently before installation is important.
>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Sambit Tripathy <sa...@gmail.com>.
Any rough ideas how much this would cost? Actually I kinda require budget
approval and need to put some rough figures in $ on paper. Help!

1. 6 x 2 TB hard disks (JBOD), 2 quad-core CPUs, 24-48 GB RAM
2. 1 rack-mount unit
3. 1 GbE switch for the rack
4. 10 GbE switch for the network

Regards,
Sambit Tripathy.


On Tue, May 7, 2013 at 9:21 PM, Ted Dunning <td...@maprtech.com> wrote:

>
> On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com> wrote:
>
>> While we have a rough metric on spindles to cores, you end up putting a
>> stress on the disk controllers. YMMV.
>>
>
> This is an important comment.
>
> Some controllers fold when you start pushing too much data.  Testing nodes
> independently before installation is important.
>
>
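
To put rough numbers on this from the thread alone: the HP quote above is $4200 per node, and the original question was about a 5-node cluster. A minimal back-of-the-envelope sketch in Python (the switch prices are placeholders, not real quotes; fill them in from your own VAR offers):

    # Back-of-the-envelope PoC cluster budget -- illustrative only.
    node_price = 4200      # $/node: the HP quote (6 x 2 TB JBOD, 2 quad cores, 24-48 GB RAM)
    node_count = 5         # 5-node cluster from the original question
    rack_switch_1gbe = 0   # placeholder: price your 1 GbE top-of-rack switch
    net_switch_10gbe = 0   # placeholder: price your 10 GbE network switch

    nodes_total = node_price * node_count
    grand_total = nodes_total + rack_switch_1gbe + net_switch_10gbe
    print("Nodes: $%d; with switches: $%d" % (nodes_total, grand_total))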

Re: Hardware Selection for Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
On Tue, May 7, 2013 at 5:53 AM, Michael Segel <mi...@hotmail.com> wrote:

> While we have a rough metric on spindles to cores, you end up putting a
> stress on the disk controllers. YMMV.
>

This is an important comment.

Some controllers fold when you start pushing too much data.  Testing nodes
independently before installation is important.
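
One way to act on that advice before racking anything is to write a large file to every data disk on every node and compare sequential throughput; a controller that folds shows up as a node (or disk) far below its peers. A minimal sketch in Python, assuming hypothetical /data/N mount points and a 1 GB test file per disk (dd or fio will do the same job):

    import os, time

    def write_mb_per_s(path, size_mb=1024, block_kb=1024):
        """Sequentially write size_mb of zeros to path and return rough MB/s."""
        block = b"\0" * (block_kb * 1024)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(size_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to disk so the page cache doesn't flatter us
        elapsed = time.time() - start
        os.remove(path)
        return size_mb / elapsed

    # Assumed layout: one mount point per spindle. Run on every node and compare.
    for disk in ("/data/1", "/data/2", "/data/3", "/data/4", "/data/5", "/data/6"):
        print(disk, "%.0f MB/s" % write_mb_per_s(os.path.join(disk, "burnin.tmp")))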

Re: Hardware Selection for Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
I wouldn't.

You end up with a 'Frankencluster', which could become problematic down the road. 

Ever try to debug a port failure on a switch? (It does happen, and it's a bitch.) 
Note that you say 'reliable'... older hardware may or may not be reliable... or under warranty.
(How many here build their own servers from the components up? ;-) )

I'm not suggesting that you go out and buy a 10-core CPU; however, depending on who you are and what your budget is... it may make sense.
Even for a proof of concept. ;-) 

While we have a rough metric on spindles to cores, you end up putting a stress on the disk controllers. YMMV.

As to spending $$$ on hardware for a PoC, it's not only relative... but also, what makes you think this is the first and only PoC he's going to do? The point is that hardware is reusable, and it also sets a pattern for what the future cluster will look like. After this PoC, why not look at Storm, Mesos, Spark, Shark, etc.?

Trust me, as someone who has had to fight for allocation of hardware dollars for R&D... get the best bang you can for your buck.

HTH

-Mike

On May 6, 2013, at 5:57 PM, Patai Sangbutsarakum <Pa...@turn.com> wrote:

> I really doubt he would spend $ to buy a 10-cores-on-a-die CPU for "proof of concept" machines.
> Actually, I was even thinking of telling you to gather as many old (but reliable) machines as you can collect.
> Put in as many disks and as much RAM as you can, team up NICs if you can, and at that point you can prove your concept up to a certain point.
> 
> You will get an idea of how your application will behave, how big a data set you will play with,
> and whether the application is CPU- or I/O-bound; from that you can go shopping and buy the best-fit server configuration.
> 
> 
> 
> On May 6, 2013, at 4:17 AM, Michel Segel <mi...@hotmail.com> wrote:
> 
>> 8 physical cores is so 2009 - 2010 :-)
>> 
>> Intel now offers a chip w 10 physical cores on a die. 
>> You are better off thinking of 4-8 GB per physical core. 
>> It depends on what you want to do, and what you think you may want to do...
>> 
>> It also depends on the price points of the hardware. Memory, drives, CPUs (price by clock speeds...) you just need to find the right optimum between price and performance...
>> 
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:
>> 
>>> 
>>> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>> 
>>> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
>>> 
>>> If you look at the different configurations mentioned in this thread, you will see different limitations.
>>> 
>>> For instance:
>>> 
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
>>> 
>>> This configuration is mostly limited by networking bandwidth.
>>> 
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>>>  
>>> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>>>  
>>> 
>>> 
>>> 
>>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>>> IMHO, 64 GB looks a bit high for a DN. 24 GB should be good enough.
>>> 
>>> 
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>> 
>>> my 2 cents
>>> 
>>> 
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>> 
>>>> Hi,
>>>>  
>>>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website, but just wanted to know from the group - what are the requirements if I have to plan for a 5-node cluster? I don't know at this time the amount of data that needs to be processed for the Proof of Concept. So - can you suggest something to me?
>>>>  
>>>> Regards,
>>>> Raj
>>> 
>>> 
>>> 
> 
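
To make the rules of thumb quoted above easy to apply (4-6 GB of RAM per core, at least a spindle per core, network at 1/2 to 2/3 of aggregate disk bandwidth), here is a minimal sketch that flags the same imbalances Ted points out; the 100 MB/s-per-spindle figure is the thread's approximation, and none of this is prescriptive:

    # Sanity-check a node spec against the thread's rules of thumb.
    def check_node(name, cores, ram_gb, disks, nic_mb_s):
        disk_mb_s = disks * 100  # ~100 MB/s per SATA spindle, as assumed above
        net_target = 2 * disk_mb_s // 3
        print("--", name)
        if ram_gb < 4 * cores:
            print("RAM low: %d GB < %d GB (4 GB/core floor)" % (ram_gb, 4 * cores))
        elif ram_gb > 6 * cores:
            print("RAM above the 6 GB/core rule of thumb (slightly larger than necessary)")
        if disks < cores:
            print("Fewer spindles (%d) than cores (%d)" % (disks, cores))
        if nic_mb_s < net_target:
            print("Network-bound: %d MB/s NIC < %d MB/s (2/3 of disk bandwidth)" % (nic_mb_s, net_target))

    check_node("64GB, 2 x 1GbE", cores=8, ram_gb=64, disks=6, nic_mb_s=200)    # first config above
    check_node("24GB, 2 x 10GbE", cores=8, ram_gb=24, disks=6, nic_mb_s=2000)  # second config above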


Re: Hardware Selection for Hadoop

Posted by Patai Sangbutsarakum <Pa...@turn.com>.
I really doubt he would spend $ to buy a 10-cores-on-a-die CPU for "proof of concept" machines.
Actually, I was even thinking of telling you to gather as many old (but reliable) machines as you can collect.
Put in as many disks and as much RAM as you can, team up NICs if you can, and at that point you can prove your concept up to a certain point.

You will get an idea of how your application will behave, how big a data set you will play with,
and whether the application is CPU- or I/O-bound; from that you can go shopping and buy the best-fit server configuration.



On May 6, 2013, at 4:17 AM, Michel Segel <mi...@hotmail.com> wrote:

8 physical cores is so 2009 - 2010 :-)

Intel now offers a chip w 10 physical cores on a die.
You are better off thinking of 4-8 GB per physical core.
It depends on what you want to do, and what you think you may want to do...

It also depends on the price points of the hardware. Memory, drives, CPUs (price by clock speeds...) you just need to find the right optimum between price and performance...


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:


Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.

Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.

If you look at the different configurations mentioned in this thread, you will see different limitations.

For instance:

2 x Quad cores Intel
2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
64GB mem                <==== slightly larger than necessary
2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s

This configuration is mostly limited by networking bandwidth.

2 x Quad cores Intel
2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
24GB mem                <==== 24GB << 8 x 6GB
2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s

This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.




On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
IMHO, 64 GB looks a bit high for a DN. 24 GB should be good enough.


On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
2 x Quad cores Intel
2-3 TB x 6 SATA
64GB mem
2 NICs teaming

my 2 cents


On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
 wrote:

Hi,

I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website, but just wanted to know from the group - what are the requirements if I have to plan for a 5-node cluster? I don't know at this time the amount of data that needs to be processed for the Proof of Concept. So - can you suggest something to me?

Regards,
Raj
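
One way to answer the "is the application CPU- or I/O-bound" question during the PoC is to sample CPU utilization and disk traffic on a worker while a representative job runs. A minimal sketch using the third-party psutil library (an assumption of this sketch, not something the thread prescribes; iostat or sar give the same picture):

    import psutil  # third-party: pip install psutil

    # Sample the node for 60 seconds while a representative job is running.
    cpu_samples = []
    io_start = psutil.disk_io_counters()
    for _ in range(60):
        cpu_samples.append(psutil.cpu_percent(interval=1))  # blocks ~1 s per sample
    io_end = psutil.disk_io_counters()

    avg_cpu = sum(cpu_samples) / len(cpu_samples)
    mb_moved = (io_end.read_bytes - io_start.read_bytes
                + io_end.write_bytes - io_start.write_bytes) / 1e6
    print("avg CPU: %.0f%%, disk traffic: %.0f MB/s" % (avg_cpu, mb_moved / 60))
    # High CPU with modest disk traffic suggests CPU-bound; the reverse, I/O-bound.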





Re: Hardware Selection for Hadoop

Posted by Michel Segel <mi...@hotmail.com>.
8 physical cores is so 2009 - 2010 :-)

Intel now offers a chip w 10 physical cores on a die. 
You are better off thinking of 4-8 GB per physical core. 
It depends on what you want to do, and what you think you may want to do...

It also depends on the price points of the hardware. Memory, drives, CPUs (price by clock speeds...) you just need to find the right optimum between price and performance...


Sent from a remote device. Please excuse any typos...

Mike Segel

On May 5, 2013, at 1:47 PM, Ted Dunning <td...@maprtech.com> wrote:

> 
> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
> 
> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
> 
> If you look at the different configurations mentioned in this thread, you will see different limitations.
> 
> For instance:
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
> 
> This configuration is mostly limited by networking bandwidth.
> 
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>  
> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>  
> 
> 
> 
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
>> IMHO, 64 GB looks a bit high for a DN. 24 GB should be good enough.
>> 
>> 
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>> 
>>> my 2 cents
>>> 
>>> 
>>> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>> 
>>>> Hi,
>>>>  
>>>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website, but just wanted to know from the group - what are the requirements if I have to plan for a 5-node cluster? I don't know at this time the amount of data that needs to be processed for the Proof of Concept. So - can you suggest something to me?
>>>>  
>>>> Regards,
>>>> Raj
> 

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks Mohit and Ted!


On Mon, May 6, 2013 at 9:11 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> OK. I do not know if I understand the spindle / core thing. I will dig
> more into that.
>
> Thanks for the info.
>
> One more thing: what's the significance of multiple NICs?
>
> Thanks,
> Rahul
>
>
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com>wrote:
>
>>
>> Data nodes normally are also task nodes.  With 8 physical cores it isn't
>> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>
>> Achieving highest performance requires that you match the capabilities of
>> your nodes including CPU, memory, disk and networking.  The standard wisdom
>> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
>> disk bandwidth available as network bandwidth.
>>
>> If you look at the different configurations mentioned in this thread, you
>> will see different limitations.
>>
>> For instance:
>>
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
>>
>>
>> This configuration is mostly limited by networking bandwidth.
>>
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>>
>>
>> This configuration is weak on disk relative to CPU and very weak on disk
>> relative to network speed.  The worst problem, however, is likely to be
>> small memory.  This will likely require us to decrease the number of slots
>> by half or more, making it impossible to even use the 6 disks that we have
>> and making the network even more outrageously over-provisioned.
>>
>>
>>
>>
>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> IMHO, 64 GB looks a bit high for a DN. 24 should be good enough for a DN.
>>>
>>>
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>>> Patai.Sangbutsarakum@turn.com> wrote:
>>>
>>>>  2 x Quad cores Intel
>>>> 2-3 TB x 6 SATA
>>>> 64GB mem
>>>> 2 NICs teaming
>>>>
>>>>  my 2 cents
>>>>
>>>>
>>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>>  wrote:
>>>>
>>>>      Hi,
>>>>
>>>> I have to propose some hardware requirements in my company for a Proof
>>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw the
>>>> Cloudera website, but just wanted to know from the group - what are the
>>>> requirements if I have to plan for a 5-node cluster? I don't know at this
>>>> time the data that needs to be processed for the Proof of
>>>> Concept. So - can you suggest something to me?
>>>>
>>>> Regards,
>>>> Raj
>>>>
>>>>
>>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Michael Segel <mi...@hotmail.com>.
I wouldn't go the route of multiple NICs unless you are using MapR. 
MapR allows you to do port bonding, or rather to use both ports simultaneously. 
When you port bond, 1+1 != 2, and then you have some other configuration issues. 
(Unless they've fixed them.)

If this is your first cluster... keep it simple.  If your machine comes with 2 NIC ports, use one, and then once you're an 'expurt', turn on the second port. 
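
To make the "1+1 != 2" point concrete, here is a minimal sketch in plain Python (all of the numbers are assumptions): with 802.3ad/LACP-style bonding, each flow is hashed onto a single physical link, so a single transfer never runs faster than one link; only many concurrent flows approach the aggregate.

# Toy model of LACP-style bonding: every flow hashes to exactly one
# physical link, so a single flow is capped at one link's speed even
# though the bond advertises the sum of the links.
import random

def aggregate_mbps(n_flows, demand_mbps, link_mbps=1000, links=2):
    """Aggregate throughput when each flow lands on one hashed link."""
    load = [0] * links
    for _ in range(n_flows):
        load[random.randrange(links)] += demand_mbps  # stand-in for the hash
    return sum(min(lnk, link_mbps) for lnk in load)

random.seed(1)
print(aggregate_mbps(1, demand_mbps=1500))  # 1000: one flow caps at one link
print(aggregate_mbps(8, demand_mbps=600))   # ~2000: many flows fill both links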

HTH

-Mike

On May 5, 2013, at 11:05 PM, Mohit Anchlia <mo...@gmail.com> wrote:

> Multiple NICs provide 2 benefits: 1) high availability, and 2) increased network bandwidth when using an LACP-type model.
> 
> On Sun, May 5, 2013 at 8:41 PM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> OK. I do not know if I understand the spindle / core thing. I will dig more into that.
> 
> Thanks for the info. 
> 
> One more thing: what's the significance of multiple NICs?
> 
> Thanks,
> Rahul
> 
> 
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com> wrote:
> 
> Data nodes normally are also task nodes.  With 8 physical cores it isn't that unreasonable to have 64GB whereas 24GB really is going to pinch.
> 
> Achieving highest performance requires that you match the capabilities of your nodes including CPU, memory, disk and networking.  The standard wisdom is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of disk bandwidth available as network bandwidth.
> 
> If you look at the different configurations mentioned in this thread, you will see different limitations.
> 
> For instance:
> 
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
> 64GB mem                <==== slightly larger than necessary
> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
> 
> This configuration is mostly limited by networking bandwidth.
> 
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
> 24GB mem                <==== 24GB << 8 x 6GB
> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>  
> This configuration is weak on disk relative to CPU and very weak on disk relative to network speed.  The worst problem, however, is likely to be small memory.  This will likely require us to decrease the number of slots by half or more, making it impossible to even use the 6 disks that we have and making the network even more outrageously over-provisioned.
>  
> 
> 
> 
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <ra...@gmail.com> wrote:
> IMHO, 64 GB looks a bit high for a DN. 24 should be good enough for a DN.
> 
> 
> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <Pa...@turn.com> wrote:
> 2 x Quad cores Intel
> 2-3 TB x 6 SATA
> 64GB mem
> 2 NICs teaming
> 
> my 2 cents
> 
> 
> On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>  wrote:
> 
>> Hi,
>>  
>> I have to propose some hardware requirements in my company for a Proof of Concept with Hadoop. I was reading Hadoop Operations and also saw the Cloudera website, but just wanted to know from the group - what are the requirements if I have to plan for a 5-node cluster? I don't know at this time the data that needs to be processed for the Proof of Concept. So - can you suggest something to me?
>>  
>> Regards,
>> Raj
> 
> 
> 
> 
> 


Re: Hardware Selection for Hadoop

Posted by Mohit Anchlia <mo...@gmail.com>.
Multiple NICs provide 2 benefits: 1) high availability, and 2) increased
network bandwidth when using an LACP-type model.

On Sun, May 5, 2013 at 8:41 PM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

>  OK. I do not know if I understand the spindle / core thing. I will dig
> more into that.
>
> Thanks for the info.
>
> One more thing: what's the significance of multiple NICs?
>
> Thanks,
> Rahul
>
>
> On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com>wrote:
>
>>
>> Data nodes normally are also task nodes.  With 8 physical cores it isn't
>> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>>
>> Achieving highest performance requires that you match the capabilities of
>> your nodes including CPU, memory, disk and networking.  The standard wisdom
>> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
>> disk bandwidth available as network bandwidth.
>>
>> If you look at the different configurations mentioned in this thread, you
>> will see different limitations.
>>
>> For instance:
>>
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 64GB mem                <==== slightly larger than necessary
>>> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
>>
>>
>> This configuration is mostly limited by networking bandwidth.
>>
>>> 2 x Quad cores Intel
>>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>>> 24GB mem                <==== 24GB << 8 x 6GB
>>> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>>
>>
>> This configuration is weak on disk relative to CPU and very weak on disk
>> relative to network speed.  The worst problem, however, is likely to be
>> small memory.  This will likely require us to decrease the number of slots
>> by half or more, making it impossible to even use the 6 disks that we have
>> and making the network even more outrageously over-provisioned.
>>
>>
>>
>>
>> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>>  IMHO, 64 GB looks a bit high for a DN. 24 should be good enough for a DN.
>>>
>>>
>>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>>> Patai.Sangbutsarakum@turn.com> wrote:
>>>
>>>> 2 x Quad cores Intel
>>>> 2-3 TB x 6 SATA
>>>> 64GB mem
>>>> 2 NICs teaming
>>>>
>>>> my 2 cents
>>>>
>>>>
>>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>>  wrote:
>>>>
>>>>      Hi,
>>>>
>>>> I have to propose some hardware requirements in my company for a Proof
>>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw the
>>>> Cloudera website, but just wanted to know from the group - what are the
>>>> requirements if I have to plan for a 5-node cluster? I don't know at this
>>>> time the data that needs to be processed for the Proof of
>>>> Concept. So - can you suggest something to me?
>>>>
>>>> Regards,
>>>> Raj
>>>>
>>>>
>>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
OK. I do not know if I understand the spindle/core thing. I will dig more
into that.

Thanks for the info.

One more thing: what's the significance of multiple NICs?

Thanks,
Rahul


On Mon, May 6, 2013 at 12:17 AM, Ted Dunning <td...@maprtech.com> wrote:

>
> Data nodes normally are also task nodes.  With 8 physical cores it isn't
> that unreasonable to have 64GB whereas 24GB really is going to pinch.
>
> Achieving highest performance requires that you match the capabilities of
> your nodes including CPU, memory, disk and networking.  The standard wisdom
> is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
> disk bandwidth available as network bandwidth.
>
> If you look at the different configurations mentioned in this thread, you
> will see different limitations.
>
> For instance:
>
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>> 64GB mem                <==== slightly larger than necessary
>> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s
>
>
> This configuration is mostly limited by networking bandwidth.
>
>> 2 x Quad cores Intel
>> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
>> 24GB mem                <==== 24GB << 8 x 6GB
>> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s
>
>
> This configuration is weak on disk relative to CPU and very weak on disk
> relative to network speed.  The worst problem, however, is likely to be
> small memory.  This will likely require us to decrease the number of slots
> by half or more, making it impossible to even use the 6 disks that we have
> and making the network even more outrageously over-provisioned.
>
>
>
>
> On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> IMHO, 64 GB looks a bit high for a DN. 24 should be good enough for a DN.
>>
>>
>> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
>> Patai.Sangbutsarakum@turn.com> wrote:
>>
>>>  2 x Quad cores Intel
>>> 2-3 TB x 6 SATA
>>> 64GB mem
>>> 2 NICs teaming
>>>
>>>  my 2 cents
>>>
>>>
>>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>>  wrote:
>>>
>>>      Hi,
>>>
>>> I have to propose some hardware requirements in my company for a Proof
>>> of Concept with Hadoop. I was reading Hadoop Operations and also saw the
>>> Cloudera website, but just wanted to know from the group - what are the
>>> requirements if I have to plan for a 5-node cluster? I don't know at this
>>> time the data that needs to be processed for the Proof of
>>> Concept. So - can you suggest something to me?
>>>
>>> Regards,
>>> Raj
>>>
>>>
>>>
>>
>

Re: Hardware Selection for Hadoop

Posted by Ted Dunning <td...@maprtech.com>.
Data nodes normally are also task nodes.  With 8 physical cores it isn't
that unreasonable to have 64GB, whereas 24GB really is going to pinch.

Achieving highest performance requires that you match the capabilities of
your nodes including CPU, memory, disk and networking.  The standard wisdom
is 4-6GB of RAM per core, at least a spindle per core and 1/2 to 2/3 of
disk bandwidth available as network bandwidth.

If you look at the different configurations mentioned in this thread, you
will see different limitations.

For instance:

2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
> 64GB mem                <==== slightly larger than necessary
> 2 1GbE NICs teaming     <==== 2 x 100 MB/s << 400 MB/s = 2/3 x 6 x 100 MB/s


This configuration is mostly limited by network bandwidth.

2 x Quad cores Intel
> 2-3 TB x 6 SATA         <==== 6 disks < desired 8 or more
> 24GB mem                <==== 24GB << 8 x 6GB
> 2 10GbE NICs teaming    <==== 2 x 1000 MB/s > 400 MB/s = 2/3 x 6 x 100 MB/s


This configuration is weak on disk relative to CPU and very weak on disk
relative to network speed.  The worst problem, however, is likely to be the
small memory.  It will likely force us to cut the number of task slots by
half or more, making it impossible even to use the 6 disks we have and
making the network even more outrageously over-provisioned.
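
If you want to sanity-check a node spec the same way, a small helper like the
following reproduces the analysis of both configurations.  Again, this is
just a sketch against the rules of thumb above; the names and thresholds are
my own assumptions:

# Check a node spec against the rules of thumb and report likely bottlenecks.
def bottlenecks(cores, ram_gb, spindles, net_mbps, spindle_mbps=100):
    problems = []
    if ram_gb < 4 * cores:                        # below 4 GB per core
        problems.append("memory (%d GB < %d GB)" % (ram_gb, 4 * cores))
    if spindles < cores:                          # below one spindle per core
        problems.append("spindles (%d < %d)" % (spindles, cores))
    net_target = 2 * spindles * spindle_mbps // 3   # 2/3 of disk bandwidth
    if net_mbps < net_target:
        problems.append("network (%d < %d MB/s)" % (net_mbps, net_target))
    return problems or ["roughly balanced"]

# The two configurations above (2x1GbE ~ 200 MB/s, 2x10GbE ~ 2000 MB/s):
print(bottlenecks(cores=8, ram_gb=64, spindles=6, net_mbps=200))
# -> ['spindles (6 < 8)', 'network (200 < 400 MB/s)']
print(bottlenecks(cores=8, ram_gb=24, spindles=6, net_mbps=2000))
# -> ['memory (24 GB < 32 GB)', 'spindles (6 < 8)']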




On Sun, May 5, 2013 at 9:41 AM, Rahul Bhattacharjee <rahul.rec.dgp@gmail.com
> wrote:

> IMHO, 64 GB looks a bit high for a DN. 24 should be good enough for a DN.
>
>
> On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum <
> Patai.Sangbutsarakum@turn.com> wrote:
>
>>  2 x Quad cores Intel
>> 2-3 TB x 6 SATA
>> 64GB mem
>> 2 NICs teaming
>>
>>  my 2 cents
>>
>>
>>  On Apr 29, 2013, at 9:24 AM, Raj Hadoop <ha...@yahoo.com>
>>  wrote:
>>
>>      Hi,
>>
>> I have to propose some hardware requirements in my company for a Proof of
>> Concept with Hadoop. I was reading Hadoop Operations and also saw the
>> Cloudera website. But I just wanted to know from the group - what are the
>> requirements if I have to plan for a 5-node cluster? I don't know at this
>> time what data need to be processed for the Proof of Concept. So - can
>> you suggest something to me?
>>
>> Regards,
>> Raj
>>
>>
>>
>
