Posted to common-user@hadoop.apache.org by Geoffry Roberts <ge...@gmail.com> on 2011/04/21 19:33:08 UTC

Seeking Advice on Upgrading a Cluster

All,

 I am a developer, not a super networking guy or hardware guy, and new to
Hadoop.

I'm working on a research project, and funds are limited.  I have a compute
problem where I need to speed up the processing of large text files, and no
doubt Hadoop can help if I do things well.

I am cobbling my cluster together, to the greatest extent possible, out of
spare parts.  I can spend some money, but must do so with deliberation and
prudence.

I have at my disposal twelve former desktop computers:

   - Pentium 4 3.80GHz
   - 2-4G of memory
   - 1 Gigabit NIC
   - 1 Disk, Serial ATA/150 7,200 RPM

I have installed:

   - Ubuntu 10.10 /64 server
   - JDK /64
   - Hadoop 0.21.0

Processing is still slow.  I am tuning Hadoop, but I'm guessing I should
also upgrade my hardware.

What will give me the most bang for my buck?

   - Should I bring all machines up to 8G of memory? Or is 4G good enough?
   (8 is the max.)
   - Should I double up the NICs and use LACP?
   - Should I double up the disks and split my I/O between them on the
   theory that this will minimize contention?
   - Should I get another switch?  (I have a 10/100, 24 port D-Link and it's
   about 5 years old.)

Thanks in advance
-- 
Geoffry Roberts

Re: Seeking Advice on Upgrading a Cluster

Posted by Steve Loughran <st...@apache.org>.
On 21/04/11 18:33, Geoffry Roberts wrote:

> What will give me the most bang for my buck?
>
>     - Should I bring all machines up to 8G of memory? Or is 4G good enough?
>     (8 is the max.)

depends on whether your code is running out of memory
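
a rough budget helps here (a sketch only: the numbers are assumptions for a
4 GB node, not recommendations). Each node's RAM has to cover the OS, the
DataNode and TaskTracker daemons (about 1 GB heap each by default), plus one
JVM heap per task slot. Something like this in mapred-site.xml keeps 2 map
slots and 1 reduce slot with 512 MB heaps inside 4 GB, tight but workable:

   <property>
     <name>mapred.tasktracker.map.tasks.maximum</name>
     <value>2</value>        <!-- map task slots per node -->
   </property>
   <property>
     <name>mapred.tasktracker.reduce.tasks.maximum</name>
     <value>1</value>        <!-- reduce task slots per node -->
   </property>
   <property>
     <name>mapred.child.java.opts</name>
     <value>-Xmx512m</value> <!-- heap for each task JVM -->
   </property>

If tasks fail with OutOfMemoryError even at sensible heap sizes, buy the
RAM; if they never do, spend the money elsewhere.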

>     - Should I double up the NICs and use LACP?

I would only recommend this for increasing availability at the expense 
of time spent getting it all to work.


>     - Should I double up the disks and split my I/O between them on the
>     theory that this will minimize contention?

if your app is bandwidth bound (iotop should tell you this) then yes, 
this will help.
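
with a second disk per node, you'd point HDFS and the shuffle at both by
giving each property a comma-separated list of directories; Hadoop
round-robins across them. The 0.20-style property names, with made-up paths
for the example:

   <!-- hdfs-site.xml -->
   <property>
     <name>dfs.data.dir</name>
     <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
   </property>

   <!-- mapred-site.xml -->
   <property>
     <name>mapred.local.dir</name>
     <value>/disk1/mapred/local,/disk2/mapred/local</value>
   </property>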

>     - Should I get another switch?  (I have a 10/100, 24 port D-Link and it's
>     about 5 years old.)

a gigabit switch is low cost now; I'd do that as one of my first actions. A
10/100 switch caps every node at roughly 12 MB/s, which throttles both HDFS
replication and the shuffle.

Why not run some experiments: shrink to a smaller cluster, move the RAM and
HDDs from the decommissioned machines into the remaining ones to double them
up, and see which change benefits your code the most?

Re: Seeking Advice on Upgrading a Cluster

Posted by Shrinivas Joshi <js...@gmail.com>.
Hi Geoffry,

A good answer to your question will probably involve more discussion of the
nature of the workload and the main bottlenecks you are seeing with it.


My 2 cents:
If your workload is I/O intensive, adding more disks and increasing the
amount of physical memory on these systems would probably be the least
expensive upgrades to try first. On the Hadoop side, enabling JVM reuse,
compressing map output with the LZO library, tuning the HDFS block size, and
avoiding map-side spills are good starting points.
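
For instance, something along these lines in mapred-site.xml and
hdfs-site.xml (a sketch only: these are the classic 0.20-style property
names, the values are illustrative starting points rather than
recommendations, and the LZO codec needs the separate hadoop-lzo library
installed on every node):

   <!-- mapred-site.xml -->
   <property>
     <name>mapred.job.reuse.jvm.num.tasks</name>
     <value>-1</value>   <!-- reuse task JVMs an unlimited number of times -->
   </property>
   <property>
     <name>mapred.compress.map.output</name>
     <value>true</value> <!-- compress intermediate map output -->
   </property>
   <property>
     <name>mapred.map.output.compression.codec</name>
     <value>com.hadoop.compression.lzo.LzoCodec</value>
   </property>
   <property>
     <name>io.sort.mb</name>
     <value>200</value>  <!-- sort buffer; size it so map output spills to
                              disk only once, and keep it well under the
                              task heap -->
   </property>

   <!-- hdfs-site.xml -->
   <property>
     <name>dfs.block.size</name>
     <value>134217728</value> <!-- 128 MB blocks: fewer, longer map tasks -->
   </property>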

HTH,
-Shrinivas
