Posted to user@hive.apache.org by Cam Bazz <ca...@gmail.com> on 2011/02/14 01:58:18 UTC

how far can I go with a 1 node cluster

Hello,

So all my statistics are finally being calculated, results being
processed etc. I have a 1 node cluster, mainly producing 3 aggregate
logs from my Apache logs.

How far will this setup go? I have another machine ready to be hooked
up to my setup, and I wonder if it is worth adding it at the moment
to make a 2 node cluster.

The first node has 8 GB RAM and a quad-core 3.0 GHz CPU. The second
computer I have is much noisier and uses more electricity. It has
8 GB RAM and dual Opterons with dual cores, running at 2.0 GHz.

Best Regards,
C.B.

Re: how far can I go with a 1 node cluster

Posted by Ajo Fod <aj...@gmail.com>.
Yes, I've often wondered about asymmetric configurations. Is there a
mechanism to make map/reduce jobs aware of differences between the
speeds of processors, so that less work is allocated to the slower
processors?

To try to answer the question here: I have not had much experience with
multi-node clusters, but I'd start by checking whether all 4 cores are
being used ... especially in the part of the process that takes the longest
(Amdahl's law) ... a second node can only give you a speedup if the cores
you already have are busy there.
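
To put rough numbers on the Amdahl's law point, here is a minimal sketch in
Python. The function name and the example fractions are made up for
illustration; nothing here was measured on the cluster discussed in this
thread.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime
    can be spread across `workers` cores (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical fractions, only to show how the curve flattens out.
    for fraction in (0.5, 0.9, 0.99):
        four = amdahl_speedup(fraction, 4)
        eight = amdahl_speedup(fraction, 8)
        print(f"parallel={fraction:.2f}  4 cores: {four:.2f}x  8 cores: {eight:.2f}x")
```

If half of the runtime is serial, going from 4 to 8 cores only moves the
speedup from 1.6x to about 1.78x, so it helps to know that fraction before
wiring up the second box.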

Here are a few other questions I go through:

Does the process take very long? At the very least the task should take
longer than twice the time it takes you to switch on and boot up
the other computer ... rebalance HDFS and then run the job and switch off
the computer ... plus all the investment in time to figure out how to use
and maintain the multi-node configuration.
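
One way to read the "twice the time" rule of thumb, sketched in Python under
the assumption that the second node gives an ideal 2x speedup (an asymmetric
pair like the one described would likely do worse). The function names and
the minute values are hypothetical.

```python
def break_even_minutes(overhead_minutes: float, speedup: float) -> float:
    """Single-node job length at which the extra node starts to pay off.

    Solves overhead + T / speedup = T for T.
    """
    if speedup <= 1.0:
        raise ValueError("the extra node must give a speedup above 1")
    return overhead_minutes * speedup / (speedup - 1.0)


def second_node_pays_off(job_minutes: float, overhead_minutes: float,
                         speedup: float) -> bool:
    """True if overhead plus the faster run beats the single-node run."""
    return overhead_minutes + job_minutes / speedup < job_minutes


if __name__ == "__main__":
    # Hypothetical: 15 minutes to boot, rebalance HDFS and shut down again.
    print(break_even_minutes(15, 2.0))        # job length where both setups tie
    print(second_node_pays_off(40, 15, 2.0))  # longer job
    print(second_node_pays_off(20, 15, 2.0))  # shorter job
```

With a speedup of exactly 2 the break-even job length comes out at twice the
overhead, which matches the rule above; with a smaller speedup the job has
to be longer still.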

How often do you need to run the job? ... if it is only once a day ... and
it can be run in the background or while the processor is not busy, perhaps
you can schedule it on your PC for when you are taking a break.

Are you developing code? ... If so, it is perhaps more efficient to run on
one computer and test with a small chunk of data.

So, in summary, I'd use multiple computers as a last resort ... multi core
is good enough for me most of the time.

Thanks,
-Ajo.
