Posted to common-user@hadoop.apache.org by Joel Welling <we...@psc.edu> on 2008/09/18 00:27:31 UTC
gridmix on a small cluster?
Hi folks;
I'd like to try the gridmix benchmark on my small cluster (3 nodes at
8 cores each, Lustre with IB interconnect). The documentation for
gridmix suggests that it will take 4 hours on a 500 node cluster, which
suggests it would take me something like a week to run. Is there a way
to scale the problem size back? I don't mind the file size too much,
but the running time would be excessive if things scale linearly with
the number of nodes.
Thanks,
-Joel
Re: gridmix on a small cluster?
Posted by Chris Douglas <ch...@yahoo-inc.com>.
Yes. If you look at the README, gridmix-env, and the generateData
script, you should be able to alter the job mix to match your
requirements. In particular, look closely at the number of small,
medium, and large jobs in each run. For a three-node cluster, you
might want to run only the small jobs (and possibly the medium ones).

Note that you don't have to generate the entropy dataset if you don't
plan on running any large jobs (what it tests is not interesting on
three nodes anyway). Also note that the "real" dataset is 1000 times
larger than what generateData produces by default; a smaller dataset
may let you keep the total number of jobs up, though you should also
be wary of the load on the submitting node (see
submissionScripts/sleep_if_too_busy).

Keep in mind that each node may also store (possibly uncompressed)
copies of the datasets as intermediate map outputs, so budgeting for
local disk space is also important while gridmix runs, particularly
for the "medium" jobs.

Good luck. -C
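
As a rough sketch of the kind of scaling-down described above, the
fragment below shows how a trimmed gridmix-env for a three-node
cluster might look. The variable names for the per-class job counts
are illustrative assumptions, not necessarily the exact names in your
distribution's gridmix-env; check the README and the script itself for
the real knobs.

```shell
# Hypothetical excerpt from a scaled-down gridmix-env for a 3-node cluster.
# The job-count variable names below are assumptions for illustration;
# consult the README and the actual gridmix-env for the real settings.

# Root HDFS path for the generated benchmark data (generateData's default
# output is already 1/1000 the size of the "real" dataset).
export GRID_MIX_DATA=/gridmix/data

# Run only the small job class; zero out medium and large. With no large
# jobs, the entropy dataset need not be generated at all.
export NUM_OF_SMALL_JOBS_PER_CLASS=5    # assumed knob; adjust to taste
export NUM_OF_MEDIUM_JOBS_PER_CLASS=0   # assumed knob
export NUM_OF_LARGE_JOBS_PER_CLASS=0    # assumed knob
```

With the medium and large classes zeroed out, total runtime should be
dominated by the small-job mix rather than the cluster-size scaling
that produces the multi-day estimate on three nodes.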