Posted to common-dev@hadoop.apache.org by Bwolen Yang <wb...@gmail.com> on 2007/06/11 20:44:36 UTC

configs for large clusters

Hi,

As a newbie to Hadoop, I have been wondering what the best way is to
configure my cluster, especially as one scales up.  After seeing
Doug's update to sort 900 performance, it occurred to me that it may be
helpful to others to see example configuration files, especially for
large clusters.  Furthermore, if we can diff the
configurations over time (and/or releases), we may be able to see how
Hadoop developers tune their own clusters (and hence follow suit :).
Could the configs, along with rough cluster specs, be posted somewhere
on Hadoop's website?  And perhaps others (with different
system setups) could be encouraged to post similarly?
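
As a rough illustration of the diffing idea, here is a minimal Python
sketch.  It assumes the configs use the usual Hadoop XML layout
(a configuration element holding property elements, each with a name
and a value).  The function names parse_config and diff_configs and
the sample values are made up for illustration; they are not taken
from any real cluster.

```python
import xml.etree.ElementTree as ET


def parse_config(xml_text):
    """Return {name: value} for each <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(old, new):
    """Compare two {name: value} dicts; return (added, removed, changed)."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Hypothetical sample values, for illustration only.
OLD = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>io.sort.mb</name><value>100</value></property>
</configuration>"""

NEW = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>io.sort.mb</name><value>200</value></property>
  <property><name>mapred.reduce.tasks</name><value>20</value></property>
</configuration>"""

if __name__ == "__main__":
    added, removed, changed = diff_configs(parse_config(OLD), parse_config(NEW))
    for name in sorted(added):
        print("added   %s = %s" % (name, added[name]))
    for name in sorted(removed):
        print("removed %s = %s" % (name, removed[name]))
    for name in sorted(changed):
        print("changed %s: %s -> %s" % ((name,) + changed[name]))
```

Comparing parsed name/value pairs rather than raw text means that
reordered properties or whitespace changes do not show up as
differences.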

I'm also interested in seeing how people tune their clusters for
different kinds of machines (e.g., single-disk machines vs. 4-6 disk
machines), and for heterogeneous systems (different CPU power, disk size,
memory size, etc.).  The heterogeneous part arises for people who are
resource strapped and basically try hard to put together a sizeable
system with whatever machines they have got.  In my case, a bad
config can hurt as I add new machines (e.g., a machine with a small
disk fills up quicker, and tasks scheduled there tend to die).

thanks

bwolen

Re: configs for large clusters

Posted by Doug Cutting <cu...@apache.org>.
Note that Nigel recently added some to the FAQ on the wiki:

http://wiki.apache.org/lucene-hadoop/FAQ#head-1b2c093275a1a8a7e7068de941b776fcceafbf44

It would be good to understand better which of these make significant 
differences and which do not.  And how many of these should we make the 
default?  And could some of these be set automatically, based on other 
cluster properties?

Doug

Richard wrote:
> I couldn't agree more.  Quite a large portion of the questions relate to configuration in one way or another.  Though there are pages explaining how (which is important), it would make things even easier if there were more varied examples.

Re: configs for large clusters

Posted by Richard <ku...@yahoo.com>.
I couldn't agree more.  Quite a large portion of the questions relate to configuration in one way or another.  Though there are pages explaining how (which is important), it would make things even easier if there were more varied examples.



Best Regards

Richard Yang
richardyang@richardyang.net
kusanagiyang@yahoo.com