Posted to common-user@hadoop.apache.org by Phantom <gh...@gmail.com> on 2007/06/08 19:34:40 UTC

Setting up large clusters

Hi

I have a question about setting up large clusters. I read the wiki page
on this topic and have also been through the exercise of setting up a
15-node cluster. If I scale that out to 100 nodes, do I need to manually
add the new nodes to the slaves file and bounce the master server? Or can
I just start the task tracker on the new nodes and point them at the
master? If I have to bounce the master every time I scale out my cluster,
what happens to the jobs that are currently running? Could someone please
clarify this for me?

Thanks in advance
Avinash

Re: Setting up large clusters

Posted by Dennis Kubes <ku...@apache.org>.
To add new nodes you can just start up the datanodes and point them at
the namenode (and the tasktrackers at the jobtracker). They will join the
cluster and any current jobs can continue.

If you want to be able to start and stop them from a single machine
(it doesn't necessarily need to be the namenode), you will need to set up
ssh keys from that machine to all slaves and add the slave machines to
the slaves file.
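
As a rough illustration of why the slaves file and ssh keys go together,
here is a minimal Python sketch of the idea: read one hostname per line
and build an ssh command for each host. The function names, the sample
hostnames, the '#' comment handling and the remote command path are all
assumptions for illustration; this is not the actual Hadoop start script.

```python
import shlex


def read_slaves(text):
    """Return hostnames from slaves-file text, one per line.

    Assumption: blank lines and anything after '#' are ignored.
    """
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def ssh_commands(hosts, remote_command):
    """Build one ssh argument list per host. Nothing is executed here."""
    return [["ssh", host, remote_command] for host in hosts]


if __name__ == "__main__":
    # Hypothetical slaves file contents.
    sample = "node01\nnode02\n# node03 is being repaired\n\nnode04\n"
    # Assumed remote command; the real path depends on the install.
    command = "bin/hadoop-daemon.sh start tasktracker"
    for argv in ssh_commands(read_slaves(sample), command):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each generated command only works non-interactively if the controlling
machine's key is accepted on that slave, which is why both the keys and
the host list are needed for central start and stop.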

Dennis Kubes

Re: Setting up large clusters

Posted by Michael Bieniosek <mi...@powerset.com>.
The slaves connect to the master, not the other way around.

I don't use a slaves file at all; I just point new tasktrackers at the
jobtracker and everything just works (without restarting).

My understanding is that the slaves file, if present, merely functions as an
"allow list" of slaves that can connect.

If you do restart your jobtracker, I think your active jobs will stop and
move to the jobtracker history page, and the next job you create will be
called job_0001.

-Michael
