Posted to common-user@hadoop.apache.org by tjohn <gt...@gmail.com> on 2008/03/09 17:54:07 UTC

dynamically adding slaves to hadoop cluster

Hi all, I'm new to Hadoop and I wanted to know how to dynamically add a slave
to my cluster, obviously while it's running.
Thanks in advance,

John



Re: dynamically adding slaves to hadoop cluster

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 10, 2008, at 8:22 AM, Jason Venner wrote:

> Is there a /proper/ way to bring up the processes on the slave node  
> so that the master will recognize them at *stop* time?

Yes, you can set up the pid files by running these (directly on the newly
added node!):

% bin/hadoop-daemon.sh start datanode
% bin/hadoop-daemon.sh start tasktracker

then stop-all.sh will know the pids to shut down. It is unfortunate that
hadoop-daemon.sh and hadoop-daemons.sh differ only in the "s";
hadoop-daemons.sh should probably be hadoop-slave-daemons.sh or something.
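
The same script stops a single node's daemons later (a usage sketch, again run
on that node; the stop command finds each daemon through the pid file written
at start):

% bin/hadoop-daemon.sh stop tasktracker
% bin/hadoop-daemon.sh stop datanode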

-- Owen

Re: dynamically adding slaves to hadoop cluster

Posted by Jason Venner <ja...@attributor.com>.
We have done this, and it works well. The one downside is that
stop-dfs.sh and stop-mapred.sh (and of course stop-all.sh) don't seem to
control the hand-started datanodes/tasktrackers. I am assuming that is
because the pid files haven't been written to the pid directory, but I have
not investigated.

Is there a /proper/ way to bring up the processes on the slave node so 
that the master will recognize them at *stop* time?
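
(A hedged aside on that hypothesis, from the era's default settings rather
than anything verified in this thread: daemons launched via bin/hadoop
directly never write pid files, while hadoop-daemon.sh writes one per daemon
under HADOOP_PID_DIR, /tmp by default, and that is what the stop scripts read.)

% ls ${HADOOP_PID_DIR:-/tmp}/hadoop-$USER-*.pid   # what the stop scripts look for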

tjohn wrote:
>
> Thanks a lot guys! It worked fine and it was exactly what I was looking for.
> Best wishes,
> John.
-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

Re: dynamically adding slaves to hadoop cluster

Posted by tjohn <gt...@gmail.com>.


Mafish Liu wrote:
>
>> You should do the following steps:
>> 1. Have Hadoop deployed on the new node with the same directory structure
>> and configuration.
>> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.
>
> Addition: do not run "bin/hadoop namenode -format" before you run the
> datanode, or you will get an error like "Incompatible namespaceIDs ..."

Thanks a lot guys! It worked fine and it was exactly what I was looking for.
Best wishes,
John.



Re: dynamically adding slaves to hadoop cluster

Posted by Mafish Liu <ma...@gmail.com>.
On Mon, Mar 10, 2008 at 9:47 AM, Mafish Liu <ma...@gmail.com> wrote:

> You should do the following steps:
> 1. Have Hadoop deployed on the new node with the same directory structure
> and configuration.
> 2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.

Addition: do not run "bin/hadoop namenode -format" before you run the
datanode, or you will get an error like "Incompatible namespaceIDs ..."
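
(Background, as a hedged aside rather than anything from this thread:
formatting generates a fresh namespaceID on the namenode, while each datanode
records the cluster's namespaceID the first time it starts, so after a
re-format the two no longer match. On a 0.16-era layout you can compare them
by hand; the data path below is a placeholder for whatever dfs.data.dir is
set to in your configuration.)

% grep namespaceID /path/to/dfs/data/current/VERSION   # must match the namenode's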




-- 
Mafish@gmail.com
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.

Re: dynamically adding slaves to hadoop cluster

Posted by Mafish Liu <ma...@gmail.com>.
You should do the following steps:
1. Have Hadoop deployed on the new node with the same directory structure
and configuration.
2. Just run $HADOOP_HOME/bin/hadoop datanode and tasktracker.

The datanode and tasktracker will automatically contact the namenode and
jobtracker specified in the Hadoop configuration file and finish adding the
new node to the cluster.
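
For example (a usage sketch of step 2, run on the new node itself; it assumes
the node shares the cluster's $HADOOP_HOME layout, and the trailing "&" is
only because bin/hadoop runs each daemon in the foreground):

% $HADOOP_HOME/bin/hadoop datanode &
% $HADOOP_HOME/bin/hadoop tasktracker &

Alternatively, "bin/hadoop-daemon.sh start datanode" (and tasktracker)
daemonizes them and writes pid files, as Owen notes earlier in this thread.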

On Mon, Mar 10, 2008 at 4:56 AM, Aaron Kimball <ak...@cs.washington.edu> wrote:

> Yes. You should have the same hadoop-site across all your slaves. They
> will need to know the DNS name for the namenode and jobtracker.
>
> - Aaron



-- 
Mafish@gmail.com
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.

Re: dynamically adding slaves to hadoop cluster

Posted by Aaron Kimball <ak...@cs.washington.edu>.
Yes. You should have the same hadoop-site.xml across all your slaves. They
will need to know the DNS names of the namenode and jobtracker.

- Aaron
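
Concretely, the two properties in question live in conf/hadoop-site.xml and
look like this in a 0.16-era configuration (a minimal sketch; the host names
and ports are placeholders, not values from this thread):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>namenode.example.com:9000</value>   <!-- host:port of the namenode -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value> <!-- host:port of the jobtracker -->
  </property>
</configuration>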

tjohn wrote:
>
> Sorry for my ignorance... To make a datanode/tasktracker point to the
> namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks
>
> John

RE: dynamically adding slaves to hadoop cluster

Posted by tjohn <gt...@gmail.com>.


Mahadev Konar wrote:
>
> I believe (as far as I remember) you should be able to add the node by
> bringing up the datanode or tasktracker on the remote machine. The
> namenode or the jobtracker (I think) does not check for the nodes in the
> slaves file. The slaves file is just used to start up all the daemons by
> sshing to all the nodes listed in it during startup. So you should just
> be able to start up the datanode pointing to the correct namenode and it
> should work.
>
> Regards
> Mahadev

Sorry for my ignorance... To make a datanode/tasktracker point to the
namenode, what should I do? Do I have to edit hadoop-site.xml? Thanks

John



RE: dynamically adding slaves to hadoop cluster

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
I believe (as far as I remember) you should be able to add the node by
bringing up the datanode or tasktracker on the remote machine. The
namenode or the jobtracker (I think) does not check for the nodes in the
slaves file. The slaves file is just used to start up all the daemons by
sshing to all the nodes listed in it during startup. So you should just
be able to start up the datanode pointing to the correct namenode and it
should work.
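
To illustrate the point (a sketch with placeholder host names): conf/slaves
is nothing more than the host list that start-all.sh and stop-all.sh ssh
through, so a node absent from it can still join the cluster once its
daemons are up.

% cat conf/slaves
slave1.example.com
slave2.example.com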

Regards
Mahadev

> -----Original Message-----
> From: tjohn [mailto:gtardini@gmail.com]
> Sent: Sunday, March 09, 2008 1:18 PM
> To: core-user@hadoop.apache.org
> Subject: Re: dynamically adding slaves to hadoop cluster
>
> Yeah, thanks Owen, that's useful, but what I wanted to know is how to add
> a new remote machine even though it's not listed in the conf/slaves file,
> and I just don't understand how to do it without stopping the cluster or
> the processes running on it.
>
> John


Re: dynamically adding slaves to hadoop cluster

Posted by tjohn <gt...@gmail.com>.


Owen O'Malley-2 wrote:
>
> If you start a new datanode (and/or tasktracker), it will join the
> cluster of the configured namenode / jobtracker. After adding datanodes,
> you should rebalance your HDFS data.
>
> -- Owen

Yeah, thanks Owen, that's useful, but what I wanted to know is how to add a
new remote machine even though it's not listed in the conf/slaves file, and
I just don't understand how to do it without stopping the cluster or the
processes running on it. (Sorry for my English, it's not my native language,
so it probably sounds like I'm a bit rude..)

John


Re: dynamically adding slaves to hadoop cluster

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 9, 2008, at 9:54 AM, tjohn wrote:

>
> Hi all, I'm new to Hadoop and I wanted to know how to dynamically add a
> slave to my cluster, obviously while it's running.

If you start a new datanode (and/or tasktracker), it will join the cluster
of the configured namenode / jobtracker. After adding datanodes, you should
rebalance your HDFS data.

-- Owen
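
Rebalancing can be kicked off with the HDFS balancer (a usage sketch,
assuming a release recent enough to ship it, which 0.16 is; it can be
interrupted safely):

% bin/start-balancer.sh    # shifts blocks until datanode utilization evens out
% bin/stop-balancer.sh     # stops a running balancer at any point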