Posted to common-user@hadoop.apache.org by sumadhur <su...@yahoo.com> on 2012/05/01 04:58:36 UTC

adding or restarting a data node in a hadoop cluster

 
I am on hadoop 0.20.
 
To add a data node to a cluster, if we do not use the include/exclude/slaves files, do we need to do anything other than configuring hdfs-site.xml to point to the name node and mapred-site.xml to point to the job tracker?
 
For example, do the job tracker and name node always need to be restarted?
 
On a related note, if we restart a data node (one that has some blocks on it) and the data node comes back with a new IP address, should we restart the namenode/job tracker for HDFS and map-reduce to function correctly?
Would the blocks on the restarted data node be detected, or would HDFS think those blocks were lost and start re-replicating them?
 
Thanks,
Sumadhur

Re: adding or restarting a data node in a hadoop cluster

Posted by Harsh J <ha...@cloudera.com>.
Sumadhur,

(Inline)

On Tue, May 1, 2012 at 8:28 AM, sumadhur <su...@yahoo.com> wrote:
>
> I am on hadoop 0.20.
>
> To add a data node to a cluster, if we do not use the include/exclude/slaves files, do we need to do anything other than configuring hdfs-site.xml to point to the name node and mapred-site.xml to point to the job tracker?
>
> For example, do the job tracker and name node always need to be restarted?

Just booting up the DN service with the right config, on a network where
it can reach the rest of the cluster, should suffice.
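For concreteness, here is a minimal sketch of that config (hostnames and ports are placeholders for your own cluster; on 0.20 the NN address conventionally lives in core-site.xml):

    <!-- core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:9001</value>
    </property>

Then bring the daemons up on the new node:

    $ bin/hadoop-daemon.sh start datanode
    $ bin/hadoop-daemon.sh start tasktracker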

In case you're using rack-awareness, ensure you update the
rack-awareness script for your new node and refresh the NN before you
start your DN.
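For reference, the topology script is any executable named by topology.script.file.name in the NN's config; it takes host arguments and prints one rack path per host. A sketch (the rack names and subnets are purely illustrative):

    <!-- core-site.xml on the NN -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/topology.sh</value>
    </property>

    #!/bin/sh
    # topology.sh: print one rack path per host argument;
    # unknown hosts fall back to the default rack.
    for host in "$@"; do
      case "$host" in
        10.1.1.*) echo "/rack1" ;;
        10.1.2.*) echo "/rack2" ;;
        *)        echo "/default-rack" ;;
      esac
    done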

A restart isn't required for adding new nodes to the cluster.

> On a related note, if we restart a data node (one that has some blocks on it) and the data node comes back with a new IP address, should we restart the namenode/job tracker for HDFS and map-reduce to function correctly?
> Would the blocks on the restarted data node be detected, or would HDFS think those blocks were lost and start re-replicating them?

Stopping the DN, cleanly changing its IP/hostname, and starting it back
up should not cause any block movement.
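If you want to verify that after the restart, the stock commands should show the node live again with its blocks intact, e.g.:

    $ bin/hadoop dfsadmin -report   # node listed as live, with its block count
    $ bin/hadoop fsck /             # no missing or corrupt blocks reported

Note also that the NN only starts re-replicating a node's blocks after the node has been marked dead (on the order of ten minutes of missed heartbeats by default), so a reasonably quick restart typically triggers no re-replication at all.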

-- 
Harsh J

Re: adding or restarting a data node in a hadoop cluster

Posted by Anil Gupta <an...@gmail.com>.
@Amith: if the DN gets its IP from DHCP, then the IP address might change after a reboot.
Dynamic IPs in a cluster are not a good choice, IMO.
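One common workaround, assuming you control name resolution: give each node a DHCP reservation or a static entry so its address never changes, e.g. in /etc/hosts on every node (the names and addresses below are placeholders):

    # /etc/hosts, identical on all cluster nodes
    10.1.1.11   datanode1.example.com   datanode1
    10.1.1.12   datanode2.example.com   datanode2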

Best Regards,
Anil

On Apr 30, 2012, at 8:22 PM, Amith D K <am...@huawei.com> wrote:

> Hi sumadhur,
> 
> As you mentioned, configuring the NN and JT addresses would be enough.
> 
> I am not able to understand how the DN's IP would get changed on a restart, though.
> 
> ________________________________________
> From: sumadhur [sumadhur_iitr@yahoo.com]
> Sent: Tuesday, May 01, 2012 10:58 AM
> To: common-user@hadoop.apache.org
> Subject: adding or restarting a data node in a hadoop cluster
> 
> I am on hadoop 0.20.
> 
> To add a data node to a cluster, if we do not use the include/exclude/slaves files, do we need to do anything other than configuring hdfs-site.xml to point to the name node and mapred-site.xml to point to the job tracker?
> 
> For example, do the job tracker and name node always need to be restarted?
> 
> On a related note, if we restart a data node (one that has some blocks on it) and the data node comes back with a new IP address, should we restart the namenode/job tracker for HDFS and map-reduce to function correctly?
> Would the blocks on the restarted data node be detected, or would HDFS think those blocks were lost and start re-replicating them?
> 
> Thanks,
> Sumadhur

RE: adding or restarting a data node in a hadoop cluster

Posted by Amith D K <am...@huawei.com>.
Hi sumadhur,

As you mentioned, configuring the NN and JT addresses would be enough.

I am not able to understand how the DN's IP would get changed on a restart, though.

________________________________________
From: sumadhur [sumadhur_iitr@yahoo.com]
Sent: Tuesday, May 01, 2012 10:58 AM
To: common-user@hadoop.apache.org
Subject: adding or restarting a data node in a hadoop cluster

I am on hadoop 0.20.

To add a data node to a cluster, if we do not use the include/exclude/slaves files, do we need to do anything other than configuring hdfs-site.xml to point to the name node and mapred-site.xml to point to the job tracker?

For example, do the job tracker and name node always need to be restarted?

On a related note, if we restart a data node (one that has some blocks on it) and the data node comes back with a new IP address, should we restart the namenode/job tracker for HDFS and map-reduce to function correctly?
Would the blocks on the restarted data node be detected, or would HDFS think those blocks were lost and start re-replicating them?

Thanks,
Sumadhur