Posted to mapreduce-user@hadoop.apache.org by Dhanasekaran Anbalagan <bu...@gmail.com> on 2013/02/12 12:50:07 UTC

Decommissioning Nodes in Production Cluster.

Hi Guys,

Is it recommended to remove one of the datanodes in a production cluster
by decommissioning that particular datanode? Please guide me.

-Dhanasekaran,

Did I learn something today? If not, I wasted it.

Re: Decommissioning Nodes in Production Cluster.

Posted by Benjamin Kim <bb...@gmail.com>.
Hi,

I would like to add another scenario: what are the steps for removing a
dead node when the server has had an unrecoverable hardware failure?

Thanks,
Ben


Re: Decommissioning Nodes in Production Cluster.

Posted by sudhakara st <su...@gmail.com>.
The decommissioning process is controlled by an exclude file, which for
HDFS is set by the *dfs.hosts.exclude* property, and for MapReduce by the
*mapred.hosts.exclude* property. In most cases there is one shared file,
referred to as the exclude file. The name of this exclude file should be
specified via the *dfs.hosts.exclude* configuration parameter at namenode
start-up.
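
For reference, a minimal configuration sketch (the file path below is only
an example; use whatever location suits your deployment):

<!-- hdfs-site.xml: points the namenode at the exclude file -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>

<!-- mapred-site.xml: the corresponding exclude list for tasktrackers -->
<property>
  <name>mapred.hosts.exclude</name>
  <value>/etc/hadoop/conf/excludes</value>
</property>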


To remove nodes from the cluster:
1. Add the network addresses of the nodes to be decommissioned to the
exclude file.
2. Restart the MapReduce cluster to stop the tasktrackers on the nodes
being decommissioned.
3. Update the namenode with the new set of permitted datanodes, with this
command:
% hadoop dfsadmin -refreshNodes
4. Go to the web UI and check whether the admin state has changed to
“Decommission In Progress” for the datanodes being decommissioned. They
will start copying their blocks to other datanodes in the cluster.
5. When all the datanodes report their state as “Decommissioned,” all the
blocks have been replicated. Shut down the decommissioned nodes.
6. Remove the nodes from the include file, and run:
% hadoop dfsadmin -refreshNodes
7. Remove the nodes from the slaves file.
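
As a rough command-line sketch of the same sequence (the hostname and the
exclude-file path are placeholders, assuming the classic Hadoop 1.x tools):

% # add the node being decommissioned to the shared exclude file
% echo "datanode05.example.com" >> /etc/hadoop/conf/excludes
% # tell the namenode to re-read its include/exclude files
% hadoop dfsadmin -refreshNodes
% # check progress from the shell as well as the web UI; each datanode's
% # report entry shows Normal, Decommission in progress, or Decommissioned
% hadoop dfsadmin -report | grep "Decommission Status"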

Decommissioning datanodes in small percentages (less than 2% at a time)
doesn't have any noticeable effect on the cluster. But it is better to
pause MR jobs before triggering decommissioning, to ensure no tasks are
running on the nodes being decommissioned.
If only a very small percentage of tasks is running on a decommissioning
node, they can be resubmitted to other tasktrackers; but if the percentage
of queued jobs is larger than the threshold, there is a chance of job
failure. Once you have issued the 'hadoop dfsadmin -refreshNodes' command
and decommissioning has started, you can resume the MR jobs.
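
For example, before triggering the refresh you could check that the
jobtracker has nothing in flight (a sketch, assuming the classic MR1
command set):

% # list jobs currently running on the jobtracker; ideally this prints none
% hadoop job -list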

*Source: Hadoop: The Definitive Guide [Tom White]*




Re: Decommissioning Nodes in Production Cluster.

Posted by shashwat shriparv <dw...@gmail.com>.
On Tue, Feb 12, 2013 at 11:43 PM, Robert Molina <rm...@hortonworks.com> wrote:

> As far as how to do it, there should be some information here
> http://wiki.apache.org/hadoop/FAQ that should help.


This is the best way to remove a datanode from a cluster. You have done the
right thing.



∞
Shashwat Shriparv

Re: Decommissioning Nodes in Production Cluster.

Posted by Robert Molina <rm...@hortonworks.com>.
Hi Dhanasekaran,
I believe you are asking whether it is recommended to use the
decommissioning feature to remove datanodes from your cluster; the answer
is yes. As far as how to do it, there should be some information here
http://wiki.apache.org/hadoop/FAQ that should help.

Regards,
Robert
