Posted to mapreduce-user@hadoop.apache.org by andrew touchet <ad...@latech.edu> on 2014/07/23 22:18:13 UTC

Decommissioning a data node and problems bringing it back online

Hello,

I am decommissioning data nodes for an OS upgrade on an HPC cluster.
Currently, users can run jobs that use data stored on /hdfs. They are able
to access all datanodes/compute nodes except the one being decommissioned.

Is this safe to do? Will files edited during decommissioning affect the node being decommissioned?

I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
running 'hadoop dfsadmin -refreshNodes' on the name node. Then I simply
wait for the log files to report completion. After the upgrade, I remove
the node from hosts_exclude and start Hadoop again on the datanode.
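
Concretely, the sequence looks roughly like this (the hostname below is just
a placeholder for whichever node is being taken out of service):

    # on the namenode: add the node to the exclude file (the file referenced
    # by dfs.hosts.exclude in the namenode configuration)
    echo "datanode07.example.edu" >> /usr/lib/hadoop-0.20/conf/hosts_exclude

    # tell the namenode to re-read its include/exclude files
    hadoop dfsadmin -refreshNodes

    # watch the node move from "Decommission in progress" to "Decommissioned"
    hadoop dfsadmin -report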

Also: under the namenode web interface I just noticed that the node I
previously decommissioned now shows 0 for Configured Capacity, Used, and
Remaining, and is reported as 100% Used.

I used the same /etc/sysconfig/hadoop file from before the upgrade, removed
the node from hosts_exclude, and ran '-refreshNodes' afterwards.
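
In shell terms, bringing it back amounted to roughly the following (again,
the hostname is only a placeholder):

    # on the namenode: take the node out of the exclude file and refresh
    sed -i '/datanode07.example.edu/d' /usr/lib/hadoop-0.20/conf/hosts_exclude
    hadoop dfsadmin -refreshNodes

    # on the datanode itself: start the datanode process again
    /usr/lib/hadoop-0.20/bin/hadoop-daemon.sh start datanode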

What steps have I missed in the decommissioning process or while bringing
the data node back online?

Re: Decommissioning a data node and problems bringing it back online

Posted by andrew touchet <ad...@latech.edu>.
Hello Wellington,

That sounds wonderful!  I appreciate everyone's help.

Best Regards,

Andrew Touchet


On Thu, Jul 24, 2014 at 12:01 PM, Wellington Chevreuil <
wellington.chevreuil@gmail.com> wrote:

> You should not face any data loss. The replicas were just moved away from
> that node to other nodes in the cluster during decommission. Once you
> recommission the node and re-balance your cluster, HDFS will re-distribute
> replicas between the nodes evenly, and the recommissioned node will receive
> replicas from other nodes, but there is no guarantee that exact the same
> replicas that were stored on this node before it was decommissioned will be
> assigned to this node again, after recommission and rebalance.
>
> Cheers,
> Wellington.
>
>
> On 24 Jul 2014, at 17:55, andrew touchet <ad...@latech.edu> wrote:
>
> Hi Mirko,
>
> Thanks for the reply!
>
> "...it will not bring in exactly the same blocks like before"
> Is that what usually happens when adding nodes back in? Should I expect
> any data loss due to starting the data node process before running the
> balancing tool?
>
> Best Regards,
>
> Andrew Touchet
>
>
>
> On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf <mi...@gmail.com>
> wrote:
>
>> After you added the nodes back to your cluster you run the balancer tool,
>> but it will not bring in exactly the same blocks like before.
>>
>
>> Cheers,
>> Mirko
>>
>>
>>
>> 2014-07-24 17:34 GMT+01:00 andrew touchet <ad...@latech.edu>:
>>
>> Thanks for the reply,
>>>
>>> I am using Hadoop-0.20. We installed from Apache not cloundera, if that
>>> makes a difference.
>>>
>>> Currently I really need to know how to get the data that was replicated
>>> during decommissioning back onto my two data nodes.
>>>
>>>
>>>
>>>
>>>
>>> On Thursday, July 24, 2014, Stanley Shi <ss...@gopivotal.com> wrote:
>>>
>>>> which distribution are you using?
>>>>
>>>> Regards,
>>>> *Stanley Shi,*
>>>>
>>>>
>>>>
>>>> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <ad...@latech.edu>
>>>> wrote:
>>>>
>>>>> I should have added this in my first email but I do get an error in
>>>>> the data node's log file
>>>>>
>>>>> '2014-07-12 19:39:58,027 INFO
>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
>>>>> got processed in 1 msecs'
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am Decommissioning data nodes for an OS upgrade on a HPC cluster .
>>>>>> Currently, users can run jobs that use data stored on /hdfs. They are able
>>>>>> to access all datanodes/compute nodes except the one being decommissioned.
>>>>>>
>>>>>> Is this safe to do? Will edited files affect the decommissioning node?
>>>>>>
>>>>>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude
>>>>>> and running   'hadoop dfsadmin -refreshNodes' on the name name node.  Then
>>>>>> I simply wait for log files to report completion. After upgrade, I simply
>>>>>> remove the node from hosts_exlude and start hadoop again on the datanode.
>>>>>>
>>>>>> Also: Under the namenode web interface I just noticed that the node I
>>>>>> have decommissioned previously now has 0 Configured capacity, Used,
>>>>>> Remaining memory and is now 100% Used.
>>>>>>
>>>>>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>>>>>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>>>>>
>>>>>> What steps have I missed in the decommissioning process or while
>>>>>> bringing the data node back online?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>
>

Re: Decommissioning a data node and problems bringing it back online

Posted by Wellington Chevreuil <we...@gmail.com>.
You should not face any data loss. The replicas were just moved away from that node to other nodes in the cluster during decommission. Once you recommission the node and re-balance your cluster, HDFS will re-distribute replicas across the nodes evenly, and the recommissioned node will receive replicas from other nodes, but there is no guarantee that exactly the same replicas that were stored on this node before it was decommissioned will be assigned to it again after recommission and rebalance.
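
As a rough sketch of the rebalance step (the threshold here is only an
example; pick whatever utilization spread you are comfortable with):

    # run from any node with the Hadoop client configured; moves blocks
    # until every datanode is within 5% of the cluster's average utilization
    hadoop balancer -threshold 5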

Cheers,
Wellington. 


On 24 Jul 2014, at 17:55, andrew touchet <ad...@latech.edu> wrote:

> Hi Mirko,
> 
> Thanks for the reply!
> 
> "...it will not bring in exactly the same blocks like before"
> Is that what usually happens when adding nodes back in? Should I expect any data loss due to starting the data node process before running the balancing tool?
> 
> Best Regards,
> 
> Andrew Touchet
> 
> 
> 
> On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf <mi...@gmail.com> wrote:
> After you added the nodes back to your cluster you run the balancer tool, but it will not bring in exactly the same blocks like before.
> 
> Cheers,
> Mirko
> 
> 
> 
> 2014-07-24 17:34 GMT+01:00 andrew touchet <ad...@latech.edu>:
> 
> Thanks for the reply,
> 
> I am using Hadoop-0.20. We installed from Apache not cloundera, if that makes a difference. 
> 
> Currently I really need to know how to get the data that was replicated during decommissioning back onto my two data nodes. 
> 
> 
> 
> 
> 
> On Thursday, July 24, 2014, Stanley Shi <ss...@gopivotal.com> wrote:
> which distribution are you using? 
> 
> Regards,
> Stanley Shi,
> 
> 
> 
> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <ad...@latech.edu> wrote:
> I should have added this in my first email but I do get an error in the data node's log file
> 
> '2014-07-12 19:39:58,027 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs'
> 
> 
> 
> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu> wrote:
> Hello,
> 
> I am Decommissioning data nodes for an OS upgrade on a HPC cluster . Currently, users can run jobs that use data stored on /hdfs. They are able to access all datanodes/compute nodes except the one being decommissioned. 
> 
> Is this safe to do? Will edited files affect the decommissioning node?
> 
> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and running   'hadoop dfsadmin -refreshNodes' on the name name node.  Then I simply wait for log files to report completion. After upgrade, I simply remove the node from hosts_exlude and start hadoop again on the datanode.
> 
> Also: Under the namenode web interface I just noticed that the node I have decommissioned previously now has 0 Configured capacity, Used, Remaining memory and is now 100% Used. 
> 
> I used the same /etc/sysconfig/hadoop file from before the upgrade, removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.  
> 
> What steps have I missed in the decommissioning process or while bringing the data node back online?
> 
> 
> 
> 
> 
> 
> 


Re: Decommissioning a data node and problems bringing it back online

Posted by andrew touchet <ad...@latech.edu>.
Hi Mirko,

Thanks for the reply!

"...it will not bring in exactly the same blocks like before"
Is that what usually happens when adding nodes back in? Should I expect any
data loss due to starting the data node process before running the
balancing tool?
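
For my part, I plan to sanity-check for missing or under-replicated blocks
with something like this (just a generic fsck pass; the tail only trims the
output down to the summary):

    # prints per-file status followed by totals for missing, corrupt and
    # under-replicated blocks
    hadoop fsck / | tail -n 20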

Best Regards,

Andrew Touchet



On Thu, Jul 24, 2014 at 11:37 AM, Mirko Kämpf <mi...@gmail.com>
wrote:

> After you added the nodes back to your cluster you run the balancer tool,
> but it will not bring in exactly the same blocks like before.
>

> Cheers,
> Mirko
>
>
>
> 2014-07-24 17:34 GMT+01:00 andrew touchet <ad...@latech.edu>:
>
> Thanks for the reply,
>>
>> I am using Hadoop-0.20. We installed from Apache not cloundera, if that
>> makes a difference.
>>
>> Currently I really need to know how to get the data that was replicated
>> during decommissioning back onto my two data nodes.
>>
>>
>>
>>
>>
>> On Thursday, July 24, 2014, Stanley Shi <ss...@gopivotal.com> wrote:
>>
>>> which distribution are you using?
>>>
>>> Regards,
>>> *Stanley Shi,*
>>>
>>>
>>>
>>> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <ad...@latech.edu>
>>> wrote:
>>>
>>>> I should have added this in my first email but I do get an error in the
>>>> data node's log file
>>>>
>>>> '2014-07-12 19:39:58,027 INFO
>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
>>>> got processed in 1 msecs'
>>>>
>>>>
>>>>
>>>> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am Decommissioning data nodes for an OS upgrade on a HPC cluster .
>>>>> Currently, users can run jobs that use data stored on /hdfs. They are able
>>>>> to access all datanodes/compute nodes except the one being decommissioned.
>>>>>
>>>>> Is this safe to do? Will edited files affect the decommissioning node?
>>>>>
>>>>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude
>>>>> and running   'hadoop dfsadmin -refreshNodes' on the name name node.  Then
>>>>> I simply wait for log files to report completion. After upgrade, I simply
>>>>> remove the node from hosts_exlude and start hadoop again on the datanode.
>>>>>
>>>>> Also: Under the namenode web interface I just noticed that the node I
>>>>> have decommissioned previously now has 0 Configured capacity, Used,
>>>>> Remaining memory and is now 100% Used.
>>>>>
>>>>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>>>>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>>>>
>>>>> What steps have I missed in the decommissioning process or while
>>>>> bringing the data node back online?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Decommissioning a data node and problems bringing it back online

Posted by Mirko Kämpf <mi...@gmail.com>.
After you add the nodes back to your cluster you run the balancer tool,
but it will not bring back exactly the same blocks as before.
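
If you want to see where a particular file's replicas actually ended up after
the balancer run, something like this works (the path is just an example):

    # lists every block of the file and the datanodes holding each replica
    hadoop fsck /user/someuser/somefile -files -blocks -locations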

Cheers,
Mirko



2014-07-24 17:34 GMT+01:00 andrew touchet <ad...@latech.edu>:

> Thanks for the reply,
>
> I am using Hadoop-0.20. We installed from Apache not cloundera, if that
> makes a difference.
>
> Currently I really need to know how to get the data that was replicated
> during decommissioning back onto my two data nodes.
>
>
>
>
>
> On Thursday, July 24, 2014, Stanley Shi <ss...@gopivotal.com> wrote:
>
>> which distribution are you using?
>>
>> Regards,
>> *Stanley Shi,*
>>
>>
>>
>> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <ad...@latech.edu>
>> wrote:
>>
>>> I should have added this in my first email but I do get an error in the
>>> data node's log file
>>>
>>> '2014-07-12 19:39:58,027 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
>>> got processed in 1 msecs'
>>>
>>>
>>>
>>> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I am Decommissioning data nodes for an OS upgrade on a HPC cluster .
>>>> Currently, users can run jobs that use data stored on /hdfs. They are able
>>>> to access all datanodes/compute nodes except the one being decommissioned.
>>>>
>>>> Is this safe to do? Will edited files affect the decommissioning node?
>>>>
>>>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude
>>>> and running   'hadoop dfsadmin -refreshNodes' on the name name node.  Then
>>>> I simply wait for log files to report completion. After upgrade, I simply
>>>> remove the node from hosts_exlude and start hadoop again on the datanode.
>>>>
>>>> Also: Under the namenode web interface I just noticed that the node I
>>>> have decommissioned previously now has 0 Configured capacity, Used,
>>>> Remaining memory and is now 100% Used.
>>>>
>>>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>>>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>>>
>>>> What steps have I missed in the decommissioning process or while
>>>> bringing the data node back online?
>>>>
>>>>
>>>>
>>>>
>>>
>>

Re: Decommissioning a data node and problems bringing it back online

Posted by andrew touchet <ad...@latech.edu>.
Thanks for the reply,

I am using Hadoop 0.20. We installed from Apache, not Cloudera, if that
makes a difference.

Currently I really need to know how to get the data that was replicated
during decommissioning back onto my two data nodes.





On Thursday, July 24, 2014, Stanley Shi <ss...@gopivotal.com> wrote:

> which distribution are you using?
>
> Regards,
> *Stanley Shi,*
>
>
>
> On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <adt027@latech.edu
> <javascript:_e(%7B%7D,'cvml','adt027@latech.edu');>> wrote:
>
>> I should have added this in my first email but I do get an error in the
>> data node's log file
>>
>> '2014-07-12 19:39:58,027 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
>> got processed in 1 msecs'
>>
>>
>>
>> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <adt027@latech.edu
>> <javascript:_e(%7B%7D,'cvml','adt027@latech.edu');>> wrote:
>>
>>> Hello,
>>>
>>> I am Decommissioning data nodes for an OS upgrade on a HPC cluster .
>>> Currently, users can run jobs that use data stored on /hdfs. They are able
>>> to access all datanodes/compute nodes except the one being decommissioned.
>>>
>>> Is this safe to do? Will edited files affect the decommissioning node?
>>>
>>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude
>>> and running   'hadoop dfsadmin -refreshNodes' on the name name node.  Then
>>> I simply wait for log files to report completion. After upgrade, I simply
>>> remove the node from hosts_exlude and start hadoop again on the datanode.
>>>
>>> Also: Under the namenode web interface I just noticed that the node I
>>> have decommissioned previously now has 0 Configured capacity, Used,
>>> Remaining memory and is now 100% Used.
>>>
>>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>>
>>> What steps have I missed in the decommissioning process or while
>>> bringing the data node back online?
>>>
>>>
>>>
>>>
>>
>

Re: Decommissioning a data node and problems bringing it back online

Posted by Stanley Shi <ss...@gopivotal.com>.
which distribution are you using?

Regards,
*Stanley Shi,*



On Thu, Jul 24, 2014 at 4:38 AM, andrew touchet <ad...@latech.edu> wrote:

> I should have added this in my first email but I do get an error in the
> data node's log file
>
> '2014-07-12 19:39:58,027 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
> got processed in 1 msecs'
>
>
>
> On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu> wrote:
>
>> Hello,
>>
>> I am decommissioning data nodes for an OS upgrade on an HPC cluster.
>> Currently, users can run jobs that use data stored on /hdfs. They are able
>> to access all datanodes/compute nodes except the one being decommissioned.
>>
>> Is this safe to do? Will edited files affect the decommissioning node?
>>
>> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
>> running 'hadoop dfsadmin -refreshNodes' on the name node.  Then I
>> wait for the log files to report completion. After the upgrade, I simply
>> remove the node from hosts_exclude and start hadoop again on the datanode.
>>
>> Also: under the namenode web interface I just noticed that the node I
>> have previously decommissioned now shows 0 Configured Capacity, Used, and
>> Remaining space, and is now 100% Used.
>>
>> I used the same /etc/sysconfig/hadoop file from before the upgrade,
>> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>>
>> What steps have I missed in the decommissioning process or while bringing
>> the data node back online?
>>
>>
>>
>>
>

Re: Decommissioning a data node and problems bringing it back online

Posted by andrew touchet <ad...@latech.edu>.
I should have added this in my first email, but I do get an error in the
data node's log file:

'2014-07-12 19:39:58,027 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks
got processed in 1 msecs'
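
In case it helps, this is how I have been checking what the namenode reports
for each datanode (standard 0.20 commands run on the namenode; nothing custom
on our side):

  # Per-datanode Configured Capacity, DFS Used, and DFS Remaining:
  hadoop dfsadmin -report

  # Block-level view of which datanodes currently hold each replica:
  hadoop fsck / -files -blocks -locations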



On Wed, Jul 23, 2014 at 3:18 PM, andrew touchet <ad...@latech.edu> wrote:

> Hello,
>
> I am decommissioning data nodes for an OS upgrade on an HPC cluster.
> Currently, users can run jobs that use data stored on /hdfs. They are able
> to access all datanodes/compute nodes except the one being decommissioned.
>
> Is this safe to do? Will edited files affect the decommissioning node?
>
> I've been adding the nodes to /usr/lib/hadoop-0.20/conf/hosts_exclude and
> running 'hadoop dfsadmin -refreshNodes' on the name node.  Then I
> wait for the log files to report completion. After the upgrade, I simply
> remove the node from hosts_exclude and start hadoop again on the datanode.
>
> Also: under the namenode web interface I just noticed that the node I have
> previously decommissioned now shows 0 Configured Capacity, Used, and
> Remaining space, and is now 100% Used.
>
> I used the same /etc/sysconfig/hadoop file from before the upgrade,
> removed the node from hosts_exclude, and ran '-refreshNodes' afterwards.
>
> What steps have I missed in the decommissioning process or while bringing
> the data node back online?
>
>
>
>
