Posted to common-user@hadoop.apache.org by jiang licht <li...@yahoo.com> on 2010/08/20 22:31:56 UTC

Supersede a data node help: how to move all files out of a Hadoop data node?

Requirement: I want to retire a data node machine, but it holds useful data that is still in use. So I want to move all of its files/blocks to other live data nodes in the same cluster.

Question: I understand that if a data node is down for a certain amount of time, it is marked as "dead" and Hadoop automatically creates a new replica on another live data node for each block that was on the dead node. So, sooner or later, all files/blocks from the dead data node will be replicated (or "moved") to other data node machines. My questions are:

Can this process be explicitly controlled, so that I know when all the missing blocks from the dead node have been replicated to other live nodes?

What is the recommended way to do this?

How can I check that there are no missing or under-replicated blocks?

Thanks,

Michael


      

Re: Supersede a data node help: how to move all files out of a Hadoop data node?

Posted by jiang licht <li...@yahoo.com>.
Is there a throttle setting to control how fast the decommissioning proceeds?

Thanks,

Michael

--- On Fri, 8/20/10, Harsh J <qw...@gmail.com> wrote:

From: Harsh J <qw...@gmail.com>
Subject: Re: Supersede a data node help: how to move all files out of a Hadoop data node?
To: common-user@hadoop.apache.org
Date: Friday, August 20, 2010, 4:15 PM

It's called 'decommissioning' a DataNode. See this FAQ entry:
http://wiki.apache.org/hadoop/FAQ#A17

This method should satisfy the notification requirement (knowing when it completes, etc.).



-- 
Harsh J
www.harshj.com



      

Re: Supersede a data node help: how to move all files out of a Hadoop data node?

Posted by jiang licht <li...@yahoo.com>.
Thanks, that is exactly what I wanted. BTW, embarrassingly, I knew about this but had completely forgotten it for a while :(

Best regards,

Michael

--- On Fri, 8/20/10, Harsh J <qw...@gmail.com> wrote:

From: Harsh J <qw...@gmail.com>
Subject: Re: Supersede a data node help: how to move all files out of a Hadoop data node?
To: common-user@hadoop.apache.org
Date: Friday, August 20, 2010, 4:15 PM

It's called 'decommissioning' a DataNode. See this FAQ entry:
http://wiki.apache.org/hadoop/FAQ#A17

This method should satisfy the notification requirement (knowing when it completes, etc.).



-- 
Harsh J
www.harshj.com



      

Re: Supersede a data node help: how to move all files out of a Hadoop data node?

Posted by Harsh J <qw...@gmail.com>.
It's called 'decommissioning' a DataNode. See this FAQ entry:
http://wiki.apache.org/hadoop/FAQ#A17

This method should satisfy the notification requirement (knowing when it completes, etc.).



-- 
Harsh J
www.harshj.com
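The FAQ entry linked above boils down to pointing the NameNode at an excludes file and asking it to re-read its host lists. As a rough sketch (the file path below is illustrative, not mandated), the relevant configuration in hdfs-site.xml looks like:

```xml
<!-- Sketch of the decommissioning setup the FAQ describes.
     The excludes file path is an illustrative choice. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
<!-- Then add the hostname of the node to retire to that file and run:
       hadoop dfsadmin -refreshNodes
     The node shows up as "Decommission in progress" in
     `hadoop dfsadmin -report` (and on the NameNode web UI), and as
     "Decommissioned" once all its blocks have been re-replicated
     to other live nodes. -->
```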

Re: Supersede a data node help: how to move all files out of a Hadoop data node?

Posted by jiang licht <li...@yahoo.com>.
Thanks, Edward.

Do you have any ideas about the other question? Basically, one simple approach is to just shut down a data node and wait for Hadoop to detect the "dead" node and finish replicating all of its blocks. But I would guess Hadoop throttles this so as not to degrade other processing, which makes it hard to estimate when the process will finish...

Best regards,

Michael

--- On Fri, 8/20/10, Edward Capriolo <ed...@gmail.com> wrote:

From: Edward Capriolo <ed...@gmail.com>
Subject: Re: Supersede a data node help: how to move all files out of a Hadoop data node?
To: common-user@hadoop.apache.org
Date: Friday, August 20, 2010, 3:39 PM


If you run 'hadoop fsck /', one of the things it reports is under-replicated blocks. When the under-replicated block count reaches 0, everything has been moved (assuming there are no other problems).



      

Re: Supersede a data node help: how to move all files out of a Hadoop data node?

Posted by Edward Capriolo <ed...@gmail.com>.

If you run 'hadoop fsck /', one of the things it reports is under-replicated blocks. When the under-replicated block count reaches 0, everything has been moved (assuming there are no other problems).
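To watch that count without eyeballing the full fsck report, a small script can pull the figure out of the summary text. This is a best-effort sketch: the exact wording of the fsck summary line varies across Hadoop versions, and the sample text below is illustrative, not verbatim output.

```python
import re

def under_replicated_count(fsck_output: str) -> int:
    """Extract the under-replicated block count from `hadoop fsck /` output.

    The label pattern is an assumption based on typical fsck summaries;
    adjust the regex if your Hadoop version words it differently.
    """
    m = re.search(r"Under[- ]replicated blocks\s*:\s*(\d+)",
                  fsck_output, re.IGNORECASE)
    if m is None:
        raise ValueError("no under-replicated block count found in fsck output")
    return int(m.group(1))

# Illustrative (not verbatim) fsck summary text:
sample = """
 Total blocks (validated):      1024
 Minimally replicated blocks:   1024
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0
"""
print(under_replicated_count(sample))  # prints 0: re-replication has caught up
```

In practice you would feed this the captured output of `hadoop fsck /` and poll until the count stays at 0.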