Replacing a hard drive on a slave

Posted to user@hadoop.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/11/28 15:14:36 UTC

Hi,

can I remove one hard drive from a slave but tell Hadoop not to re-replicate
the missing blocks for a few minutes, because I will put the drive right back?
Or will this not work at all, and will Hadoop start replicating as soon as the
blocks go missing, even if only for a short time?

Thank you. Sincerely,
Mark

Re: Replacing a hard drive on a slave

Posted by Mark Kerzner <ma...@shmsoft.com>.
Thank you.

Mark

On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com> wrote:

> HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30
> seconds by default.
>
> Right now there isn't a good way to replace a disk out from under a
> running datanode, so the best way is:
> - Stop the DN
> - Replace the disk
> - Restart the DN
>
>
>
>
> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
>
>> Hi,
>>
>> can I remove one hard drive from a slave but tell Hadoop not to replicate
>> missing blocks for a few minutes, because I will return it back? Or will
>> this not work at all, and will Hadoop continue replicating, since I removed
>> blocks, even for a short time?
>>
>> Thank you. Sincerely,
>> Mark
>>
>
>

Re: Replacing a hard drive on a slave

Posted by Michael Segel <mi...@hotmail.com>.
I think we talked about this on one of the LinkedIn discussion groups, either Hard Core Hadoop or Big Data Low Latency. 

What I heard from people who manage very, very large clusters is that they don't replace the disks right away. In my own experience, if you lose a drive late at night you may or may not get the alert (disk failures can land in the 'morning report' bucket rather than the 'wake you up at 3:00 am' bucket), so they may end up creating a work ticket to get the disk replaced. 

The disk gets replaced, then someone in IS formats it and brings it online. Then your Hadoop admin creates the correct folders/permissions and bounces the DN so your Hadoop cluster can see it. (Unless a newer release makes this easier...) 
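
For a concrete picture, here is a minimal sketch of that replace-and-bounce sequence, assuming a 1.x-style cluster where the failed disk backs a single dfs.data.dir entry at /data/3 and the DN runs as the hdfs user (the device name and paths are only examples, adjust for your layout):

    hadoop-daemon.sh stop datanode        # take the DN down cleanly
    mkfs.ext4 /dev/sdd1                   # format the replacement disk
    mount /dev/sdd1 /data/3               # mount it where dfs.data.dir expects it
    mkdir -p /data/3/dfs/dn               # recreate the data directory...
    chown -R hdfs:hadoop /data/3/dfs/dn   # ...owned by the DN user
    hadoop-daemon.sh start datanode       # bring the DN back; the NN re-registers it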

The whole point of 3X replication is that you don't have to run around worrying about under-replicated blocks and lost disks. 

I always liked the idea of someone doing their daily walk through the machine room with a little shopping cart, you know, the one with two bins: one with new drives and one for the bad drives... 

HTH

-Mike

On Nov 28, 2012, at 9:43 AM, Mark Kerzner <ma...@shmsoft.com> wrote:

> Somebody asked me, and I did not know what to answer. I will ask them your questions.
> 
> Thank you.
> Mark
> 
> On Wed, Nov 28, 2012 at 7:41 AM, Michael Segel <mi...@hotmail.com> wrote:
> Silly question, why are you worrying about this?
> 
> In a production the odds of getting a replacement disk in service within 10 minutes after a fault is detected is highly improbable. 
> 
> Why do you care that the blocks are replicated to another node? 
> After you replace the disk, bounce the node (restart DN) (RS if running) , you can always force a rebalance of the cluster. 
> 
> 
> On Nov 28, 2012, at 9:22 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
> 
>> What happens if I stop the datanode, miss the 10 min 30 seconds deadline, and restart the datanode say 30 minutes later? Will Hadoop re-use the data on this datanode, balancing it with HDFS? What happens to those blocks that correspond to file that have been updated meanwhile?
>> 
>> Mark
>> 
>> On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com> wrote:
>> HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30 seconds by default.
>> 
>> Right now there isn't a good way to replace a disk out from under a running datanode, so the best way is:
>> - Stop the DN
>> - Replace the disk
>> - Restart the DN
>> 
>> 
>> 
>> 
>> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
>> Hi,
>> 
>> can I remove one hard drive from a slave but tell Hadoop not to replicate missing blocks for a few minutes, because I will return it back? Or will this not work at all, and will Hadoop continue replicating, since I removed blocks, even for a short time?
>> 
>> Thank you. Sincerely,
>> Mark
>> 
>> 
> 
> 


Re: Replacing a hard drive on a slave

Posted by Mark Kerzner <ma...@shmsoft.com>.
Somebody asked me, and I did not know what to answer. I will ask them your
questions.

Thank you.
Mark

On Wed, Nov 28, 2012 at 7:41 AM, Michael Segel <mi...@hotmail.com> wrote:

> Silly question, why are you worrying about this?
>
> In a production the odds of getting a replacement disk in service within
> 10 minutes after a fault is detected is highly improbable.
>
> Why do you care that the blocks are replicated to another node?
> After you replace the disk, bounce the node (restart DN) (RS if running) ,
> you can always force a rebalance of the cluster.
>
>
> On Nov 28, 2012, at 9:22 AM, Mark Kerzner <ma...@shmsoft.com>
> wrote:
>
> What happens if I stop the datanode, miss the 10 min 30 seconds deadline,
> and restart the datanode say 30 minutes later? Will Hadoop re-use the data
> on this datanode, balancing it with HDFS? What happens to those blocks that
> correspond to file that have been updated meanwhile?
>
> Mark
>
> On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com> wrote:
>
>> HDFS will not start re-replicating blocks from a dead DN for 10 minutes
>> 30 seconds by default.
>>
>> Right now there isn't a good way to replace a disk out from under a
>> running datanode, so the best way is:
>> - Stop the DN
>> - Replace the disk
>> - Restart the DN
>>
>>
>>
>>
>> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
>>
>>> Hi,
>>>
>>> can I remove one hard drive from a slave but tell Hadoop not to
>>> replicate missing blocks for a few minutes, because I will return it back?
>>> Or will this not work at all, and will Hadoop continue replicating, since I
>>> removed blocks, even for a short time?
>>>
>>> Thank you. Sincerely,
>>> Mark
>>>
>>
>>
>
>

Re: Replacing a hard drive on a slave

Posted by Michael Segel <mi...@hotmail.com>.
Silly question, why are you worrying about this?

In production, the odds of getting a replacement disk into service within 10 minutes of a fault being detected are slim. 

Why do you care that the blocks get replicated to another node? 
After you replace the disk, bounce the node (restart the DN, and the RegionServer if one is running); you can always force a rebalance of the cluster afterwards. 
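
If you do want to even things out afterwards, the stock balancer can be kicked off by hand; -threshold is the allowed percent deviation from the cluster-average disk usage, and 10 is the default:

    start-balancer.sh -threshold 10   # background daemon form
    hadoop balancer -threshold 10     # same thing, run in the foreground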


On Nov 28, 2012, at 9:22 AM, Mark Kerzner <ma...@shmsoft.com> wrote:

> What happens if I stop the datanode, miss the 10 min 30 seconds deadline, and restart the datanode say 30 minutes later? Will Hadoop re-use the data on this datanode, balancing it with HDFS? What happens to those blocks that correspond to file that have been updated meanwhile?
> 
> Mark
> 
> On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com> wrote:
> HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30 seconds by default.
> 
> Right now there isn't a good way to replace a disk out from under a running datanode, so the best way is:
> - Stop the DN
> - Replace the disk
> - Restart the DN
> 
> 
> 
> 
> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
> Hi,
> 
> can I remove one hard drive from a slave but tell Hadoop not to replicate missing blocks for a few minutes, because I will return it back? Or will this not work at all, and will Hadoop continue replicating, since I removed blocks, even for a short time?
> 
> Thank you. Sincerely,
> Mark
> 
> 


Re: Replacing a hard drive on a slave

Posted by Harsh J <ha...@cloudera.com>.
When it is added back with its blocks retained, the NN will detect
that the affected files are over-replicated and will delete the
excess replicas, while still adhering to the block placement policy
(for rack-aware clusters); not everything on the re-added DN will
necessarily be erased.

This is an automatic process and should not worry you in any way, as
an operator.
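
If you want to watch it happen, fsck reports the replica accounting; something like:

    hadoop fsck / | grep -i 'replicated blocks'   # under- and over-replicated block counts

The over-replicated count should drain back to zero on its own once the NN has finished deleting the excess replicas.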

On Wed, Nov 28, 2012 at 8:52 PM, Mark Kerzner <ma...@shmsoft.com> wrote:
> What happens if I stop the datanode, miss the 10 min 30 seconds deadline,
> and restart the datanode say 30 minutes later? Will Hadoop re-use the data
> on this datanode, balancing it with HDFS? What happens to those blocks that
> correspond to file that have been updated meanwhile?
>
> Mark
>
> On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com>
> wrote:
>>
>> HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30
>> seconds by default.
>>
>> Right now there isn't a good way to replace a disk out from under a
>> running datanode, so the best way is:
>> - Stop the DN
>> - Replace the disk
>> - Restart the DN
>>
>>
>>
>>
>> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> can I remove one hard drive from a slave but tell Hadoop not to replicate
>>> missing blocks for a few minutes, because I will return it back? Or will
>>> this not work at all, and will Hadoop continue replicating, since I removed
>>> blocks, even for a short time?
>>>
>>> Thank you. Sincerely,
>>> Mark
>>
>>
>



-- 
Harsh J

Re: Replacing a hard drive on a slave

Posted by Mark Kerzner <ma...@shmsoft.com>.
What happens if I stop the datanode, miss the 10-minute-30-second deadline,
and restart the datanode, say, 30 minutes later? Will Hadoop re-use the data
on this datanode, balancing it with the rest of HDFS? And what happens to the
blocks that belong to files that have been updated in the meantime?

Mark

On Wed, Nov 28, 2012 at 6:51 AM, Stephen Fritz <st...@cloudera.com> wrote:

> HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30
> seconds by default.
>
> Right now there isn't a good way to replace a disk out from under a
> running datanode, so the best way is:
> - Stop the DN
> - Replace the disk
> - Restart the DN
>
>
>
>
> On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:
>
>> Hi,
>>
>> can I remove one hard drive from a slave but tell Hadoop not to replicate
>> missing blocks for a few minutes, because I will return it back? Or will
>> this not work at all, and will Hadoop continue replicating, since I removed
>> blocks, even for a short time?
>>
>> Thank you. Sincerely,
>> Mark
>>
>
>

Re: Replacing a hard drive on a slave

Posted by Stephen Fritz <st...@cloudera.com>.
HDFS will not start re-replicating blocks from a dead DN for 10 minutes 30
seconds by default.

Right now there isn't a good way to replace a disk out from under a running
datanode, so the best approach is:
- Stop the DN
- Replace the disk
- Restart the DN
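
For reference, the 10 minute 30 second figure falls out of two NameNode settings: a DN is declared dead after 2 * dfs.namenode.heartbeat.recheck-interval (default 300000 ms) plus 10 * dfs.heartbeat.interval (default 3 s), i.e. 630 seconds. Assuming a Hadoop 2.x client (1.x spelled the first key heartbeat.recheck.interval), you can check what your cluster uses with:

    hdfs getconf -confKey dfs.namenode.heartbeat.recheck-interval   # milliseconds
    hdfs getconf -confKey dfs.heartbeat.interval                    # seconds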



On Wed, Nov 28, 2012 at 9:14 AM, Mark Kerzner <ma...@shmsoft.com> wrote:

> Hi,
>
> can I remove one hard drive from a slave but tell Hadoop not to replicate
> missing blocks for a few minutes, because I will return it back? Or will
> this not work at all, and will Hadoop continue replicating, since I removed
> blocks, even for a short time?
>
> Thank you. Sincerely,
> Mark
>
