Posted to user@hadoop.apache.org by Koert Kuipers <ko...@tresata.com> on 2013/10/27 19:42:12 UTC

question about hdfs data loss risk

i have a cluster with replication factor 2. with the following events in
this order, do i have data loss?

1) shut down a datanode for maintenance unrelated to hdfs. so now some
blocks only have replication factor 1

2) a disk dies in another datanode. let's assume some blocks now have
replication factor 0 since they were on this disk that died and on the
datanode that is shut down for maintenance.

3) bring back up the datanode that was down for maintenance.

what i am worried about is that the namenode gives up on a block with
replication factor 0 after steps 1) and 2) and considers it lost, so that by
the time the replica comes back online in step 3) the namenode no
longer considers the block to exist.

thanks! koert
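(For anyone checking a cluster in this state: a sketch of the relevant commands, using the 2013-era `hadoop` wrappers — on newer releases the equivalents are `hdfs fsck` and `hdfs dfsadmin`; the path below is a placeholder:)

```shell
# Summarize filesystem health; the report includes counts of
# under-replicated, corrupt, and missing blocks.
hadoop fsck /

# Show per-file block IDs and the datanodes holding each replica,
# useful for checking whether a specific block still has a live copy.
hadoop fsck /path/to/important/data -files -blocks -locations

# Datanode liveness and capacity summary (live/dead nodes, DFS used).
hadoop dfsadmin -report
```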

Re: question about hdfs data loss risk

Posted by Koert Kuipers <ko...@tresata.com>.
thanks, that's helpful


On Sun, Oct 27, 2013 at 6:03 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> Hi,
>
> 1) You may want to read about proper node decommissioning.
>
> http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
>
> 2) NameNode will replicate blocks when they do not comply with their
> replication factor.
>
> 3) NameNode does not give up.
>
> 4) Yes, ultimately, if you have a replication factor of n and the n
> replicas are lost at the same time, well, the data is truly lost. But
> that's not specific to Hadoop.
>
> Bertrand
>
>
>
>
> On Sun, Oct 27, 2013 at 7:42 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i have a cluster with replication factor 2. with the following events in
>> this order, do i have data loss?
>>
>> 1) shut down a datanode for maintenance unrelated to hdfs. so now some
>> blocks only have replication factor 1
>>
>> 2) a disk dies in another datanode. let's assume some blocks now have
>> replication factor 0 since they were on this disk that died and on the
>> datanode that is shut down for maintenance.
>>
>> 3) bring back up the datanode that was down for maintenance.
>>
>> what i am worried about is that the namenode gives up on a block with
>> replication factor 0 after steps 1) and 2) and considers it lost, and by
>> the time the replica will come back on again in step 3) the namenode no
>> longer considers the block to be existent.
>>
>> thanks! koert
>>
>>
>
>
> --
> Bertrand Dechoux
>

Re: question about hdfs data loss risk

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi,

1) You may want to read about proper node decommissioning.
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F

2) The NameNode will re-replicate blocks that do not satisfy their
replication factor.

3) The NameNode does not give up on a block; a replica that reappears is counted again.

4) Yes, ultimately, if you have a replication factor of n and all n
replicas are lost at the same time, then the data is truly lost. But
that's not specific to Hadoop.

Bertrand




On Sun, Oct 27, 2013 at 7:42 PM, Koert Kuipers <ko...@tresata.com> wrote:

> i have a cluster with replication factor 2. wit the following events in
> this order, do i have data loss?
>
> 1) shut down a datanode for maintenance unrelated to hdfs. so now some
> blocks only have replication factor 1
>
> 2) a disk dies in another datanode. let's assume some blocks now have
> replication factor 0 since they were on this disk that died and on the
> datanode that is shut down for maintenance.
>
> 3) bring back up the datanode that was down for maintenance.
>
> what i am worried about is that the namenode gives up on a block with
> replication factor 0 after steps 1) and 2) and considers it lost, and by
> the time the replica will come back on again in step 3) the namenode no
> longer considers the block to be existent.
>
> thanks! koert
>
>


-- 
Bertrand Dechoux
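(The decommissioning procedure behind the FAQ link above, sketched for a 2013-era cluster; the exclude-file path and hostname are examples and must match your own configuration:)

```shell
# 1) hdfs-site.xml must point dfs.hosts.exclude at an exclude file, e.g.:
#    <property>
#      <name>dfs.hosts.exclude</name>
#      <value>/etc/hadoop/conf/dfs.exclude</value>
#    </property>

# 2) Add the hostname of the node to take down to the exclude file.
echo "datanode-07.example.com" >> /etc/hadoop/conf/dfs.exclude

# 3) Tell the namenode to re-read its include/exclude files.
hadoop dfsadmin -refreshNodes

# 4) Wait until the node's status reads "Decommissioned" (its blocks have
#    been re-replicated elsewhere) before actually shutting it down.
hadoop dfsadmin -report | grep "Decommission Status"
```

Done this way, no block drops below its replication factor during the maintenance window, so a single disk failure on another node (step 2 in the scenario above) cannot take out the last replica.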
