Posted to hdfs-dev@hadoop.apache.org by "Xiaoqiao He (Jira)" <ji...@apache.org> on 2021/08/03 16:26:00 UTC

[jira] [Resolved] (HDFS-16146) All three replicas are lost due to not adding a new DataNode in time

     [ https://issues.apache.org/jira/browse/HDFS-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He resolved HDFS-16146.
--------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Committed to trunk. Thanks @zhangshuyan0 for your work. Thanks @jojochuang for your review.

> All three replicas are lost due to not adding a new DataNode in time
> --------------------------------------------------------------------
>
>                 Key: HDFS-16146
>                 URL: https://issues.apache.org/jira/browse/HDFS-16146
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Shuyan Zhang
>            Assignee: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We have a file with three replicas, and all replicas of one of its blocks were lost while the default datanode replacement strategy was in use. It happened like this:
> 1. addBlock() applies for a new block, and the client successfully connects to three datanodes (dn1, dn2, and dn3) to build a pipeline;
> 2. The client writes data;
> 3. dn1 hits an error and is kicked out. More than one datanode remains in the pipeline, so according to the default replacement strategy there is no need to add a new datanode (see the sketch after this list);
> 4. After the write completes, the pipeline enters PIPELINE_CLOSE;
> 5. dn2 hits an error and is kicked out. But because the pipeline is already in the close phase, addDatanode2ExistingPipeline() decides to hand the task of transferring the replica over to the NameNode. At this point, only one datanode is left in the pipeline;
> 6. dn3 hits an error, and all replicas are lost.
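> Here is a minimal sketch of the decision in step 3, assuming the semantics of the default replace-datanode-on-failure policy; the method name and signature are hypothetical, written for illustration rather than copied from the Hadoop source:
>
>     // Hypothetical helper mirroring the default replacement condition:
>     // replace a failed datanode only when at most half of the replicas
>     // remain, or when appending / after an hflush.
>     static boolean shouldReplaceDatanode(short replication, int remaining,
>         boolean isAppend, boolean isHflushed) {
>       return replication >= 3
>           && (remaining <= replication / 2 || isAppend || isHflushed);
>     }
>
>     // Step 3: replication = 3 and remaining = 2, so 2 > 3/2 = 1 and the
>     // condition is false -- no new datanode is added.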
> If we add a new datanode in step 5, we can avoid losing all replicas in this case. I think an error during PIPELINE_CLOSE carries the same risk of losing replicas as an error during DATA_STREAMING, so we should not skip adding a new datanode during PIPELINE_CLOSE.
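> To show where the skip happens in step 5, here is a simplified, hypothetical sketch of the stage check inside the client's addDatanode2ExistingPipeline(); the stage names mirror the HDFS block construction stages, but the body is an illustrative assumption, and the proposed fix amounts to removing this close-phase early return:
>
>     private void addDatanode2ExistingPipeline() throws IOException {
>       if (stage == BlockConstructionStage.PIPELINE_CLOSE
>           || stage == BlockConstructionStage.PIPELINE_CLOSE_RECOVERY) {
>         // Close phase: leave re-replication to the NameNode instead of
>         // adding a new datanode. As steps 5-6 show, this can leave the
>         // pipeline with a single replica, which may then be lost too.
>         return;
>       }
>       // ... otherwise pick a new datanode and transfer the replica ...
>     }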



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org