Posted to hdfs-dev@hadoop.apache.org by "Nathan Roberts (JIRA)" <ji...@apache.org> on 2017/05/04 17:34:04 UTC

[jira] [Created] (HDFS-11755) Underconstruction blocks can be considered missing

Nathan Roberts created HDFS-11755:
-------------------------------------

             Summary: Underconstruction blocks can be considered missing
                 Key: HDFS-11755
                 URL: https://issues.apache.org/jira/browse/HDFS-11755
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.0.0-alpha2, 2.8.1
            Reporter: Nathan Roberts
            Assignee: Nathan Roberts


The following sequence of events can lead to an under-construction block being considered missing.

- A pipeline of 3 DNs: DN1->DN2->DN3
- DN3 has a failing disk, so some updates take a long time
- The client writes the entire block and is waiting for the final ack
- DN1, DN2 and DN3 have all received the block
- DN1 is waiting for an ACK from DN2, which is waiting for an ACK from DN3
- DN3 is having trouble finalizing the block due to the failing drive. It does eventually succeed, but it is VERY slow at doing so.
- DN2 times out waiting for DN3 and tears down its part of the pipeline; DN1 notices and does the same. Neither DN1 nor DN2 finalizes the block.
- DN3 finally sends an IBR (incremental block report) to the NN indicating that the block has been received.
- The drive containing the block on DN3 fails badly enough that the DN takes it offline and notifies the NN of the failed volume
- The NN removes DN3's replica from the triplets and then declares the block missing because there are no other replicas

It seems like we shouldn't consider uncompleted blocks for replication.
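
To make the sequence concrete, here is a minimal toy model in Java of the NN's view of this one block. It is only a sketch, not HDFS code: the class, enum and method names (MissingBlockSketch, BlockInfo, isMissingAsReported, isMissingIfCompleteOnly, replayScenario) are all hypothetical, and it assumes the block is still in an under-construction state on the NN because the client never received its final ack.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the NN's bookkeeping for a single block. Hypothetical names, not HDFS APIs. */
class MissingBlockSketch {

    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    /** The NN's record of one block: its state and the DNs known to hold a replica. */
    static class BlockInfo {
        BlockState state = BlockState.UNDER_CONSTRUCTION;
        final Set<String> replicas = new HashSet<>();
    }

    /** Behavior described in the report: a block with no known replicas is missing. */
    static boolean isMissingAsReported(BlockInfo block) {
        return block.replicas.isEmpty();
    }

    /** Suggested behavior (an assumption about the fix): only completed blocks can be missing. */
    static boolean isMissingIfCompleteOnly(BlockInfo block) {
        return block.state == BlockState.COMPLETE && block.replicas.isEmpty();
    }

    /** Replays the NN-visible events from the list above. */
    static BlockInfo replayScenario() {
        BlockInfo block = new BlockInfo();

        // DN1 and DN2 tear down the pipeline without finalizing,
        // so the NN never records a replica for either of them.

        // DN3 eventually finalizes and reports the block via an IBR.
        block.replicas.add("DN3");

        // DN3's volume fails; the NN drops DN3's replica.
        block.replicas.remove("DN3");

        // The client never got its final ack, so the block was never completed.
        return block;
    }

    public static void main(String[] args) {
        BlockInfo block = replayScenario();
        System.out.println("missing (as reported):     " + isMissingAsReported(block));
        System.out.println("missing (completed only):  " + isMissingIfCompleteOnly(block));
    }
}
```

In this model the first check flags the block as missing while the second does not, which is the distinction the suggestion above is after. How the real check should be expressed in the NN is left open here.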


