Posted to hdfs-dev@hadoop.apache.org by "Istvan Fajth (Jira)" <ji...@apache.org> on 2020/04/28 15:47:00 UTC

[jira] [Created] (HDFS-15304) Infinite loop between DN and NN at rare condition

Istvan Fajth created HDFS-15304:
-----------------------------------

             Summary: Infinite loop between DN and NN at rare condition
                 Key: HDFS-15304
                 URL: https://issues.apache.org/jira/browse/HDFS-15304
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Istvan Fajth


During the investigation that led to HDFS-15303, we identified the following infinite loop between the NN and the DNs affected by the data directory layout problem:
- for a particular misplaced block, the VolumeScanner finds the block file, and realizes that it is not part of the block map
- the block is added to the block map
- at the next FBR the block is reported to the NN
- the NN finds that the block should have been deleted already, as the corresponding inode was already deleted
- the NN issues the deletion of the block on the DataNode
- the DataNode runs the delete routine, but it silently fails to delete anything, because it tries to delete the block from the wrong internal subdir, one that is calculated from the block id with a different algorithm
- the block is removed from the block map
- the VolumeScanner finds the block again, and adds it back to the block map
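
The cycle above can be modelled with a small, self-contained sketch. This is not DataNode code: the class, the two sets, and the path strings are hypothetical stand-ins, and the only behaviour taken from the report is that the delete step looks in the expected location, ignores a miss, and still drops the block map entry.

```java
import java.util.HashSet;
import java.util.Set;

public class ReportDeleteLoopSketch {
    // Hypothetical stand-ins: file paths present on disk, and block ids in the DN block map.
    static final Set<String> disk = new HashSet<>();
    static final Set<Long> blockMap = new HashSet<>();

    // Where the running DataNode expects the block file to be (placeholder rule).
    static String expectedPath(long blockId) {
        return "expected/blk_" + blockId;
    }

    // Scanner step: a block file that exists on disk is (re-)added to the block map.
    static void scan(long blockId, String actualPath) {
        if (disk.contains(actualPath)) {
            blockMap.add(blockId);
        }
    }

    // Delete step: only the expected path is tried, and a miss is not reported.
    static void delete(long blockId) {
        disk.remove(expectedPath(blockId)); // no-op when the file is misplaced
        blockMap.remove(blockId);
    }

    // Runs scan/report/delete rounds and counts how often a deletion was requested.
    static int run(long blockId, String actualPath, int rounds) {
        disk.clear();
        blockMap.clear();
        disk.add(actualPath);
        int deletions = 0;
        for (int i = 0; i < rounds; i++) {
            scan(blockId, actualPath);
            if (blockMap.contains(blockId)) { // block is reported; the NN has no inode for it
                delete(blockId);
                deletions++;
            }
        }
        return deletions;
    }

    public static void main(String[] args) {
        System.out.println(run(42L, "misplaced/blk_42", 5)); // misplaced file: 5
        System.out.println(run(42L, expectedPath(42L), 5));  // correctly placed file: 1
    }
}
```

In this model a correctly placed block is deleted once and never seen again, while a misplaced one triggers a deletion request in every round.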

The problem can occur only when the DataNode has a mixed layout due to some issue: there are blocks in a subdir that is correct for the Hadoop 2 format while the DN is already running Hadoop 3, or vice versa if the problematic layout arose during a rollback.
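
To illustrate how two layouts can disagree about the same block, here is a minimal sketch. It assumes the subdir is derived from two slices of the block id and that the layouts differ only in the bit mask applied to each slice. The masks, the helper name, and the sample block id are illustrative assumptions, not values taken from this report.

```java
public class BlockDirSketch {
    static final int WIDE_MASK = 0xFF;   // assumed layout: 256 x 256 subdirs
    static final int NARROW_MASK = 0x1F; // assumed layout: 32 x 32 subdirs

    // Hypothetical helper: builds "subdirX/subdirY" from two slices of the
    // block id, keeping only the bits selected by the mask.
    static String blockDir(long blockId, int mask) {
        int d1 = (int) ((blockId >> 16) & mask);
        int d2 = (int) ((blockId >> 8) & mask);
        return "subdir" + d1 + "/subdir" + d2;
    }

    public static void main(String[] args) {
        long blockId = 0x123456L; // arbitrary sample id
        System.out.println(blockDir(blockId, WIDE_MASK));   // subdir18/subdir52
        System.out.println(blockDir(blockId, NARROW_MASK)); // subdir18/subdir20
    }
}
```

A file written under one rule sits in a directory that the other rule never computes for that block id, so a delete that relies on the other rule finds nothing to remove.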


