Posted to common-issues@hadoop.apache.org by "ZanderXu (via GitHub)" <gi...@apache.org> on 2023/05/05 03:35:31 UTC

[GitHub] [hadoop] ZanderXu commented on pull request #5583: HDFS-16987. [BugFix] MarkBlockAsCorrupt should not mark a replica as corrupted if the DN has a newest replica

ZanderXu commented on PR #5583:
URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1535657249

   @Hexiaoqiao @ayushtkn After thinking this through, I believe we can only fix this problem in `processAllPendingDNMessages`, because the NameNode does not know whether a given report is consistent with the replica actually stored on the DataNode.
   
   **Case 1: The report with the smaller GS is a postponed report and differs from the actual replica on the DataNode.**
   For example:
   
   - The actual replica of DN is: blk_1024_1002
   - The postponed report is: blk_1024_1001
   
   In this case, the NameNode can ignore the postponed report and should not mark the replica as corrupted.
   
   **Case 2: The report with the smaller GS is the newest report and matches the actual replica on the DataNode.**
   For example:
   
   - The actual replica of DN is: blk_1024_1001
   - The report is: blk_1024_1001
   - The block's storages in the NameNode already contain this DN
   
   In this case, the NameNode should not ignore the report; it should mark the replica as corrupted. Manually modifying block storage files on the DataNode can cause this situation.
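
The two cases can be compared side by side in a small sketch. It is only an illustration, not code from the Hadoop code base: the class and method names are hypothetical, and it assumes the NameNode has recorded GS 1002 for blk_1024. It shows that the NameNode sees the same input in both cases, while the right answer depends on the replica actually held by the DN, which it cannot observe.

```java
// Hypothetical illustration; none of these names come from Hadoop.
final class StaleReportAmbiguity {

    /** All the NameNode can compare: the reported GS against the GS it has stored. */
    static boolean reportedGsIsOlder(long reportedGs, long storedGs) {
        return reportedGs < storedGs;
    }

    public static void main(String[] args) {
        long storedGs = 1002L; // assumption: GS the NameNode holds for blk_1024

        // Case 1: the DN really holds blk_1024_1002, but a postponed report says 1001.
        long case1ActualOnDn = 1002L;
        long case1Reported = 1001L;

        // Case 2: the DN really holds blk_1024_1001 and reports exactly that.
        long case2ActualOnDn = 1001L;
        long case2Reported = 1001L;

        // From the NameNode's side both reports look identical.
        System.out.println(reportedGsIsOlder(case1Reported, storedGs)); // true
        System.out.println(reportedGsIsOlder(case2Reported, storedGs)); // true

        // The correct outcome depends on a value the NameNode never sees.
        System.out.println(case1ActualOnDn < storedGs); // false: replica is fine
        System.out.println(case2ActualOnDn < storedGs); // true: replica is stale
    }
}
```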
   
   
   At present, the NameNode has to treat every report as the newest one and update the block's in-memory state accordingly, because a DataNode reports its state to the NN through block reports or `blockReceivedAndDeleted`.
   
   
   If we modify the logic of `markBlockAsCorrupt`, the NameNode will no longer be able to mark the replica as corrupted in Case 2.
   If we modify the logic of `processAllPendingDNMessages`, the postponed message will be temporarily ignored in Case 2, and the active NameNode will mark the replica as corrupted on the next block report from the corresponding DN.
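
A minimal sketch of the second option, under stated assumptions: the names below (`PostponedReportPolicy`, `Action`, `decide`, the `fromPostponedQueue` flag) are hypothetical and do not reflect the real `BlockManager` code or the change in this PR. The sketch only captures the rule described above: a report with an older GS is skipped when it comes from the postponed queue, and is marked corrupt when it arrives as a regular report. Reports whose GS is not older are lumped into `ACCEPT` for brevity.

```java
// Hypothetical decision rule; not the actual Hadoop implementation.
final class PostponedReportPolicy {

    enum Action { ACCEPT, IGNORE, MARK_CORRUPT }

    static Action decide(long reportedGs, long storedGs, boolean fromPostponedQueue) {
        if (reportedGs >= storedGs) {
            return Action.ACCEPT; // simplification: the GS is not older than the stored one
        }
        // Older GS: a postponed message may no longer match the DN's replica, so skip it.
        // A regular report is taken as the DN's current state, so the replica is corrupt.
        return fromPostponedQueue ? Action.IGNORE : Action.MARK_CORRUPT;
    }

    public static void main(String[] args) {
        long storedGs = 1002L; // assumption: GS the NameNode holds for blk_1024

        // Postponed report of blk_1024_1001 drained from the pending queue.
        System.out.println(decide(1001L, storedGs, true));  // IGNORE

        // The same GS arriving in the DN's next block report.
        System.out.println(decide(1001L, storedGs, false)); // MARK_CORRUPT
    }
}
```

With this rule, Case 1 never produces a false corrupt marking, and Case 2 is still caught, only one block report later.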


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: common-issues-help@hadoop.apache.org