Posted to common-issues@hadoop.apache.org by "zhangshuyan0 (via GitHub)" <gi...@apache.org> on 2023/05/05 15:06:11 UTC

[GitHub] [hadoop] zhangshuyan0 commented on pull request #5622: HDFS-16999. Fix wrong use of processFirstBlockReport().

zhangshuyan0 commented on PR #5622:
URL: https://github.com/apache/hadoop/pull/5622#issuecomment-1536398861

@Hexiaoqiao Thanks for your review. I think missing blocks are a more serious problem than a performance loss. If the first block report after a datanode restart is processed with `processFirstBlockReport`, the incorrect metadata in NameNode memory will not be fixed until the next block report. In my opinion, the first block report after a datanode restart should be processed the same way as subsequent block reports.
Here are the answers to your questions.
a. `processFirstBlockReport` exists to speed up NameNode startup. Adding extra logic to it would prolong the time the NameNode spends in safemode.
b. A new datanode has no replicas (or very few), so both its `reports` (the parameter of the `blockReport` RPC) and its `DatanodeStorageInfo#BlockIterator` are nearly empty. This PR therefore has no effect on adding new datanodes.
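
To illustrate the stale-metadata concern, here is a toy model. It is not the actual `BlockManager` code: `BlockReportSketch`, `applyAddOnly`, and `applyFullDiff` are hypothetical names, and block IDs are plain `Long` values. It assumes the fast first-report path only adds reported replicas and does not compute a removal list, while the regular path diffs the report against what the NameNode already has recorded for that storage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model only; hypothetical names, not Hadoop's BlockManager. */
public class BlockReportSketch {

    /** Add-only handling: records reported blocks, never removes anything. */
    static void applyAddOnly(Set<Long> stored, Set<Long> reported) {
        stored.addAll(reported);
    }

    /**
     * Full-diff handling: adds newly reported blocks and drops blocks that
     * are recorded for this storage but missing from the report.
     * Returns the set of blocks that were removed.
     */
    static Set<Long> applyFullDiff(Set<Long> stored, Set<Long> reported) {
        Set<Long> removed = new HashSet<>(stored);
        removed.removeAll(reported);   // recorded but no longer reported
        stored.removeAll(removed);     // drop the stale entries
        stored.addAll(reported);       // record anything new
        return removed;
    }

    public static void main(String[] args) {
        // The NameNode still remembers blocks 1, 2, 3 from before the restart;
        // the restarted datanode lost block 3 and reports only 1 and 2.
        Set<Long> reported = new HashSet<>(Arrays.asList(1L, 2L));

        Set<Long> addOnly = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        applyAddOnly(addOnly, reported);
        System.out.println("add-only still lists block 3: " + addOnly.contains(3L));

        Set<Long> fullDiff = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> removed = applyFullDiff(fullDiff, reported);
        System.out.println("full diff still lists block 3: " + fullDiff.contains(3L));
        System.out.println("full diff removed block 3: " + removed.contains(3L));
    }
}
```

In this model the add-only path keeps the stale entry for block 3 until a later full report arrives, which is the window in which a replica can be counted even though the datanode no longer has it. With an empty `stored` set (NameNode startup, or a brand-new datanode) both paths give the same result.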

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: common-issues-help@hadoop.apache.org