Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/04/29 09:01:48 UTC

[GitHub] [ozone] sodonnel commented on a diff in pull request #3329: HDDS-6567. Store datanode command queue counts from heartbeat in DatanodeInfo in SCM

sodonnel commented on code in PR #3329:
URL: https://github.com/apache/ozone/pull/3329#discussion_r861612693


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeInfo.java:
##########
@@ -49,6 +57,7 @@ public class DatanodeInfo extends DatanodeDetails {
   private List<StorageReportProto> storageReports;
   private List<MetadataStorageReportProto> metadataStorageReports;
   private LayoutVersionProto lastKnownLayoutVersion;

Review Comment:
   NodeStateManager.checkNodesHealth is what notices the lost heartbeats and triggers events based on that.
   
   The DeadNodeHandler is triggered when the node goes dead (there is also a StaleNodeHandler), and it clears out the node's pipelines etc. Perhaps we should reset the command counts when this happens, or perhaps it is valid to leave them at the last known value. The DatanodeInfo object is not removed AFAIK, as it holds the DN service state (in_service, decommissioning, healthy, stale, dead etc). If the DN comes back, the counts will be reset by the heartbeat processing. If it never comes back, the DatanodeDetails and DatanodeInfo stick around in SCM until SCM is restarted.
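
To make the "reset on dead" option concrete, here is a minimal sketch. It assumes a simplified stand-in for DatanodeInfo; the names `NodeInfo`, `CommandType`, `clearCommandCounts`, `onDeadNode` and the `-1` "unknown" value are all hypothetical and are not taken from the PR or the real SCM classes.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the real SCM classes.
final class DeadNodeResetSketch {

  // Assumed command types; the real set of commands may differ.
  enum CommandType { REPLICATE, DELETE_BLOCKS, CLOSE_CONTAINER }

  /** Simplified stand-in for DatanodeInfo that holds only the last reported queue counts. */
  static final class NodeInfo {
    private final Map<CommandType, Integer> commandCounts =
        new EnumMap<>(CommandType.class);

    // Heartbeat processing replaces the stored counts with the latest report.
    synchronized void setCommandCounts(Map<CommandType, Integer> reported) {
      commandCounts.clear();
      commandCounts.putAll(reported);
    }

    // Returns -1 when no count is known for the type (an assumed convention).
    synchronized int getCommandCount(CommandType type) {
      return commandCounts.getOrDefault(type, -1);
    }

    // Drops every stored count so that callers see "unknown" again.
    synchronized void clearCommandCounts() {
      commandCounts.clear();
    }
  }

  // Hypothetical hook that a dead-node handler could call for the dead node.
  static void onDeadNode(NodeInfo node) {
    node.clearCommandCounts();
  }
}
```

With this shape, a node that returns would simply overwrite the cleared counts on its next heartbeat, so the reset only matters while the node is dead.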
   
   I am not sure that leftover command counts are a big issue, as we should avoid scheduling commands on dead (and maybe stale) nodes anyway. E.g. before scheduling a command for a node, we need to check that it is HEALTHY, as otherwise the commands will be queued in SCM and never taken by a DN. If something in SCM keeps scheduling commands for dead nodes, the command queue will slowly fill up the SCM memory.
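
The health check before scheduling could look roughly like the sketch below. It is a self-contained illustration, not the SCM command queue: `HealthGatedQueueSketch`, `NodeHealth`, `setHealth`, `addCommand` and `queuedCount` are hypothetical names, and commands are modelled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of gating command queuing on node health.
final class HealthGatedQueueSketch {

  enum NodeHealth { HEALTHY, STALE, DEAD }

  private final Map<String, NodeHealth> health = new HashMap<>();
  private final Map<String, Queue<String>> queues = new HashMap<>();

  void setHealth(String nodeId, NodeHealth state) {
    health.put(nodeId, state);
  }

  /**
   * Queues the command only when the node is known to be HEALTHY.
   * Returns false, and queues nothing, for stale, dead or unknown nodes,
   * so commands cannot pile up for a node that will never fetch them.
   */
  boolean addCommand(String nodeId, String command) {
    if (health.get(nodeId) != NodeHealth.HEALTHY) {
      return false;
    }
    queues.computeIfAbsent(nodeId, id -> new ArrayDeque<>()).add(command);
    return true;
  }

  // Number of commands currently waiting for the given node.
  int queuedCount(String nodeId) {
    Queue<String> queue = queues.get(nodeId);
    return queue == null ? 0 : queue.size();
  }
}
```

Rejecting at enqueue time keeps the queue bounded by what live nodes can consume; whether STALE nodes should also be rejected is a policy choice this sketch makes for simplicity.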



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

