You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/04/29 05:19:53 UTC

[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3329: HDDS-6567. Store datanode command queue counts from heartbeat in DatanodeInfo in SCM

umamaheswararao commented on code in PR #3329:
URL: https://github.com/apache/ozone/pull/3329#discussion_r861393180


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeInfo.java:
##########
@@ -49,6 +57,7 @@ public class DatanodeInfo extends DatanodeDetails {
   private List<StorageReportProto> storageReports;
   private List<MetadataStorageReportProto> metadataStorageReports;
   private LayoutVersionProto lastKnownLayoutVersion;

Review Comment:
   @sodonnel I am wondering what happens to this DatanodeInfo when it expire due to lack of HB from that node? is this object around or destroyed. I am trying to figure out that particular code part. I have not found so far. Please point me where we remove this nodeInfo object when node expires. Thanks



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DatanodeInfo.java:
##########
@@ -272,6 +282,57 @@ public void setNodeStatus(NodeStatus newNodeStatus) {
     }
   }
 
+  /**
+   * Set the current command counts for this datanode, as reported in the last
+   * heartbeat.
+   * @param cmds Proto message containing a list of command count pairs.
+   */
+  public void setCommandCounts(CommandQueueReportProto cmds) {
+    try {
+      int count = cmds.getCommandCount();
+      lock.writeLock().lock();
+      for (int i = 0; i < count; i++) {
+        SCMCommandProto.Type command = cmds.getCommand(i);
+        if (command == SCMCommandProto.Type.unknownScmCommand) {
+          LOG.warn("Unknown SCM Command received from {} in the "
+              + "heartbeat. SCM and the DN may not be at the same version.",
+              this);
+          continue;
+        }
+        int cmdCount = cmds.getCount(i);
+        if (cmdCount < 0) {
+          LOG.warn("Command count of {} from {} should be greater than zero. " +
+              "Setting it to zero", cmdCount, this);
+          cmdCount = 0;
+        }
+        commandCounts.put(command, cmdCount);
+      }
+    } finally {
+      lock.writeLock().unlock();
+    }
+  }
+
+  /**
+   * Retrieve the number of queued commands of the given type, as reported by
+   * the datanode at the last heartbeat.
+   * @param cmd The command for which to receive the queued command count

Review Comment:
   if it's -1, we should wait to assign any tasks to this node as we don;t know the actual state?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeManager.java:
##########
@@ -296,6 +298,24 @@ void processNodeReport(DatanodeDetails datanodeDetails,
   void processLayoutVersionReport(DatanodeDetails datanodeDetails,
                          LayoutVersionProto layoutReport);
 
+  /**
+   * Process the Command Queue Report sent from datanodes as part of the
+   * heartbeat message.
+   * @param datanodeDetails
+   * @param commandReport
+   */
+  void processNodeCommandQueueReport(DatanodeDetails datanodeDetails,
+      CommandQueueReportProto commandReport);
+
+  /**
+   * Get the number of commands of the given type queued on the datanode at the
+   * last heartbeat. If the Datanode has not reported information for the given
+   * command type, -1 wil be returned.

Review Comment:
   nit: wil -> will



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java:
##########
@@ -621,6 +623,48 @@ public void processLayoutVersionReport(DatanodeDetails datanodeDetails,
     }
   }
 
+  /**
+   * Process Command Queue Reports from the Datanode Heartbeat.
+   *
+   * @param datanodeDetails
+   * @param commandQueueReportProto
+   */
+  @Override
+  public void processNodeCommandQueueReport(DatanodeDetails datanodeDetails,
+      CommandQueueReportProto commandQueueReportProto) {
+    LOG.debug("Processing Command Queue Report from [datanode={}]",
+        datanodeDetails.getHostName());
+    if (LOG.isTraceEnabled()) {
+      LOG.trace("Command Queue Report is received from [datanode={}]: " +
+          "<json>{}</json>", datanodeDetails.getHostName(),
+          commandQueueReportProto.toString().replaceAll("\n", "\\\\n"));
+    }
+    try {
+      DatanodeInfo datanodeInfo = nodeStateManager.getNode(datanodeDetails);
+      if (commandQueueReportProto != null) {
+        datanodeInfo.setCommandCounts(commandQueueReportProto);
+        metrics.incNumNodeCommandQueueReportProcessed();
+      }
+    } catch (NodeNotFoundException e) {

Review Comment:
   Is this metric a "report failed"? or just unknown node report? I am not sure about the definition of this metric here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org