You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/07/15 15:47:01 UTC

[GitHub] [ozone] sodonnel commented on a change in pull request #2382: HDDS-5401. Add more metrics to ReplicationManager to help monitor replication progress.

sodonnel commented on a change in pull request #2382:
URL: https://github.com/apache/ozone/pull/2382#discussion_r670586592



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -353,12 +352,22 @@ private void processContainer(ContainerInfo container) {
          */
         updateInflightAction(container, inflightReplication,
             action -> replicas.stream()
-                .anyMatch(r -> r.getDatanodeDetails().equals(action.datanode)));
+                .anyMatch(r -> r.getDatanodeDetails().equals(action.datanode)),
+            ()-> metrics.incrNumReplicateCmdsTimeout(),
+            () -> {
+              metrics.incrNumReplicateCmdsCompleted();
+              metrics.incrNumReplicateBytesCompleted(container.getUsedBytes());
+            });
 
         updateInflightAction(container, inflightDeletion,
             action -> replicas.stream()
                 .noneMatch(r ->
-                    r.getDatanodeDetails().equals(action.datanode)));
+                    r.getDatanodeDetails().equals(action.datanode)),
+            () -> metrics.incrNumDeleteCmdsTimeout(),
+            () -> metrics.incrNumDeleteCmdsCompleted());
+
+        metrics.setInflightReplication(inflightReplication.size());
+        metrics.setInflightDeletion(inflightDeletion.size());

Review comment:
       I can think of two options:
   
   1. Make the new Metrics class an inner class of ReplicationManager, so it can access RM's instance varaibles directly.
   
   2. Pass the replicationManager instance to the new Metrics class and add a getter for inflight Replication / Deletion. Then call those getters when the metrics are requested.
   
   Option 2 is probably better, but there may be other ways to do this too. 
   
   I think there may be a small bug here, due to where you are setting the inFlightRep / Deletion. You have sent it after removing any completed pending items, but before the container is processed for over / under replication. On the last container to be processed, it may be under replicated and hence it would be missed from the metrics until the next run of the RM.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org