You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/10/28 17:35:41 UTC

[GitHub] [ozone] aswinshakil opened a new pull request, #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

aswinshakil opened a new pull request, #3909:
URL: https://github.com/apache/ozone/pull/3909

   ## What changes were proposed in this pull request?
   
   To add relevant EC metrics to the Replication Manager.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6896
   
   ## How was this patch tested?
   
   The patch was tested with added unit tests. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010733984


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -163,6 +176,9 @@ public void removeExpiredEntries(long expiryMilliSeconds) {
           if (op.getScheduledEpochMillis() + expiryMilliSeconds
               < clock.millis()) {
             iterator.remove();
+            pendingOpCount.put(op.getOpType(),
+                pendingOpCount.get(op.getOpType()) - 1);
+            updateMetrics(op.getOpType());

Review Comment:
   Maybe rename this to updateTimeoutMetrics.
   
   Also, is it worth passing the full Op in so we get the replica index in the updateMetrics and can apply the same `if replicaIndex > 0` logic as it other places?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010730408


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -47,6 +49,9 @@ public class ContainerReplicaPendingOps {
   private final ConcurrentHashMap<ContainerID, List<ContainerReplicaOp>>
       pendingOps = new ConcurrentHashMap<>();
   private final Striped<ReadWriteLock> stripedLock = Striped.readWriteLock(64);
+  private Map<ContainerReplicaOp.PendingOpType, Long>
+      pendingOpCount = new HashMap<>();

Review Comment:
   I think this map could be updated concurrently. The lock in this class is a striped lock, so it is possible for multiple "add" or "complete" calls to run concurrently provided they are against different containerIDs.
   
   The PendingOps may is a ConcurrentHashMap for this reason, but even if pendingOpCount was a ConcurrentHashMap, I think the counter saved against the key could suffer from lost updates.
   
   If you pre-populate pendingOps with all the available opTypes, so the map is baiscally readonly, and make the value be an atomicLong, rather than a long, I think it would avoid any problems.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] aswinshakil commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
aswinshakil commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010761718


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/OverReplicatedProcessor.java:
##########
@@ -114,6 +114,9 @@ private void adjustPendingOps(ContainerID containerID, SCMCommand<?> cmd,
       DeleteContainerCommand rcc = (DeleteContainerCommand) cmd;
       pendingOps.scheduleDeleteReplica(containerID, targetDatanode,
           rcc.getReplicaIndex());
+      if (rcc.getReplicaIndex() > 0) {

Review Comment:
   Initially, I tried adding only EC Metrics and left RATIS Metrics out of this as it doesn't follow this code path right now. But I can add it to make it future-proof. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] aswinshakil commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
aswinshakil commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010759905


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/UnderReplicatedProcessor.java:
##########
@@ -119,12 +119,14 @@ private void adjustPendingOps(ContainerID containerID, SCMCommand<?> cmd,
       for (int i = 0; i < targetIndexes.length; i++) {
         pendingOps.scheduleAddReplica(containerID, targets.get(i),
             targetIndexes[i]);
+        replicationManager.getMetrics().incrEcReconstructionCmdsSentTotal();
       }
     } else if (cmd.getType() == StorageContainerDatanodeProtocolProtos
         .SCMCommandProto.Type.replicateContainerCommand) {
       ReplicateContainerCommand rcc = (ReplicateContainerCommand) cmd;
       pendingOps.scheduleAddReplica(
           containerID, targetDatanode, rcc.getReplicaIndex());
+      replicationManager.getMetrics().incrEcReplicationCmdsSentTotal();

Review Comment:
   Makes sense. Will update the PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010715598


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -108,7 +113,11 @@ public void scheduleDeleteReplica(ContainerID containerID,
    */
   public boolean completeAddReplica(ContainerID containerID,
       DatanodeDetails target, int replicaIndex) {
-    return completeOp(ADD, containerID, target, replicaIndex);
+    boolean completed = completeOp(ADD, containerID, target, replicaIndex);
+    if (isMetricsNotNull() && completed && replicaIndex > 0) {

Review Comment:
   Replicas with an index of zero are Ratis. Indexes greater than zero indicate Erasure Coding replicas.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010753695


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/UnderReplicatedProcessor.java:
##########
@@ -119,12 +119,14 @@ private void adjustPendingOps(ContainerID containerID, SCMCommand<?> cmd,
       for (int i = 0; i < targetIndexes.length; i++) {
         pendingOps.scheduleAddReplica(containerID, targets.get(i),
             targetIndexes[i]);
+        replicationManager.getMetrics().incrEcReconstructionCmdsSentTotal();
       }
     } else if (cmd.getType() == StorageContainerDatanodeProtocolProtos
         .SCMCommandProto.Type.replicateContainerCommand) {
       ReplicateContainerCommand rcc = (ReplicateContainerCommand) cmd;
       pendingOps.scheduleAddReplica(
           containerID, targetDatanode, rcc.getReplicaIndex());
+      replicationManager.getMetrics().incrEcReplicationCmdsSentTotal();

Review Comment:
   Might need to the `if replicaIndex > 0 ... else if replicaIndex == 0` logic here, but not at line 122 above, as it is only EC code that can get there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] DaveTeng0 commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
DaveTeng0 commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1011090712


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -108,7 +113,11 @@ public void scheduleDeleteReplica(ContainerID containerID,
    */
   public boolean completeAddReplica(ContainerID containerID,
       DatanodeDetails target, int replicaIndex) {
-    return completeOp(ADD, containerID, target, replicaIndex);
+    boolean completed = completeOp(ADD, containerID, target, replicaIndex);
+    if (isMetricsNotNull() && completed && replicaIndex > 0) {

Review Comment:
   Thanks Stephen!!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] DaveTeng0 commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
DaveTeng0 commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010629383


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ContainerReplicaPendingOps.java:
##########
@@ -108,7 +113,11 @@ public void scheduleDeleteReplica(ContainerID containerID,
    */
   public boolean completeAddReplica(ContainerID containerID,
       DatanodeDetails target, int replicaIndex) {
-    return completeOp(ADD, containerID, target, replicaIndex);
+    boolean completed = completeOp(ADD, containerID, target, replicaIndex);
+    if (isMetricsNotNull() && completed && replicaIndex > 0) {

Review Comment:
   just for myself learning~ what does it mean if replicaIndex == 0?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a diff in pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3909:
URL: https://github.com/apache/ozone/pull/3909#discussion_r1010751386


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/OverReplicatedProcessor.java:
##########
@@ -114,6 +114,9 @@ private void adjustPendingOps(ContainerID containerID, SCMCommand<?> cmd,
       DeleteContainerCommand rcc = (DeleteContainerCommand) cmd;
       pendingOps.scheduleDeleteReplica(containerID, targetDatanode,
           rcc.getReplicaIndex());
+      if (rcc.getReplicaIndex() > 0) {

Review Comment:
   Inside the new RM code, we probably should have the `else rcc.getReplicaIndex() == 0`. Now, only EC containers will land here, but when we finally turn off the LegacyRM, Ratis containers will land here too. It would make sense to be ready for that. That said, there will be more changes needed to process Ratis containers in this code path, but not in this class I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #3909: HDDS-6896. EC: ReplicationManager - Add relevant metrics to the various ReplicationManager classes

Posted by GitBox <gi...@apache.org>.
sodonnel merged PR #3909:
URL: https://github.com/apache/ozone/pull/3909


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org