Posted to issues@ozone.apache.org by "errose28 (via GitHub)" <gi...@apache.org> on 2023/09/07 20:39:15 UTC

[GitHub] [ozone] errose28 opened a new pull request, #5257: HDDS-9254. Legacy replication manager uses mismatched replicas as replication sources

errose28 opened a new pull request, #5257:
URL: https://github.com/apache/ozone/pull/5257

   ## What changes were proposed in this pull request?
   
   Consider a case where SCM has a CLOSED container whose replicas are in states CLOSED, CLOSED and QUASI_CLOSED. In the pull replication model, the replication manager (RM) sends all 3 of these replicas to the target datanode (DN) as replication sources. The DN randomly shuffles the sources and picks one to replicate from. If it chooses the QUASI_CLOSED replica, the next RM iteration sees replicas CLOSED, CLOSED, QUASI_CLOSED, QUASI_CLOSED. RM issues the same command again because the container still has too few CLOSED replicas, but now the DN's random shuffle is more likely to pick a QUASI_CLOSED replica. This can repeat until every datanode in the cluster holds a quasi-closed replica, which can bring the cluster into the stuck state described in [HDDS-8536](https://issues.apache.org/jira/browse/HDDS-8536).
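
As an illustration only, here is a minimal Java sketch of the idea behind the fix: keep only replicas whose state matches the container's state before handing them to the datanode as sources, plus a toy simulation of the feedback loop. The enums and the `matchingState`, `sources` and `simulate` methods are hypothetical simplifications, not Ozone's actual `LegacyReplicationManager` code or types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReplicationSourceSketch {

  // Hypothetical stand-ins for SCM's container lifecycle state and the
  // datanode replica state; not the real Ozone enums.
  enum ContainerState { CLOSED, QUASI_CLOSED }
  enum ReplicaState { CLOSED, QUASI_CLOSED }

  // Map the container state to the only replica state allowed as a source,
  // mirroring the matchingReplicaState idea in the PR diff.
  static ReplicaState matchingState(ContainerState container) {
    return container == ContainerState.QUASI_CLOSED
        ? ReplicaState.QUASI_CLOSED : ReplicaState.CLOSED;
  }

  // Keep only replicas whose state matches the container's state.
  static List<ReplicaState> sources(ContainerState container,
      List<ReplicaState> replicas) {
    ReplicaState wanted = matchingState(container);
    List<ReplicaState> out = new ArrayList<>();
    for (ReplicaState r : replicas) {
      if (r == wanted) {
        out.add(r);
      }
    }
    return out;
  }

  // Toy model of repeated RM rounds for a CLOSED container: each round the
  // target DN copies one randomly chosen source. Simplification: runs a fixed
  // number of rounds instead of stopping when no target datanodes remain.
  // Returns how many QUASI_CLOSED replicas exist at the end.
  static int simulate(boolean filter, int rounds, long seed) {
    Random rng = new Random(seed);
    List<ReplicaState> replicas = new ArrayList<>(List.of(
        ReplicaState.CLOSED, ReplicaState.CLOSED, ReplicaState.QUASI_CLOSED));
    for (int i = 0; i < rounds; i++) {
      List<ReplicaState> candidates = filter
          ? sources(ContainerState.CLOSED, replicas) : replicas;
      replicas.add(candidates.get(rng.nextInt(candidates.size())));
    }
    int quasi = 0;
    for (ReplicaState r : replicas) {
      if (r == ReplicaState.QUASI_CLOSED) {
        quasi++;
      }
    }
    return quasi;
  }

  public static void main(String[] args) {
    // Only the two CLOSED replicas qualify as sources.
    System.out.println(sources(ContainerState.CLOSED, List.of(
        ReplicaState.CLOSED, ReplicaState.CLOSED, ReplicaState.QUASI_CLOSED)));
    // With filtering, the quasi-closed count never grows past the initial 1.
    System.out.println(simulate(true, 20, 42L));
  }
}
```

With `filter = false` the simulation behaves like a Pólya urn: each extra QUASI_CLOSED copy raises the chance that the next copy is also QUASI_CLOSED, which is the runaway effect described above.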
   
   ## What is the link to the Apache JIRA
   
   HDDS-9254
   
   ## How was this patch tested?
   
   Unit tests added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel merged pull request #5257: HDDS-9254. Legacy replication manager uses mismatched replicas as replication sources

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel merged PR #5257:
URL: https://github.com/apache/ozone/pull/5257


[GitHub] [ozone] errose28 commented on pull request #5257: HDDS-9254. Legacy replication manager uses mismatched replicas as replication sources

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on PR #5257:
URL: https://github.com/apache/ozone/pull/5257#issuecomment-1720298541

   @sodonnel I just resolved the merge conflict.


[GitHub] [ozone] sodonnel commented on a diff in pull request #5257: HDDS-9254. Legacy replication manager uses mismatched replicas as replication sources

Posted by "sodonnel (via GitHub)" <gi...@apache.org>.
sodonnel commented on code in PR #5257:
URL: https://github.com/apache/ozone/pull/5257#discussion_r1319679689


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java:
##########
@@ -1138,8 +1138,12 @@ private void handleUnderReplicatedHealthy(final ContainerInfo container,
           container.containerID());
     }
 
+    State matchingReplicaState = State.CLOSED;
+    if (container.getState() == LifeCycleState.QUASI_CLOSED) {
+      matchingReplicaState = State.QUASI_CLOSED;

Review Comment:
   If we are in the container=QUASI_CLOSED branch, then do we need to ensure we only replicate the largest sequenceID quasi_closed replica? There isn't any logic in `getReplicationSources` to do that right now.



[GitHub] [ozone] errose28 commented on a diff in pull request #5257: HDDS-9254. Legacy replication manager uses mismatched replicas as replication sources

Posted by "errose28 (via GitHub)" <gi...@apache.org>.
errose28 commented on code in PR #5257:
URL: https://github.com/apache/ozone/pull/5257#discussion_r1320424595


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/LegacyReplicationManager.java:
##########
@@ -1138,8 +1138,12 @@ private void handleUnderReplicatedHealthy(final ContainerInfo container,
           container.containerID());
     }
 
+    State matchingReplicaState = State.CLOSED;
+    if (container.getState() == LifeCycleState.QUASI_CLOSED) {
+      matchingReplicaState = State.QUASI_CLOSED;

Review Comment:
   This is not required. I added a comment to this part of the code to explain why, and added a unit test to verify this behavior. Let me know if the comment makes sense.
