You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/08/23 09:18:25 UTC

[GitHub] [ozone] JacksonYao287 opened a new pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

JacksonYao287 opened a new pull request #2559:
URL: https://github.com/apache/ozone/pull/2559


   
   ## What changes were proposed in this pull request?
   
   The selected container replication dest DN is one of source DNs
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5643
   
   ## How was this patch tested?
   no need
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on a change in pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#discussion_r795037023



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1116,19 +1116,37 @@ private void handleUnderReplicatedContainer(final ContainerInfo container,
           .stream()
           .map(action -> action.datanode)
           .collect(Collectors.toList());
-      final List<DatanodeDetails> source = replicas.stream()
-          .filter(r ->
-              r.getState() == State.QUASI_CLOSED ||
-              r.getState() == State.CLOSED)
-          // Exclude stale and dead nodes. This is particularly important for
-          // maintenance nodes, as the replicas will remain present in the
-          // container manager, even when they go dead.
-          .filter(r ->
-              getNodeStatus(r.getDatanodeDetails()).isHealthy())
-          .filter(r -> !deletionInFlight.contains(r.getDatanodeDetails()))
-          .sorted((r1, r2) -> r2.getSequenceId().compareTo(r1.getSequenceId()))
-          .map(ContainerReplica::getDatanodeDetails)
-          .collect(Collectors.toList());
+      final List<DatanodeDetails> source;
+      final List<DatanodeDetails> excludeList;
+
+      //there may be a scenario that a replica is in the replica set
+      //when `source` is being generated, and then removed from replica set
+      //somehow(maybe by containerReportHandler#processMissingReplicas).
+      //when `excludeList` is being generated, the replica is not included.
+      //so this replica will not be definitely evicted from `selectedDatanodes`
+      //this will lead to a scenario that we send a replication command
+      //to a datanode to replicate a container replica from itself.
+      synchronized (replicas) {

Review comment:
       thanks @sodonnel for the review , the suggestion looks good, i will do this change




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-903710049


   @ChenSammi  please take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 closed pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 closed pull request #2559:
URL: https://github.com/apache/ozone/pull/2559


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on a change in pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on a change in pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#discussion_r790943727



##########
File path: hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ReplicationManager.java
##########
@@ -1116,19 +1116,37 @@ private void handleUnderReplicatedContainer(final ContainerInfo container,
           .stream()
           .map(action -> action.datanode)
           .collect(Collectors.toList());
-      final List<DatanodeDetails> source = replicas.stream()
-          .filter(r ->
-              r.getState() == State.QUASI_CLOSED ||
-              r.getState() == State.CLOSED)
-          // Exclude stale and dead nodes. This is particularly important for
-          // maintenance nodes, as the replicas will remain present in the
-          // container manager, even when they go dead.
-          .filter(r ->
-              getNodeStatus(r.getDatanodeDetails()).isHealthy())
-          .filter(r -> !deletionInFlight.contains(r.getDatanodeDetails()))
-          .sorted((r1, r2) -> r2.getSequenceId().compareTo(r1.getSequenceId()))
-          .map(ContainerReplica::getDatanodeDetails)
-          .collect(Collectors.toList());
+      final List<DatanodeDetails> source;
+      final List<DatanodeDetails> excludeList;
+
+      //there may be a scenario that a replica is in the replica set
+      //when `source` is being generated, and then removed from replica set
+      //somehow(maybe by containerReportHandler#processMissingReplicas).
+      //when `excludeList` is being generated, the replica is not included.
+      //so this replica will not be definitely evicted from `selectedDatanodes`
+      //this will lead to a scenario that we send a replication command
+      //to a datanode to replicate a container replica from itself.
+      synchronized (replicas) {

Review comment:
       This synchronization worries me. We are somehow supposed to know to synchronize on this replica object from deep within the containerManager. There is a chance that other code areas could run into problems if this map is not consistent between when it is read and used.
   
   I wonder if we would be better returning a new isolated map from containerStateMap. Eg, it would become something like:
    
   ```
     public Set<ContainerReplica> getContainerReplicas(
         final ContainerID containerID) {
       Preconditions.checkNotNull(containerID);
       final Set<ContainerReplica> replicas = replicaMap.get(containerID);
       return replicas == null ? null : Collections.unmodifiableSet(new HashSet<>(replica));
     }
   ```
   
   Any callers to get the replicas have a set that does not change until they request the replicas again?
   
   If we don't want to change the inners of ContainerStateMap, we could do something similar inside ReplicationManager and form a new set from the returned set.
   
   It would better to change the source of the map though, otherwise another code area could run into a similar problem.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-1042659331


   @sodonnel thanks for the work, i will take a look.
   will close this patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 edited a comment on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 edited a comment on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-964962444


   @sodonnel @bshashikant @ChenSammi  could you please take a look at this ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-964962444


   @sodonnel @bshashikant @ChenSammi  can you take a look at this ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bshashikant commented on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
bshashikant commented on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-932170695


   @nandakumar131 , can you check this one?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
sodonnel commented on pull request #2559:
URL: https://github.com/apache/ozone/pull/2559#issuecomment-1042654885


   @JacksonYao287 I have opened #3071 to fix this in the way I suggested here. Could you give that one a review please and it will resolve this one too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 closed pull request #2559: HDDS-5643. The selected container replication dest DN is one of source DNs

Posted by GitBox <gi...@apache.org>.
JacksonYao287 closed pull request #2559:
URL: https://github.com/apache/ozone/pull/2559


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org