You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/07/14 21:56:16 UTC

[GitHub] [ozone] sodonnel opened a new pull request, #3599: HDDS-6895. EC: ReplicationManager - Logic to process the under replicated queue and assign work to DNs

sodonnel opened a new pull request, #3599:
URL: https://github.com/apache/ozone/pull/3599

   ## What changes were proposed in this pull request?
   
   We need some sort of thread which picks work from the under / over replicated queue and assigns the work to DNs with capacity. The DNs will pick the work up on each heartbeat.
   
   This initial change does not consider DN capacity or any limits. It will process all under replicated EC containers and assign them to the DN.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-6895
   
   ## How was this patch tested?
   
   New unit tests and manual testing still to be completed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #3599: HDDS-6895. EC: ReplicationManager - Logic to process the under replicated queue and assign work to DNs

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged PR #3599:
URL: https://github.com/apache/ozone/pull/3599


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3599: HDDS-6895. EC: ReplicationManager - Logic to process the under replicated queue and assign work to DNs

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3599:
URL: https://github.com/apache/ozone/pull/3599#discussion_r923248584


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/protocol/commands/ReconstructECContainersCommand.java:
##########
@@ -47,13 +48,8 @@ public ReconstructECContainersCommand(long containerID,
       List<DatanodeDetailsAndReplicaIndex> sources,
       List<DatanodeDetails> targetDatanodes, byte[] missingContainerIndexes,
       ECReplicationConfig ecReplicationConfig) {
-    super();
-    this.containerID = containerID;
-    this.sources = sources;
-    this.targetDatanodes = targetDatanodes;
-    this.missingContainerIndexes =
-        Arrays.copyOf(missingContainerIndexes, missingContainerIndexes.length);
-    this.ecReplicationConfig = ecReplicationConfig;
+    this(containerID, sources, targetDatanodes, missingContainerIndexes,
+        ecReplicationConfig, HddsIdFactory.getLongId());
   }
 
   // Should be called only for protobuf conversion

Review Comment:
   Nit: comment seems to be outdated, as now it's called from the other constructor.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ECUnderReplicationHandler.java:
##########
@@ -81,14 +81,17 @@ public ECUnderReplicationHandler(
    * @param remainingMaintenanceRedundancy - represents that how many nodes go
    *                                      into maintenance.
    * @return Returns the key value pair of destination dn where the command gets
-   * executed and the command itself.
+   * executed and the command itself. If an empty list if returned, it indicates

Review Comment:
   ```suggestion
      * executed and the command itself. If an empty list is returned, it indicates
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/UnhealthyReplicationHandler.java:
##########
@@ -41,9 +42,13 @@ public interface UnhealthyReplicationHandler {
    * @param remainingMaintenanceRedundancy - represents that how many nodes go
    *                                      into maintenance.
    * @return Returns the key value pair of destination dn where the command gets
-   * executed and the command itself.
+   * executed and the command itself. If an empty list if returned, it indicates

Review Comment:
   ```suggestion
      * executed and the command itself. If an empty list is returned, it indicates
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -137,6 +138,7 @@ public class ReplicationManager implements SCMService {
   private final ReentrantLock lock = new ReentrantLock();
   private Queue<ContainerHealthResult.UnderReplicatedHealthResult>
       underRepQueue;
+  private ECUnderReplicationHandler ecUnderReplicationHandler;

Review Comment:
   Can be final?
   
   ```suggestion
     private final ECUnderReplicationHandler ecUnderReplicationHandler;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3599: HDDS-6895. EC: ReplicationManager - Logic to process the under replicated queue and assign work to DNs

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3599:
URL: https://github.com/apache/ozone/pull/3599#discussion_r921793972


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -431,6 +446,18 @@ public static class ReplicationManagerConfiguration {
     )
     private long interval = Duration.ofSeconds(300).toMillis();
 
+    /**
+     * The frequency in which the Under Replicated queue is processed.
+     */
+    @Config(key = "under.replicated.interval",
+        type = ConfigType.TIME,
+        defaultValue = "30s",
+        tags = {SCM, OZONE},
+        description = "Hpw frequently to check if there are work to process " +

Review Comment:
   Hqw --> How



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on a diff in pull request #3599: HDDS-6895. EC: ReplicationManager - Logic to process the under replicated queue and assign work to DNs

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on code in PR #3599:
URL: https://github.com/apache/ozone/pull/3599#discussion_r922457958


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/protocol/commands/ReplicateContainerCommand.java:
##########
@@ -41,6 +41,7 @@
 
   private final long containerID;
   private final List<DatanodeDetails> sourceDatanodes;
+  private int replicaIndex = 0;

Review Comment:
   We discussed offline about alternative approach to get the replicaIndex. I am updating here for the discussion context.
   We discussed that this may be ok and simpler, even though we are not using this at the DN side currently. We could potentially use this for validation purposes if needed. 
   
   The alternative method is that we could add a methods at ContainerManager to get the replicaIndex and we can use that method to get replicaIndex for a given node. After discussion we felt this is ok to just add to command rather than going back to container manager.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org