You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/07/20 01:15:32 UTC

[GitHub] [ozone] umamaheswararao opened a new pull request, #3610: HDDS-7016. EC: Implement the Over replication Processor

umamaheswararao opened a new pull request, #3610:
URL: https://github.com/apache/ozone/pull/3610

   ## What changes were proposed in this pull request?
   
   Implemented the processing logic of OverReplicated containers of EC. The logic is very similar to UnderReplicated containers processing. 
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7016
   
   ## How was this patch tested?
   
   Added test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190382695

   Thanks @adoroszlai for the review. I have just addressed the comments. Thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] JacksonYao287 commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
JacksonYao287 commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190048762

   thanks @umamaheswararao for this patch, the changes overall looks good, i have several questions:
   
   1 `UnderReplicatedProcessor` and `OverReplicatedProcessor` now run in a single thread background service, can we add a thread pool to run these task concurrently?
   
   2 in `ReplicationManager#processAll`, we get two new instances of queue for both overReplicated and underReplicated tasks every time, and they are protected by a lock. there may be a case that the earlier generated task queue is replaced by a later generated task queue without completing all the tasks in the earlier one. for example,  the interval of `OverReplicatedProcessor` is  10m, and the interval of RM is 2m. so in 10m, RM will produce 5 task queues for overreplicated containers, but OverReplicatedProcessor can only consume the latest one.
   
   i suggest to add two current collections(Eg. PriorityBlockingQueue and LinkedBlockingQueue) for overReplicated and underReplicated containers, so that the processors can get task from this queue , meanwhile RM can add tasks into them without a lock.
   
   by the way, if we want to filter the same task generated by RM, we can use ConcurrentSkipListSet. etc.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190394902

   > 2 in ReplicationManager#processAll, we get two new instances of queue for both overReplicated and underReplicated tasks every time, and they are protected by a lock. there may be a case that the earlier generated task queue is replaced by a later generated task queue without completing all the tasks in the earlier one. for example, the interval of OverReplicatedProcessor is 10m, and the interval of RM is 2m. so in 10m, RM will produce 5 task queues for overreplicated containers, but OverReplicatedProcessor can only consume the latest one.
   
   Not sure if an earlier commit had a different time, but the intention is that the under / over replication threads will run at a faster interval than the replication manager. Eg 30 seconds, as that is the DN heartbeart interval. The queues will be regenerated each time the RM thread runs, probably every 5 or 10 minutes.
   
   As we are not using an event driven model where, for example, the dead node event causes the containers to be marked under replicated or a node registration causes them to be over replicated (this is how HDFS works), we decided to keep things simple and just re-create the queues on each iteration rather than trying to maintain a single queue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190284861

   @JacksonYao287 Let's discuss the thread pool thought, if that's helpful here, I will take up in the followup JIRA. Thank you


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a diff in pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on code in PR #3610:
URL: https://github.com/apache/ozone/pull/3610#discussion_r925679657


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/container/TestECContainerRecovery.java:
##########
@@ -0,0 +1,262 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with this
+ * work for additional information regarding copyright ownership.  The ASF
+ * licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ * License for the specific language governing permissions and limitations under
+ * the License.
+ */
+package org.apache.hadoop.ozone.container;
+
+import org.apache.hadoop.conf.StorageUnit;
+import org.apache.hadoop.hdds.HddsConfigKeys;
+import org.apache.hadoop.hdds.client.DefaultReplicationConfig;
+import org.apache.hadoop.hdds.client.ECReplicationConfig;
+import org.apache.hadoop.hdds.client.ReplicationType;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import org.apache.hadoop.hdds.protocol.datanode.proto.ContainerProtos;
+import org.apache.hadoop.hdds.scm.OzoneClientConfig;
+import org.apache.hadoop.hdds.scm.ScmConfigKeys;
+import org.apache.hadoop.hdds.scm.container.ContainerID;
+import org.apache.hadoop.hdds.scm.container.ContainerInfo;
+import org.apache.hadoop.hdds.scm.container.ContainerNotFoundException;
+import org.apache.hadoop.hdds.scm.pipeline.Pipeline;
+import org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol;
+import org.apache.hadoop.hdds.scm.server.StorageContainerManager;
+import org.apache.hadoop.hdds.utils.HAUtils;
+import org.apache.hadoop.ozone.MiniOzoneCluster;
+import org.apache.hadoop.ozone.OzoneConfigKeys;
+import org.apache.hadoop.ozone.client.BucketArgs;
+import org.apache.hadoop.ozone.client.ObjectStore;
+import org.apache.hadoop.ozone.client.OzoneBucket;
+import org.apache.hadoop.ozone.client.OzoneClient;
+import org.apache.hadoop.ozone.client.OzoneClientFactory;
+import org.apache.hadoop.ozone.client.OzoneVolume;
+import org.apache.hadoop.ozone.client.io.ECKeyOutputStream;
+import org.apache.hadoop.ozone.client.io.OzoneOutputStream;
+import org.apache.ozone.test.GenericTestUtils;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.hadoop.hdds.scm.ScmConfigKeys.HDDS_SCM_WATCHER_TIMEOUT;
+
+/**
+ * Tests key output stream.

Review Comment:
   Stale comment?



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -295,6 +302,22 @@ public synchronized void processAll() {
     }
   }
 
+  /**
+   * Retrieve the new highest priority container to be replicated from the
+   * under replicated queue.
+   * @return The new underReplicated container to be processed, or null if the

Review Comment:
   ```suggestion
      * over-replicated queue.
      * @return The next over-replicated container to be processed, or null if the
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] sodonnel commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
sodonnel commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190391540

   > 1 UnderReplicatedProcessor and OverReplicatedProcessor now run in a single thread background service, can we add a thread pool to run these task concurrently?
   
   @JacksonYao287  Why do you think these need a thread pool? I have not tested yet, but I feel that a single thread processing each queue should be fast enough. In the future, we also plan to limit the amount of work to assign to DNs in a given period, and then I suspect a single thread per queue would be fast enough to handle the processing - certainly it should be able to allocate more commands to the DNs in a DN heartbeat interval than the DN could process.
   
   @umamaheswararao These changes look good - we could do with some unit test for OverReplicationProcessor, similar to those in TestUnderReplicatedProcessor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190243722

   Thank you @JacksonYao287 for the review!
   I think the thought was that, RM is always do full loop of rechecking. So, if we maintain the same queue and just add we get log of duplicates would appear. The idea here is, when RM does re-loop, that should get the latest full set, and that replaces the queue itself. Just for adding the failed things back of the queue we are using these queues and for under replication we are doing prioritizations. 
   They both needs to run dedicatedly. Not sure having thread pool make much difference.
   
   As per @sodonnel both of these threads needs the improvements with respective to DN load considerations. That is the next work we have there. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao commented on pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
umamaheswararao commented on PR #3610:
URL: https://github.com/apache/ozone/pull/3610#issuecomment-1190784894

   Thanks everyone for the reviews!.
   @JacksonYao287 Let's continues the discussion in case if you feel we need the follow up, let's file and we will continue work on them to improve.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] umamaheswararao merged pull request #3610: HDDS-7016. EC: Implement the Over replication Processor

Posted by GitBox <gi...@apache.org>.
umamaheswararao merged PR #3610:
URL: https://github.com/apache/ozone/pull/3610


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org