Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/09/07 06:41:10 UTC

[GitHub] [ozone] siddhantsangwan opened a new pull request, #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

siddhantsangwan opened a new pull request, #3738:
URL: https://github.com/apache/ozone/pull/3738

   ## What changes were proposed in this pull request?
   
   Currently, UnderReplicatedProcessor and OverReplicatedProcessor are scheduled as BackgroundSCMService threads in StorageContainerManager. These processors directly depend on RM and should be managed by RM. This Jira moves their initialisation to the ReplicationManager class.
   The under- and over-replicated processors don't check whether they should continue running while processing their queues. This is a potential bug: they won't stop when looping through a very large number of containers. This PR fixes it by checking ReplicationManager#shouldRun() inside the loop.
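   
   As a rough illustration of the check described above, here is a minimal sketch, not the code from this PR. `QueueDrainSketch`, `processAll`, and the `Queue<Long>` of container IDs are hypothetical names, and the `BooleanSupplier` only stands in for ReplicationManager#shouldRun().
   
   ```java
   import java.util.Queue;
   import java.util.function.BooleanSupplier;
   
   /** Hypothetical stand-in for a processor that drains a queue of container IDs. */
   final class QueueDrainSketch {
     private QueueDrainSketch() { }
   
     /**
      * Processes queued items until the queue is empty or shouldRun returns false.
      * Returns the number of items that were processed.
      */
     static int processAll(Queue<Long> queue, BooleanSupplier shouldRun) {
       int processed = 0;
       while (!queue.isEmpty()) {
         // Re-check on every iteration so a long queue cannot keep the loop alive
         // after the owning service has been told to stop.
         if (!shouldRun.getAsBoolean()) {
           break;
         }
         queue.poll(); // placeholder for the real per-container work
         processed++;
       }
       return processed;
     }
   }
   ```
   
   Items left in the queue when the check fails stay there, so a later pass can pick them up.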
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-7204
   
   ## How was this patch tested?
   
   Existing tests are passing. I'm still thinking about whether the existing ones are enough or not.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] siddhantsangwan commented on pull request #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on PR #3738:
URL: https://github.com/apache/ozone/pull/3738#issuecomment-1241932187

   @sodonnel Thanks for reviewing




[GitHub] [ozone] siddhantsangwan commented on a diff in pull request #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

Posted by GitBox <gi...@apache.org>.
siddhantsangwan commented on code in PR #3738:
URL: https://github.com/apache/ozone/pull/3738#discussion_r964660812


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -248,6 +261,38 @@ public synchronized void stop() {
     }
   }
 
+  /**
+   * Create Replication Manager sub services such as Over and Under Replication
+   * processors.
+   */
+  private void createSubServices() {

Review Comment:
   > I thought we concluded that these sub-services should not be registered with the ServiceManager themselves, but just be threads managed within the replication manager?
   
   Yeah. When implementing that, I realised that we would end up writing code similar to BackgroundSCMService. `BackgroundThread` implementing Runnable will have something like:
   ```
     public void run() {
       while (running.get()) {
         try {
           if (shouldRun()) {
             try {
               periodicalTask.run();
             } catch (Throwable e) {
               log.error("Caught Unhandled exception in {}. The task will be " +
                   "re-tried in {}ms", getServiceName(), intervalInMillis, e);
             }
           }
           synchronized (this) {
             if (!runImmediately) {
               wait(intervalInMillis);
             }
             runImmediately = false;
           }
         } catch (InterruptedException e) {
           log.warn("{} is interrupted, exit", serviceName);
           Thread.currentThread().interrupt();
           running.set(false);
         }
       }
     }
   ```
   We can avoid doing this by using BackgroundSCMService. The downside is that our definition of whether they are RM sub threads or SCM threads becomes hazy. But we are solving a part of that problem by restricting their visibility to only RM and SCMServiceManager. 





[GitHub] [ozone] sodonnel commented on a diff in pull request #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3738:
URL: https://github.com/apache/ozone/pull/3738#discussion_r964558329


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -248,6 +261,38 @@ public synchronized void stop() {
     }
   }
 
+  /**
+   * Create Replication Manager sub services such as Over and Under Replication
+   * processors.
+   */
+  private void createSubServices() {

Review Comment:
   I thought we concluded that these sub-services should not be registered with the ServiceManager themselves, but just be threads managed within the replication manager? Perhaps using a "BackgroundThread" base class to manage their run loop etc?





[GitHub] [ozone] sodonnel commented on a diff in pull request #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

Posted by GitBox <gi...@apache.org>.
sodonnel commented on code in PR #3738:
URL: https://github.com/apache/ozone/pull/3738#discussion_r964595292


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java:
##########
@@ -248,6 +261,38 @@ public synchronized void stop() {
     }
   }
 
+  /**
+   * Create Replication Manager sub services such as Over and Under Replication
+   * processors.
+   */
+  private void createSubServices() {

Review Comment:
   How about just making these other threads simple threads that we start with runnables that use RM.shouldRun() internally?
   
   If we start finding a lot of this sort of thing in the future, we can try to make it generic, but for now perhaps it's not worth it?
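   
   A minimal sketch of what that suggestion could look like, assuming nothing about the actual Ozone classes: `SimpleProcessorThreadSketch`, `start`, and the fixed sleep interval are hypothetical, and the `BooleanSupplier` stands in for RM.shouldRun().
   
   ```java
   import java.util.function.BooleanSupplier;
   
   /** Hypothetical helper that runs a task on a plain thread, gated by shouldRun. */
   final class SimpleProcessorThreadSketch {
     private SimpleProcessorThreadSketch() { }
   
     /** Starts a daemon thread that runs the task periodically while shouldRun is true. */
     static Thread start(String name, Runnable task, BooleanSupplier shouldRun,
         long intervalMillis) {
       Thread thread = new Thread(() -> {
         while (!Thread.currentThread().isInterrupted()) {
           try {
             if (shouldRun.getAsBoolean()) {
               task.run();
             }
             Thread.sleep(intervalMillis);
           } catch (InterruptedException e) {
             // Restore the interrupt flag so the loop condition ends the thread.
             Thread.currentThread().interrupt();
           }
         }
       }, name);
       thread.setDaemon(true);
       thread.start();
       return thread;
     }
   }
   ```
   
   The owner would stop the thread with `interrupt()`. This sketch does not catch unchecked exceptions from the task, so a throwing task ends the thread; the run loop quoted earlier in this thread logs and retries instead.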





[GitHub] [ozone] siddhantsangwan merged pull request #3738: HDDS-7204. EC: Schedule UnderReplicatedProcessor and OverReplicatedProcessor threads in RM instead of StorageContainerManager

Posted by GitBox <gi...@apache.org>.
siddhantsangwan merged PR #3738:
URL: https://github.com/apache/ozone/pull/3738



