You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2022/05/23 10:46:11 UTC

[GitHub] [ozone] JacksonYao287 commented on a diff in pull request #3423: HDDS-6280. Support Container Balancer HA

JacksonYao287 commented on code in PR #3423:
URL: https://github.com/apache/ozone/pull/3423#discussion_r879299342


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java:
##########
@@ -179,19 +189,46 @@ private void balance() {
         }
       }
 
-      // stop balancing if iteration is not initialized
+      // initialize this iteration. stop balancing on initialization failure
       if (!initializeIteration()) {
-        stopBalancer();
+        // just return if the reason for initialization failure is that
+        // balancer has been stopped in another thread
+        if (!isBalancerRunning()) {
+          return;
+        }

Review Comment:
   can we add a balancer running check `currentThread = null` inside `stopBalancer`, so that the two operations will be protected by a single lock. if we first call `isBalancerRunning()` and then  call `stopBalancer()` , there might be a case that between the two operation in a single thread, `stopbalancer` is called from another thread.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java:
##########
@@ -801,24 +846,70 @@ private void resetState() {
   /**
    * Receives a notification for raft or safe mode related status changes.
    * Stops ContainerBalancer if it's running and the current SCM becomes a
-   * follower or enters safe mode.
+   * follower or enters safe mode. Starts ContainerBalancer if the current
+   * SCM becomes leader, is out of safe mode and balancer should run.
    */
   @Override
   public void notifyStatusChanged() {
-    if (!scmContext.isLeader() || scmContext.isInSafeMode()) {
-      if (isBalancerRunning()) {
-        stopBalancingThread();
+    lock.lock();
+    try {
+      if (!scmContext.isLeader() || scmContext.isInSafeMode()) {
+        if (isBalancerRunning()) {
+          stop();
+        }
+      } else {
+        if (shouldRun()) {
+          try {
+            start();
+          } catch (IllegalContainerBalancerStateException |
+              InvalidContainerBalancerConfigurationException e) {
+            LOG.warn("Could not start ContainerBalancer on raft/safe-mode " +
+                "status change.", e);
+          }
+        }
       }
+    } finally {
+      lock.unlock();
     }
   }
 
   /**
-   * Checks if ContainerBalancer should run.
-   * @return false
+   * Checks if ContainerBalancer should start (after a leader change, restart
+   * etc.) by reading persisted state.
+   * @return true if the persisted state is true, otherwise false
    */
   @Override
   public boolean shouldRun() {
-    return false;
+    try {

Review Comment:
   should we check the status and the delay here?
   ```
     public boolean shouldRun() {
       serviceLock.lock();
       try {
         // If safe mode is off, then this SCMService starts to run with a delay.
         return serviceStatus == ServiceStatus.RUNNING &&
             clock.millis() - lastTimeToBeReadyInMillis >= waitTimeInMillis;
       } finally {
         serviceLock.unlock();
       }
     }
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java:
##########
@@ -801,24 +846,70 @@ private void resetState() {
   /**
    * Receives a notification for raft or safe mode related status changes.
    * Stops ContainerBalancer if it's running and the current SCM becomes a
-   * follower or enters safe mode.
+   * follower or enters safe mode. Starts ContainerBalancer if the current
+   * SCM becomes leader, is out of safe mode and balancer should run.
    */
   @Override
   public void notifyStatusChanged() {
-    if (!scmContext.isLeader() || scmContext.isInSafeMode()) {
-      if (isBalancerRunning()) {
-        stopBalancingThread();
+    lock.lock();
+    try {
+      if (!scmContext.isLeader() || scmContext.isInSafeMode()) {
+        if (isBalancerRunning()) {

Review Comment:
   should we check and set the ServiceStatus here? for example 
   ```
         if (scmContext.isLeaderReady() && !scmContext.isInSafeMode()) {
           if (serviceStatus != ServiceStatus.RUNNING) {
             LOG.info("Service {} transitions to RUNNING.", getServiceName());
             serviceStatus = ServiceStatus.RUNNING;
             lastTimeToBeReadyInMillis = clock.millis();
           }
         } else {
           if (serviceStatus != ServiceStatus.PAUSING) {
             LOG.info("Service {} transitions to PAUSING.", getServiceName());
             serviceStatus = ServiceStatus.PAUSING;
           }
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org