You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2021/06/14 07:16:05 UTC

[GitHub] [helix] huizhilu opened a new pull request #1793: Move pause and maintenance handling out of controller

huizhilu opened a new pull request #1793:
URL: https://github.com/apache/helix/pull/1793


   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   Resolves #1772 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   With management mode pipeline, the pause and maintenance signals handling logic should be moved out of the onControllerChange() and moved to the management mode pipeline. 
   
   This PR handles pause/maintenance signals enable/disable and update cluster status accordingly. The whole logic of pause mode will be handled in the other following PRs.
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   The existing maintenance mode tests should cover the logic for maintenance. Tests for freeze mode will be added later.
   
   - The following is the result of the "mvn test" command on the appropriate module:
   
   (If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
   
   ### Changes that Break Backward Compatibility (Optional)
   
   - My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:
   
   (Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1793: Move pause and maintenance handling out of controller

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1793:
URL: https://github.com/apache/helix/pull/1793#discussion_r650287018



##########
File path: helix-core/src/main/java/org/apache/helix/util/HelixUtil.java
##########
@@ -519,13 +519,20 @@ public static InstanceConfig composeInstanceConfig(String instanceName) {
   }
 
   /**
-   * Checks whether or not the cluster is in management mode.
+   * Checks whether or not the cluster is in management mode. It checks pause signal
+   * and live instances: whether any live instance is not in normal status, eg. frozen.
    *
-   * @param cache
-   * @return
+   * @param pauseSignal pause signal
+   * @param liveInstanceMap map of live instances
+   * @param enabledLiveInstances set of enabled live instance names. They should be all included
+   *                             in the liveInstanceMap.
+   * @return true if cluster is in management mode; otherwise, false
    */
-  public static boolean inManagementMode(BaseControllerDataProvider cache) {
-    // TODO: implement the logic. Parameters can also change
-    return true;
+  public static boolean inManagementMode(PauseSignal pauseSignal,
+      Map<String, LiveInstance> liveInstanceMap, Set<String> enabledLiveInstances) {
+    // Check pause signal and abnormal live instances (eg. in freeze mode)
+    // TODO: should check maintenance signal when moving maintenance to management pipeline
+    return pauseSignal != null || enabledLiveInstances.stream()
+        .anyMatch(instance -> liveInstanceMap.get(instance).getStatus() != null);

Review comment:
       IMO, getStatus() != null is little bit hacky. Better to check if the status is PAUSED or whatever other specific status.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ManagementModeStage.java
##########
@@ -36,13 +37,24 @@
   @Override
   public void process(ClusterEvent event) throws Exception {
     // TODO: implement the stage
+    _eventId = event.getEventId();
     String clusterName = event.getClusterName();
+    ClusterStatusMonitor clusterStatusMonitor =
+        event.getAttribute(AttributeName.clusterStatusMonitor.name());
     ManagementControllerDataProvider cache =
         event.getAttribute(AttributeName.ControllerDataProvider.name());
-    if (!HelixUtil.inManagementMode(cache)) {
-      LOG.info("Exiting management mode pipeline for cluster {}", clusterName);
-      RebalanceUtil.enableManagementMode(clusterName, false);
-      throw new StageException("Exiting management mode pipeline for cluster " + clusterName);
+
+    // TODO: move to the last stage of management pipeline
+    checkInManagementMode(clusterStatusMonitor, clusterName, cache);
+  }
+
+  private void checkInManagementMode(ClusterStatusMonitor clusterStatusMonitor,
+      String clusterName, ManagementControllerDataProvider cache) {
+    // Should exit management mode
+    if (!HelixUtil.inManagementMode(cache.getPauseSignal(), cache.getLiveInstances(),
+        cache.getEnabledLiveInstances())) {
+      LogUtil.logInfo(LOG, _eventId, "Exiting management mode pipeline for cluster " + clusterName);
+      RebalanceUtil.enableManagementMode(clusterName, false, clusterStatusMonitor);

Review comment:
       I don't think monitor logic is tightly related to the util method enableManagementMode. Shall we just report any metric here? So clusterStatusMonitor does not need to be passed to the RebalanceUtil.

##########
File path: helix-core/src/main/java/org/apache/helix/util/RebalanceUtil.java
##########
@@ -150,17 +151,25 @@
    *
    * @param clusterName target cluster name
    * @param enabled enable/disable controller to management mode pipeline
+   * @param clusterStatusMonitor
    */
-  public static void enableManagementMode(String clusterName, boolean enabled) {
+  public static void enableManagementMode(String clusterName, boolean enabled,
+      ClusterStatusMonitor clusterStatusMonitor) {
     GenericHelixController leaderController =
         GenericHelixController.getLeaderController(clusterName);
     if (leaderController != null) {
       LOG.info("Switching management mode pipeline for cluster={}, enabled={}", clusterName,
           enabled);
       leaderController.setInManagementMode(enabled);
     } else {
-      LOG.error("Failed to switch management mode pipeline, enabled={}. "
-          + "Controller for cluster {} does not exist", clusterName, enabled);
+      throw new HelixException(String.format("Failed to switch management mode pipeline, "
+          + "enabled=%s. Controller for cluster %s does not exist", enabled, clusterName));
+    }
+
+    // Cluster is paused/frozen.
+    if (clusterStatusMonitor != null) {
+      clusterStatusMonitor.setEnabled(!enabled);
+      clusterStatusMonitor.setPaused(enabled);

Review comment:
       As mentioned above, the monitor is an internal field of the GenericHelixController. We should keep the operations inside the controller class as long as it is possible. Otherwise, this util could be misused and cause some unexpected side effects.

##########
File path: helix-core/src/main/java/org/apache/helix/util/HelixUtil.java
##########
@@ -519,13 +519,20 @@ public static InstanceConfig composeInstanceConfig(String instanceName) {
   }
 
   /**
-   * Checks whether or not the cluster is in management mode.
+   * Checks whether or not the cluster is in management mode. It checks pause signal

Review comment:
       Question, is management mode defined as Pause + cluster freeze, or also with Maintenance mode? There have been some changes and I want to confirm.

##########
File path: helix-core/src/main/java/org/apache/helix/util/HelixUtil.java
##########
@@ -519,13 +519,20 @@ public static InstanceConfig composeInstanceConfig(String instanceName) {
   }
 
   /**
-   * Checks whether or not the cluster is in management mode.
+   * Checks whether or not the cluster is in management mode. It checks pause signal

Review comment:
       Sounds good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] huizhilu merged pull request #1793: Move pause and maintenance handling out of controller

Posted by GitBox <gi...@apache.org>.
huizhilu merged pull request #1793:
URL: https://github.com/apache/helix/pull/1793


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a change in pull request #1793: Move pause and maintenance handling out of controller

Posted by GitBox <gi...@apache.org>.
junkaixue commented on a change in pull request #1793:
URL: https://github.com/apache/helix/pull/1793#discussion_r649764049



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceValidationStage.java
##########
@@ -91,6 +87,29 @@ public void process(ClusterEvent event) throws Exception {
     }
   }
 
+  private void processManagementMode(ClusterEvent event, BaseControllerDataProvider cache)

Review comment:
       I am confused. This is general handle even function. If it is in management mode, then we throw exception. Would this affect the management pipeline handle event?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/dataproviders/ManagementControllerDataProvider.java
##########
@@ -19,9 +19,26 @@
  * under the License.
  */
 
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.helix.HelixConstants;
+
+/**
+ * Data provider for controller management mode pipeline.
+ */
 public class ManagementControllerDataProvider extends BaseControllerDataProvider {
-  // TODO: implement this class to only refresh required event types
-  public ManagementControllerDataProvider(String clusterName, String name) {
-    super(clusterName, name);
+  private static final List<HelixConstants.ChangeType> FULL_REFRESH_PROPERTIES =

Review comment:
       NIT: This naming could be confusing? With these change types, it does not require full refresh but these properties refresh, right?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ManagementModeStage.java
##########
@@ -36,13 +38,30 @@
   @Override
   public void process(ClusterEvent event) throws Exception {
     // TODO: implement the stage
+    _eventId = event.getEventId();
     String clusterName = event.getClusterName();
+    ClusterStatusMonitor clusterStatusMonitor =

Review comment:
       NIT: This monitor will be management pipeline monitor, right? How about regular pipeline monitor update? Shall we update them as well?

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceValidationStage.java
##########
@@ -91,6 +87,29 @@ public void process(ClusterEvent event) throws Exception {
     }
   }
 
+  private void processManagementMode(ClusterEvent event, BaseControllerDataProvider cache)
+      throws StageException {
+    // Set cluster status monitor for maintenance mode
+    if (ClusterEventType.ControllerChange.equals(event.getEventType())) {
+      ClusterStatusMonitor monitor = event.getAttribute(AttributeName.clusterStatusMonitor.name());
+      if (monitor != null) {
+        monitor.setMaintenance(cache.isMaintenanceModeEnabled());
+      }
+    }
+
+    // Check if cluster is still in management mode. Eg. there exists any frozen live instance.
+    if (HelixUtil.inManagementMode(cache.getPauseSignal(), cache.getLiveInstances(),

Review comment:
       Can we unify these logic into a static util function? Because, more than one places hooked the logic almost same just enable/disable the management mode difference.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ManagementModeStage.java
##########
@@ -36,13 +38,30 @@
   @Override
   public void process(ClusterEvent event) throws Exception {
     // TODO: implement the stage
+    _eventId = event.getEventId();
     String clusterName = event.getClusterName();
+    ClusterStatusMonitor clusterStatusMonitor =
+        event.getAttribute(AttributeName.clusterStatusMonitor.name());
     ManagementControllerDataProvider cache =
         event.getAttribute(AttributeName.ControllerDataProvider.name());
-    if (!HelixUtil.inManagementMode(cache)) {
-      LOG.info("Exiting management mode pipeline for cluster {}", clusterName);
+
+    checkInManagementMode(clusterStatusMonitor, clusterName, cache);
+  }
+
+  private void checkInManagementMode(ClusterStatusMonitor clusterStatusMonitor,
+      String clusterName, ManagementControllerDataProvider cache) throws StageException {
+    // Should exit management mode
+    if (!HelixUtil.inManagementMode(cache.getPauseSignal(), cache.getLiveInstances(),

Review comment:
       As we discussed, pending message of freeze check is necessary to protect whether we can break this pipeline. Otherwise, there would be a race condition can cause lot of unnecessary traffic to ZK.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ManagementModeStage.java
##########
@@ -36,13 +38,30 @@
   @Override
   public void process(ClusterEvent event) throws Exception {
     // TODO: implement the stage
+    _eventId = event.getEventId();
     String clusterName = event.getClusterName();
+    ClusterStatusMonitor clusterStatusMonitor =

Review comment:
       Please add some comments here. If we dont break the pipeline, following monitor update needs to be synchronized since the monitor is sharing by multiple pipelines.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] huizhilu commented on a change in pull request #1793: Move pause and maintenance handling out of controller

Posted by GitBox <gi...@apache.org>.
huizhilu commented on a change in pull request #1793:
URL: https://github.com/apache/helix/pull/1793#discussion_r649775890



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceValidationStage.java
##########
@@ -91,6 +87,29 @@ public void process(ClusterEvent event) throws Exception {
     }
   }
 
+  private void processManagementMode(ClusterEvent event, BaseControllerDataProvider cache)

Review comment:
       No, because the management pipeline does not include this stage. This is data preprocess for default/task. Management does not run it.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ManagementModeStage.java
##########
@@ -36,13 +38,30 @@
   @Override
   public void process(ClusterEvent event) throws Exception {
     // TODO: implement the stage
+    _eventId = event.getEventId();
     String clusterName = event.getClusterName();
+    ClusterStatusMonitor clusterStatusMonitor =

Review comment:
       This monitor internally is synchronized. We don't have to explicitly synchronized it in the pipeline. The existing monitor update usage in default/task pipelines does not synchronized, either, which is fine.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/ResourceValidationStage.java
##########
@@ -91,6 +87,29 @@ public void process(ClusterEvent event) throws Exception {
     }
   }
 
+  private void processManagementMode(ClusterEvent event, BaseControllerDataProvider cache)
+      throws StageException {
+    // Set cluster status monitor for maintenance mode
+    if (ClusterEventType.ControllerChange.equals(event.getEventType())) {
+      ClusterStatusMonitor monitor = event.getAttribute(AttributeName.clusterStatusMonitor.name());
+      if (monitor != null) {
+        monitor.setMaintenance(cache.isMaintenanceModeEnabled());
+      }
+    }
+
+    // Check if cluster is still in management mode. Eg. there exists any frozen live instance.
+    if (HelixUtil.inManagementMode(cache.getPauseSignal(), cache.getLiveInstances(),

Review comment:
       We cannot combine all, because the logic depends on the result of `inManagement()` and the type of pipeline. But I've put the cluster status monitor logic into the util, which is cleaner now.

##########
File path: helix-core/src/main/java/org/apache/helix/util/HelixUtil.java
##########
@@ -519,13 +519,20 @@ public static InstanceConfig composeInstanceConfig(String instanceName) {
   }
 
   /**
-   * Checks whether or not the cluster is in management mode.
+   * Checks whether or not the cluster is in management mode. It checks pause signal

Review comment:
       Currently we only make it for pause & freeze. In the future when we move M mode to management pipeline, we should also check M mode. A TODO is already added to this API.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] huizhilu commented on pull request #1793: Move pause and maintenance handling out of controller

Posted by GitBox <gi...@apache.org>.
huizhilu commented on pull request #1793:
URL: https://github.com/apache/helix/pull/1793#issuecomment-860017783


   > The change looks good. But there are no tests. Are you planning to add in the next PR?
   
   The existing tests cover the logic of maintenance and cluster status monitoring. Will add unit/integration tests in later PRs.
   
   This PR is ready to be merged. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org