You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/05/27 03:30:37 UTC

[GitHub] [helix] jiajunwang opened a new pull request #1028: Add Abnormal States Resolver interface and configuration item.

jiajunwang opened a new pull request #1028:
URL: https://github.com/apache/helix/pull/1028


   ### Issues
   
   - [X] My PR addresses the following Helix issues and references them in the PR description:
   
   #1027
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI changes:
   
   The Abnormal States Resolver defines a generic interface to find and recover if the partition has any abnormal current states. For example,
   - double masters
   - application data out of sync
   The interface shall be implemented according to the requirement.
   
   The resolver is applied in the rebalance process according to the corresponding cluster config item. For example,
   "ABNORMAL_STATES_RESOLVER_MAP" : {
    "MASTERSLAVE" : "org.apache.helix.api.rebalancer.constraint.MasterSlaveAbnormalStateReslovler"
   }
   The default behavior without any configuration is not doing any recovery work.
   
   ### Tests
   
   - [X] The following tests are written for this issue:
   
   TestAbnormalStatesResolver
   TestClusterConfig.testAbnormalStatesResolverConfig()
   
   - [X] The following is the result of the "mvn test" command on the appropriate module:
   
   helix-rest, All passed
   
   helix-core,
   [ERROR] Failures:
   [ERROR]   TestNoThrottleDisabledPartitions.testDisablingTopStateReplicaByDisablingInstance:98 expected:<false> but was:<true>
   [ERROR]   TestWorkflowTermination.testWorkflowRunningTimeout:131->verifyWorkflowCleanup:257 expected:<true> but was:<false>
   [ERROR]   TestWorkflowTimeout.testWorkflowTimeoutWhenWorkflowCompleted:116 expected:<true> but was:<false>
   [INFO]
   [ERROR] Tests run: 1149, Failures: 3, Errors: 0, Skipped: 0
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   
   rerun
   [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 73.794 s - in TestSuite
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time: 01:20 min
   [INFO] Finished at: 2020-05-23T13:06:40-07:00
   [INFO] ------------------------------------------------------------------------
   
   ### Commits
   
   - [X] My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation (Optional)
   
   - [ ] In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Code Quality
   
   - [X] My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #1028:
URL: https://github.com/apache/helix/pull/1028


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1028:
URL: https://github.com/apache/helix/pull/1028#discussion_r431444847



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
##########
@@ -276,39 +278,20 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
-  /**
-   * compute best state for resource in AUTO ideal state mode
-   * @param liveInstances
-   * @param stateModelDef
-   * @param preferenceList
-   * @param currentStateOutput
-   *          : instance->state for each partition
-   * @param disabledInstancesForPartition
-   * @param idealState
-   * @param  clusterConfig
-   * @param  partition
-   * @return
-   */
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,
       StateModelDefinition stateModelDef, List<String> preferenceList,
       CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
-      IdealState idealState, ClusterConfig clusterConfig, Partition partition) {
-
+      IdealState idealState, ClusterConfig clusterConfig, Partition partition,
+      AbnormalStateResolver resolver) {
+    Optional<Map<String, String>> optionalOverwrittenStates =
+        computeStatesOverwriteForPartition(stateModelDef, preferenceList, currentStateOutput,
+            idealState, partition, resolver);
+    if (optionalOverwrittenStates.isPresent()) {

Review comment:
       If the IS drops or preference list is null, optionalOverwrittenStates will also be filled, and it will return directly.
   The intention here is to keep the original behavior. Did you see any exception?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1028:
URL: https://github.com/apache/helix/pull/1028#discussion_r431442739



##########
File path: helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java
##########
@@ -723,6 +729,43 @@ public void setAsyncTasksThreadPool(ExecutorService asyncTasksThreadPool) {
     _asyncTasksThreadPool = asyncTasksThreadPool;
   }
 
+
+  public AbnormalStateResolver getAbnormalStateResolver(String stateModel) {
+    return _abnormalStateResolverMap
+        .getOrDefault(stateModel, AbnormalStateResolver.DUMMY_STATE_RESOLVER);
+  }
+
+  private void refreshAbnormalStateResolverMap(ClusterConfig clusterConfig) {
+    if (clusterConfig == null) {
+      logger.debug("Skip refreshing abnormal state resolvers because the ClusterConfig is missing");
+      return;
+    }
+    Map<String, String> resolverMap = clusterConfig.getAbnormalStateResolverMap();
+    logger.debug("Start loading the abnormal state resolvers with configuration {}", resolverMap);
+    // Remove any resolver configuration that does not exist anymore.
+    _abnormalStateResolverMap.keySet().retainAll(resolverMap.keySet());
+    // Reload the resolver classes into cache based on the configuration.
+    for (String stateModel : resolverMap.keySet()) {
+      String resolverClassName = resolverMap.get(stateModel);
+      if (resolverClassName == null || resolverClassName.isEmpty()) {
+        // skip the empty definition.
+        continue;
+      }
+      if (!resolverClassName.equals(getAbnormalStateResolver(stateModel).getClass().getName())) {
+        try {
+          AbnormalStateResolver resolver = AbnormalStateResolver.class
+              .cast(HelixUtil.loadClass(getClass(), resolverClassName).newInstance());
+          _abnormalStateResolverMap.put(stateModel, resolver);
+        } catch (Exception e) {
+          throw new HelixException(String
+              .format("Failed to instantiate the abnormal state resolver %s for state model %s",
+                  resolverClassName, stateModel));
+        }
+      } // else, nothing to update since the same resolver class has been loaded.
+    }
+    logger.debug("Finish loading the abnormal state resolvers {}", _abnormalStateResolverMap);

Review comment:
       Sure




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] dasahcc commented on a change in pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
dasahcc commented on a change in pull request #1028:
URL: https://github.com/apache/helix/pull/1028#discussion_r431466532



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
##########
@@ -276,39 +278,20 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
-  /**
-   * compute best state for resource in AUTO ideal state mode
-   * @param liveInstances
-   * @param stateModelDef
-   * @param preferenceList
-   * @param currentStateOutput
-   *          : instance->state for each partition
-   * @param disabledInstancesForPartition
-   * @param idealState
-   * @param  clusterConfig
-   * @param  partition
-   * @return
-   */
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,
       StateModelDefinition stateModelDef, List<String> preferenceList,
       CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
-      IdealState idealState, ClusterConfig clusterConfig, Partition partition) {
-
+      IdealState idealState, ClusterConfig clusterConfig, Partition partition,
+      AbnormalStateResolver resolver) {
+    Optional<Map<String, String>> optionalOverwrittenStates =
+        computeStatesOverwriteForPartition(stateModelDef, preferenceList, currentStateOutput,
+            idealState, partition, resolver);
+    if (optionalOverwrittenStates.isPresent()) {

Review comment:
       I see. I was worrying about the Optional.empty(). But since this is not part of those two functions. The logic should be same.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1028:
URL: https://github.com/apache/helix/pull/1028#issuecomment-635646915


   This PR is ready to be merged, approved by @dasahcc 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1028:
URL: https://github.com/apache/helix/pull/1028#issuecomment-635650851


   Will checkin to a separate branch for reviewing the later PRs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] dasahcc commented on a change in pull request #1028: Add Abnormal States Resolver interface and configuration item.

Posted by GitBox <gi...@apache.org>.
dasahcc commented on a change in pull request #1028:
URL: https://github.com/apache/helix/pull/1028#discussion_r431414002



##########
File path: helix-core/src/main/java/org/apache/helix/controller/dataproviders/BaseControllerDataProvider.java
##########
@@ -723,6 +729,43 @@ public void setAsyncTasksThreadPool(ExecutorService asyncTasksThreadPool) {
     _asyncTasksThreadPool = asyncTasksThreadPool;
   }
 
+
+  public AbnormalStateResolver getAbnormalStateResolver(String stateModel) {
+    return _abnormalStateResolverMap
+        .getOrDefault(stateModel, AbnormalStateResolver.DUMMY_STATE_RESOLVER);
+  }
+
+  private void refreshAbnormalStateResolverMap(ClusterConfig clusterConfig) {
+    if (clusterConfig == null) {
+      logger.debug("Skip refreshing abnormal state resolvers because the ClusterConfig is missing");
+      return;
+    }
+    Map<String, String> resolverMap = clusterConfig.getAbnormalStateResolverMap();
+    logger.debug("Start loading the abnormal state resolvers with configuration {}", resolverMap);
+    // Remove any resolver configuration that does not exist anymore.
+    _abnormalStateResolverMap.keySet().retainAll(resolverMap.keySet());
+    // Reload the resolver classes into cache based on the configuration.
+    for (String stateModel : resolverMap.keySet()) {
+      String resolverClassName = resolverMap.get(stateModel);
+      if (resolverClassName == null || resolverClassName.isEmpty()) {
+        // skip the empty definition.
+        continue;
+      }
+      if (!resolverClassName.equals(getAbnormalStateResolver(stateModel).getClass().getName())) {
+        try {
+          AbnormalStateResolver resolver = AbnormalStateResolver.class
+              .cast(HelixUtil.loadClass(getClass(), resolverClassName).newInstance());
+          _abnormalStateResolverMap.put(stateModel, resolver);
+        } catch (Exception e) {
+          throw new HelixException(String
+              .format("Failed to instantiate the abnormal state resolver %s for state model %s",
+                  resolverClassName, stateModel));
+        }
+      } // else, nothing to update since the same resolver class has been loaded.
+    }
+    logger.debug("Finish loading the abnormal state resolvers {}", _abnormalStateResolverMap);

Review comment:
       We can let it be info since I will not see ClusterConfig change so frequently.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java
##########
@@ -276,39 +278,20 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
-  /**
-   * compute best state for resource in AUTO ideal state mode
-   * @param liveInstances
-   * @param stateModelDef
-   * @param preferenceList
-   * @param currentStateOutput
-   *          : instance->state for each partition
-   * @param disabledInstancesForPartition
-   * @param idealState
-   * @param  clusterConfig
-   * @param  partition
-   * @return
-   */
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,
       StateModelDefinition stateModelDef, List<String> preferenceList,
       CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
-      IdealState idealState, ClusterConfig clusterConfig, Partition partition) {
-
+      IdealState idealState, ClusterConfig clusterConfig, Partition partition,
+      AbnormalStateResolver resolver) {
+    Optional<Map<String, String>> optionalOverwrittenStates =
+        computeStatesOverwriteForPartition(stateModelDef, preferenceList, currentStateOutput,
+            idealState, partition, resolver);
+    if (optionalOverwrittenStates.isPresent()) {

Review comment:
       Are we changing the logic? For this change, we add optionalOverwrittenStates.isPresent(). This result could come from IS dropping or preference list is null. In this case, it will direct return.
   
   If you are not returning, it will execute following code, which may have NPE or some errors?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org