You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/01/18 00:47:26 UTC

[GitHub] [helix] jiajunwang opened a new pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

jiajunwang opened a new pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690
 
 
   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the PR description:
   
   #660 
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI changes:
   
   This is to prevent any cached assignment information which is recorded during the previous session from impacting the rebalance result.
   Detailed change list:
     1. Move the stateful WAGED rebalancer to the GenericHelixController object instead of the rebalance stage. This is for resolving the possible race condition between the event processing thread and leader switch handling thread.
     2. Add reset methods to all the stateful objects that are used in the WAGED rebalancer.
     3. Refine some of the potential race condition in the WAGED rebalancer components.
     4. Adjust the tests accordingly. Also adding new tests to cover the components reset / the WAGED rebalancer reset logic.
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   TestResourceChagneDetector.testResetSnapshot
   TestAssignmentMetadataStore.testAssignmentCache
   TestWagedRebalancer.testReset
   
   - [ ] The following is the result of the "mvn test" command on the appropriate module:
   
   [INFO] Tests run: 1064, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4,162.076 s - in TestSuite
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 1064, Failures: 0, Errors: 0, Skipped: 0
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   
   ### Commits
   
   - [ ] My commits all reference appropriate Apache Helix GitHub issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Code Quality
   
   - [ ] My diff has been formatted using helix-style.xml
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370759211
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -282,22 +263,22 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
 
     Map<String, IdealState> newIdealStates = new HashMap<>();
 
-    ClusterConfig clusterConfig = cache.getClusterConfig();
-    WagedRebalancer wagedRebalancer =
-        getWagedRebalancer(helixManager, clusterConfig.getGlobalRebalancePreference(),
-            clusterConfig.isGlobalRebalanceAsyncModeEnabled());
-    try {
-      newIdealStates.putAll(wagedRebalancer.computeNewIdealStates(cache, wagedRebalancedResourceMap,
-          currentStateOutput));
-    } catch (HelixRebalanceException ex) {
-      // Note that unlike the legacy rebalancer, the WAGED rebalance won't return partial result.
-      // Since it calculates for all the eligible resources globally, a partial result is invalid.
-      // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+    if (wagedRebalancer != null) {
 
 Review comment:
   Perhaps we should check whether the rebalancer has been closed here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369863611
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
+  private WagedRebalancer _rebalancer = null;
+  private boolean _isRebalancerValid = true;
+
+  private void createWagedRebalancer(HelixManager helixManager) {
+    // Create WagedRebalancer instance if it hasn't been already initialized
+    if (_rebalancer == null) {
+      _rebalancer = new WagedRebalancer(helixManager);
+      _isRebalancerValid = true;
+    }
+  }
+
+  /**
+   * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+   * the next usage.
+   */
+  synchronized void invalidRebalancer() {
 
 Review comment:
   "invalidate"Rebalancer?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371440400
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1293,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
+
+    /**
+     * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+     * the next usage.
+     */
+    synchronized void invalidateRebalancer() {
+      _isRebalancerValid = false;
+    }
+
+    /**
+     * @return A valid rebalancer object.
+     *         If the rebalancer is no longer valid, it will be reset before returning.
+     */
+    synchronized T getRebalancer(HelixManager helixManager) {
+      // Lazily initialize the stateful rebalancer instance since the GenericHelixController
+      // instance is instantiated without the HelixManager information that is required.
+      if (_rebalancer == null) {
+        _rebalancer = createRebalancer(helixManager);
+        _isRebalancerValid = true;
+      }
+      // If the rebalance exists but has been marked as invalid (due to leadership switch), it needs
+      // to be reset before return.
+      if (!_isRebalancerValid) {
+        _rebalancer.reset();
+        _isRebalancerValid = true;
+      }
+      return _rebalancer;
 
 Review comment:
   Should we follow the lazy-initialization idiom (using volatile keyword on the singleton objects)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-578977742
 
 
   This PR is ready to be merged, approved by @narendly 
   
   I have added TODOs regarding the potential concerns. Since they are related to the possible future improvements or new features, we will revisit those logics by then.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly edited a comment on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly edited a comment on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-577429092
 
 
   @jiajunwang Could we separate the scope for this change?
   
   1. Is it possible to create a PR that includes the change on resetting the cluster data change detector?
   2. The next PR would be the movement of the rebalancer to Generic Helix Controller. It's not too clear why this is necessary 
   
   > Move the stateful WAGED rebalancer to the GenericHelixController object instead of the rebalance stage. This is for resolving the possible race condition between the event processing thread and leader switch handling thread.
   
   Could you explain this further? I am afraid in terms of design, we might be introducing a new dependency (between GenericHelixController and the rebalancer/stages). This doesn't seem too great to me because Helix's design I believe aims to keep them independent.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371499479
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1293,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
+
+    /**
+     * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+     * the next usage.
+     */
+    synchronized void invalidateRebalancer() {
+      _isRebalancerValid = false;
+    }
+
+    /**
+     * @return A valid rebalancer object.
+     *         If the rebalancer is no longer valid, it will be reset before returning.
+     */
+    synchronized T getRebalancer(HelixManager helixManager) {
+      // Lazily initialize the stateful rebalancer instance since the GenericHelixController
+      // instance is instantiated without the HelixManager information that is required.
+      if (_rebalancer == null) {
+        _rebalancer = createRebalancer(helixManager);
+        _isRebalancerValid = true;
+      }
+      // If the rebalance exists but has been marked as invalid (due to leadership switch), it needs
+      // to be reset before return.
+      if (!_isRebalancerValid) {
+        _rebalancer.reset();
+        _isRebalancerValid = true;
+      }
+      return _rebalancer;
 
 Review comment:
   Adding volatile is for correctness and I don't think we should sacrifice correctness in favor of a minor performance improvement. And I am not positive if the performance is a valid concern at this juncture with so many synchronization primitives that are already in use. 
   
   Now is Helix Controller single-thread? Yes. Will it remain that way? Not necessarily. If we do make this change (which should happen soon, especially due to the memory pressure Controller is facing), we should revisit this code. Could we add a TODO here as well, marking it not thread-safe?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370754635
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -187,6 +188,16 @@
 
   private HelixManager _helixManager;
 
+  // Since the stateful rebalancer needs to be lazily constructed when the HelixManager instance is
+  // ready, the GenericHelixController is not constructed with a stateful rebalancer. This wrapper
+  // is to avoid the complexity of handling a nullable value in the event handling process.
+  private final StatefulRebalancerRef _rebalancerRef = new StatefulRebalancerRef() {
+    @Override
+    protected StatefulRebalancer createRebalancer(HelixManager helixManager) {
+      return new WagedRebalancer(helixManager);
+    }
 
 Review comment:
   Could we add a TODO here for making it configurable (based on ClusterConfig for example) in the future as we have different versions of stateful rebalancers?
   
   Ideally we won't have to import WagedRebalancer explicitly (which seems like we're creating a dependency between **Generic**HelixController and a specific rebalancer class). I'm fine with creating an issue and getting to it later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370780296
 
 

 ##########
 File path: helix-core/src/test/java/org/apache/helix/integration/rebalancer/WagedRebalancer/TestWagedRebalance.java
 ##########
 @@ -473,6 +473,58 @@ public void testNewInstances()
     }
   }
 
+  /**
+   * The stateful WAGED rebalancer will be reset while the controller regains the leadership.
+   * This test is to verify if the reset has been done and the rebalancer has forgotten any previous
+   * status after leadership switched.
+   */
+  @Test(dependsOnMethods = "test")
+  public void testRebalancerReset() throws Exception {
+    // Configure the rebalance preference so as to trigger more partition movements for evenness.
+    // This is to ensure the controller will try to move something if the rebalancer has been reset.
+    ConfigAccessor configAccessor = new ConfigAccessor(_gZkClient);
+    ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
+    clusterConfig.setGlobalRebalancePreference(ImmutableMap
+        .of(ClusterConfig.GlobalRebalancePreferenceKey.EVENNESS, 10,
+            ClusterConfig.GlobalRebalancePreferenceKey.LESS_MOVEMENT, 0));
+    configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
+
+    int i = 0;
+    for (String stateModel : _testModels) {
+      String db = "Test-DB-" + TestHelper.getTestMethodName() + i++;
+      createResourceWithWagedRebalance(CLUSTER_NAME, db, stateModel, PARTITIONS, _replica,
+          _replica);
+      _gSetupTool.rebalanceStorageCluster(CLUSTER_NAME, db, _replica);
+      _allDBs.add(db);
+    }
+    Thread.sleep(300);
 
 Review comment:
   You are right, but there are too many...
   Let me add to the newly created method. After we fixed the issue  https://github.com/apache/helix/issues/526 we will be able to remove all sleep anyway.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371441708
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
 ##########
 @@ -38,5 +38,6 @@
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 
 Review comment:
   Either way is fine - I realized there is no convention for this enum.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371444804
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   True, but adding a check is safer and we could add a TODO to remove with better design? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370757079
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
 
 Review comment:
   1. Should we think about which methods in this class should be idempotent?
   
   2. Why an abstract class vs an interface?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370779423
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -282,22 +263,22 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
 
     Map<String, IdealState> newIdealStates = new HashMap<>();
 
-    ClusterConfig clusterConfig = cache.getClusterConfig();
-    WagedRebalancer wagedRebalancer =
-        getWagedRebalancer(helixManager, clusterConfig.getGlobalRebalancePreference(),
-            clusterConfig.isGlobalRebalanceAsyncModeEnabled());
-    try {
-      newIdealStates.putAll(wagedRebalancer.computeNewIdealStates(cache, wagedRebalancedResourceMap,
-          currentStateOutput));
-    } catch (HelixRebalanceException ex) {
-      // Note that unlike the legacy rebalancer, the WAGED rebalance won't return partial result.
-      // Since it calculates for all the eligible resources globally, a partial result is invalid.
-      // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+    if (wagedRebalancer != null) {
 
 Review comment:
   As mentioned above, I would prefer to reject call (by throwing exceptions or so) inside the rebalancer. For the calc stage, there is nothing it can do.
   Moreover, checking inside the rebalancer can help to keep the close status private.
   TODO added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371459473
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -282,22 +263,22 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
 
     Map<String, IdealState> newIdealStates = new HashMap<>();
 
-    ClusterConfig clusterConfig = cache.getClusterConfig();
-    WagedRebalancer wagedRebalancer =
-        getWagedRebalancer(helixManager, clusterConfig.getGlobalRebalancePreference(),
-            clusterConfig.isGlobalRebalanceAsyncModeEnabled());
-    try {
-      newIdealStates.putAll(wagedRebalancer.computeNewIdealStates(cache, wagedRebalancedResourceMap,
-          currentStateOutput));
-    } catch (HelixRebalanceException ex) {
-      // Note that unlike the legacy rebalancer, the WAGED rebalance won't return partial result.
-      // Since it calculates for all the eligible resources globally, a partial result is invalid.
-      // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+    if (wagedRebalancer != null) {
 
 Review comment:
   No reject call yet. TODO is for this task. It is not a must for now since all the callers are private.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371457935
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
 
 Review comment:
   My point is that we don't restrict the behavior in the Ref class. The source of truth is in the rebalancer object itself. So it won't be explicitly mentioned here.
   Moreover, as I commented in the other thread, it is not a real concern for now. But I have added a TODO in the rebalancer class for rejecting requests after close() called.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369870892
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
+    // marked as invalid yet. So the event will be processed the same as case one.
+    // 3. Event is leftover from the previous session, and it is handled when the controller
 
 Review comment:
   We should, but there is no good way to do it for now. And even we discard the queued events, the concerning one is actually the one that is being processed. Force canceling that could be risky. So I don't think to do it without deeply improve the pipeline logic is a good idea.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370758029
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
 ##########
 @@ -38,5 +38,6 @@
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 
 Review comment:
   I think now, with the introduction of a StatefulRebalancer interface, having this name makes sense and pretty clear. What do you think?
   
   Also, could we follow the enum convention? Use `StatefulRebalancer` instead?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370341846
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
+  private WagedRebalancer _rebalancer = null;
+  private boolean _isRebalancerValid = true;
+
+  private void createWagedRebalancer(HelixManager helixManager) {
 
 Review comment:
   That's the first thing I tried, if we don't need the _isRebalancerValid flag, it works fine and we won't need to create StatefulRebalancerRef at all. Basically, in this case, we make the WAGED rebalancer singleton.
   However, since we have a flag that is not quite suitable for the WAGED class, it would be easier for me to put both the flag and the rebalancer in a StatefulRebalancerRef object. And the Ref class is for the GenericHelixController only. Given that done, the singleton pattern becomes optional.
   
   Moreover, it is concerning to make the WAGED rebalancer singleton. If we want to use it for both task framework and the resource rebalancer, or we want to manage different resources with different rebalancer configuration, singleton won't work. That's why I didn't pick up that design.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371439087
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
 
 Review comment:
   My concern is that we should probably make the behavior more explicit around create() or close() being multiple times. It wasn't very clear from the code?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370778177
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   That's a good point. For now, an explicit check would be optional. If the rebalancer has been closed, the computing will fail. One concern might be metric. It will be unregistered after the close.
   
   Let me add a TODO to the close() method. I would prefer to keep this PR simple for now.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371460331
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   I'd prefer adding a TODO to add this check properly. Since that is not one or two lines of change, let's not squeeze it into this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371441708
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
 ##########
 @@ -38,5 +38,6 @@
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 
 Review comment:
   Either way is fine - I realized there is no convention for enum.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371495252
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   Will we be tracking this work with an issue?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369865622
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
+  private WagedRebalancer _rebalancer = null;
+  private boolean _isRebalancerValid = true;
+
+  private void createWagedRebalancer(HelixManager helixManager) {
 
 Review comment:
   If we want to follow the double-checked locking pattern for lazy initialization, we could follow the following structure using `volatile`: 
   
   ```
   public class Singleton {
       private volatile static Singleton instance;
       private Singleton() {}
       public static Singleton getInstance() {
           if (instance == null) {
               synchronized (Singleton.class) {
                   if (instance == null) {
                       instance = new Singleton();
                   }
               }
           }
           return instance;
       }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371495058
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -282,22 +263,22 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
 
     Map<String, IdealState> newIdealStates = new HashMap<>();
 
-    ClusterConfig clusterConfig = cache.getClusterConfig();
-    WagedRebalancer wagedRebalancer =
-        getWagedRebalancer(helixManager, clusterConfig.getGlobalRebalancePreference(),
-            clusterConfig.isGlobalRebalanceAsyncModeEnabled());
-    try {
-      newIdealStates.putAll(wagedRebalancer.computeNewIdealStates(cache, wagedRebalancedResourceMap,
-          currentStateOutput));
-    } catch (HelixRebalanceException ex) {
-      // Note that unlike the legacy rebalancer, the WAGED rebalance won't return partial result.
-      // Since it calculates for all the eligible resources globally, a partial result is invalid.
-      // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+    if (wagedRebalancer != null) {
 
 Review comment:
   I still don't see a TODO stating that we need to check on the rebalancer status (whether it's closed, null, etc.)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370784633
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
 ##########
 @@ -38,5 +38,6 @@
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 
 Review comment:
   Thanks for the feedback.
   
   As for the convention, it is not really clear to me. So I did some research. There are different opinions. The one that seems to be more reasonable to me is that enum items should be treated as constants, so they should follow constants name convention. Given this, uppercase letters make more sense here. I intend to enforce this as our enum convention unless a different rule has been documented somewhere else that I don't know : )

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369741919
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
 ##########
 @@ -38,5 +38,6 @@
   AsyncFIFOWorkerPool,
   PipelineType,
   LastRebalanceFinishTimeStamp,
-  ControllerDataProvider
+  ControllerDataProvider,
+  STATEFUL_REBALANCER
 
 Review comment:
   Please comment if anyone of you has a better name. This one is not good.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369862100
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
+    // marked as invalid yet. So the event will be processed the same as case one.
+    // 3. Event is leftover from the previous session, and it is handled when the controller
+    // regains the leadership. The rebalancer will be reset before being used. That is the
+    // expected behavior so as to avoid inconsistent rebalance result.
+    // 4. Event handled shortly after leadership relinquished. And the rebalancer has been marked
+    // as invalid. So we reset the rebalancer. But the later isLeader() check will return false and
 
 Review comment:
   Could you please explain why another pipeline is being triggered when the controller is no longer a leader?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370755780
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
 
 Review comment:
   How do we prevent this from being called multiple times?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369871376
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
+    // marked as invalid yet. So the event will be processed the same as case one.
+    // 3. Event is leftover from the previous session, and it is handled when the controller
+    // regains the leadership. The rebalancer will be reset before being used. That is the
+    // expected behavior so as to avoid inconsistent rebalance result.
+    // 4. Event handled shortly after leadership relinquished. And the rebalancer has been marked
+    // as invalid. So we reset the rebalancer. But the later isLeader() check will return false and
 
 Review comment:
   1. in event thread, check if is leader, return true, so event thread continues
   2. in the ZK thread, session expired, so handling triggered, and leadership switch is done.
   3. in the event thread, keep triggering the pipeline.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369864578
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -187,6 +187,11 @@
 
   private HelixManager _helixManager;
 
+  // Since the WAGED rebalancer needs to be lazily constructed, the GenericHelixController will not
+  // be constructed with a WAGED rebalancer. This wrapper is to avoid the complexity of handling a
+  // nullable value in the GenericHelixController logic.
+  private final WagedRebalancerRef _wagedRebalancerRef = new WagedRebalancerRef();
 
 Review comment:
   Design suggestion:
   
   Could we make this generic by using `RebalancerRef` or `StatefulRebalancerRef` (since this is only a problem for stateful rebalancers)? Also, instead of directly instantiating, we could make this a cluster config and have it be dynamically created? (Waged, CrushED, etc.). That way, we decouple GenericHelixController from WAGED (or a particular rebalancer) as much as possible.
   
   It would be okay to set the default type to be WAGED since this is the only stateful rebalancer we have available so far?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369864168
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
 
 Review comment:
   Do you think it would be better to generalize this into all rebalancers?
   
   For example, `RebalancerRef` could be an interface here that provides things like `getRebalancer() or createRebalancer(), invalidate()`, etc. so that the GenericHelixController class does not depend on a specific stateful rebalancer here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370773330
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -187,6 +188,16 @@
 
   private HelixManager _helixManager;
 
+  // Since the stateful rebalancer needs to be lazily constructed when the HelixManager instance is
+  // ready, the GenericHelixController is not constructed with a stateful rebalancer. This wrapper
+  // is to avoid the complexity of handling a nullable value in the event handling process.
+  private final StatefulRebalancerRef _rebalancerRef = new StatefulRebalancerRef() {
+    @Override
+    protected StatefulRebalancer createRebalancer(HelixManager helixManager) {
+      return new WagedRebalancer(helixManager);
+    }
 
 Review comment:
   The vision is not clear to me. I don't think we have a solid idea for the TODO. For example, relying on the cluster config to set up the rebalancer type may be invalid. If we have multiple optional stateful rebalancers, they will be configured in the ideal state for each resource, instead of cluster config.
   
   More likely, we will need to init all the stateful rebalancers here. It is just too far even for planning. For now, please assume we will only have one stateful rebalancer.
   
   However, making the code generic enough for the future extension is always a good idea. That why I added the interface according to your suggestion : )

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370768799
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
 
 Review comment:
   It is prevented in the get method. This method itself is a protected method, with no external callers.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-577462460
 
 
   @narendly Thanks for the review, I took a quick look. The comments will be addressed in this PR after we finished https://github.com/apache/helix/pull/696. Please move to there and review that PR first : )

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369870319
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
 
 Review comment:
   The one that is being processed will continue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370776068
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -187,6 +188,16 @@
 
   private HelixManager _helixManager;
 
+  // Since the stateful rebalancer needs to be lazily constructed when the HelixManager instance is
+  // ready, the GenericHelixController is not constructed with a stateful rebalancer. This wrapper
+  // is to avoid the complexity of handling a nullable value in the event handling process.
+  private final StatefulRebalancerRef _rebalancerRef = new StatefulRebalancerRef() {
+    @Override
+    protected StatefulRebalancer createRebalancer(HelixManager helixManager) {
+      return new WagedRebalancer(helixManager);
+    }
 
 Review comment:
   In addition, about your 2nd paragraph comment, I guess what you concerned about is that the rebalancer class should be only configurable in the IS. So the GenericHelixController should not be aware of the resource level thing. However, please note that even there is no resource using the WAGED rebalancer, the controller will still need to init the stateful rebalancer instance.
   
   I agree this design can be improved. But I would prefer to keep it simple before the rebalancer logic and the algorithm are stable enough.
   
   For now, let me add a TODO.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371506478
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1293,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
+
+    /**
+     * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+     * the next usage.
+     */
+    synchronized void invalidateRebalancer() {
+      _isRebalancerValid = false;
+    }
+
+    /**
+     * @return A valid rebalancer object.
+     *         If the rebalancer is no longer valid, it will be reset before returning.
+     */
+    synchronized T getRebalancer(HelixManager helixManager) {
+      // Lazily initialize the stateful rebalancer instance since the GenericHelixController
+      // instance is instantiated without the HelixManager information that is required.
+      if (_rebalancer == null) {
+        _rebalancer = createRebalancer(helixManager);
+        _isRebalancerValid = true;
+      }
+      // If the rebalance exists but has been marked as invalid (due to leadership switch), it needs
+      // to be reset before return.
+      if (!_isRebalancerValid) {
+        _rebalancer.reset();
+        _isRebalancerValid = true;
+      }
+      return _rebalancer;
 
 Review comment:
   I understand the concern. And I also thought about that. Actually, even the controller has been changed and split, there probably still be one single thread calculating for resource rebalance. Given that, in the foreseeable future, my assumption will still be valid.
   I agree with you that the performance impact is minor. But since it won't impact our correctness, I'd prefer to keep the design simple and not design for the unknown future.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-577448680
 
 
   Split the reset logic to https://github.com/apache/helix/pull/696 as suggested. @narendly Please review that one first.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-577445579
 
 
   > @jiajunwang Could we separate the scope for this change?
   > 
   > 1. Is it possible to create a PR that includes the change on resetting the cluster data change detector?
   > 2. The next PR would be the movement of the rebalancer to Generic Helix Controller. It's not too clear why this is necessary
   > 
   > > Move the stateful WAGED rebalancer to the GenericHelixController object instead of the rebalance stage. This is for resolving the possible race condition between the event processing thread and leader switch handling thread.
   > 
   > Could you explain this further? I am afraid in terms of design, we might be introducing a new dependency (between GenericHelixController and the rebalancer/stages). This doesn't seem too great to me because Helix's design I believe aims to keep them independent.
   
   Separate change is a good idea, will do so. So I will create another PR before this one. Just adding reset methods.
   As for the additional dependency that you mentioned,
   1. Another option is to keep the assignment metadata store and change detector in the controller. But that actually makes the story more complicated and introduce some skip level dependencies. I tried, this design is not looking nice.
   2. Even with this change, GenericHelixController does not depend on the stage.
   3. GenericHelixController will be bound with the WAGED rebalancer. Since it is stateful, tracking it in the controller makes more sense than a certain stage as we are doing now. Note that the stages are better to be stateless. But I agree it is not perfect.
   Like what we did for the several other stateful objects in the controller instance, I don't have a better option for now. Please let me know if you have a good idea.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370770630
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
 
 Review comment:
   Since it is a private class, I don't think we need to overdesign it. As for idempotent, could you be more specific, please? What's the concern that you have?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369856439
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
 
 Review comment:
   Do you mean that the old controller will continue to process events even after having lost the leadership using the in-memory rebalancer/cache state?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on issue #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#issuecomment-577429092
 
 
   @jiajunwang Could we separate the scope for this change?
   
   1. Is it possible to create a PR that includes the change on resetting the cluster data change detector?
   2. The next PR would be the movement of the rebalancer to Generic Helix Controller. It's not too clear why this is necessary 
   
   > Move the stateful WAGED rebalancer to the GenericHelixController object instead of the rebalance stage. This is for resolving the possible race condition between the event processing thread and leader switch handling thread.
   
   Could you explain this further?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370336341
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -187,6 +187,11 @@
 
   private HelixManager _helixManager;
 
+  // Since the WAGED rebalancer needs to be lazily constructed, the GenericHelixController will not
+  // be constructed with a WAGED rebalancer. This wrapper is to avoid the complexity of handling a
+  // nullable value in the GenericHelixController logic.
+  private final WagedRebalancerRef _wagedRebalancerRef = new WagedRebalancerRef();
 
 Review comment:
   I've tried to make the class more generic. Now it is an abstract class. But the configuration part seems to be overdesign for now. Let's consider that part once we have multiple ones.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369857021
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -626,6 +631,22 @@ private void handleEvent(ClusterEvent event, BaseControllerDataProvider dataProv
       return;
     }
 
+    // Event handling happens in a different thread from the onControllerChange processing thread.
+    // Thus, there are several possible conditions.
+    // 1. Event handled after leadership acquired. So we will have a valid rebalancer for the
+    // event processing.
+    // 2. Event handled shortly after leadership relinquished. And the rebalancer has not been
+    // marked as invalid yet. So the event will be processed the same as case one.
+    // 3. Event is leftover from the previous session, and it is handled when the controller
 
 Review comment:
   Shouldn't we just discard all events from stale sessions?
   
   Resetting the cache will trigger a fresh, up-to-date global baseline calculation anyways?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371469494
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1293,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
+
+    /**
+     * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+     * the next usage.
+     */
+    synchronized void invalidateRebalancer() {
+      _isRebalancerValid = false;
+    }
+
+    /**
+     * @return A valid rebalancer object.
+     *         If the rebalancer is no longer valid, it will be reset before returning.
+     */
+    synchronized T getRebalancer(HelixManager helixManager) {
+      // Lazily initialize the stateful rebalancer instance since the GenericHelixController
+      // instance is instantiated without the HelixManager information that is required.
+      if (_rebalancer == null) {
+        _rebalancer = createRebalancer(helixManager);
+        _isRebalancerValid = true;
+      }
+      // If the rebalance exists but has been marked as invalid (due to leadership switch), it needs
+      // to be reset before return.
+      if (!_isRebalancerValid) {
+        _rebalancer.reset();
+        _isRebalancerValid = true;
+      }
+      return _rebalancer;
 
 Review comment:
   I still think it is not necessary since we want to maximize the event handling performance here.
   
   1. getRebalancer is called in a single thread, there is no concern that multiple getRebalancer will cause the rebalancer to be created multiple times.
   2. the possible concern is between getRebalancer() and closeRebalancer().
   2.1. it is not possible that getRebalancer is called after closeRebalancer. Since the event thread has been stopped by the controller before closeRebalancer is triggered.
   2.2. if the closeRebalancer is called after getRebalancer, and the reference has not been safed in the main memory, closeRebalancer does nothing. The clean up will be done a little bit late while the rebalancer object is GCed. This is fine. Since the controller has been close, the rebalancer will not be used for sure.
   3. On the other hand, when adding volatile to the _rebalancer object, each getRebalancer will access the main memory. It's a minor delay increament, but if it happens on every event, these latencies together might count something. So I prefer not adding volatile unless it help to resolve a real concern.
   
   Moreover, as I mentioned, the rebalancer should not be a singleton by design.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370759418
 
 

 ##########
 File path: helix-core/src/test/java/org/apache/helix/integration/rebalancer/WagedRebalancer/TestWagedRebalance.java
 ##########
 @@ -473,6 +473,58 @@ public void testNewInstances()
     }
   }
 
+  /**
+   * The stateful WAGED rebalancer will be reset while the controller regains the leadership.
+   * This test is to verify if the reset has been done and the rebalancer has forgotten any previous
+   * status after leadership switched.
+   */
+  @Test(dependsOnMethods = "test")
+  public void testRebalancerReset() throws Exception {
+    // Configure the rebalance preference so as to trigger more partition movements for evenness.
+    // This is to ensure the controller will try to move something if the rebalancer has been reset.
+    ConfigAccessor configAccessor = new ConfigAccessor(_gZkClient);
+    ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
+    clusterConfig.setGlobalRebalancePreference(ImmutableMap
+        .of(ClusterConfig.GlobalRebalancePreferenceKey.EVENNESS, 10,
+            ClusterConfig.GlobalRebalancePreferenceKey.LESS_MOVEMENT, 0));
+    configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
+
+    int i = 0;
+    for (String stateModel : _testModels) {
+      String db = "Test-DB-" + TestHelper.getTestMethodName() + i++;
+      createResourceWithWagedRebalance(CLUSTER_NAME, db, stateModel, PARTITIONS, _replica,
+          _replica);
+      _gSetupTool.rebalanceStorageCluster(CLUSTER_NAME, db, _replica);
+      _allDBs.add(db);
+    }
+    Thread.sleep(300);
 
 Review comment:
   We should add a TODO here for all the sleep() calls?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370758902
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   Should we check things on the WagedRebalancer too? is closed?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370768799
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1258,4 +1292,57 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.setDaemon(true);
     eventThread.start();
   }
+
+  /**
+   * A wrapper class for the stateful rebalancer instance that will be tracked in the
+   * GenericHelixController.
+   */
+  private abstract class StatefulRebalancerRef<T extends StatefulRebalancer> {
+    private T _rebalancer = null;
+    private boolean _isRebalancerValid = true;
+
+    /**
+     * @param helixManager
+     * @return A new stateful rebalancer instance with initial state.
+     */
+    protected abstract T createRebalancer(HelixManager helixManager);
 
 Review comment:
   It is prevented in the get method? This method itself is a protected method, with no external callers.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369888181
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
+  private WagedRebalancer _rebalancer = null;
+  private boolean _isRebalancerValid = true;
+
+  private void createWagedRebalancer(HelixManager helixManager) {
+    // Create WagedRebalancer instance if it hasn't been already initialized
+    if (_rebalancer == null) {
+      _rebalancer = new WagedRebalancer(helixManager);
+      _isRebalancerValid = true;
+    }
+  }
+
+  /**
+   * Mark the current rebalancer object to be invalid, which indicates it needs to be reset before
+   * the next usage.
+   */
+  synchronized void invalidRebalancer() {
 
 Review comment:
   Sure, I will update after https://github.com/apache/helix/pull/696.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
narendly commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r371445331
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -282,22 +263,22 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
 
     Map<String, IdealState> newIdealStates = new HashMap<>();
 
-    ClusterConfig clusterConfig = cache.getClusterConfig();
-    WagedRebalancer wagedRebalancer =
-        getWagedRebalancer(helixManager, clusterConfig.getGlobalRebalancePreference(),
-            clusterConfig.isGlobalRebalanceAsyncModeEnabled());
-    try {
-      newIdealStates.putAll(wagedRebalancer.computeNewIdealStates(cache, wagedRebalancedResourceMap,
-          currentStateOutput));
-    } catch (HelixRebalanceException ex) {
-      // Note that unlike the legacy rebalancer, the WAGED rebalance won't return partial result.
-      // Since it calculates for all the eligible resources globally, a partial result is invalid.
-      // TODO propagate the rebalancer failure information to updateRebalanceStatus for monitoring.
+    if (wagedRebalancer != null) {
 
 Review comment:
   Where is the reject call? Are you referring to Line 271?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r370786375
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/stages/BestPossibleStateCalcStage.java
 ##########
 @@ -249,23 +220,33 @@ private boolean validateOfflineInstancesLimit(final ResourceControllerDataProvid
     return true;
   }
 
+  private void updateWagedRebalancer(WagedRebalancer wagedRebalancer, ClusterConfig clusterConfig) {
 
 Review comment:
   One more comment, for now, the close is only called in 2 conditions:
   1. the instance is GCed. So there should be no one calling any methods.
   2. The controller is shutdown. There won't be a pipeline running after. So there should be no one calling any methods.
   
   That's why I don't think we need to do this check immediately.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #690: Reset the WAGED rebalancer once the controller newly acquires leadership.
URL: https://github.com/apache/helix/pull/690#discussion_r369888562
 
 

 ##########
 File path: helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 ##########
 @@ -1259,3 +1287,54 @@ private void initPipeline(Thread eventThread, BaseControllerDataProvider cache)
     eventThread.start();
   }
 }
+
+/**
+ * A wrapper class for the WAGED rebalancer instance.
+ */
+class WagedRebalancerRef {
 
 Review comment:
   I think it's a good idea. Let me try to implement a simple version as a starting point. Will update later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org