You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2021/04/16 00:28:55 UTC

[GitHub] [helix] NealSun96 opened a new pull request #1701: Add WagedRebalancer Option to Throw Exception on FAILED_TO_CALCULATE

NealSun96 opened a new pull request #1701:
URL: https://github.com/apache/helix/pull/1701


   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   Resolves #1700 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   ```
   When WAGED encounters a "FAILED_TO_CALCULATE" exception, the exception is processed by the pipeline without propagating. This is by design because WAGED doesn't consider this exception as pipeline breaking - WAGED simply returns the last known good result instead and continue the pipeline. 
   WAGED simulation API calls the pipelines to get a simulated rebalance result, and as a result, it also cannot reflect "FAILED_TO_CALCULATE" except for an error log. 
   Down the line, we would like the simulation API to be used to forecast potential calculation failure: when a customer makes partition weight changes, for example, they can first use the API to see the aftermath of such change, and don't do the changes if they can result calculation failures. 
   Therefore, we need the simulation API to have the ability to propagate "FAILED_TO_CALCULATE" exception. 
   ```
   
   ### Tests
   
   - [x] The following tests are written for this issue:
   
   ```
   TestWagedRebalancer.testAlgorithmException is modified. 
   ```
   
   - The following is the result of the "mvn test" command on the appropriate module:
   
   ```
   [INFO] Tests run: 1264, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5,070.797 s - in TestSuite
   [INFO] 
   [INFO] Results:
   [INFO] 
   [INFO] Tests run: 1264, Failures: 0, Errors: 0, Skipped: 0
   [INFO] 
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  01:24 h
   [INFO] Finished at: 2021-04-13T22:27:08-07:00
   [INFO] ------------------------------------------------------------------------
   ```
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r617784196



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       I don't think that's better because ReadOnlyWagedRebalancer doesn't always propagate a superset of failure types; that might change in the future. Separate lists are clearer. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on pull request #1701:
URL: https://github.com/apache/helix/pull/1701#issuecomment-833854765


   This PR is ready to be merged, approved by @jiajunwang      
   Final commit message:
   ## Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE ##
   WAGED simulation API now throws HelixException upon encountering FAILED_TO_CALCULATE


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r626914954



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       I can see your point about ReadOnlyWagedRebalancer now. Making changes accordingly. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r618791246



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       I disagree with this: for example, some extension of `WagedRebalancer (if not `ReadOnlyWagedRebalancer`) could think UNKNOWN_FAILURE is okay and just take the last-known good result instead of an exception. 
   
   I understand where you come from: if we add a new HelixRebalanceException.Type, it seems easier if we do super sets so we just modify the base class. However, even if we introduce a new type, each implementation should be reviewed to see if that type should be propagated. Imo this coupling is unnecessary here. If you don't feel strongly about this, I'd like to stick to separate lists. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #1701:
URL: https://github.com/apache/helix/pull/1701


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r619416014



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       1. Let's worry about another extension of the WAGED rebalancer when they are created. And no matter what you do for the ReadOnlyWagedRebalancer, you can do it differently according to the new requirement later. Not relevant.
   2. For ReadOnlyWagedRebalancer, I don't think we need to revisit. We can just carry over whatever WagedRebalancer has plus the FAILED_TO_CALCULATE failure type. If this is not guaranteed, then ReadOnlyWagedRebalancer cannot be used in the simulate tool, since it may cause different behavior. That's exactly why I strongly prefer we just call the method of super class here. It prevents potential bugs in the future.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r616929981



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       nit, something like returning super.failureTypesToPropagate().add(HelixRebalanceException.Type.FAILED_TO_CALCULATE) ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r626938777



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedRebalancer.java
##########
@@ -393,6 +394,12 @@ public void close() {
     return finalIdealStateMap;
   }
 
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // By default, only propagate the following types of HelixRebalanceException
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       nit, let's define a static unmodifiable list for the WagedRebalancer class to avoid creating a list on each call. Then we can always create a new list in the ReadOnlyWagedRebalancer.java. Considering ReadOnlyWagedRebalancer.java is less used and does not impact the critical pipeline time, it is better to create new list there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1701:
URL: https://github.com/apache/helix/pull/1701#discussion_r617786970



##########
File path: helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/ReadOnlyWagedRebalancer.java
##########
@@ -48,6 +50,14 @@ public ReadOnlyWagedRebalancer(String metadataStoreAddress, String clusterName,
         ConstraintBasedAlgorithmFactory.getInstance(preferences), Optional.empty());
   }
 
+  @Override
+  protected List<HelixRebalanceException.Type> failureTypesToPropagate() {
+    // Also propagate FAILED_TO_CALCULATE for ReadOnlyWagedRebalancer
+    return Arrays.asList(HelixRebalanceException.Type.INVALID_REBALANCER_STATUS,

Review comment:
       I think ReadOnlyWagedRebalancer should cover all the failure types of the regular rebalancer. Otherwise, what to return if the basic rebalancer cannot finish the calculation? Based on this assumption, adding an additional item to the list would be easier to be maintained.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on pull request #1701: Improve WAGED simulation API by throwing exception on FAILED_TO_CALCULATE

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on pull request #1701:
URL: https://github.com/apache/helix/pull/1701#issuecomment-823498097


   One more thing, please add a test for the tool method to cover the fail to rebalance case. This is the target usage, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org