You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2021/02/04 23:42:02 UTC

[GitHub] [helix] pkuwm opened a new pull request #1637: Skip cancellation for partition reset pending message

pkuwm opened a new pull request #1637:
URL: https://github.com/apache/helix/pull/1637


   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the PR description:
   
   Resolves #1636 
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI changes:
   
   Participant sends a resetting ERROR message but the ST is cancelled by helix controller.
   
   This PR adds a logic: Helix controller should not cancel the pending ST if the pending ST is ERROR -> initialState
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   `testNoCancellationForErrorReset`
   
   - The following is the result of the "mvn test" command on the appropriate module:
   
   Tests in helix-core
   ```
   [INFO] Tests run: 1259, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5,552.898 s - in TestSuite
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1259, Failures: 0, Errors: 0, Skipped: 0
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  01:32 h
   [INFO] Finished at: 2021-02-03T21:58:19-08:00
   [INFO] ------------------------------------------------------------------------
   ```
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] dasahcc commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
dasahcc commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570623421



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())

Review comment:
       NIT: HelixDefinedState.ERROR.name()

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       Question: This line logic is expected. But I did not find it from previous code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570626788



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       why change the logger? LogUtil should also log eventId. The thing is that if log format changed, later if some script searching the log may not work. Not a big deal. But if possible, let us don't change to `logger.info`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773682190


   > Can I clarify about the root of the bug?
   > 
   > The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?
   
   It's described in the issue #1636 
   
   Helix ST cancellation is a feature that helix can cancel pending or ongoing partition assignment state transitions(ST) that are unnecessary due to some reasons, eg. pending state does not match the target state. The feature expects ST messages are only sent by helix controller so controller knows what kind of ST can be cancelled.
   
   Root cause: When SN resets partition with ERROR -> OFFLINE ST, helix controller determines it is an unnecessary ST so it sends out cancellation, which is expected for helix ST cancellation feature.
   
   This PR improves it by skipping cancelling the reset ST sent by Admin or participant.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773680850






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570632024



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       You meant this? https://github.com/apache/helix/blob/b1b194c1bffa68a53f1a81ba307452474c71a016/helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java#L214-L216
   
   Previously only `!currentState.equalsIgnoreCase(pendingMessage.getToState()` is checked. The ERROR state and initial state are not.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())

Review comment:
       I understand `name()` is final and `toString()` could be override. There is mixed use in our code. Either should work. If in the future `toString` is override, we have to make all of them consistent in helix code. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570669595



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       There is a typo in the code,
   (HelixDefinedState.ERROR.name().equals(pendingMessage.getFromState()) && initialState.equals(pendingMessage.getToState()) should be checked together.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773691056


   > > Can I clarify about the root of the bug?
   > > The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?
   > 
   > It's described in the issue #1636
   > 
   > Helix ST cancellation is a feature that helix can cancel pending or ongoing partition assignment state transitions(ST) that are unnecessary due to some reasons, eg. pending state does not match the target state. The feature expects ST messages are only sent by helix controller so controller knows what kind of ST can be cancelled.
   > 
   > Root cause: When SN resets partition with ERROR -> OFFLINE ST, helix controller determines it is an unnecessary ST so it sends out cancellation, which is expected for helix ST cancellation feature.
   > 
   > This PR improves it by skipping cancelling the reset ST sent by Admin or participant.
   
   Do we have other messages similar to error message that can be set by admin tool/ participant, that can be cancelled by controller?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-774304773


   This PR is ready to be merged, approved by @dasahcc 
   
   When admin or participant resets partition with ERROR -> OFFLINE state transition, helix controller determines it is an unnecessary state transition so it cancels the state transition. The ensure partition reset is successful, helix controller should skip cancelling the ERROR -> OFFLINE state transition.
   
   This commit adds a logic: Helix controller should not cancel the pending ST if the pending ST is ERROR -> initialState


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] alirezazamani commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570627836



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,

Review comment:
       One question: As participants and controller are async with each other, what would happen when the participant is processing the previous message (pending message is still there) and once the cancellation message is being sent, then ST has been executed by the participant? This would potentially happen when pipeline processing time gets extended.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 edited a comment on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773691056


   > > Can I clarify about the root of the bug?
   > > The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?
   > 
   > It's described in the issue #1636
   > 
   > Helix ST cancellation is a feature that helix can cancel pending or ongoing partition assignment state transitions(ST) that are unnecessary due to some reasons, eg. pending state does not match the target state. The feature expects ST messages are only sent by helix controller so controller knows what kind of ST can be cancelled.
   > 
   > Root cause: When SN resets partition with ERROR -> OFFLINE ST, helix controller determines it is an unnecessary ST so it sends out cancellation, which is expected for helix ST cancellation feature.
   > 
   > This PR improves it by skipping cancelling the reset ST sent by Admin or participant.
   
   Do we have other messages similar to error message that can be set by admin tool/ participant, that can be cancelled by controller? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773680850


   Can I clarify about the root of the bug?
   
   The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570638149



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       Ok. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm merged pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm merged pull request #1637:
URL: https://github.com/apache/helix/pull/1637


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570664736



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       Minor thing, as we discussed offline, the comment is about why we don't cancel, but the code returns true if the message should be canceled. Please try to make them align, or this part tends to be confusing.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570630118



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,

Review comment:
       It could happen. Then the ST sent by participant will be completed, before the cancellation arrives at the participant. This is exactly what we have seen: when the partition reset ST is processed really fast by participant, controller cannot even read the pending message, so the partition reset is successful without being cancelled. But the success relies on the timing. So we want to skip the error reset ST to make sure the ST is not cancelled.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       You meant this? https://github.com/apache/helix/blob/b1b194c1bffa68a53f1a81ba307452474c71a016/helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java#L214-L216
   
   Previously only `!currentState.equalsIgnoreCase(pendingMessage.getToState()` is checked. The ERROR state and initial state are not.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())

Review comment:
       I understand `name()` is final and `toString()` could be override. There is mixed use in our code. Either should work. If in the future `toString` is override, we have to make all of them consistent in helix code. 

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       1. I'd like to add the `toState` field to the log.
   2. parameterized logging is better than `String.format` in the LogUtil. When error logging level is enabled, the string won't be created in parameterized logging, but it will be created in the LogUtil.
   3. The log format is not changed. I don't think someone will really parse the log in the script.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       @jiajunwang Oh...Good catch! Yes. `(HelixDefinedState.ERROR.name().equals(pendingMessage.getFromState()) && initialState.equals(pendingMessage.getToState())` is supposed together as it represents `ERROR -> OFFLINE`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] alirezazamani commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570627836



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,

Review comment:
       One question: As participants and controller are async with each other, what would happen when the participant is processing the previous message (pending message is still there) and once the cancellation message is being sent, then ST has been executed by the participant? This would potentially happen when pipeline processing time gets extended.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773682190






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570635186



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       1. I'd like to add the `toState` field to the log.
   2. parameterized logging is better than `String.format` in the LogUtil. When error logging level is enabled, the string won't be created in parameterized logging, but it will be created in the LogUtil.
   3. The log format is not changed. I don't think someone will really parse the log in the script.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570626788



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       why change the logger? LogUtil should also log eventId. The thing is that if log format changed, later if some script searching the log may not work. Not a big deal. But if possible, let us don't change to `logger.info`

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -437,10 +454,10 @@ private Message createStateTransitionCancellationMessage(HelixManager manager, R
       boolean isCancellationEnabled, String currentState) {
 
     if (isCancellationEnabled && cancellationMessage == null) {
-      LogUtil.logInfo(logger, _eventId,
-          "Send cancellation message of the state transition for " + resource.getResourceName()
-              + "." + partitionName + " on " + instanceName + ", currentState: " + currentState
-              + ", nextState: " + (nextState == null ? "N/A" : nextState));
+      logger.info("Event {} : Send cancellation message of the state transition for {}.{} on {}, "

Review comment:
       Ok. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 edited a comment on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773691056


   > > Can I clarify about the root of the bug?
   > > The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?
   > 
   > It's described in the issue #1636
   > 
   > Helix ST cancellation is a feature that helix can cancel pending or ongoing partition assignment state transitions(ST) that are unnecessary due to some reasons, eg. pending state does not match the target state. The feature expects ST messages are only sent by helix controller so controller knows what kind of ST can be cancelled.
   > 
   > Root cause: When SN resets partition with ERROR -> OFFLINE ST, helix controller determines it is an unnecessary ST so it sends out cancellation, which is expected for helix ST cancellation feature.
   > 
   > This PR improves it by skipping cancelling the reset ST sent by Admin or participant.
   
   Do we have other messages similar to error message that can be set by admin tool/ participant, that can be cancelled by controller? 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570664736



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       Minor thing, as we discussed offline, the comment is about why we don't cancel, but the code returns true if the message should be canceled. Please try to make them align, or this part tends to be confusing.

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       There is a typo in the code,
   (HelixDefinedState.ERROR.name().equals(pendingMessage.getFromState()) && initialState.equals(pendingMessage.getToState()) should be checked together.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570630118



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,

Review comment:
       It could happen. Then the ST sent by participant will be completed, before the cancellation arrives at the participant. This is exactly what we have seen: when the partition reset ST is processed really fast by participant, controller cannot even read the pending message, so the partition reset is successful without being cancelled. But the success relies on the timing. So we want to skip the error reset ST to make sure the ST is not cancelled.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] dasahcc commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
dasahcc commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570623421



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())

Review comment:
       NIT: HelixDefinedState.ERROR.name()

##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       Question: This line logic is expected. But I did not find it from previous code?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1637:
URL: https://github.com/apache/helix/pull/1637#discussion_r570679877



##########
File path: helix-core/src/main/java/org/apache/helix/controller/stages/MessageGenerationPhase.java
##########
@@ -271,6 +271,23 @@ private void generateMessage(final Resource resource, final BaseControllerDataPr
     } // end of for-each-partition
   }
 
+  private boolean shouldCreateSTCancellation(Message pendingMessage, String currentState,
+      String desiredState, String initialState) {
+    if (pendingMessage == null) {
+      return false;
+    }
+    if (NO_DESIRED_STATE.equals(desiredState)) {
+      return true;
+    }
+
+    // Don't cancel the ST for below scenarios:
+    // 1. current state has arrived at pending message's to state
+    // 2. pending message is an ERROR reset: ERROR -> initState (eg. OFFLINE)
+    return !currentState.equalsIgnoreCase(pendingMessage.getToState())
+        && !HelixDefinedState.ERROR.toString().equals(pendingMessage.getFromState())
+        && !initialState.equals(pendingMessage.getToState());

Review comment:
       @jiajunwang Oh...Good catch! Yes. `(HelixDefinedState.ERROR.name().equals(pendingMessage.getFromState()) && initialState.equals(pendingMessage.getToState())` is supposed together as it represents `ERROR -> OFFLINE`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on pull request #1637: Skip cancellation for partition reset pending message

Posted by GitBox <gi...@apache.org>.
pkuwm commented on pull request #1637:
URL: https://github.com/apache/helix/pull/1637#issuecomment-773693233


   > > > Can I clarify about the root of the bug?
   > > > The error msg is send by the participant itself, not the controller. controller only send cancellation upon seeing the Error message and send cancellation of this error in message generation phase?
   > > 
   > > 
   > > It's described in the issue #1636
   > > Helix ST cancellation is a feature that helix can cancel pending or ongoing partition assignment state transitions(ST) that are unnecessary due to some reasons, eg. pending state does not match the target state. The feature expects ST messages are only sent by helix controller so controller knows what kind of ST can be cancelled.
   > > Root cause: When SN resets partition with ERROR -> OFFLINE ST, helix controller determines it is an unnecessary ST so it sends out cancellation, which is expected for helix ST cancellation feature.
   > > This PR improves it by skipping cancelling the reset ST sent by Admin or participant.
   > 
   > Do we have other messages similar to error message that can be set by admin tool/ participant, that can be cancelled by controller?
   
   As far as I know, this is the only type of message sent by helix admin. In the future, we may add a feature that allows admin/participant to send a custom request/message to state transition to a state. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org