You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/11/17 03:18:28 UTC

[GitHub] [helix] kaisun2000 opened a new pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

kaisun2000 opened a new pull request #1537:
URL: https://github.com/apache/helix/pull/1537


   
   
   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   fix #1406; fix #1380 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   Fix two flaky tests. For TestP2PNoDuplicatedMessage, cap the success
   rate to 90 percent as in github environment ZK access can be slow.
   For TestDeleteJobFromJobQueue, the design of force delete can be
   in conflict with controller writes as contrary to what was noted by
   the original deleveloper of this feature. We fix it by stop the
   controller for now.
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - [ ] The following is the result of the "mvn test" command on the appropriate module:
   
   (Before CI test pass, please copy & paste the result of "mvn test")
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r525431220



##########
File path: helix-core/src/test/java/org/apache/helix/integration/messaging/TestP2PNoDuplicatedMessage.java
##########
@@ -174,7 +174,10 @@ public void testP2PStateTransitionEnabled() {
       verifyP2PEnabled(startTime);
     }
 
-    Assert.assertEquals(p2pTrigged, total);
+    // The success rate really depends on how quick participant act in relationship with controller.
+    // For now, we set 90% threshold.
+    long threshold = Math.round(total * 0.9);

Review comment:
       @jiajunwang, I sync-ed with Meng offline. @zhangmeng916, can you help to comment?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r525400308



##########
File path: helix-core/src/test/java/org/apache/helix/integration/messaging/TestP2PNoDuplicatedMessage.java
##########
@@ -174,7 +174,10 @@ public void testP2PStateTransitionEnabled() {
       verifyP2PEnabled(startTime);
     }
 
-    Assert.assertEquals(p2pTrigged, total);
+    // The success rate really depends on how quick participant act in relationship with controller.
+    // For now, we set 90% threshold.
+    long threshold = Math.round(total * 0.9);

Review comment:
       @zhangmeng916 and @lei-xia , could you please confirm if this is the expectation? I understand that Kai needs to change it since the current run does not always succeed with 100% threshold. But I'm not sure if this should be our expectation?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r526527639



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000);
+    // note this sleep is critical as it would take time for controller to stop.

Review comment:
       changed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r528980350



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    // note this sleep is critical as it would take time for controller to stop.
+    // todo: this is not best practice to sleep. Let GenericHelixController gives out a signal would
+    // todO: be another way. But this needs much more careful design.

Review comment:
       Typo, and use uppercase "TODO", please.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1537:
URL: https://github.com/apache/helix/pull/1537#issuecomment-731073848


   This PR is approved, please help to merge in.
   
   >Fix two flaky tests. For TestP2PNoDuplicatedMessage, cap the success
   rate to 90 percent as in github environment ZK access can be slow.
   For TestDeleteJobFromJobQueue, the design of force delete can be
   in conflict with controller writes as contrary to what was noted by
   the original deleveloper of this feature. We fix it by stop the
   controller for now.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] zhangmeng916 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
zhangmeng916 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r525535972



##########
File path: helix-core/src/test/java/org/apache/helix/integration/messaging/TestP2PNoDuplicatedMessage.java
##########
@@ -174,7 +174,10 @@ public void testP2PStateTransitionEnabled() {
       verifyP2PEnabled(startTime);
     }
 
-    Assert.assertEquals(p2pTrigged, total);
+    // The success rate really depends on how quick participant act in relationship with controller.
+    // For now, we set 90% threshold.
+    long threshold = Math.round(total * 0.9);

Review comment:
       I think we cannot guarantee 100% P2P for all kinds of infrastructure. Although local tests always passed, there're some failures in our testing environment. The percent is always above 90% though. I believe there are corner cases that we cannot label all stale messages, which will block the sending of P2P.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang merged pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
jiajunwang merged pull request #1537:
URL: https://github.com/apache/helix/pull/1537


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r528276126



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);

Review comment:
       Ops, naming style as a constant but it is not... So easy to misread.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r528137543



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);

Review comment:
       `protected final String CLUSTER_NAME = CLUSTER_PREFIX + "_" + getShortClassName();` each test class has unique  cluster name. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r529075143



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    // note this sleep is critical as it would take time for controller to stop.
+    // todo: this is not best practice to sleep. Let GenericHelixController gives out a signal would
+    // todO: be another way. But this needs much more careful design.

Review comment:
       fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
pkuwm commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r528120026



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);

Review comment:
       CLUSTER_NAME is a defined in the parent class which is also used by other tests. How could you ensure disabling this CLUSTER_NAME would not impact or fail other tests that use the same cluster?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] zhangmeng916 commented on a change in pull request #1537: fix two flaky test -- TestP2PNoDuplicatedMessage and TestDeleteJobFromJobQueue

Posted by GitBox <gi...@apache.org>.
zhangmeng916 commented on a change in pull request #1537:
URL: https://github.com/apache/helix/pull/1537#discussion_r526501065



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,7 +69,14 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // Force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406, also force deletion may not be safe.
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000);
+    // note this sleep is critical as it would take time for controller to stop.

Review comment:
       Can you change to "TODO", and put before the sleep line?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/messaging/TestP2PNoDuplicatedMessage.java
##########
@@ -174,7 +174,10 @@ public void testP2PStateTransitionEnabled() {
       verifyP2PEnabled(startTime);
     }
 
-    Assert.assertEquals(p2pTrigged, total);
+    // The success rate really depends on how quick participant act in relationship with controller.
+    // For now, we set 90% threshold.
+    long threshold = Math.round(total * 0.9);

Review comment:
       Please make sure other reviewers are ok with this too.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org