You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/10/08 23:41:13 UTC

[GitHub] [helix] kaisun2000 opened a new pull request #1455: Fix task related flaky test due to phantom read

kaisun2000 opened a new pull request #1455:
URL: https://github.com/apache/helix/pull/1455


   
   ### Issues
   
   - [x] My PR addresses the following Helix issues and references them in the PR description:
   
   fix #1454 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   
   Currently selective update can result in "phantom read". For
   example, when job1 cfg is added and then workflow cfg
   containing job 1 is updated. The selective update may only
   see updated workflow cfg but not job 1 cfg. This would cause
   job1 cfg and ctx removed. THus hanging tests.
   
   Currently we work around this issue by setting purge interval
   to -1 and enhance logging in task to make test stable.
   
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - [x] The following is the result of the "mvn test" command on the appropriate module:
   
   running
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502165191



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(master);
     String job1Name = "masterJob";
     LOG.info("Enqueuing job1: " + job1Name);
+    jobNames.add(job1Name);
+    jobBuilders.add(job1);
     _driver.enqueueJob(queueName, job1Name, job1);
 
     Set<String> slave = Sets.newHashSet("SLAVE");
     JobConfig.Builder job2 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(slave);
     String job2Name = "slaveJob";
     LOG.info("Enqueuing job2: " + job2Name);
+    jobNames.add(job2Name);
+    jobBuilders.add(job2);
     _driver.enqueueJob(queueName, job2Name, job2);
 
+    //_driver.enqueueJobs(queueName, jobNames, jobBuilders);

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r503634072



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestBatchAddJobs.java
##########
@@ -86,6 +86,9 @@ private String getPrefix(String job) {
 
   @AfterClass
   public void afterClass() {
+    String testClassName = this.getShortClassName();

Review comment:
       Print in the superclass ZkTestBase.java?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502171949



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestMaxNumberOfAttemptsMasterSwitch.java
##########
@@ -71,6 +71,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Good point. Moved to TaskSynchronizedTestBase @afterclass




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502701244



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))

Review comment:
       This is certainly harmless because we poll for completed. What is the reason why we need the delay, though? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502163975



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -491,7 +502,7 @@ public void testThreadLeak() throws InterruptedException {
    * Tests quota-based scheduling for a job queue with different quota types.
    * @throws InterruptedException
    */
-  @Test(dependsOnMethods = "testSchedulingWithoutQuota")
+  @Test(dependsOnMethods = "testSchedulingWithoutQuota", enabled = true)

Review comment:
       removed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502180436



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -32,22 +34,23 @@
 import org.apache.helix.task.Workflow;
 import org.apache.helix.task.WorkflowConfig;
 import org.testng.Assert;
+import org.testng.ITestContext;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.AfterMethod;
 import org.testng.annotations.BeforeClass;
+import org.testng.annotations.BeforeMethod;
 import org.testng.annotations.Test;
 
 public class TestEnqueueJobs extends TaskTestBase {
 
-  @BeforeClass

Review comment:
       This should not be removed,




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502176200



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskSchedulingTwoCurrentStates.java
##########
@@ -93,6 +93,8 @@ public void beforeClass() throws Exception {
 
   @AfterClass()
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502179247



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
       int failureThreshold, int capacity) {
+    return buildJobQueue(jobQueueName, delayStart, failureThreshold, capacity, false);
+  }
+
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
+      int failureThreshold, int capacity, boolean disablePurge) {

Review comment:
       Changed to do purge




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502126203



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       How do we verify the pipeline stops processing event? Note, you can't just rely on pause signal node is in the Zookeeper. 
   
   This is similar to waitTillVerify(). It seems to me there is not good way to wait for some signal. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502126203



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       How do we verify the pipeline stops processing event? Note, you can't just rely on pause signal node is in the Zookeeper. 
   
   This is similar to waitTillVerify(). It seems to me there is not good way to wait for some signal using TestHelper.verify()




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r503633722



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       This specific one has 3 callers. So please just change it.
   
   The one which has many references outside is "public static JobQueue.Builder buildJobQueue(String jobQueueName) {". But it seems that you are not changing it. So I guess we can just change it without too much overhead.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502173025



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -116,6 +117,11 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502106398



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -61,12 +61,14 @@
 import org.apache.helix.task.TaskUtil;
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
+import org.slf4j.LoggerFactory;
 import org.testng.Assert;
 
 /**
  * Static test utility methods.
  */
 public class TaskTestUtil {
+  private static final org.slf4j.Logger logger = LoggerFactory.getLogger(TaskTestUtil.class);

Review comment:
       nit, should be "LOG" based on our naming convention.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       This is a test method, we don't need to ensure backward compatibility. How about just change the method?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();

Review comment:
       What is this?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {

Review comment:
       Please confirm if this has already been done by the parent class ZkTestBase.
   I think this is duplicate logic.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -51,20 +52,36 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancerStopResume extends TaskTestBase {
   private static final Logger LOG = LoggerFactory.getLogger(TestTaskRebalancerStopResume.class);
   private static final String JOB_RESOURCE = "SomeJob";
 
+  @BeforeClass

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -491,7 +502,7 @@ public void testThreadLeak() throws InterruptedException {
    * Tests quota-based scheduling for a job queue with different quota types.
    * @throws InterruptedException
    */
-  @Test(dependsOnMethods = "testSchedulingWithoutQuota")
+  @Test(dependsOnMethods = "testSchedulingWithoutQuota", enabled = true)

Review comment:
       Isn't enabled default to be true?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -247,6 +263,11 @@ public static Date getDateFromStartTime(String startTime)
     return new JobQueue.Builder(jobQueueName).setWorkflowConfig(workflowCfgBuilder.build());
   }
 
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       Same here.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestBatchAddJobs.java
##########
@@ -86,6 +86,9 @@ private String getPrefix(String job) {
 
   @AfterClass
   public void afterClass() {
+    String testClassName = this.getShortClassName();
+    System.out.println("AfterClass: " + testClassName + " of TestBachAddJobs called.");

Review comment:
       Humm, testClassName should be "TestBachAddJobs", right? So it will be "AfterClass: TestBachAddJobs of TestBachAddJobs called."?
   
   Moreover, why we add this logic to every child test class instead of print in the parent class?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       I think you also emphasized that we shall avoid using Thread.sleep() in the tests. 3-second sleep is not guaranteed to work.
   Can we use verifier with a timeout to wait until the pipeline stopped?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -202,7 +237,7 @@ public void testQueueParallelJobs() throws InterruptedException {
     Assert.assertTrue(minFinishTime > maxStartTime);
   }
 
-  @Test
+  @Test()

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -65,12 +97,13 @@ public void testJobQueueAddingJobsOneByOne() throws InterruptedException {
         TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       It's OK, but why?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {

Review comment:
       I did a quick test, I think you can just remove it and the super.afterClass() will still be called, since you don't have any method with the same name to override it in that case.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {

Review comment:
       Samething here, ZKTestBase has this method with same logic already.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -131,15 +165,16 @@ public void testJobSubmitGenericWorkflows() throws InterruptedException {
     _driver.pollForWorkflowState(workflowName, TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -135,6 +141,7 @@ public void testSchedulingWithoutQuota() throws InterruptedException {
     Workflow.Builder workflowBuilder = new Workflow.Builder(workflowName);
     WorkflowConfig.Builder configBuilder = new WorkflowConfig.Builder(workflowName);
     configBuilder.setAllowOverlapJobAssignment(true);
+    configBuilder.setJobPurgeInterval(-1);

Review comment:
       Are you suggest we remove these changes one by one if later we fix the purge issue?
   @NealSun96 , what is the ETA of your change, please? I think removing these changes will be troublesome. Unless we separate them into a single change and revert it completely later.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestMaxNumberOfAttemptsMasterSwitch.java
##########
@@ -71,6 +71,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Instead of doing this, I suggest refactoring the afterClass() method in TaskSynchronizedTestBase or ZkTestBase. In this case, you don't have duplicate code. And all the child classes have it automatically.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
       int failureThreshold, int capacity) {
+    return buildJobQueue(jobQueueName, delayStart, failureThreshold, capacity, false);
+  }
+
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
+      int failureThreshold, int capacity, boolean disablePurge) {

Review comment:
       nit, but it would be more intuitive to be "boolean doPurge"

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -116,6 +117,11 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       I think it is not necessary. The super afterClass will still be called if there is no class with the same name override it in the child class. Please have a try.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -32,6 +32,7 @@
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
+import com.sun.corba.se.spi.orbutil.threadpool.Work;

Review comment:
       Humm, interesting package. What is the usage, please?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskSchedulingTwoCurrentStates.java
##########
@@ -93,6 +93,8 @@ public void beforeClass() throws Exception {
 
   @AfterClass()
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Same here, try to update parent class please.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerParallel.java
##########
@@ -44,6 +45,11 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass

Review comment:
       Same here.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -159,13 +190,16 @@ public void stopAndResumeNamedQueue() throws Exception {
     verifyJobNotInQueue(queueName, namespacedJob2);
   }
 
-  @Test
+  @Test(enabled = true)

Review comment:
       default is true.
   
     /**
      * Whether methods on this class/method are enabled.
      */
     public boolean enabled() default true;

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
##########
@@ -102,6 +108,26 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {
+    Long startTime = (Long) testContext.getAttribute("StartTime");
+    long endTime = System.currentTimeMillis();
+    System.out.println("END " + testMethod.getName() + " at " + new Date(endTime) + ", took: "
+        + (endTime - startTime) + "ms.");
+  }
+

Review comment:
       Same here, I think if the classes are defined carefully, then the parent class logic can cover these.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestStuckTaskQuota.java
##########
@@ -63,6 +63,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       As we can see, there are many changes like this. So let's do it in the parent class. Shall we?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
##########
@@ -43,9 +45,21 @@
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancer extends TaskTestBase {
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+

Review comment:
       Same here. Are you sure we need them?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))

Review comment:
       I'm not sure if we really need it. But looks harmless.
   @alirezazamani @NealSun96 , could you please help to take another look?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(master);
     String job1Name = "masterJob";
     LOG.info("Enqueuing job1: " + job1Name);
+    jobNames.add(job1Name);
+    jobBuilders.add(job1);
     _driver.enqueueJob(queueName, job1Name, job1);
 
     Set<String> slave = Sets.newHashSet("SLAVE");
     JobConfig.Builder job2 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(slave);
     String job2Name = "slaveJob";
     LOG.info("Enqueuing job2: " + job2Name);
+    jobNames.add(job2Name);
+    jobBuilders.add(job2);
     _driver.enqueueJob(queueName, job2Name, job2);
 
+    //_driver.enqueueJobs(queueName, jobNames, jobBuilders);

Review comment:
       ???

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskThrottling.java
##########
@@ -50,6 +52,10 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       Same here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502163349



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -159,13 +190,16 @@ public void stopAndResumeNamedQueue() throws Exception {
     verifyJobNotInQueue(queueName, namespacedJob2);
   }
 
-  @Test
+  @Test(enabled = true)

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502173518



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestStuckTaskQuota.java
##########
@@ -63,6 +63,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502130312



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       There are other places to use this function. If we don't have this signature, that means probably a lot of  usage places are also changed. That is not essential change, make it hard for people to reason and may be later to revert.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502175598



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerParallel.java
##########
@@ -44,6 +45,11 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502164751



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -202,7 +237,7 @@ public void testQueueParallelJobs() throws InterruptedException {
     Assert.assertTrue(minFinishTime > maxStartTime);
   }
 
-  @Test
+  @Test()

Review comment:
       restored.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502128838



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -61,12 +61,14 @@
 import org.apache.helix.task.TaskUtil;
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
+import org.slf4j.LoggerFactory;
 import org.testng.Assert;
 
 /**
  * Static test utility methods.
  */
 public class TaskTestUtil {
+  private static final org.slf4j.Logger logger = LoggerFactory.getLogger(TaskTestUtil.class);

Review comment:
       Changed to LOG




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502164517



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -65,12 +97,13 @@ public void testJobQueueAddingJobsOneByOne() throws InterruptedException {
         TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502176017



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -51,20 +52,36 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancerStopResume extends TaskTestBase {
   private static final Logger LOG = LoggerFactory.getLogger(TestTaskRebalancerStopResume.class);
   private static final String JOB_RESOURCE = "SomeJob";
 
+  @BeforeClass

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502174092



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
##########
@@ -102,6 +108,26 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {
+    Long startTime = (Long) testContext.getAttribute("StartTime");
+    long endTime = System.currentTimeMillis();
+    System.out.println("END " + testMethod.getName() + " at " + new Date(endTime) + ", took: "
+        + (endTime - startTime) + "ms.");
+  }
+

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502164679



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -131,15 +165,16 @@ public void testJobSubmitGenericWorkflows() throws InterruptedException {
     _driver.pollForWorkflowState(workflowName, TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       restored.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502127298



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();

Review comment:
       The should be removed. 
   
   Previously, we tried to set `protected static final long DEFAULT_JOB_PURGE_INTERVAL = 30 * 60 * 1000; // default 30 minutes` to -1 in WorkflowConfig.java, by using this function. 
   
   This has its drawback as people have concerns as it pollutes production code as:
   
   1/ need to make DEFAULT_JOB_PURGE_INTERVAL not final.
   2/ The WorkflowConfig.disableJobPurge needs to be public. Thus, not testOnly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 closed pull request #1455: Fix task related flaky test due to phantom read (WIP)

Posted by GitBox <gi...@apache.org>.
kaisun2000 closed pull request #1455:
URL: https://github.com/apache/helix/pull/1455


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1455:
URL: https://github.com/apache/helix/pull/1455#issuecomment-705926326


   > I have concerns about this PR.
   > 
   > 1. The way we disable purge is by setting the config everywhere. As we discussed, it is a temporary resolution. When we need to remove them, it would be messy.
   > 2. This PR mixes other changes with the purge change. Meaning we cannot revert it to remove the temporary "fix".
   > 3. I think we agreed on wait until Neal evaluate the ETA of a more graceful temp solution, do we have the conclusion yet?
   > 4. I think we can refactor a little bit by leveraging parent class. The current PR introduces lots of duplicate logic.
   
   3/ Last time we discussed, the conclusion is that let Neal work on read one more time before purging as a temp solution. I will help to validate by reproducing the phantom read problem in the debugger. Let us sync about this tomorrow then.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502176418



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskThrottling.java
##########
@@ -50,6 +52,10 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502179325



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -247,6 +263,11 @@ public static Date getDateFromStartTime(String startTime)
     return new JobQueue.Builder(jobQueueName).setWorkflowConfig(workflowCfgBuilder.build());
   }
 
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       changed to do purge




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] NealSun96 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
NealSun96 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502593448



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       Is it better to retry `deleteJob` then? Disable the cluster and retry until no exception. 

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -135,6 +141,7 @@ public void testSchedulingWithoutQuota() throws InterruptedException {
     Workflow.Builder workflowBuilder = new Workflow.Builder(workflowName);
     WorkflowConfig.Builder configBuilder = new WorkflowConfig.Builder(workflowName);
     configBuilder.setAllowOverlapJobAssignment(true);
+    configBuilder.setJobPurgeInterval(-1);

Review comment:
       It's scheduled this sprint, and I can finish it as fast as mid-next week




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on pull request #1455:
URL: https://github.com/apache/helix/pull/1455#issuecomment-705926326


   > I have concerns about this PR.
   > 
   > 1. The way we disable purge is by setting the config everywhere. As we discussed, it is a temporary resolution. When we need to remove them, it would be messy.
   > 2. This PR mixes other changes with the purge change. Meaning we cannot revert it to remove the temporary "fix".
   > 3. I think we agreed on wait until Neal evaluate the ETA of a more graceful temp solution, do we have the conclusion yet?
   > 4. I think we can refactor a little bit by leveraging parent class. The current PR introduces lots of duplicate logic.
   
   3/ Last time we discussed, the conclusion is that let Neal work on read one more time before purging as a temp solution. I will help to validate by reproducing the phantom read problem in the debugger. Let us sync about this tomorrow then.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502174855



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
##########
@@ -43,9 +45,21 @@
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancer extends TaskTestBase {
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502170218



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {

Review comment:
       These @beforeClass and @afterClass is due to the old way using WorkflowConfig.disableJobPurge(). Should be removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502177923



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestBatchAddJobs.java
##########
@@ -86,6 +86,9 @@ private String getPrefix(String job) {
 
   @AfterClass
   public void afterClass() {
+    String testClassName = this.getShortClassName();
+    System.out.println("AfterClass: " + testClassName + " of TestBachAddJobs called.");

Review comment:
         Changed to  System.out.println("AfterClass: " + testClassName + " called.");
   
   Now changed to print in parent class, as suggested.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502163484



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -32,6 +32,7 @@
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
+import com.sun.corba.se.spi.orbutil.threadpool.Work;

Review comment:
       removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502169001



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {

Review comment:
       Right. This should be removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {

Review comment:
       Right. This should be removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502126203



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       How do we verify the pipeline stops processing event? Note, you can't just rely on pause signal node is in the Zookeeper. 
   
   This is similar to waitTillVerify(). It seems to me there is not good way to wait for some signal. 

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();

Review comment:
       The should be removed. 
   
   Previously, we tried to set `protected static final long DEFAULT_JOB_PURGE_INTERVAL = 30 * 60 * 1000; // default 30 minutes` to -1 in WorkflowConfig.java, by using this function. 
   
   This has its drawback as people have concerns as it pollutes production code as:
   
   1/ need to make DEFAULT_JOB_PURGE_INTERVAL not final.
   2/ The WorkflowConfig.disableJobPurge needs to be public. Thus, not testOnly.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -61,12 +61,14 @@
 import org.apache.helix.task.TaskUtil;
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
+import org.slf4j.LoggerFactory;
 import org.testng.Assert;
 
 /**
  * Static test utility methods.
  */
 public class TaskTestUtil {
+  private static final org.slf4j.Logger logger = LoggerFactory.getLogger(TaskTestUtil.class);

Review comment:
       Changed to LOG

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       There are other places to use this function. If we don't have this signature, that means probably a lot of  usage places are also changed. That is not essential change, make it hard for people to reason and may be later to revert.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       How do we verify the pipeline stops processing event? Note, you can't just rely on pause signal node is in the Zookeeper. 
   
   This is similar to waitTillVerify(). It seems to me there is not good way to wait for some signal using TestHelper.verify()

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -159,13 +190,16 @@ public void stopAndResumeNamedQueue() throws Exception {
     verifyJobNotInQueue(queueName, namespacedJob2);
   }
 
-  @Test
+  @Test(enabled = true)

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -32,6 +32,7 @@
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
+import com.sun.corba.se.spi.orbutil.threadpool.Work;

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -491,7 +502,7 @@ public void testThreadLeak() throws InterruptedException {
    * Tests quota-based scheduling for a job queue with different quota types.
    * @throws InterruptedException
    */
-  @Test(dependsOnMethods = "testSchedulingWithoutQuota")
+  @Test(dependsOnMethods = "testSchedulingWithoutQuota", enabled = true)

Review comment:
       removed

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -65,12 +97,13 @@ public void testJobQueueAddingJobsOneByOne() throws InterruptedException {
         TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -131,15 +165,16 @@ public void testJobSubmitGenericWorkflows() throws InterruptedException {
     _driver.pollForWorkflowState(workflowName, TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       restored.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -202,7 +237,7 @@ public void testQueueParallelJobs() throws InterruptedException {
     Assert.assertTrue(minFinishTime > maxStartTime);
   }
 
-  @Test
+  @Test()

Review comment:
       restored.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(master);
     String job1Name = "masterJob";
     LOG.info("Enqueuing job1: " + job1Name);
+    jobNames.add(job1Name);
+    jobBuilders.add(job1);
     _driver.enqueueJob(queueName, job1Name, job1);
 
     Set<String> slave = Sets.newHashSet("SLAVE");
     JobConfig.Builder job2 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(slave);
     String job2Name = "slaveJob";
     LOG.info("Enqueuing job2: " + job2Name);
+    jobNames.add(job2Name);
+    jobBuilders.add(job2);
     _driver.enqueueJob(queueName, job2Name, job2);
 
+    //_driver.enqueueJobs(queueName, jobNames, jobBuilders);

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {

Review comment:
       Right. This should be removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {

Review comment:
       Right. This should be removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {

Review comment:
       These @beforeClass and @afterClass is due to the old way using WorkflowConfig.disableJobPurge(). Should be removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestMaxNumberOfAttemptsMasterSwitch.java
##########
@@ -71,6 +71,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Good point. Moved to TaskSynchronizedTestBase @afterclass

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -116,6 +117,11 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestStuckTaskQuota.java
##########
@@ -63,6 +63,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
##########
@@ -102,6 +108,26 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {
+    Long startTime = (Long) testContext.getAttribute("StartTime");
+    long endTime = System.currentTimeMillis();
+    System.out.println("END " + testMethod.getName() + " at " + new Date(endTime) + ", took: "
+        + (endTime - startTime) + "ms.");
+  }
+

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
##########
@@ -43,9 +45,21 @@
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancer extends TaskTestBase {
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerParallel.java
##########
@@ -44,6 +45,11 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -51,20 +52,36 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancerStopResume extends TaskTestBase {
   private static final Logger LOG = LoggerFactory.getLogger(TestTaskRebalancerStopResume.class);
   private static final String JOB_RESOURCE = "SomeJob";
 
+  @BeforeClass

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskSchedulingTwoCurrentStates.java
##########
@@ -93,6 +93,8 @@ public void beforeClass() throws Exception {
 
   @AfterClass()
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskThrottling.java
##########
@@ -50,6 +52,10 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       removed.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestBatchAddJobs.java
##########
@@ -86,6 +86,9 @@ private String getPrefix(String job) {
 
   @AfterClass
   public void afterClass() {
+    String testClassName = this.getShortClassName();
+    System.out.println("AfterClass: " + testClassName + " of TestBachAddJobs called.");

Review comment:
         Changed to  System.out.println("AfterClass: " + testClassName + " called.");
   
   Now changed to print in parent class, as suggested.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
       int failureThreshold, int capacity) {
+    return buildJobQueue(jobQueueName, delayStart, failureThreshold, capacity, false);
+  }
+
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
+      int failureThreshold, int capacity, boolean disablePurge) {

Review comment:
       Changed to do purge

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -247,6 +263,11 @@ public static Date getDateFromStartTime(String startTime)
     return new JobQueue.Builder(jobQueueName).setWorkflowConfig(workflowCfgBuilder.build());
   }
 
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       changed to do purge

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -32,22 +34,23 @@
 import org.apache.helix.task.Workflow;
 import org.apache.helix.task.WorkflowConfig;
 import org.testng.Assert;
+import org.testng.ITestContext;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.AfterMethod;
 import org.testng.annotations.BeforeClass;
+import org.testng.annotations.BeforeMethod;
 import org.testng.annotations.Test;
 
 public class TestEnqueueJobs extends TaskTestBase {
 
-  @BeforeClass

Review comment:
       This should not be removed,




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] jiajunwang commented on a change in pull request #1455: Fix task related flaky test due to phantom read

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1455:
URL: https://github.com/apache/helix/pull/1455#discussion_r502106398



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -61,12 +61,14 @@
 import org.apache.helix.task.TaskUtil;
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
+import org.slf4j.LoggerFactory;
 import org.testng.Assert;
 
 /**
  * Static test utility methods.
  */
 public class TaskTestUtil {
+  private static final org.slf4j.Logger logger = LoggerFactory.getLogger(TaskTestUtil.class);

Review comment:
       nit, should be "LOG" based on our naming convention.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       This is a test method, we don't need to ensure backward compatibility. How about just change the method?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();

Review comment:
       What is this?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {

Review comment:
       Please confirm if this has already been done by the parent class ZkTestBase.
   I think this is duplicate logic.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -51,20 +52,36 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancerStopResume extends TaskTestBase {
   private static final Logger LOG = LoggerFactory.getLogger(TestTaskRebalancerStopResume.class);
   private static final String JOB_RESOURCE = "SomeJob";
 
+  @BeforeClass

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -491,7 +502,7 @@ public void testThreadLeak() throws InterruptedException {
    * Tests quota-based scheduling for a job queue with different quota types.
    * @throws InterruptedException
    */
-  @Test(dependsOnMethods = "testSchedulingWithoutQuota")
+  @Test(dependsOnMethods = "testSchedulingWithoutQuota", enabled = true)

Review comment:
       Isn't enabled default to be true?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -247,6 +263,11 @@ public static Date getDateFromStartTime(String startTime)
     return new JobQueue.Builder(jobQueueName).setWorkflowConfig(workflowCfgBuilder.build());
   }
 
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,

Review comment:
       Same here.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestBatchAddJobs.java
##########
@@ -86,6 +86,9 @@ private String getPrefix(String job) {
 
   @AfterClass
   public void afterClass() {
+    String testClassName = this.getShortClassName();
+    System.out.println("AfterClass: " + testClassName + " of TestBachAddJobs called.");

Review comment:
       Humm, testClassName should be "TestBachAddJobs", right? So it will be "AfterClass: TestBachAddJobs of TestBachAddJobs called."?
   
   Moreover, why we add this logic to every child test class instead of print in the parent class?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestDeleteJobFromJobQueue.java
##########
@@ -69,11 +69,18 @@ public void testForceDeleteJobFromJobQueue() throws InterruptedException {
     Assert
         .assertNotNull(_driver.getJobContext(TaskUtil.getNamespacedJobName(jobQueueName, "job2")));
 
-    // The following force delete for the job should go through without getting an exception
+    // force deletion can have Exception thrown as controller is writing to propertystore path too.
+    // https://github.com/apache/helix/issues/1406
+
+    // Thus, we stop pipeline to make sure there is not such race condition.
+    _gSetupTool.getClusterManagementTool().enableCluster(CLUSTER_NAME, false);
+    Thread.sleep(3000); // note this sleep is critical as it would take time for controller to stop

Review comment:
       I think you also emphasized that we shall avoid using Thread.sleep() in the tests. 3-second sleep is not guaranteed to work.
   Can we use verifier with a timeout to wait until the pipeline stopped?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -202,7 +237,7 @@ public void testQueueParallelJobs() throws InterruptedException {
     Assert.assertTrue(minFinishTime > maxStartTime);
   }
 
-  @Test
+  @Test()

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -65,12 +97,13 @@ public void testJobQueueAddingJobsOneByOne() throws InterruptedException {
         TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       It's OK, but why?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {

Review comment:
       I did a quick test, I think you can just remove it and the super.afterClass() will still be called, since you don't have any method with the same name to override it in that case.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -41,13 +47,39 @@
   public void beforeClass() throws Exception {
     setSingleTestEnvironment();
     super.beforeClass();
+    //WorkflowConfig.disableJobPurge();
+  }
+
+  @AfterClass
+  @Override
+  public void afterClass() throws Exception {
+    //WorkflowConfig.enableJobPurge();
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {

Review comment:
       Samething here, ZKTestBase has this method with same logic already.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -131,15 +165,16 @@ public void testJobSubmitGenericWorkflows() throws InterruptedException {
     _driver.pollForWorkflowState(workflowName, TaskState.COMPLETED);
   }
 
-  @Test
+  @Test()

Review comment:
       Same here

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -135,6 +141,7 @@ public void testSchedulingWithoutQuota() throws InterruptedException {
     Workflow.Builder workflowBuilder = new Workflow.Builder(workflowName);
     WorkflowConfig.Builder configBuilder = new WorkflowConfig.Builder(workflowName);
     configBuilder.setAllowOverlapJobAssignment(true);
+    configBuilder.setJobPurgeInterval(-1);

Review comment:
       Are you suggest we remove these changes one by one if later we fix the purge issue?
   @NealSun96 , what is the ETA of your change, please? I think removing these changes will be troublesome. Unless we separate them into a single change and revert it completely later.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestMaxNumberOfAttemptsMasterSwitch.java
##########
@@ -71,6 +71,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Instead of doing this, I suggest refactoring the afterClass() method in TaskSynchronizedTestBase or ZkTestBase. In this case, you don't have duplicate code. And all the child classes have it automatically.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TaskTestUtil.java
##########
@@ -231,9 +239,17 @@ public static Date getDateFromStartTime(String startTime)
 
   public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
       int failureThreshold, int capacity) {
+    return buildJobQueue(jobQueueName, delayStart, failureThreshold, capacity, false);
+  }
+
+  public static JobQueue.Builder buildJobQueue(String jobQueueName, int delayStart,
+      int failureThreshold, int capacity, boolean disablePurge) {

Review comment:
       nit, but it would be more intuitive to be "boolean doPurge"

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestQuotaBasedScheduling.java
##########
@@ -116,6 +117,11 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       I think it is not necessary. The super afterClass will still be called if there is no class with the same name override it in the child class. Please have a try.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -32,6 +32,7 @@
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
+import com.sun.corba.se.spi.orbutil.threadpool.Work;

Review comment:
       Humm, interesting package. What is the usage, please?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskSchedulingTwoCurrentStates.java
##########
@@ -93,6 +93,8 @@ public void beforeClass() throws Exception {
 
   @AfterClass()
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       Same here, try to update parent class please.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerParallel.java
##########
@@ -44,6 +45,11 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass

Review comment:
       Same here.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -159,13 +190,16 @@ public void stopAndResumeNamedQueue() throws Exception {
     verifyJobNotInQueue(queueName, namespacedJob2);
   }
 
-  @Test
+  @Test(enabled = true)

Review comment:
       default is true.
   
     /**
      * Whether methods on this class/method are enabled.
      */
     public boolean enabled() default true;

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskAssignmentCalculator.java
##########
@@ -102,6 +108,26 @@ public void beforeClass() throws Exception {
     _jobCommandMap = Maps.newHashMap();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+
+  @BeforeMethod
+  public void beforeTest(Method testMethod, ITestContext testContext) {
+    long startTime = System.currentTimeMillis();
+    System.out.println("START " + testMethod.getName() + " at " + new Date(startTime));
+    testContext.setAttribute("StartTime", System.currentTimeMillis());
+  }
+
+  @AfterMethod
+  public void endTest(Method testMethod, ITestContext testContext) {
+    Long startTime = (Long) testContext.getAttribute("StartTime");
+    long endTime = System.currentTimeMillis();
+    System.out.println("END " + testMethod.getName() + " at " + new Date(endTime) + ", took: "
+        + (endTime - startTime) + "ms.");
+  }
+

Review comment:
       Same here, I think if the classes are defined carefully, then the parent class logic can cover these.

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestStuckTaskQuota.java
##########
@@ -63,6 +63,9 @@ public void beforeClass() throws Exception {
 
   @AfterClass
   public void afterClass() throws Exception {
+    String testClassName = this.getShortClassName();

Review comment:
       As we can see, there are many changes like this. So let's do it in the parent class. Shall we?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancer.java
##########
@@ -43,9 +45,21 @@
 import org.apache.helix.task.WorkflowConfig;
 import org.apache.helix.task.WorkflowContext;
 import org.testng.Assert;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
 import org.testng.annotations.Test;
 
 public class TestTaskRebalancer extends TaskTestBase {
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();
+  }
+

Review comment:
       Same here. Are you sure we need them?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))

Review comment:
       I'm not sure if we really need it. But looks harmless.
   @alirezazamani @NealSun96 , could you please help to take another look?

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskRebalancerStopResume.java
##########
@@ -413,25 +447,38 @@ public void stopAndDeleteQueue() throws Exception {
     // Create a queue
     System.out.println("START " + queueName + " at " + new Date(System.currentTimeMillis()));
     WorkflowConfig wfCfg =
-        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES).build();
+        new WorkflowConfig.Builder(queueName).setExpiry(2, TimeUnit.MINUTES)
+            .setJobPurgeInterval(-1)
+            .build();
     JobQueue qCfg = new JobQueue.Builder(queueName).fromMap(wfCfg.getResourceConfigMap()).build();
     _driver.createQueue(qCfg);
 
     // Enqueue 2 jobs
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
+
     Set<String> master = Sets.newHashSet("MASTER");
     JobConfig.Builder job1 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(master);
     String job1Name = "masterJob";
     LOG.info("Enqueuing job1: " + job1Name);
+    jobNames.add(job1Name);
+    jobBuilders.add(job1);
     _driver.enqueueJob(queueName, job1Name, job1);
 
     Set<String> slave = Sets.newHashSet("SLAVE");
     JobConfig.Builder job2 = new JobConfig.Builder().setCommand(MockTask.TASK_COMMAND)
+        .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "2000"))
         .setTargetResource(WorkflowGenerator.DEFAULT_TGT_DB).setTargetPartitionStates(slave);
     String job2Name = "slaveJob";
     LOG.info("Enqueuing job2: " + job2Name);
+    jobNames.add(job2Name);
+    jobBuilders.add(job2);
     _driver.enqueueJob(queueName, job2Name, job2);
 
+    //_driver.enqueueJobs(queueName, jobNames, jobBuilders);

Review comment:
       ???

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskThrottling.java
##########
@@ -50,6 +52,10 @@ public void beforeClass() throws Exception {
     super.beforeClass();
   }
 
+  @AfterClass
+  public void afterClass() throws Exception {

Review comment:
       Same here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org