Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/09/24 21:34:31 UTC

[GitHub] [helix] kaisun2000 opened a new issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

kaisun2000 opened a new issue #1404:
URL: https://github.com/apache/helix/issues/1404


   LOG 1388
   
   >2020-09-24T08:57:45.2678267Z [ERROR] stopAndResumeNamedQueue(org.apache.helix.integration.task.TestTaskRebalancerStopResume)  Time elapsed: 601.489 s  <<< FAILURE!
   2020-09-24T08:57:45.2688663Z org.apache.helix.HelixException: Workflow "stopAndResumeNamedQueue", job "stopAndResumeNamedQueue_slaveJob" timed out
   2020-09-24T08:57:45.2700477Z 	at org.apache.helix.integration.task.TestTaskRebalancerStopResume.stopAndResumeNamedQueue(TestTaskRebalancerStopResume.java:143)
   2020-09-24T08:57:45.2703858Z 
   2020-09-24T08:57:45.6574743Z [ERROR] Failures: 
   2020-09-24T08:57:45.6579734Z [ERROR]   TestTaskRebalancerStopResume.stopAndResumeNamedQueue:143 » Helix Workflow "sto...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] kaisun2000 commented on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-699564717


   Add even more information: dump the job context and job config along with the workflow config.


[GitHub] [helix] kaisun2000 commented on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-698600686


   try batch add


[GitHub] [helix] kaisun2000 commented on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-699553683


   stopDeleteJobAndResumeNamedQueue_slaveJob2_second is not present in the context stored in the property store.


[GitHub] [helix] jiajunwang commented on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-849105054


   Closing unstable-test tickets, since we now have an automatic mechanism (https://github.com/apache/helix/pull/1757) that tracks the most recent test issues.


[GitHub] [helix] kaisun2000 commented on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-699550264


   LOG 1734
   
   >2020-09-26T07:51:37.7357007Z [ERROR] stopDeleteJobAndResumeNamedQueue(org.apache.helix.integration.task.TestTaskRebalancerStopResume)  Time elapsed: 650.685 s  <<< FAILURE!
   2020-09-26T07:51:37.7455981Z org.apache.helix.HelixException: Workflow "stopDeleteJobAndResumeNamedQueue" context is null or job "stopDeleteJobAndResumeNamedQueue_slaveJob2_second" is not in states: [COMPLETED]; ctx is ZnRecord=WorkflowContext, {NAME=stopDeleteJobAndResumeNamedQueue, START_TIME=1601104360465, STATE=IN_PROGRESS}{JOB_STATES={stopDeleteJobAndResumeNamedQueue_slaveJob1=COMPLETED, stopDeleteJobAndResumeNamedQueue_slaveJob3=COMPLETED, stopDeleteJobAndResumeNamedQueue_slaveJob4=COMPLETED}, StartTime={stopDeleteJobAndResumeNamedQueue_slaveJob1=1601104360757, stopDeleteJobAndResumeNamedQueue_slaveJob3=1601104377687, stopDeleteJobAndResumeNamedQueue_slaveJob4=1601104394366}}{}, Stat=Stat {_version=0, _creationTime=0, _modifiedTime=0, _ephemeralOwner=0}, jobState is null .
   2020-09-26T07:51:37.7472762Z 	at org.apache.helix.integration.task.TestTaskRebalancerStopResume.stopDeleteJobAndResumeNamedQueue(TestTaskRebalancerStopResume.java:255)
   2020-09-26T07:51:37.7476604Z 
   2020-09-26T07:51:38.1533236Z [ERROR] Failures: 
   2020-09-26T07:51:38.1535848Z [ERROR]   TestTaskRebalancerStopResume.stopDeleteJobAndResumeNamedQueue:255 » Helix Work...
   
   Code:
   ```
     public TaskState pollForJobState(String workflowName, String jobName, long timeout,
         TaskState... states) throws InterruptedException {
       // Get workflow config
       WorkflowConfig workflowConfig = getWorkflowConfig(workflowName);
   
       if (workflowConfig == null) {
         throw new HelixException(String.format("Workflow \"%s\" does not exist!", workflowName));
       }
   
       long timeToSleep = timeout > 50L ? 50L : timeout;
   
       WorkflowContext ctx;
       if (workflowConfig.isRecurring()) {
         // if it's recurring, need to reconstruct workflow and job name
         do {
           Thread.sleep(timeToSleep);
           ctx = getWorkflowContext(workflowName);
         } while ((ctx == null || ctx.getLastScheduledSingleWorkflow() == null));
   
         jobName = jobName.substring(workflowName.length() + 1);
         workflowName = ctx.getLastScheduledSingleWorkflow();
       }
   
       Set<TaskState> allowedStates = new HashSet<>(Arrays.asList(states));
       // Wait for state
       long st = System.currentTimeMillis();
       do {
         Thread.sleep(timeToSleep);
         ctx = getWorkflowContext(workflowName);
       } while ((ctx == null || ctx.getJobState(jobName) == null
           || !allowedStates.contains(ctx.getJobState(jobName)))
           && System.currentTimeMillis() < st + timeout);
   
       if (ctx == null || !allowedStates.contains(ctx.getJobState(jobName))) {
         throw new HelixException(
             String.format("Workflow \"%s\" context is null or job \"%s\" is not in states: %s; ctx is %s, jobState is %s .",
                 workflowName, jobName, allowedStates, ctx == null ? "null" : ctx, ctx != null ? ctx.getJobState(jobName) : "null"));
       }
   
       return ctx.getJobState(jobName);
     }
   ```
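A minimal sketch, not Helix code: in the method above only the second loop is bounded by `timeout`; the recurring-workflow loop (`while (ctx == null || ctx.getLastScheduledSingleWorkflow() == null)`) has no deadline at all. A generic deadline-bounded poll, such as the hypothetical `PollUtil.pollUntil` below, is one way both loops could be guarded. The class and method names are assumptions for illustration; only `BooleanSupplier`, `System.nanoTime` and `Thread.sleep` are standard JDK APIs.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Helix): polls a condition until it holds or a deadline passes. */
final class PollUtil {
  private PollUtil() {}

  /**
   * Evaluates {@code condition} repeatedly, sleeping at most {@code intervalMs} between attempts,
   * until it returns true or {@code timeoutMs} has elapsed.
   *
   * @return true if the condition held before the deadline, false on timeout
   */
  public static boolean pollUntil(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException {
    // nanoTime is monotonic, so wall-clock adjustments cannot stretch or shrink the wait.
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        // Deadline reached after a failed check: report a timeout instead of looping forever.
        return false;
      }
      // Never sleep past the deadline.
      Thread.sleep(Math.min(intervalMs, remainingMs));
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int[] calls = {0};
    boolean ok = pollUntil(() -> ++calls[0] >= 3, 1_000L, 10L);
    System.out.println(ok + " after " + calls[0] + " checks");
    boolean never = pollUntil(() -> false, 50L, 10L);
    System.out.println(never);
  }
}
```

The caller decides what a `false` result means (for example, throwing a `HelixException` with the diagnostic message), which keeps the timeout policy in one place instead of in each loop.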


[GitHub] [helix] jiajunwang closed issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
jiajunwang closed issue #1404:
URL: https://github.com/apache/helix/issues/1404


[GitHub] [helix] kaisun2000 edited a comment on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-699564717


   Add even more information: dump the job context and job config along with the workflow config.
   
   ```
       if (ctx == null || !allowedStates.contains(ctx.getJobState(jobName))) {
         WorkflowConfig wfcfg = getWorkflowConfig(workflowName);
         JobConfig jobConfig = getJobConfig(jobName);
         JobContext jbCtx = getJobContext(jobName);
         throw new HelixException(
             String.format("Workflow \"%s\" context is null or job \"%s\" is not in states: %s; ctx is %s, jobState is %s, wf cfg %s, jobcfg %s, jbctx %s",
                 workflowName, jobName, allowedStates,
                 ctx == null ? "null" : ctx, ctx != null ? ctx.getJobState(jobName) : "null",
                 wfcfg, jobConfig, jbCtx));
       }
   
   ```
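A minimal sketch of the same diagnostic idea as a standalone, hypothetical helper (`JobStateDiagnostics.describe` is not a Helix API). It assumes a plain `Map<String, String>` of job name to state in place of `WorkflowContext`, so that it runs on its own, and it separates the three cases the current message lumps together: no context, job absent from `JOB_STATES` (the LOG 1734 situation, where `stopDeleteJobAndResumeNamedQueue_slaveJob2_second` is missing and `jobState is null`), and job present in a disallowed state. It also relies on `String.format` rendering a `null` argument for `%s` as `null`, which makes ternaries like `ctx == null ? "null" : ctx` unnecessary.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical diagnostic helper; a Map stands in for Helix's WorkflowContext. */
final class JobStateDiagnostics {
  private JobStateDiagnostics() {}

  static String describe(String workflow, String job, Map<String, String> jobStates,
      Set<String> allowed) {
    if (jobStates == null) {
      return String.format("Workflow \"%s\": context is null", workflow);
    }
    String state = jobStates.get(job);
    if (state == null) {
      // Absent from JOB_STATES: the job has not been recorded in the context at all.
      return String.format("Workflow \"%s\": job \"%s\" is missing from JOB_STATES %s",
          workflow, job, jobStates.keySet());
    }
    if (!allowed.contains(state)) {
      return String.format("Workflow \"%s\": job \"%s\" is %s, expected one of %s",
          workflow, job, state, allowed);
    }
    return String.format("Workflow \"%s\": job \"%s\" is %s", workflow, job, state);
  }

  public static void main(String[] args) {
    Map<String, String> states = Collections.singletonMap("wf_job1", "COMPLETED");
    System.out.println(describe("wf", "wf_job2", states, Collections.singleton("COMPLETED")));
    // %s renders a null argument as the string "null".
    System.out.println(String.format("ctx is %s", (Object) null));
  }
}
```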


[GitHub] [helix] kaisun2000 edited a comment on issue #1404: fix TestTaskRebalancerStopResume.stopAndResumeNamedQueue

Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on issue #1404:
URL: https://github.com/apache/helix/issues/1404#issuecomment-699550264


   LOG 1734
   
   >2020-09-26T07:51:37.7357007Z [ERROR] stopDeleteJobAndResumeNamedQueue(org.apache.helix.integration.task.TestTaskRebalancerStopResume)  Time elapsed: 650.685 s  <<< FAILURE!
   2020-09-26T07:51:37.7455981Z org.apache.helix.HelixException: Workflow "stopDeleteJobAndResumeNamedQueue" context is null or job "stopDeleteJobAndResumeNamedQueue_slaveJob2_second" is not in states: [COMPLETED]; ctx is ZnRecord=WorkflowContext, {NAME=stopDeleteJobAndResumeNamedQueue, START_TIME=1601104360465, **STATE=IN_PROGRESS**}{JOB_STATES={stopDeleteJobAndResumeNamedQueue_slaveJob1=COMPLETED, stopDeleteJobAndResumeNamedQueue_slaveJob3=COMPLETED, stopDeleteJobAndResumeNamedQueue_slaveJob4=COMPLETED}, StartTime={stopDeleteJobAndResumeNamedQueue_slaveJob1=1601104360757, stopDeleteJobAndResumeNamedQueue_slaveJob3=1601104377687, stopDeleteJobAndResumeNamedQueue_slaveJob4=1601104394366}}{}, Stat=Stat {_version=0, _creationTime=0, _modifiedTime=0, _ephemeralOwner=0}, jobState is null .
   2020-09-26T07:51:37.7472762Z 	at org.apache.helix.integration.task.TestTaskRebalancerStopResume.stopDeleteJobAndResumeNamedQueue(TestTaskRebalancerStopResume.java:255)
   2020-09-26T07:51:37.7476604Z 
   2020-09-26T07:51:38.1533236Z [ERROR] Failures: 
   2020-09-26T07:51:38.1535848Z [ERROR]   TestTaskRebalancerStopResume.stopDeleteJobAndResumeNamedQueue:255 » Helix Work...
   
   Code:
   ```
     public TaskState pollForJobState(String workflowName, String jobName, long timeout,
         TaskState... states) throws InterruptedException {
       // Get workflow config
       WorkflowConfig workflowConfig = getWorkflowConfig(workflowName);
   
       if (workflowConfig == null) {
         throw new HelixException(String.format("Workflow \"%s\" does not exist!", workflowName));
       }
   
       long timeToSleep = timeout > 50L ? 50L : timeout;
   
       WorkflowContext ctx;
       if (workflowConfig.isRecurring()) {
         // if it's recurring, need to reconstruct workflow and job name
         do {
           Thread.sleep(timeToSleep);
           ctx = getWorkflowContext(workflowName);
         } while ((ctx == null || ctx.getLastScheduledSingleWorkflow() == null));
   
         jobName = jobName.substring(workflowName.length() + 1);
         workflowName = ctx.getLastScheduledSingleWorkflow();
       }
   
       Set<TaskState> allowedStates = new HashSet<>(Arrays.asList(states));
       // Wait for state
       long st = System.currentTimeMillis();
       do {
         Thread.sleep(timeToSleep);
         ctx = getWorkflowContext(workflowName);
       } while ((ctx == null || ctx.getJobState(jobName) == null
           || !allowedStates.contains(ctx.getJobState(jobName)))
           && System.currentTimeMillis() < st + timeout);
   
       if (ctx == null || !allowedStates.contains(ctx.getJobState(jobName))) {
         throw new HelixException(
             String.format("Workflow \"%s\" context is null or job \"%s\" is not in states: %s; ctx is %s, jobState is %s .",
                 workflowName, jobName, allowedStates, ctx == null ? "null" : ctx, ctx != null ? ctx.getJobState(jobName) : "null"));
       }
   
       return ctx.getJobState(jobName);
     }
   ```
