Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/05/06 08:54:56 UTC

[GitHub] [dolphinscheduler] ShayvChan opened a new issue, #9916: [Bug] [Master] Dependent Node Did Not Retry

ShayvChan opened a new issue, #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   After failing, the dependent node sometimes does not retry; in some cases it stays in the running state after a retry.
   The latest task instance's execution duration is also null.
   ![image](https://user-images.githubusercontent.com/25319167/167099254-139aa110-f39e-4eb5-b80f-76a2bedce842.png)
   
   
   ### What you expected to happen
   
   All dependent nodes should retry according to their configuration.
   
   ### How to reproduce
   
   1. Create about 10 process definitions, each with a dependent node that depends on another process definition, and set the retry times (20) and retry interval (5 minutes).
   2. Ensure the state of the depended-on process definition is failed.
   3. Run all the process definitions that contain a dependent node.
   4. Observe that some process instances do not trigger a retry after the task instance fails.
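
For reference, the retry schedule that the settings in step 1 imply can be sketched as follows. This is only an illustration under stated assumptions, not DolphinScheduler code: the function name `expected_retry_times` is hypothetical, and the sketch ignores how long each attempt runs, so the real retries would start slightly later than these times.

```python
from datetime import datetime, timedelta


def expected_retry_times(first_failure, max_retries=20, interval_minutes=5):
    """Earliest time each retry could start, ignoring the run time of each attempt."""
    return [
        first_failure + timedelta(minutes=interval_minutes * attempt)
        for attempt in range(1, max_retries + 1)
    ]


if __name__ == "__main__":
    # Hypothetical first failure time, chosen only for illustration.
    for when in expected_retry_times(datetime(2022, 5, 6, 17, 48, 7))[:3]:
        print(when.isoformat())
```

With 20 retries at a 5-minute interval, a task that keeps failing should still be retrying well over an hour and a half after its first failure, so a task instance with no new attempt after a few intervals is worth a closer look.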
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   2.0.5
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] ShayvChan commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
ShayvChan commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119402368

   ![image](https://user-images.githubusercontent.com/25319167/167100350-47dabeb3-ef0d-42d2-b068-6d7c06aef869.png)
   If I restart the master, it triggers a retry immediately. After that retry fails, the next retry is still not triggered.


[GitHub] [dolphinscheduler] ShayvChan commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
ShayvChan commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1120230197

   <img width="1201" alt="Screenshot 2022-05-07 11 39 38 PM" src="https://user-images.githubusercontent.com/25319167/167261552-a2e5fb88-88f5-4362-882b-b819d98b320f.png">
   <img width="1238" alt="Screenshot 2022-05-07 11 40 51 PM" src="https://user-images.githubusercontent.com/25319167/167261558-1e45d529-6beb-453d-b6c7-fcbacd71f49f.png">
   <img width="845" alt="Screenshot 2022-05-07 11 42 03 PM" src="https://user-images.githubusercontent.com/25319167/167261561-24a982db-a425-4d14-893b-d7799fcf3ea2.png">
   


[GitHub] [dolphinscheduler] SbloodyS commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1120104773

   > Update: when the depended-on process definition is not scheduled, all process definitions that depend on it trigger retries normally. But when the depended-on process definition has its schedule enabled (running every 5 minutes), some process definitions (running every 10 minutes) that depend on it cannot trigger retries normally.
   
   Could you take a screenshot of this dependent task's config?


[GitHub] [dolphinscheduler] github-actions[bot] closed issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #9916: [Bug] [Master] Dependent Node Did Not Retry
URL: https://github.com/apache/dolphinscheduler/issues/9916


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119400225

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information, the version, or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw) and send your question to channel `#troubleshooting`


[GitHub] [dolphinscheduler] ShayvChan commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
ShayvChan commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119449925

   Here are the retry logs for the first two attempts; there is no log of a further retry being triggered after them:
   ```
   [INFO] 2022-05-06 17:48:04.528 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, cacheKey=5415903871264}
   [INFO] 2022-05-06 17:48:04.530 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processDefinition, key:5415903871264
   [INFO] 2022-05-06 17:48:05.117 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - find command 5408, slot:0 :
   [INFO] 2022-05-06 17:48:05.117 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[202] - find one command: id: 5408, type: START_PROCESS
   [INFO] 2022-05-06 17:48:05.180 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 5408 process 5272 start...
   [INFO] 2022-05-06 17:48:05.250 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1276] - add task to stand by list, task name:depend, task id:0, task code:5415870599968
   [INFO] 2022-05-06 17:48:05.261 org.apache.dolphinscheduler.service.process.ProcessService:[1080] - start submit task : depend, instance id:5272, state: RUNNING_EXECUTION
   [INFO] 2022-05-06 17:48:05.274 org.apache.dolphinscheduler.service.process.ProcessService:[1093] - end submit task to db successfully:9832 depend state:SUBMITTED_SUCCESS complete, instance id:5272 state: RUNNING_EXECUTION  
   [INFO] 2022-05-06 17:48:05.337 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1290] - remove task from stand by list, id: 9832 name:depend
   [INFO] 2022-05-06 17:48:07.935 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 5272 , events count:1
   [INFO] 2022-05-06 17:48:07.935 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-05-06 17:48:07.936 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 9832 process instance id: 5272 context: null
   [INFO] 2022-05-06 17:48:07.945 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.DependentTaskProcessor:[187] - dependent item complete :|| 5305615306658-0-day-today,FAILED
   [INFO] 2022-05-06 17:48:07.951 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.DependentTaskProcessor:[209] - dependent task completed, dependent result:FAILED
   [INFO] 2022-05-06 17:48:07.961 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[378] - work flow 5272 task 9832 state:FAILURE 
   [INFO] 2022-05-06 17:48:07.965 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1276] - add task to stand by list, task name:depend, task id:9832, task code:5415870599968
   [INFO] 2022-05-06 17:48:07.966 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[385] - failure task will be submitted: process id: 5272, task instance id: 9832 state:FAILURE retry times:0 / 20, interval:5
   [INFO] 2022-05-06 17:50:09.576 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, cacheKey=5305324465696}
   [INFO] 2022-05-06 17:50:09.576 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processDefinition, key:5305324465696
   [INFO] 2022-05-06 17:50:09.580 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_TASK_RELATION, cacheKey=5303710835744_5305324465696}
   [INFO] 2022-05-06 17:50:09.580 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processTaskRelation, key:5303710835744_5305324465696
   [INFO] 2022-05-06 17:50:27.820 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, cacheKey=5305324469408}
   [INFO] 2022-05-06 17:50:27.821 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processDefinition, key:5305324469408
   [INFO] 2022-05-06 17:50:27.824 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_TASK_RELATION, cacheKey=5303710835744_5305324469408}
   [INFO] 2022-05-06 17:50:27.824 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processTaskRelation, key:5303710835744_5305324469408
   [INFO] 2022-05-06 17:53:00.030 org.apache.dolphinscheduler.service.quartz.ProcessScheduleJob:[74] - scheduled fire time :Fri May 06 17:53:00 CST 2022, fire time :Fri May 06 17:53:00 CST 2022, process id :61
   [INFO] 2022-05-06 17:53:11.987 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 5272 , events count:1
   [INFO] 2022-05-06 17:53:11.988 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-05-06 17:53:11.989 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 9832 process instance id: 5272 context: null
   [INFO] 2022-05-06 17:53:11.995 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[378] - work flow 5272 task 9832 state:FAILURE 
   [WARN] 2022-05-06 17:53:11.995 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1254] - task was found in ready submit queue, task code:5415870599968
   [INFO] 2022-05-06 17:53:12.002 org.apache.dolphinscheduler.service.process.ProcessService:[1080] - start submit task : depend, instance id:5272, state: RUNNING_EXECUTION
   [INFO] 2022-05-06 17:53:12.018 org.apache.dolphinscheduler.service.process.ProcessService:[1093] - end submit task to db successfully:9835 depend state:SUBMITTED_SUCCESS complete, instance id:5272 state: RUNNING_EXECUTION  
   [INFO] 2022-05-06 17:53:12.058 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1290] - remove task from stand by list, id: 9835 name:depend
   [INFO] 2022-05-06 17:53:16.989 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 5272 , events count:1
   [INFO] 2022-05-06 17:53:16.990 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-05-06 17:53:16.991 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 9835 process instance id: 5272 context: null
   [INFO] 2022-05-06 17:53:16.997 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.DependentTaskProcessor:[187] - dependent item complete :|| 5305615306658-0-day-today,FAILED
   [INFO] 2022-05-06 17:53:17.000 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.DependentTaskProcessor:[209] - dependent task completed, dependent result:FAILED
   [INFO] 2022-05-06 17:53:17.010 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[378] - work flow 5272 task 9835 state:FAILURE 
   [INFO] 2022-05-06 17:53:17.012 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1276] - add task to stand by list, task name:depend, task id:9835, task code:5415870599968
   [INFO] 2022-05-06 17:53:17.013 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[385] - failure task will be submitted: process id: 5272, task instance id: 9835 state:FAILURE retry times:1 / 20, interval:5
   
   ```
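
A check like the following could help spot stalled retries in a larger master log. It is a minimal sketch, not part of DolphinScheduler: the function name `find_stalled_retries` and the `grace_minutes` parameter are hypothetical, the regular expression is written only against the `failure task will be submitted` lines in the excerpt above, and it assumes `interval` is in minutes and that each process instance has a single retrying task.

```python
import re
from datetime import datetime

# Matches the "failure task will be submitted" lines in the excerpt above.
RETRY_LINE = re.compile(
    r"\[\w+\] (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"failure task will be submitted: process id: (?P<process>\d+), "
    r"task instance id: (?P<task>\d+) state:\w+ "
    r"retry times:(?P<done>\d+) / (?P<limit>\d+), interval:(?P<interval>\d+)"
)


def find_stalled_retries(lines, now, grace_minutes=1):
    """Return (process id, task instance id, minutes waited) for overdue retries."""
    latest = {}
    for line in lines:
        match = RETRY_LINE.search(line)
        if match:
            # Later lines overwrite earlier ones, so only the newest
            # entry per process instance is kept.
            latest[match["process"]] = match
    stalled = []
    for process, match in latest.items():
        queued_at = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f")
        waited = (now - queued_at).total_seconds() / 60
        retries_left = int(match["done"]) < int(match["limit"])
        # Assumption: "interval" is in minutes, as in the task's retry setting.
        if retries_left and waited > int(match["interval"]) + grace_minutes:
            stalled.append((process, match["task"], round(waited, 1)))
    return stalled
```

Passing the lines of the master log and the current time returns the process instances whose last queued retry is overdue. This is only a heuristic: a retry that was submitted but never logged another failure would also be reported.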


[GitHub] [dolphinscheduler] zhongjiajie commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1134384388

   I will close this issue if there is no response in the next 14 days


[GitHub] [dolphinscheduler] SbloodyS commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119425759

   Hi @ShayvChan, is there any error log in ```dolphinscheduler-master.log```?


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1163792760

   This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed in the next 7 days if no further activity occurs.


[GitHub] [dolphinscheduler] SbloodyS commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1120417536

   @ShayvChan From your screenshot, the dependent node is not configured with a number of retries. I tested a dependent node with 5 retries and a 1-minute retry interval in 2.0.5-release standalone mode, and it works fine. Is there a way to reproduce this reliably?
   
   ![](https://vip2.loli.io/2022/05/08/ZlqVhW9OCRfwX2n.png)


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1174471981

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.


[GitHub] [dolphinscheduler] ShayvChan commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
ShayvChan commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119438505

   @SbloodyS We couldn't find any errors in the master log.


[GitHub] [dolphinscheduler] ShayvChan commented on issue #9916: [Bug] [Master] Dependent Node Did Not Retry

Posted by GitBox <gi...@apache.org>.
ShayvChan commented on issue #9916:
URL: https://github.com/apache/dolphinscheduler/issues/9916#issuecomment-1119518759

   Update: when the depended-on process definition is not scheduled, all process definitions that depend on it trigger retries normally. But when the depended-on process definition has its schedule enabled (running every 5 minutes), some process definitions (running every 10 minutes) that depend on it cannot trigger retries normally.

