Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/05/04 10:01:22 UTC

[GitHub] [dolphinscheduler] JinyLeeChina opened a new issue, #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

JinyLeeChina opened a new issue, #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   [INFO] 2022-05-04 06:37:50.756 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[380] - work flow 1129 task 9904 state:NEED_FAULT_TOLERANCE
   [INFO] 2022-05-04 06:37:50.757 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1278] - add task to stand by list, task name:dep_dwd_1d_dag, task id:9904, task code:5196592535840
   [INFO] 2022-05-04 06:37:50.758 org.apache.dolphinscheduler.service.process.ProcessService:[1080] - start submit task : dep_dwd_1d_dag, instance id:1129, state: RUNNING_EXECUTION
   [INFO] 2022-05-04 06:37:50.761 org.apache.dolphinscheduler.service.process.ProcessService:[1093] - end submit task to db successfully:9904 dep_dwd_1d_dag state:SUBMITTED_SUCCESS complete, instance id:1129 state: RUNNING_EXECUTION
   [INFO] 2022-05-04 06:37:50.767 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1292] - remove task from stand by list, id: 9904 name:dep_dwd_1d_dag
   [WARN] 2022-05-04 06:38:03.004 com.zaxxer.hikari.pool.PoolBase:[184] - DolphinScheduler - Failed to validate connection com.mysql.jdbc.JDBC4Connection@4eba99b2 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
   [INFO] 2022-05-04 06:47:50.342 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   [INFO] 2022-05-04 06:47:50.344 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[host:port]
   [INFO] 2022-05-04 06:47:50.349 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[host:port] failover, process list size:2
   [INFO] 2022-05-04 06:47:50.351 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - failover task instance id: 9904, process instance id: 1129
   [INFO] 2022-05-04 06:47:51.352 org.apache.dolphinscheduler.service.log.LogClientService:[117] - view log path /opt/dolphinscheduler/logs/5196620942752_3/1129/9904.log
   [ERROR] 2022-05-04 06:47:51.352 org.apache.dolphinscheduler.common.utils.LoggerUtils:[117] - read file error
   java.io.FileNotFoundException: /opt/dolphinscheduler/logs/5196620942752_3/1129/9904.log (No such file or directory)
   	at java.io.FileInputStream.open0(Native Method)
   	at java.io.FileInputStream.open(FileInputStream.java:195)
   	at java.io.FileInputStream.<init>(FileInputStream.java:138)
   	at java.io.FileInputStream.<init>(FileInputStream.java:93)
   	at org.apache.dolphinscheduler.common.utils.LoggerUtils.readWholeFileContent(LoggerUtils.java:111)
   	at org.apache.dolphinscheduler.service.log.LogClientService.viewLog(LogClientService.java:123)
   	at org.apache.dolphinscheduler.server.utils.ProcessUtils.killYarnJob(ProcessUtils.java:190)
   	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverTaskInstance(MasterRegistryClient.java:486)
   	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverMaster(MasterRegistryClient.java:443)
   	at org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread.run(FailoverExecuteThread.java:80)
   [INFO] 2022-05-04 06:47:51.353 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   [INFO] 2022-05-04 06:47:51.353 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
   [INFO] 2022-05-04 06:47:51.356 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[host:port] failover end, useTime:1008ms
   [INFO] 2022-05-04 06:47:51.803 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 1129 , events count:1
   [INFO] 2022-05-04 06:47:51.803 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-05-04 06:47:51.803 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: NEED_FAULT_TOLERANCE task instance id: 9904 process instance id: 1129 context: null
   [INFO] 2022-05-04 06:47:51.804 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[380] - work flow 1129 task 9904 state:NEED_FAULT_TOLERANCE
   [INFO] 2022-05-04 06:47:51.805 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1278] - add task to stand by list, task name:dep_dwd_1d_dag, task id:9904, task code:5196592535840
   [INFO] 2022-05-04 06:47:51.806 org.apache.dolphinscheduler.service.process.ProcessService:[1080] - start submit task : dep_dwd_1d_dag, instance id:1129, state: RUNNING_EXECUTION
   [INFO] 2022-05-04 06:47:51.809 org.apache.dolphinscheduler.service.process.ProcessService:[1093] - end submit task to db successfully:9904 dep_dwd_1d_dag state:SUBMITTED_SUCCESS complete, instance id:1129 state: RUNNING_EXECUTION
   [INFO] 2022-05-04 06:47:51.814 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1292] - remove task from stand by list, id: 9904 name:dep_dwd_1d_dag
   
   
   ### What you expected to happen
   
   After fault tolerance, the dependent task should resume and complete normally instead of getting stuck.
   
   ### How to reproduce
   
   Trigger master fault tolerance (failover) while a workflow's dependent task node is running.
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   2.0.5
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] TheOldOne-SU commented on issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
TheOldOne-SU commented on issue #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873#issuecomment-1179965074

   I have also run into this problem. Is there a solution?



[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873#issuecomment-1180084170

   It has been fixed in version 2.0.6. Please see #10517.



[GitHub] [dolphinscheduler] github-actions[bot] closed issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck
URL: https://github.com/apache/dolphinscheduler/issues/9873



[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873#issuecomment-1146474774

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.



[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873#issuecomment-1152813472

   This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.



[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #9873: [Bug] [Server] Fault tolerance of dependent nodes leads to stuck

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9873:
URL: https://github.com/apache/dolphinscheduler/issues/9873#issuecomment-1117134915

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * To help us understand your request as soon as possible, please provide detailed information, the version, or screenshots.
   * If you haven't received a reply for a long time, you can [join our Slack](https://join.slack.com/t/asf-dolphinscheduler/shared_invite/zt-omtdhuio-_JISsxYhiVsltmC5h38yfw) and send your question to the `#troubleshooting` channel.

