Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/06/14 11:40:48 UTC

[GitHub] [dolphinscheduler] gglinux opened a new issue, #10443: [Bug] [dependent node] Dependent nodes are always blocked

gglinux opened a new issue, #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Occasionally, dependent nodes stay blocked indefinitely, even though the upstream node has already completed.
   
   I haven't found a reliable way to reproduce it yet. It mostly happens after I edit the node.
   
   <img width="545" alt="image" src="https://user-images.githubusercontent.com/7273957/173568139-a6ba1f6a-b752-499c-9c8a-ca7c3a45b6a0.png">
   
   
   ### What you expected to happen
   
   I get the following log entries:
   
   [INFO] 2022-06-10 18:00:10.754 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: RUNNING_EXECUTION task instance id: 471707 process instance id: 136719 context: null
   
   /data/emr/dolphinscheduler/logs/dolphinscheduler-master.2022-06-10_18.9.log:
   
   [INFO] 2022-06-10 18:26:49.211 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[301] - process event: State Event :key: 133921-0-133921-0 type: PROCESS_STATE_CHANGE executeStatus: READY_STOP task instance id: 0 process instance id: 133921 context: null
   
   
   <img width="1715" alt="WeCom screenshot_dbf16b9e-25d5-4933-808b-173f559163ff" src="https://user-images.githubusercontent.com/7273957/173568580-08b06d7a-93eb-40cd-9cb7-54a9e243acc9.png">
   
   
   ### How to reproduce
   
   I haven't found a reliable way to reproduce it yet. It mostly happens after I edit the node.
   
   ### Anything else
   
   It happens only occasionally.
   
   ### Version
   
   2.0.5
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1155071286

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * To help us understand your request as soon as possible, please provide detailed information, the version, or screenshots.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`




[GitHub] [dolphinscheduler] weeway commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
weeway commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1165242099

   @gglinux I have the same problem and have no idea how to reproduce it. The process is scheduled to run every day, but the problem occurs several times in some weeks and not at all in others. It is really annoying.




[GitHub] [dolphinscheduler] SbloodyS closed issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
SbloodyS closed issue #10443: [Bug] [dependent node] Dependent nodes are always blocked
URL: https://github.com/apache/dolphinscheduler/issues/10443




[GitHub] [dolphinscheduler] SbloodyS commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170665164

   Closing this issue since it has been fixed.




[GitHub] [dolphinscheduler] weeway commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
weeway commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170882016

   > > I think this is a bug in 2.0.5. It has been fixed in #10541 and the fix will be released in 2.0.6.
   > 
   > @JinyLeeChina I switched to the `2.0.6-prepare` branch and tested it. The problem still exists. Please check it again. Thanks.
   > 
   > ### The topology
   > There are three processes: pA, pB, and pC.
   > 
   > * pA has a task pAt1
   > * pB has a dependent task pBt1 depending on pAt1
   > * pC has a long-running task pCt1
   > 
   > ### How to reproduce it?
   > * start pB
   > * start pC with failedStrategy `End`
   > * stop pC
   > * start pA and let it finish
   > * finally, the UI shows the dependent task pBt1 as running forever
   > 
   > The settings used to start pC: <img alt="image" width="836" src="https://user-images.githubusercontent.com/12637868/176584789-af83ecfc-9606-4c77-b706-ff7caa7829ad.png">
   > 
   > ### The reason
   > All `WorkflowExecuteThread` instances share the same `depStateCheckList`. When pC is started with failedStrategy `End` and then stopped, the entire `depStateCheckList` is cleared. **The correct logic is to clear only the task instances belonging to pC**.
   > 
   > The critical code in `WorkflowExecuteThread` (based on `2.0.6-prepare`): `org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#processStateChangeHandler` <img alt="image" width="752" src="https://user-images.githubusercontent.com/12637868/176620035-123bc3a8-1b24-4537-b877-154a7fdb2926.png">
   > 
   > `org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#killAllTasks` <img alt="image" width="851" src="https://user-images.githubusercontent.com/12637868/176620122-c0964ac4-6420-43bc-a5fd-d268eaf58316.png">
   
   ### Another way to reproduce the problem
   The problem can also be reproduced as follows:
   - start pB
   - start pC with failedStrategy `End`, where pC contains a task that always fails
   - pC fails
   - start pA and let it finish
   - finally, the UI shows the dependent task pBt1 as running forever
   
   ### The Code
   `org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#taskFinished`
   <img width="834" alt="image" src="https://user-images.githubusercontent.com/12637868/176621477-13549b94-1636-459c-aa6e-67fa573d3851.png">
   
   ### The reason
   Same as in the first scenario described above.
   
   




[GitHub] [dolphinscheduler] lordk911 commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
lordk911 commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170081554

   When the dependent node is blocked, do you see a "submit standby task error java.lang.NullPointerException: null" message in the master server's log?




[GitHub] [dolphinscheduler] lordk911 commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
lordk911 commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170670191

   > The cause of this bug is that the dependent node starts before the task it depends on. Because there is no internal mechanism to detect when that task has completed, the dependent node waits indefinitely.
   
   Thanks.




[GitHub] [dolphinscheduler] JinyLeeChina closed issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
JinyLeeChina closed issue #10443: [Bug] [dependent node] Dependent nodes are always blocked
URL: https://github.com/apache/dolphinscheduler/issues/10443




[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170667694

   > Dependent-type tasks run on the master, and `master.exec.threads` defaults to 100. Would increasing this value help? Thanks. @JinyLeeChina
   
   The cause of this bug is that the dependent node starts before the task it depends on. Because there is no internal mechanism to detect when that task has completed, the dependent node waits indefinitely.




[GitHub] [dolphinscheduler] weeway commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
weeway commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170705973

   > I think this is a bug in 2.0.5. It has been fixed in #10541 and the fix will be released in 2.0.6.
   
   I switched to the `2.0.6-prepare` branch and tested it. The problem still exists.
   
   ### The topology
   There are three processes: pA, pB, and pC.
   - pA has a task pAt1
   - pB has a dependent task pBt1 depending on pAt1
   - pC has a long-running task pCt1
   
   ### How to reproduce it?
   - start pB
   - start pC with failedStrategy `End`
   - stop pC
   - start pA and let it finish
   - finally, the UI shows the dependent task pBt1 as running forever
   
   The settings used to start pC:
   <img width="836" alt="image" src="https://user-images.githubusercontent.com/12637868/176584789-af83ecfc-9606-4c77-b706-ff7caa7829ad.png">
   
   ### The reason
   All `WorkflowExecuteThread` instances share the same `taskRetryCheckList`. When pC is started with failedStrategy `End` and then stopped, the entire `taskRetryCheckList` is cleared. **The correct logic is to clear only the task instances belonging to pC**.
   
   The critical code in `WorkflowExecuteThread`:
   `org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#taskFinished`
   <img width="872" alt="image" src="https://user-images.githubusercontent.com/12637868/176585635-9b49c84d-ddd9-472c-a797-a5cc6d2151e9.png">
   
   `org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread#killAllTasks`
   <img width="626" alt="image" src="https://user-images.githubusercontent.com/12637868/176585703-cc267b8a-dec9-476e-8015-7aa8463345c5.png">
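The difference between clearing the whole shared list and clearing only one workflow's entries can be sketched in plain Java. This is a simplified, hypothetical model, not DolphinScheduler's code: `SharedCheckListSketch`, `CheckEntry`, `clearAll`, `clearForProcess` and the numeric ids are illustrative names only, and the real `taskRetryCheckList` holds task instances rather than this small class. It assumes, as the reason above states, a single list shared by all `WorkflowExecuteThread` instances, and it replays the reproduction steps for pB and pC.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SharedCheckListSketch {

    // Hypothetical entry: a task instance to re-check and the process instance that owns it.
    static final class CheckEntry {
        final int processInstanceId;
        final int taskInstanceId;

        CheckEntry(int processInstanceId, int taskInstanceId) {
            this.processInstanceId = processInstanceId;
            this.taskInstanceId = taskInstanceId;
        }
    }

    // A single list shared by every workflow thread, modelling the shared taskRetryCheckList.
    static final List<CheckEntry> SHARED_CHECK_LIST = new CopyOnWriteArrayList<>();

    // Buggy behaviour: ending one workflow wipes the entries of all workflows.
    static void clearAll() {
        SHARED_CHECK_LIST.clear();
    }

    // Expected behaviour: remove only the entries owned by the workflow that is ending.
    static void clearForProcess(int processInstanceId) {
        SHARED_CHECK_LIST.removeIf(e -> e.processInstanceId == processInstanceId);
    }

    // Replays the reproduction steps and reports whether pBt1 is still scheduled for re-checks.
    static boolean pBt1StillChecked(boolean useFix) {
        SHARED_CHECK_LIST.clear();
        final int pB = 1, pC = 2, pBt1 = 101, pCt1 = 201;   // hypothetical ids
        SHARED_CHECK_LIST.add(new CheckEntry(pB, pBt1));    // start pB: pBt1 waits on pAt1
        SHARED_CHECK_LIST.add(new CheckEntry(pC, pCt1));    // start pC with failedStrategy End
        if (useFix) {
            clearForProcess(pC);                            // stop pC, scoped clearing
        } else {
            clearAll();                                     // stop pC, global clearing
        }
        return SHARED_CHECK_LIST.stream().anyMatch(e -> e.taskInstanceId == pBt1);
    }

    public static void main(String[] args) {
        System.out.println("buggy: pBt1 still checked = " + pBt1StillChecked(false));
        System.out.println("fixed: pBt1 still checked = " + pBt1StillChecked(true));
    }
}
```

Running `main` prints `false` for the global clearing and `true` for the scoped clearing. `CopyOnWriteArrayList` is used here only because the list is shared across threads; this sketch does not claim to mirror how the linked PR implements the fix.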
   
   
   
   




[GitHub] [dolphinscheduler] weeway commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
weeway commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170651115

   I found the root cause and have already submitted a [PR](https://github.com/apache/dolphinscheduler/pull/10684).
   
   The cause is that all `WorkflowExecuteThread` instances share the same `taskRetryCheckList`, and dependent-task checking relies on that list. When a workflow runs with `failedstrategy:end tasks` and some task in it fails, the whole `taskRetryCheckList` is cleared, including dependent tasks that belong to other workflows. Because the `taskRetryCheckList` is then empty, `StateWheelExecuteThread` no longer generates `TASK_STATE_CHANGE` events, so those dependent tasks are never checked again. As a result, the UI shows the dependent task nodes as running and waiting forever.
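The reason an emptied list stalls the dependent task can be illustrated with a minimal model of the polling loop. This is a hypothetical sketch, not the real `StateWheelExecuteThread`: the class `StateWheelSketch`, its `tick()` method, the use of plain task instance ids and the event string format are assumptions made for illustration. Each tick scans the shared list and emits one `TASK_STATE_CHANGE` event per entry. Once another workflow clears the list, ticks emit nothing, so the waiting dependent task is never re-evaluated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class StateWheelSketch {

    // Shared by all workflows; entries are task instance ids waiting to be re-checked (hypothetical).
    static final List<Integer> CHECK_LIST = new CopyOnWriteArrayList<>();

    // One pass of the wheel: produce a TASK_STATE_CHANGE event for every task still in the list.
    static List<String> tick() {
        List<String> events = new ArrayList<>();
        for (int taskInstanceId : CHECK_LIST) {
            events.add("TASK_STATE_CHANGE task instance id: " + taskInstanceId);
        }
        return events;
    }

    public static void main(String[] args) {
        CHECK_LIST.add(101);                  // dependent task pBt1 is waiting for pAt1
        System.out.println(tick());           // one event, so pBt1 gets re-checked
        CHECK_LIST.clear();                   // another workflow clears the shared list
        System.out.println(tick());           // no events, so pBt1 is never re-checked
    }
}
```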




[GitHub] [dolphinscheduler] lordk911 commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
lordk911 commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1169848117

   Dependent-type tasks run on the master, and `master.exec.threads` defaults to 100. Would increasing this value help?




[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170718559

   OK, let me test it this way.




[GitHub] [dolphinscheduler] lordk911 commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
lordk911 commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1169692038

   When a task runs for a long time (more than about an hour), the dependent nodes that depend on it remain running and waiting indefinitely.




[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1170657564

   I think this is a bug in 2.0.5. It has been fixed in #10541 and the fix will be released in 2.0.6.




[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #10443: [Bug] [dependent node] Dependent nodes are always blocked

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #10443:
URL: https://github.com/apache/dolphinscheduler/issues/10443#issuecomment-1172059387

   @weeway Very good. I reproduced the bug and reviewed the PR. Please have a look.



