Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2021/11/22 07:54:08 UTC

[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #6959: [Bug] [endless loop] Workergroup have only one Worker,when this worker is down。run task job is endless loop

github-actions[bot] commented on issue #6959:
URL: https://github.com/apache/dolphinscheduler/issues/6959#issuecomment-975215962

### Search before asking

- [X] I have searched the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

My DolphinScheduler cluster runs version 1.3.3 with two master nodes and many worker groups. One of the worker groups has only one worker.
When that worker goes down, any task configured to run on its worker group goes into an endless loop.
All other tasks are blocked, waiting for that loop to end.
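
The symptom described above resembles head-of-line blocking in a queue consumer. The following is a minimal sketch of that pattern, not DolphinScheduler's actual code: the class `BlockingDispatchSketch`, the `Task` type, and the `maxAttempts` bound are all hypothetical, and the assumption that the failing task is retried ahead of everything else has not been confirmed against the 1.3.3 source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the reported symptom.
// None of these names come from DolphinScheduler itself.
class BlockingDispatchSketch {

    /** A queued task pinned to one worker group. */
    static final class Task {
        final String name;
        final String workerGroup;

        Task(String name, String workerGroup) {
            this.name = name;
            this.workerGroup = workerGroup;
        }
    }

    /**
     * Dispatches strictly in queue order. A task whose group has no live
     * worker stays at the head and is retried, so later tasks never run.
     * maxAttempts only bounds the simulation so that it terminates.
     */
    static List<String> dispatchInOrder(Deque<Task> queue, Set<String> liveGroups, int maxAttempts) {
        List<String> dispatched = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts && !queue.isEmpty(); attempt++) {
            Task head = queue.peekFirst();
            if (liveGroups.contains(head.workerGroup)) {
                dispatched.add(queue.pollFirst().name);
            }
            // Otherwise the head is left in place and retried next iteration.
        }
        return dispatched;
    }

    public static void main(String[] args) {
        Deque<Task> queue = new ArrayDeque<>();
        queue.add(new Task("taskB-1", "B"));
        queue.add(new Task("taskB-2", "B"));
        queue.add(new Task("taskA-1", "A"));
        // Worker group B's only worker is down, so only A is live.
        Set<String> liveGroups = new HashSet<>(Arrays.asList("A"));

        List<String> dispatched = dispatchInOrder(queue, liveGroups, 1000);
        System.out.println("dispatched=" + dispatched + ", still queued=" + queue.size());
    }
}
```

In this model the task for group A is never dispatched, no matter how many attempts are allowed, because the group B task at the head of the queue can never succeed.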

### What you expected to happen

When one worker group is abnormal (for example, its only worker is down), the execution of tasks in other worker groups should not be affected.
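
One way to picture the expected behavior is a dispatch pass that sets undispatchable tasks aside instead of retrying them in place. This is only an illustrative sketch under that assumption, not a proposed patch: `NonBlockingDispatchSketch`, `Task`, `PassResult`, and `dispatchOnce` are hypothetical names that do not exist in DolphinScheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the expected behavior; not DolphinScheduler code.
class NonBlockingDispatchSketch {

    /** A queued task pinned to one worker group. */
    static final class Task {
        final String name;
        final String workerGroup;

        Task(String name, String workerGroup) {
            this.name = name;
            this.workerGroup = workerGroup;
        }
    }

    /** Outcome of one pass over the queue. */
    static final class PassResult {
        final List<String> dispatched = new ArrayList<>();
        final List<Task> deferred = new ArrayList<>();
    }

    /**
     * Drains the queue once. Tasks whose group has no live worker are
     * collected in 'deferred' for a later retry instead of blocking the
     * tasks queued behind them.
     */
    static PassResult dispatchOnce(Deque<Task> queue, Set<String> liveGroups) {
        PassResult result = new PassResult();
        while (!queue.isEmpty()) {
            Task task = queue.pollFirst();
            if (liveGroups.contains(task.workerGroup)) {
                result.dispatched.add(task.name);
            } else {
                result.deferred.add(task);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Deque<Task> queue = new ArrayDeque<>();
        queue.add(new Task("taskB-1", "B"));
        queue.add(new Task("taskB-2", "B"));
        queue.add(new Task("taskA-1", "A"));
        // Worker group B's only worker is down, so only A is live.
        Set<String> liveGroups = new HashSet<>(Arrays.asList("A"));

        PassResult result = dispatchOnce(queue, liveGroups);
        System.out.println("dispatched=" + result.dispatched
                + ", deferred=" + result.deferred.size());
    }
}
```

Here the group A task is dispatched even though two group B tasks were queued ahead of it, and the group B tasks remain available for a retry once a worker in that group comes back.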

### How to reproduce

1. Create two worker groups, A and B, each with one worker.
2. Create two tasks: one assigned to worker group A, the other to worker group B.
3. Stop the worker service on worker group B's node, then run the worker group B task several times.
4. Run the worker group A task. It is blocked and cannot execute.

The log repeatedly reports that the node is down.

### Anything else

dolphinscheduler-master.log:
[ERROR] 2021-11-18 14:42:24.568 org.apache.dolphinscheduler.server.master.consumer.TaskPriorityQueueConsumer:[148]-dispatch error
org.apache.dolphinscheduler.server.master.dispatch.exceptions.ExecuteException: fail to execute: Command [type=TASK_EXECUTE_REQUEST, opaque=2867, bodyLen=1735] due to no suitable worker, current task need to bi worker group execute
    at org.apache.dolphinscheduler.server.master.dispatch.ExecutorDispatcher.dispatch(ExecutorDispatcher.java:87)
    at org.apache.dolphinscheduler.server.master.consumer.TaskPriorityQueueConsumer.dispatch(TaskPriorityQueueConsumer.java:145)
    at org.apache.dolphinscheduler.server.master.consumer.TaskPriorityQueueConsumer.run(TaskPriorityQueueConsumer.java:114)

### Version

1.3.3

### Are you willing to submit PR?

- [X] Yes, I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
