Posted to commits@dolphinscheduler.apache.org by "Sky-Gu (via GitHub)" <gi...@apache.org> on 2023/03/13 06:02:56 UTC

[GitHub] [dolphinscheduler] Sky-Gu opened a new issue, #13726: [Bug] [TaskGroupQueue] Task group release data dirty write

Sky-Gu opened a new issue, #13726:
URL: https://github.com/apache/dolphinscheduler/issues/13726

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Version: 3.1.3
   
   When many tasks are queued in a task group, rows in the task group queue are sometimes updated incorrectly.
   
   `t_ds_task_group_queue.in_queue` is set to 1 on the affected row.
    
   While `t_ds_task_group_queue.in_queue = 1`, the queued task cannot be submitted again and can only be force-started.
   
   In addition, the workflow cannot be stopped or cancelled, and tasks submitted to the task group afterwards are affected.
    
   
   
   The queue row for the task being released is id=2175, but the row with id=2178 was updated during the release, as shown in the log screenshot:
   ![1](https://user-images.githubusercontent.com/5527811/224618543-dc144a88-f778-4d69-8034-aae78cdc3411.png)
   
   The code that may be responsible is shown in the screenshot below:
   ![2](https://user-images.githubusercontent.com/5527811/224618952-80ad4ce4-0ae2-4273-bcff-2401f02f7acd.png)
   
   
   
   ### What you expected to happen
   
   Task group resources are released correctly.
    
   
   ### How to reproduce
   
   Submit multiple tasks to the task group at the same time, with each task taking a different amount of time to run. The problem occurs intermittently.
    
   
   ### Anything else
   
   [INFO] 2023-03-13 11:37:23.412 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[3030] - [WorkflowInstance-10245][TaskInstance-80134] - Begin to release task group: 10
   [DEBUG] 2023-03-13 11:37:23.412 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.selectById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: SELECT id,name,description,group_size,use_size,user_id,status,create_time,update_time,project_code FROM t_ds_task_group WHERE id=?
   [DEBUG] 2023-03-13 11:37:23.413 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.selectById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 10(Integer)
   [DEBUG] 2023-03-13 11:37:23.416 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.selectById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==      Total: 1
   [DEBUG] 2023-03-13 11:37:23.416 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: select id, task_id, task_name, group_id, process_id, priority, status , force_start , in_queue, create_time, update_time from t_ds_task_group_queue where task_id = ?
   [DEBUG] 2023-03-13 11:37:23.416 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 80134(Integer)
   [DEBUG] 2023-03-13 11:37:23.420 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==      Total: 1
   [DEBUG] 2023-03-13 11:37:23.420 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.releaseTaskGroupResource:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: update t_ds_task_group set use_size = use_size-1 where id = ? and use_size > 0 and (select count(1) FROM t_ds_task_group_queue where id = ? and status = ? ) = 1
   [DEBUG] 2023-03-13 11:37:23.420 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.releaseTaskGroupResource:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 10(Integer), 2175(Integer), 1(Integer)
   [DEBUG] 2023-03-13 11:37:23.426 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupMapper.releaseTaskGroupResource:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==    Updates: 1
   [INFO] 2023-03-13 11:37:23.426 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[3056] - [WorkflowInstance-10245][TaskInstance-80134] - Finished to release task group, taskGroupId: 10
   [INFO] 2023-03-13 11:37:23.426 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[3058] - [WorkflowInstance-10245][TaskInstance-80134] - Begin to release task group queue, taskGroupId: 10
   [DEBUG] 2023-03-13 11:37:23.426 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: select id, task_id, task_name, group_id, process_id, priority, status , force_start , in_queue, create_time, update_time from t_ds_task_group_queue where task_id = ?
   [DEBUG] 2023-03-13 11:37:23.427 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 80134(Integer)
   [DEBUG] 2023-03-13 11:37:23.429 +0800 org.apache.dolphinscheduler.remote.handler.NettyClientHandler:[191] - [WorkflowInstance-0][TaskInstance-0] - Client send heart beat to: 172.16.10.205:1234
   [DEBUG] 2023-03-13 11:37:23.430 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryByTaskId:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==      Total: 1
   [DEBUG] 2023-03-13 11:37:23.431 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: UPDATE t_ds_task_group_queue SET task_id=?, task_name=?, group_id=?, process_id=?, priority=?, force_start=?, in_queue=?, status=?, create_time=?, update_time=? WHERE id=?
   [DEBUG] 2023-03-13 11:37:23.431 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 80134(Integer), report_xxx(String), 10(Integer), 10245(Integer), 0(Integer), 0(Integer), 0(Integer), 2(Integer), 2023-03-13 11:37:00.0(Timestamp), 2023-03-13 11:37:23.431(Timestamp), 2175(Integer)
   [DEBUG] 2023-03-13 11:37:23.437 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateById:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==    Updates:
   [DEBUG] 2023-03-13 11:37:23.437 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryTheHighestPriorityTasks:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: select id, task_id, task_name, group_id, process_id, priority, status , force_start , in_queue, create_time, update_time from t_ds_task_group_queue where priority = (select max(priority) from t_ds_task_group_queue where group_id = ? and status = ? and in_queue = ? and force_start = ? ) and group_id = ? and status = ? and in_queue = ? and force_start = ? limit 1
   [DEBUG] 2023-03-13 11:37:23.437 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryTheHighestPriorityTasks:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 10(Integer), -1(Integer), 0(Integer), 0(Integer), 10(Integer), -1(Integer), 0(Integer), 0(Integer)
   [DEBUG] 2023-03-13 11:37:23.442 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.queryTheHighestPriorityTasks:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==      Total: 1
   [DEBUG] 2023-03-13 11:37:23.442 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateInQueueCAS:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==>  Preparing: update t_ds_task_group_queue set in_queue = ? where id = ? and in_queue = ?
   [DEBUG] 2023-03-13 11:37:23.443 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateInQueueCAS:[137] - [WorkflowInstance-10245][TaskInstance-80134] - ==> Parameters: 1(Integer), 2178(Integer), 0(Integer)
   [DEBUG] 2023-03-13 11:37:23.448 +0800 org.apache.dolphinscheduler.dao.mapper.TaskGroupQueueMapper.updateInQueueCAS:[137] - [WorkflowInstance-10245][TaskInstance-80134] - <==    Updates: 1
   [INFO] 2023-03-13 11:37:23.448 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[3073] - [WorkflowInstance-10245][TaskInstance-80134] - Finished to release task group queue: taskGroupId: 10, taskGroupQueueId: 2178
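
The statement sequence in the log above can be replayed in isolation to see which rows each step touches. This is a minimal sketch, not the project's code: it assumes a reduced schema in an in-memory SQLite database instead of the real metadata store, and the `task_id` 99999 for row 2178 is made up. All other ids and parameters are copied from the log, and the meaning of the `status` values is not stated in the report. It only shows that the final `updateInQueueCAS` targets whichever row `queryTheHighestPriorityTasks` returns (2178 here), not the row being released (2175); it does not establish the root cause.

```python
import sqlite3


def replay_release():
    """Replay the release statements from the log against a reduced schema."""
    conn = sqlite3.connect(":memory:")
    # Reduced, assumed schema: only the columns the logged statements filter on.
    conn.executescript(
        """
        CREATE TABLE t_ds_task_group (id INTEGER PRIMARY KEY, use_size INTEGER);
        CREATE TABLE t_ds_task_group_queue (
            id INTEGER PRIMARY KEY, task_id INTEGER, group_id INTEGER,
            priority INTEGER, status INTEGER, force_start INTEGER, in_queue INTEGER
        );
        INSERT INTO t_ds_task_group VALUES (10, 1);
        INSERT INTO t_ds_task_group_queue VALUES (2175, 80134, 10, 0, 1, 0, 0);
        INSERT INTO t_ds_task_group_queue VALUES (2178, 99999, 10, 0, -1, 0, 0);
        """
    )

    # Step 1 (releaseTaskGroupResource): parameters 10, 2175, 1 as logged.
    released = conn.execute(
        "UPDATE t_ds_task_group SET use_size = use_size - 1 "
        "WHERE id = ? AND use_size > 0 AND "
        "(SELECT count(1) FROM t_ds_task_group_queue WHERE id = ? AND status = ?) = 1",
        (10, 2175, 1),
    ).rowcount

    # Step 2 (updateById, reduced): row 2175 gets in_queue=0, status=2.
    conn.execute(
        "UPDATE t_ds_task_group_queue SET in_queue = ?, status = ? WHERE id = ?",
        (0, 2, 2175),
    )

    # Step 3 (queryTheHighestPriorityTasks): parameters 10, -1, 0, 0 twice.
    next_id = conn.execute(
        "SELECT id FROM t_ds_task_group_queue WHERE priority = "
        "(SELECT max(priority) FROM t_ds_task_group_queue "
        " WHERE group_id = ? AND status = ? AND in_queue = ? AND force_start = ?) "
        "AND group_id = ? AND status = ? AND in_queue = ? AND force_start = ? LIMIT 1",
        (10, -1, 0, 0) * 2,
    ).fetchone()[0]

    # Step 4 (updateInQueueCAS): compare-and-set in_queue from 0 to 1.
    cas_updates = conn.execute(
        "UPDATE t_ds_task_group_queue SET in_queue = ? WHERE id = ? AND in_queue = ?",
        (1, next_id, 0),
    ).rowcount

    rows = {
        row_id: (status, in_queue)
        for row_id, status, in_queue in conn.execute(
            "SELECT id, status, in_queue FROM t_ds_task_group_queue ORDER BY id"
        )
    }
    use_size = conn.execute(
        "SELECT use_size FROM t_ds_task_group WHERE id = 10"
    ).fetchone()[0]
    conn.close()
    return {
        "released": released,
        "next_queue_id": next_id,
        "cas_updates": cas_updates,
        "rows": rows,
        "use_size": use_size,
    }


if __name__ == "__main__":
    print(replay_release())
```

In this replay, row 2178 ends with `in_queue = 1` while its `status` is still -1. If nothing later picks that row up, it would stay in the state described under "What happened".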
   
   ### Version
   
   3.1.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] Sky-Gu commented on issue #13726: [Bug] [TaskGroupQueue] Task group release data dirty write

Posted by "Sky-Gu (via GitHub)" <gi...@apache.org>.
Sky-Gu commented on issue #13726:
URL: https://github.com/apache/dolphinscheduler/issues/13726#issuecomment-1607209287

   > @Sky-Gu Hello Sky, we ran into the same problem. How did you solve it? Also, why was this bug closed? Has it been fixed?
   
   The system closed the bug automatically.
   
   I didn't fix the bug because it is hard to reproduce.
   
   My current workaround is to build the workflow from serial tasks.
   
   For example, if the task group has 5 resources, I enable 10 parallel tasks in the workflow.
    
   When tasks have upstream and downstream dependencies, they are not all submitted to the master node at once, and the task group's resource setting is also taken into account.
   
   The upstream and downstream assignment of tasks is handled through the API.
    
   
    
   




[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #13726: [Bug] [TaskGroupQueue] Task group release data dirty write

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #13726:
URL: https://github.com/apache/dolphinscheduler/issues/13726#issuecomment-1465563381

   Thank you for your feedback. We have received your issue; please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information, version or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`




[GitHub] [dolphinscheduler] dragonliubei83 commented on issue #13726: [Bug] [TaskGroupQueue] Task group release data dirty write

Posted by "dragonliubei83 (via GitHub)" <gi...@apache.org>.
dragonliubei83 commented on issue #13726:
URL: https://github.com/apache/dolphinscheduler/issues/13726#issuecomment-1607149879

   @Sky-Gu Hi Sky,
   we have run into the same issue. How did you solve it?
   Also, why did you close the bug? Is it fixed?




[GitHub] [dolphinscheduler] Sky-Gu closed issue #13726: [Bug] [TaskGroupQueue] Task group release data dirty write

Posted by "Sky-Gu (via GitHub)" <gi...@apache.org>.
Sky-Gu closed issue #13726: [Bug] [TaskGroupQueue] Task group release data dirty write   
URL: https://github.com/apache/dolphinscheduler/issues/13726

