You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/12/21 13:00:06 UTC

[GitHub] [dolphinscheduler] wcmolin opened a new issue, #13244: [Bug] [Master] Stopping a workflow does not update task status correctly

wcmolin opened a new issue, #13244:
URL: https://github.com/apache/dolphinscheduler/issues/13244

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Sometimes it happens that after stopping the workflow, the status of the killed task is still running
   
   <img width="1701" alt="image" src="https://user-images.githubusercontent.com/11606262/208908647-222ed3e5-5bff-420c-a141-a321d910bd6e.png">
   
   <img width="1176" alt="image" src="https://user-images.githubusercontent.com/11606262/208908697-eecec7d5-808e-4a23-8efa-12ded2473205.png">
   
   This is a screenshot of the task configuration:
   <img width="1100" alt="image" src="https://user-images.githubusercontent.com/11606262/208909286-7a0081e5-4c81-4da8-bd8d-4571f7b4f092.png">
   
   
   ### What you expected to happen
   
   Killed task status shows as killed
   
   ### How to reproduce
   
   The code must be changed to reproduce as much as possible.
   
   Modify line 175 of `org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService` and add the following code
   ```java
   if (null != taskResponsePersistThread) {
       if (taskResponsePersistThread.addEvent(taskResponseEvent))...    
   } else {
       logger.error("task response persist thread is null");
   }
   ```
   
   Modify line 200 of `org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService` and add the following code
   ```java
   while (Stopper.isRunning()) {
       try {
           eventHandler();
           // TimeUnit.MILLISECONDS.sleep(Constants.SLEEP_TIME_MILLIS);
           TimeUnit.MILLISECONDS.sleep(Constants.SLEEP_TIME_MILLIS * 10);
   ```
   
   The reason for this is to simulate another site to remove the instance in processInstanceMap.
   ```java
   if (taskResponsePersistThread.eventSize() == 0) {
       if (!processInstanceMap.containsKey(taskResponsePersistThread.getProcessInstanceId())) {
           processTaskResponseMap.remove(taskResponsePersistThread.getProcessInstanceId());
           logger.info("remove process instance: {}", taskResponsePersistThread.getProcessInstanceId());
       }
       continue;
   }
   ```
    This will cause the instance in `processTaskResponseMap` to be removed in the following code, resulting in `TaskResponsePersistThread` being empty and printing 
   ```
   task response persist thread is null
   ```
   
   **master log:**
   ```
   [INFO] 2022-12-21 20:44:27.372 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[52] - received command : CacheExpireCommand{CacheType=PROCESS_DEFINITION, cacheKey=7960963905984}
   [INFO] 2022-12-21 20:44:27.390 org.apache.dolphinscheduler.server.master.processor.CacheProcessor:[70] - cache evict, type:processDefinition, key:7960963905984
   [INFO] 2022-12-21 20:44:28.363 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - find command 8, slot:0 :
   [INFO] 2022-12-21 20:44:28.363 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 8, type: START_PROCESS
   [INFO] 2022-12-21 20:44:28.482 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 8 process 8 start...
   [INFO] 2022-12-21 20:44:28.650 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1370] - add task to stand by list, task name:sleep-20s, task id:0, task code:7960945419712
   [INFO] 2022-12-21 20:44:28.685 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : sleep-20s, instance id:8, state: RUNNING_EXECUTION
   [INFO] 2022-12-21 20:44:28.803 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:8 sleep-20s state:SUBMITTED_SUCCESS complete, instance id:8 state: RUNNING_EXECUTION  
   [INFO] 2022-12-21 20:44:28.823 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:8, task name:sleep-20s, process id:8
   [INFO] 2022-12-21 20:44:28.824 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1384] - remove task from stand by list, id: 8 name:sleep-20s
   [INFO] 2022-12-21 20:44:29.043 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=8, startTime=Wed Dec 21 20:44:29 CST 2022, host='192.168.20.62:1234', status=1, logPath='/Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log', executePath='/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8', processInstanceId='0'}
   [INFO] 2022-12-21 20:44:29.043 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=8, startTime=Wed Dec 21 20:44:29 CST 2022, host='192.168.20.62:1234', status=1, logPath='/Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log', executePath='/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8', processInstanceId='8'}
   [ERROR] 2022-12-21 20:44:29.044 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[176] - task response persist thread is null
   [INFO] 2022-12-21 20:44:33.368 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[226] - already exists handler process size:0
   [INFO] 2022-12-21 20:44:33.703 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[233] - persist events 8 succeeded.
   [INFO] 2022-12-21 20:44:42.226 org.apache.dolphinscheduler.server.master.processor.StateEventProcessor:[75] - received command : State Event :key: 8-0-8-0 type: PROCESS_STATE_CHANGE executeStatus: READY_STOP task instance id: 0 process instance id: 8 context: null
   [INFO] 2022-12-21 20:44:42.975 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 8 , events count:1
   [INFO] 2022-12-21 20:44:42.980 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-12-21 20:44:42.982 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[310] - process event: State Event :key: 8-0-8-0 type: PROCESS_STATE_CHANGE executeStatus: READY_STOP task instance id: 0 process instance id: 8 context: null
   [INFO] 2022-12-21 20:44:42.983 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[497] - process:8 state RUNNING_EXECUTION change to READY_STOP
   [INFO] 2022-12-21 20:44:43.013 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1422] - kill called on process instance id: 8, num: 1
   [INFO] 2022-12-21 20:44:43.094 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[209] - master kill taskInstance name :sleep-20s taskInstance id:8
   [INFO] 2022-12-21 20:44:43.095 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[310] - process event: State Event :key: null type: PROCESS_STATE_CHANGE executeStatus: STOP task instance id: 0 process instance id: 8 context: null
   [INFO] 2022-12-21 20:44:43.095 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[497] - process:8 state READY_STOP change to STOP
   [INFO] 2022-12-21 20:44:43.107 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1320] - work flow process instance [id: 8, name:test_task_status_error-2-20221221204428415], state change from READY_STOP to STOP, cmd type: STOP
   [INFO] 2022-12-21 20:44:43.307 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[139] - process instance 8 finished.
   [INFO] 2022-12-21 20:44:43.385 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[222] - remove process instance: 8
   [INFO] 2022-12-21 20:44:43.610 org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=8, processInstanceId=8, status=9, startTime=null, endTime=Wed Dec 21 20:44:43 CST 2022, host=null, logPath=null, executePath=null, processId=2407, appIds='', varPool=[]}
   [ERROR] 2022-12-21 20:44:43.613 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[176] - task response persist thread is null
   
   ```
   
   **worker log:**
   ```
   [INFO] 2022-12-21 20:44:28.882 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[110] - received command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":8,"taskName":"sleep-20s","firstSubmitTime":"2022-12-21 20:44:28","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":7960963905984,"processDefineVersion":2,"appIds":null,"processInstanceId":8,"scheduleTime":null,"globalParams":null,"executorId":1,"cmdTypeIfComplement":0,"tenantCode":"molin","queue":"default","projectCode":7960944102080,"taskParams":"{\"resourceList\":[],\"localParams\":[],\"rawScript\":\"echo \\\"start\\\"\\nsleep 20\\necho \\\"end\\\"\",\"dependence\":{},\"conditionResult\":{\"successNode\":[],\"failedNode\":[]},\"waitStartTimeout\":{},\"switchResult\":{}}","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":21
 47483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":null,"resources":{},"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":null},"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
   [INFO] 2022-12-21 20:44:28.919 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8
   [INFO] 2022-12-21 20:44:28.921 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task instance local execute path : /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8
   [INFO] 2022-12-21 20:44:29.038 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the task begins to execute. task instance id: 8
   [INFO] 2022-12-21 20:44:29.075 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[83] - shell task params {"resourceList":[],"localParams":[],"rawScript":"echo \"start\"\nsleep 20\necho \"end\"","dependence":{},"conditionResult":{"successNode":[],"failedNode":[]},"waitStartTimeout":{},"switchResult":{}}
   [INFO] 2022-12-21 20:44:29.098 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[137] - raw script : echo "start"
   sleep 20
   echo "end"
   [INFO] 2022-12-21 20:44:29.099 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute path : /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8
   [INFO] 2022-12-21 20:44:29.100 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[86] - tenantCode user:molin, task dir:8_8
   [INFO] 2022-12-21 20:44:29.100 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[91] - create command file:/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8/8_8.command
   [INFO] 2022-12-21 20:44:29.101 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[117] - command : #!/bin/sh
   BASEDIR=$(cd `dirname $0`; pwd)
   cd $BASEDIR
   source /etc/profile
   /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8/8_8_node.sh
   [INFO] 2022-12-21 20:44:29.115 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[287] - task run command: sudo -u molin sh /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8/8_8.command
   [INFO] 2022-12-21 20:44:29.117 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start, process id is: 2407
   [INFO] 2022-12-21 20:44:30.121 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[66] -  -> welcome to use bigdata scheduling system...
   	start
   [INFO] 2022-12-21 20:44:43.101 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[96] - received kill command : TaskKillRequestCommand{taskInstanceId=8}
   [INFO] 2022-12-21 20:44:43.550 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[145] - process id:2407, cmd:sudo -u molin kill -9 02408 02414 02415
   [INFO] 2022-12-21 20:44:43.598 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8, processId:2407 ,exitStatusCode:137 ,processWaitForStatus:true ,processExitValue:137
   [INFO] 2022-12-21 20:44:43.599 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[206] - task instance id : 8,task final status : KILL
   [INFO] 2022-12-21 20:44:43.602 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[264] - develop mode is: false
   [INFO] 2022-12-21 20:44:43.608 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[282] - exec local path: /tmp/dolphinscheduler/exec/process/7960944102080/7960963905984_2/8/8 cleared.
   [INFO] 2022-12-21 20:44:43.617 org.apache.dolphinscheduler.server.worker.processor.TaskKillProcessor:[194] - view log host : 192.168.20.62,logPath : /Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log
   [INFO] 2022-12-21 20:44:43.617 org.apache.dolphinscheduler.service.log.LogClientService:[117] - view log path /Users/molin/dev/github/dolphinscheduler/logs/7960963905984_2/8/8.log
   [INFO] 2022-12-21 20:44:43.623 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   [INFO] 2022-12-21 20:44:43.630 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
   [INFO] 2022-12-21 20:44:44.149 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[60] - FINALIZE_SESSION
   ```
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   2.0.x
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina closed issue #13244: [Bug] [Master] Stopping a workflow does not update task status correctly

Posted by GitBox <gi...@apache.org>.
JinyLeeChina closed issue #13244: [Bug] [Master] Stopping a workflow does not update task status correctly
URL: https://github.com/apache/dolphinscheduler/issues/13244


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #13244: [Bug] [Master] Stopping a workflow does not update task status correctly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #13244:
URL: https://github.com/apache/dolphinscheduler/issues/13244#issuecomment-1361284597

   Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] wcmolin commented on issue #13244: [Bug] [Master] Stopping a workflow does not update task status correctly

Posted by GitBox <gi...@apache.org>.
wcmolin commented on issue #13244:
URL: https://github.com/apache/dolphinscheduler/issues/13244#issuecomment-1361285985

   @JinyLeeChina @zwZjut can you help me to look at this problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org