You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@dolphinscheduler.apache.org by GitBox <gi...@apache.org> on 2022/08/01 06:08:45 UTC

[GitHub] [dolphinscheduler] MarsByteso0 opened a new issue, #11227: [Bug] [Module Name] Bug title

MarsByteso0 opened a new issue, #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### What happened
   
   Process Instance is in RUNNING status for a long time.  
   
   ### What you expected to happen
   
   Process Instance  should be SUCCESS 
   
   ### How to reproduce
   
   I create a process use python .    python code is "print(1)" .   
   
   ### Anything else
   
   _No response_
   
   ### Version
   
   2.0.6
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1234936271

   This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] danielfree commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
danielfree commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1241441474

   @SbloodyS correct me if I'm wrong, from https://github.com/apache/dolphinscheduler/blob/2.0.6-release/dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/master/runner/FailoverExecuteThread.java#L95
   seems the master server will always failover itself ? why is that?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1272936766

   Judging from the log, it is found that the readyToSubmitTaskQueue will not be removed after the task succeeds, which is impossible in the 206 code. Please redeploy the code to try.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] MarsByteso0 commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
MarsByteso0 commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1200799376

   ![image](https://user-images.githubusercontent.com/109576142/182091908-f42c8bf3-8106-48ac-8023-adf802218a28.png)
   ![image](https://user-images.githubusercontent.com/109576142/182091929-4475442c-c033-4727-9d66-1b118d1ab5e6.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] SbloodyS commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1249016201

   > thx for your reply, but this is running in k8s, there's no yarn env by default
   
   You can take a look at the `NOTE` part in the document. The unfriendly thing is that there is no switch to control the start and stop of this function.
   https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/task/shell.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] CainGao commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
CainGao commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1201937897

   I don't know how to reproduce this... 
   It appears occasionally... 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] danielfree commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
danielfree commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1249012948

   > The root cause for master failover is encountered NPE. It can be seen that the kill yarn job failed from your log. Please check whether the configuration related to the yarn rest api is correct and whether the yarn service is running normally. @danielfree
   
   thx for your reply, but this is running in k8s, there's no yarn env by default


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] SbloodyS commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1249010415

   The root cause for master failover is encountered NPE. It can be seen that the kill yarn job failed from your log. Please check whether the configuration related to the yarn rest api is correct and whether the yarn service is running normally. @danielfree 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] danielfree commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
danielfree commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1241413973

   @SbloodyS  I've seen similar issues in 2.0.6,  from the log it will show PROCESS_TIMEOUT. I suspect it might be related with un-necessary failover info appeared in the log.
   
   instance id 153:
   
   [INFO] 2022-09-08 23:30:00.543 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 154, type: SCHEDULER
   [INFO] 2022-09-08 23:30:00.546 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 154 process 153 start...
   [INFO] 2022-09-08 23:30:00.551 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:ssh-check, task id:0, task code:6565890718688
   [INFO] 2022-09-08 23:30:00.553 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : ssh-check, instance id:153, state: RUNNING_EXECUTION
   ....
   ...
   [INFO] 2022-09-08 23:31:47.712 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 507 process instance id: 153 context: null
   [INFO] 2022-09-08 23:31:47.713 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 153 task 507 state:SUCCESS
   **[INFO] 2022-09-08 23:40:27.862 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started**
   [INFO] 2022-09-08 23:40:27.865 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-08 23:40:27.869 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   [INFO] 2022-09-08 23:40:27.871 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms
   [INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 153 , events count:1
   [INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   **[INFO] 2022-09-08 23:45:04.780 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event : key : null type: PROCESS_TIMEOUT executeStatus: null task instance id: 0 process instance id: 153 context: null**
   [INFO] 2022-09-08 23:50:27.872 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   [INFO] 2022-09-08 23:50:27.875 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-08 23:50:27.879 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   [INFO] 2022-09-08 23:50:27.881 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:4ms
   
   I'm deploying ds in k8s with two masters, my question is why this 'need failover' happens a lot? maybe related with some configs in zookeeper?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] github-actions[bot] commented on issue #11227: [Bug] [Module Name] Bug title

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1200752985

   Thank you for your feedback, we have received your issue, Please wait patiently for a reply.
   * In order for us to understand your request as soon as possible, please provide detailed information、version or pictures.
   * If you haven't received a reply for a long time, you can [join our slack](https://s.apache.org/dolphinscheduler-slack) and send your question to channel `#troubleshooting`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] SbloodyS commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
SbloodyS commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1200827462

   Could you provide the steps to reproduce this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] danielfree commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
danielfree commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1249064275

   > > thx for your reply, but this is running in k8s, there's no yarn env by default
   > 
   > You can take a look at the `NOTE` part in the document. The unfriendly thing is that there is no switch to control the start and stop of this function. https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/task/shell.html
   
   interesting, I don't have any logs like `application_xxx_xxx`, the error traces back to https://github.com/apache/dolphinscheduler/blob/2.0.6-release/dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/utils/ProcessUtils.java#L197


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1371749922

   It cannot reproduce in the 207 code, I close it. If someone can reproduce, please submit  new issue to discuss.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1259031353

   > > I think this has nothing to do with the master failover. Can you provide the logs of the instance id 153 of the worker
   > 
   > Here's another case with my running installation:
   > 
   > instance id: 211 task instance id: 672 logs from worker:
   > 
   > ```
   > [INFO] 2022-09-16 07:50:01.165 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[110] - received command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":672,"taskName":"flightradar_top_routes","firstSubmitTime":"2022-09-16 07:50:01","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":6629036562401,"processDefineVersion":3,"appIds":null,"processInstanceId":211,"scheduleTime":"2022-09-16 07:50:00","globalParams":null,"executorId":2,"cmdTypeIfComplement":6,"tenantCode":"aspex","queue":"default","projectCode":6565237764192,"taskParams":"***","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":null,"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":nu
 ll},"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
   > [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task instance local execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the task begins to execute. task instance id: 672
   > [INFO] 2022-09-16 07:50:01.265 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[60] - FINALIZE_SESSION
   > ...
   > [INFO] 2022-09-16 07:50:01.287 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:50:01.291 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start, process id is: 24435
   > 07:50:42.439 [pool-2-thread-4] INFO org.apache.dolphinscheduler.server.log.LoggerRequestProcessor - received command : Command [type=VIEW_WHOLE_LOG_REQUEST, opaque=695701, genCommandTimeMillis=0, bodyLen=82]
   > [INFO] 2022-09-16 07:53:58.443 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672, processId:24435 ,exitStatusCode:0 ,processWaitForStatus:true ,processExitValue:0
   > [INFO] 2022-09-16 07:53:58.444 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[206] - task instance id : 672,task final status : SUCCESS
   > [INFO] 2022-09-16 07:53:58.444 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[264] - develop mode is: false
   > [INFO] 2022-09-16 07:53:58.445 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[282] - exec local path: /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672 cleared.
   > [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task instance local execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   > [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the task begins to execute. task instance id: 672
   > ```
   > 
   > Note that at **[INFO] 2022-09-16 07:53:59.196** the task instance 672 is executed again after a SUCCESS run
   > 
   > master logs:
   > 
   > ```
   > [INFO] 2022-09-16 07:50:00.012 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - find command 212, slot:0 :
   > [INFO] 2022-09-16 07:50:00.013 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 212, type: SCHEDULER
   > [INFO] 2022-09-16 07:50:00.016 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 212 process 211 start...
   > [INFO] 2022-09-16 07:50:00.025 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:ssh-check, task id:0, task code:6629036562272
   > [INFO] 2022-09-16 07:50:00.027 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : ssh-check, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:50:00.031 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:671 ssh-check state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION  
   > [INFO] 2022-09-16 07:50:00.037 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:671, task name:ssh-check, process id:211
   > [INFO] 2022-09-16 07:50:00.039 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - remove task from stand by list, id: 671 name:ssh-check
   > [INFO] 2022-09-16 07:50:00.042 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671', processInstanceId='0'}
   > [INFO] 2022-09-16 07:50:00.042 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671', processInstanceId='211'}
   > [INFO] 2022-09-16 07:50:00.964 org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=671, processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:50:00 CST 2022, host=null, logPath=null, executePath=null, processId=24426, appIds='', varPool=[]}
   > [INFO] 2022-09-16 07:50:01.022 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:01.026 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:50:01.144 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:50:01.145 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:01.145 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 671 process instance id: 211 context: null
   > [INFO] 2022-09-16 07:50:01.146 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 671 state:SUCCESS 
   > [INFO] 2022-09-16 07:50:01.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:flightradar_top_routes, task id:0, task code:6629036562400
   > [INFO] 2022-09-16 07:50:01.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - taskCode: 6629036562400 completeDependTaskList: [6629036562272]
   > [INFO] 2022-09-16 07:50:01.151 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:50:01.153 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION  
   > [INFO] 2022-09-16 07:50:01.159 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:672, task name:flightradar_top_routes, process id:211
   > [INFO] 2022-09-16 07:50:01.161 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - remove task from stand by list, id: 672 name:flightradar_top_routes
   > [INFO] 2022-09-16 07:50:01.168 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672', processInstanceId='0'}
   > [INFO] 2022-09-16 07:50:01.168 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672', processInstanceId='211'}
   > [INFO] 2022-09-16 07:50:02.022 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:02.025 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:50:41.424 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   > [INFO] 2022-09-16 07:50:41.426 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   > [INFO] 2022-09-16 07:50:41.429 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   > [INFO] 2022-09-16 07:50:41.431 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - failover task instance id: 672, process instance id: 211
   > [INFO] 2022-09-16 07:50:41.424 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   > [INFO] 2022-09-16 07:50:41.426 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   > [INFO] 2022-09-16 07:50:41.429 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   > **[INFO] 2022-09-16 07:50:41.431** org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - failover task instance id: 672, process instance id: 211
   > [INFO] 2022-09-16 07:50:42.432 org.apache.dolphinscheduler.service.log.LogClientService:[117] - view log path /opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log
   > [INFO] 2022-09-16 07:50:42.441 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   > [INFO] 2022-09-16 07:50:42.445 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
   > [ERROR] 2022-09-16 07:50:42.445 org.apache.dolphinscheduler.server.utils.ProcessUtils:[211] - kill yarn job failure
   > java.lang.NullPointerException: null
   > 	at org.apache.dolphinscheduler.server.utils.ProcessUtils.killYarnJob(ProcessUtils.java:197)
   > 	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverTaskInstance(MasterRegistryClient.java:490)
   > 	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverMaster(MasterRegistryClient.java:443)
   > 	at org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread.run(FailoverExecuteThread.java:80)
   > [INFO] 2022-09-16 07:50:42.447 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:1019ms
   > [INFO] 2022-09-16 07:50:43.147 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:50:43.148 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   > [INFO] 2022-09-16 07:50:43.148 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: NEED_FAULT_TOLERANCE task instance id: 672 process instance id: 211 context: null
   > [INFO] 2022-09-16 07:50:43.149 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 672 state:NEED_FAULT_TOLERANCE 
   > [INFO] 2022-09-16 07:50:43.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:flightradar_top_routes, task id:672, task code:6629036562400
   > **[INFO] 2022-09-16 07:53:58.445** org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=672, processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:53:58 CST 2022, host=null, logPath=null, executePath=null, processId=24435, appIds='', varPool=[]}
   > **[INFO] 2022-09-16 07:53:59.041** org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   > [INFO] 2022-09-16 07:53:59.044 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   > [INFO] 2022-09-16 07:53:59.168 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   > [INFO] 2022-09-16 07:53:59.168 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   > **[INFO] 2022-09-16 07:53:59.168** org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 672 process instance id: 211 context: null
   > [INFO] 2022-09-16 07:53:59.169 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 672 state:SUCCESS 
   > [INFO] 2022-09-16 07:53:59.172 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - taskCode: 6629036562400 completeDependTaskList: [6629036562272, 6629036562400]
   > [INFO] 2022-09-16 07:53:59.173 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   > [INFO] 2022-09-16 07:53:59.175 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION
   > ```
   > 
   > Note at **[INFO] 2022-09-16 07:50:41.431** the master somehow starts to failover with running task 672 in worker, and in **[INFO] 2022-09-16 07:53:59.168** the task status is SUCCESS but the master submits the task again in **[INFO] 2022-09-16 07:53:59.173**
   
   The log taskInstance id 672 does not end


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina closed issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
JinyLeeChina closed issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.
URL: https://github.com/apache/dolphinscheduler/issues/11227


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] danielfree commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
danielfree commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1248837649

   > I think this has nothing to do with the master failover. Can you provide the logs of the instance id 153 of the worker
   
   Here's another case with my running installation:
   
   instance id: 211
   task instance id: 672
   logs from worker:
   ```
   [INFO] 2022-09-16 07:50:01.165 org.apache.dolphinscheduler.server.worker.processor.TaskExecuteProcessor:[110] - received command : TaskExecuteRequestCommand{taskExecutionContext='{"taskInstanceId":672,"taskName":"flightradar_top_routes","firstSubmitTime":"2022-09-16 07:50:01","startTime":null,"endTime":null,"taskType":"SHELL","host":null,"executePath":null,"logPath":null,"taskJson":null,"processId":0,"processDefineCode":6629036562401,"processDefineVersion":3,"appIds":null,"processInstanceId":211,"scheduleTime":"2022-09-16 07:50:00","globalParams":null,"executorId":2,"cmdTypeIfComplement":6,"tenantCode":"aspex","queue":"default","projectCode":6565237764192,"taskParams":"***","envFile":null,"environmentConfig":null,"definedParams":null,"taskAppId":null,"taskTimeoutStrategy":null,"taskTimeout":2147483647,"workerGroup":"default","delayTime":0,"currentExecutionStatus":null,"sqlTaskExecutionContext":{"warningGroupId":0,"connectionParams":null,"udfFuncTenantCodeMap":null,"defaultFS":null
 },"dataxTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"dependenceTaskExecutionContext":null,"sqoopTaskExecutionContext":{"dataSourceId":0,"sourcetype":0,"sourceConnectionParams":null,"dataTargetId":0,"targetType":0,"targetConnectionParams":null},"varPool":null,"dryRun":0,"paramsMap":null,"procedureTaskExecutionContext":{"connectionParams":null}}'}
   [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task instance local execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   [INFO] 2022-09-16 07:50:01.167 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the task begins to execute. task instance id: 672
   [INFO] 2022-09-16 07:50:01.265 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[60] - FINALIZE_SESSION
   ...
   [INFO] 2022-09-16 07:50:01.287 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[138] - task execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   [INFO] 2022-09-16 07:50:01.291 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[178] - process start, process id is: 24435
   07:50:42.439 [pool-2-thread-4] INFO org.apache.dolphinscheduler.server.log.LoggerRequestProcessor - received command : Command [type=VIEW_WHOLE_LOG_REQUEST, opaque=695701, genCommandTimeMillis=0, bodyLen=82]
   [INFO] 2022-09-16 07:53:58.443 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.shell.ShellTask:[202] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672, processId:24435 ,exitStatusCode:0 ,processWaitForStatus:true ,processExitValue:0
   [INFO] 2022-09-16 07:53:58.444 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[206] - task instance id : 672,task final status : SUCCESS
   [INFO] 2022-09-16 07:53:58.444 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[264] - develop mode is: false
   [INFO] 2022-09-16 07:53:58.445 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[282] - exec local path: /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672 cleared.
   [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.common.utils.FileUtils:[121] - create dir success /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[128] - task instance local execute path : /tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672
   [INFO] 2022-09-16 07:53:59.196 org.apache.dolphinscheduler.server.worker.runner.TaskExecuteThread:[146] - the task begins to execute. task instance id: 672
   
   ```
   Note that at **[INFO] 2022-09-16 07:53:59.196** the task instance 672 is executed again after a SUCCESS run
   
   master logs:
   
   ```
   [INFO] 2022-09-16 07:50:00.012 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[255] - find command 212, slot:0 :
   [INFO] 2022-09-16 07:50:00.013 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[208] - find one command: id: 212, type: SCHEDULER
   [INFO] 2022-09-16 07:50:00.016 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerService:[221] - handle command end, command 212 process 211 start...
   [INFO] 2022-09-16 07:50:00.025 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:ssh-check, task id:0, task code:6629036562272
   [INFO] 2022-09-16 07:50:00.027 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : ssh-check, instance id:211, state: RUNNING_EXECUTION
   [INFO] 2022-09-16 07:50:00.031 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:671 ssh-check state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION  
   [INFO] 2022-09-16 07:50:00.033 org.apache.dolphinscheduler.service.process.ProcessService:[1584] - updated resource info {"id":2,"res":"ec2_aspexeng.pem","resourceName":"/pem/ec2_aspexeng.pem"}
   [INFO] 2022-09-16 07:50:00.037 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:671, task name:ssh-check, process id:211
   [INFO] 2022-09-16 07:50:00.039 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - remove task from stand by list, id: 671 name:ssh-check
   [INFO] 2022-09-16 07:50:00.042 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671', processInstanceId='0'}
   [INFO] 2022-09-16 07:50:00.042 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=671, startTime=Fri Sep 16 07:50:00 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/671.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/671', processInstanceId='211'}
   [INFO] 2022-09-16 07:50:00.964 org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=671, processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:50:00 CST 2022, host=null, logPath=null, executePath=null, processId=24426, appIds='', varPool=[]}
   [INFO] 2022-09-16 07:50:01.022 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   [INFO] 2022-09-16 07:50:01.026 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   [INFO] 2022-09-16 07:50:01.144 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   [INFO] 2022-09-16 07:50:01.145 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-09-16 07:50:01.145 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 671 process instance id: 211 context: null
   [INFO] 2022-09-16 07:50:01.146 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 671 state:SUCCESS 
   [INFO] 2022-09-16 07:50:01.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:flightradar_top_routes, task id:0, task code:6629036562400
   [INFO] 2022-09-16 07:50:01.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - taskCode: 6629036562400 completeDependTaskList: [6629036562272]
   [INFO] 2022-09-16 07:50:01.151 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   [INFO] 2022-09-16 07:50:01.153 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION  
   [INFO] 2022-09-16 07:50:01.156 org.apache.dolphinscheduler.service.process.ProcessService:[1584] - updated resource info {"id":2,"res":"ec2_aspexeng.pem","resourceName":"/pem/ec2_aspexeng.pem"}
   [INFO] 2022-09-16 07:50:01.159 org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[164] - master submit success, task id:672, task name:flightradar_top_routes, process id:211
   [INFO] 2022-09-16 07:50:01.161 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1374] - remove task from stand by list, id: 672 name:flightradar_top_routes
   [INFO] 2022-09-16 07:50:01.168 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672', processInstanceId='0'}
   [INFO] 2022-09-16 07:50:01.168 org.apache.dolphinscheduler.server.master.processor.TaskAckProcessor:[71] - taskAckCommand : TaskExecuteAckCommand{taskInstanceId=672, startTime=Fri Sep 16 07:50:01 CST 2022, host='dolphin-worker-1.dolphin-worker-headless:1234', status=1, logPath='/opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log', executePath='/tmp/dolphinscheduler/exec/process/6565237764192/6629036562401_3/211/672', processInstanceId='211'}
   [INFO] 2022-09-16 07:50:02.022 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   [INFO] 2022-09-16 07:50:02.025 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   [INFO] 2022-09-16 07:50:41.424 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   [INFO] 2022-09-16 07:50:41.426 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-16 07:50:41.429 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   [INFO] 2022-09-16 07:50:41.431 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - failover task instance id: 672, process instance id: 211
   [INFO] 2022-09-16 07:50:41.424 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[68] - failover execute started
   [INFO] 2022-09-16 07:50:41.426 org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread:[74] - need failover hosts:[dolphin-master-1.dolphin-master-headless:5678]
   [INFO] 2022-09-16 07:50:41.429 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[424] - start master[dolphin-master-1.dolphin-master-headless:5678] failover, process list size:2
   **[INFO] 2022-09-16 07:50:41.431** org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[442] - failover task instance id: 672, process instance id: 211
   [INFO] 2022-09-16 07:50:42.432 org.apache.dolphinscheduler.service.log.LogClientService:[117] - view log path /opt/apache-dolphinscheduler-2.0.6-bin/logs/6629036562401_3/211/672.log
   [INFO] 2022-09-16 07:50:42.441 org.apache.dolphinscheduler.remote.NettyRemotingClient:[390] - netty client closed
   [INFO] 2022-09-16 07:50:42.445 org.apache.dolphinscheduler.service.log.LogClientService:[74] - logger client closed
   [ERROR] 2022-09-16 07:50:42.445 org.apache.dolphinscheduler.server.utils.ProcessUtils:[211] - kill yarn job failure
   java.lang.NullPointerException: null
   	at org.apache.dolphinscheduler.server.utils.ProcessUtils.killYarnJob(ProcessUtils.java:197)
   	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverTaskInstance(MasterRegistryClient.java:490)
   	at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverMaster(MasterRegistryClient.java:443)
   	at org.apache.dolphinscheduler.server.master.runner.FailoverExecuteThread.run(FailoverExecuteThread.java:80)
   [INFO] 2022-09-16 07:50:42.447 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[456] - master[dolphin-master-1.dolphin-master-headless:5678] failover end, useTime:1019ms
   [INFO] 2022-09-16 07:50:43.147 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   [INFO] 2022-09-16 07:50:43.148 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   [INFO] 2022-09-16 07:50:43.148 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: NEED_FAULT_TOLERANCE task instance id: 672 process instance id: 211 context: null
   [INFO] 2022-09-16 07:50:43.149 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 672 state:NEED_FAULT_TOLERANCE 
   [INFO] 2022-09-16 07:50:43.150 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1360] - add task to stand by list, task name:flightradar_top_routes, task id:672, task code:6629036562400
   **[INFO] 2022-09-16 07:53:58.445** org.apache.dolphinscheduler.server.master.processor.TaskResponseProcessor:[72] - received command : TaskExecuteResponseCommand{taskInstanceId=672, processInstanceId=211, status=7, startTime=null, endTime=Fri Sep 16 07:53:58 CST 2022, host=null, logPath=null, executePath=null, processId=24435, appIds='', varPool=[]}
   **[INFO] 2022-09-16 07:53:59.041** org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[223] - already exists handler process size:0
   [INFO] 2022-09-16 07:53:59.044 org.apache.dolphinscheduler.server.master.processor.queue.TaskResponseService:[230] - persist events 211 succeeded.
   [INFO] 2022-09-16 07:53:59.168 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[127] - handle process instance : 211 , events count:1
   [INFO] 2022-09-16 07:53:59.168 org.apache.dolphinscheduler.server.master.runner.EventExecuteService:[130] - already exists handler process size:0
   **[INFO] 2022-09-16 07:53:59.168** org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[308] - process event: State Event :key: null type: TASK_STATE_CHANGE executeStatus: SUCCESS task instance id: 672 process instance id: 211 context: null
   [INFO] 2022-09-16 07:53:59.169 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[422] - work flow 211 task 672 state:SUCCESS 
   [INFO] 2022-09-16 07:53:59.172 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThread:[1033] - taskCode: 6629036562400 completeDependTaskList: [6629036562272, 6629036562400]
   [INFO] 2022-09-16 07:53:59.173 org.apache.dolphinscheduler.service.process.ProcessService:[1088] - start submit task : flightradar_top_routes, instance id:211, state: RUNNING_EXECUTION
   [INFO] 2022-09-16 07:53:59.175 org.apache.dolphinscheduler.service.process.ProcessService:[1101] - end submit task to db successfully:672 flightradar_top_routes state:SUBMITTED_SUCCESS complete, instance id:211 state: RUNNING_EXECUTION
   ```
   Note at **[INFO] 2022-09-16 07:50:41.431**  the master somehow starts to failover with running task 672 in worker, and in **[INFO] 2022-09-16 07:53:59.168** the task status is SUCCESS but the master submits the task again in **[INFO] 2022-09-16 07:53:59.173**
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [dolphinscheduler] JinyLeeChina commented on issue #11227: [Bug] [Module Name]Process Instance is in RUNNING status for a long time.

Posted by GitBox <gi...@apache.org>.
JinyLeeChina commented on issue #11227:
URL: https://github.com/apache/dolphinscheduler/issues/11227#issuecomment-1247755897

   I think this has nothing to do with the master failover. Can you provide the logs of the instance id 153 of the worker


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@dolphinscheduler.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org