Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/11 09:41:03 UTC

[GitHub] [airflow] atrbgithub edited a comment on issue #13808: Task incorrectly marked as orphaned when using 2 schedulers

atrbgithub edited a comment on issue #13808:
URL: https://github.com/apache/airflow/issues/13808#issuecomment-796580922


   @ephraimbuddy @dimberman @eejbyfeldt 
   
   We're seeing an issue similar to this, but in our case we never have two schedulers running. 
   
   We started seeing this after moving to 2.0.1. We have a dag which contains a subdag, and both tasks are launched using the k8s executor. The task in the subdag does nothing but sleep for an hour or so. While the task is running, killing the scheduler pod is fine: when the scheduler pod returns, the task continues to run.
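   
   To make the shape of the DAG concrete, here is a minimal sketch (not our real code; the builder/callable names are made up, and the Kubernetes executor is configured cluster-wide in airflow.cfg rather than in the DAG file). The dag and task ids mirror the ones in the logs further down:
   
   ```python
   import time
   from datetime import datetime
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   from airflow.operators.subdag import SubDagOperator
   
   PARENT_DAG_ID = "kubernetes_executor_worker_adhoc_dag"
   START_DATE = datetime(2021, 1, 1)
   
   
   def sleep_for_an_hour():
       # The subdag task does nothing but sleep, logging progress once a minute.
       for minute in range(60):
           print("I'm sleeping for %d/60 minutes" % minute)
           time.sleep(60)
   
   
   def build_subdag(parent_dag_id, child_task_id):
       # A subdag's dag_id must be "<parent_dag_id>.<child_task_id>".
       with DAG(
           dag_id="%s.%s" % (parent_dag_id, child_task_id),
           schedule_interval=None,
           start_date=START_DATE,
       ) as subdag:
           PythonOperator(
               task_id="kubernetes_executor_worker_adhoc_task",
               python_callable=sleep_for_an_hour,
           )
       return subdag
   
   
   with DAG(
       dag_id=PARENT_DAG_ID,
       schedule_interval=None,
       start_date=START_DATE,
   ) as dag:
       SubDagOperator(
           task_id="worker_pod_adhoc_job",
           subdag=build_subdag(PARENT_DAG_ID, "worker_pod_adhoc_job"),
       )
   ```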
   
   However, when we redeploy Airflow and the scheduler is recreated (with a new Docker image), we see that the task is terminated:
   
   ```
   [2021-03-11 08:29:32,491] {{logging_mixin.py:104}} INFO - I'm sleeping for 10/60 minutes
   [2021-03-11 08:30:32,501] {{logging_mixin.py:104}} INFO - I'm sleeping for 11/60 minutes
   [2021-03-11 08:30:55,688] {{local_task_job.py:187}} WARNING - State of this instance has been externally set to queued. Terminating instance.
   [2021-03-11 08:30:55,690] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 16
   [2021-03-11 08:30:55,691] {{taskinstance.py:1240}} ERROR - Received SIGTERM. Terminating subprocesses.
   [2021-03-11 08:30:55,719] {{taskinstance.py:1456}} ERROR - Task received SIGTERM signal
   ```
   
   This seems to be new behaviour in 2.0.1. The scheduler has the following logs:
   
   ```
   [2021-03-11 08:30:51,524] {scheduler_job.py:1898} INFO - Reset the following 2 orphaned TaskInstances:
           <TaskInstance: kubernetes_executor_worker_adhoc_dag.worker_pod_adhoc_job 2021-03-11 08:19:21.448002+00:00 [running]>
           <TaskInstance: kubernetes_executor_worker_adhoc_dag.worker_pod_adhoc_job.kubernetes_executor_worker_adhoc_task 2021-03-11 08:19:21.448002+00:00 [running]>
   ```
   
   ```
   [2021-03-11 08:30:51,646] {scheduler_job.py:1063} INFO - Setting the following tasks to queued state:
           <TaskInstance: kubernetes_executor_worker_adhoc_dag.worker_pod_adhoc_job 2021-03-11 08:19:21.448002+00:00 [scheduled]>
           <TaskInstance: kubernetes_executor_worker_adhoc_dag.worker_pod_adhoc_job.kubernetes_executor_worker_adhoc_task 2021-03-11 08:19:21.448002+00:00 [scheduled]>
   ```
   
   These relate to the task that was killed above. Is this expected? We're using the Recreate deployment strategy in k8s, which means the old scheduler is stopped before the new one is created; they are never running at the same time.
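   
   For what it's worth, my rough understanding (a paraphrase, not the actual scheduler code) is that the orphan check looks at the heartbeat of the scheduler job that queued the task rather than at the task's pod, so a scheduler that is down for longer than the health-check threshold makes its still-running tasks look orphaned even though the pods are healthy. Something like:
   
   ```python
   # Paraphrase of my understanding of the orphan check, NOT Airflow's real code.
   # The threshold name is borrowed from [scheduler] scheduler_health_check_threshold;
   # whether that is exactly what 2.0.1 uses here is an assumption on my part.
   from datetime import datetime, timedelta, timezone
   
   HEALTH_CHECK_THRESHOLD = timedelta(seconds=30)
   
   
   def owning_scheduler_seems_alive(latest_heartbeat):
       """A task looks orphaned once its scheduler's last heartbeat is this stale."""
       return datetime.now(timezone.utc) - latest_heartbeat < HEALTH_CHECK_THRESHOLD
   
   
   # During a redeploy the old scheduler stops heartbeating before the new one is up,
   # so its running task instances get reset to queued and the worker pods get SIGTERM.
   old_heartbeat = datetime.now(timezone.utc) - timedelta(minutes=2)
   print(owning_scheduler_seems_alive(old_heartbeat))  # False -> reset as orphaned
   ```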
   
   Is there perhaps a config option to prevent this from happening?
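   
   If such a knob exists I'd guess it lives in the [scheduler] section. These are the two options I've been looking at; I believe both exist in 2.0.1, but whether tuning them actually avoids the reset during a redeploy is exactly what I'm unsure about:
   
   ```python
   # Print what the running scheduler is configured with for the two options that
   # look relevant to the orphan check (just my guess at the relevant ones).
   from airflow.configuration import conf
   
   print(conf.getfloat("scheduler", "orphaned_tasks_check_interval"))   # how often the orphan check runs
   print(conf.getint("scheduler", "scheduler_health_check_threshold"))  # heartbeat staleness cutoff
   ```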
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org