Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/12 14:09:16 UTC

[GitHub] [airflow] larsjeh opened a new issue #14744: Task Instances in the "shutdown" state prevent the scheduler from scheduling new tasks

larsjeh opened a new issue #14744:
URL: https://github.com/apache/airflow/issues/14744


   
   **Apache Airflow version**: 2.0.1
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS ECS
   - **OS** (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
   - **Kernel** (e.g. `uname -a`):  Linux 4d41276694bd 4.14.62-65.117.amzn1.x86_64 #1 SMP Fri Aug 10 20:03:52 UTC 2018 x86_64 GNU/Linux
   - **Install tools**:
   - **Others**: Python 3.7
   
   **What happened**:
   This bug is comparable to #13151, but for the `shutdown` state instead of the `failed` state. The scheduler does not schedule new tasks for the DAG and reports:
   
   `DAG <dag_name> already has 1 active runs, not queuing any tasks for run 2020-12-17 08:05:00+00:00`
   
   A bit of digging revealed that this DAG had task instances associated with it that are in the shutdown state. As soon as I forced the task instances that are in the shutdown state into the failed state, the tasks would be scheduled.
   
   **What you expected to happen**:
   I expected the task instances in the DAG to be scheduled, because the DAG did not actually exceed `max_active_runs`.
   
   **How to reproduce it**:
   I think the best approach to reproduce it is as follows:
   
   - Create a DAG and set `max_active_runs` to 1.
   - Ensure the DAG has run successfully a number of times, so that it has some history associated with it.
   - Set one historical task instance to the shutdown state by directly updating it in the DB.
   
   
   **Anything else we need to know**:
   A workaround is to set the tasks to failed, which will allow the scheduler to proceed.
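
   Both the reproduction step and the workaround come down to a state update on task instance rows. The following is only a minimal sketch of that idea: it uses an in-memory SQLite table as a simplified stand-in for the metadata DB (the real `task_instance` table has many more columns), and the helper `move_task_instances`, the DAG id `example_dag` and the task id `extract` are hypothetical names chosen for illustration.

```python
import sqlite3

# Simplified stand-in for the `task_instance` table of the metadata DB.
# Assumption: only the columns needed for this illustration are modelled.
SCHEMA = """
CREATE TABLE task_instance (
    task_id TEXT NOT NULL,
    dag_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    state TEXT,
    PRIMARY KEY (task_id, dag_id, execution_date)
)
"""


def move_task_instances(conn, dag_id, from_state, to_state):
    """Move every task instance of `dag_id` from one state to another.

    Hypothetical helper; returns the number of rows that were changed.
    """
    cur = conn.execute(
        "UPDATE task_instance SET state = ? WHERE dag_id = ? AND state = ?",
        (to_state, dag_id, from_state),
    )
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("extract", "example_dag", "2020-12-16 08:05:00", "success"),
            ("extract", "example_dag", "2020-12-17 08:05:00", "success"),
        ],
    )
    # Reproduction step: force one historical task instance into "shutdown".
    conn.execute(
        "UPDATE task_instance SET state = 'shutdown' "
        "WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
        ("example_dag", "extract", "2020-12-16 08:05:00"),
    )
    # Workaround: move everything stuck in "shutdown" to "failed".
    changed = move_task_instances(conn, "example_dag", "shutdown", "failed")
    states = [
        row[0]
        for row in conn.execute(
            "SELECT state FROM task_instance ORDER BY execution_date"
        )
    ]
    return changed, states
```

   Against a real deployment the same `UPDATE` would have to run on the actual metadata database, so it is worth checking the affected rows with a `SELECT` first.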
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on issue #14744: Task Instances in the "shutdown" state prevent the scheduler from scheduling new tasks

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on issue #14744:
URL: https://github.com/apache/airflow/issues/14744#issuecomment-801089725


   Ah, I wonder if the "correct" fix is then instead to look for zombie tasks in the RUNNING or SHUTDOWN state, whereas currently it only looks for them in the RUNNING state
   
   https://github.com/apache/airflow/blob/2a2adb3f94cc165014d746102e12f9620f271391/airflow/utils/dag_processing.py#L1077-L1089
   
   I wonder if a related problem is that the LocalTaskJob is never cleaned up and is left in the Running state even though it isn't heartbeating anymore.
   
   @larsjeh Could you go to Browse -> Jobs and see if you have a number of "old" LocalTaskJobs in running?
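
   To make the idea concrete, here is a rough sketch of what such a check could look like. It is a simplified model and not the code in `dag_processing.py`: the `Job` and `TaskInstance` classes are hypothetical stand-ins for the database rows, and the `states` and `threshold` parameters are assumptions made for illustration. A task instance counts as a zombie when it is in one of the watched states and its job is either no longer running or has not sent a heartbeat within the threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


@dataclass
class Job:
    """Hypothetical stand-in for a LocalTaskJob row."""
    state: str
    latest_heartbeat: datetime


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance row."""
    task_id: str
    state: str
    job: Optional[Job]


def find_zombies(
    task_instances: Sequence[TaskInstance],
    now: datetime,
    threshold: timedelta,
    states: Sequence[str] = ("running", "shutdown"),
) -> List[TaskInstance]:
    """Return task instances whose job is dead or has stopped heartbeating."""
    limit = now - threshold
    zombies = []
    for ti in task_instances:
        # Only consider task instances in a watched state that have a job.
        if ti.state not in states or ti.job is None:
            continue
        job_dead = ti.job.state != "running"
        heartbeat_stale = ti.job.latest_heartbeat < limit
        if job_dead or heartbeat_stale:
            zombies.append(ti)
    return zombies


def demo():
    now = datetime(2021, 3, 12, 14, 0, 0)
    recent = now - timedelta(seconds=10)
    old = now - timedelta(hours=1)
    tis = [
        TaskInstance("healthy", "running", Job("running", recent)),
        TaskInstance("stuck", "shutdown", Job("running", old)),
        TaskInstance("orphan", "running", Job("failed", recent)),
        TaskInstance("done", "success", Job("success", old)),
    ]
    # Five minutes is an arbitrary example threshold.
    threshold = timedelta(minutes=5)
    both = [ti.task_id for ti in find_zombies(tis, now, threshold)]
    running_only = [
        ti.task_id
        for ti in find_zombies(tis, now, threshold, states=("running",))
    ]
    return both, running_only
```

   With only `"running"` in `states`, the `stuck` task instance above is never picked up, which matches the behaviour described in this issue.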








[GitHub] [airflow] larsjeh commented on issue #14744: Task Instances in the "shutdown" state prevent the scheduler from scheduling new tasks

Posted by GitBox <gi...@apache.org>.
larsjeh commented on issue #14744:
URL: https://github.com/apache/airflow/issues/14744#issuecomment-800915354


   Thanks for your response @ashb! For some reason, the tasks remain in the shutdown state and are not failing automatically. 
   
   We have a setup with AWS ECS containers where we redeploy the containers after dependency changes etc. Smaller changes are synced over EFS. 
   
   I have a feeling that the tasks end up in the shutdown state during or after a new deployment. It appears that the orchestrator takes down the worker while it is processing a task, which puts the task into the shutdown state. Unfortunately, the tasks are not failed automatically afterward and they remain in the shutdown state. As a result, new DAG runs are not scheduled if `max_active_runs=1`.





[GitHub] [airflow] ashb commented on issue #14744: Task Instances in the "shutdown" state prevent the scheduler from scheduling new tasks

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #14744:
URL: https://github.com/apache/airflow/issues/14744#issuecomment-799553067


   How long were these tasks in the shutdown state? They should have automatically progressed to failed reasonably quickly.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #14744: Task Instances in the "shutdown" state prevent the scheduler from scheduling new tasks

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14744:
URL: https://github.com/apache/airflow/issues/14744#issuecomment-797512847


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

