You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/30 17:02:36 UTC

[GitHub] [airflow] Chris7 opened a new issue #21225: Tasks stuck in queued state

Chris7 opened a new issue #21225:
URL: https://github.com/apache/airflow/issues/21225


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   Tasks are stuck in the queued state and will not be scheduled for execution. In the logs, these tasks have a message of `could not queue task <task details>`, as they are currently in the `queued` or `running` sets in the executor.
   
   ### What you expected to happen
   
   Tasks run :)
   
   ### How to reproduce
   
   We have a dockerized airflow setup, using celery with a rabbit broker & postgres as the result db. When rolling out DAG updates, we redeploy most of the components (the workers, scheduler, webserver, and rabbit). We can have a few thousand Dagruns at a given time. This error seems to happen during a load spike when a deployment happens.
   
   Looking at the code, this is what I believe is happening:
   
   Starting from the initial debug message of `could not queue task` I found tasks are marked as running (but in the UI they still appear as queued): https://github.com/apache/airflow/blob/main/airflow/executors/base_executor.py#L85
   
   Tracking through our logs, I see these tasks are recovered by the adoption code, and the state there is STARTED (https://github.com/apache/airflow/blob/main/airflow/executors/celery_executor.py#L540). 
   
   Following the state update code, I see this does not cause any state updates to occur in Airflow (https://github.com/apache/airflow/blob/main/airflow/executors/celery_executor.py#L465). Thus, if a task is marked as STARTED in the results db, but queued in the airflow task state, it will never be transferred out by the scheduler. However ,you can get these tasks to finally run by clicking the `run` button. 
   
   ### Operating System
   
   Ubuntu 20.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   It's during load peaks, I believe this is a race condition with task execution and recording of task state.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] easontm edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
easontm edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1031361404


   I'm getting the same message but different symptoms on a similar setup. CeleryKubernetesExecutor, RabbitMQ, and MySQL backend. For me, the task gets marked failed with no logs. 
   
   A sensor in `reschedule` mode was doing it's periodic run-sleep-run pattern. On one of the later runs, I got the `could not queue task instance` message. I grabbed the `external_id` from the logs and searched for it in Flower, and it turns out the celery task actually succeeded (that is, performed its check -- the sensor itself didn't "succeed").
   
   I think some kind of desync between the states known by Airflow and the executor is causing this, but I'm not sure why the results of the successfully executed task weren't reported, even though Airflow thought it couldn't be queued (which it was).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] arkadiusz-bach edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
arkadiusz-bach edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1035656930


   I had similar problem, it was happening, because KubernetesExecutor is picking CeleryExecutor tasks on CeleryKubernetesExecutor
   
   Celery is changing task state to queued and KubernetesExecutor to scheduled, it is happening over and over again(depends how fast task gets to running state)
   
   
   I fixed it by adding additional filter on task queue(which defaults to 'kubernetes' ) for KubernetesExecutor in couple of places, below is probably the one that is causing most of the problems:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L455
   
   It is taking all of the tasks in queued state, but it should take only those that should run on Kubernetes - has queue equaled to: 
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/config_templates/default_airflow.cfg#L743
   
   That is why there is log entry:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/base_executor.py#L85
   
   Because celery gets the same task that is already inside queued_tasks
   
   
   If this is the case then you will see  following messages in the logs as well:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L494


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] easontm commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
easontm commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1031361404


   I'm getting the same message but different symptoms. For me, the task gets marked failed with no logs. Setup is similar, with CeleryKubernetesExecutor, RabbitMQ, and MySQL backend.
   
   A sensor in `reschedule` mode was doing it's periodic run-sleep-run pattern. On one of the later runs, I got the `could not queue task instance` message. I grabbed the `external_id` from the logs and searched for it in Flower, and it turns out the celery task actually succeeded (that is, performed its check -- the sensor itself didn't "succeed").
   
   I think some kind of desync between the states known by Airflow and the executor is causing this, but I'm not sure why the results of the successfully executed task weren't reported, even though Airflow thought it couldn't be queued (which it was).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] arkadiusz-bach commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
arkadiusz-bach commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1035656930


   I had similar problem, it was happening, because KubernetesExecutor is picking CeleryExecutor tasks on CeleryKubernetesExecutor
   
   Celery is changing task state to queued and KubernetesExecutor to scheduled, it is happening over and over again(depends how fast task gets to running state)
   
   
   I fixed it by adding additional filter on task queue(which default to 'kubernetes' ) for KubernetesExecutor in couple of places, below is probably the one that is causing most of the problems:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L455
   
   It is taking all of the tasks in queued state, but it should take only thoose that should run on Kubernetes - has queue equaled to: 
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/config_templates/default_airflow.cfg#L743'')


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1025185381


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] arkadiusz-bach edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
arkadiusz-bach edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1035656930


   I had similar problem, it was happening, because KubernetesExecutor is picking CeleryExecutor tasks on CeleryKubernetesExecutor
   
   Celery is changing task state to queued and KubernetesExecutor to scheduled, it is happening over and over again(depends how fast task gets to running state)
   
   
   I fixed it by adding additional filter on task queue(which default to 'kubernetes' ) for KubernetesExecutor in couple of places, below is probably the one that is causing most of the problems:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L455
   
   It is taking all of the tasks in queued state, but it should take only thoose that should run on Kubernetes - has queue equaled to: 
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/config_templates/default_airflow.cfg#L743
   
   That is why there is log entry:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/base_executor.py#L85
   
   Because celery gets the same task that is already inside queued_tasks
   
   
   If this is the case then you will see  following messages in the logs as well:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L494


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1031380601


   > I think this is being handled in one of the fixes coming to 2.2.4 (@ephraimbuddy ?)
   
   Yes. Here https://github.com/apache/airflow/pull/19769


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Chris7 edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
Chris7 edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1029060830


   This is not ideal, but for those who want to kick all their queued tasks out, here's a snippet i've been using:
   ```
   from airflow import models, settings
   from airflow.executors.executor_loader import ExecutorLoader
   
   session = settings.Session()
   tis = session.query(models.TaskInstance).filter(models.TaskInstance.state=='queued')
   dagbag = models.DagBag()
   for ti in tis:
     dag = dagbag.get_dag(ti.dag_id)
     task = dag.get_task(ti.task_id)
     ti.refresh_from_task(task)
     executor = ExecutorLoader.get_default_executor()
     executor.job_id = "manual"
     executor.start()
     executor.queue_task_instance(ti, ignore_all_deps=False, ignore_task_deps=False, ignore_ti_state=False)
     executor.heartbeat()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Chris7 commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
Chris7 commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1029060830


   This is not ideal, but for those who want to kick all their queued tasks out, here's a snippet i've been using:
   ```
   from airflow import models, settings
   session = settings.Session()
   
   from airflow.executors.executor_loader import ExecutorLoader
   tis = session.query(models.TaskInstance).filter(models.TaskInstance.state=='queued')
   dagbag = models.DagBag()
   for ti in tis:
     dag = dagbag.get_dag(ti.dag_id)
     task = dag.get_task(ti.task_id)
     ti.refresh_from_task(task)
     executor = ExecutorLoader.get_default_executor()
     executor.job_id = "manual"
     executor.start()
     executor.queue_task_instance(ti, ignore_all_deps=False, ignore_task_deps=False, ignore_ti_state=False)
     executor.heartbeat()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] easontm edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
easontm edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1031361404


   I'm getting the same message but different symptoms on a similar setup. CeleryKubernetesExecutor, RabbitMQ, and MySQL backend. For me, the task gets marked failed with no logs. 
   
   A sensor in `reschedule` mode was doing it's periodic run-sleep-run pattern. On one of the later runs, I got the `could not queue task instance` message. I grabbed the `external_id` from the logs and searched for it in Flower, and it turns out the celery task actually succeeded (that is, performed its check -- the sensor itself didn't "succeed"). However, the Airflow state was still recorded as failed.
   
   I think some kind of desync between the states known by Airflow and the executor is causing this, but I'm not sure why the results of the successfully executed task weren't reported, even though Airflow thought it couldn't be queued (which it was).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Chris7 edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
Chris7 edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1029060830


   This is not ideal, but for those who want to kick all their queued tasks to running, here's a snippet i've been using:
   ```
   from airflow import models, settings
   from airflow.executors.executor_loader import ExecutorLoader
   
   session = settings.Session()
   tis = session.query(models.TaskInstance).filter(models.TaskInstance.state=='queued')
   dagbag = models.DagBag()
   for ti in tis:
     dag = dagbag.get_dag(ti.dag_id)
     task = dag.get_task(ti.task_id)
     ti.refresh_from_task(task)
     executor = ExecutorLoader.get_default_executor()
     executor.job_id = "manual"
     executor.start()
     executor.queue_task_instance(ti, ignore_all_deps=False, ignore_task_deps=False, ignore_ti_state=False)
     executor.heartbeat()
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimon222 commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
dimon222 commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1025978020


   Plus one here, noticing this frequently after bump from 2.1.4 to 2.2.3


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dhananjays commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
dhananjays commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1025745012


   We're facing this as well after upgrading to `2.2.3` from `2.1.0`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1030900588


   I think this is being handled in one of the fixes coming to 2.2.4 (@ephraimbuddy ?) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Chris7 commented on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
Chris7 commented on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1031543120


   Thanks @ephraimbuddy ,
   
   Looking at the PR, I'm not sure if it will address this problem. Notably, tasks are added to the `running` set by the abandoned task code, which means this line will skip over all these forever queued tasks:
    https://github.com/apache/airflow/pull/19769/files#diff-ac6d6f745ae19450e4bfbd1087d865b5784294354c885136b97df437460d5f10R417


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] arkadiusz-bach edited a comment on issue #21225: Tasks stuck in queued state

Posted by GitBox <gi...@apache.org>.
arkadiusz-bach edited a comment on issue #21225:
URL: https://github.com/apache/airflow/issues/21225#issuecomment-1035656930


   I had similar problem, it was happening, because KubernetesExecutor is picking CeleryExecutor tasks on CeleryKubernetesExecutor
   
   Celery is changing task state to queued and KubernetesExecutor to scheduled, it is happening over and over again(depends how fast task gets to running state)
   
   
   I fixed it by adding additional filter on task queue(which default to 'kubernetes' ) for KubernetesExecutor in couple of places, below is probably the one that is causing most of the problems:
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/executors/kubernetes_executor.py#L455
   
   It is taking all of the tasks in queued state, but it should take only thoose that should run on Kubernetes - has queue equaled to: 
   https://github.com/apache/airflow/blob/5a6a2d604979cb70c5c9d3797738f0876dd38c3b/airflow/config_templates/default_airflow.cfg#L743


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org