Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/12 10:35:33 UTC

[GitHub] [airflow] Overbryd commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

Overbryd commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-817698207


   @ephraimbuddy no, I do not have that issue.
   
   When I try to observe it closely, it always goes like this:
   
   * There are 3 tasks in the `mssql_dwh` pool. All of them have the state `queued`. Nothing is running. Nothing is started. The scheduler does not start anything new, because the pool has 0 available slots.
   * Then I clear those 3 tasks.
   * The scheduler immediately picks some tasks and puts them into `queued` state. Meanwhile, Kubernetes starts the pods.
   * If I am lucky, some of the tasks get executed properly, and the scheduler continues doing what it is supposed to do.
   * But before long, it starts to accumulate "dead" tasks in `queued` state. Those are NOT running in Kubernetes (a sketch for listing them from the metadata database follows the log excerpt below).
   * I checked the scheduler for error logs, and I can see log lines like this:
   
   ```
   ERROR - Executor reports task instance <TaskInstance: <redacted> 2021-04-10 00:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   ```
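   
   For reference, here is a rough, untested sketch of how one could list those stuck task instances straight from the metadata database (assuming the Airflow 2.x session and model helpers; the `mssql_dwh` pool name is from above, the script itself is purely illustrative):
   
   ```python
   # Illustrative sketch: list task instances sitting in "queued" for the
   # mssql_dwh pool, to cross-check against the pods Kubernetes actually started.
   # Assumes Airflow 2.x metadata models and DB access from the Airflow environment.
   from airflow.models import TaskInstance
   from airflow.utils.session import create_session
   from airflow.utils.state import State
   
   with create_session() as session:
       queued = (
           session.query(TaskInstance)
           .filter(TaskInstance.pool == "mssql_dwh", TaskInstance.state == State.QUEUED)
           .all()
       )
       for ti in queued:
           # queued_dttm shows how long the instance has been waiting for a pod
           print(ti.dag_id, ti.task_id, ti.execution_date, ti.queued_dttm)
   ```
   
   Comparing that list against the pods actually running in the cluster should make the "queued but nothing running" mismatch visible.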
   
   So I think there must be some kind of race condition between the scheduler and the Kubernetes pod startup.
   Some tasks finish really quickly (and successfully), yet the scheduler KEEPS them in `queued` state.
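   
   As a stopgap (not a fix for the race itself), one could periodically hand task instances that have been sitting in `queued` for too long back to the scheduler. Again a rough, untested sketch against the Airflow 2.x models; the 30-minute threshold is arbitrary:
   
   ```python
   # Illustrative stopgap: reset task instances stuck in "queued" for more than
   # 30 minutes (arbitrary threshold) so the scheduler re-examines them.
   # Assumes Airflow 2.x metadata models; run from the Airflow environment.
   from datetime import timedelta
   
   from airflow.models import TaskInstance
   from airflow.utils import timezone
   from airflow.utils.session import create_session
   from airflow.utils.state import State
   
   cutoff = timezone.utcnow() - timedelta(minutes=30)
   
   with create_session() as session:
       stuck = (
           session.query(TaskInstance)
           .filter(
               TaskInstance.pool == "mssql_dwh",
               TaskInstance.state == State.QUEUED,
               TaskInstance.queued_dttm < cutoff,
           )
           .all()
       )
       for ti in stuck:
           # Clearing the state lets the scheduler schedule the task again
           ti.state = None
       # create_session() commits the changes when the block exits
   ```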

