Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/02 15:19:19 UTC
[GitHub] [airflow] easontm commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing
easontm commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-853117451
More data, in case it helps:
I'm using Airflow `2.1.0` (upgraded from `1.10.15` in our development environment) and trying to use the `KubernetesExecutor`. I am running _extremely_ basic DAGs that exist solely for testing. In `airflow.cfg` I set `delete_worker_pods = False` so that I could examine what's going on. As far as I can tell, my worker pods are being created, immediately considering themselves successful (I'm not sure they ever actually receive a task to run), and terminating. I don't think the pods are "successful but not writing the success state to the DB", because one of my tasks is a simple `CREATE TABLE` statement on my data warehouse, and the table is never created.
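For reference, this is roughly how that override looks in my `airflow.cfg` (a minimal sketch; in Airflow 2.1 the option lives under the `[kubernetes]` section):

```
[kubernetes]
delete_worker_pods = False
```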
Here are the results of various logs:
If I run `kubectl get pods`, I see my `airflow-webserver` and `airflow-scheduler` pods, as well as some "Completed" worker pods.
```
NAME                       READY   STATUS      RESTARTS   AGE
airflow-scheduler          1/1     Running     0          45m
airflow-webserver-1        2/2     Running     0          56m
airflow-webserver-2        2/2     Running     0          56m
airflow-webserver-3        2/2     Running     0          45m
airflow-webserver-4        2/2     Running     0          45m
airflow-webserver-5        2/2     Running     0          45m
<GENERATED_WORKER_POD_1>   0/1     Completed   0          15m
<GENERATED_WORKER_POD_2>   0/1     Completed   0          56m
<GENERATED_WORKER_POD_3>   0/1     Completed   0          45m
```
`kubectl logs <my_worker_pod>` - I can tell that the pod was briefly alive but did nothing: the only output is a line that always appears as a side effect of the Docker image config. By contrast, if I pull the logs from one of my functional `1.10.15` pods, I can see the task start:
```
[2021-06-02 13:33:01,358] {__init__.py:50} INFO - Using executor LocalExecutor
[2021-06-02 13:33:01,358] {dagbag.py:417} INFO - Filling up the DagBag from /usr/local/airflow/dags/my_dag.py
...
etc
```
`kubectl describe pod <my_worker_pod>` - the event log is quite tame:
```
Events:
  Type    Reason     Age    From               Message
  ----    ------     ---    ----               -------
  Normal  Scheduled  3m21s  default-scheduler  Successfully assigned <GENERATED_POD_NAME> to <EC2_INSTANCE>
  Normal  Pulling    3m19s  kubelet            Pulling image <MY_AIRFLOW_2.1.0_DOCKER_IMAGE>
  Normal  Pulled     3m19s  kubelet            Successfully pulled image <MY_AIRFLOW_2.1.0_DOCKER_IMAGE>
  Normal  Created    3m19s  kubelet            Created container base
  Normal  Started    3m19s  kubelet            Started container base
```
That's it. No further events.
`kubectl logs airflow-scheduler` - I've trimmed the logs significantly, but here are the statements mentioning the stuck task:
```
[2021-06-02 14:49:42,742] {kubernetes_executor.py:369} INFO - Attempting to finish pod; pod_id: <GENERATED_POD_NAME>; state: None; annotations: {'dag_id': '<TEST_DAG>', 'task_id': '<TASK_ID>', 'execution_date': '2021-05-30T00:00:00+00:00', 'try_number': '1'}
[2021-06-02 14:49:42,748] {kubernetes_executor.py:546} INFO - Changing state of (TaskInstanceKey(dag_id='<TEST_DAG>', task_id='<TASK_ID>', execution_date=datetime.datetime(2021, 5, 30, 0, 0, tzinfo=tzlocal()), try_number=1), None, '<GENERATED_POD_NAME>', 'airflow', '1000200742') to None
[2021-06-02 14:49:42,751] {scheduler_job.py:1212} INFO - Executor reports execution of <TASK_ID> execution_date=2021-05-30 00:00:00+00:00 exited with status None for try_number 1
```
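In case anyone wants to pull these fields out of a much larger scheduler log, here's a small throwaway parser I sketched. The regex and field names are just my reading of the log lines above, not anything official from the Airflow codebase:

```python
import ast
import re

# Throwaway helper: pull pod_id, reported state, and annotations out of the
# "Attempting to finish pod" lines emitted by kubernetes_executor.py.
# The pattern is written against the sample lines above and is an
# assumption about the format, not a stable Airflow interface.
FINISH_POD_RE = re.compile(
    r"Attempting to finish pod; "
    r"pod_id: (?P<pod_id>[^;]+); "
    r"state: (?P<state>[^;]+); "
    r"annotations: (?P<annotations>\{.*\})"
)

def parse_finish_pod_line(line):
    """Return a dict for a matching log line, or None if it doesn't match."""
    m = FINISH_POD_RE.search(line)
    if not m:
        return None
    return {
        "pod_id": m.group("pod_id"),
        "state": m.group("state"),
        # The annotations are printed as a Python dict literal, so
        # ast.literal_eval can safely parse them back into a dict.
        "annotations": ast.literal_eval(m.group("annotations")),
    }

sample = (
    "[2021-06-02 14:49:42,742] {kubernetes_executor.py:369} INFO - "
    "Attempting to finish pod; pod_id: mypod-abc123; state: None; "
    "annotations: {'dag_id': 'test_dag', 'task_id': 'create_table', "
    "'execution_date': '2021-05-30T00:00:00+00:00', 'try_number': '1'}"
)
info = parse_finish_pod_line(sample)
```

Running it over the scheduler log and grouping by `state` made it easy for me to spot that every stuck task was reporting `state: None`.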
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org