Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/20 14:32:08 UTC

[GitHub] [airflow] bensonnd commented on issue #13542: Task stuck in "scheduled" or "queued" state, pool has all slots queued, nothing is executing

bensonnd commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-823322869


   Following up on what @pelaprat mentioned, we are not running the CeleryExecutor or KubernetesExecutor, but the LocalExecutor in a Docker container. We get tasks stuck in "scheduled" or "queued", and the DAG run is marked as running even though nothing is actually executing. It seems like the scheduler falls asleep or misses the queued tasks.
   
   Either clearing the queued tasks or restarting the scheduler with `airflow scheduler` inside the container gets it moving again. 
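
   For what it's worth, "clearing" here amounts to resetting the stuck task instances' state in the metadata database so the scheduler picks them up again. A minimal sketch of that (just an illustration, not what we run in production), assuming Airflow 2.0's ORM helpers and using the `ingest_dag` DAG ID from the logs below:
   
   ```python
   # Sketch only: reset task instances stuck in "queued" back to no state so the
   # scheduler schedules them again. Assumes this runs where the metadata DB is
   # reachable (e.g. inside the scheduler container).
   from airflow.models import TaskInstance
   from airflow.utils.session import create_session
   from airflow.utils.state import State

   with create_session() as session:
       stuck = (
           session.query(TaskInstance)
           .filter(
               TaskInstance.dag_id == "ingest_dag",
               TaskInstance.state == State.QUEUED,
           )
           .all()
       )
       for ti in stuck:
           ti.state = None  # roughly what clearing the task in the UI does
           session.merge(ti)
   ```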
   
   We've observed two different sets of log output, repeated over and over, when it gets into this stuck state: one detecting zombie jobs, and the other just checking for the regular heartbeat.
   
   ```
    File Path                                       PID  Runtime      # DAGs    # Errors  Last Runtime    Last Run
    -------------------------------------------  ------  ---------  --------  ----------  --------------  -------------------
    /opt/ingest/batch_ingest/dags/ingest_dag.py  120318  4.02s             1           0  5.43s           2021-04-08T16:37:43
    ================================================================================
    [2021-04-08 16:37:58,444] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
    [2021-04-08 16:37:58,445] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:32:58.445055+00:00
    [2021-04-08 16:37:58,455] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
    [2021-04-08 16:38:08,595] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
    [2021-04-08 16:38:08,596] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:33:08.596291+00:00
    [2021-04-08 16:38:08,607] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
    [2021-04-08 16:38:18,650] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
    [2021-04-08 16:38:18,651] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:33:18.651308+00:00
    [2021-04-08 16:38:18,661] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
    [2021-04-08 16:38:22,690] {dag_processing.py:838} INFO - 
    ================================================================================
    DAG File Processing Stats
   ```
   
   or
   
   ```
   File Path                                    PID    Runtime      # DAGs    # Errors  Last Runtime    Last Run
   -------------------------------------------  -----  ---------  --------  ----------  --------------  -------------------
   /opt/ingest/batch_ingest/dags/ingest_dag.py                           1           0  1.52s           2021-04-08T18:29:22
   ================================================================================
   [2021-04-08 18:29:33,015] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:33,016] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:33.016077+00:00
   [2021-04-08 18:29:43,036] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:43,037] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:43.037136+00:00
   [2021-04-08 18:29:53,072] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:53,072] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:53.072257+00:00
   [2021-04-08 18:29:53,080] {dag_processing.py:838} INFO - 
   ================================================================================
   DAG File Processing Stats
   ```
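
   The five-minute gap in the "Failing jobs without heartbeat after ..." lines looks like the default `scheduler_zombie_task_threshold` of 300 seconds, if I'm reading it right, so the zombie detection loop keeps running even while nothing gets queued. In case it helps anyone else watching for this, a rough watchdog sketch for spotting the stuck state (the 10-minute cutoff is an arbitrary choice, and it assumes Airflow 2.0's `TaskInstance` model with `queued_dttm`):
   
   ```python
   # Rough watchdog sketch: flag task instances sitting in "queued" for more than
   # 10 minutes while nothing at all is running, which is the symptom above.
   # Not production code; run it wherever the metadata DB is reachable.
   from datetime import timedelta

   from airflow.models import TaskInstance
   from airflow.utils import timezone
   from airflow.utils.session import create_session
   from airflow.utils.state import State

   STUCK_AFTER = timedelta(minutes=10)  # arbitrary cutoff, tune as needed

   with create_session() as session:
       cutoff = timezone.utcnow() - STUCK_AFTER
       stuck = (
           session.query(TaskInstance)
           .filter(
               TaskInstance.state == State.QUEUED,
               TaskInstance.queued_dttm < cutoff,
           )
           .count()
       )
       running = (
           session.query(TaskInstance)
           .filter(TaskInstance.state == State.RUNNING)
           .count()
       )
       if stuck and not running:
           print(f"{stuck} task(s) stuck in 'queued' with nothing running -- scheduler may need a restart")
   ```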
   
   We are in the process of pushing 2.0.2, as @kaxil suggested, to see if that resolves the issue.
   
   

