Posted to commits@airflow.apache.org by "t oo (Jira)" <ji...@apache.org> on 2019/09/16 18:07:00 UTC

[jira] [Created] (AIRFLOW-5506) Airflow scheduler stuck

t oo created AIRFLOW-5506:
-----------------------------

             Summary: Airflow scheduler stuck
                 Key: AIRFLOW-5506
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5506
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 1.10.5, 1.10.4
            Reporter: t oo


Re-post of [https://stackoverflow.com/questions/57713394/airflow-scheduler-stuck] and a Slack discussion.

I'm testing the use of Airflow, and after triggering a (seemingly) large number of DAGs at the same time, it seems to just fail to schedule anything and starts killing processes. These are the logs the scheduler prints:
{{[2019-08-29 11:17:13,542] \{scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:13,544] \{scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:17:44,614] \{scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:18:15,692] \{scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:15,693] \{scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:46,765] \{scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:18:46,766] \{scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:19:17,845] \{scheduler_job.py:214} WARNING - Killing PID 42177
[2019-08-29 11:19:17,846] \{scheduler_job.py:214} WARNING - Killing PID 42177
...}}
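To make the pattern in these lines easier to see, here is a minimal sketch that extracts the killed PIDs and the time between successive kills. It only uses the ten lines shown above (with the Jira brace escaping removed); the helper names `first_kill_times` and `gaps_in_seconds` are made up for illustration and are not part of Airflow.

```python
import re
from datetime import datetime

# The ten log lines from above, copied verbatim apart from the Jira "\{" escaping.
LOG = """\
[2019-08-29 11:17:13,542] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:13,544] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:18:15,692] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:15,693] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:46,765] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:18:46,766] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:19:17,845] {scheduler_job.py:214} WARNING - Killing PID 42177
[2019-08-29 11:19:17,846] {scheduler_job.py:214} WARNING - Killing PID 42177
"""

# Captures the timestamp (with milliseconds) and the PID of each "Killing PID" line.
KILL_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\].*Killing PID (\d+)"
)


def first_kill_times(log_text):
    """Map each killed PID to the first timestamp at which it was logged."""
    first_seen = {}
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match is None:
            continue  # skip lines that are not kill warnings
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        first_seen.setdefault(int(match.group(2)), stamp)
    return first_seen


def gaps_in_seconds(first_seen):
    """Return the seconds elapsed between consecutive first-seen kill times."""
    stamps = sorted(first_seen.values())
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


kills = first_kill_times(LOG)
print(len(kills), gaps_in_seconds(kills))
```

On the excerpt above this reports five distinct PIDs, each logged twice, with successive kills roughly 31 seconds apart.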
I'm using a LocalExecutor with a PostgreSQL backend DB. It seems to happen only after I trigger a large number (>100) of DAGs at about the same time using external triggering, as in:
{{airflow trigger_dag DAG_NAME}}
After the scheduler finishes killing whatever processes it is killing, it starts executing all of the tasks properly. I don't even know what these processes were, as I can't really see them after they are killed...
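For anyone trying to reproduce this, a minimal sketch of the kind of burst triggering described above could look like the following. The DAG ids are hypothetical placeholders (the real names are not part of this report), and `build_trigger_commands` and `trigger_all` are illustrative helpers, not Airflow APIs; the only Airflow piece assumed is the `airflow trigger_dag` CLI command shown above.

```python
import subprocess


def build_trigger_commands(dag_ids):
    """Return one `airflow trigger_dag <dag_id>` argument list per DAG id."""
    return [["airflow", "trigger_dag", dag_id] for dag_id in dag_ids]


def trigger_all(dag_ids, run=subprocess.run):
    """Run the trigger commands back to back and return their exit codes.

    `run` is injectable so the function can be exercised without Airflow installed.
    """
    return [run(cmd).returncode for cmd in build_trigger_commands(dag_ids)]


# Hypothetical DAG ids, standing in for the >100 DAGs mentioned in the report.
dag_ids = ["example_dag_{}".format(i) for i in range(150)]
commands = build_trigger_commands(dag_ids)
```

Calling `trigger_all(dag_ids)` on a machine with the Airflow CLI on the PATH would issue the 150 trigger commands in quick succession; whether that is enough to reproduce the stall is untested here.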

Has anyone encountered this kind of behavior? Any idea why that would happen?

--
This message was sent by Atlassian Jira
(v8.3.2#803003)