Posted to commits@airflow.apache.org by "Yee Ting Li (JIRA)" <ji...@apache.org> on 2017/11/29 06:15:00 UTC

[jira] [Created] (AIRFLOW-1864) Dags with large number of Tasks are very slow

Yee Ting Li created AIRFLOW-1864:
------------------------------------

             Summary: Dags with large number of Tasks are very slow
                 Key: AIRFLOW-1864
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1864
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
    Affects Versions: 1.8.2
         Environment: job_heartbeat_sec = 1
                      scheduler_heartbeat_sec = 1
            Reporter: Yee Ting Li


I have a DAG which, according to the scheduler logs (DAG File Processing Stats), takes about 10 seconds to process. It has about 20-odd tasks, mostly linear in nature.

It takes a long time for the DAG to complete, even though the individual tasks are fast: many take only a few seconds, with a couple of longer tasks.

Looking at the scheduler logs more closely, it seems that most of the time is spent waiting for jobs to be put into the executor. I am assuming (without looking at the code) that while the DAG is being processed, no jobs are being queued or sent to the executor. It seems as though the entire scheduler is basically waiting for the DAG processing to finish before it does anything, which means that larger DAGs inherently do not scale and linear graphs take a significant amount of time to finish.
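
To put a rough number on this, here is a back-of-the-envelope sketch in Python. It is not Airflow code: the function name and the model are hypothetical, and it rests on the unverified assumption above that each task in a strictly linear chain can only be queued after one full DAG processing pass (or one heartbeat, whichever is longer).

```python
def estimated_scheduling_overhead(num_tasks, dag_processing_sec, heartbeat_sec=1.0):
    """Rough estimate of the time a strictly linear DAG spends waiting on the scheduler.

    Assumption (not confirmed against the Airflow source): every task has to
    wait for one full DAG processing pass, or one heartbeat if that is longer,
    before it is queued to the executor.
    """
    per_task_wait = max(dag_processing_sec, heartbeat_sec)
    return num_tasks * per_task_wait


# Figures from this report: about 20 tasks, about 10 seconds per processing pass,
# scheduler_heartbeat_sec = 1.
overhead = estimated_scheduling_overhead(num_tasks=20, dag_processing_sec=10, heartbeat_sec=1)
print(f"estimated scheduling wait: {overhead:.0f} s")
```

Under that assumption the waiting time alone is on the order of a few minutes for this DAG, regardless of how fast the individual tasks are, and lowering the heartbeat settings would not help once the processing pass is the longer of the two.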

Is there a way to improve the responsiveness of new jobs being queued and executed beyond the two parameters (job_heartbeat_sec and scheduler_heartbeat_sec)?






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)