Posted to commits@airflow.apache.org by "David Vaughan (JIRA)" <ji...@apache.org> on 2017/04/21 18:09:04 UTC

[jira] [Created] (AIRFLOW-1139) Scheduler runs very slowly when many DAGs in DAG directory

David Vaughan created AIRFLOW-1139:
--------------------------------------

             Summary: Scheduler runs very slowly when many DAGs in DAG directory
                 Key: AIRFLOW-1139
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1139
             Project: Apache Airflow
          Issue Type: Improvement
    Affects Versions: 1.8.0
         Environment: macOS Sierra, v10.12.2, MacBook Pro, 2.5 GHz Intel Core i7, 16 GB RAM
            Reporter: David Vaughan
            Priority: Minor


When we have several (10-15) DAGs in our DAG directory, each of them fairly large (~900 tasks on average), Airflow's periodic re-processing of those DAGs takes a long time and diverts resources from the DAGs that are running.

We almost always have only one DAG running at any given time, with the rest paused. That one running DAG, however, crawls along noticeably more slowly than it does when there are only one or two DAGs in the DAG directory.

I think it would be nice to have an option to turn off re-processing of DAGs completely after the initial processing.

The way we use Airflow right now, we rarely edit our existing DAGs, so we have no need for a periodic refresh. We have experimented with the min_file_process_interval option in airflow.cfg: setting it to small values causes no noticeable change, and setting it to very large values (to emulate not refreshing at all) actually makes the DAG run much slower than it already was.
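For readers unfamiliar with the option: min_file_process_interval is assumed here to live in the [scheduler] section of airflow.cfg and to set, in seconds, the minimum time the scheduler waits after finishing a parse of a DAG file before parsing that file again. The Python sketch below is a simplified, hypothetical model of that throttling, not Airflow's actual scheduler code; the helper names load_interval and files_due, the sample config value, and the file paths are made up for illustration.

```python
import configparser
import time

# Hypothetical airflow.cfg excerpt (assumption: the option sits in [scheduler]).
CFG_TEXT = """
[scheduler]
min_file_process_interval = 180
"""


def load_interval(cfg_text):
    """Read min_file_process_interval (seconds) from config text, defaulting to 0."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("scheduler", "min_file_process_interval", fallback=0)


def files_due(paths, last_parsed, interval, now):
    """Return the DAG file paths that should be re-parsed at time `now`.

    last_parsed maps a path to the timestamp at which its last parse finished.
    A file is due if it has never been parsed, or if at least `interval`
    seconds have elapsed since its last parse finished.
    """
    due = []
    for path in paths:
        finished = last_parsed.get(path)
        if finished is None or now - finished >= interval:
            due.append(path)
    return due


if __name__ == "__main__":
    interval = load_interval(CFG_TEXT)
    now = time.time()
    last = {"dags/a.py": now - 10, "dags/b.py": now - 600}
    # a.py was parsed 10 s ago (not due); b.py 600 s ago (due); c.py never (due).
    print(files_due(["dags/a.py", "dags/b.py", "dags/c.py"], last, interval, now))
```

In this model, an interval of 0 means every file is due on every scheduler loop, while a very large interval means a file is parsed once and then effectively never again, which is the behaviour the request above asks for.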

Is anybody else still experiencing this? Are there existing ways to avoid this problem? Here are some links where people report what I believe is the same issue, although they are all from last year:

https://issues.apache.org/jira/browse/AIRFLOW-160
https://github.com/apache/incubator-airflow/pull/1636
https://issues.apache.org/jira/browse/AIRFLOW-435
http://stackoverflow.com/questions/40466732/apache-airflow-scheduler-slowness

Thanks in advance for any discussion or help.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)