Posted to commits@airflow.apache.org by "David Vaughan (JIRA)" <ji...@apache.org> on 2017/04/21 18:09:04 UTC
[jira] [Created] (AIRFLOW-1139) Scheduler runs very slowly when many DAGs in DAG directory
David Vaughan created AIRFLOW-1139:
--------------------------------------
Summary: Scheduler runs very slowly when many DAGs in DAG directory
Key: AIRFLOW-1139
URL: https://issues.apache.org/jira/browse/AIRFLOW-1139
Project: Apache Airflow
Issue Type: Improvement
Affects Versions: 1.8.0
Environment: macOS Sierra, v10.12.2, MacBook Pro, 2.5 GHz Intel Core i7, 16 GB RAM
Reporter: David Vaughan
Priority: Minor
When our DAG directory holds several (10-15) DAGs, each of them pretty large (~900 tasks on average), Airflow's periodic re-processing of that directory takes a long time and takes resources away from running DAGs.
We almost always have only one DAG actually running at any given time, with the rest paused. Even so, the one running DAG crawls along noticeably more slowly than it does when the DAG directory contains only one or two DAGs in total.
I think it would be nice to have an option to turn off re-processing of DAGs completely, after the initial processing.
The way we use Airflow right now, we don't edit our existing DAGs frequently, so we have no need for periodic refresh. We have experimented with the min_file_process_interval option in airflow.cfg, but setting it to small values causes no noticeable change, and setting it to very large values (to emulate not refreshing at all) actually makes the running DAG much slower than it already was.
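For reference, here is a minimal sketch of the kind of setting described above. It assumes the option lives in the [scheduler] section of airflow.cfg and is expressed in seconds. The value 3600 and the helper name read_min_file_process_interval are purely illustrative, not values or code from our actual setup. The snippet parses a sample config excerpt with Python's standard configparser, so it does not need Airflow installed.

```python
import configparser

# Hypothetical airflow.cfg excerpt. The section name is assumed to be
# [scheduler]; the value 3600 is illustrative only.
SAMPLE_CFG = """
[scheduler]
min_file_process_interval = 3600
"""


def read_min_file_process_interval(cfg_text, default=0):
    """Return min_file_process_interval from config text, or `default` if unset."""
    # interpolation=None keeps any literal '%' in a real config from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint(
        "scheduler", "min_file_process_interval", fallback=default
    )


if __name__ == "__main__":
    print(read_min_file_process_interval(SAMPLE_CFG))
```

A check like this only confirms which value the config file contains. It does not tell us whether the scheduler honors it in the way we expect, which is the behavior in question here.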
Is anybody else still experiencing this? Are there existing ways to avoid this problem? Here are some links to people referencing, I believe, this same issue, but they're all from last year:
https://issues.apache.org/jira/browse/AIRFLOW-160
https://github.com/apache/incubator-airflow/pull/1636
https://issues.apache.org/jira/browse/AIRFLOW-435
http://stackoverflow.com/questions/40466732/apache-airflow-scheduler-slowness
Thanks in advance for any discussion or help.