Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (JIRA)" <ji...@apache.org> on 2019/06/07 15:50:00 UTC

[jira] [Commented] (AIRFLOW-4747) Airflow Scheduling and DAG Parsing

    [ https://issues.apache.org/jira/browse/AIRFLOW-4747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858753#comment-16858753 ] 

Ash Berlin-Taylor commented on AIRFLOW-4747:
--------------------------------------------

AIRFLOW-2761 (PR: https://github.com/apache/airflow/pull/4234/files), which landed in 1.10.3, might help things a bit, depending on exactly what the slow part is. (Check out the graphs in the PR.)

> Airflow Scheduling and DAG Parsing
> ----------------------------------
>
>                 Key: AIRFLOW-4747
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4747
>             Project: Apache Airflow
>          Issue Type: Wish
>          Components: scheduler
>    Affects Versions: 1.10.2
>            Reporter: Michael Smith
>            Priority: Major
>
> I read somewhere that there was going to be an attempt to decouple Airflow's DAG parsing from its scheduler function. My assumption is that this could be achieved, for example, by driving scheduler actions (almost?) entirely from the Airflow database. Would this eliminate the need for a continuously running DAG parse process?
> At present we observe significant lag and overhead with the current (1.10.2) scheduling model, which appears to be heavily coupled to DAG parsing. In our environment DAG parse times are typically >1 sec per DAG, which means a single DAG parse cycle can take several minutes. DAG parsing is also a large CPU overhead (for example, on a single-node cloud VM we've been forced to allocate 2 CPU nodes). In addition, production jobs suffer from fairly large lag times between tasks (the time between the end of one task and the start of its follow-on task). This can be on the order of minutes, even when task slots are available.
>  
> Is anyone working on this enhancement, or could anyone provide guidance on resolving this? (It is possibly a configuration issue on our side, but I have experimented extensively with the configuration options.)


