Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2016/05/22 19:44:12 UTC

[jira] [Updated] (AIRFLOW-128) Optimize and refactor process_dag

     [ https://issues.apache.org/jira/browse/AIRFLOW-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin updated AIRFLOW-128:
-----------------------------------
    Summary: Optimize and refactor process_dag  (was: Reduce roundtrips to database in process_dag)

> Optimize and refactor process_dag
> ---------------------------------
>
>                 Key: AIRFLOW-128
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-128
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: Airflow 1.7.1
>            Reporter: Bolke de Bruin
>
> process_dag is currently task-instance based and programmatically determines which tasks should be part of a "dagrun" (in quotes because it is not a real dagrun). This requires a round trip to the database for every task, easily reaching 10-20 round trips per dag per execution_date on every heartbeat, or more for complex dags.
> In addition, the session is not reused within process_dag, so for every dag it opens 10-20 sessions per execution_date on every heartbeat.
> This is suboptimal. Using dag runs that are instantiated with their associated tasks (see AIRFLOW-124), this can be reduced to one round trip per dagrun, significantly lowering the pressure on the database. In addition, with careful use of the database session, it can be done within a single session, further lowering the database pressure and speeding up the scheduler.
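
The difference between one query per task and one query per dag run can be sketched roughly as follows. This is only an illustration and not Airflow's actual code: the task_instance table here is a simplified, hypothetical schema, the function names are made up for this example, and a plain sqlite3 connection stands in for the scheduler's database session.

```python
import sqlite3


def make_db(num_tasks=15):
    """Build an in-memory database with a hypothetical, simplified table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT)"
    )
    rows = [
        ("example_dag", "task_%d" % i, "2016-05-22", "queued")
        for i in range(num_tasks)
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def states_per_task(conn, dag_id, task_ids, execution_date):
    """One query per task: the pattern the issue describes as costly."""
    queries = 0
    states = {}
    for task_id in task_ids:
        row = conn.execute(
            "SELECT state FROM task_instance"
            " WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
            (dag_id, task_id, execution_date),
        ).fetchone()
        queries += 1
        states[task_id] = row[0] if row else None
    return states, queries


def states_per_dagrun(conn, dag_id, execution_date):
    """One query for all tasks of a dag run, reusing the same connection."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance"
        " WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    ).fetchall()
    # Each row is a (task_id, state) pair, so dict() maps task_id to state.
    return dict(rows), 1
```

Both functions return the same task states for a given dag and execution_date, but the second issues a single query regardless of how many tasks the dag contains, and both reuse the connection passed in rather than opening a new one per lookup.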



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)