Posted to commits@airflow.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2016/05/18 10:27:13 UTC

[jira] [Created] (AIRFLOW-128) Reduce roundtrips to database in process_dag

Bolke de Bruin created AIRFLOW-128:
--------------------------------------

             Summary: Reduce roundtrips to database in process_dag
                 Key: AIRFLOW-128
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-128
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
    Affects Versions: Airflow 1.7.1
            Reporter: Bolke de Bruin


process_dag is currently task-instance based and programmatically determines which tasks should be part of a "dagrun" (in quotes because it is not a real dag run). This requires a round trip to the database for every task, easily 10-20 round trips per dag per execution_date on every heartbeat, and more for complex dags.

In addition, the session is not reused within process_dag, so for every dag it opens 10-20 sessions per execution_date on every heartbeat.

This is suboptimal. Using dag runs (see AIRFLOW-124), this can be reduced to one round trip per dag run, which lowers the pressure on the db significantly. In addition, with careful use of the database session it can all be done within one session, further lowering the db pressure and speeding up the scheduler.
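
To illustrate the difference, here is a minimal sketch that contrasts one query per task with one query per dag run over a single reused connection. It is not the scheduler's code: it uses the standard-library sqlite3 module instead of Airflow's session handling, and the task_instance table layout, the helper names (make_db, states_per_task, states_per_dagrun) and the sample data are all hypothetical.

```python
import sqlite3


def make_db():
    """Build an in-memory database with a hypothetical, simplified
    task_instance table (not Airflow's actual schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        "dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT)"
    )
    rows = [
        ("example_dag", "task_%d" % i, "2016-05-18",
         "success" if i % 2 == 0 else "queued")
        for i in range(10)
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def states_per_task(conn, dag_id, execution_date, task_ids):
    """Task-based lookup: one query, and so one round trip, per task."""
    states = {}
    round_trips = 0
    for task_id in task_ids:
        row = conn.execute(
            "SELECT state FROM task_instance "
            "WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
            (dag_id, task_id, execution_date),
        ).fetchone()
        round_trips += 1
        states[task_id] = row[0] if row else None
    return states, round_trips


def states_per_dagrun(conn, dag_id, execution_date):
    """Dag-run-based lookup: one query fetches every task state of the
    run, reusing the connection that the caller already holds."""
    rows = conn.execute(
        "SELECT task_id, state FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ?",
        (dag_id, execution_date),
    ).fetchall()
    return dict(rows), 1
```

Both helpers return the same task-to-state mapping. For a dag with N tasks the first issues N queries per execution_date and the second issues one, so the saving grows with the size of the dag.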



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)