Posted to commits@airflow.apache.org by "Kamil Bregula (Jira)" <ji...@apache.org> on 2020/03/01 13:50:00 UTC

[jira] [Updated] (AIRFLOW-6965) The get_task_instances method is performed three times during one creation of the DAGRun file.

     [ https://issues.apache.org/jira/browse/AIRFLOW-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kamil Bregula updated AIRFLOW-6965:
-----------------------------------
    Summary: The get_task_instances method is performed three times during one creation of the DAGRun file.  (was: The method is performed playthree times during one creation of the DAGRun file.)

> The get_task_instances method is performed three times during one creation of the DAGRun file.
> ----------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-6965
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6965
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>    Affects Versions: 1.10.9
>            Reporter: Kamil Bregula
>            Priority: Major
>
> Hello,
> The task instance query (get_task_instances) is executed three times during a single creation of a DAG run. This is redundant. If we can limit the number of these queries, we can improve scheduler performance.
> First query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> create_dag_run: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L726]
> create_dagrun: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L638]
> verify_integrity: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dag.py#L1454]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
> Second query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> verify_integrity: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L684]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
> Third query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> update_state: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L685]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L292]
>  
>  
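
One way to picture the proposed improvement is a small per-pass cache, so that the three call chains listed in the issue share a single lookup. The sketch below is illustrative only and does not use Airflow's API: TaskInstanceCache, fake_query and the run_key string are hypothetical names, and fake_query merely stands in for the database query behind get_task_instances.

```python
from typing import Callable, Dict, List


class TaskInstanceCache:
    """Memoizes a task-instance lookup for the lifetime of one scheduling pass.

    Hypothetical helper for illustration; not part of Airflow.
    """

    def __init__(self, loader: Callable[[str], List[str]]) -> None:
        self._loader = loader
        self._cache: Dict[str, List[str]] = {}
        self.query_count = 0  # how many times the underlying query really ran

    def get(self, run_key: str) -> List[str]:
        # Run the underlying query only on the first request for this run.
        if run_key not in self._cache:
            self.query_count += 1
            self._cache[run_key] = self._loader(run_key)
        return self._cache[run_key]

    def invalidate(self, run_key: str) -> None:
        # Drop the cached result, e.g. after task instances were added or removed.
        self._cache.pop(run_key, None)


def fake_query(run_key: str) -> List[str]:
    # Stand-in for the database query (assumption, not the real implementation).
    return [f"{run_key}:task_a", f"{run_key}:task_b"]


cache = TaskInstanceCache(fake_query)
run_key = "example_dag@2020-03-01T00:00:00"

# Three call sites, mirroring the three call chains listed in the issue.
first = cache.get(run_key)
second = cache.get(run_key)
third = cache.get(run_key)

print(cache.query_count)
```

A real change would have to invalidate or refresh the cached list whenever a step such as verify_integrity adds or removes task instances, otherwise later steps would see stale data. Whether a cache, or passing the already-loaded list between the calls, is the better fit is left open here.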



--
This message was sent by Atlassian Jira
(v8.3.4#803005)