Posted to commits@airflow.apache.org by "Kamil Bregula (Jira)" <ji...@apache.org> on 2020/03/01 13:50:00 UTC
[jira] [Updated] (AIRFLOW-6965) The get_task_instances method is
performed three times during one creation of the DAGRun file.
[ https://issues.apache.org/jira/browse/AIRFLOW-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kamil Bregula updated AIRFLOW-6965:
-----------------------------------
Summary: The get_task_instances method is performed three times during one creation of the DAGRun file. (was: The method is performed playthree times during one creation of the DAGRun file.)
> The get_task_instances method is performed three times during one creation of the DAGRun file.
> ----------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-6965
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6965
> Project: Apache Airflow
> Issue Type: Improvement
> Components: scheduler
> Affects Versions: 1.10.9
> Reporter: Kamil Bregula
> Priority: Major
>
> Hello,
> The get_task_instances query is executed three times while a single DAG file is processed. This is redundant: reducing the number of these queries should improve scheduler performance.
> First query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> create_dag_run: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L726]
> create_dagrun: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L638]
> verify_integrity: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dag.py#L1454]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
> Second query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> verify_integrity: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L684]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L436]
> Third query:
> perform_file: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L792]
> process_dags: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L853]
> _process_task_instances: [https://github.com/apache/airflow/blob/cc562dd/airflow/jobs/scheduler_job.py#L738]
> update_state: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/jobs/scheduler_job.py#L685]
> get_task_instances: [https://github.com/apache/airflow/blob/cc562ddfc7a53932d89c92ee1fb8f780c1fb38e3/airflow/models/dagrun.py#L292]
>
>
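The redundancy described in the issue can be illustrated with a minimal sketch. This is not Airflow's code: FakeDagRun, process_run_naive and process_run_shared are hypothetical stand-ins, and the optional tis argument is an assumption made for illustration, not part of the real verify_integrity or update_state signatures. The sketch only shows how fetching the task instances once and passing them along reduces the number of queries.

```python
from typing import Callable, List, Optional


class FakeDagRun:
    """Hypothetical stand-in for a DAG run; not Airflow's DagRun class."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch    # simulates the database query
        self.query_count = 0   # number of times the simulated query ran

    def get_task_instances(self) -> List[str]:
        self.query_count += 1
        return self._fetch()

    def verify_integrity(self, tis: Optional[List[str]] = None) -> List[str]:
        # Assumed optional argument: reuse the caller's rows, query only if absent.
        return self.get_task_instances() if tis is None else tis

    def update_state(self, tis: Optional[List[str]] = None) -> List[str]:
        # Same assumption as verify_integrity.
        return self.get_task_instances() if tis is None else tis


def process_run_naive(run: FakeDagRun) -> None:
    # Each step fetches the task instances on its own: two queries.
    run.verify_integrity()
    run.update_state()


def process_run_shared(run: FakeDagRun) -> None:
    # Fetch once and hand the same rows to both steps: one query.
    tis = run.get_task_instances()
    run.verify_integrity(tis)
    run.update_state(tis)


naive = FakeDagRun(lambda: ["task_a", "task_b"])
process_run_naive(naive)

shared = FakeDagRun(lambda: ["task_a", "task_b"])
process_run_shared(shared)

print(naive.query_count, shared.query_count)
```

One caveat with this approach: if the integrity check adds or removes task instances, the shared list may be stale by the time the state update runs, so a real change would need to refresh or amend the list in that case.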
--
This message was sent by Atlassian Jira
(v8.3.4#803005)