Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/22 15:05:00 UTC

[jira] [Commented] (AIRFLOW-6881) Bulk fetch DAGRun for create_dag_run

    [ https://issues.apache.org/jira/browse/AIRFLOW-6881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042596#comment-17042596 ] 

ASF GitHub Bot commented on AIRFLOW-6881:
-----------------------------------------

mik-laj commented on pull request #7502: [AIRFLOW-6881][depends on AIRFLOW-6869][WIP] Bulk fetch DAGRun for create_dag_run
URL: https://github.com/apache/airflow/pull/7502
 
 
   Another performance optimization.
   When I have the following DAG file:
   ```
   from datetime import timedelta
   
   from airflow.models import DAG
   from airflow.operators.dummy_operator import DummyOperator
   from airflow.utils.dates import days_ago
   
   args = {
       'owner': 'airflow',
       'start_date': days_ago(3),
   }
   
   def create_dag(dag_number):
       dag = DAG(
           dag_id=f'perf_50_dag_dummy_tasks_{dag_number}_of_50', default_args=args,
           schedule_interval="@once",
           dagrun_timeout=timedelta(minutes=60),
           is_paused_upon_creation=False,
       )
   
       # range(1, 10) creates 9 tasks per DAG; the "_of_5" suffix is only a label.
       for j in range(1, 10):
           DummyOperator(
               task_id='task_{}_of_5'.format(j),
               dag=dag
           )
   
       return dag
   
   # range(1, 200) registers 199 DAGs; the "_of_50" suffix is only a label.
   for i in range(1, 200):
       globals()[f"dag_{i}"] = create_dag(i)
   ```
   I ran the following code:
   ```
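   import logging

   from airflow.jobs.scheduler_job import DagFileProcessor

   log = logging.getLogger(__name__)
   # DAG_FILE must hold the path to the DAG file above (the path is not given here).

   processor = DagFileProcessor([], log)
   processor.process_file(DAG_FILE, None, pickle_dags=False)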
   ```
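
The message does not say how the average time was collected. As a minimal sketch, assuming a hypothetical helper named `average_ms` (not part of Airflow) and any zero-argument callable that wraps the `process_file` call, the timing part could look like this:

```python
import time


def average_ms(func, repeats=5):
    """Call func() `repeats` times and return the mean wall-clock time in ms.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / repeats * 1000.0
```

It would be called as `average_ms(lambda: processor.process_file(DAG_FILE, None, pickle_dags=False))`. The number of repeats is an assumption; the original run count is not stated.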
   I got the following values.
   **Before:**
   Query count: 2589
   Average time: 7980.117 ms
   **After:**
   Query count: 2390
   Average time: 7261.959 ms
   **Diff:**
   Query count: -199 (-7.7%)
   Average time: -718.158 ms (-9.0%)
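
The drop of 199 queries matches the number of DAGs in the file (`range(1, 200)` yields 199), which suggests about one query saved per DAG. The sketch below illustrates the general idea of a bulk fetch and is not the actual change in this PR: it uses the standard-library `sqlite3` module with a made-up `dag_run` table, and all function names are hypothetical. It compares one lookup per DAG with a single `IN` query and counts the `SELECT` statements each approach issues.

```python
import sqlite3


def make_db(dag_ids):
    """Create an in-memory database with one (made-up) dag_run row per DAG."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (dag_id TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO dag_run (dag_id, state) VALUES (?, ?)",
        [(dag_id, "running") for dag_id in dag_ids],
    )
    conn.commit()
    return conn


def fetch_one_by_one(conn, dag_ids):
    """Issue one SELECT per DAG (the pattern a bulk fetch replaces)."""
    runs = {}
    for dag_id in dag_ids:
        rows = conn.execute(
            "SELECT dag_id, state FROM dag_run WHERE dag_id = ?", (dag_id,)
        ).fetchall()
        runs[dag_id] = [state for _, state in rows]
    return runs


def fetch_in_bulk(conn, dag_ids):
    """Issue a single SELECT for all DAGs and group the rows in Python."""
    placeholders = ", ".join("?" for _ in dag_ids)
    rows = conn.execute(
        f"SELECT dag_id, state FROM dag_run WHERE dag_id IN ({placeholders})",
        list(dag_ids),
    ).fetchall()
    runs = {dag_id: [] for dag_id in dag_ids}
    for dag_id, state in rows:
        runs[dag_id].append(state)
    return runs


def count_selects(fetch, conn, dag_ids):
    """Run fetch() and return (result, number of SELECT statements issued)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fetch(conn, dag_ids)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)
```

Calling `count_selects` with each fetch function on the same 199 DAG ids returns identical results, with 199 `SELECT` statements for the per-DAG version and 1 for the bulk version.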
   
   Thanks to @evgenyshulman from Databand for the support!
   
   
 


> Bulk fetch DAGRun for create_dag_run
> ------------------------------------
>
>                 Key: AIRFLOW-6881
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6881
>             Project: Apache Airflow
>          Issue Type: New Feature
>          Components: scheduler
>    Affects Versions: 1.10.9
>            Reporter: Kamil Bregula
>            Priority: Major
>



