Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/29 16:13:41 UTC

[GitHub] [airflow] XD-DENG opened a new pull request #12707: Refine the DB query logics in www.views.task_stats()

XD-DENG opened a new pull request #12707:
URL: https://github.com/apache/airflow/pull/12707


   - `filter_dag_ids` is either `allowed_dag_ids` or the intersection of `allowed_dag_ids` and `selected_dag_ids`.
     So whether or not `selected_dag_ids` is `None`, `filter_dag_ids` should ALWAYS be applied in the SQL query.
   
     Currently, however, if `selected_dag_ids` is `None`, the query fetches the full result and the filtering only happens at the end.
     This means more (unnecessary) data travels between Airflow and the DB.
   
   - When we join tables A and B on `A.id == B.id` (the default is an `INNER` join) and every `A.id` is already restricted to a specific list,
     then every id in the joined result is implicitly guaranteed to be in that list as well.
     This is why the two redundant `.filter()` chunks are removed.
   
    I didn't do any performance benchmarking, but a minor performance improvement should be expected.
    The change also makes the code cleaner.
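
The first point can be illustrated with a minimal sketch. It is not the code in `airflow/www/views.py`: it uses the standard-library `sqlite3` module instead of SQLAlchemy, and the helper names `resolve_filter_dag_ids` and `count_states`, the simplified `task_instance` table, and the sample rows are all assumptions made up for illustration. It shows `filter_dag_ids` being computed once and always pushed into the `WHERE` clause, whether or not a selection was given.

```python
import sqlite3


def resolve_filter_dag_ids(allowed_dag_ids, selected_dag_ids=None):
    """Hypothetical helper: the dag_ids a query should be restricted to."""
    allowed = set(allowed_dag_ids)
    if selected_dag_ids:
        # Only DAGs that are both allowed and selected.
        return allowed & set(selected_dag_ids)
    return allowed


def count_states(conn, filter_dag_ids):
    """Count task instances per (dag_id, state), filtering inside SQL."""
    ids = sorted(filter_dag_ids)
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT dag_id, state, COUNT(*) FROM task_instance "
        f"WHERE dag_id IN ({placeholders}) "
        "GROUP BY dag_id, state ORDER BY dag_id, state"
    )
    return conn.execute(sql, ids).fetchall()


# Simplified stand-in for the real table (assumption, not Airflow's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?)",
    [
        ("dag_a", "t1", "success"),
        ("dag_a", "t2", "failed"),
        ("dag_b", "t1", "success"),
        ("dag_c", "t1", "running"),  # not in the allowed set
    ],
)

allowed = {"dag_a", "dag_b"}
# No selection: the allowed set is still applied in the query itself.
no_selection = count_states(conn, resolve_filter_dag_ids(allowed))
# With a selection: only the intersection reaches the query.
with_selection = count_states(
    conn, resolve_filter_dag_ids(allowed, {"dag_b", "dag_c"})
)
```

In both calls the rows for `dag_c` never leave the database, which is the data-transfer saving described above.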
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG merged pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
XD-DENG merged pull request #12707:
URL: https://github.com/apache/airflow/pull/12707


   





[GitHub] [airflow] XD-DENG commented on pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on pull request #12707:
URL: https://github.com/apache/airflow/pull/12707#issuecomment-735456216


   I checked and it should be a simple cherry-pick into `v2-0-stable` without conflicts, so I added the `Airflow 2.0.0-beta4` milestone. FYI @ashb





[GitHub] [airflow] XD-DENG commented on a change in pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #12707:
URL: https://github.com/apache/airflow/pull/12707#discussion_r532231189



##########
File path: airflow/www/views.py
##########
@@ -678,10 +677,6 @@ def task_stats(self, session=None):
                 running_dag_run_query_result.c.execution_date == TaskInstance.execution_date,
             ),
         )
-        if selected_dag_ids:
-            running_task_instance_query_result = running_task_instance_query_result.filter(
-                TaskInstance.dag_id.in_(filter_dag_ids)
-            )

Review comment:
       This chunk is not necessary because of line 662 + line 676 above.
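
The redundancy can be checked with a small, self-contained sketch. It uses the standard-library `sqlite3` module rather than SQLAlchemy, and the simplified `dag_run` and `task_instance` tables and their rows are assumptions for illustration, not Airflow's real schema. The subquery restricts `dag_run.dag_id` to the filter list, and the inner join matches on `dag_id`, so an extra `WHERE ti.dag_id IN (...)` returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
    CREATE TABLE task_instance (dag_id TEXT, execution_date TEXT, state TEXT);

    INSERT INTO dag_run VALUES
        ('dag_a', '2020-11-29', 'running'),
        ('dag_a', '2020-11-28', 'success'),
        ('dag_b', '2020-11-29', 'running'),
        ('dag_c', '2020-11-29', 'running');

    INSERT INTO task_instance VALUES
        ('dag_a', '2020-11-29', 'running'),
        ('dag_a', '2020-11-29', 'queued'),
        ('dag_a', '2020-11-28', 'success'),
        ('dag_b', '2020-11-29', 'success'),
        ('dag_c', '2020-11-29', 'running');
    """
)

filter_dag_ids = ["dag_a", "dag_b"]

# The subquery already restricts dag_id; the join matches on dag_id.
joined = """
    SELECT ti.dag_id, ti.state
    FROM task_instance AS ti
    JOIN (
        SELECT dag_id, execution_date
        FROM dag_run
        WHERE state = 'running' AND dag_id IN (?, ?)
    ) AS dr
      ON dr.dag_id = ti.dag_id
     AND dr.execution_date = ti.execution_date
"""
order_by = " ORDER BY ti.dag_id, ti.state"

# Query as it stands after the removal.
without_extra = conn.execute(joined + order_by, filter_dag_ids).fetchall()

# Same query with the removed filter re-applied on the joined rows.
with_extra = conn.execute(
    joined + " WHERE ti.dag_id IN (?, ?)" + order_by,
    filter_dag_ids + filter_dag_ids,
).fetchall()
```

Both result lists are identical here, because every joined row inherits a `dag_id` that the subquery has already restricted.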







[GitHub] [airflow] XD-DENG commented on a change in pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #12707:
URL: https://github.com/apache/airflow/pull/12707#discussion_r532231218



##########
File path: airflow/www/views.py
##########
@@ -710,12 +704,6 @@ def task_stats(self, session=None):
                     last_dag_run.c.execution_date == TaskInstance.execution_date,
                 ),
             )
-            # pylint: disable=no-member
-            if selected_dag_ids:
-                last_task_instance_query_result = last_task_instance_query_result.filter(
-                    TaskInstance.dag_id.in_(filter_dag_ids)
-                )
-            # pylint: enable=no-member

Review comment:
       Same reasoning as in the previous comment.







[GitHub] [airflow] ashb commented on pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #12707:
URL: https://github.com/apache/airflow/pull/12707#issuecomment-735464322


   @XD-DENG Thanks -- right now I'm still simply merging v2-0-stable with master after first checking the changes. I'm _hoping_ I'll continue to be able to do that right up until rc1 is cut (it involves looking at every commit to "core" and making a judgement call.)





[GitHub] [airflow] github-actions[bot] commented on pull request #12707: Refine the DB query logics in www.views.task_stats()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12707:
URL: https://github.com/apache/airflow/pull/12707#issuecomment-735420427


   The PR should be OK to merge with just a subset of tests, as it does not modify the Core of Airflow. The committers might merge it, or they can add the label 'full tests needed' and re-run it to run all tests if they see a need for it!

