Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/01/04 03:58:35 UTC

[GitHub] XD-DENG opened a new pull request #4433: [AIRFLOW-3627] Improve queries performance in /task_stats in views.py by ~20%

XD-DENG opened a new pull request #4433: [AIRFLOW-3627] Improve queries performance in /task_stats in views.py by ~20%
URL: https://github.com/apache/incubator-airflow/pull/4433
 
 
   
   ### Jira
   
     - https://issues.apache.org/jira/browse/AIRFLOW-3627
   
   ### Description
   
   `/task_stats` is called every time the main page loads, so it is used heavily. Its performance can be improved significantly with two minor changes.
   
   1. In the SQLAlchemy `.filter()` call, there is no need to add `== True` explicitly. The column is already Boolean, and the extra explicit comparison adds heavy overhead. Dropping it improves query performance significantly.
   
   2. The multiple chained `.filter()` calls can be merged into one, because `.filter()` accepts several criteria and joins them with `AND` (https://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.filter).
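As a minimal sketch of why both changes are safe, the three query forms can be compared on a throwaway in-memory SQLite database. `DemoDag` is a hypothetical stand-in for `DagModel`, not Airflow's real schema, and the sample rows are invented for illustration.

```python
from sqlalchemy import Boolean, Column, String, create_engine
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class DemoDag(Base):
    """Hypothetical stand-in for DagModel (assumption, not Airflow's schema)."""
    __tablename__ = "demo_dag"
    dag_id = Column(String(250), primary_key=True)
    is_active = Column(Boolean, default=False)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([
    DemoDag(dag_id="tutorial", is_active=True),
    DemoDag(dag_id="retired", is_active=False),
    DemoDag(dag_id="example", is_active=True),
])
session.commit()


def dag_ids(query):
    # Execute the query and return the matching dag_ids in a stable order.
    return sorted(row.dag_id for row in query)


# Form 0: explicit `== True`, chained filters
explicit = (session.query(DemoDag)
            .filter(DemoDag.is_active == True)  # noqa: E712
            .filter(DemoDag.dag_id == "tutorial"))
# Form 1: Boolean column used directly, chained filters
chained = (session.query(DemoDag)
           .filter(DemoDag.is_active)
           .filter(DemoDag.dag_id == "tutorial"))
# Form 2: both criteria in a single filter() call (joined with AND)
merged = session.query(DemoDag).filter(DemoDag.is_active,
                                       DemoDag.dag_id == "tutorial")

results = [dag_ids(q) for q in (explicit, chained, merged)]
print(results)
```

All three forms select the same rows here, so the choice between them only affects how the query is built, not what it returns.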
   
   **After these changes, the query time can be reduced by ~20%**
   
   #### Benchmarking
   
   ```python
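   from airflow.settings import Session
   from airflow import models
   
   # Note: each test only builds the Query object; it is never executed.
   
   def test0():
       # Baseline: explicit `== True` comparison, chained filters
       s.query(DM).filter(DM.is_active == True).filter(DM.dag_id == 'tutorial')
   
   def test1():
       # Boolean column used directly, chained filters
       s.query(DM).filter(DM.is_active).filter(DM.dag_id == 'tutorial')
   
   def test2():
       # Boolean column used directly, both criteria in one filter() call
       s.query(DM).filter(DM.is_active, DM.dag_id == 'tutorial')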
   
   if __name__ == '__main__':
   
       import timeit
   
       s = Session()
       DM = models.DagModel
       n = 300000
   
       # Sanity check: both forms of a Boolean filter return the same row count
       print(s.query(DM).filter(DM.is_paused == True).count() == s.query(DM).filter(DM.is_paused).count())
   
       print(timeit.timeit("test0()", number=n, setup="from __main__ import test0"))
       print(timeit.timeit("test1()", number=n, setup="from __main__ import test1"))
       print(timeit.timeit("test2()", number=n, setup="from __main__ import test2"))
   
       print("\n --- Run Tests in Reverse Order --- \n")
   
       print(timeit.timeit("test2()", number=n, setup="from __main__ import test2"))
       print(timeit.timeit("test1()", number=n, setup="from __main__ import test1"))
       print(timeit.timeit("test0()", number=n, setup="from __main__ import test0"))
   ```
   
   Result (total seconds for 300,000 calls per test):
   ```
   True
   
   94.96372179000173
   65.610312083998
   65.09665720800695
   
    --- Run Tests in Reverse Order ---
   
   61.59604939300334
   65.70635982099338
   83.37481481899158
   ```
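
To relate the timings above to the headline figure, the relative reduction against `test0` can be worked out for each run order. This is only arithmetic on the numbers reported above; the helper name `reduction` is introduced here for illustration.

```python
# Timings in seconds, copied from the result listing above.
forward = {"test0": 94.96372179000173,
           "test1": 65.610312083998,
           "test2": 65.09665720800695}
reverse = {"test0": 83.37481481899158,
           "test1": 65.70635982099338,
           "test2": 61.59604939300334}


def reduction(baseline, improved):
    # Fraction of the baseline time saved by the improved variant.
    return (baseline - improved) / baseline


reductions = {
    order: {name: reduction(times["test0"], times[name])
            for name in ("test1", "test2")}
    for order, times in (("forward", forward), ("reverse", reverse))
}

for order, values in reductions.items():
    for name, value in values.items():
        print("{} {}: {:.1%} faster than test0".format(order, name, value))
```

Across both run orders the reductions fall between roughly 21% and 31%, which is consistent with the ~20% stated in the title as a conservative figure.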
   
   ### Code Quality
   
   - [x] Passes `flake8`
