You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/01/04 03:58:35 UTC
[GitHub] XD-DENG opened a new pull request #4433: [AIRFLOW-3627] Improve
queries performance in /task_stats in views.py by ~20%
XD-DENG opened a new pull request #4433: [AIRFLOW-3627] Improve queries performance in /task_stats in views.py by ~20%
URL: https://github.com/apache/incubator-airflow/pull/4433
### Jira
- https://issues.apache.org/jira/browse/AIRFLOW-3627
### Description
`/task_stats` is used quite heavily. Every time the main page is loaded, it will be called. But actually the performance of it can be improved significantly by minor changes.
1. In the sqlalchemy `.filter()` statement, no need to explicitly add `==True`. The value itself is already Boolean. Doing another explicit comparison is adding heavy overhead. This change helps improve query performance significantly
2. We can merge the multiple `.filter()`s (https://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.filter).
**After these changes, the query time can be reduced by ~20%**
#### Benchmarking
```python
from airflow.settings import Session
from airflow import models
from datetime import datetime
def test0():
s.query(DM).filter(DM.is_active == True).filter(DM.dag_id == 'tutorial')
def test1():
s.query(DM).filter(DM.is_active).filter(DM.dag_id == 'tutorial')
def test2():
s.query(DM).filter(DM.is_active, DM.dag_id == 'tutorial')
if __name__ == '__main__':
import timeit
s = Session()
DM = models.DagModel
n = 300000
print(s.query(DM).filter(DM.is_paused == True).count() == s.query(DM).filter(DM.is_paused).count())
print(timeit.timeit("test0()", number=n, setup="from __main__ import test0"))
print(timeit.timeit("test1()", number=n, setup="from __main__ import test1"))
print(timeit.timeit("test2()", number=n, setup="from __main__ import test2"))
print("\n --- Run Tests in Reverse Order --- \n")
print(timeit.timeit("test2()", number=n, setup="from __main__ import test2"))
print(timeit.timeit("test1()", number=n, setup="from __main__ import test1"))
print(timeit.timeit("test0()", number=n, setup="from __main__ import test0"))
```
Result:
```
True
94.96372179000173
65.610312083998
65.09665720800695
--- Run Tests in Reverse Order ---
61.59604939300334
65.70635982099338
83.37481481899158
```
### Code Quality
- [x] Passes `flake8`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services