Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/02 14:24:56 UTC
[GitHub] [airflow] mik-laj commented on issue #7597: [AIRFLOW-6497] Avoid
loading DAGs in the main scheduler loop
URL: https://github.com/apache/airflow/pull/7597#issuecomment-593427244
> Do you have any numbers for this please?
It is very difficult to measure because it depends on the specific DAG file. Some DAG files take 30 seconds or more to load. During this time, the scheduler loop is blocked and does not start any new tasks. I can measure how long it takes to load example_dags, but that is just a subset of cases and doesn't give real-world values. Still, I put the measurements in a spreadsheet.
When I ran the following script:
```python
import os
import sys
import time
from contextlib import contextmanager

import psutil

from airflow.models import DagBag


@contextmanager
def timing_ctx():
    time1 = time.time()
    try:
        yield
    finally:
        time2 = time.time()
        diff = (time2 - time1) * 1000.0
        print('Time: %0.3f ms' % diff)


def get_process_memory():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss


@contextmanager
def memory_ctx():
    before = get_process_memory()
    try:
        yield
    finally:
        after = get_process_memory()
        diff = after - before
        print('Memory: %d bytes' % diff)


filename = sys.argv[1]

with timing_ctx(), memory_ctx():
    print("Filename:", filename)
    DagBag(dag_folder=filename, include_examples=False, store_serialized_dags=False)
```
```
find airflow/providers/google/cloud/example_dags/ -type f | sort | grep -v "__init__.py" | xargs -n 1 readlink -e | xargs -t -n 1 python /files/performance/load_dag_perf_test.py
```
I got the following values:
https://docs.google.com/spreadsheets/d/1T0kLEQLSU5ujxU-W_PoxddbkEgWx70EQkpwjRLNWaic/edit?usp=sharing
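For anyone who wants to tabulate their own runs, here is a rough sketch of how the script's output could be collected into rows. It assumes only the three `print()` formats used by the script above (`Filename: ...`, `Memory: N bytes`, `Time: X ms`). The `parse_results` helper and the sample values are hypothetical, made up for illustration, and are not the numbers from the spreadsheet.

```python
import re

# Patterns for the three print() formats used by the measurement script.
FILENAME_RE = re.compile(r"^Filename: (?P<filename>.+)$")
MEMORY_RE = re.compile(r"^Memory: (?P<bytes>-?\d+) bytes$")
TIME_RE = re.compile(r"^Time: (?P<ms>\d+(?:\.\d+)?) ms$")


def parse_results(output):
    """Group 'Filename' / 'Memory' / 'Time' lines into one dict per DAG file."""
    results = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        match = FILENAME_RE.match(line)
        if match:
            current = {"filename": match.group("filename")}
            results.append(current)
            continue
        if current is None:
            continue  # skip anything printed before the first "Filename:" line
        match = MEMORY_RE.match(line)
        if match:
            current["memory_bytes"] = int(match.group("bytes"))
            continue
        match = TIME_RE.match(line)
        if match:
            current["time_ms"] = float(match.group("ms"))
    return results


# Made-up sample output, only to show the expected shape of the input.
sample = """\
Filename: /dags/example_a.py
Memory: 1048576 bytes
Time: 1234.567 ms
Filename: /dags/example_b.py
Memory: 2048 bytes
Time: 89.100 ms
"""

# Print the slowest files first.
rows = sorted(parse_results(sample), key=lambda r: r.get("time_ms", 0.0), reverse=True)
for row in rows:
    print(row["filename"], row.get("time_ms"), row.get("memory_bytes"))
```

Unrelated lines (for example, log messages emitted while a DAG file loads) are ignored, so the combined output of the `xargs` run can be passed in as one string.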