Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/02 14:24:56 UTC

[GitHub] [airflow] mik-laj commented on issue #7597: [AIRFLOW-6497] Avoid loading DAGs in the main scheduler loop

mik-laj commented on issue #7597: [AIRFLOW-6497] Avoid loading DAGs in the main scheduler loop
URL: https://github.com/apache/airflow/pull/7597#issuecomment-593427244
 
 
   > Do you have any numbers for this please?
   
   It is very difficult to measure because it depends on the specific DAG file. Some DAG files take 30 seconds or more to load. During this time, the scheduler loop is blocked and does not start any new tasks. I can measure how long it takes to load example_dags, but that is only a subset of cases and doesn't reflect real-world values. Still, I created a spreadsheet.
   When I ran the following script:
   ```python
   import os
   import sys
   import time
   from contextlib import contextmanager
   
   import psutil
   
   from airflow.models import DagBag
   
   
   @contextmanager
   def timing_ctx():
       time1 = time.time()
       try:
           yield
       finally:
           time2 = time.time()
           diff = (time2 - time1) * 1000.0
           print('Time: %0.3f ms' % diff)
   
   
   def get_process_memory():
       process = psutil.Process(os.getpid())
       return process.memory_info().rss
   
   
   @contextmanager
   def memory_ctx():
       before = get_process_memory()
       try:
           yield
       finally:
           after = get_process_memory()
           diff = after - before
           print('Memory: %d bytes' % diff)
   
   
   filename = sys.argv[1]
   
   with timing_ctx(), memory_ctx():
       print("Filename:", filename)
       DagBag(dag_folder=filename, include_examples=False, store_serialized_dags=False)
   ```
   ```
   find airflow/providers/google/cloud/example_dags/ -type f | sort | grep -v "__init__.py" | xargs -n 1 readlink -e | xargs -t -n 1 python /files/performance/load_dag_perf_test.py
   ```
   I got the following values:
   https://docs.google.com/spreadsheets/d/1T0kLEQLSU5ujxU-W_PoxddbkEgWx70EQkpwjRLNWaic/edit?usp=sharing
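
For anyone who wants to repeat the measurement on their own DAG files, here is a minimal sketch of how the script's output could be collected into spreadsheet rows. It assumes the output has been captured as text and that each run prints a `Filename:` line, then a `Memory: ... bytes` line, then a `Time: ... ms` line, as the script above does. The function names `parse_measurements` and `to_csv` are hypothetical helpers, not part of Airflow, and this is not necessarily how the linked spreadsheet was produced.

```python
import csv
import io
import re

# Patterns for the three lines printed by the measurement script.
FILENAME_RE = re.compile(r"^Filename:\s*(?P<filename>.+)$")
MEMORY_RE = re.compile(r"^Memory:\s*(?P<bytes>-?\d+) bytes$")
TIME_RE = re.compile(r"^Time:\s*(?P<ms>\d+(?:\.\d+)?) ms$")


def parse_measurements(lines):
    """Group Filename/Memory/Time lines into one dict per DAG file.

    Lines that match none of the patterns (for example log output
    emitted while the DAG file is loading) are ignored.
    """
    rows = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = FILENAME_RE.match(line)
        if match:
            # A new Filename line starts a new row.
            current = {
                "filename": match.group("filename"),
                "memory_bytes": None,
                "time_ms": None,
            }
            rows.append(current)
            continue
        if current is None:
            continue  # measurement line before any Filename line
        match = MEMORY_RE.match(line)
        if match:
            current["memory_bytes"] = int(match.group("bytes"))
            continue
        match = TIME_RE.match(line)
        if match:
            current["time_ms"] = float(match.group("ms"))
    return rows


def to_csv(rows):
    """Render the parsed rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["filename", "memory_bytes", "time_ms"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the output of the command above saved to a file, `to_csv(parse_measurements(open(path)))` would give one CSV row per DAG file. Note that the memory figure is a difference in RSS, so it can be negative, which the pattern allows for.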
   
   
   
   
