Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/10 03:34:40 UTC

[GitHub] [airflow] huozhanfeng commented on issue #17437: It's too slow to recognize new dag file when there are a log of dags files

huozhanfeng commented on issue #17437:
URL: https://github.com/apache/airflow/issues/17437#issuecomment-895702439


   > Raise the `min_file_process_interval` to `600` (10 mins) or even `6000` (100 mins). Newly added or modified files will already be parsed, the dag parser will skip the `min_file_process_interval` check if a file is recently modified.
   > 
   > We had benchmarked this with more than 10k dag files
   
   The PR is good, but I wonder whether it solves this problem. Suppose there are 10k dags and it takes 10 mins to consume and process all 10k entries in `_file_path_queue`. If the first dag file has just been put into `_file_path_queue` by a call to `prepare_file_path_queue` and we modify it right after that, will the modified file be parsed promptly?
   
   The logic this PR improves is in the method `prepare_file_path_queue`, but `_run_parsing_loop` calls that method only when `_file_path_queue` is empty. So parsing of the modified first dag may be delayed by almost 10 mins, until the rest of the queue has been drained.
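
To make the concern concrete, here is a toy model of the loop as I understand it. It is not Airflow's actual code: `simulate_reparse_delay` and all of its parameters are hypothetical names, every file is assumed to take the same time to parse, and the queue order is fixed instead of being sorted by modification time.

```python
from collections import deque


def simulate_reparse_delay(num_files, ms_per_file, modified_index, modified_at_ms):
    """Return the milliseconds between modifying a file and its next parse.

    Toy model only: the queue is refilled solely when it is empty, which is
    the behaviour described above for `_file_path_queue`.
    """
    clock_ms = 0
    queue = deque()
    while True:
        if not queue:
            # Stand-in for prepare_file_path_queue: runs only on an empty queue.
            queue.extend(range(num_files))
        file_index = queue.popleft()
        # A parse that starts at or after the modification sees the new content.
        if file_index == modified_index and clock_ms >= modified_at_ms:
            return clock_ms - modified_at_ms
        clock_ms += ms_per_file


# 10k files at 60 ms each is 10 mins per pass; the first file is modified
# right after its own parse has finished.
print(simulate_reparse_delay(10_000, 60, 0, 60))  # 599940 ms, almost 10 mins
```

Under these assumptions the modified file waits for nearly a full pass over the queue, regardless of how `prepare_file_path_queue` orders the next batch.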
   
   @ashb could you please take a look when you have some free time?

