Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/05 10:30:37 UTC

[GitHub] [airflow] potiuk edited a comment on issue #17437: It's too slow to recognize new dag file when there is a log of dags files

potiuk edited a comment on issue #17437:
URL: https://github.com/apache/airflow/issues/17437#issuecomment-893348355


   I think that, before you start requesting new features, it is worth checking whether the existing features already work well for you:
   
   1) Did you try to configure multiple schedulers? Airflow 2 has been specifically designed to scale its operations with multiple schedulers. Please try to increase the number of schedulers you have and see if that improves your experience.
   
   2) There are a number of settings that you can configure to prioritize the scheduler's work and improve its speed. Did you try to fine-tune them? For example, "file_parsing_sort_mode" should let you control the order in which the files are parsed: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#file-parsing-sort-mode
   
   3) There are also other parameters that control the behaviour of the parsers (see the "scheduler" section in the config).
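   
   As a rough illustration of points 2) and 3), here is a minimal sketch that turns a few "scheduler" section options into the AIRFLOW__{SECTION}__{KEY} environment variables that Airflow reads. The values are placeholders I picked for the example, not recommendations, and the helper names (airflow_env_name, build_env) are hypothetical; check the configuration reference for what each option means in your Airflow version.

```python
# Sketch: build environment-variable overrides for the [scheduler] section.
# Airflow reads config overrides from variables named AIRFLOW__{SECTION}__{KEY}.

# Placeholder values for illustration only; measure before adopting any of them.
SCHEDULER_OVERRIDES = {
    "file_parsing_sort_mode": "modified_time",  # order in which DAG files are parsed
    "parsing_processes": 4,                     # number of parallel DAG file parsers
    "min_file_process_interval": 60,            # seconds between re-parses of a file
    "dag_dir_list_interval": 60,                # seconds between scans for new files
}


def airflow_env_name(section: str, key: str) -> str:
    """Return the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def build_env(overrides: dict, section: str = "scheduler") -> dict:
    """Map option names to environment variable names, with string values."""
    return {airflow_env_name(section, k): str(v) for k, v in overrides.items()}


if __name__ == "__main__":
    # Print shell export lines that can be pasted into the scheduler's environment.
    for name, value in build_env(SCHEDULER_OVERRIDES).items():
        print(f"export {name}={value}")
```

   Changing one value at a time and noting how long a new DAG file takes to show up makes it much easier to tell which setting actually helped.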
   
   There are also plenty of materials that you can learn from to fine-tune the behaviour of the scheduler:
   
   * Official documentation https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html
   * Astronomer's blog post detailing the new features, tunables and scalability of the scheduler https://www.astronomer.io/blog/airflow-2-scheduler#:~:text=As%20part%20of%20Apache%20Airflow,once%20their%20dependencies%20are%20met.
   * This fantastic talk from @ashb about the Scheduler in Airflow 2, how it works and how it can be tuned https://www.youtube.com/watch?v=DYC4-xElccE
   
   Please take a look at those resources and try to fine-tune your scheduler accordingly. Then please come back with your findings and some more data detailing what you have done and how you tried to fine-tune your configuration.
   
   Ideally, it would be great if you could report back either way. If you manage to improve your configuration, let us know what worked and why. If you try all of that and it does not work, please also report all the observations you made during your trials: CPU and memory used, I/O usage, what kind of storage you have for DAGs, and whether you tried to fine-tune the storage options (for example, we know that you need to buy extra I/O when you use EFS as DAG storage, otherwise you are limited by the efficiency of the storage). You need to tell us where you saw the bottlenecks and how you tried to overcome them.
   
   That will help us to see if there are still some bottlenecks that we were not able to foresee when we designed fine-tuning possibilities for the scheduler (but we need more data from you).
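   
   If it helps with collecting the storage numbers, below is a minimal sketch that times how long it takes to list and read every .py file under a DAG folder. It only measures raw file-system access, not Airflow's actual DAG parsing, and the function name measure_dag_folder is a hypothetical helper, not part of Airflow. Keep in mind that the OS page cache can make repeated runs look faster than the first one.

```python
import sys
import time
from pathlib import Path


def measure_dag_folder(dag_dir):
    """Time listing and reading all .py files under dag_dir (raw I/O only)."""
    root = Path(dag_dir)

    # Time the recursive directory listing.
    start = time.perf_counter()
    files = sorted(p for p in root.rglob("*.py") if p.is_file())
    list_seconds = time.perf_counter() - start

    # Time reading the full contents of every file found.
    total_bytes = 0
    start = time.perf_counter()
    for path in files:
        total_bytes += len(path.read_bytes())
    read_seconds = time.perf_counter() - start

    return {
        "files": len(files),
        "bytes": total_bytes,
        "list_seconds": list_seconds,
        "read_seconds": read_seconds,
    }


if __name__ == "__main__":
    # Usage: python measure_dags.py /path/to/dags
    stats = measure_dag_folder(sys.argv[1] if len(sys.argv) > 1 else ".")
    for key, value in stats.items():
        print(f"{key}: {value}")
```

   Running it on the machine where the scheduler runs, against the same mount the scheduler uses, gives numbers that are directly comparable with the scheduler's own behaviour.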
   
   I am closing it for now, until you can provide this data for us to investigate.

