Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/04 18:50:45 UTC

[GitHub] [airflow] shenoykarthikd opened a new issue #11266: Change FAQ documentation for max_threads

shenoykarthikd opened a new issue #11266:
URL: https://github.com/apache/airflow/issues/11266


   FAQ Documentation for max_threads currently reads as follows:
   
   max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value (e.g numbers of cpus where scheduler runs - 1) in production.
   
   The example above confuses new developers, who read it as saying that the scheduler's thread count cannot exceed the number of CPUs minus 1. I have seen many Airflow installations where the value is set to the number of CPUs minus 1, whereas the upper limit should actually be determined by the size of the instance (CPU and memory) on which the scheduler is installed. Because of this misunderstanding, I have heard many new Airflow developers say that Airflow is very slow at scheduling DAGs. When I dig into their configuration, I find max_threads limited to the number of CPUs.
   
   Kindly consider changing it to the following:
   max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase this value to a larger value that fits the size of the installed hardware in production.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] teastburn commented on issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
teastburn commented on issue #11266:
URL: https://github.com/apache/airflow/issues/11266#issuecomment-705823369


   `User should increase this value to a larger value that fits the size of the installed hardware in production.` doesn't seem to guide the user on what `fit` and `size` mean. CPU? RAM? It should be as specific as possible in my opinion.





[GitHub] [airflow] eladkal commented on issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #11266:
URL: https://github.com/apache/airflow/issues/11266#issuecomment-787445930


   The [quote](https://airflow.apache.org/docs/apache-airflow/1.10.13/faq.html#how-to-reduce-airflow-dag-scheduling-latency-in-production) you are referring to is from 1.10.13. It was changed in 1.10.14 by https://github.com/apache/airflow/pull/12605, when `max_threads` was changed to `parsing_processes`. After that, it was refactored again for Airflow 2.0.
   See the updated [doc](https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-reduce-airflow-dag-scheduling-latency-in-production).
   
   
   Closing as the docs are updated.
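
For readers carrying a configuration across the change from `max_threads` to `parsing_processes` mentioned above, here is a minimal sketch of reading whichever option is present. It is not Airflow's own lookup logic: the helper name `resolve_parsing_processes`, the sample config text, and the value 4 are all hypothetical, and it assumes both options live in the `[scheduler]` section. The fallback of 2 is the default quoted in the FAQ text in this thread.

```python
import configparser

# Hypothetical airflow.cfg excerpt; the value 4 is illustrative only.
SAMPLE_CFG = """
[scheduler]
max_threads = 4
"""


def resolve_parsing_processes(cfg_text, default=2):
    """Return the configured count, preferring the newer option name.

    Illustrative helper only; this is not how Airflow resolves the setting.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("scheduler"):
        return default
    # Check the newer name first, then fall back to the older one.
    for option in ("parsing_processes", "max_threads"):
        if parser.has_option("scheduler", option):
            return parser.getint("scheduler", option)
    return default


print(resolve_parsing_processes(SAMPLE_CFG))
```

If both options are present in the text, the sketch returns the value of `parsing_processes` and ignores `max_threads`.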
   





[GitHub] [airflow] shenoykarthikd commented on issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
shenoykarthikd commented on issue #11266:
URL: https://github.com/apache/airflow/issues/11266#issuecomment-708014805


   Good point, @teastburn! I typically tune the max_threads value on my Airflow clusters so that CPU utilization doesn't rise above 80% over a reasonable time period. I have rarely checked RAM, as memory usage is typically pretty low on the master node. It may make sense to change the FAQ as below, unless someone feels memory should also be a consideration.
   
   max_threads: Scheduler will spawn multiple threads in parallel to schedule dags. This is controlled by max_threads with default value of 2. User should increase it to a higher value in production. Note that an increase in this value causes a corresponding increase in CPU utilization.
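
The tuning approach described in this comment (raise the value while CPU utilization stays under roughly 80%) could be sketched as below. This is only an illustration of the rule of thumb, not anything Airflow provides: the function name `suggest_value`, the `headroom` parameter, and the one-step-at-a-time policy are assumptions, and the CPU samples are expected to come from whatever monitoring the reader already has.

```python
from statistics import mean


def suggest_value(current, cpu_samples, ceiling=80.0, headroom=20.0):
    """Suggest the next max_threads value to try, moving one step at a time.

    cpu_samples: CPU utilization percentages observed over the period.
    ceiling:     utilization that should not be exceeded (80% in the comment).
    headroom:    hypothetical margin below the ceiling before stepping up.
    """
    if not cpu_samples:
        raise ValueError("need at least one CPU sample")
    avg = mean(cpu_samples)
    if avg > ceiling:
        # Over the ceiling: step down, but never below 1.
        return max(1, current - 1)
    if avg < ceiling - headroom:
        # Comfortably under the ceiling: try one more.
        return current + 1
    # Close to the ceiling: leave the value alone.
    return current


print(suggest_value(4, [35.0, 42.0, 38.5]))
```

Stepping by one and re-measuring keeps each change small, which matches the observation above that a higher value brings a corresponding rise in CPU utilization.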





[GitHub] [airflow] eladkal closed issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #11266:
URL: https://github.com/apache/airflow/issues/11266


   





[GitHub] [airflow] eladkal edited a comment on issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on issue #11266:
URL: https://github.com/apache/airflow/issues/11266#issuecomment-787445930


   The [quote](https://airflow.apache.org/docs/apache-airflow/1.10.13/faq.html#how-to-reduce-airflow-dag-scheduling-latency-in-production) you are referring to is from older Airflow versions. It was changed in 1.10.14 by https://github.com/apache/airflow/pull/12605, when `max_threads` was changed to `parsing_processes`. After that, it was refactored again for Airflow 2.0.
   See the updated [doc](https://airflow.apache.org/docs/apache-airflow/stable/faq.html#how-to-reduce-airflow-dag-scheduling-latency-in-production).
   
   
   Closing as the docs are updated.
   








[GitHub] [airflow] boring-cyborg[bot] commented on issue #11266: Change FAQ documentation for max_threads

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #11266:
URL: https://github.com/apache/airflow/issues/11266#issuecomment-703299030


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

