Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/29 07:39:31 UTC

[GitHub] [airflow] mik-laj commented on issue #13941: The PgBouncer configuration is not described in the documentation

mik-laj commented on issue #13941:
URL: https://github.com/apache/airflow/issues/13941#issuecomment-769636502


   The following description seems to me to belong in this documentation, but we should verify it first.
   
   > We ensure isolation at the process level, and each process opens a new connection, so these components have many open connections. The new processes also allow us to circumvent GIL limitations, i.e. the problems with multi-threading in Python.
   >
   > - **Scheduler** processes files in a loop. For each file, we create a new process. The number of files processed simultaneously is controlled by `scheduler.max_threads` (Airflow 1.10) or `scheduler.parsing_processes` (Airflow 2.0). We recommend setting this option to CPU count - 1. Additionally, the main scheduler loop has an open connection as well. Managing the processing of files takes place in a separate process/loop, which creates another connection. This means we already have `[parsing_processes] + 2` open connections at the same time.
   > - The main **webserver** process creates many gunicorn workers. The number of workers is controlled by the `webserver.workers` option. In Airflow 1.10, each worker opened 2 connections to the database, but in Airflow 2.0, I fixed this and now each worker opens only one connection. By default, we start 4 workers.
   > - **Worker** processes handle multiple tasks, and for each task, three processes and two connections are created. The number of tasks per worker is configurable by the `core.parallelism` option.
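
To make the arithmetic in the description above concrete, here is a rough sketch that adds up the connection counts it mentions. The function name and its parameters are hypothetical, not part of Airflow, and the per-component figures are taken directly from the quoted text, which itself still needs to be verified.

```python
def estimate_db_connections(parsing_processes, webserver_workers,
                            running_tasks, airflow_major=2):
    """Estimate open metadata-database connections (hypothetical helper).

    All figures come from the quoted description, not from measurements.
    """
    # One connection per file-parsing process, plus the main scheduler
    # loop and the process that manages file processing.
    scheduler = parsing_processes + 2

    # Each gunicorn worker: 2 connections in Airflow 1.10, 1 in Airflow 2.0.
    per_webserver_worker = 1 if airflow_major >= 2 else 2
    webserver = webserver_workers * per_webserver_worker

    # Each running task is described as holding 2 connections.
    tasks = running_tasks * 2

    return {
        "scheduler": scheduler,
        "webserver": webserver,
        "tasks": tasks,
        "total": scheduler + webserver + tasks,
    }


if __name__ == "__main__":
    # Example values are assumptions chosen for illustration only.
    print(estimate_db_connections(parsing_processes=3,
                                  webserver_workers=4,
                                  running_tasks=32))
```

An estimate like this could help when choosing pool sizes for PgBouncer, but it should be treated as an upper-bound guess until the numbers are confirmed.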


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org