Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/17 12:42:11 UTC

[GitHub] [airflow] vineethguna opened a new issue #14850: Airflow workers causing 100% CPU on Postgresql Database

vineethguna opened a new issue #14850:
URL: https://github.com/apache/airflow/issues/14850


   ### Issue Description
   
   While running 100 parallel tasks on Airflow workers with PostgreSQL as the metadata database, CPU usage on PostgreSQL consistently hits 100%, even though the database is provisioned with 16 cores.
   
   Because of this bottleneck on PostgreSQL, task execution time increases in proportion to the number of tasks running in parallel:
   
   - When the Airflow workers run a single task, it completes in 15 seconds.
   - When the Airflow workers run 100 similar tasks in parallel, the average task execution time increases to 270 seconds.
   
   There are no CPU or memory bottlenecks on the Airflow workers.
   
   ### Airflow Setup Used
   
   **Airflow Version:** 1.10.12
   **PostgreSQL version:** 10
   **Executor:** Celery Executor
   **Broker:** Redis
   **Result Backend:** Redis
   **Worker Concurrency:** 25
   **Number of workers:** 4
   **Airflow Configuration**:
   AIRFLOW__CELERY__WORKER_CONCURRENCY: 25
   AIRFLOW__CORE__PARALLELISM: 100
   AIRFLOW__CORE__DAG_CONCURRENCY: 100
   AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: 100
   AIRFLOW__CELERY__SYNC_PARALLELISM: 5
   
   ### Steps to Reproduce
   - Use the above Airflow setup to launch the web server, scheduler, and workers
   - Create a simple DAG with a PythonOperator that prints "hello world"
   - Trigger 100 DAG runs
   - Observe the CPU usage on the PostgreSQL database
   
   ### Observations
   All Airflow workers use NullPool to execute queries on PostgreSQL: each query opens a connection, executes, and closes the connection.
   This lifecycle repeats for every query. Inspecting the query metrics on PostgreSQL showed no latency issues with query execution; instead, the PostgreSQL CPU is consumed by handling the connects and disconnects from the Airflow workers.
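
The connect-per-query lifecycle described above can be illustrated with a minimal sketch. This is not Airflow's or SQLAlchemy's code: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and `CountingConnector`, `run_without_pool`, and `run_with_reused_connection` are hypothetical names introduced only for this example. It shows how the number of connection setups scales with the number of queries when no connection is reused.

```python
import sqlite3


class CountingConnector:
    """Hypothetical helper that counts how many connections are opened."""

    def __init__(self, database=":memory:"):
        self.database = database
        self.connects = 0

    def connect(self):
        self.connects += 1
        return sqlite3.connect(self.database)


def run_without_pool(connector, queries):
    """NullPool-like lifecycle: connect, execute, close for every query."""
    results = []
    for sql in queries:
        conn = connector.connect()
        try:
            results.append(conn.execute(sql).fetchone()[0])
        finally:
            conn.close()
    return results


def run_with_reused_connection(connector, queries):
    """Reuse one connection for all queries, as a pooled setup would."""
    conn = connector.connect()
    try:
        return [conn.execute(sql).fetchone()[0] for sql in queries]
    finally:
        conn.close()


# Assumption for illustration: 100 trivial queries, one per parallel task.
queries = ["SELECT 1"] * 100
no_pool = CountingConnector()
reused = CountingConnector()
run_without_pool(no_pool, queries)
run_with_reused_connection(reused, queries)
print(no_pool.connects, reused.connects)
```

With SQLite the per-connection cost is negligible, so the sketch only demonstrates the connection counts. On PostgreSQL each new connection is considerably more expensive to set up, which is consistent with the CPU pattern reported here.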


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #14850: Airflow workers causing 100% CPU on Postgresql Database

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #14850:
URL: https://github.com/apache/airflow/issues/14850#issuecomment-801050704


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] vineethguna closed issue #14850: Airflow workers causing 100% CPU on PostgreSQL Database

Posted by GitBox <gi...@apache.org>.
vineethguna closed issue #14850:
URL: https://github.com/apache/airflow/issues/14850


   

