You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "jiaweigoh (via GitHub)" <gi...@apache.org> on 2023/02/22 11:34:58 UTC

[GitHub] [airflow] jiaweigoh opened a new issue, #29690: Airflow DB query takes 70% of DB cpu

jiaweigoh opened a new issue, #29690:
URL: https://github.com/apache/airflow/issues/29690

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Hi guys, we noticed that our Airflow DB (GCP CloudSQL Postgres) is often at 100%, with 70% consumed by a particular query. Checked via pg_stat_statements, Google's Query Insight tells us the same thing.
   
   We are using the most basic set-up, 1 vcpu core, 3.75gb ram.
   
   While we figure out what Airflow configs to tune, and whether to scale up our DB instance, i wanted to figure out what this query achieves. I checked that the filter conditions are indexed in Airflow DB, so its shocking for a single query type to consume so much CPU.
   
   ```
   SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.run_id AS task_instance_run_id, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.pool_slots AS task_instance_pool_slots, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.queued_by_job_id AS task_instance_queued_by_job_id, task_instance.pid AS t
 ask_instance_pid, task_instance.executor_config AS task_instance_executor_config, task_instance.external_executor_id AS task_instance_external_executor_id, task_instance.trigger_id AS task_instance_trigger_id, task_instance.trigger_timeout AS task_instance_trigger_timeout, task_instance.next_method AS task_instance_next_method, task_instance.next_kwargs AS task_instance_next_kwargs, dag_run_1.state AS dag_run_1_state, dag_run_1.id AS dag_run_1_id, dag_run_1.dag_id AS dag_run_1_dag_id, dag_run_1.queued_at AS dag_run_1_queued_at, dag_run_1.execution_date AS dag_run_1_execution_date, dag_run_1.start_date AS dag_run_1_start_date, dag_run_1.end_date AS dag_run_1_end_date, dag_run_1.run_id AS dag_run_1_run_id, dag_run_1.creating_job_id AS dag_run_1_creating_job_id, dag_run_1.external_trigger AS dag_run_1_external_trigger, dag_run_1.run_type AS dag_run_1_run_type, dag_run_1.conf AS dag_run_1_conf, dag_run_1.data_interval_start AS dag_run_1_data_interval_start, dag_run_1.data_interval_end A
 S dag_run_1_data_interval_end, dag_run_1.last_scheduling_decision AS dag_run_1_last_scheduling_decision, dag_run_1.dag_hash AS dag_run_1_dag_hash 
   FROM task_instance JOIN dag_run AS dag_run_1 ON dag_run_1.dag_id = task_instance.dag_id AND dag_run_1.run_id = task_instance.run_id 
   WHERE task_instance.dag_id = ? AND task_instance.task_id = ? AND task_instance.run_id = ? 
    LIMIT ? FOR UPDATE
   ```
   
   Thanks!
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   NA
   
   ### Operating System
   
   Ubuntu
   
   ### Versions of Apache Airflow Providers
   
   Airflow version: 2.2.3
   Postgres: 9.6
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #29690: Airflow DB query takes 70% of DB cpu

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29690: Airflow DB query takes 70% of DB cpu
URL: https://github.com/apache/airflow/issues/29690


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #29690: Airflow DB query takes 70% of DB cpu

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #29690:
URL: https://github.com/apache/airflow/issues/29690#issuecomment-1439869611

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org