You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/02 20:19:26 UTC

[GitHub] [airflow] matthew-kruse opened a new issue, #24124: hung database transactions on xcom table

matthew-kruse opened a new issue, #24124:
URL: https://github.com/apache/airflow/issues/24124

   ### Apache Airflow version
   
   2.2.5
   
   ### What happened
   
   When the xcom table becomes large deletes like DELETE FROM xcom WHERE xcom.dag_id = %s AND xcom.task_id = %s AND xcom.execution_date = %s] have caused mysql transactions to hang indefinitely.  
   
   mysql> explain DELETE FROM xcom WHERE xcom.dag_id = '<dag id>' AND xcom.task_id = '<task id>' AND xcom.execution_date = '2022-06-02 18:52:00';
   +----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
   | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref         | rows | filtered | Extra       |
   +----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
   |  1 | DELETE      | xcom  | NULL       | range | PRIMARY       | PRIMARY | 1504    | const,const | 2590 |   100.00 | Using where |
   +----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
   
   
   
   ### What you think should happen instead
   
   We manually fixed this by adding a new index to the xcom table designed to increase performance on this delete.
   
   ALTER TABLE xcom ADD INDEX xcom_delete_idx (dag_id, task_id, execution_date)
   
   After we added this index the deletes would typically scan 1 record to delete one record instead of having to scan 2500 records to delete 1 record.  
   
   ### How to reproduce
   
   Mileage will vary depending on how much activity your airflow instance gets, but as you scale up dag runs you'd tend to get big enough record counts to potentially reproduce this performance slowdown.  In our case, our xcom table has 500k entries in it when we first started to see this and can have in the millions of records.  We typically blow away all data in our airflow database several times a year as airflow's ui will stop functioning all around when record counts in various tables get to large.
   
   ### Operating System
   
   ubuntu
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow[mysql]==2.2.5
   apache-airflow-providers-http==2.1.2
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   Nothing particularly special, we run airflow in a docker container.  MySQL is run on dedicated gear not on the same machine that airflow resides on.
   
   ### Anything else
   
   This happens only as record counts in the various airflow tables get higher.
   
   We've had to make a number of schema adjustments for performance reasons due to locking/failure of mysql to detect deadlocks properly.  The #1 bottleneck for us on airflow is the database and due to our scale we've had to be aggressive about modifying airflow indexes to keep things going.  If the airflow devs want help with database performance I'm happy to assist.
   
   I am willing to submit a PR for this and other things performance related to the backing database if someone was willing to walk me through how to do that.  I assume you are using some sort of database abstraction layer due to supporting multiple database backends.  I've got 10+ years as a dba supporting mysql, postgres, and mongo.  I've even used sqlite for a few projects so if you are interested in a database schema review and performance evaluation I'm happy to assist as it will ultimately make my airflow project better as well.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #24124: hung database transactions on xcom table

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #24124:
URL: https://github.com/apache/airflow/issues/24124#issuecomment-1145406959

   In aiflow 2.2 you should use run_id as index not execution date. This has changed in Airlfow 2.2 as part of [AIP-39](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-39+Richer+scheduler_interval). Possibly you missed that in adjusting your queries. 
   
   Also In airflow 2.3.0 xcom and other tables were still heavily updated to include map_index.
   
   I propose you test it with the right selection query (using run_id) and also make sure that you adjust your DB queries when you migrate to latest Airflow 2.3 (2.3.2 is likely out tomorrow and test it there - DB in airflow is pretty much internal detail, so if you use any db queries, they might need adjustments.
   
   Also you might be happy to learn that in Airflow 2.3 there is the new "airflow db clean" command that you can likely make use of to replace your manual database retention. https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#clean. This might solve your "maintenance" issue for custom queries, because it will be always using the right queries to clean old data and you can implement your own retention schedule and criteria.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk closed issue #24124: hung database transactions on xcom table

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #24124: hung database transactions on xcom table
URL: https://github.com/apache/airflow/issues/24124


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on issue #24124: hung database transactions on xcom table

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #24124:
URL: https://github.com/apache/airflow/issues/24124#issuecomment-1145303113

   Thanks for opening your first issue here! Be sure to follow the issue template!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org