Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/17 01:11:37 UTC
[GitHub] [airflow] asd855280 opened a new issue, #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
asd855280 opened a new issue, #25750:
URL: https://github.com/apache/airflow/issues/25750
### Apache Airflow version
Other Airflow 2 version
### What happened
We occasionally see Airflow DAG runs reported as failed because of a lock wait timeout while deleting from the xcom table.
Exception:
(pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
[SQL: DELETE FROM xcom WHERE xcom.dag_id = %(dag_id_1)s AND xcom.task_id = %(task_id_1)s AND xcom.execution_date = %(execution_date_1)s]
[parameters: {'dag_id_1': 'XXXXXXX', 'task_id_1': 'XXXXXXXX', 'execution_date_1': datetime.datetime(2022, 8, 16, 0, 0)}]
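The MySQL message itself suggests restarting the transaction. As a minimal sketch, assuming the DELETE is issued by your own code through SQLAlchemy or pymysql (not by Airflow's internals), a retry-with-backoff wrapper could look like this. `is_lock_wait_timeout` and `run_with_retry` are hypothetical helpers, and retrying only masks lock contention; it does not remove its cause.

```python
import random
import time

# MySQL error code from the traceback above: "Lock wait timeout exceeded".
MYSQL_LOCK_WAIT_TIMEOUT = 1205


def is_lock_wait_timeout(exc):
    """Return True if exc, or the DB-API error it wraps, carries MySQL error 1205."""
    # SQLAlchemy exposes the driver exception as `.orig`; pymysql stores the code in args[0].
    inner = getattr(exc, "orig", exc)
    args = getattr(inner, "args", ())
    return bool(args) and args[0] == MYSQL_LOCK_WAIT_TIMEOUT


def run_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a lock wait timeout, back off and call it again.

    Hypothetical helper: `operation` should run the whole transaction
    (open session, DELETE, commit) so each retry starts from a clean state.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts or not is_lock_wait_timeout(exc):
                raise
            # Exponential backoff with jitter so competing workers do not retry in lockstep.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```

Any other error, or the final failed attempt, is re-raised unchanged, so genuine failures are not swallowed.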
### What you think should happen instead
Possibly, a long-running transaction that holds locks on the xcom table before the delete is what causes the lock wait timeout to be exceeded.
### How to reproduce
We have 6 instances (nodes) running, using Celery workers:
- node1: metadata DB (MySQL 8), message queue (Redis 6), webserver, Celery Flower and worker1
- node2: worker2
- node3: scheduler 1
- node4: scheduler 2
- node5: worker3
- node6: worker4
Basic configuration:
max_dagruns_to_create_per_loop = 32
max_dagruns_per_loop_to_schedule = 32
use_row_level_locking = true
worker_autoscale = 256,16
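For reference, these options live in different sections of `airflow.cfg`: the first three under `[scheduler]` and `worker_autoscale` under `[celery]`, where the value is `max,min` worker processes per Celery worker. The sketch below uses hypothetical helpers (not part of Airflow) to map the reported settings onto Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention and to work out the upper bound on concurrent task processes implied by autoscale alone, each of which may talk to the metadata DB.

```python
# Settings from the report, keyed by the airflow.cfg section each option belongs to.
REPORTED_SETTINGS = {
    ("scheduler", "max_dagruns_to_create_per_loop"): "32",
    ("scheduler", "max_dagruns_per_loop_to_schedule"): "32",
    ("scheduler", "use_row_level_locking"): "true",
    ("celery", "worker_autoscale"): "256,16",
}


def to_env_vars(settings):
    """Map (section, key) pairs to AIRFLOW__{SECTION}__{KEY} environment variables."""
    return {
        f"AIRFLOW__{section.upper()}__{key.upper()}": value
        for (section, key), value in settings.items()
    }


def max_task_processes(settings, worker_count):
    """Upper bound from autoscale alone: max processes per worker times number of workers."""
    max_per_worker = int(settings[("celery", "worker_autoscale")].split(",")[0])
    return max_per_worker * worker_count


for name, value in sorted(to_env_vars(REPORTED_SETTINGS).items()):
    print(f"{name}={value}")
# Four Celery workers (worker1 to worker4) in the deployment described above.
print(max_task_processes(REPORTED_SETTINGS, worker_count=4))  # 1024
```

Other limits such as `parallelism` and pools can keep the real number lower, but the bound shows how many concurrent processes could in principle contend for rows in the same tables.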
### Operating System
NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
### Versions of Apache Airflow Providers
airflow 2.1.0
### Deployment
Virtualenv installation
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] asd855280 commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
asd855280 commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1269959595
> I think this is some cleanup that you run yourself. Airflow never deletes XCom on its own. Likely you are using third-party DAG cleanups. I recommend switching to the latest Airflow version and, rather than relying on third-party cleanups, periodically running the `airflow db clean` CLI command that was added in 2.3.3.
Hi, thanks, and sorry for the late reply.
We are not using any third-party cleanup tool; we installed plain open-source Airflow 2.1.0.
We found that the xcom table in the metadata DB does not have any index. Could it be that, when multiple workers and schedulers are manipulating data in the xcom table, the statements take longer because of the missing index?
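The missing-index hypothesis can be illustrated with a small sketch. It uses SQLite from the Python standard library rather than MySQL, and a simplified, hypothetical `xcom` table, so it only shows how a composite index changes the access path for this DELETE, not Airflow's actual schema. The link to lock waits is that, per the MySQL documentation on locks set by statements, an InnoDB DELETE that has no suitable index scans, and locks, many more rows than it deletes.

```python
import sqlite3

# Same shape as the DELETE in the traceback above.
DELETE_SQL = (
    "DELETE FROM xcom "
    "WHERE dag_id = ? AND task_id = ? AND execution_date = ?"
)


def delete_plan(conn):
    """Return SQLite's query-plan detail lines for the DELETE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + DELETE_SQL,
        ("my_dag", "my_task", "2022-08-16 00:00:00"),
    ).fetchall()
    # Each row is (id, parent, notused, detail); `detail` names the access path.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real xcom schema differs between Airflow versions.
conn.execute(
    'CREATE TABLE xcom (dag_id TEXT, task_id TEXT, execution_date TEXT, "key" TEXT, value BLOB)'
)
before = delete_plan(conn)  # no index: a full table scan
conn.execute(
    "CREATE INDEX idx_xcom_dag_task_date ON xcom (dag_id, task_id, execution_date)"
)
after = delete_plan(conn)  # composite index: a search on the filtered columns
print(before)
print(after)
```

The exact wording of the plan lines varies between SQLite versions, but the first plan reports a scan and the second names the index.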
[GitHub] [airflow] potiuk closed issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
URL: https://github.com/apache/airflow/issues/25750
[GitHub] [airflow] boring-cyborg[bot] commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1217342897
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1437514896
> It seems this deletion is fired for each running task. I couldn't find where in the codebase this query comes from; could you advise?
Open a new issue and describe details there.
[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1228934031
I think this is some cleanup that you run yourself. Airflow never deletes XCom on its own. Likely you are using third-party DAG cleanups. I recommend switching to the latest Airflow version and, rather than relying on third-party cleanups, periodically running the `airflow db clean` CLI command that was added in 2.3.3.
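A periodic cleanup along these lines could be scripted as below. This is a minimal sketch assuming a 2.3+ Airflow CLI in which `airflow db clean` accepts `--clean-before-timestamp`, `--tables`, `--dry-run` and `--yes`; check `airflow db clean --help` on your installed version before relying on it. `build_db_clean_command`, `run_db_clean` and the 30-day retention are illustrative only.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def build_db_clean_command(keep_days, tables=("xcom",), dry_run=True, now=None):
    """Build argv for `airflow db clean` that purges rows older than keep_days.

    Hypothetical helper; flags assumed from the Airflow 2.3+ CLI.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    cmd = ["airflow", "db", "clean", "--clean-before-timestamp", cutoff.isoformat()]
    if tables:
        cmd += ["--tables", ",".join(tables)]
    # --dry-run only reports what would be deleted; --yes skips the confirmation prompt.
    cmd.append("--dry-run" if dry_run else "--yes")
    return cmd


def run_db_clean(keep_days=30, dry_run=True):
    """Run the cleanup (e.g. from cron), raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_db_clean_command(keep_days, dry_run=dry_run), check=True)
```

Defaulting to `dry_run=True` means a first run only reports what would be purged; pass `dry_run=False` once the output looks right.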
[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1270439818
You should migrate to the latest version, @asd855280. Airflow 2.3.0 (https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#xcom-now-defined-by-run-id-instead-of-execution-date-20975) added an index to XCom.
[GitHub] [airflow] asd855280 commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by GitBox <gi...@apache.org>.
asd855280 commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1270749941
@potiuk Thank you for the note.
[GitHub] [airflow] zambadruzaman commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
Posted by "zambadruzaman (via GitHub)" <gi...@apache.org>.
zambadruzaman commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1434132239
Hi @potiuk, I am using Airflow 2.3.2 and face the exact same issue. The query below is fired many times, and since the xcom table grows a lot in my case, the query execution time keeps increasing (~18 s at the moment). In the xcom table I don't see a composite index for this combination of WHERE-clause columns, so the query always does a sequential scan in Postgres:
`DELETE
FROM
xcom
WHERE
xcom.dag_id = $1
AND xcom.task_id = $2
AND xcom.run_id = $3`
It seems this deletion is fired for each running task. I couldn't find where in the codebase this query comes from; could you advise?
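Whether an existing index can serve this DELETE depends, as a rough rule for composite B-tree indexes in both PostgreSQL and MySQL, on its leading columns: only the leftmost index columns pinned by equality predicates narrow the search. The sketch below applies that rule to index definitions you could collect yourself, for example from PostgreSQL's `pg_indexes` view or from SQLAlchemy's `inspect(engine).get_indexes("xcom")`. The helper functions and the index names and columns in the example are hypothetical, not Airflow's actual schema.

```python
def usable_prefix(index_columns, equality_columns):
    """Count the leading index columns pinned by equality predicates (leftmost-prefix rule)."""
    wanted = set(equality_columns)
    count = 0
    for column in index_columns:
        if column not in wanted:
            break  # the first unconstrained column ends the usable prefix
        count += 1
    return count


def best_index(indexes, equality_columns):
    """Return the name of the index with the longest usable prefix, or None if none helps."""
    best_name, best_len = None, 0
    for name, columns in indexes.items():
        length = usable_prefix(columns, equality_columns)
        if length > best_len:
            best_name, best_len = name, length
    return best_name


# WHERE clause of the DELETE quoted above.
where_columns = ["dag_id", "task_id", "run_id"]
# Hypothetical index definitions, for illustration only.
indexes = {
    "pk_like": ["dag_run_id", "task_id", "map_index", "key"],
    "idx_dag_task": ["dag_id", "task_id"],
}
print(best_index(indexes, where_columns))  # idx_dag_task (prefix length 2 of 3)
```

An index whose first column is not in the WHERE clause (like `pk_like` here) scores zero, which is the situation in which the planner tends to fall back to a sequential scan.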