Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/17 01:11:37 UTC

[GitHub] [airflow] asd855280 opened a new issue, #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

asd855280 opened a new issue, #25750:
URL: https://github.com/apache/airflow/issues/25750

   ### Apache Airflow version
   
   Other Airflow 2 version
   
   ### What happened
   
   We occasionally see Airflow DAG runs reported as failed because of a lock wait timeout while deleting from the xcom table.
   
    Exception:
   (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
   [SQL: DELETE FROM xcom WHERE xcom.dag_id = %(dag_id_1)s AND xcom.task_id = %(task_id_1)s AND xcom.execution_date = %(execution_date_1)s]
   [parameters: {'dag_id_1': 'XXXXXXX', 'task_id_1': 'XXXXXXXX', 'execution_date_1': datetime.datetime(2022, 8, 16, 0, 0)}]
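
The error text itself suggests restarting the transaction. A minimal sketch of that idea follows, assuming the failing unit of work can safely be re-run from the start. `FakeOperationalError`, `run_with_retry`, and `flaky_delete` are hypothetical names for illustration and are not part of Airflow or pymysql; the stand-in exception only mimics the `(1205, message)` arguments shown in the traceback.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # MySQL error code shown in the traceback above


class FakeOperationalError(Exception):
    """Hypothetical stand-in for the driver error; args[0] is the error code."""


def run_with_retry(operation, retries=3, base_delay=0.0):
    """Call operation(); retry only when it fails with a lock wait timeout."""
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except FakeOperationalError as exc:
            is_lock_timeout = bool(exc.args) and exc.args[0] == LOCK_WAIT_TIMEOUT
            # Re-raise anything that is not a lock timeout, or the final failure.
            if not is_lock_timeout or attempt == retries:
                raise
            # Exponential backoff between attempts (zero here to keep the demo fast).
            time.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_delete():
    """Simulated DELETE that hits the lock timeout twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeOperationalError(
            LOCK_WAIT_TIMEOUT,
            "Lock wait timeout exceeded; try restarting transaction",
        )
    return "deleted"


result = run_with_retry(flaky_delete)
print(result, calls["count"])
```

In a real deployment the whole transaction would need to be rolled back and repeated, not just the single statement, and a retry only hides the symptom if another transaction holds the lock for a long time.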
   
   
   ### What you think should happen instead
   
   Possibly a long-running transaction on the table before the delete from xcom causes the lock wait timeout to be exceeded.
   
   
   ### How to reproduce
   
   We have 6 instances running
   Using celery worker mode
   
   node1: metadata db(MySQL 8), message queue(redis 6), webserver, celery flower and worker1
   
   node2: worker2
   
   node3: scheduler 1
   
   node4: scheduler 2
   
   node5: worker3
   
   node6: worker4
   
   basic configuration:
   
   max_dagruns_to_create_per_loop = 32
   max_dagruns_per_loop_to_schedule = 32
   use_row_level_locking = true
   
   worker_autoscale = 256,16
   
   
   
   ### Operating System
   
   NAME="Rocky Linux"
   VERSION="8.6 (Green Obsidian)"
   ID="rocky"
   ID_LIKE="rhel centos fedora"
   VERSION_ID="8.6"
   PLATFORM_ID="platform:el8"
   PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
   ANSI_COLOR="0;32"
   CPE_NAME="cpe:/o:rocky:rocky:8:GA"
   HOME_URL="https://rockylinux.org/"
   BUG_REPORT_URL="https://bugs.rockylinux.org/"
   ROCKY_SUPPORT_PRODUCT="Rocky Linux"
   ROCKY_SUPPORT_PRODUCT_VERSION="8"
   REDHAT_SUPPORT_PRODUCT="Rocky Linux"
   REDHAT_SUPPORT_PRODUCT_VERSION="8"
   
   ### Versions of Apache Airflow Providers
   
   airflow 2.1.0
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] asd855280 commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
asd855280 commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1269959595

   > I think this is a cleanup that you are running yourself. Airflow never deletes XCom on its own. You are likely using third-party DAG cleanups. I recommend switching to the latest Airflow version and, rather than relying on third-party cleanups, periodically running the `airflow db clean` CLI command that was added in 2.3.3.
   
   Hi, thanks, and sorry for the late reply.
   We are not using any third-party cleanup tool; we installed the plain open-source Airflow 2.1.0.
   
   We found out that the xcom table in the metadata DB has no index. Could it be that, when multiple workers and schedulers modify data in the xcom table at the same time, the statements take longer because of the missing index?




[GitHub] [airflow] potiuk closed issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)
URL: https://github.com/apache/airflow/issues/25750




[GitHub] [airflow] boring-cyborg[bot] commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1217342897

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1437514896

   > It seems this deletion is fired for each running task. I couldn't find in the codebase which process this query comes from; could you advise?
   
   Open a new issue and describe details there.




[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1228934031

   I think this is a cleanup that you are running yourself. Airflow never deletes XCom on its own. You are likely using third-party DAG cleanups. I recommend switching to the latest Airflow version and, rather than relying on third-party cleanups, periodically running the `airflow db clean` CLI command that was added in 2.3.3.




[GitHub] [airflow] potiuk commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1270439818

   You should migrate to the latest version, @asd855280. Airflow 2.3.0 (https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html#xcom-now-defined-by-run-id-instead-of-execution-date-20975) added an index to XCom.




[GitHub] [airflow] asd855280 commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by GitBox <gi...@apache.org>.
asd855280 commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1270749941

   @potiuk Thank you for the note.




[GitHub] [airflow] zambadruzaman commented on issue #25750: Metadata DB lock wait timeout exceeded when delete from xcom table(airflow 2.1.0)

Posted by "zambadruzaman (via GitHub)" <gi...@apache.org>.
zambadruzaman commented on issue #25750:
URL: https://github.com/apache/airflow/issues/25750#issuecomment-1434132239

   Hi @potiuk, I am using Airflow 2.3.2 and face the exact same issue. The query below is fired many times, and since the xcom table grows a lot in my case, the query execution time keeps increasing (~18 sec at the moment). I don't see a composite index on the xcom table for this combination of WHERE clause columns, so the query always does a sequential scan in Postgres:
   `DELETE
   FROM
     xcom
   WHERE
     xcom.dag_id = $1
     AND xcom.task_id = $2
     AND xcom.run_id = $3`
   
   It seems this deletion is fired for each running task. I couldn't find in the codebase which process this query comes from; could you advise?
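
The effect of a missing composite index on this kind of DELETE can be illustrated with a small, self-contained sketch. It uses Python's built-in SQLite as a stand-in for Postgres or MySQL, a simplified `xcom` table that is not Airflow's real schema, and a hypothetical index name `idx_xcom_task_run`; it only shows how the query plan changes and makes no claim about what Airflow's migrations create.

```python
import sqlite3


def delete_plan(conn):
    """Return the query-plan detail strings for the XCom-style DELETE."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "DELETE FROM xcom WHERE dag_id = ? AND task_id = ? AND run_id = ?",
        ("d", "t", "r"),
    ).fetchall()
    # The human-readable plan text is the fourth column of each row.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Simplified table for illustration only (assumption, not Airflow's schema).
conn.execute(
    "CREATE TABLE xcom ("
    "id INTEGER PRIMARY KEY, dag_id TEXT, task_id TEXT, run_id TEXT, "
    "key TEXT, value BLOB)"
)

# Without an index the planner has to scan the whole table.
plan_before = delete_plan(conn)

# Hypothetical composite index covering the three WHERE clause columns.
conn.execute("CREATE INDEX idx_xcom_task_run ON xcom (dag_id, task_id, run_id)")

# With the index the planner can seek directly to the matching rows.
plan_after = delete_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

On Postgres the equivalent check would be running `EXPLAIN` on the DELETE before and after adding an index; whether adding one by hand is appropriate for a given Airflow version is a separate question for the maintainers.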

