You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/29 21:56:40 UTC
[GitHub] [airflow] potiuk opened a new pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
potiuk opened a new pull request #18616:
URL: https://github.com/apache/airflow/pull/18616
The query that deletes rendered old rendered task fields for MySQL
can occasionally deadlock because it is unncesssary complex with
a subquery (due to features missing in MySQL). This change
adds DB retries to get rid of the deadlock (as is the
recommended practice for MySQL).
Fixes: #18512
<!--
Thank you for contributing! Please make sure that your code changes
are covered with tests. And in case of new features or big changes
remember to adjust the documentation.
Feel free to ping committers for the review!
In case of existing issue, reference it using one of the following:
closes: #ISSUE
related: #ISSUE
How to write a good git commit message:
http://chris.beams.io/posts/git-commit/
-->
---
**^ Add meaningful description above**
Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930574529
@ashb @kaxil I think this is the simplest way to handle it (I believe the retry decorator is exactly for such cases). This is MySQL only so it has no impact for other databases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930581092
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] ashb commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719206339
##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
]
session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+ @classmethod
+ @retry_db_transaction
+ def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
Review comment:
```suggestion
def _remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
```
##########
File path: airflow/models/renderedtifields.py
##########
@@ -161,20 +162,7 @@ def delete_old_records(
tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq1),
).delete(synchronize_session=False)
elif session.bind.dialect.name in ["mysql"]:
- # Fetch Top X records given dag_id & task_id ordered by Execution Date
- subq1 = tis_to_keep_query.subquery('subq1')
-
- # Second Subquery
- # Workaround for MySQL Limitation (https://stackoverflow.com/a/19344141/5691525)
- # Limitation: This version of MySQL does not yet support
- # LIMIT & IN/ALL/ANY/SOME subquery
- subq2 = session.query(subq1.c.dag_id, subq1.c.task_id, subq1.c.execution_date).subquery('subq2')
-
- session.query(cls).filter(
- cls.dag_id == dag_id,
- cls.task_id == task_id,
- tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq2),
- ).delete(synchronize_session=False)
+ cls.remove_old_rendered_ti_fields_mysql(dag_id, session, task_id, tis_to_keep_query)
Review comment:
```suggestion
cls._remove_old_rendered_ti_fields_mysql(dag_id, session, task_id, tis_to_keep_query)
```
##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
]
session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+ @classmethod
+ @retry_db_transaction
+ def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
+ # Fetch Top X records given dag_id & task_id ordered by Execution Date
+ subq1 = tis_to_keep_query.subquery('subq1')
+ # Second Subquery
+ # Workaround for MySQL Limitation (https://stackoverflow.com/a/19344141/5691525)
+ # Limitation: This version of MySQL does not yet support
+ # LIMIT & IN/ALL/ANY/SOME subquery
+ subq2 = session.query(subq1.c.dag_id, subq1.c.task_id, subq1.c.execution_date).subquery('subq2')
+ # This query might deadlock occasionally and it should be retried if fails (see decorator)
+ session.query(cls).filter(
+ cls.dag_id == dag_id,
+ cls.task_id == task_id,
+ tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq2),
+ ).delete(synchronize_session=False)
Review comment:
I think this should have
```suggestion
).delete(synchronize_session=False)
session.flush()
```
Otherwise it _might_ not make it to the DB but be pending in the SQLA unit-of-work. (Not sure in this case, maybe `query.delete()` might issue the query directly)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jbarnettfreejazz commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
jbarnettfreejazz commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-933097644
Thank you! We will look for that release. Much appreciated!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719062774
##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
]
session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+ @retry_db_transaction
+ @classmethod
+ def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
Review comment:
```suggestion
@classmethod
@retry_db_transaction
def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
```
We need to change order as this won't work:
```
$ airflow db check
Traceback (most recent call last):
File "/usr/local/bin/airflow", line 33, in <module>
sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
File "/opt/airflow/airflow/__main__.py", line 40, in main
args.func(args)
File "/opt/airflow/airflow/cli/cli_parser.py", line 47, in command
func = import_string(import_path)
File "/opt/airflow/airflow/utils/module_loading.py", line 32, in import_string
module = import_module(module_path)
File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/opt/airflow/airflow/cli/commands/db_command.py", line 24, in <module>
from airflow.utils import cli as cli_utils, db
File "/opt/airflow/airflow/utils/db.py", line 27, in <module>
from airflow.jobs.base_job import BaseJob # noqa: F401
File "/opt/airflow/airflow/jobs/__init__.py", line 19, in <module>
import airflow.jobs.backfill_job
File "/opt/airflow/airflow/jobs/backfill_job.py", line 29, in <module>
from airflow import models
File "/opt/airflow/airflow/models/__init__.py", line 30, in <module>
from airflow.models.renderedtifields import RenderedTaskInstanceFields
File "/opt/airflow/airflow/models/renderedtifields.py", line 36, in <module>
class RenderedTaskInstanceFields(Base):
File "/opt/airflow/airflow/models/renderedtifields.py", line 184, in RenderedTaskInstanceFields
@classmethod
File "/opt/airflow/airflow/utils/retries.py", line 96, in retry_db_transaction
return retry_decorator(_func)
File "/opt/airflow/airflow/utils/retries.py", line 55, in retry_decorator
func_params = signature(func).parameters
File "/usr/local/lib/python3.6/inspect.py", line 3065, in signature
return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
File "/usr/local/lib/python3.6/inspect.py", line 2815, in from_callable
follow_wrapper_chains=follow_wrapped)
File "/usr/local/lib/python3.6/inspect.py", line 2193, in _signature_from_callable
raise TypeError('{!r} is not a callable object'.format(obj))
TypeError: <classmethod object at 0x7fd198edf9e8> is not a callable object
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] github-actions[bot] commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930581092
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930574529
@ashb @kaxil I think this is the simplest way to handle it (I believe the retry decorator is exactly for such cases). This is MySQL only so it has no impact for other databases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-934951986
Yup, it will be part of 2.2.0 to be released next week - RCs within next 2 days
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] jbarnettfreejazz commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
jbarnettfreejazz commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-931527695
Hi -- thanks for resolving this issue. Will there be a patch release?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719180681
##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
]
session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+ @retry_db_transaction
+ @classmethod
+ def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
Review comment:
Yep. I noticed yesterday but was too late to fix :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kaxil merged pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #18616:
URL: https://github.com/apache/airflow/pull/18616
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields
Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-932963197
> Hi -- thanks for resolving this issue. Will there be a patch release?
We are going to release it in 2.2.0 Airflow release I believe.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org