You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/29 21:56:40 UTC

[GitHub] [airflow] potiuk opened a new pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

potiuk opened a new pull request #18616:
URL: https://github.com/apache/airflow/pull/18616


   The query that deletes rendered old rendered task fields for MySQL
   can occasionally deadlock because it is unncesssary complex with
   a subquery (due to features missing in MySQL). This change
   adds DB retries to get rid of the deadlock (as is the
   recommended practice for MySQL).
   
   Fixes: #18512
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930574529


   @ashb  @kaxil  I think this is the simplest way to handle it (I believe  the retry decorator is exactly for such cases). This is MySQL only so it has no impact for other databases. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930581092


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719206339



##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
             ]
 
             session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+    @classmethod
+    @retry_db_transaction
+    def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):

Review comment:
       ```suggestion
       def _remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
   ```

##########
File path: airflow/models/renderedtifields.py
##########
@@ -161,20 +162,7 @@ def delete_old_records(
                 tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq1),
             ).delete(synchronize_session=False)
         elif session.bind.dialect.name in ["mysql"]:
-            # Fetch Top X records given dag_id & task_id ordered by Execution Date
-            subq1 = tis_to_keep_query.subquery('subq1')
-
-            # Second Subquery
-            # Workaround for MySQL Limitation (https://stackoverflow.com/a/19344141/5691525)
-            # Limitation: This version of MySQL does not yet support
-            # LIMIT & IN/ALL/ANY/SOME subquery
-            subq2 = session.query(subq1.c.dag_id, subq1.c.task_id, subq1.c.execution_date).subquery('subq2')
-
-            session.query(cls).filter(
-                cls.dag_id == dag_id,
-                cls.task_id == task_id,
-                tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq2),
-            ).delete(synchronize_session=False)
+            cls.remove_old_rendered_ti_fields_mysql(dag_id, session, task_id, tis_to_keep_query)

Review comment:
       ```suggestion
               cls._remove_old_rendered_ti_fields_mysql(dag_id, session, task_id, tis_to_keep_query)
   ```

##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
             ]
 
             session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+    @classmethod
+    @retry_db_transaction
+    def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
+        # Fetch Top X records given dag_id & task_id ordered by Execution Date
+        subq1 = tis_to_keep_query.subquery('subq1')
+        # Second Subquery
+        # Workaround for MySQL Limitation (https://stackoverflow.com/a/19344141/5691525)
+        # Limitation: This version of MySQL does not yet support
+        # LIMIT & IN/ALL/ANY/SOME subquery
+        subq2 = session.query(subq1.c.dag_id, subq1.c.task_id, subq1.c.execution_date).subquery('subq2')
+        # This query might deadlock occasionally and it should be retried if fails (see decorator)
+        session.query(cls).filter(
+            cls.dag_id == dag_id,
+            cls.task_id == task_id,
+            tuple_(cls.dag_id, cls.task_id, cls.execution_date).notin_(subq2),
+        ).delete(synchronize_session=False)

Review comment:
       I think this should have
   ```suggestion
           ).delete(synchronize_session=False)
           session.flush()
   ```
   
   Otherwise it _might_ not make it to the DB but be pending in the SQLA unit-of-work. (Not sure in this case, maybe `query.delete()` might issue the query directly)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jbarnettfreejazz commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
jbarnettfreejazz commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-933097644


   Thank you! We will look for that release. Much appreciated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719062774



##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
             ]
 
             session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+    @retry_db_transaction
+    @classmethod
+    def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):

Review comment:
       ```suggestion
       @classmethod
       @retry_db_transaction
       def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):
   ```
   We need to change order as this won't work:
   
   ```
    $ airflow db check
     Traceback (most recent call last):
       File "/usr/local/bin/airflow", line 33, in <module>
         sys.exit(load_entry_point('apache-airflow', 'console_scripts', 'airflow')())
       File "/opt/airflow/airflow/__main__.py", line 40, in main
         args.func(args)
       File "/opt/airflow/airflow/cli/cli_parser.py", line 47, in command
         func = import_string(import_path)
       File "/opt/airflow/airflow/utils/module_loading.py", line 32, in import_string
         module = import_module(module_path)
       File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
         return _bootstrap._gcd_import(name[level:], package, level)
       File "<frozen importlib._bootstrap>", line 994, in _gcd_import
       File "<frozen importlib._bootstrap>", line 971, in _find_and_load
       File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
       File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
       File "<frozen importlib._bootstrap_external>", line 678, in exec_module
       File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
       File "/opt/airflow/airflow/cli/commands/db_command.py", line 24, in <module>
         from airflow.utils import cli as cli_utils, db
       File "/opt/airflow/airflow/utils/db.py", line 27, in <module>
         from airflow.jobs.base_job import BaseJob  # noqa: F401
       File "/opt/airflow/airflow/jobs/__init__.py", line 19, in <module>
         import airflow.jobs.backfill_job
       File "/opt/airflow/airflow/jobs/backfill_job.py", line 29, in <module>
         from airflow import models
       File "/opt/airflow/airflow/models/__init__.py", line 30, in <module>
         from airflow.models.renderedtifields import RenderedTaskInstanceFields
       File "/opt/airflow/airflow/models/renderedtifields.py", line 36, in <module>
         class RenderedTaskInstanceFields(Base):
       File "/opt/airflow/airflow/models/renderedtifields.py", line 184, in RenderedTaskInstanceFields
         @classmethod
       File "/opt/airflow/airflow/utils/retries.py", line 96, in retry_db_transaction
         return retry_decorator(_func)
       File "/opt/airflow/airflow/utils/retries.py", line 55, in retry_decorator
         func_params = signature(func).parameters
       File "/usr/local/lib/python3.6/inspect.py", line 3065, in signature
         return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
       File "/usr/local/lib/python3.6/inspect.py", line 2815, in from_callable
         follow_wrapper_chains=follow_wrapped)
       File "/usr/local/lib/python3.6/inspect.py", line 2193, in _signature_from_callable
         raise TypeError('{!r} is not a callable object'.format(obj))
     TypeError: <classmethod object at 0x7fd198edf9e8> is not a callable object
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930581092


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-930574529


   @ashb  @kaxil  I think this is the simplest way to handle it (I believe  the retry decorator is exactly for such cases). This is MySQL only so it has no impact for other databases. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-934951986


   Yup, it will be part of 2.2.0 to be released next week - RCs within next 2 days


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jbarnettfreejazz commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
jbarnettfreejazz commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-931527695


   Hi -- thanks for resolving this issue.  Will there be a patch release? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#discussion_r719180681



##########
File path: airflow/models/renderedtifields.py
##########
@@ -191,3 +179,20 @@ def delete_old_records(
             ]
 
             session.query(cls).filter(and_(*filter_tis)).delete(synchronize_session=False)
+
+    @retry_db_transaction
+    @classmethod
+    def remove_old_rendered_ti_fields_mysql(cls, dag_id, session, task_id, tis_to_keep_query):

Review comment:
       Yep. I noticed yesterday but was too late to fix :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #18616:
URL: https://github.com/apache/airflow/pull/18616


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18616: Retry deadlocked transactions on deleting old rendered task fields

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18616:
URL: https://github.com/apache/airflow/pull/18616#issuecomment-932963197


   > Hi -- thanks for resolving this issue. Will there be a patch release?
   
   We are going to release it in 2.2.0 Airflow release I believe.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org