You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/12 15:52:03 UTC

[GitHub] [airflow] anthonyp97 opened a new pull request, #26347: Retry on Airflow Schedule DAG Run DB Deadlock

anthonyp97 opened a new pull request, #26347:
URL: https://github.com/apache/airflow/pull/26347

   - This resolves the following issue: https://github.com/apache/airflow/issues/25765
   - We have tested this on our most recent deployment and confirmed that we are retrying these deadlocks gracefully now with this change and the main scheduler loop no longer breaks with these deadlocks.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1259949614

   I also thing @ashb should take a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #26347:
URL: https://github.com/apache/airflow/pull/26347


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1259949017

   I think we are just about to release 2.4.1 I think it looks good to me, but we have **just** merged 2.4.1 branch to release it. I think it will go to 2.4.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] anthonyp97 commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
anthonyp97 commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1259817997

   @potiuk @ashb just following up here, do you think this is good to merge now? Just hoping this can get in to the 2.4.1 release whenever that is planned to be, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1243941902

   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, mypy and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on a diff in pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
blag commented on code in PR #26347:
URL: https://github.com/apache/airflow/pull/26347#discussion_r981810841


##########
airflow/jobs/scheduler_job.py:
##########
@@ -1227,6 +1222,19 @@ def _update_state(dag: DAG, dag_run: DagRun):
                 active_runs_of_dags[dag_run.dag_id] += 1
                 _update_state(dag, dag_run)
 
+    @retry_db_transaction
+    def _schedule_all_dag_runs(self, guard, dag_runs, session):
+        """Makes scheduling decisions for all `dag_runs`"""
+        callback_tuples = []
+        callback_to_run = None
+        for dag_run in dag_runs:
+            callback_to_run = self._schedule_dag_run(dag_run, session)
+            callback_tuples.append((dag_run, callback_to_run))
+
+        guard.commit()
+
+        return callback_tuples, callback_to_run

Review Comment:
   I don't strongly prefer a dict, that's just what makes more sense to me. But yeah, if a list works then go with that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1250393684

   You also need to trebase @anthonyp97 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] blag commented on a diff in pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
blag commented on code in PR #26347:
URL: https://github.com/apache/airflow/pull/26347#discussion_r974652668


##########
airflow/jobs/scheduler_job.py:
##########
@@ -1227,6 +1222,19 @@ def _update_state(dag: DAG, dag_run: DagRun):
                 active_runs_of_dags[dag_run.dag_id] += 1
                 _update_state(dag, dag_run)
 
+    @retry_db_transaction
+    def _schedule_all_dag_runs(self, guard, dag_runs, session):
+        """Makes scheduling decisions for all `dag_runs`"""
+        callback_tuples = []
+        callback_to_run = None
+        for dag_run in dag_runs:
+            callback_to_run = self._schedule_dag_run(dag_run, session)
+            callback_tuples.append((dag_run, callback_to_run))
+
+        guard.commit()
+
+        return callback_tuples, callback_to_run

Review Comment:
   Is there a reason you're including `callback_to_run` in the return statement here?? It seems cleaner to just `return callback_tuples`, especially because the last `callback_to_run` is going to be included in the last tuple of `callback_tuples` anyway.
   
   The next line of code overwrites `callback_to_run` as well, so it just seems unnecessary to pass it back:
   
   ```python
               callback_tuples, callback_to_run = self._schedule_all_dag_runs(guard, dag_runs, session)
   
           # ...
           for dag_run, callback_to_run in callback_tuples:  # <-- callback_to_run overwritten immediately
   ```
   
   Also, would a dictionary be a better option to map dag runs to callbacks?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] anthonyp97 commented on a diff in pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
anthonyp97 commented on code in PR #26347:
URL: https://github.com/apache/airflow/pull/26347#discussion_r975723281


##########
airflow/jobs/scheduler_job.py:
##########
@@ -1227,6 +1222,19 @@ def _update_state(dag: DAG, dag_run: DagRun):
                 active_runs_of_dags[dag_run.dag_id] += 1
                 _update_state(dag, dag_run)
 
+    @retry_db_transaction
+    def _schedule_all_dag_runs(self, guard, dag_runs, session):
+        """Makes scheduling decisions for all `dag_runs`"""
+        callback_tuples = []
+        callback_to_run = None
+        for dag_run in dag_runs:
+            callback_to_run = self._schedule_dag_run(dag_run, session)
+            callback_tuples.append((dag_run, callback_to_run))
+
+        guard.commit()
+
+        return callback_tuples, callback_to_run

Review Comment:
   @blag yea definitely right that I should just be returning `callback_tuples`, will update this. For the second point I imagine it shouldn't really matter between list of tuples and a dict since we are just iterating through the data structure either way element by element but if you strongly prefer a dict I can update to use that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] boring-cyborg[bot] commented on pull request #26347: Retry on Airflow Schedule DAG Run DB Deadlock

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on PR #26347:
URL: https://github.com/apache/airflow/pull/26347#issuecomment-1264540989

   Awesome work, congrats on your first merged pull request!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org