You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/26 23:03:02 UTC

[GitHub] [airflow] kaxil opened a new pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

kaxil opened a new pull request #13920:
URL: https://github.com/apache/airflow/pull/13920


   closes https://github.com/apache/airflow/issues/13685
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-768390805


   > I think creating but catching the integrity error is better - otherwise it's still prone to a race condition.
   > 
   > (With the locking it shouldn't go on under normal circumstances, but it's More Correct to catch the integrity error instead and protects against more cases)
   
   I actually did start with that, but the problem is the session needs to be rollback'd all the way if we get an Integrity error and we need to update`next_dagrun` column in `_update_dag_next_dagruns`.
   
   Added the explanation in the code comments


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-768649207


   [The Workflow run](https://github.com/apache/airflow/actions/runs/516470496) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jhtimmins commented on a change in pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on a change in pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#discussion_r567048967



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1571,16 +1585,25 @@ def _create_dag_runs(self, dag_models: Iterable[DagModel], session: Session) ->
                 continue
 
             dag_hash = self.dagbag.dags_hash.get(dag.dag_id)
-            dag.create_dagrun(
-                run_type=DagRunType.SCHEDULED,
-                execution_date=dag_model.next_dagrun,
-                start_date=timezone.utcnow(),
-                state=State.RUNNING,
-                external_trigger=False,
-                session=session,
-                dag_hash=dag_hash,
-                creating_job_id=self.id,
-            )
+            # Explicitly check if the DagRun already exists. This is an edge case
+            # where a Dag Run is created but `DagModel.next_dagrun` and `DagModel.next_dagrun_create_after`
+            # are not updated.
+            # We opted to check DagRun existence instead
+            # of catching an Integrity error and rolling back the session i.e
+            # we need to run self._update_dag_next_dagruns if the Dag Run already exists or if we
+            # create a new one. This is so that in the next Scheduling loop we try to create new runs
+            # instead of falling in a loop of Integrity Error.
+            if (dag.dag_id, dag_model.next_dagrun) not in active_dagruns:

Review comment:
       @kaxil If `dag_model.next_dagrun` hasn't been updated, why are you using its value to look for active dagruns? Won't it be an empty value in this case?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-769155936


   @ephraimbuddy @jhtimmins Can you take a look and review too please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#discussion_r567055461



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -1571,16 +1585,25 @@ def _create_dag_runs(self, dag_models: Iterable[DagModel], session: Session) ->
                 continue
 
             dag_hash = self.dagbag.dags_hash.get(dag.dag_id)
-            dag.create_dagrun(
-                run_type=DagRunType.SCHEDULED,
-                execution_date=dag_model.next_dagrun,
-                start_date=timezone.utcnow(),
-                state=State.RUNNING,
-                external_trigger=False,
-                session=session,
-                dag_hash=dag_hash,
-                creating_job_id=self.id,
-            )
+            # Explicitly check if the DagRun already exists. This is an edge case
+            # where a Dag Run is created but `DagModel.next_dagrun` and `DagModel.next_dagrun_create_after`
+            # are not updated.
+            # We opted to check DagRun existence instead
+            # of catching an Integrity error and rolling back the session i.e
+            # we need to run self._update_dag_next_dagruns if the Dag Run already exists or if we
+            # create a new one. This is so that in the next Scheduling loop we try to create new runs
+            # instead of falling in a loop of Integrity Error.
+            if (dag.dag_id, dag_model.next_dagrun) not in active_dagruns:

Review comment:
       `dag_model.next_dagrun` isn't updated so it will still have the old value which is what we want to check -- that we don't try to recreate the same dag run otherwise it will error with Unique Constraint Violation error




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13920:
URL: https://github.com/apache/airflow/pull/13920


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-768549269


   [The Workflow run](https://github.com/apache/airflow/actions/runs/516107685) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-768390805


   > I think creating but catching the integrity error is better - otherwise it's still prone to a race condition.
   > 
   > (With the locking it shouldn't go on under normal circumstances, but it's More Correct to catch the integrity error instead and protects against more cases)
   
   I actually did start with that, but the problem is the session needs to be rollback'd all the way if we get an Integrity error


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil edited a comment on pull request #13920: Bugfix: Don't try to create a duplicate Dag Run in Scheduler

Posted by GitBox <gi...@apache.org>.
kaxil edited a comment on pull request #13920:
URL: https://github.com/apache/airflow/pull/13920#issuecomment-768390805


   > I think creating but catching the integrity error is better - otherwise it's still prone to a race condition.
   > 
   > (With the locking it shouldn't go on under normal circumstances, but it's More Correct to catch the integrity error instead and protects against more cases)
   
   I actually did start with that, but the problem is the session needs to be rollback'd all the way if we get an Integrity error and we need to update`next_dagrun` column in `_update_dag_next_dagruns`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org