You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/28 14:29:58 UTC

[GitHub] [airflow] potiuk opened a new pull request #19862: Clarify behaviour of test_backfill_depends_on_past

potiuk opened a new pull request #19862:
URL: https://github.com/apache/airflow/pull/19862


   It used to be that the backfill job deadlocked when a task had
   "depends_on_past" set. This was because it depended on past,
   non-existing and not succesful task.
   
   You needed to set `ignore_first_depends_on_past' set in order to
   avoid the deadlock. This test was quarantined but I think
   the behaviour has changed.
   
   It looks like currently Airflow behaves differently: when checking
   dependencies, it actually retrieves the previous dag-run and
   if there is none available it does not deadlock no matter if
   depends_on_past is set. That makes much more sense and likely
   was fixed during the recent changes involved in timetable
   implementation.
   
   This PR updates the tests to show that backfill will behave
   properly, regardless from `ignore_first_depends_on_past`.
   
   Fixes: #14755
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19862: Clarify behaviour of test_backfill_depends_on_past

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19862:
URL: https://github.com/apache/airflow/pull/19862#issuecomment-981097308


   I am not 100% - but from what I debugged, the behaviour of backfill in case `depends_on_past==Truw` in a task have been fixed in the meantime (likely during timeteable implementation):
   
   From what I see the backfil job will not deadlock any more but it will **just work** even if "ignore_first_depends_on_past" is set to true, simply because in prev_dagrun_dep, the prev_dagrun_dep is None (as it should be IMHO):
   
   in  https://github.com/apache/airflow/blob/846586eb86cfa927eaebbba7124e8cd34c0bebce/airflow/ti_deps/deps/prev_dagrun_dep.py#L52
   
   No matter if catchup is True or false, previous DagRun is None  (as I think it should be for backfill job).
   ```
   
           if catchup:
               last_dagrun = dr.get_previous_scheduled_dagrun(session)
           else:
               last_dagrun = dr.get_previous_dagrun(session=session)
   ```
   
   However I'd love to hear other's comment if understand it right. I am not sure if there is other case where we actually still need `ignore_first_depends_on_past` in this case or should we depreacate it ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #19862: Clarify behaviour of test_backfill_depends_on_past

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #19862:
URL: https://github.com/apache/airflow/pull/19862#issuecomment-982290453


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #19862: Clarify behaviour of test_backfill_depends_on_past

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #19862:
URL: https://github.com/apache/airflow/pull/19862#discussion_r758452199



##########
File path: tests/jobs/test_backfill_job.py
##########
@@ -797,25 +798,20 @@ def test_backfill_pooled_tasks(self):
         ti.refresh_from_db()
         assert ti.state == State.SUCCESS
 
-    @pytest.mark.quarantined
-    def test_backfill_depends_on_past(self):
-        """
-        Test that backfill respects ignore_depends_on_past
-        """
+    @parameterized.expand([(True,), (False,)])

Review comment:
       ```suggestion
       @pytest.mark.parametrize("ignore_depends_on_past", [True, False])
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #19862: Clarify behaviour of test_backfill_depends_on_past

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #19862:
URL: https://github.com/apache/airflow/pull/19862


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19862: Clarify behaviour of test_backfill_depends_on_past

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19862:
URL: https://github.com/apache/airflow/pull/19862#issuecomment-981097308


   I am not 100% - but from what I debugged, the behaviour of backfill in case depends_on_past tasks have been fixed in the meantime (likely during timeteable implementation):
   
   From what I see the backfil job will not deadlock any more but it will **just work** even if "ignore_first_depends_on_past" is set to true, simply because in prev_dagrun_dep, the prev_dagrun_dep is None (as it should be IMHO):
   
   in  https://github.com/apache/airflow/blob/846586eb86cfa927eaebbba7124e8cd34c0bebce/airflow/ti_deps/deps/prev_dagrun_dep.py#L52
   
   No matter if catchup is True or false, previous DagRun is None  (as I think it should be for backfill job).
   ```
   
           if catchup:
               last_dagrun = dr.get_previous_scheduled_dagrun(session)
           else:
               last_dagrun = dr.get_previous_dagrun(session=session)
   ```
   
   However I'd love to hear other's comment if understand it right. I am not sure if there is other case where we actually still need `ignore_first_depends_on_past` in this case or should we depreacate it ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org