Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/26 15:10:00 UTC

[GitHub] [airflow] potiuk opened a new pull request #18531: Workaround intermittently failing scheduler test

potiuk opened a new pull request #18531:
URL: https://github.com/apache/airflow/pull/18531


   Some executions of this test return the dagrun in the Queued
   rather than the Running state. This PR attempts to work around it
   by re-running scheduling in that case (up to several times).
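
The bounded-retry idea described above can be sketched outside Airflow roughly as follows. This is a minimal, hypothetical illustration and not the code from the PR: `poll_until` and `FakeRun` are invented names, and the string states only stand in for Airflow's `State.QUEUED` and `State.RUNNING`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    step: Callable[[], T],
    done: Callable[[T], bool],
    attempts: int = 9,
    delay: float = 0.1,
) -> T:
    """Call step() at most `attempts` times, sleeping `delay` seconds
    between tries, and stop as soon as done(result) is true.

    Returns the last result, whether or not the condition was met, so the
    caller can still assert on it and get a meaningful failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = step()
    for _ in range(attempts - 1):
        if done(result):
            break
        time.sleep(delay)
        result = step()
    return result


class FakeRun:
    """Hypothetical stand-in for a dag run that needs two scheduling passes."""

    def __init__(self) -> None:
        self.state = "queued"
        self.passes = 0

    def schedule(self) -> str:
        # Assumption for illustration: the second pass moves the run to running.
        self.passes += 1
        if self.passes >= 2:
            self.state = "running"
        return self.state


run = FakeRun()
final_state = poll_until(run.schedule, lambda s: s != "queued", delay=0.0)
```

A loop like this hides the flakiness rather than explaining it, which is why the thread goes on to ask whether the intermediate Queued state is legitimate at all.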
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18531:
URL: https://github.com/apache/airflow/pull/18531#issuecomment-937604663


   Please take a look, @ephraimbuddy - I think I got it, but would love confirmation :)





[GitHub] [airflow] ephraimbuddy commented on a change in pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #18531:
URL: https://github.com/apache/airflow/pull/18531#discussion_r716273133



##########
File path: tests/jobs/test_scheduler_job.py
##########
@@ -2658,9 +2659,14 @@ def test_do_schedule_max_active_runs_dag_timed_out(self, dag_maker):
         assert run1_ti.state == State.SKIPPED
 
         # Run scheduling again to assert run2 has started
-        self.scheduler_job._do_scheduling(session)
-        run2 = session.merge(run2)
-        session.refresh(run2)
+        for i in range(1, 10):
+            self.scheduler_job._do_scheduling(session)
+            run2 = session.merge(run2)
+            session.refresh(run2)
+            if run2.state == State.QUEUED:
+                sleep(0.1)
+                continue
+            break

Review comment:
       For the code starting at line 2661, I would suggest this:
   ```python
   # Run scheduling again to assert run2 has started
   self.scheduler_job._start_queued_dagruns(session)
   session.flush()
   run2 = session.merge(run2)
   session.refresh(run2)
   assert run2.state == State.RUNNING
   ```
   Since this test covers `max_active_runs` and `dag_timeout`, I think we don't need to schedule the task instances.
   We can also run `_schedule_dag_run` to have it put the task instance into the scheduled state:
   ```python
   # Run scheduling again to assert run2 has started
   self.scheduler_job._start_queued_dagruns(session)
   session.flush()
   self.scheduler_job._schedule_dag_run(run2, session)
   run2 = session.merge(run2)
   session.refresh(run2)
   assert run2.state == State.RUNNING
   run2_ti = run2.get_task_instance(task1.task_id, session)
   assert run2_ti.state == State.SCHEDULED
   ```
   
   What I have observed is that using `_do_scheduling` in tests usually doesn't do what we want. I prefer using `_start_queued_dagruns` to start dagruns instead of using `_do_scheduling`. Maybe we should use it here too.
   https://github.com/apache/airflow/blob/2643345e4b72064c605e42901a3dc531e6aa2f4e/tests/jobs/test_scheduler_job.py#L2755







[GitHub] [airflow] potiuk merged pull request #18531: Stabilize flaky test_do_schedule_max_active_runs_dag_timed_out

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #18531:
URL: https://github.com/apache/airflow/pull/18531


   











[GitHub] [airflow] potiuk commented on a change in pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #18531:
URL: https://github.com/apache/airflow/pull/18531#discussion_r716678182



##########
File path: tests/jobs/test_scheduler_job.py
##########
@@ -2658,9 +2659,14 @@ def test_do_schedule_max_active_runs_dag_timed_out(self, dag_maker):
         assert run1_ti.state == State.SKIPPED
 
         # Run scheduling again to assert run2 has started
-        self.scheduler_job._do_scheduling(session)
-        run2 = session.merge(run2)
-        session.refresh(run2)
+        for i in range(1, 10):
+            self.scheduler_job._do_scheduling(session)
+            run2 = session.merge(run2)
+            session.refresh(run2)
+            if run2.state == State.QUEUED:
+                sleep(0.1)
+                continue
+            break

Review comment:
       Ah cool. Good points. I will take a closer look soon. I would love to learn a bit more about how those tests work :D








[GitHub] [airflow] potiuk commented on pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18531:
URL: https://github.com/apache/airflow/pull/18531#issuecomment-927289950


   Not sure if this is a good solution - maybe the state SHOULD be RUNNING immediately and we have an actual problem? But it is worth trying. @ashb @ephraimbuddy - I would love it if you took a look to see whether a QUEUED state there for a short while is a legitimate possibility (and whether my workaround approach is correct).
   
   Example failure that made me create this PR: https://github.com/apache/airflow/runs/3712148691?check_suite_focus=true#step:6:9688





[GitHub] [airflow] potiuk commented on pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18531:
URL: https://github.com/apache/airflow/pull/18531#issuecomment-927321490


   Trying out on full tests





[GitHub] [airflow] potiuk closed pull request #18531: Workaround intermittently failing scheduler test

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #18531:
URL: https://github.com/apache/airflow/pull/18531


   

