Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/09 07:04:07 UTC

[GitHub] [airflow] Jorricks commented on issue #25615: Random DagRuns set to running during large catch up

Jorricks commented on issue #25615:
URL: https://github.com/apache/airflow/issues/25615#issuecomment-1208992911

   I did some initial exploration of what could be causing this issue.
   I suspect the issue is in `_start_queued_dagruns`, shown [here](https://github.com/apache/airflow/blob/2.2.3/airflow/jobs/scheduler_job.py#L935).
   I checked the ordering and the limit of the query behind `dag_runs = self._get_next_dagruns_to_examine(State.QUEUED, session)` quite thoroughly.
   I believe the ordering is correct, as `last_scheduling_decision` should be None. Unfortunately, we are still planning our upgrade to Airflow 2.2.3, so I have not been able to verify whether `last_scheduling_decision` is in fact None.
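
   To make the ordering argument concrete, here is a toy sketch in plain Python. It is not the Airflow query. It assumes the query sorts rows with a NULL `last_scheduling_decision` first, then by execution date, and then applies a limit. The row tuples, the `examine_order` helper, and the run ids are hypothetical and exist only for illustration.

```python
from datetime import datetime

# Hypothetical rows: (run_id, last_scheduling_decision, execution_date).
# This is an in-memory stand-in, not the real dag_run table.
runs = [
    ("run_3", None, datetime(2022, 1, 3)),
    ("run_1", None, datetime(2022, 1, 1)),
    ("run_9", datetime(2022, 8, 1), datetime(2021, 12, 31)),
    ("run_2", None, datetime(2022, 1, 2)),
]


def examine_order(rows, limit):
    """Sort rows with no scheduling decision first, then by execution date."""

    def key(row):
        _, decision, execution_date = row
        # False sorts before True, so rows with decision=None come first.
        return (decision is not None, decision or datetime.min, execution_date)

    return [row[0] for row in sorted(rows, key=key)][:limit]


print(examine_order(runs, limit=3))  # ['run_1', 'run_2', 'run_3']
```

   Under that assumption, queued runs that have never been examined come back oldest first, which is why the ordering on its own looks fine.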
   
   The only explanation I have been able to come up with thus far is that having multiple schedulers in this loop at the same time could cause issues. Let me reason through this:
   Imagine we have a DAG called `my_dag` that has a max of 16 running DagRuns.
   1. Scheduler A enters this loop and tries to move queued DagRuns to running for `my_dag`. At this point in time (T), the number of active runs is 15, one less than the limit, so Scheduler A will start one extra run.
   2. While Scheduler A is still in its loop, at time (T+1), Scheduler B marks a DagRun as success.
   3. Scheduler C enters this loop and tries to move queued DagRuns to running for `my_dag`. It is now time (T+2), and Scheduler C is also allowed to start a run. However, Scheduler A has locked all the earliest DagRuns, so Scheduler C resorts to much newer DagRuns. This could lead to starting a run that is much later than the DagRun that was next in line after the one Scheduler A was starting.
   4. Scheduler A completes its loop and unlocks the rows.
   5. Scheduler C completes its loop and unlocks the rows.
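
   The steps above can be replayed with a small simulation. This is a sketch in plain Python, not Airflow code. It assumes each scheduler locks a batch of the oldest unlocked queued rows and skips rows another scheduler has locked. `ToyDb`, `BATCH_SIZE`, and the integer run ids are hypothetical, and the schedulers are replayed sequentially rather than run as real concurrent processes.

```python
from dataclasses import dataclass, field

MAX_ACTIVE_RUNS = 16  # limit from the example above
BATCH_SIZE = 5        # hypothetical number of rows a scheduler locks per loop


@dataclass
class ToyDb:
    """In-memory stand-in for the dag_run table; not Airflow's schema."""

    queued: list   # queued run ids, oldest first
    running: int   # number of currently running DagRuns
    locked: set = field(default_factory=set)
    started: list = field(default_factory=list)

    def lock_next_queued(self, limit):
        """Lock the oldest queued rows that no other scheduler holds."""
        rows = [r for r in self.queued if r not in self.locked][:limit]
        self.locked.update(rows)
        return rows

    def start(self, run_id):
        self.queued.remove(run_id)
        self.running += 1
        self.started.append(run_id)

    def unlock(self, rows):
        self.locked.difference_update(rows)


def start_queued(db, rows):
    """Move locked rows to running for as long as the limit allows it."""
    for run_id in rows:
        if db.running >= MAX_ACTIVE_RUNS:
            break
        db.start(run_id)


def simulate():
    db = ToyDb(queued=list(range(1, 11)), running=15)

    rows_a = db.lock_next_queued(BATCH_SIZE)  # T: A locks runs 1..5
    start_queued(db, rows_a)                  # A starts run 1, 16 now running
    db.running -= 1                           # T+1: B marks a run as success
    rows_c = db.lock_next_queued(BATCH_SIZE)  # T+2: C only gets runs 6..10
    start_queued(db, rows_c)                  # C starts run 6, not run 2
    db.unlock(rows_a)                         # step 4
    db.unlock(rows_c)                         # step 5
    return db.started


print(simulate())  # [1, 6]
```

   In this toy model runs 2 to 5 stay queued while run 6 starts, which matches the out-of-order behaviour described in the issue. Whether the real scheduler behaves this way still needs to be verified against 2.2.3.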


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org