Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/12 16:40:43 UTC

[GitHub] [airflow] theister commented on issue #18501: Scheduler overloaded when backfilling by clearing DAG history

theister commented on issue #18501:
URL: https://github.com/apache/airflow/issues/18501#issuecomment-967255792


   We came across the same problem, but on `2.0.2`.
   
   We recently migrated from 1.10.15, and we also regularly do backfills by clearing history via the UI and letting Airflow's catchup re-run them, which puts hundreds of DAG runs into the `running` state.
   While reprocessing the data eventually succeeded, it completely starved all other DAGs in the meantime, and the scheduler logs filled with many lines like
   `DAG XYZ already has 10 active runs, not queuing any tasks for run 2021-06-30 00:00:00+00:00`
   These were printed for the next 10 execution dates for which no tasks had been scheduled yet, which matches our scheduler setting of `max_dagruns_per_loop_to_schedule=10`.
   
   After digging into the code a little, my understanding is that when the `max_active_runs` limit is hit, the 2.0.2 scheduler prints the above message but returns from `_schedule_dag_run()` without updating the DagRun's `last_scheduling_decision` timestamp (see https://github.com/apache/airflow/blob/2.0.2/airflow/jobs/scheduler_job.py#L1776). As far as I can tell, that timestamp is only updated in `update_state()`.
   
   Since `DagRun.next_dagruns_to_examine()` returns the next DagRuns to check sorted ascending by `last_scheduling_decision`, the blocked runs never move back in that ordering. The same runs are picked up again on every loop, which effectively keeps all other DAG runs from being scheduled for as long as the DAG is at its `max_active_runs` limit.
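   The starvation argument above can be reproduced with a small simulation. This is a hypothetical model, not Airflow's scheduler code: `Run`, `scheduler_loop`, `simulate`, and the `touch_blocked` switch are made up for illustration, `datetime.min` stands in for a NULL timestamp that sorts first, and row locking, paused DAGs, and how active runs eventually finish are all ignored. It assumes 100 cleared runs of one DAG that is already at its `max_active_runs` limit, plus one run of an unrelated DAG.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set

MAX_ACTIVE_RUNS = 10  # max_active_runs of the backfilled DAG
PER_LOOP_LIMIT = 10   # max_dagruns_per_loop_to_schedule


@dataclass
class Run:
    """Hypothetical stand-in for a DagRun, keeping only what this model needs."""
    dag_id: str
    execution_date: int
    # datetime.min plays the role of a NULL timestamp that sorts first.
    last_scheduling_decision: datetime


def next_dagruns_to_examine(runs: List[Run], limit: int) -> List[Run]:
    """Return the `limit` runs with the oldest last_scheduling_decision (stable sort)."""
    return sorted(runs, key=lambda r: r.last_scheduling_decision)[:limit]


def scheduler_loop(runs: List[Run], active_counts: Dict[str, int],
                   now: datetime, touch_blocked: bool) -> List[Run]:
    """One simplified scheduler loop; returns the runs it examined."""
    examined = next_dagruns_to_examine(runs, PER_LOOP_LIMIT)
    for run in examined:
        if active_counts.get(run.dag_id, 0) >= MAX_ACTIVE_RUNS:
            # Blocked by max_active_runs: early return, timestamp left as-is.
            if touch_blocked:
                # Hypothetical variant: refresh the timestamp even when blocked.
                run.last_scheduling_decision = now
            continue
        # Normal path: refresh the timestamp (update_state() in the real scheduler).
        run.last_scheduling_decision = now
    return examined


def simulate(num_loops: int, touch_blocked: bool = False,
             backfill_runs: int = 100) -> Set[str]:
    """Return the set of DAG ids examined over `num_loops` scheduler loops."""
    t0 = datetime(2021, 11, 12)
    # Cleared backfill runs that have never been examined (NULL timestamp).
    runs = [Run("backfill_dag", i, datetime.min) for i in range(backfill_runs)]
    # A run of an unrelated DAG, last examined at t0.
    runs.append(Run("other_dag", 0, t0))
    # The backfilled DAG is already at its max_active_runs limit.
    active_counts = {"backfill_dag": MAX_ACTIVE_RUNS}
    examined_dags: Set[str] = set()
    for loop in range(num_loops):
        now = t0 + timedelta(seconds=loop + 1)
        batch = scheduler_loop(runs, active_counts, now, touch_blocked)
        examined_dags.update(r.dag_id for r in batch)
    return examined_dags


print(simulate(1000))                      # {'backfill_dag'}
print(simulate(1000, touch_blocked=True))  # includes 'other_dag'
```

   In this model, `simulate(1000)` only ever examines `backfill_dag`: the same ten blocked runs stay at the head of the ordering on every loop. With the hypothetical `touch_blocked=True`, `other_dag` is reached on the eleventh loop, once each of the 100 never-examined runs has been looked at once.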
   
   @leonsmith did you manage to find out if the issue is still present on 2.2.0?

