Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/07 07:51:36 UTC

[GitHub] [airflow] ephraimbuddy commented on pull request #21901: Add `max_active_runs_reached` column to DagModel

ephraimbuddy commented on pull request #21901:
URL: https://github.com/apache/airflow/pull/21901#issuecomment-1060284341


   > Hey @ephraimbuddy - any more context on this one? I'd love to understand more context but I am afraid I am missing it :).
   
   Ok. The issue is that the scheduler crashes when running more than two schedulers. Judging by where the error comes from, it may also occur with a single scheduler, but I saw it while running three.
   
   Currently, we nullify `next_dagrun_create_after` in the scheduler to stop new dagruns from being created once `max_active_runs` is reached: https://github.com/apache/airflow/blob/4e05bbc92557f5eafaa37509387b2fb2b0ab3d4a/airflow/jobs/scheduler_job.py#L988
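   
   For context, the nullify step looks roughly like this sketch (paraphrased from the linked code, not the exact source; `maybe_block_dag` is a hypothetical name, and `dag`, `dag_model`, `active_runs_of_dag` stand in for the scheduler's local variables):
   ```python
   # Paraphrased sketch of the linked scheduler logic, not the exact source.
   def maybe_block_dag(dag, dag_model, active_runs_of_dag):
       """Hide a DAG from dagrun creation once it is at max_active_runs."""
       if dag.max_active_runs and active_runs_of_dag >= dag.max_active_runs:
           # NULL-ing the column drops the DAG out of the "DAGs needing a new
           # dagrun" query until the field is recalculated later.
           dag_model.next_dagrun_create_after = None
   ```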
   
   We then recalculate this field once the dagrun has completed: https://github.com/apache/airflow/blob/4e05bbc92557f5eafaa37509387b2fb2b0ab3d4a/airflow/jobs/scheduler_job.py#L1101-L1102
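   
   The recalculation has to go back through the DAG's timetable, which is part of what makes it costly at scale. Simplified (the helper name is hypothetical; the two method calls follow the linked commit):
   ```python
   # Paraphrased sketch: once a dagrun finishes, the scheduler asks the
   # timetable for the next schedule and repopulates the nulled column.
   def restore_next_dagrun(dag, dag_model, dag_run):
       data_interval = dag.get_run_data_interval(dag_run)
       dag_model.calculate_dagrun_date_fields(dag, data_interval)
   ```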
   
   I'm proposing that instead of nullifying and recalculating this field, we add a new column on the DagModel to track whether `max_active_runs` has been reached, because recalculating the field looks expensive to me.
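   
   Concretely, the idea is something like the sketch below (the column name comes from this PR's title; the model is an illustrative stand-in, not the real `airflow.models.dag.DagModel`):
   ```python
   from sqlalchemy import Boolean, Column, String
   from sqlalchemy.orm import declarative_base
   
   Base = declarative_base()
   
   
   class DagModel(Base):
       """Illustrative stand-in for the real DagModel."""
   
       __tablename__ = "dag"
   
       dag_id = Column(String(250), primary_key=True)
       # Proposed flag: set when the DAG reaches max_active_runs, cleared when
       # one of its runs finishes, so the scheduler can use a cheap boolean
       # filter instead of nullifying and recalculating next_dagrun_create_after.
       max_active_runs_reached = Column(Boolean, default=False)
   ```
   The "DAGs needing a new dagrun" query could then filter with something like `.filter(~DagModel.max_active_runs_reached)` rather than relying on `next_dagrun_create_after` being NULL.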
   
   Here's a DAG file I used to reproduce this behaviour:
   ```python
   from pendulum import datetime
   
   from airflow import DAG
   from airflow.sensors.date_time import DateTimeSensor
   
   
   def create_dag(dag_name):
       dag = DAG(
           dag_name,
           schedule_interval='@daily',
           start_date=datetime(2022, 1, 1),  # catchup defaults to True, so backfill runs pile up
       )
       # A long-waiting reschedule sensor keeps every dagrun active,
       # so each DAG hits max_active_runs quickly.
       DateTimeSensor(
           task_id='check_time',
           target_time=datetime(2022, 3, 8, 12, 0, 0),
           poke_interval=30,
           mode='reschedule',
           dag=dag,
       )
       return dag
   
   
   # Register 100 copies so the schedulers have plenty of DAGs to contend over.
   for i in range(100):
       globals()[f'dag_{i}'] = create_dag(f'dag_{i}')
   ```
   First, make sure the schedulers are running before triggering all the dags.
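   
   To trigger them all, a small hypothetical helper like this works (it just shells out to the real `airflow dags trigger` CLI command):
   ```python
   # Hypothetical convenience script for this repro; `airflow dags trigger`
   # is the real CLI command that creates a manual dagrun per DAG.
   import subprocess
   
   for i in range(100):
       subprocess.run(["airflow", "dags", "trigger", f"dag_{i}"], check=True)
   ```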

