You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/17 06:21:05 UTC

[GitHub] [airflow] tanelk commented on pull request #21901: Add `max_active_runs_reached` column to DagModel

tanelk commented on PR #21901:
URL: https://github.com/apache/airflow/pull/21901#issuecomment-1100813499

   I have some questions about this PR. Mainly how does it help with the "original" issue - deadlocking?
   
   These changes you propose might bring amazing performace improvements (hard to say without benchmarks) but it "feels" like a wrong way to solve a deadlock issue.  
   
   I'm now just spitballing without actually running the code.  The pair of methods `self._should_update_dag_next_dagruns` and `dag_model.calculate_dagrun_date_fields` are called in 3 places:
   
   Once when creating dag runs:
   https://github.com/apache/airflow/blob/3a2eb961ca628d962ed5fad68760ef3439b2f40b/airflow/jobs/scheduler_job.py#L1038-L1039
   
   Twice when scheduling those dag runs:
   https://github.com/apache/airflow/blob/3a2eb961ca628d962ed5fad68760ef3439b2f40b/airflow/jobs/scheduler_job.py#L1139-L1140
   https://github.com/apache/airflow/blob/3a2eb961ca628d962ed5fad68760ef3439b2f40b/airflow/jobs/scheduler_job.py#L1165-L1166
   
   In the first case we have a lock on the `DagModel` from the `DagModel.dags_needing_dagruns` method. In the second two cases we have a lock on the `DagRun` from the `DagRun.next_dagruns_to_examine` method.
   
   Perhaps we should be locking also on the `DagModel`, because we are modifing it? It is probably best to bounce it by @ashb.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org