Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/01 22:18:19 UTC

[GitHub] [airflow] soltanianalytics commented on issue #13407: Clearing tasks for previously finished DAG runs in airflow 2.0 does not lead to scheduling of tasks

soltanianalytics commented on issue #13407:
URL: https://github.com/apache/airflow/issues/13407#issuecomment-753393443


   I did some more reading (mainly https://github.com/apache/airflow/issues/1442 and https://issues.apache.org/jira/browse/AIRFLOW-137). I see now that using TI was entirely on purpose. Currently, the scheduler is expected to ignore `DagRun`s that were reset to `running` by clearing tasks, so that `max_active_runs` is not violated, and to keep ignoring them until scheduling them would no longer violate that limit. My issues with this are:
   1. The scheduler does not schedule tasks in `DagRun`s which are, in fact, `running`.
   2. When a user clears tasks, the user would _want_ these tasks to be scheduled. I therefore think that exceeding `max_active_runs` in this situation, as happens in my use case, is intentional on the user's part and a feature, not a bug.
   
   However, from https://github.com/apache/airflow/issues/1442 I can see that a user might also want to run only specific tasks, but across a large number of `DagRun`s, while executing tasks in at most `max_active_runs` `DagRun`s at a time. Arguably, when using backfill or `catchup=True` in general, I would expect to be able to rely on tasks being executed in order of `execution_date`, because that is also the order in which tasks run when my Airflow installation is simply left running. Thus, I think a second alternative is to keep the above-mentioned logic but adjust it so that only tasks in the first `max_active_runs` `DagRun`s, ordered by `execution_date`, are run.
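
A minimal sketch of the selection rule described above, assuming a simplified in-memory model. The `Run` class and the `runs_to_schedule` function are hypothetical names used only for illustration; they are not Airflow's `DagRun` model or scheduler code, and the real change would be expressed as a database query inside the scheduler.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a DagRun; not Airflow's model class."""
    run_id: str
    execution_date: datetime
    state: str  # e.g. "running", "success", "failed"


def runs_to_schedule(runs: List[Run], max_active_runs: int) -> List[Run]:
    """Return the earliest `max_active_runs` running runs, by execution_date."""
    running = [r for r in runs if r.state == "running"]
    running.sort(key=lambda r: r.execution_date)
    return running[:max_active_runs]


runs = [
    Run("run_3", datetime(2021, 1, 3), "running"),
    Run("run_1", datetime(2021, 1, 1), "running"),  # cleared, so reset to running
    Run("run_2", datetime(2021, 1, 2), "success"),
    Run("run_4", datetime(2021, 1, 4), "running"),
]

selected = runs_to_schedule(runs, max_active_runs=2)
print([r.run_id for r in selected])  # ['run_1', 'run_3']
```

With three runs in state `running` and `max_active_runs=2`, only the two with the earliest `execution_date` are picked; `run_4` would wait until one of them finishes.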
   
   I will create a second PR with this alternative approach.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org