Posted to commits@airflow.apache.org by "Winston Huang (JIRA)" <ji...@apache.org> on 2018/03/29 22:42:00 UTC

[jira] [Created] (AIRFLOW-2270) Subdag backfill spins on removed tasks

Winston Huang created AIRFLOW-2270:
--------------------------------------

             Summary: Subdag backfill spins on removed tasks
                 Key: AIRFLOW-2270
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2270
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Winston Huang


My understanding is that subdag operators execute via a backfill job which runs in a loop, maintaining the state of the associated tasks and breaking only once all pending tasks have been exhausted: [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159]

 

The issue is that the task instance status is initialized by this method [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075], and the result may include task instances with {{state = State.REMOVED}}, i.e. tasks that were previously instantiated in the database but have since been removed from the dag definition. Such a task is missing from this list [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168] but still exists in {{ti_status.to_run}}. This causes the backfill job to loop indefinitely, since it considers those removed tasks to be pending but never attempts to run them.
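
To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not Airflow's code: the function names ({{run_backfill}}, {{drop_removed}}), the string states, and the iteration cap are all hypothetical stand-ins, and the dict only plays the role of {{ti_status.to_run}}. The cap exists so that the sketch terminates where the real loop would keep spinning.

```python
# Hypothetical, simplified model of the loop described above; not Airflow's code.
REMOVED = "removed"
SCHEDULED = "scheduled"


def run_backfill(to_run, dag_task_ids, max_iterations=5):
    """Process pending task instances; return (iterations used, leftover ids).

    to_run: dict mapping task_id -> state, standing in for ti_status.to_run.
    dag_task_ids: task ids still present in the dag definition.
    max_iterations: safety cap so this sketch stops instead of spinning.
    """
    to_run = dict(to_run)  # work on a copy, leave the caller's dict alone
    iterations = 0
    while to_run and iterations < max_iterations:
        iterations += 1
        # Only tasks still defined in the dag are ever considered for running.
        for task_id in dag_task_ids:
            if task_id in to_run:
                to_run.pop(task_id)  # pretend the task ran successfully
    return iterations, sorted(to_run)


def drop_removed(to_run):
    """One possible mitigation (an assumption, not the actual fix):
    exclude REMOVED instances from the pending set before looping."""
    return {k: v for k, v in to_run.items() if v != REMOVED}


pending = {"a": SCHEDULED, "b": SCHEDULED, "old_task": REMOVED}
dag_tasks = ["a", "b"]

stuck = run_backfill(pending, dag_tasks)                # hits the cap
fixed = run_backfill(drop_removed(pending), dag_tasks)  # drains normally
```

In this sketch the first call only stops because of the cap and leaves {{old_task}} pending, while the second call finishes after a single pass. Whether filtering out removed task instances is the right fix for the real backfill job is a separate question that this example does not settle.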


