Posted to dev@airflow.apache.org by Vijay Ramesh <vi...@change.org> on 2017/02/18 05:18:15 UTC

backfill doesn't create task instances for subdag

Hey all,

We're running Airflow 1.7.1.2 and recently saw some very odd behavior: the
scheduler fell behind and then immediately marked new dag runs as failed
without creating any of their task instances. Looking at the scheduler log,
it seems to have tripped some condition where the task instances were not
premature but were also not in a runnable state:


[2017-02-17 20:30:04,524] {models.py:2660} INFO - Checking state for
<DagRun petition_view_counts @ 2017-02-17 20:00:00:
scheduled__2017-02-17T20:00:00, externally triggered: False>
[2017-02-17 20:30:04,526] {jobs.py:498} INFO - Getting list of tasks to
skip for active runs.
[2017-02-17 20:30:04,537] {jobs.py:514} INFO - Checking dependencies on 2
tasks instances, minus 0 skippable ones
[2017-02-17 20:30:04,549] {jobs.py:498} INFO - Getting list of tasks to
skip for active runs.
[2017-02-17 20:30:04,549] {jobs.py:514} INFO - Checking dependencies on 0
tasks instances, minus 0 skippable ones
[2017-02-17 20:30:04,549] {jobs.py:550} DEBUG - Unlocking DAG
(scheduler_lock)
[2017-02-17 20:30:04,561] {jobs.py:531} DEBUG - Adding task: <TaskInstance:
petition_view_counts.get_counts 2017-02-17 20:00:00 [None]> to the
COULD_NOT_RUN set
[2017-02-17 20:30:04,598] {jobs.py:531} DEBUG - Adding task: <TaskInstance:
petition_view_counts.load_redis 2017-02-17 20:00:00 [None]> to the
COULD_NOT_RUN set
[2017-02-17 20:30:04,598] {jobs.py:538} ERROR - Dag runs are deadlocked for
DAG: petition_view_counts
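
As far as I can tell from the jobs.py source, once every task instance in a
run lands in the COULD_NOT_RUN set like this, the scheduler considers the
run deadlocked and marks it failed without ever creating the tasks - though
I may be misreading the logic there.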


I am still uncertain whether there was some bug on our end (we recently
started using Pools; many operators run in a redshift_pool with much more
limited concurrency than the general scheduler parallelism). Eventually I
gave up debugging and instead wiped the database and reset all the DAG
start_dates.
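
For context, the pool setup looks roughly like this (a minimal sketch with
illustrative names and schedule, not our actual DAG code):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='example_redshift_dag',
    start_date=datetime(2017, 2, 17),
    schedule_interval='0 */3 * * *',
)

# 'redshift_pool' is defined under Admin -> Pools with only a few slots,
# far fewer than the scheduler's overall parallelism, so these tasks can
# sit queued while everything else keeps running.
load = BashOperator(
    task_id='load_to_redshift',
    bash_command='echo load',
    pool='redshift_pool',
    dag=dag,
)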

When doing this reset, I also tried to backfill some DAGs (start_date was
reset to 2017-02-17, but I didn't want the 03:00, 06:00, 09:00, 12:00, and
15:00 runs to all queue up when it was already time for the 18:00 run). I
ran, e.g.,

airflow backfill -s "2017-02-17T00:00:00" -e "2017-02-17T18:00:00" -m ltx_tag_counters
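
(The -m there is --mark_success, so the backfilled task instances get
recorded as succeeded instead of actually being run.)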

and it seemed to work correctly: all the task instances were created in the
database. I then unpaused the DAG and turned the scheduler back on, and
very quickly the web UI showed the dag_runs being created. When it got to
the current 2017-02-17T18:00 DAG run, though, it again immediately went to
failed (and looking in the database showed that no task_instances had been
created). This led me to discover that the backfill hadn't created any
task_instances for the subdag. I was able to create them by manually
running a backfill on the subdag id:

airflow backfill -s "2017-02-17T00:00:00" -e "2017-02-17T18:00:00" -m ltx_tag_counters.load_redis

But this all seems very odd to me.
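
For reference, the parent DAG is laid out roughly like this (a minimal
sketch with illustrative task names; a subdag's dag_id follows the
'<parent_dag_id>.<task_id>' convention, which is why the manual backfill
above targets ltx_tag_counters.load_redis):

from datetime import datetime
from airflow import DAG
from airflow.operators.subdag_operator import SubDagOperator
from airflow.operators.dummy_operator import DummyOperator

PARENT_ID = 'ltx_tag_counters'
START = datetime(2017, 2, 17)
SCHEDULE = '0 */3 * * *'


def load_redis_subdag(parent_id, child_id):
    # The subdag's dag_id must be '<parent>.<child>', and its schedule
    # should match the parent's.
    subdag = DAG(
        dag_id='%s.%s' % (parent_id, child_id),
        start_date=START,
        schedule_interval=SCHEDULE,
    )
    DummyOperator(task_id='write_counts', dag=subdag)
    return subdag


main_dag = DAG(dag_id=PARENT_ID, start_date=START,
               schedule_interval=SCHEDULE)

load_redis = SubDagOperator(
    task_id='load_redis',
    subdag=load_redis_subdag(PARENT_ID, 'load_redis'),
    dag=main_dag,
)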


So, two questions: has anybody run into this issue where the scheduler
keeps creating dag runs that are immediately marked failed, have no task
instances created, and leave the "Dag runs are deadlocked" message in the
scheduler log?

And second, is it correct that backfilling a DAG with a subdag won't
actually create task instances for the subdag?

Thanks!
 - Vijay Ramesh