Posted to dev@airflow.apache.org by Vijay Ramesh <vi...@change.org> on 2017/02/28 18:52:14 UTC

Task instance enqueued but never runs, holds up next run of the DAG

I have a large DAG (32 tasks) with concurrency=2 and max_active_runs=1.
Most of the tasks also use a redshift_pool, and this is running the
LocalExecutor on 1.8.0RC4.
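
For reference, the setup looks roughly like this (a simplified sketch, not
the real file; the task ids, callable, and schedule here are made up):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Simplified stand-in for the real DAG: 32 tasks, at most two running at
# once, only one active DAG run, and most tasks sharing the redshift_pool.
dag = DAG(
    dag_id='etl_queries_v3',
    start_date=datetime(2017, 2, 1),
    schedule_interval='0 7 * * *',
    concurrency=2,
    max_active_runs=1,
)

for i in range(32):
    PythonOperator(
        task_id='query_%02d' % i,
        python_callable=lambda: None,  # placeholder for the real query
        pool='redshift_pool',
        dag=dag,
    )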

When the DAG kicks off, things generally work, but a few of the tasks get
moved to queued status (appropriately) and then never actually start. Looking
in the logs I see:

[2017-02-27 13:20:10,349] {base_task_runner.py:95} INFO - Subtask:
[2017-02-27 13:20:10,348] {models.py:1128} INFO - Dependencies all met for
<TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00
[queued]>
[2017-02-27 13:20:10,356] {base_task_runner.py:95} INFO - Subtask:
[2017-02-27 13:20:10,356] {models.py:1122} INFO - Dependencies not met for
<TaskInstance: etl_queries_v3.a_user_day_v2_query 2017-02-26 07:00:00
[queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum
number of running tasks (etl_queries_v3) for this task's DAG '2' has been
reached.
[2017-02-27 13:20:14,444] {jobs.py:2062} INFO - Task exited with return
code 0

and then that's it, the queued task is never picked up again. It has been a
different task each day, which makes me suspect some sort of scheduling race
condition. And because they end up queued, not failed, the DAG run never
finishes (and so this morning our DAG didn't kick off because yesterday's was
still technically "running").

Any thoughts/advice? (I also opened
https://github.com/apache/incubator-airflow/pull/2109 to fix the formatting
of that error message.)
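
For context on that PR: given concurrency=2, the parenthesized value in the
message should presumably be the limit 2 and the quoted value the dag_id,
i.e. the two format arguments look swapped. A minimal Python sketch of that
kind of mix-up (not the actual Airflow source):

dag_id, limit = 'etl_queries_v3', 2
print("The maximum number of running tasks ({}) for this task's DAG '{}' "
      "has been reached.".format(dag_id, limit))
# prints: The maximum number of running tasks (etl_queries_v3) for this
# task's DAG '2' has been reached.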

Thanks,
 - Vijay Ramesh

Re: Task instance enqueued but never runs, holds up next run of the DAG

Posted by Maxime Beauchemin <ma...@gmail.com>.
For the record, in 1.8.x Dan added a new UI element to the `Task Details`
page that makes it clear which of the dependency rules aren't met.

"Why isn't my task running!?" used to be the most common question we get on
our internal #airflow Slack channel, not anymore!

Here's a screenshot of the feature: http://i.imgur.com/KLWPpwk.png
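
If I remember right, 1.8 also exposes the same checks from the CLI via the
task_failed_deps subcommand, e.g. something like
`airflow task_failed_deps etl_queries_v3 a_user_day_v2_query 2017-02-26T07:00:00`,
which is handy when you don't have the UI in front of you.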

Max


Re: Task instance enqueued but never runs, holds up next run of the DAG

Posted by Paul Minton <pa...@cloverhealth.com.INVALID>.
Could the problem be related to this note in the "updating" docs?

https://github.com/apache/incubator-airflow/blob/1.8.0/UPDATING.md#tasks-not-starting-although-dependencies-are-met-due-to-stricter-pool-checking
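
If that's the cause, I think giving redshift_pool at least as many slots as
you expect to run at once, via Admin -> Pools in the UI or the CLI (something
like `airflow pool -s redshift_pool 4 "redshift queries"`; syntax from
memory), should let the queued tasks get picked up.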
