Posted to dev@airflow.apache.org by Drew Zoellner <dr...@mainstreethub.com> on 2017/06/22 18:50:42 UTC

Airflow 1.8.1 scheduler issue

Hi airflow dev team,

We have a subdag which looks like the following...



This subdag has a concurrency limit of 8. As you can see, 8 of the tasks after our
'select_locations' task succeed, and so do their downstream tasks. The rest
of the tasks seem to get forgotten by the scheduler. We've re-run
this dag a few times, cleaned out the database, and run it again, but we still
run into the problem.
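
For reference, here is a minimal sketch of how the sub-DAG is capped at 8
concurrently running task instances. The task ids, the fan-out of 20 transforms,
and the schedule are illustrative placeholders rather than our exact DAG; the
relevant part is the DAG-level concurrency=8 argument on the sub-DAG passed to
SubDagOperator.

from datetime import datetime

from airflow.models import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

default_args = {"owner": "airflow", "start_date": datetime(2017, 6, 21)}


def calculate_stats_subdag(parent_dag_id, child_task_id, args):
    # Sub-DAG capped at 8 concurrently running task instances.
    subdag = DAG(
        dag_id="%s.%s" % (parent_dag_id, child_task_id),
        default_args=args,
        schedule_interval="@daily",
        concurrency=8,
    )
    select = DummyOperator(task_id="select_locations", dag=subdag)
    # Fan out into per-location transforms downstream of select_locations
    # (the count of 20 is a placeholder).
    for i in range(20):
        transform = DummyOperator(task_id="transform_%02d" % i, dag=subdag)
        select.set_downstream(transform)
    return subdag


dag = DAG("performance_kit_180_v1", default_args=default_args,
          schedule_interval="@daily")

calculate_stats = SubDagOperator(
    task_id="calculate_stats",
    subdag=calculate_stats_subdag("performance_kit_180_v1", "calculate_stats",
                                  default_args),
    dag=dag,
)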

After some digging, we found that the scheduler seems to be adding too many
tasks to our Redis queue. The worker then picks these tasks up from the
Redis queue, runs 8 of them fine, and fails the rest.

Here is the error message from the failed ones:

Dependencies not met for <TaskInstance:
performance_kit_180_v1.calculate_stats.transform_tripadvisor_reviews
2017-06-21 00:00:00 [queued]>, dependency 'Task Instance Slots Available'
FAILED: The maximum number of running tasks (8) for this task's DAG
'performance_kit_180_v1.calculate_stats' has been reached.
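
To illustrate the mismatch (this is just a toy model, not Airflow code): if the
scheduler enqueues more task instances than the sub-DAG's concurrency limit, the
worker can only start up to that limit, and every additional one fails the
'Task Instance Slots Available' check at runtime, which matches what we see.
The task ids and the count of 20 below are made up.

DAG_CONCURRENCY = 8
queued = ["transform_%02d" % i for i in range(20)]  # hypothetical queued task ids

running, rejected = [], []
for task_id in queued:
    if len(running) < DAG_CONCURRENCY:
        running.append(task_id)   # a slot is free, so the task starts
    else:
        rejected.append(task_id)  # no slot: the runtime dependency check fails
        print("Dependency 'Task Instance Slots Available' FAILED for %s: "
              "maximum number of running tasks (%d) reached"
              % (task_id, DAG_CONCURRENCY))

print("%d tasks started, %d rejected" % (len(running), len(rejected)))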

Right after that error, it prints this message:

--------------------------------------------------------------------------------

FIXME: Rescheduling due to concurrency limits reached at task runtime.
Attempt 1 of 2. State set to NONE.

--------------------------------------------------------------------------------
Queuing into pool None

We noticed in the airflow code that this error message wasn't expected to
happen and that it might point to a problem in the scheduler. Here is the
comment that led us to believe that:

# FIXME: we might have hit concurrency limits, which means we probably
# have been running prematurely. This should be handled in the
# scheduling mechanism.
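
For context, here is a heavily simplified sketch of the fallback that comment
sits on top of, as we understand it (our own toy code, not the actual Airflow
source): instead of failing the task outright, its state is reset to NONE so
the scheduler can pick it up and queue it again on a later loop.

class ToyTaskInstance(object):
    def __init__(self, task_id, try_number, max_tries):
        self.task_id = task_id
        self.state = "queued"
        self.try_number = try_number
        self.max_tries = max_tries

    def reschedule_due_to_concurrency(self):
        # Mirrors the log above: the state goes back to NONE so the task
        # becomes eligible for scheduling again instead of failing outright.
        print("FIXME: Rescheduling due to concurrency limits reached at task "
              "runtime. Attempt %d of %d. State set to NONE."
              % (self.try_number, self.max_tries))
        self.state = None


ti = ToyTaskInstance("transform_tripadvisor_reviews", try_number=1, max_tries=2)
ti.reschedule_due_to_concurrency()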


We have other dags that are limited by the concurrency limit just fine, and
none of their tasks print the above error on our worker node.

Thanks for any insight into this problem!

Re: Airflow 1.8.1 scheduler issue

Posted by Bolke de Bruin <bd...@gmail.com>.
This will be fixed in 1.8.2, which will be out shortly (rc2 has the fix).

Bolke
