Posted to commits@airflow.apache.org by "Marcin Szymanski (JIRA)" <ji...@apache.org> on 2018/09/14 15:40:00 UTC

[jira] [Updated] (AIRFLOW-3065) Scheduler failing tasks when DAG concurrency limit reached

     [ https://issues.apache.org/jira/browse/AIRFLOW-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Szymanski updated AIRFLOW-3065:
--------------------------------------
    Description: 
In a DAG with a concurrency limit of 4 and about 150 tasks, when the limit of active tasks is reached the scheduler starts to fail queued tasks. They are later retried, but if they have downstream tasks, those remain in upstream_failed status.

A few additional details (a minimal sketch of a DAG reproducing this setup follows the list):
 * Celery executor
 * environment upgraded from 1.9 (no issues were observed back then)
 * all configuration in airflow.cfg updated to the latest set of options
 * the issue happens with both PyPI 1.10 and a build from branch v1-10-test (c36ef06)
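
The sketch below shows a DAG shaped like the one described above, using the Airflow 1.10 API. The dag_id and the 'item' task naming follow the log excerpt; the operator, schedule, and task durations are assumptions for illustration only, not the actual production DAG.

{code:python}
# Hypothetical reproduction sketch (Airflow 1.10), not the reporter's real DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='consolidated_db',        # DAG id taken from the log excerpt below
    start_date=datetime(2018, 9, 1),
    schedule_interval=None,
    concurrency=4,                   # per-DAG cap on concurrently running task instances
)

# Roughly 150 independent tasks; once 4 are running, the rest sit in 'queued'
# and trip the 'Task Instance Slots Available' dependency check seen in the logs.
for i in range(150):
    BashOperator(
        task_id='item_%03d' % i,     # the log shows a task named 'item'
        bash_command='sleep 60',
        dag=dag,
    )
{code}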

 

 
{noformat}
[2018-09-14 13:51:23,560] {models.py:1336} INFO - Dependencies all met for <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
[2018-09-14 13:51:23,850] {models.py:1330} INFO - Dependencies not met for <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>, dependency 'Task Instance Slots Available' FAILED: The maximum number of running tasks (4) for this task's DAG 'consolidated_db' has been reached.
[2018-09-14 13:51:23,852] {models.py:1531} WARNING - 
--------------------------------------------------------------------------------
FIXME: Rescheduling due to concurrency limits reached at task runtime. Attempt 1 of 1. State set to NONE.
--------------------------------------------------------------------------------

[2018-09-14 13:51:23,853] {models.py:1534} INFO - Queuing into pool None

[2018-09-14 13:52:49,939] {models.py:1336} INFO - Dependencies all met for <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
[2018-09-14 13:52:50,142] {models.py:1336} INFO - Dependencies all met for <TaskInstance: consolidated_db.item 2018-09-14T12:42:55.379761+00:00 [queued]>
[2018-09-14 13:52:50,235] {models.py:1548} INFO - 
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------

[2018-09-14 13:52:50,646] {models.py:1570} INFO - Executing <Task(PostgresDumpOperator): item> on 2018-09-14T12:42:55.379761+00:00
{noformat}
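
For context, these are the airflow.cfg options that interact with the per-DAG limit hit above. The values shown are the stock 1.10 defaults for illustration, not the reporter's configuration (which, per the list above, was updated to the latest set of options).

{code}
[core]
# global cap on running task instances across all DAGs
parallelism = 32
# default per-DAG cap on concurrently running task instances
# (overridden per DAG via DAG(concurrency=...); 4 in this report)
dag_concurrency = 16

[celery]
# number of task processes each Celery worker runs
worker_concurrency = 16
{code}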
 


> Scheduler failing tasks when DAG concurrency limit reached
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-3065
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3065
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.0
>            Reporter: Marcin Szymanski
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)