Posted to commits@airflow.apache.org by "Chris (Jira)" <ji...@apache.org> on 2021/01/22 19:15:01 UTC

[jira] [Commented] (AIRFLOW-137) Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared

    [ https://issues.apache.org/jira/browse/AIRFLOW-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270368#comment-17270368 ] 

Chris commented on AIRFLOW-137:
-------------------------------

I've experienced this issue on 1.10.10. This is not a minor issue. Depending on the schedule of the DAG being backfilled, it can result in multiple data loading operations running in parallel, maxing out the destination database's resources and writing excessive rows to the destination, which can wreak havoc if not caught in time.

> Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-137
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-137
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Tomasz Bartczak
>            Priority: Minor
>
> Also requested at https://github.com/apache/incubator-airflow/issues/1442
> Dear Airflow Maintainers,
> Environment
> Before I tell you about my issue, let me describe my Airflow environment:
> Please fill out any appropriate fields:
>     Airflow version: 1.7.0
>     Airflow components: webserver, mysql, scheduler with celery executor
>     Python Version: 2.7.6
>     Operating System: Linux Ubuntu 3.19.0-26-generic
>     The scheduler runs with --num-runs and gets restarted around every minute or so
> Description of Issue
> Now that you know a little about me, let me tell you about the issue I am having:
>     What did you expect to happen?
>     After running 'airflow clear -t spark_final_observations2csv -s 2016-04-07T01:00:00 -e 2016-04-11T01:00:00 MODELLING_V6' I expected this task to be executed in all dag runs within the given time range, respecting 'max_active_runs'.
>     Dag configuration:
>     concurrency= 3,
>     max_active_runs = 2,
>     What happened instead?
>     Airflow at first started executing 3 of those tasks, which already violates 'max_active_runs'; it looks like 'concurrency' was the limit applied here.
>     [screenshot: 3_running_2_pending]
> After the first task was done, Airflow scheduled all the other tasks, making it 5 dags running at the same time, which violates every specified limit.
> In the GUI we saw a red warning (5/2 DAGs running ;-) )
> Reproducing the Issue
> max_active_runs is respected on a day-to-day basis: when one of the tasks was stuck, Airflow didn't start more than 2 dags concurrently.
> [screenshots in the original issue: https://github.com/apache/incubator-airflow/issues/1442]
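
A minimal sketch of a DAG with the limits described above (concurrency=3, max_active_runs=2), written against the Airflow 1.x API used in this report; the dag_id, task_id, schedule, and dates are illustrative assumptions, not taken from the report:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

    dag = DAG(
        dag_id="example_limited_dag",      # illustrative name
        start_date=datetime(2016, 4, 1),
        schedule_interval="@daily",
        concurrency=3,                     # at most 3 task instances of this DAG running at once
        max_active_runs=2,                 # at most 2 dag runs of this DAG active at once
    )

    load = BashOperator(
        task_id="load_to_destination",     # illustrative task
        bash_command="echo 'load data'",
        dag=dag,
    )

Clearing one task across several past runs (as with the 'airflow clear ... -s ... -e ...' command quoted above) puts all of those dag runs back into a running state at once; per this report, the scheduler then only enforces 'concurrency' and ignores 'max_active_runs'.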



--
This message was sent by Atlassian Jira
(v8.3.4#803005)