Posted to commits@airflow.apache.org by "Frederico Nunes (Jira)" <ji...@apache.org> on 2021/03/02 11:20:00 UTC

[jira] [Comment Edited] (AIRFLOW-137) Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared

    [ https://issues.apache.org/jira/browse/AIRFLOW-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293624#comment-17293624 ] 

Frederico Nunes edited comment on AIRFLOW-137 at 3/2/21, 11:19 AM:
-------------------------------------------------------------------

Having the same problem on version 2.0.1. I have max_active_runs_per_dag=1 and dag_concurrency=3 set, to make 100% sure that the same DAG never runs more than once at a time, because each run creates and deletes the same intermediary tables in my database.
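
For reference, pinning the same limits at the DAG level would look roughly like this (a minimal sketch; the dag_id, schedule, and start date are illustrative, not my real DAG):

    from airflow import DAG
    from airflow.utils.dates import days_ago

    # Illustrative only -- mirrors the airflow.cfg values at the DAG level:
    #   [core] max_active_runs_per_dag = 1  ->  max_active_runs=1
    #   [core] dag_concurrency = 3          ->  concurrency=3
    dag = DAG(
        dag_id="load_intermediary_tables",  # hypothetical name
        schedule_interval="@daily",
        start_date=days_ago(30),
        max_active_runs=1,  # never more than one active run of this DAG
        concurrency=3,      # at most three task instances across its runs
    )

Either way, the expectation is the same: never two runs touching those intermediary tables at once.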

I'm trying to do a very simple backfill which will take a couple of days, and a few dozen runs failed during the night due to database downtime. Ideally, I would just clear all the failed DAG runs and let them retry one by one during the day. In reality, several of them run at once and I get errors in all my data loading processes. This forces me to clear one DAG run at a time manually and to check back every 10 minutes during the day to clear the next one, to minimize downtime (a sketch of that babysitting is below).
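
For what it's worth, the one-at-a-time workaround can be scripted along these lines (a sketch only, assuming it runs somewhere with access to the metadata DB and the DAGs folder; the DAG id and helper name are my own invention):

    from airflow.models import DagBag, DagRun
    from airflow.utils.state import State

    DAG_ID = "load_intermediary_tables"  # hypothetical name

    def clear_next_failed_run():
        """Clear the oldest failed run of DAG_ID so the scheduler retries it."""
        failed = DagRun.find(dag_id=DAG_ID, state=State.FAILED)
        if not failed:
            return None
        run = sorted(failed, key=lambda r: r.execution_date)[0]
        dag = DagBag().get_dag(DAG_ID)
        # Clearing the run's task instances sets the run back to RUNNING,
        # and the scheduler then picks it up again.
        dag.clear(start_date=run.execution_date, end_date=run.execution_date)
        return run.execution_date

None of this should be necessary, though: if max_active_runs were honored after a bulk clear, clearing everything once would be enough.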

I share [~cdabel]'s opinion that this is not a minor issue as it makes backfilling completely impractical.

Thanks!



> Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-137
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-137
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Tomasz Bartczak
>            Priority: Minor
>
> Also requested at https://github.com/apache/incubator-airflow/issues/1442
> Dear Airflow Maintainers,
> Environment
> Before I tell you about my issue, let me describe my Airflow environment:
>     Airflow version: 1.7.0
>     Airflow components: webserver, mysql, scheduler with celery executor
>     Python Version: 2.7.6
>     Operating System: Linux Ubuntu 3.19.0-26-generic
>     The scheduler runs with --num-runs and gets restarted roughly every minute.
> Description of Issue
> Now that you know a little about me, let me tell you about the issue I am having:
>     What did you expect to happen?
>     After running 'airflow clear -t spark_final_observations2csv -s 2016-04-07T01:00:00 -e 2016-04-11T01:00:00 MODELLING_V6' I expected this task to be executed in all dag runs within the given time range, while respecting 'max_active_runs'.
>     Dag configuration:
>     concurrency= 3,
>     max_active_runs = 2,
>     What happened instead?
>     Airflow at first started executing 3 of those tasks, which already violates 'max_active_runs'; it looks like 'concurrency' was the limit actually being applied here.
>     [screenshot: 3 tasks running, 2 pending]
> After the first task finished, Airflow scheduled all the remaining tasks, resulting in 5 DAG runs running at the same time, which violates every specified limit.
> In the GUI we saw a red warning (5/2 DAGs running ;-) )
> Reproducing the Issue
> max_active_runs is respected on a day-to-day basis: when one of the tasks was stuck, Airflow didn't start more than 2 DAG runs concurrently.
> [screenshots in the original issue: https://github.com/apache/incubator-airflow/issues/1442]


