Posted to commits@airflow.apache.org by "jack (JIRA)" <ji...@apache.org> on 2019/07/17 12:14:00 UTC
[jira] [Commented] (AIRFLOW-3335) Bulk backfill & faster mark_success
[ https://issues.apache.org/jira/browse/AIRFLOW-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886997#comment-16886997 ]
jack commented on AIRFLOW-3335:
-------------------------------
May have been fixed by [https://github.com/apache/airflow/pull/5323]
> Bulk backfill & faster mark_success
> -----------------------------------
>
> Key: AIRFLOW-3335
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3335
> Project: Apache Airflow
> Issue Type: Improvement
> Components: backfill
> Reporter: belgacea
> Priority: Major
> Labels: features, performance
>
> I'm using Airflow to schedule Spark jobs and I wanted to be able to `backfill` a large time range (to catch up dags that are far behind their schedules). I used the `backfill` command with the `mark_success` argument and I expected all the dagruns to be marked as successful within a second, but Airflow seems to mark dags one by one (with some parallelization, using the `parallelism`/`dag_concurrency` configuration). Each dag takes approximately 2 seconds to be marked as successful, which makes the backfill process really slow for a large time range (or for small `time intervals`).
> Is there a way to speed up the `mark_success` backfilling? Also, is there a way to tell the Airflow scheduler to backfill dags with a single instance per task using the specified backfill time range (`start_date` + `end_date`) and then mark all dagruns within the time range as successful?
> Note: The dag I tried to backfill doesn't set `depends_on_past`.
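To give a feel for the slowdown described above, here is a rough back-of-the-envelope sketch. It is not Airflow code: `count_runs` and `estimate_seconds` are hypothetical helpers, the date range is made up, the 2-second figure is the reporter's (read here as a per-dagrun cost), and the assumption that runs are marked in fixed-size waves of `parallelism` is a simplification of whatever the executor actually does.

```python
from datetime import datetime, timedelta
from math import ceil


def count_runs(start, end, interval):
    """Number of schedule points in [start, end], both ends inclusive."""
    if end < start:
        return 0
    return (end - start) // interval + 1


def estimate_seconds(start, end, interval, seconds_per_run=2.0, parallelism=1):
    """Rough wall-clock estimate, assuming runs are marked in waves of
    `parallelism` and every wave costs `seconds_per_run` (an assumption)."""
    runs = count_runs(start, end, interval)
    return ceil(runs / parallelism) * seconds_per_run


# Hypothetical one-year backfill range.
start = datetime(2018, 1, 1)
end = datetime(2018, 12, 31)

daily = estimate_seconds(start, end, timedelta(days=1))
hourly = estimate_seconds(start, end, timedelta(hours=1))
hourly_parallel = estimate_seconds(start, end, timedelta(hours=1), parallelism=16)

print(daily, hourly, hourly_parallel)
```

Under these assumptions, shrinking the schedule interval from daily to hourly multiplies the number of dagruns, and therefore the marking time, by roughly 24, which matches the reporter's remark about small intervals. A true bulk update would not scale this way.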
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)