Posted to commits@airflow.apache.org by "yxiao1996 (via GitHub)" <gi...@apache.org> on 2023/02/11 18:03:03 UTC

[GitHub] [airflow] yxiao1996 opened a new issue, #29479: Batch DAG rerun without impacting latest run?

yxiao1996 opened a new issue, #29479:
URL: https://github.com/apache/airflow/issues/29479

   ### Description
   
   My team has an Airflow stack orchestrating many of our ETL jobs. Recently, we noticed a bug in the code of one of our hourly jobs that caused a large number of hourly datasets (more than 500) to be malformed. After fixing the ETL job, we need to rerun all the impacted DAG runs to correct the data.
   
   I learned that it's possible to initiate DAG reruns in batch through the "Browse" -> "Task Instances" view, where I can filter for the task instances I want to rerun by execution date and DAG ID and simply clear their states. However, since we set a concurrency limit for DAG runs, after we initiated the reruns, Airflow appeared to prioritize the reruns over the latest runs, so our latest DAG runs were delayed. As we have an agreement with our consumers on data processing time, we don't want the latest DAG runs to be delayed.
   
   I'm wondering if it's possible for Airflow to support a more intuitive and safer way to rerun DAGs in batch, where reruns only start if they don't impact the latest runs.
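
The requested behavior can be sketched as a small selection policy. This is plain Python for illustration only, not Airflow code or an existing Airflow API: the `Run` class, the `pick_runs_to_start` function, and the `limit` parameter (standing in for the DAG-run concurrency limit) are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    """A queued DAG run (hypothetical model, not an Airflow class)."""
    execution_date: datetime
    is_rerun: bool


def pick_runs_to_start(queued, running_count, limit):
    """Return the queued runs to start now without exceeding `limit`.

    Latest (non-rerun) runs are always considered first; reruns only
    fill whatever slots are left over. Within each group, older
    execution dates go first.
    """
    free_slots = max(limit - running_count, 0)
    latest = sorted((r for r in queued if not r.is_rerun),
                    key=lambda r: r.execution_date)
    reruns = sorted((r for r in queued if r.is_rerun),
                    key=lambda r: r.execution_date)
    return (latest + reruns)[:free_slots]
```

With a limit of 3 and one run already active, a queue holding two reruns and one latest run would start the latest run plus the oldest rerun; the second rerun waits for a free slot.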
   
   ### Use case/motivation
   
   Backfill/rerun is a common use case for ETL workflows. I feel it could be valuable to make it safe and more intuitive.
   
   ### Related issues
   
   I didn't find related issues after a few simple searches. Please let me know if this has been discussed elsewhere.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #29479: Batch DAG rerun without impacting latest run?

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29479:
URL: https://github.com/apache/airflow/issues/29479#issuecomment-1426881234

   I believe this is a duplicate of https://github.com/apache/airflow/issues/9176




[GitHub] [airflow] eladkal closed issue #29479: Batch DAG rerun without impacting latest run?

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal closed issue #29479: Batch DAG rerun without impacting latest run?
URL: https://github.com/apache/airflow/issues/29479

