Posted to dev@airflow.apache.org by Tomasz Urbaszek <tu...@apache.org> on 2020/12/23 13:44:53 UTC

[DISCUSS] AIP: Trigger backfills externally

Hello all,

I would like to discuss a new Airflow Improvement Proposal that
aims to let Airflow users trigger backfills externally
(via the API or UI).

In short, this AIP proposes to:
- create a new API endpoint to trigger a backfill job (and the same
mechanism for the web UI)
- create a new Celery task to run the backfill on worker machines (in
the case of Celery-like executors)
- extend the scheduler's zombie-removal mechanism to take care of
backfill-triggered tasks and DAG runs
- improve the UI so users can see the difference between scheduled and
backfilled runs
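
As a rough illustration of the first bullet, the sketch below shows what
a request to such an endpoint might need to carry. Every name in it
(`BackfillRequest`, `dag_id`, `start_date`, `end_date`, `daily_run_dates`)
is hypothetical and not taken from the proposal doc or from Airflow, and
it assumes a DAG with a daily schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class BackfillRequest:
    """Hypothetical payload for an externally triggered backfill."""

    dag_id: str
    start_date: date
    end_date: date  # inclusive

    def __post_init__(self) -> None:
        # Reject obviously invalid requests before anything is scheduled.
        if not self.dag_id:
            raise ValueError("dag_id must not be empty")
        if self.end_date < self.start_date:
            raise ValueError("end_date must not be before start_date")

    def daily_run_dates(self) -> List[date]:
        """List the dates to backfill, assuming one run per day."""
        days = (self.end_date - self.start_date).days
        return [self.start_date + timedelta(days=i) for i in range(days + 1)]
```

Validating the range up front means a bad request fails at the API
boundary rather than after work has been handed to an executor.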

I drafted a doc with the proposal:  https://s.apache.org/backfill-aip
so we can discuss it there before moving it to cwiki.

I'm happy to hear your opinions on this. Have a peaceful and warm holiday season!

Best,
Tomek

Re: [DISCUSS] AIP: Trigger backfills externally

Posted by Jacob Ward <jw...@brandwatch.com>.
Hey Tomasz,

We discussed rolling in other backfill improvements on a ticket in
November: https://github.com/apache/airflow/issues/12654. Unfortunately,
I haven't been able to work on that since then, but I have the time to do
it now. Should I add this to your AIP document, or do you think it would be
better to have a separate document (or a separate AIP)?

Thanks

On Mon, 4 Jan 2021 at 14:45, Ash Berlin-Taylor <as...@apache.org> wrote:

> I agree with Jarek -- running a backfill is essentially another scheduler
> (albeit one with a lot of specialised logic -- how much of it is needed
> is unclear), so it would be nice to have that all rolled up into the
> scheduler, with `airflow backfill` and the new API just setting "something"
> that the scheduler then looks at to start running the backfill.
>
> (Scheduling a backfill should not be very resource-intensive, no more
> so than normal DAG runs.)
>
> -ash

-- 

Jacob Ward    |    Graduate Data Infrastructure Engineer

jward@brandwatch.com



Re: [DISCUSS] AIP: Trigger backfills externally

Posted by Ash Berlin-Taylor <as...@apache.org>.
I agree with Jarek -- running a backfill is essentially another
scheduler (albeit one with a lot of specialised logic -- how much of
it is needed is unclear), so it would be nice to have that all rolled up
into the scheduler, with `airflow backfill` and the new API just setting
"something" that the scheduler then looks at to start running the
backfill.

(Scheduling a backfill should not be very resource-intensive, no
more so than normal DAG runs.)
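
A toy sketch of that idea, for illustration only: the CLI or API merely
records a request, and a later scheduler loop iteration picks it up. The
names here (`ToyMetadataStore`, `BackfillRecord`, `request_backfill`,
`claim_pending`) are hypothetical, and a plain dict stands in for the
metadata database; none of this is existing Airflow code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class BackfillState(Enum):
    PENDING = "pending"
    RUNNING = "running"


@dataclass
class BackfillRecord:
    """Hypothetical row describing one requested backfill."""

    backfill_id: int
    dag_id: str
    state: BackfillState = BackfillState.PENDING


@dataclass
class ToyMetadataStore:
    """Stand-in for the metadata database (assumption: in-memory dict)."""

    records: Dict[int, BackfillRecord] = field(default_factory=dict)
    _next_id: int = 1

    def request_backfill(self, dag_id: str) -> int:
        """What the CLI or API would do: only record the request."""
        backfill_id = self._next_id
        self._next_id += 1
        self.records[backfill_id] = BackfillRecord(backfill_id, dag_id)
        return backfill_id

    def claim_pending(self) -> List[BackfillRecord]:
        """What a scheduler loop iteration would do: claim pending requests."""
        claimed = [
            record
            for record in self.records.values()
            if record.state is BackfillState.PENDING
        ]
        for record in claimed:
            record.state = BackfillState.RUNNING
        return claimed
```

In this shape the triggering side stays cheap, and all scheduling logic
lives in one place instead of being duplicated on workers.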

-ash



On Mon, 28 Dec, 2020 at 13:23, Jarek Potiuk <Ja...@polidea.com> 
wrote:
> Made some comments. Summarising: I think we need it, and the proposal
> is reasonable. I only have one serious question: where should the
> backfill job be running? My gut feeling tells me that the scheduler is
> a better "entity" for such a job than the workers proposed in the
> original document, but I am happy to discuss it.


Re: [DISCUSS] AIP: Trigger backfills externally

Posted by Jarek Potiuk <Ja...@polidea.com>.
Made some comments. Summarising: I think we need it, and the proposal is
reasonable. I only have one serious question: where should the backfill
job be running? My gut feeling tells me that the scheduler is a better
"entity" for such a job than the workers proposed in the original
document, but I am happy to discuss it.



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129