Posted to dev@airflow.apache.org by Malthe <mb...@gmail.com> on 2022/03/01 06:55:30 UTC

Re: Simplifying the timetable interface

On Sun, 27 Feb 2022 at 14:04, Jarek Potiuk <ja...@potiuk.com> wrote:
> TL;DR; I think it is about time to complete what we were planning in AIP-39 as "Future Enhancement" and implement a few simple timetable implementations that will handle most popular use cases (using the "complex" timetable API) that will be available to regular users to use (without the need of writing new code). My proposal is that we should define what timetables to add and aim to implement them to include them in Airflow 2.3. Sounds doable and should solve the real problem of our users.

This would be great. I agree that if there were ready-to-use
timetables with composable behavior, then very few users would need to
write custom timetables, which is a good thing.

> I do not think the current interface is "too complex". Not at all. But I think that it is targeted at a different audience than the one Malthe and Bas talk about. It is addressed to "power users", not only because it requires a deep understanding of Airflow scheduling internals and optimizations, but also because it requires "admin" rights to develop, test and install it. Regular users, who are DAG authors, cannot create new Timetables. This is mostly because of security. The "regular users" need to convince the admins to do so. And yes, I am talking about the important segment of our users where you have professional admins/devops configuring Airflow and DAG authors who just write DAGs. I think this is the most interesting and biggest segment of our users, to be honest. We should always think about this segment of our users first IMHO.

Well I think they're too complex!

And I actually elaborated quite a bit on that in this correspondence –
including giving a concrete proposal which you did not consider or
mention. But I'm happy to be proven wrong. Perhaps it is only me who
thinks the interface is wrong. Let us see an example implementation of
"-2 day of every month" or "every day after work", either using a
declarative specification built on top of some composable timetable,
or as a direct implementation.

> The current API is great when it comes to power users who know airflow's scheduling internals and optimizations that Ash explained.

I know the scheduling internals and I was not able to write a custom
timetable. I was looking at the included example and it did not look
like code that a power user should be able to write.

But let's then turn our attention to what we want out of predefined timetables:

- Composability. It should be possible to use and/or operators in a
nested fashion to compose any timetable (e.g. every Friday OR every
last weekday AND every day at 6pm).
- Intervals. It should be possible to define a timetable which changes
over time, as non-overlapping intervals (e.g. one for each year).
- Exclude. It should be possible to exclude certain days (e.g. holidays).

I think this is a good starting point and it would allow users to meet
most of their scheduling needs.
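
To make the list above a bit more concrete, here is a minimal sketch of
what composition and exclusion could look like. It is plain Python over
calendar days and deliberately does not use Airflow's Timetable API; the
names (Rule, any_of, all_of, excluding, matching_days) and the holiday
date are made up for illustration, and "last weekday" is assumed to mean
the last Monday-to-Friday day of the month.

```python
from datetime import date, timedelta
from typing import Callable, Iterable, Iterator

# Hypothetical type: a rule is a predicate over calendar days.
# This is an illustration only, not Airflow's Timetable interface.
Rule = Callable[[date], bool]


def any_of(*rules: Rule) -> Rule:
    """OR: a day matches if at least one rule matches."""
    return lambda d: any(rule(d) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """AND: a day matches only if every rule matches."""
    return lambda d: all(rule(d) for rule in rules)


def excluding(rule: Rule, days: Iterable[date]) -> Rule:
    """Exclude: a day matches the rule unless it is explicitly blocked."""
    blocked = frozenset(days)
    return lambda d: rule(d) and d not in blocked


def is_friday(d: date) -> bool:
    return d.weekday() == 4  # Monday is 0, so Friday is 4


def is_last_weekday_of_month(d: date) -> bool:
    """True for the last Monday-to-Friday day of d's month (assumed meaning)."""
    if d.weekday() >= 5:
        return False
    following = d + timedelta(days=1)
    while following.month == d.month:
        if following.weekday() < 5:
            return False  # a later weekday exists in the same month
        following += timedelta(days=1)
    return True


def matching_days(rule: Rule, start: date, end: date) -> Iterator[date]:
    """Yield every day in [start, end] that satisfies the rule."""
    day = start
    while day <= end:
        if rule(day):
            yield day
        day += timedelta(days=1)


# "Every Friday OR every last weekday of the month, except holidays."
holidays = [date(2022, 3, 25)]  # made-up holiday for illustration
schedule = excluding(any_of(is_friday, is_last_weekday_of_month), holidays)
runs = list(matching_days(schedule, date(2022, 3, 1), date(2022, 3, 31)))
```

Because every combinator returns another rule, nesting comes for free. A
real implementation would still have to map such rules onto data
intervals and run times, which this sketch leaves out.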

Cheers

Re: Simplifying the timetable interface

Posted by Collin McNulty <co...@astronomer.io.INVALID>.
Jarek,

I agree fully. I think that, at a bare minimum, we should provide a version of the
CronDataIntervalTimetable that tries to schedule the run immediately after
the logical_date, and a Timetable that just takes a list of datetimes and
schedules runs at those times.
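
As a rough sketch of the second idea, the core of a "list of datetimes"
timetable is just a lookup of the next entry after the previous run. The
function name next_run_after and the example times are hypothetical, and
this is plain Python rather than an implementation of Airflow's Timetable
interface, which would need to wrap this logic.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from typing import Optional, Sequence


def next_run_after(
    run_times: Sequence[datetime], last_run: Optional[datetime]
) -> Optional[datetime]:
    """Return the first scheduled time strictly after last_run.

    run_times must be sorted in ascending order. Returns None when the
    list is empty or every scheduled time has already been used.
    """
    if not run_times:
        return None
    if last_run is None:
        return run_times[0]  # nothing has run yet, start with the first entry
    index = bisect_right(run_times, last_run)
    return run_times[index] if index < len(run_times) else None


# Example times, made up for illustration.
run_times = [
    datetime(2022, 3, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 3, 15, 6, 0, tzinfo=timezone.utc),
    datetime(2022, 4, 1, 6, 0, tzinfo=timezone.utc),
]
first = next_run_after(run_times, None)
second = next_run_after(run_times, first)
```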

Collin McNulty

Re: Simplifying the timetable interface

Posted by Jarek Potiuk <ja...@potiuk.com>.
BTW, just to add to the discussion, this is an attempt by a "regular"
user to use and modify the timetable as defined now:
https://github.com/apache/airflow/issues/22242

To be honest, I don't even know what to answer the user. It seems that
the user follows our advice and tries to work from our Workday examples,
but there is no way the current timetable approach can be easily
tested or verified, and if people start using the current
timetables to do anything, this will lead to similar problems and
confusion.
I honestly think we should provide users with a few well-tested
(automatically), configurable timetable "implementations".

J.


Re: Simplifying the timetable interface

Posted by Jarek Potiuk <ja...@potiuk.com>.
> This would be great. I agree that if there were ready-to-use
> timetables with composable behavior, then very few users would need to
> write custom timetables, which is a good thing.

Cool!


> > I do not think the current interface is "too complex". Not at all. But I
> > think that it is targeted at a different audience than the one Malthe and
> > Bas talk about. It is addressed to "power users", not only because it
> > requires a deep understanding of Airflow scheduling internals and
> > optimizations, but also because it requires "admin" rights to develop,
> > test and install it. Regular users, who are DAG authors, cannot create new
> > Timetables. This is mostly because of security. The "regular users" need
> > to convince the admins to do so. And yes, I am talking about the important
> > segment of our users where you have professional admins/devops configuring
> > Airflow and DAG authors who just write DAGs. I think this is the most
> > interesting and biggest segment of our users, to be honest. We should
> > always think about this segment of our users first IMHO.
>
> Well I think they're too complex!
>
> And I actually elaborated quite a bit on that in this correspondence –
> including giving a concrete proposal which you did not consider or
> mention. But I'm happy to be proven wrong. Perhaps it is only me who
> thinks the interface is wrong. Let us see an example implementation of
> "-2 day of every month" or "every day after work", either using a
> declarative specification built on top of some composable timetable,
> or as a direct implementation.
>
>
Precisely. I think we should implement those and then we can see if they
can be simplified. I think if we have a few customized timetables, including
comprehensive unit tests, we will be able to see if we can simplify the
whole interface for Airflow 3.
Having those few predefined timetables and their unit tests will be of
great help when we want to simplify it, IMHO. Then for
Airflow 3 (which is still months away, if not years) we will be in a much
better position to make even breaking changes.


>
> - Composability. It should be possible to use and/or operators in a
> nested fashion to compose any timetable (e.g. every Friday OR every
> last weekday AND every day at 6pm).
> - Intervals. It should be possible to define a timetable which changes
> over time, as non-overlapping intervals (e.g. one for each year).
> - Exclude. It should be possible to exclude certain days (e.g. holidays).
>

By all means. Most of those (composability, intervals) were already
mentioned as "future work" in
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-39+Richer+scheduler_interval
I think we should just follow up on what was planned there.


> I think this is a good starting point and it would allow users to meet
> most of their scheduling needs.
>
> Cheers
>