Posted to dev@airflow.apache.org by Phil Yardley <ph...@dunnhumby.com> on 2021/02/17 15:47:01 UTC

Re: Scoping out a new feature for 2.1: improving schedule_interval

Is it possible to offer both (maybe in two releases)?

That would let users select the most appropriate approach for their scenario.

My scenario, for example, is easy to express with multiple crons:

Monday - Thursday run job A at 9pm
Friday - run job A at 8pm

This is easier to express in cron than by writing a Python extension to handle it.
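For comparison, this schedule corresponds to the two crontabs "0 21 * * 1-4" and "0 20 * * 5". The sketch below shows what the equivalent hand-written Python function might look like; the name next_job_a_run and the RUN_TIMES table are illustrative only and not part of any Airflow API, and naive local times are assumed.

```python
from datetime import datetime, time, timedelta

# Run times per weekday (Monday=0 ... Sunday=6); weekends are absent, so no run.
RUN_TIMES = {0: time(21), 1: time(21), 2: time(21), 3: time(21), 4: time(20)}


def next_job_a_run(after: datetime) -> datetime:
    """Return the first scheduled run strictly after `after` (naive local time)."""
    for offset in range(8):  # today plus the next 7 days always contains a run
        day = after.date() + timedelta(days=offset)
        run_time = RUN_TIMES.get(day.weekday())
        if run_time is None:
            continue  # Saturday or Sunday
        candidate = datetime.combine(day, run_time)
        if candidate > after:
            return candidate
    raise RuntimeError("unreachable: every week contains at least one run")
```

The function is short, but the two crontabs state the same intent more compactly, which is the point being made here.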

But having the ability to write a custom method in language x would then satisfy those who need something more complex, such as the astronomy example in the thread.

Whether "it looks complicated" to the user is probably for the user to worry about: if it looks too complicated, they've probably chosen the wrong way of doing it (or chosen the simplest way without worrying about how it looks).

Phil


On 2021/01/24 07:04:03, Jarek Potiuk <ja...@potiuk.com> wrote: 
> Yep. I agree with Daniel: multiple crons are very difficult to
> reason about. You can create an arbitrarily complex declarative
> definition of a schedule that you will have a hard time understanding. We are
> already entering the realm of programming the schedule, which IMHO is
> better done in a "programming" language than in cron declarations.
> 
> J.
> 
> On Sun, Jan 24, 2021 at 7:48 AM Daniel Imberman <da...@gmail.com>
> wrote:
> 
> > I worry that multiple crons would become difficult to read for stranger
> > use-cases (for example "run on the first trading day after the 15th of the
> > month"). If we create a python function or class we can easily create a
> > "CronTimeTable" that does exactly what Dmitry is suggesting while still
> > leaving open the possibility of creating other custom schedules.
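As a rough illustration of how the "first trading day after the 15th of the month" rule reads as Python, here is a minimal sketch. It reads "after the 15th" as strictly after, treats every weekday as a trading day, and takes exchange holidays as a caller-supplied set; the function names are hypothetical and no real trading calendar is consulted.

```python
from datetime import date, timedelta


def first_trading_day_after_15th(year: int, month: int,
                                 holidays: frozenset = frozenset()) -> date:
    """First weekday after the 15th of the month that is not in `holidays`."""
    day = date(year, month, 16)
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


def next_monthly_run(after: date, holidays: frozenset = frozenset()) -> date:
    """Next scheduled date strictly after `after`, scanning month by month."""
    year, month = after.year, after.month
    while True:
        candidate = first_trading_day_after_15th(year, month, holidays)
        if candidate > after:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

For example, January 16, 2021 fell on a Saturday, so the sketch returns Monday, January 18; passing that date in `holidays` pushes the run to the 19th. Expressing the same rule as a list of crontabs is not practical, which is the concern raised above.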
> >
> > On Sat, Jan 23, 2021, 2:32 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> >> I think whatever approach we decide on we should display
> >> *next_execution_date* in the webserver for each DAG. This would help
> >> most of the users.
> >>
> >> Regards,
> >> Kaxil
> >>
> >> On Sat, Jan 23, 2021 at 10:25 PM Dmitri Khokhlov <dk...@gmail.com>
> >> wrote:
> >>
> >>> Root problem:
> >>> - existing Airflow schedule syntax defines only one interval pattern per
> >>> DAG
> >>> - there are use-cases that need multiple interval patterns per DAG
> >>> (during a day etc)
> >>>
> >>> I vote for "crontab list" solution from Deng Xiaodong. Example:
> >>>
> >>> schedule_interval = ["* 0,22,23 * * *", "30 1-21 * * *"]
> >>>
> >>> Reasoning:
> >>> - it is an additive change: it does not remove or break existing usage
> >>> patterns (very important)
> >>> - it is generic and has a compact definition that is easy to
> >>> read/print/present in the UI (as a string); that is why it is better than
> >>> the "function" approach.
> >>> - it is a complete solution, as it allows defining interval-based
> >>> schedules of any complexity.
> >>> - it is relatively easy to implement by OR-ing the crontabs' times,
> >>> choosing the earliest next run time, and following these instructions from Ash
> >>> Berlin-Taylor <as...@apache.org>:
> >>> "
> >>> The way the scheduler works now, it just looks at two columns on the dag
> >>> (model) table, called I think "next_dagrun_after" (which is the earliest
> >>> date that the dag run can be created) and "next execution date" (which is
> >>> the value to put in the execution date of the dag run when it's created).
> >>>
> >>> Both these values are set by the dag parser process, which has full
> >>> access to run code. Whatever interface we use for defining new schedule
> >>> expressions should run in the existing process, much like what James C did in
> >>> a subclass.
> >>> "
> >>> --
> >>> Dmitri
> >>>
> >>>
> >>> On 2021/01/21 19:12:06, Daniel Imberman <da...@gmail.com>
> >>> wrote:
> >>> > My only concern with tying this to the dag_parsing process is that
> >>> that process might miss SLAs because it takes too long to loop around. I
> >>> could imagine that a separate thread or component that reads TimeTable
> >>> objects or SmartSensor objects and runs them might make sense.
> >>> > Ultimately I don’t see anything about SmartSensors that specifically
> >>> needs to run in a DAG. It could just as easily be a while loop or something
> >>> embarrassingly parallel (as sensors/timetables shouldn’t depend on each
> >>> other).
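Because timetables don't depend on each other, a separate component could evaluate them independently. The sketch below assumes a hypothetical mapping from DAG id to a "next run after" function and uses concurrent.futures.ThreadPoolExecutor to check them concurrently. Threads only help if the timetable functions block on I/O; CPU-heavy timetables would need processes or separate workers.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical: each DAG id maps to a function returning its next run after a time.
NextRunFn = Callable[[datetime], datetime]


def due_dags(timetables: Dict[str, NextRunFn],
             last_runs: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the DAG ids whose next run (after their last run) is due by `now`.

    Each timetable is evaluated independently of the others.
    """
    def check(dag_id: str) -> bool:
        return timetables[dag_id](last_runs[dag_id]) <= now

    with ThreadPoolExecutor(max_workers=4) as pool:
        flags = list(pool.map(check, timetables))  # preserves key order
    return [dag_id for dag_id, due in zip(timetables, flags) if due]
```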
> >>> >
> >>> > On Thu, Jan 21, 2021 at 11:07 AM, Vikram Koka <vi...@astronomer.io>
> >>> wrote:
> >>> > Great discussion.
> >>> > I generally agree with the "Custom scheduling class" / subclass
> >>> approach which would run as part of the "scheduler" set of processes,
> >>> rather than an internal DAG approach.
> >>> > I do think it would be good to have boundaries on what information
> >>> this class would operate on and at what frequency. This is primarily from a
> >>> performance standpoint, though it could be argued that there are security
> >>> concerns with that as well.
> >>> > Specifically from the "what information would this have access to"
> >>> perspective, I think that interface would be helpful in clarifying some of
> >>> the use cases and making sure that those are covered. One example I was
> >>> thinking about in the "sunset" example is location. I was originally
> >>> thinking of a timezone, but this is more specific than that.
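One way to put boundaries on what a custom scheduling class can see is to pass it an explicit, read-only context rather than letting it reach into the database. The sketch below is an assumption, not a proposed Airflow interface: TimetableContext and next_local_nine_am are hypothetical names, and a fixed UTC offset stands in for a proper timezone name to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass(frozen=True)
class TimetableContext:
    """Hypothetical read-only inputs a custom timetable is allowed to see."""
    utc_offset: timedelta  # fixed offset; a real design would likely use a tz name
    latitude: float        # e.g. for a sunrise/sunset timetable
    longitude: float


def next_local_nine_am(after_utc: datetime, ctx: TimetableContext) -> datetime:
    """Next 09:00 local time (per `ctx`) strictly after `after_utc`, in UTC."""
    tz = timezone(ctx.utc_offset)
    local = after_utc.astimezone(tz)
    candidate = datetime.combine(local.date(), time(9), tzinfo=tz)
    if candidate <= local:
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)
```

Freezing the dataclass means the timetable cannot mutate its inputs, and anything not in the context (connections, variables, other DAGs) is simply unavailable to it.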
> >>> >
> >>> >
> >>> > On Thu, Jan 21, 2021 at 10:35 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> >>> > It shouldn't need something as complex (or, to my mind, hacky) as an
> >>> internal DAG.
> >>> >
> >>> > The way the scheduler works now, it just looks at two columns on the
> >>> dag (model) table, called I think "next_dagrun_after" (which is the earliest
> >>> date that the dag run can be created) and "next execution date" (which is
> >>> the value to put in the execution date of the dag run when it's created).
> >>> >
> >>> > Both these values are set by the dag parser process, which has full
> >>> access to run code. Whatever interface we use for defining new schedule
> >>> expressions should run in the existing process, much like what James C did in
> >>> a subclass.
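A minimal sketch of what the parser process would compute under this design: given the previous execution date and any "next time after" function, derive the two stored values. The field names are hypothetical stand-ins for the two columns, and the sketch assumes interval semantics, i.e. a run is stamped with the start of its interval and may only be created once that interval has ended.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class NextDagRun:
    """Hypothetical stand-in for the two values stored on the dag table."""
    execution_date: datetime  # value stamped on the DAG run when it is created
    create_after: datetime    # earliest moment the scheduler may create that run


def compute_next_dagrun(last_execution_date: datetime,
                        following: Callable[[datetime], datetime]) -> NextDagRun:
    """Runs in the DAG-parsing process, where arbitrary user code is allowed."""
    execution_date = following(last_execution_date)
    # Interval semantics (assumed): the run covering [execution_date, next)
    # can only be created once that interval is over.
    return NextDagRun(execution_date, create_after=following(execution_date))
```

The scheduler then only compares create_after with the current time, so the cost of the user's scheduling code is paid during parsing rather than in the scheduling loop.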
> >>> >
> >>> > Ash
> >>> >
> >>> > On 21 January 2021 18:21:58 GMT, Daniel Imberman
> >>> <daniel.imberman@gmail.com> wrote:
> >>> > I think James's idea sounds like a pretty good one. What would you all
> >>> think of us doing something similar to how we handle smart sensors to
> >>> implement this? Have an internal DAG that reads all custom timetables and
> >>> triggers a DAG if the function returns True? Seems like a pretty
> >>> simple/customizable solution.
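The "trigger a DAG if the function returns True" variant can be sketched as a single polling pass over a registry of predicates. Everything here is hypothetical (the registry, poll_once, and the trigger callback standing in for DAG-run creation).

```python
from datetime import datetime
from typing import Callable, Dict, List

# Hypothetical registry: DAG id -> predicate answering "should this DAG run now?"
Predicate = Callable[[datetime], bool]


def poll_once(predicates: Dict[str, Predicate], now: datetime,
              trigger: Callable[[str], None]) -> List[str]:
    """One pass of the polling loop: trigger every DAG whose predicate is True."""
    fired = []
    for dag_id, should_run in predicates.items():
        if should_run(now):
            trigger(dag_id)  # stand-in for creating a DAG run
            fired.append(dag_id)
    return fired
```

One design consequence: a pure "is it time?" predicate can fire on several consecutive polls within the same window, or miss a window entirely if polling is too coarse, so it needs extra bookkeeping that a "next execution date" function does not.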
> >>> > On Wed, Jan 20, 2021 at 5:52 PM, James Timmins <james@astronomer.io> wrote:
> >>> > Django provides a really good model for allowing users to customize
> >>> the behavior of Class Based Views. It's in line w/ what Daniel/Kaxil and co
> >>> are saying about a consistent backend class. It uses a standard base class
> >>> as well as a default concrete implementation. Customization then only
> >>> requires setting an explicit class if you're overriding the default.
> >>> > Seems that the interface is more important than the backend mechanism
> >>> to make this work. There are multiple ways to make this work internally,
> >>> but the interface should be in line with future plans for hooks/extensible
> >>> areas.
> >>> > Just to make things concrete, here's my understanding of what that
> >>> would look like / what they're suggesting.
> >>> > BaseTimetable abstract class:
> >>> > - Defines a `get_next_execution_time` method. This method accepts one
> >>> argument, an arbitrary datetime value. Based on that datetime, it returns
> >>> the next time the DAG should start. This makes it easy to schedule past
> >>> events, and also makes it easy to print out a "dry run" of execution times
> >>> for testing purposes.
> >>> > - Defines a `_check_timetable_arguments` method that looks for any
> >>> existing timetable args in the DAG and makes sure they're used by whichever
> >>> Timetable class is selected (error checking).
> >>> > CronTimetable - the default Timetable class, built on BaseTimetable.
> >>> > If users want a different timetable, they can just extend BaseTimetable
> >>> and define a custom `get_next_execution_time` method, then pass the class
> >>> into the DAG constructor under the `timetable_class` argument. So for
> >>> `sunset` or `sunrise`, they could easily create a `SolarTimetable` class
> >>> and pass that in.
> >>> > `get_next_execution_time` can then be called whenever DAGs are parsed
> >>> or whenever tasks run.
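The proposal above can be sketched roughly as follows. This is not Airflow's API: ToyDag is a minimal stand-in for the DAG constructor, DailyTimetable stands in for the default CronTimetable (to avoid writing a cron parser), WeekdayTimetable is a hypothetical custom class, and `timetable_args` is an assumed way of forwarding constructor arguments. `_check_timetable_arguments` is left out.

```python
from abc import ABC, abstractmethod
from datetime import datetime, time, timedelta
from typing import Any, Dict, List, Optional, Type


class BaseTimetable(ABC):
    @abstractmethod
    def get_next_execution_time(self, after: datetime) -> datetime:
        """Return the next time the DAG should start, strictly after `after`."""

    def dry_run(self, start: datetime, count: int) -> List[datetime]:
        # A pure function of a datetime, so results can be fed back in to list
        # upcoming (or past) execution times without touching the scheduler.
        times, current = [], start
        for _ in range(count):
            current = self.get_next_execution_time(current)
            times.append(current)
        return times


class DailyTimetable(BaseTimetable):
    """Stand-in for the default CronTimetable: every day at midnight."""

    def get_next_execution_time(self, after: datetime) -> datetime:
        return datetime.combine(after.date() + timedelta(days=1), time())


class WeekdayTimetable(BaseTimetable):
    """Hypothetical custom timetable: weekdays at a configurable hour."""

    def __init__(self, hour: int) -> None:
        self.hour = hour

    def get_next_execution_time(self, after: datetime) -> datetime:
        candidate = datetime.combine(after.date(), time(self.hour))
        while candidate <= after or candidate.weekday() >= 5:
            candidate += timedelta(days=1)
        return candidate


class ToyDag:
    """Not Airflow's DAG: just enough to show a `timetable_class` hook."""

    def __init__(self, dag_id: str,
                 timetable_class: Type[BaseTimetable] = DailyTimetable,
                 timetable_args: Optional[Dict[str, Any]] = None) -> None:
        self.dag_id = dag_id
        self.timetable = timetable_class(**(timetable_args or {}))

    def next_execution_time(self, after: datetime) -> datetime:
        return self.timetable.get_next_execution_time(after)
```

For example, `ToyDag("report", WeekdayTimetable, {"hour": 9}).timetable.dry_run(datetime(2021, 1, 22, 10), 3)` lists the next three weekday 09:00 runs, skipping the weekend after Friday, January 22, 2021.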
> >>> > On Wed, Jan 20, 2021 at 3:53 PM James Coder <jcoder01@gmail.com> wrote:
> >>> > Kaxil, you beat me to it. I actually have a DAG where I achieve an
> >>> irregular schedule by overriding DAG.next_dagrun_info(). If that method
> >>> were swapped out for an object, it may be a semi-easy way to make the
> >>> schedule “pluggable”.
> >>> >
> >>> > James Coder
> >>> > On Jan 20, 2021, at 6:37 PM, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> >>> >
> >>> > "CronBackend" / "ScheduleIntervalBackend" :D similar to XCom and
> >>> Secrets Backend
> >>> > It would definitely be good to have custom schedule intervals using
> >>> functions/classes that are serializable too.
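Serializability matters because a schedule defined as arbitrary code cannot simply be stored and reloaded elsewhere. A minimal sketch, assuming a hypothetical registry keyed by class name and timetables whose whole state is a JSON-friendly dict, could look like this. Only registered classes can be rebuilt, which also limits what a stored payload can instantiate.

```python
import json
from datetime import datetime, timedelta
from typing import Any, Dict

# Hypothetical registry mapping a stable name to each known timetable class.
TIMETABLE_REGISTRY: Dict[str, type] = {}


def register(cls: type) -> type:
    TIMETABLE_REGISTRY[cls.__name__] = cls
    return cls


@register
class IntervalTimetable:
    """Hypothetical timetable whose entire state fits in a plain dict."""

    def __init__(self, minutes: int) -> None:
        self.minutes = minutes

    def get_next_execution_time(self, after: datetime) -> datetime:
        return after + timedelta(minutes=self.minutes)

    def serialize(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "args": {"minutes": self.minutes}}


def deserialize(data: Dict[str, Any]):
    """Rebuild a timetable from its serialized form via the registry."""
    return TIMETABLE_REGISTRY[data["type"]](**data["args"])
```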
> >>> >
> >>> > On Wed, Jan 20, 2021 at 11:02 PM QP Hou <qp...@scribd.com.invalid>
> >>> wrote:
> >>> > On Wed, Jan 20, 2021 at 10:22 AM Daniel Imberman
> >>> > <daniel.imberman@gmail.com> wrote:
> >>> > >
> >>> > > I love the idea of allowing users to create their own scheduling
> >>> objects/scheduling python functions. They could either live in the
> >>> scheduler or as a separate process that trips some value in the DB when it
> >>> is “true”. It would be great from a “marketplace” standpoint as well, as users
> >>> could post their custom scheduling objects for others to use.
> >>> > >
> >>> >
> >>> > I like this idea as well: a quick escape hatch for custom and complex
> >>> > scheduling behaviors without having to wait for upstream support.
> >>>
> >>
> 
>