You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <da...@astronomer.io.INVALID> on 2022/08/01 15:40:34 UTC

Re: [DISCUSS] consolidate dag scheduling params

Well, this is a discussion thread, so let's discuss!

Vikram, what kind of implications are you thinking of?  Maybe you could
provide a little more detail about your concerns?

There are certainly alternatives, and of course each of them has tradeoffs.

Here are the options I see:

1. consolidate to schedule (deprecate schedule_interval and timetable; the
proposal under discussion in this thread)
2. do nothing (keep 3 different scheduling params)
3. consolidate to schedule_interval (deprecate timetable)
4. don't deprecate timetable, but remove schedule_on (AIP-48 has added it
in main) and take List[Dataset] in `schedule_interval`

Which would you propose we go with?

If we have a veto against 1 (and again, we're in hypothetical territory
since this is still a discussion thread), my vote would definitely be to go
with 3.

The param `schedule_interval` as catch all is not *sooo *bad, but
`schedule` definitely seems better.  From a new user's perspective, having
misleading names is confusing, and can be a negative when looking at
adoption.  And for a tenured user, it can nag at you.  And dealing with
deprecations as an Airflow user, well it's sort of part of the job, though
it can be bothersome, certainly.

Just to look at / compare feasibility.

Deprecating schedule_interval, it would be more prominent than some other
deprecations because it's a DAG param.  That said, I think it's pretty
straightforward to handle.  For one, it's at the cardinality of DAG and not
*task*, so there is less to change with than some other deprecations.  And
being kwargs-only, it's an easy find and replace that could be automated
with an `upgrade-check` script.  It's very easy to detect, either to warn
about, or ultimately, after upgrade, to fail the DAG parse (which forces
user to fix).  Perhaps we could even experiment with a deprecation fix
tool, so that users could get help fixing deprecations, not necessarily in
the context of an upgrade, since we're not looking at 3.0 any time soon.

We could compare with execution_date, if execution_date is to be removed in
3.0.  I think the deprecation of execution_date will be a more difficult
thing for users to handle.  For one, potentially higher cardinality, since
it is used in tasks.  But also, the use of execution_date can be harder to
detect, since it can be in templates and elsewhere in custom operators.  *(With
schedule_interval, since it's a top-level dag kwarg, it will be easier.)*
And execution_date was replaced with `logical_date` for similar reasons --
avoiding user confusion.




On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> -1 from me.
>
> Though I agree in principle with the idea of consolidation, I don't think
> we should be doing this yet until we understand the implications
> completely.
> I am really not in favor of deprecation of the existing params, unless
> there is really no alternative.
>
>
> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> So far, seems all in favor.
>>
>> I will just highlight, in case it's not clear.... when we release this
>> (presumably 2.4), basically every single dag in existence will start
>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>> existence will need to be updated.
>>
>> Thankfully, for most people, this should be an easy update since DAG is
>> kwargs only, IIRC.
>>
>>
>>
>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi <br...@astronomer.io.invalid>
>> wrote:
>>
>>> +1 to consolidating and "schedule".
>>>
>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl <dr...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> +1 for ‘schedule', and another +1 to the importance of consolidating to
>>>> one being more important than the name of that one
>>>>
>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>
>>>> Consolidating to a single scheduling parameter is a big +1 from me.
>>>> `schedule` seems like a nice catch-all.
>>>>
>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>> jedcunningham@apache.org> wrote:
>>>>
>>>>> +1 on moving to a single `schedule` param.
>>>>>
>>>>
>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Jarek Potiuk <ja...@potiuk.com>.
Yep. I think immediate deprecation is the way to go.

Those who care will change. For those who don't care - well, they don't
care so it doesn't matter.

On Sat, Aug 6, 2022 at 9:10 PM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> It's not addressed explicitly in the vote.  Personally I think we should
> add a deprecation warning immediately.  I understand that there is the
> thought to "give users time to make the change" before adding the warning.
> But deprecation warnings are probably the most effective, most relied upon
> mechanism by which we signal that change is required.  I don't really see a
> benefit to delaying.  And even though 3.0 is not anywhere in sight, some
> users will probably end up staying on 2.4 until 3.0 đŸ¤·.  Adding the warning
> immediately maximizes the time that users have to make the change, and
> maximizes the likelihood that users will receive the warning.  That said, I
> would certainly not veto if that's what the group wants to do.  And if it
> seems that folks are in favor of this, I can amend the vote thread to add
> that provision.
>
> On Sat, Aug 6, 2022 at 12:01 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> One question: are we deprecating the existing parameters straight away or
>> leaving it without a warning for one release? (I think someone talked about
>> that at one point)
>>
>> On 6 August 2022 19:46:02 BST, Daniel Standish
>> <da...@astronomer.io.INVALID> wrote:
>>>
>>> Well, thanks for forcing us to make the case :)
>>>
>>> I'll initiate a lazy consensus "vote"
>>>
>>>
>>> On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vi...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> Fair points all.
>>>> I withdraw my objection and agree with the consolidation proposal.
>>>>
>>>> Vikram
>>>>
>>>>
>>>> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> Let me also add to why I think we should consolidate to one param::
>>>>>
>>>>> * "schedule_interval"
>>>>> * "schedule_on"
>>>>> * "timetable"
>>>>>
>>>>> Imagine you are a new user coming to Airflow.
>>>>>
>>>>> Do you immediately recognise which one is which ? Do you see what the
>>>>> difference is?
>>>>>
>>>>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
>>>>> "dataset".
>>>>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or
>>>>> what?
>>>>>
>>>>> For me even if we wanted to have different names, the current set of
>>>>> names for those params is just terribly ambiguous.
>>>>>
>>>>> Making it "schedule" and acting depending on what you pass to it is so
>>>>> much better and simpler.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> @Vikram, yeah, I understand that concern.  Making a breaking change
>>>>>> is something we shouldn't do carelessly, without consideration, without
>>>>>> good reason.  But I think we have a good reason here.
>>>>>>
>>>>>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot
>>>>>> of imports.  We also changed the CLI semantics.  Probably there were other
>>>>>> breaking changes that I'm forgetting *(granted sometimes we didn't
>>>>>> remove the backward-compatibility immediately in 2.0, and that remains an
>>>>>> option here)*.  Probably just about every dag in existence had to be
>>>>>> changed then too.  And that's what a major release is for.
>>>>>>
>>>>>> In 3.0, it looks like we're deprecating execution_date...  That's
>>>>>> even bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be
>>>>>> the only notable breaking change and (2) by comparison it's a pretty simple
>>>>>> change.  And if, as part of the upgrade to a major release, you have to do
>>>>>> renames and such anyway, the marginal cost of one more isn't massive.
>>>>>>
>>>>>> I guess my take is that we can't be chained to our previous decisions
>>>>>> forever.  And we have to have the flexibility to adapt our interfaces so
>>>>>> that they still make sense and are consistent and don't cause excessive
>>>>>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>>>>>> mechanism for signaling when breaking changes are coming.  And we can add
>>>>>> tooling to help users along the way.
>>>>>>
>>>>>> Additionally, we can't *only *look at the burden of such a change
>>>>>> for users. We must also weigh this against the burden that we maintain by
>>>>>> *not* changing, i.e. by having too many params, or ill-named params.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <
>>>>>> jedcunningham@apache.org> wrote:
>>>>>>
>>>>>>> I'm not in favor of reusing `schedule_interval` for datasets (or
>>>>>>> folding `timetable` into it either), primarily because it doesn't feel like
>>>>>>> a good name to describe what it does. If we don't have the appetite to
>>>>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>>>>>> if not then #2 for me.
>>>>>>>
>>>>>>> I'm less concerned about the "everyone has to update every DAG"
>>>>>>> issue, as it only becomes a problem for Airflow 3, folks can freely ignore
>>>>>>> it until then, and when they do update it it's a simple find/replace.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>>>>>> a.r.gibbs.massmails@gmail.com> wrote:
>>>>>>>
>>>>>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>>>>>
>>>>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>>>>>> great, because I ignored it for a while, but eventually caved and looked
>>>>>>>> into it. This I would see as more of the same. I currently get loads of
>>>>>>>> warnings about things in my config file that have moved; but typically only
>>>>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>>>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>>>>>> and something else takes up the space in my brain. Having them as a banner
>>>>>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka
>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> I completely agree with wanting to consolidate, but I am very
>>>>>>>>> concerned about people needing to change their existing code.
>>>>>>>>> As Ping and Felix also said, I worry that people ignore
>>>>>>>>> deprecation warnings, especially since the original code authors don't
>>>>>>>>> really look at logs or other pipeline artifacts unless something is broken.
>>>>>>>>>
>>>>>>>>> With respect to the options you outlined, I am personally leaning
>>>>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>>>>>> definitely curious about other perspectives on this as well.
>>>>>>>>>
>>>>>>>>> I have a hard time with option (1) of deprecating "schedule
>>>>>>>>> interval", because of the historical nature of it and the code changes that
>>>>>>>>> people may have to make to deal with it.
>>>>>>>>>
>>>>>>>>> Thanks for raising this topic and the discussion,
>>>>>>>>>
>>>>>>>>> Vikram
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I agree with those (including myself) who have suggested that it
>>>>>>>>>> would be great to have a deprecation fix tool.
>>>>>>>>>>
>>>>>>>>>> I also think that UI notifications would be valuable.  Another
>>>>>>>>>> option would be to provide a less intrusive indicator somewhere (like when
>>>>>>>>>> chrome lets you know you need to update), and then you click on it and it
>>>>>>>>>> takes you to a page that lists out all the deprecation warnings.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I know we are distracting a bit from the main subject, but I
>>>>>>>>>>> think it's worth it.
>>>>>>>>>>>
>>>>>>>>>>> I think this is a fact of life that people don't remove
>>>>>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>>>>>
>>>>>>>>>>> I usually err on the side of "doing something" when I see a
>>>>>>>>>>> problem. I am a doer :).
>>>>>>>>>>>
>>>>>>>>>>> And I have a possibly controversial proposal. I got the
>>>>>>>>>>> inspiration from node.js old versions. Whenever an old version of Node is
>>>>>>>>>>> being used, you have a 20 seconds delay artificially added and information
>>>>>>>>>>> that in order to remove those 20 seconds you need to upgrade.
>>>>>>>>>>>
>>>>>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI if
>>>>>>>>>>> you have any deprecation unhandled. Something that is annoying enough that
>>>>>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>>>>>> things.
>>>>>>>>>>>
>>>>>>>>>>> I think this is a very basic psychological trait:
>>>>>>>>>>>
>>>>>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>>>>>> there will be a big number of people who will do that.
>>>>>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>>>>>>> - this is super important that we only show such a message with clear
>>>>>>>>>>> instructions on what to do).
>>>>>>>>>>>
>>>>>>>>>>> Such a message might be annoying in many ways:
>>>>>>>>>>>
>>>>>>>>>>> * it can obstruct significant part of the screen
>>>>>>>>>>> * It can be flashy and had very annoying colours (think bold
>>>>>>>>>>> pink fonts on flash-green background for example)
>>>>>>>>>>> * It can not allow to do anything before the user closes it with
>>>>>>>>>>> a REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>>>>>>> remove this warning by doing THIS".
>>>>>>>>>>> * and we have probably more options - our UX specialists might
>>>>>>>>>>> have some good ideas here
>>>>>>>>>>>
>>>>>>>>>>> For anyone who complains about "people do not act on
>>>>>>>>>>> deprecation." - yeah this is true.
>>>>>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>>>>>
>>>>>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>>>>>
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> J.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ping,
>>>>>>>>>>>>
>>>>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when
>>>>>>>>>>>> I started the airflint project.
>>>>>>>>>>>> I totally agree with what you said that often people ignore
>>>>>>>>>>>> these warnings and therefore a cli tool would come in very handy. It could
>>>>>>>>>>>> look similar to the airflint project and also be based on the python
>>>>>>>>>>>> refactor package, but I would keep it out of this discussion for now.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there
>>>>>>>>>>>> but also tend to having one schedule arg, because of
>>>>>>>>>>>> simplicity for the dag author.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Felix
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>>
>>>>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>>>>>
>>>>>>>>>>>> I agree with Vikram, we should be very careful about
>>>>>>>>>>>> deprecating existing params even if we have warnings around it. Not sure if
>>>>>>>>>>>> this is a general case, but I notice that people ignore warnings all the
>>>>>>>>>>>> time until it starts to break. Thus I am thinking if airflow can have a cli
>>>>>>>>>>>> tool to help users to batch update these deprecating params just like
>>>>>>>>>>>> python's futurize tool. This can greatly ease the burden of upgrading the
>>>>>>>>>>>> airflow version.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Ping
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe
>>>>>>>>>>>>>> you could provide a little more detail about your concerns?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are certainly alternatives, and of course each of them
>>>>>>>>>>>>>> has tradeoffs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here are the options I see:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>>>>>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>>>>>> but `schedule` definitely seems better. From a new user's perspective,
>>>>>>>>>>>>>> having misleading names is confusing, and can be a negative when looking at
>>>>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent
>>>>>>>>>>>>>> than some other deprecations because it's a DAG param. That said, I think
>>>>>>>>>>>>>> it's pretty straightforward to handle. For one, it's at the cardinality of
>>>>>>>>>>>>>> DAG and not *task*, so there is less to change with than
>>>>>>>>>>>>>> some other deprecations. And being kwargs-only, it's an easy find and
>>>>>>>>>>>>>> replace that could be automated with an `upgrade-check` script. It's very
>>>>>>>>>>>>>> easy to detect, either to warn about, or ultimately, after upgrade, to fail
>>>>>>>>>>>>>> the DAG parse (which forces user to fix). Perhaps we could even experiment
>>>>>>>>>>>>>> with a deprecation fix tool, so that users could get help fixing
>>>>>>>>>>>>>> deprecations, not necessarily in the context of an upgrade, since we're not
>>>>>>>>>>>>>> looking at 3.0 any time soon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We could compare with execution_date, if execution_date is to
>>>>>>>>>>>>>> be removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date
>>>>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding user
>>>>>>>>>>>>>> confusion.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -1 from me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Though I agree in principle with the idea of consolidation,
>>>>>>>>>>>>>>> I don't think we should be doing this yet until we understand the
>>>>>>>>>>>>>>> implications completely.
>>>>>>>>>>>>>>> I am really not in favor of deprecation of the existing
>>>>>>>>>>>>>>> params, unless there is really no alternative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big
>>>>>>>>>>>>>>>>>> +1 from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
It's not addressed explicitly in the vote.  Personally I think we should
add a deprecation warning immediately.  I understand that there is the
thought to "give users time to make the change" before adding the warning.
But deprecation warnings are probably the most effective, most relied upon
mechanism by which we signal that change is required.  I don't really see a
benefit to delaying.  And even though 3.0 is not anywhere in sight, some
users will probably end up staying on 2.4 until 3.0 đŸ¤·.  Adding the warning
immediately maximizes the time that users have to make the change, and
maximizes the likelihood that users will receive the warning.  That said, I
would certainly not veto if that's what the group wants to do.  And if it
seems that folks are in favor of this, I can amend the vote thread to add
that provision.

On Sat, Aug 6, 2022 at 12:01 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> One question: are we deprecating the existing parameters straight away or
> leaving it without a warning for one release? (I think someone talked about
> that at one point)
>
> On 6 August 2022 19:46:02 BST, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>>
>> Well, thanks for forcing us to make the case :)
>>
>> I'll initiate a lazy consensus "vote"
>>
>>
>> On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vi...@astronomer.io.invalid>
>> wrote:
>>
>>> Fair points all.
>>> I withdraw my objection and agree with the consolidation proposal.
>>>
>>> Vikram
>>>
>>>
>>> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Let me also add to why I think we should consolidate to one param::
>>>>
>>>> * "schedule_interval"
>>>> * "schedule_on"
>>>> * "timetable"
>>>>
>>>> Imagine you are a new user coming to Airflow.
>>>>
>>>> Do you immediately recognise which one is which ? Do you see what the
>>>> difference is?
>>>>
>>>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
>>>> "dataset".
>>>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>>>>
>>>> For me even if we wanted to have different names, the current set of
>>>> names for those params is just terribly ambiguous.
>>>>
>>>> Making it "schedule" and acting depending on what you pass to it is so
>>>> much better and simpler.
>>>>
>>>> J.
>>>>
>>>>
>>>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>>>>> something we shouldn't do carelessly, without consideration, without good
>>>>> reason.  But I think we have a good reason here.
>>>>>
>>>>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot
>>>>> of imports.  We also changed the CLI semantics.  Probably there were other
>>>>> breaking changes that I'm forgetting *(granted sometimes we didn't
>>>>> remove the backward-compatibility immediately in 2.0, and that remains an
>>>>> option here)*.  Probably just about every dag in existence had to be
>>>>> changed then too.  And that's what a major release is for.
>>>>>
>>>>> In 3.0, it looks like we're deprecating execution_date...  That's even
>>>>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>>>>> only notable breaking change and (2) by comparison it's a pretty simple
>>>>> change.  And if, as part of the upgrade to a major release, you have to do
>>>>> renames and such anyway, the marginal cost of one more isn't massive.
>>>>>
>>>>> I guess my take is that we can't be chained to our previous decisions
>>>>> forever.  And we have to have the flexibility to adapt our interfaces so
>>>>> that they still make sense and are consistent and don't cause excessive
>>>>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>>>>> mechanism for signaling when breaking changes are coming.  And we can add
>>>>> tooling to help users along the way.
>>>>>
>>>>> Additionally, we can't *only *look at the burden of such a change for
>>>>> users. We must also weigh this against the burden that we maintain by
>>>>> *not* changing, i.e. by having too many params, or ill-named params.
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <
>>>>> jedcunningham@apache.org> wrote:
>>>>>
>>>>>> I'm not in favor of reusing `schedule_interval` for datasets (or
>>>>>> folding `timetable` into it either), primarily because it doesn't feel like
>>>>>> a good name to describe what it does. If we don't have the appetite to
>>>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>>>>> if not then #2 for me.
>>>>>>
>>>>>> I'm less concerned about the "everyone has to update every DAG"
>>>>>> issue, as it only becomes a problem for Airflow 3, folks can freely ignore
>>>>>> it until then, and when they do update it it's a simple find/replace.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>>>>> a.r.gibbs.massmails@gmail.com> wrote:
>>>>>>
>>>>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>>>>
>>>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>>>>> great, because I ignored it for a while, but eventually caved and looked
>>>>>>> into it. This I would see as more of the same. I currently get loads of
>>>>>>> warnings about things in my config file that have moved; but typically only
>>>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>>>>> and something else takes up the space in my brain. Having them as a banner
>>>>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka
>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> I completely agree with wanting to consolidate, but I am very
>>>>>>>> concerned about people needing to change their existing code.
>>>>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>>>>> warnings, especially since the original code authors don't really look at
>>>>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>>>>
>>>>>>>> With respect to the options you outlined, I am personally leaning
>>>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>>>>> definitely curious about other perspectives on this as well.
>>>>>>>>
>>>>>>>> I have a hard time with option (1) of deprecating "schedule
>>>>>>>> interval", because of the historical nature of it and the code changes that
>>>>>>>> people may have to make to deal with it.
>>>>>>>>
>>>>>>>> Thanks for raising this topic and the discussion,
>>>>>>>>
>>>>>>>> Vikram
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> I agree with those (including myself) who have suggested that it
>>>>>>>>> would be great to have a deprecation fix tool.
>>>>>>>>>
>>>>>>>>> I also think that UI notifications would be valuable.  Another
>>>>>>>>> option would be to provide a less intrusive indicator somewhere (like when
>>>>>>>>> chrome lets you know you need to update), and then you click on it and it
>>>>>>>>> takes you to a page that lists out all the deprecation warnings.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I know we are distracting a bit from the main subject, but I
>>>>>>>>>> think it's worth it.
>>>>>>>>>>
>>>>>>>>>> I think this is a fact of life that people don't remove
>>>>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>>>>
>>>>>>>>>> I usually err on the side of "doing something" when I see a
>>>>>>>>>> problem. I am a doer :).
>>>>>>>>>>
>>>>>>>>>> And I have a possibly controversial proposal. I got the
>>>>>>>>>> inspiration from node.js old versions. Whenever an old version of Node is
>>>>>>>>>> being used, you have a 20 seconds delay artificially added and information
>>>>>>>>>> that in order to remove those 20 seconds you need to upgrade.
>>>>>>>>>>
>>>>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI if
>>>>>>>>>> you have any deprecation unhandled. Something that is annoying enough that
>>>>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>>>>> things.
>>>>>>>>>>
>>>>>>>>>> I think this is a very basic psychological trait:
>>>>>>>>>>
>>>>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>>>>> there will be a big number of people who will do that.
>>>>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>>>>>> - this is super important that we only show such a message with clear
>>>>>>>>>> instructions on what to do).
>>>>>>>>>>
>>>>>>>>>> Such a message might be annoying in many ways:
>>>>>>>>>>
>>>>>>>>>> * it can obstruct significant part of the screen
>>>>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>>>>> fonts on flash-green background for example)
>>>>>>>>>> * It can not allow to do anything before the user closes it with
>>>>>>>>>> a REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>>>>>> remove this warning by doing THIS".
>>>>>>>>>> * and we have probably more options - our UX specialists might
>>>>>>>>>> have some good ideas here
>>>>>>>>>>
>>>>>>>>>> For anyone who complains about "people do not act on
>>>>>>>>>> deprecation." - yeah this is true.
>>>>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>>>>
>>>>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> J.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ping,
>>>>>>>>>>>
>>>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when
>>>>>>>>>>> I started the airflint project.
>>>>>>>>>>> I totally agree with what you said that often people ignore
>>>>>>>>>>> these warnings and therefore a cli tool would come in very handy. It could
>>>>>>>>>>> look similar to the airflint project and also be based on the python
>>>>>>>>>>> refactor package, but I would keep it out of this discussion for now.
>>>>>>>>>>>
>>>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there
>>>>>>>>>>> but also tend to having one schedule arg, because of simplicity
>>>>>>>>>>> for the dag author.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Felix
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Daniel,
>>>>>>>>>>>
>>>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>>>>
>>>>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>>>>>>> version.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Ping
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe
>>>>>>>>>>>>> you could provide a little more detail about your concerns?
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are certainly alternatives, and of course each of them
>>>>>>>>>>>>> has tradeoffs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here are the options I see:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>>>>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>>>>
>>>>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>>>>> but `schedule` definitely seems better. From a new user's perspective,
>>>>>>>>>>>>> having misleading names is confusing, and can be a negative when looking at
>>>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>>>>> some other deprecations because it's a DAG param. That said, I think it's
>>>>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality of DAG
>>>>>>>>>>>>> and not *task*, so there is less to change with than some
>>>>>>>>>>>>> other deprecations. And being kwargs-only, it's an easy find and replace
>>>>>>>>>>>>> that could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>>>>>>> any time soon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We could compare with execution_date, if execution_date is to
>>>>>>>>>>>>> be removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date
>>>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding user
>>>>>>>>>>>>> confusion.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -1 from me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>>>>> implications completely.
>>>>>>>>>>>>>> I am really not in favor of deprecation of the existing
>>>>>>>>>>>>>> params, unless there is really no alternative.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Ash Berlin-Taylor <as...@apache.org>.
One question: are we deprecating the existing parameters straight away or leaving it without a warning for one release? (I think someone talked about that at one point)

On 6 August 2022 19:46:02 BST, Daniel Standish <da...@astronomer.io.INVALID> wrote:
>Well, thanks for forcing us to make the case :)
>
>I'll initiate a lazy consensus "vote"
>
>
>On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vi...@astronomer.io.invalid>
>wrote:
>
>> Fair points all.
>> I withdraw my objection and agree with the consolidation proposal.
>>
>> Vikram
>>
>>
>> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Let me also add to why I think we should consolidate to one param::
>>>
>>> * "schedule_interval"
>>> * "schedule_on"
>>> * "timetable"
>>>
>>> Imagine you are a new user coming to Airflow.
>>>
>>> Do you immediately recognise which one is which ? Do you see what the
>>> difference is?
>>>
>>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
>>> "dataset".
>>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>>>
>>> For me even if we wanted to have different names, the current set of
>>> names for those params is just terribly ambiguous.
>>>
>>> Making it "schedule" and acting depending on what you pass to it is so
>>> much better and simpler.
>>>
>>> J.
>>>
>>>
>>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>>>> something we shouldn't do carelessly, without consideration, without good
>>>> reason.  But I think we have a good reason here.
>>>>
>>>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
>>>> imports.  We also changed the CLI semantics.  Probably there were other
>>>> breaking changes that I'm forgetting *(granted sometimes we didn't
>>>> remove the backward-compatibility immediately in 2.0, and that remains an
>>>> option here)*.  Probably just about every dag in existence had to be
>>>> changed then too.  And that's what a major release is for.
>>>>
>>>> In 3.0, it looks like we're deprecating execution_date...  That's even
>>>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>>>> only notable breaking change and (2) by comparison it's a pretty simple
>>>> change.  And if, as part of the upgrade to a major release, you have to do
>>>> renames and such anyway, the marginal cost of one more isn't massive.
>>>>
>>>> I guess my take is that we can't be chained to our previous decisions
>>>> forever.  And we have to have the flexibility to adapt our interfaces so
>>>> that they still make sense and are consistent and don't cause excessive
>>>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>>>> mechanism for signaling when breaking changes are coming.  And we can add
>>>> tooling to help users along the way.
>>>>
>>>> Additionally, we can't *only *look at the burden of such a change for
>>>> users. We must also weigh this against the burden that we maintain by
>>>> *not* changing, i.e. by having too many params, or ill-named params.
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <je...@apache.org>
>>>> wrote:
>>>>
>>>>> I'm not in favor of reusing `schedule_interval` for datasets (or
>>>>> folding `timetable` into it either), primarily because it doesn't feel like
>>>>> a good name to describe what it does. If we don't have the appetite to
>>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>>>> if not then #2 for me.
>>>>>
>>>>> I'm less concerned about the "everyone has to update every DAG" issue,
>>>>> as it only becomes a problem for Airflow 3, folks can freely ignore it
>>>>> until then, and when they do update it it's a simple find/replace.
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>>>> a.r.gibbs.massmails@gmail.com> wrote:
>>>>>
>>>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>>>
>>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>>>> great, because I ignored it for a while, but eventually caved and looked
>>>>>> into it. This I would see as more of the same. I currently get loads of
>>>>>> warnings about things in my config file that have moved; but typically only
>>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>>>> and something else takes up the space in my brain. Having them as a banner
>>>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka
>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> I completely agree with wanting to consolidate, but I am very
>>>>>>> concerned about people needing to change their existing code.
>>>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>>>> warnings, especially since the original code authors don't really look at
>>>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>>>
>>>>>>> With respect to the options you outlined, I am personally leaning
>>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>>>> definitely curious about other perspectives on this as well.
>>>>>>>
>>>>>>> I have a hard time with option (1) of deprecating "schedule
>>>>>>> interval", because of the historical nature of it and the code changes that
>>>>>>> people may have to make to deal with it.
>>>>>>>
>>>>>>> Thanks for raising this topic and the discussion,
>>>>>>>
>>>>>>> Vikram
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> I agree with those (including myself) who have suggested that it
>>>>>>>> would be great to have a deprecation fix tool.
>>>>>>>>
>>>>>>>> I also think that UI notifications would be valuable.  Another
>>>>>>>> option would be to provide a less intrusive indicator somewhere (like when
>>>>>>>> chrome lets you know you need to update), and then you click on it and it
>>>>>>>> takes you to a page that lists out all the deprecation warnings.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>>>>> it's worth it.
>>>>>>>>>
>>>>>>>>> I think this is a fact of life that people don't remove
>>>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>>>
>>>>>>>>> I usually err on the side of "doing something" when I see a
>>>>>>>>> problem. I am a doer :).
>>>>>>>>>
>>>>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>>>>
>>>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI if
>>>>>>>>> you have any deprecation unhandled. Something that is annoying enough that
>>>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>>>> things.
>>>>>>>>>
>>>>>>>>> I think this is a very basic psychological trait:
>>>>>>>>>
>>>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>>>> there will be a big number of people who will do that.
>>>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>>>>> - this is super important that we only show such a message with clear
>>>>>>>>> instructions on what to do).
>>>>>>>>>
>>>>>>>>> Such a message might be annoying in many ways:
>>>>>>>>>
>>>>>>>>> * it can obstruct significant part of the screen
>>>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>>>> fonts on flash-green background for example)
>>>>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>>>>> remove this warning by doing THIS".
>>>>>>>>> * and we have probably more options - our UX specialists might have
>>>>>>>>> some good ideas here
>>>>>>>>>
>>>>>>>>> For anyone who complains about "people do not act on deprecation."
>>>>>>>>> - yeah this is true.
>>>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>>>
>>>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ping,
>>>>>>>>>>
>>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>>>>> started the airflint project.
>>>>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>>>>>>> similar to the airflint project and also be based on the python refactor
>>>>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>>>>
>>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there
>>>>>>>>>> but also tend to having one schedule arg, because of simplicity
>>>>>>>>>> for the dag author.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Felix
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>>>
>>>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>>>>>> version.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Ping
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>>>
>>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>>>>
>>>>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>>>>> tradeoffs.
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the options I see:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>>>
>>>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>>>
>>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>>>
>>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>>>> but `schedule` definitely seems better. From a new user's perspective,
>>>>>>>>>>>> having misleading names is confusing, and can be a negative when looking at
>>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>>>
>>>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>>>
>>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>>>> some other deprecations because it's a DAG param. That said, I think it's
>>>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality of DAG
>>>>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>>>>>> any time soon.
>>>>>>>>>>>>
>>>>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date
>>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding user
>>>>>>>>>>>> confusion.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> -1 from me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>>>> implications completely.
>>>>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
Well, thanks for forcing us to make the case :)

I'll initiate a lazy consensus "vote"


On Fri, Aug 5, 2022 at 1:01 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> Fair points all.
> I withdraw my objection and agree with the consolidation proposal.
>
> Vikram
>
>
> On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Let me also add to why I think we should consolidate to one param::
>>
>> * "schedule_interval"
>> * "schedule_on"
>> * "timetable"
>>
>> Imagine you are a new user coming to Airflow.
>>
>> Do you immediately recognise which one is which ? Do you see what the
>> difference is?
>>
>> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
>> "dataset".
>> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>>
>> For me even if we wanted to have different names, the current set of
>> names for those params is just terribly ambiguous.
>>
>> Making it "schedule" and acting depending on what you pass to it is so
>> much better and simpler.
>>
>> J.
>>
>>
>> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>>> something we shouldn't do carelessly, without consideration, without good
>>> reason.  But I think we have a good reason here.
>>>
>>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
>>> imports.  We also changed the CLI semantics.  Probably there were other
>>> breaking changes that I'm forgetting *(granted sometimes we didn't
>>> remove the backward-compatibility immediately in 2.0, and that remains an
>>> option here)*.  Probably just about every dag in existence had to be
>>> changed then too.  And that's what a major release is for.
>>>
>>> In 3.0, it looks like we're deprecating execution_date...  That's even
>>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>>> only notable breaking change and (2) by comparison it's a pretty simple
>>> change.  And if, as part of the upgrade to a major release, you have to do
>>> renames and such anyway, the marginal cost of one more isn't massive.
>>>
>>> I guess my take is that we can't be chained to our previous decisions
>>> forever.  And we have to have the flexibility to adapt our interfaces so
>>> that they still make sense and are consistent and don't cause excessive
>>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>>> mechanism for signaling when breaking changes are coming.  And we can add
>>> tooling to help users along the way.
>>>
>>> Additionally, we can't *only *look at the burden of such a change for
>>> users. We must also weigh this against the burden that we maintain by
>>> *not* changing, i.e. by having too many params, or ill-named params.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <je...@apache.org>
>>> wrote:
>>>
>>>> I'm not in favor of reusing `schedule_interval` for datasets (or
>>>> folding `timetable` into it either), primarily because it doesn't feel like
>>>> a good name to describe what it does. If we don't have the appetite to
>>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>>> if not then #2 for me.
>>>>
>>>> I'm less concerned about the "everyone has to update every DAG" issue,
>>>> as it only becomes a problem for Airflow 3, folks can freely ignore it
>>>> until then, and when they do update it it's a simple find/replace.
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>>> a.r.gibbs.massmails@gmail.com> wrote:
>>>>
>>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>>
>>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>>> great, because I ignored it for a while, but eventually caved and looked
>>>>> into it. This I would see as more of the same. I currently get loads of
>>>>> warnings about things in my config file that have moved; but typically only
>>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>>> and something else takes up the space in my brain. Having them as a banner
>>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>>
>>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka
>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> I completely agree with wanting to consolidate, but I am very
>>>>>> concerned about people needing to change their existing code.
>>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>>> warnings, especially since the original code authors don't really look at
>>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>>
>>>>>> With respect to the options you outlined, I am personally leaning
>>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>>> definitely curious about other perspectives on this as well.
>>>>>>
>>>>>> I have a hard time with option (1) of deprecating "schedule
>>>>>> interval", because of the historical nature of it and the code changes that
>>>>>> people may have to make to deal with it.
>>>>>>
>>>>>> Thanks for raising this topic and the discussion,
>>>>>>
>>>>>> Vikram
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> I agree with those (including myself) who have suggested that it
>>>>>>> would be great to have a deprecation fix tool.
>>>>>>>
>>>>>>> I also think that UI notifications would be valuable.  Another
>>>>>>> option would be to provide a less intrusive indicator somewhere (like when
>>>>>>> chrome lets you know you need to update), and then you click on it and it
>>>>>>> takes you to a page that lists out all the deprecation warnings.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>>>> it's worth it.
>>>>>>>>
>>>>>>>> I think this is a fact of life that people don't remove
>>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>>
>>>>>>>> I usually err on the side of "doing something" when I see a
>>>>>>>> problem. I am a doer :).
>>>>>>>>
>>>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>>>
>>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI if
>>>>>>>> you have any deprecation unhandled. Something that is annoying enough that
>>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>>> things.
>>>>>>>>
>>>>>>>> I think this is a very basic psychological trait:
>>>>>>>>
>>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>>> there will be a big number of people who will do that.
>>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>>>> - this is super important that we only show such a message with clear
>>>>>>>> instructions on what to do).
>>>>>>>>
>>>>>>>> Such a message might be annoying in many ways:
>>>>>>>>
>>>>>>>> * it can obstruct significant part of the screen
>>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>>> fonts on flash-green background for example)
>>>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>>>> remove this warning by doing THIS".
>>>>>>>> * and we have probably more options - our UX specialists might have
>>>>>>>> some good ideas here
>>>>>>>>
>>>>>>>> For anyone who complains about "people do not act on deprecation."
>>>>>>>> - yeah this is true.
>>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>>
>>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> J.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hi Ping,
>>>>>>>>>
>>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>>>> started the airflint project.
>>>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>>>>>> similar to the airflint project and also be based on the python refactor
>>>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>>>
>>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there
>>>>>>>>> but also tend to having one schedule arg, because of simplicity
>>>>>>>>> for the dag author.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Felix
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>>
>>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>>>>> version.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Ping
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>>
>>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>>>
>>>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>>>> tradeoffs.
>>>>>>>>>>>
>>>>>>>>>>> Here are the options I see:
>>>>>>>>>>>
>>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>>
>>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>>
>>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>>
>>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>>> but `schedule` definitely seems better. From a new user's perspective,
>>>>>>>>>>> having misleading names is confusing, and can be a negative when looking at
>>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>>
>>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>>
>>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>>> some other deprecations because it's a DAG param. That said, I think it's
>>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality of DAG
>>>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>>>>> any time soon.
>>>>>>>>>>>
>>>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date
>>>>>>>>>>> was replaced with `logical_date` for similar reasons -- avoiding user
>>>>>>>>>>> confusion.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> -1 from me.
>>>>>>>>>>>>
>>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>>> implications completely.
>>>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
Fair points all.
I withdraw my objection and agree with the consolidation proposal.

Vikram


On Fri, Aug 5, 2022 at 10:06 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> Let me also add to why I think we should consolidate to one param::
>
> * "schedule_interval"
> * "schedule_on"
> * "timetable"
>
> Imagine you are a new user coming to Airflow.
>
> Do you immediately recognise which one is which ? Do you see what the
> difference is?
>
> * "Schedule_on" = "Every Sunday" is equally good as "schedule_on"
> "dataset".
> * "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?
>
> For me even if we wanted to have different names, the current set of names
> for those params is just terribly ambiguous.
>
> Making it "schedule" and acting depending on what you pass to it is so
> much better and simpler.
>
> J.
>
>
> On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> @Vikram, yeah, I understand that concern.  Making a breaking change is
>> something we shouldn't do carelessly, without consideration, without good
>> reason.  But I think we have a good reason here.
>>
>> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
>> imports.  We also changed the CLI semantics.  Probably there were other
>> breaking changes that I'm forgetting *(granted sometimes we didn't
>> remove the backward-compatibility immediately in 2.0, and that remains an
>> option here)*.  Probably just about every dag in existence had to be
>> changed then too.  And that's what a major release is for.
>>
>> In 3.0, it looks like we're deprecating execution_date...  That's even
>> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
>> only notable breaking change and (2) by comparison it's a pretty simple
>> change.  And if, as part of the upgrade to a major release, you have to do
>> renames and such anyway, the marginal cost of one more isn't massive.
>>
>> I guess my take is that we can't be chained to our previous decisions
>> forever.  And we have to have the flexibility to adapt our interfaces so
>> that they still make sense and are consistent and don't cause excessive
>> eyebrow wrinkling.  We have adopted semantic versioning to provide a
>> mechanism for signaling when breaking changes are coming.  And we can add
>> tooling to help users along the way.
>>
>> Additionally, we can't *only *look at the burden of such a change for
>> users. We must also weigh this against the burden that we maintain by
>> *not* changing, i.e. by having too many params, or ill-named params.
>>
>>
>> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <je...@apache.org>
>> wrote:
>>
>>> I'm not in favor of reusing `schedule_interval` for datasets (or folding
>>> `timetable` into it either), primarily because it doesn't feel like a good
>>> name to describe what it does. If we don't have the appetite to
>>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>>> if not then #2 for me.
>>>
>>> I'm less concerned about the "everyone has to update every DAG" issue,
>>> as it only becomes a problem for Airflow 3, folks can freely ignore it
>>> until then, and when they do update it it's a simple find/replace.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>>> a.r.gibbs.massmails@gmail.com> wrote:
>>>
>>>> I'll just +1 that IMO a UI reminder is a great option.
>>>>
>>>> 2.3.0 added an alert when robots.txt was being scanned, which was
>>>> great, because I ignored it for a while, but eventually caved and looked
>>>> into it. This I would see as more of the same. I currently get loads of
>>>> warnings about things in my config file that have moved; but typically only
>>>> if I have to do a `airflow db upgrade` or `airflow dags backfill`. I see
>>>> the alert, think "I must fix those", and then the Homer effect kicks in,
>>>> and something else takes up the space in my brain. Having them as a banner
>>>> in the GUI would 100% get me to sort it sooner or later.
>>>>
>>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vi...@astronomer.io.invalid>
>>>> wrote:
>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> I completely agree with wanting to consolidate, but I am very
>>>>> concerned about people needing to change their existing code.
>>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>>> warnings, especially since the original code authors don't really look at
>>>>> logs or other pipeline artifacts unless something is broken.
>>>>>
>>>>> With respect to the options you outlined, I am personally leaning
>>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>>> recent and have not seen as much usage for this invocation yet. I am
>>>>> definitely curious about other perspectives on this as well.
>>>>>
>>>>> I have a hard time with option (1) of deprecating "schedule interval",
>>>>> because of the historical nature of it and the code changes that people may
>>>>> have to make to deal with it.
>>>>>
>>>>> Thanks for raising this topic and the discussion,
>>>>>
>>>>> Vikram
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> I agree with those (including myself) who have suggested that it
>>>>>> would be great to have a deprecation fix tool.
>>>>>>
>>>>>> I also think that UI notifications would be valuable.  Another option
>>>>>> would be to provide a less intrusive indicator somewhere (like when chrome
>>>>>> lets you know you need to update), and then you click on it and it takes
>>>>>> you to a page that lists out all the deprecation warnings.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>>
>>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>>> it's worth it.
>>>>>>>
>>>>>>> I think this is a fact of life that people don't remove
>>>>>>> deprecations. But we have a choice of "not doing anything about it" and
>>>>>>> complaining. Or doing something that might improve it.
>>>>>>>
>>>>>>> I usually err on the side of "doing something" when I see a problem.
>>>>>>> I am a doer :).
>>>>>>>
>>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>>
>>>>>>> Why don't we do that for some really old/serious deprecations ?
>>>>>>> Maybe not 20 seconds. Maybe adding an unremovable RED warning in the UI if
>>>>>>> you have any deprecation unhandled. Something that is annoying enough that
>>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>>> things.
>>>>>>>
>>>>>>> I think this is a very basic psychological trait:
>>>>>>>
>>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>>> there will be a big number of people who will do that.
>>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>>> - this is super important that we only show such a message with clear
>>>>>>> instructions on what to do).
>>>>>>>
>>>>>>> Such a message might be annoying in many ways:
>>>>>>>
>>>>>>> * it can obstruct significant part of the screen
>>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>>> fonts on flash-green background for example)
>>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>>> remove this warning by doing THIS".
>>>>>>> * and we have probably more options - our UX specialists might have
>>>>>>> some good ideas here
>>>>>>>
>>>>>>> For anyone who complains about "people do not act on deprecation." -
>>>>>>> yeah this is true.
>>>>>>> But one universal truth there is that if we won't act, this will
>>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>>
>>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Ping,
>>>>>>>>
>>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>>> started the airflint project.
>>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>>>>> similar to the airflint project and also be based on the python refactor
>>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>>
>>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>>>>>> also tend to having one schedule arg, because of simplicity for
>>>>>>>> the dag author.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Felix
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sent from Proton Mail for iOS
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Daniel,
>>>>>>>>
>>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>>
>>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>>>> version.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ping
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>>
>>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>>
>>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>>
>>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>>
>>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>>> tradeoffs.
>>>>>>>>>>
>>>>>>>>>> Here are the options I see:
>>>>>>>>>>
>>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>>
>>>>>>>>>> Which would you propose we go with?
>>>>>>>>>>
>>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>>> definitely be to go with 3.
>>>>>>>>>>
>>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad,
>>>>>>>>>> but `schedule` definitely seems better. From a new user's perspective,
>>>>>>>>>> having misleading names is confusing, and can be a negative when looking at
>>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>>
>>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>>
>>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>>> some other deprecations because it's a DAG param. That said, I think it's
>>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality of DAG
>>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>>>> any time soon.
>>>>>>>>>>
>>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date was
>>>>>>>>>> replaced with `logical_date` for similar reasons -- avoiding user confusion.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> -1 from me.
>>>>>>>>>>>
>>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>>> implications completely.
>>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>>
>>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thankfully, for most people, this should be an easy update
>>>>>>>>>>>> since DAG is kwargs only, IIRC.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Jarek Potiuk <ja...@potiuk.com>.
Let me also add to why I think we should consolidate to one param::

* "schedule_interval"
* "schedule_on"
* "timetable"

Imagine you are a new user coming to Airflow.

Do you immediately recognise which one is which ? Do you see what the
difference is?

* "Schedule_on" = "Every Sunday" is equally good as "schedule_on" "dataset".
* "timetable" - what even is a timetable? Is it a "cron TAB" ? Or what?

For me even if we wanted to have different names, the current set of names
for those params is just terribly ambiguous.

Making it "schedule" and acting depending on what you pass to it is so much
better and simpler.

J.


On Fri, Aug 5, 2022 at 6:57 PM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> @Vikram, yeah, I understand that concern.  Making a breaking change is
> something we shouldn't do carelessly, without consideration, without good
> reason.  But I think we have a good reason here.
>
> Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
> imports.  We also changed the CLI semantics.  Probably there were other
> breaking changes that I'm forgetting *(granted sometimes we didn't remove
> the backward-compatibility immediately in 2.0, and that remains an option
> here)*.  Probably just about every dag in existence had to be changed
> then too.  And that's what a major release is for.
>
> In 3.0, it looks like we're deprecating execution_date...  That's even
> bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
> only notable breaking change and (2) by comparison it's a pretty simple
> change.  And if, as part of the upgrade to a major release, you have to do
> renames and such anyway, the marginal cost of one more isn't massive.
>
> I guess my take is that we can't be chained to our previous decisions
> forever.  And we have to have the flexibility to adapt our interfaces so
> that they still make sense and are consistent and don't cause excessive
> eyebrow wrinkling.  We have adopted semantic versioning to provide a
> mechanism for signaling when breaking changes are coming.  And we can add
> tooling to help users along the way.
>
> Additionally, we can't *only *look at the burden of such a change for
> users. We must also weigh this against the burden that we maintain by
> *not* changing, i.e. by having too many params, or ill-named params.
>
>
> On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <je...@apache.org>
> wrote:
>
>> I'm not in favor of reusing `schedule_interval` for datasets (or folding
>> `timetable` into it either), primarily because it doesn't feel like a good
>> name to describe what it does. If we don't have the appetite to
>> consolidate, my vote is we stick with 3 separate params. So preferably #1,
>> if not then #2 for me.
>>
>> I'm less concerned about the "everyone has to update every DAG" issue, as
>> it only becomes a problem for Airflow 3, folks can freely ignore it until
>> then, and when they do update it it's a simple find/replace.
>>
>>
>> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <
>> a.r.gibbs.massmails@gmail.com> wrote:
>>
>>> I'll just +1 that IMO a UI reminder is a great option.
>>>
>>> 2.3.0 added an alert when robots.txt was being scanned, which was great,
>>> because I ignored it for a while, but eventually caved and looked into it.
>>> This I would see as more of the same. I currently get loads of warnings
>>> about things in my config file that have moved; but typically only if I
>>> have to do a `airflow db upgrade` or `airflow dags backfill`. I see the
>>> alert, think "I must fix those", and then the Homer effect kicks in, and
>>> something else takes up the space in my brain. Having them as a banner in
>>> the GUI would 100% get me to sort it sooner or later.
>>>
>>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vi...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I completely agree with wanting to consolidate, but I am very concerned
>>>> about people needing to change their existing code.
>>>> As Ping and Felix also said, I worry that people ignore deprecation
>>>> warnings, especially since the original code authors don't really look at
>>>> logs or other pipeline artifacts unless something is broken.
>>>>
>>>> With respect to the options you outlined, I am personally leaning
>>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>>> could be convinced with (3), since I think that "timetable" is relatively
>>>> recent and have not seen as much usage for this invocation yet. I am
>>>> definitely curious about other perspectives on this as well.
>>>>
>>>> I have a hard time with option (1) of deprecating "schedule interval",
>>>> because of the historical nature of it and the code changes that people may
>>>> have to make to deal with it.
>>>>
>>>> Thanks for raising this topic and the discussion,
>>>>
>>>> Vikram
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> I agree with those (including myself) who have suggested that it would
>>>>> be great to have a deprecation fix tool.
>>>>>
>>>>> I also think that UI notifications would be valuable.  Another option
>>>>> would be to provide a less intrusive indicator somewhere (like when chrome
>>>>> lets you know you need to update), and then you click on it and it takes
>>>>> you to a page that lists out all the deprecation warnings.
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>>> I know we are distracting a bit from the main subject, but I think
>>>>>> it's worth it.
>>>>>>
>>>>>> I think this is a fact of life that people don't remove deprecations.
>>>>>> But we have a choice of "not doing anything about it" and complaining. Or
>>>>>> doing something that might improve it.
>>>>>>
>>>>>> I usually err on the side of "doing something" when I see a problem.
>>>>>> I am a doer :).
>>>>>>
>>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>>>> you have a 20 seconds delay artificially added and information that in
>>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>>
>>>>>> Why don't we do that for some really old/serious deprecations ? Maybe
>>>>>> not 20 seconds. Maybe adding an unremovable RED warning in the UI if you
>>>>>> have any deprecation unhandled. Something that is annoying enough that
>>>>>> will cause people to make an action but not intrusive enough to break
>>>>>> things.
>>>>>>
>>>>>> I think this is a very basic psychological trait:
>>>>>>
>>>>>> If it does not cost the user anything to leave the deprecations,
>>>>>> there will be a big number of people who will do that.
>>>>>> We need to make sure that deprecations are costing our users "a
>>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>>> - this is super important that we only show such a message with clear
>>>>>> instructions on what to do).
>>>>>>
>>>>>> Such a message might be annoying in many ways:
>>>>>>
>>>>>> * it can obstruct significant part of the screen
>>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>>> fonts on flash-green background for example)
>>>>>> * It can not allow to do anything before the user closes it with a
>>>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>>> And when they click they can be greeted with "You know - you can actually
>>>>>> remove this warning by doing THIS".
>>>>>> * and we have probably more options - our UX specialists might have
>>>>>> some good ideas here
>>>>>>
>>>>>> For anyone who complains about "people do not act on deprecation." -
>>>>>> yeah this is true.
>>>>>> But one universal truth there is that if we won't act, this will
>>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>>
>>>>>> If we do something - it might improve it. We will not get a 100%
>>>>>> success for sure, but we can at least do something to improve it.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall
>>>>>> <fe...@pm.me.invalid> wrote:
>>>>>>
>>>>>>> Hi Ping,
>>>>>>>
>>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>>> started the airflint project.
>>>>>>> I totally agree with what you said that often people ignore these
>>>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>>>> similar to the airflint project and also be based on the python refactor
>>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>>
>>>>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>>>>> also tend to having one schedule arg, because of simplicity for the
>>>>>>> dag author.
>>>>>>>
>>>>>>> Best,
>>>>>>> Felix
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sent from Proton Mail for iOS
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>>
>>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>>> help users to batch update these deprecating params just like python's
>>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>>> version.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Ping
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>>
>>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>>
>>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>>
>>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>>
>>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>>> tradeoffs.
>>>>>>>>>
>>>>>>>>> Here are the options I see:
>>>>>>>>>
>>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>>
>>>>>>>>> Which would you propose we go with?
>>>>>>>>>
>>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>>> definitely be to go with 3.
>>>>>>>>>
>>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>>>>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>>>>>>> misleading names is confusing, and can be a negative when looking at
>>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>>> it can be bothersome, certainly.
>>>>>>>>>
>>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>>
>>>>>>>>> Deprecating schedule_interval, it would be more prominent than
>>>>>>>>> some other deprecations because it's a DAG param. That said, I think it's
>>>>>>>>> pretty straightforward to handle. For one, it's at the cardinality of DAG
>>>>>>>>> and not *task*, so there is less to change with than some other
>>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>>> any time soon.
>>>>>>>>>
>>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>>> custom operators. *(With schedule_interval, since it's a
>>>>>>>>> top-level dag kwarg, it will be easier.)* And execution_date was
>>>>>>>>> replaced with `logical_date` for similar reasons -- avoiding user confusion.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> -1 from me.
>>>>>>>>>>
>>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>>> implications completely.
>>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>>> unless there is really no alternative.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>>
>>>>>>>>>>> I will just highlight, in case it's not clear.... when we
>>>>>>>>>>> release this (presumably 2.4), basically every single dag in existence will
>>>>>>>>>>> start emitting deprecation warnings, and prior to 3.0, basically every dag
>>>>>>>>>>> in existence will need to be updated.
>>>>>>>>>>>
>>>>>>>>>>> Thankfully, for most people, this should be an easy update since
>>>>>>>>>>> DAG is kwargs only, IIRC.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1
>>>>>>>>>>>>> from me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
@Vikram, yeah, I understand that concern.  Making a breaking change is
something we shouldn't do carelessly, without consideration, without good
reason.  But I think we have a good reason here.

Deprecations are somewhat regular in airflow.  In 2.0 we moved a lot of
imports.  We also changed the CLI semantics.  Probably there were other
breaking changes that I'm forgetting *(granted sometimes we didn't remove
the backward-compatibility immediately in 2.0, and that remains an option
here)*.  Probably just about every dag in existence had to be changed then
too.  And that's what a major release is for.

In 3.0, it looks like we're deprecating execution_date...  That's even
bigger.  So, were we to do this, (1) moving to `schedule` wouldn't be the
only notable breaking change and (2) by comparison it's a pretty simple
change.  And if, as part of the upgrade to a major release, you have to do
renames and such anyway, the marginal cost of one more isn't massive.

I guess my take is that we can't be chained to our previous decisions
forever.  And we have to have the flexibility to adapt our interfaces so
that they still make sense and are consistent and don't cause excessive
eyebrow wrinkling.  We have adopted semantic versioning to provide a
mechanism for signaling when breaking changes are coming.  And we can add
tooling to help users along the way.

Additionally, we can't *only *look at the burden of such a change for
users. We must also weigh this against the burden that we maintain by
*not* changing, i.e.
by having too many params, or ill-named params.


On Thu, Aug 4, 2022 at 2:57 PM Jed Cunningham <je...@apache.org>
wrote:

> I'm not in favor of reusing `schedule_interval` for datasets (or folding
> `timetable` into it either), primarily because it doesn't feel like a good
> name to describe what it does. If we don't have the appetite to
> consolidate, my vote is we stick with 3 separate params. So preferably #1,
> if not then #2 for me.
>
> I'm less concerned about the "everyone has to update every DAG" issue, as
> it only becomes a problem for Airflow 3, folks can freely ignore it until
> then, and when they do update it it's a simple find/replace.
>
>
> On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <a....@gmail.com>
> wrote:
>
>> I'll just +1 that IMO a UI reminder is a great option.
>>
>> 2.3.0 added an alert when robots.txt was being scanned, which was great,
>> because I ignored it for a while, but eventually caved and looked into it.
>> This I would see as more of the same. I currently get loads of warnings
>> about things in my config file that have moved; but typically only if I
>> have to do a `airflow db upgrade` or `airflow dags backfill`. I see the
>> alert, think "I must fix those", and then the Homer effect kicks in, and
>> something else takes up the space in my brain. Having them as a banner in
>> the GUI would 100% get me to sort it sooner or later.
>>
>> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vi...@astronomer.io.invalid>
>> wrote:
>>
>>> Hi Daniel,
>>>
>>> I completely agree with wanting to consolidate, but I am very concerned
>>> about people needing to change their existing code.
>>> As Ping and Felix also said, I worry that people ignore deprecation
>>> warnings, especially since the original code authors don't really look at
>>> logs or other pipeline artifacts unless something is broken.
>>>
>>> With respect to the options you outlined, I am personally leaning
>>> towards (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48
>>> has added it in main) and take List[Dataset] in `schedule_interval`". I
>>> could be convinced with (3), since I think that "timetable" is relatively
>>> recent and have not seen as much usage for this invocation yet. I am
>>> definitely curious about other perspectives on this as well.
>>>
>>> I have a hard time with option (1) of deprecating "schedule interval",
>>> because of the historical nature of it and the code changes that people may
>>> have to make to deal with it.
>>>
>>> Thanks for raising this topic and the discussion,
>>>
>>> Vikram
>>>
>>>
>>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>>> I agree with those (including myself) who have suggested that it would
>>>> be great to have a deprecation fix tool.
>>>>
>>>> I also think that UI notifications would be valuable.  Another option
>>>> would be to provide a less intrusive indicator somewhere (like when chrome
>>>> lets you know you need to update), and then you click on it and it takes
>>>> you to a page that lists out all the deprecation warnings.
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>
>>>>> I know we are distracting a bit from the main subject, but I think
>>>>> it's worth it.
>>>>>
>>>>> I think this is a fact of life that people don't remove deprecations.
>>>>> But we have a choice of "not doing anything about it" and complaining. Or
>>>>> doing something that might improve it.
>>>>>
>>>>> I usually err on the side of "doing something" when I see a problem. I
>>>>> am a doer :).
>>>>>
>>>>> And I have a possibly controversial proposal. I got the inspiration
>>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>>> you have a 20 seconds delay artificially added and information that in
>>>>> order to remove those 20 seconds you need to upgrade.
>>>>>
>>>>> Why don't we do that for some really old/serious deprecations ? Maybe
>>>>> not 20 seconds. Maybe adding an unremovable RED warning in the UI if you
>>>>> have any deprecation unhandled. Something that is annoying enough that
>>>>> will cause people to make an action but not intrusive enough to break
>>>>> things.
>>>>>
>>>>> I think this is a very basic psychological trait:
>>>>>
>>>>> If it does not cost the user anything to leave the deprecations, there
>>>>> will be a big number of people who will do that.
>>>>> We need to make sure that deprecations are costing our users "a
>>>>> little" and that they are aware that they can remove the cost (and know how
>>>>> - this is super important that we only show such a message with clear
>>>>> instructions on what to do).
>>>>>
>>>>> Such a message might be annoying in many ways:
>>>>>
>>>>> * it can obstruct significant part of the screen
>>>>> * It can be flashy and had very annoying colours (think bold pink
>>>>> fonts on flash-green background for example)
>>>>> * It can not allow to do anything before the user closes it with a
>>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>>> And when they click they can be greeted with "You know - you can actually
>>>>> remove this warning by doing THIS".
>>>>> * and we have probably more options - our UX specialists might have
>>>>> some good ideas here
>>>>>
>>>>> For anyone who complains about "people do not act on deprecation." -
>>>>> yeah this is true.
>>>>> But one universal truth there is that if we won't act, this will
>>>>> change nothing in the behaviour of our users - this is 100% certain.
>>>>>
>>>>> If we do something - it might improve it. We will not get a 100%
>>>>> success for sure, but we can at least do something to improve it.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi Ping,
>>>>>>
>>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>>> started the airflint project.
>>>>>> I totally agree with what you said that often people ignore these
>>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>>> similar to the airflint project and also be based on the python refactor
>>>>>> package, but I would keep it out of this discussion for now.
>>>>>>
>>>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>>>> also tend to having one schedule arg, because of simplicity for the
>>>>>> dag author.
>>>>>>
>>>>>> Best,
>>>>>> Felix
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sent from Proton Mail for iOS
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>>
>>>>>> Hi Daniel,
>>>>>>
>>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>>
>>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>>> general case, but I notice that people ignore warnings all the time until
>>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>>> help users to batch update these deprecating params just like python's
>>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>>> version.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Ping
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>>
>>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>>
>>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>>
>>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>>> could provide a little more detail about your concerns?
>>>>>>>>
>>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>>> tradeoffs.
>>>>>>>>
>>>>>>>> Here are the options I see:
>>>>>>>>
>>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>>
>>>>>>>> Which would you propose we go with?
>>>>>>>>
>>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>>> definitely be to go with 3.
>>>>>>>>
>>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>>>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>>>>>> misleading names is confusing, and can be a negative when looking at
>>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>>> it can be bothersome, certainly.
>>>>>>>>
>>>>>>>> Just to look at / compare feasibility.
>>>>>>>>
>>>>>>>> Deprecating schedule_interval, it would be more prominent than some
>>>>>>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>>>>>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>>>>>>> *task*, so there is less to change with than some other
>>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>>> any time soon.
>>>>>>>>
>>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>>> custom operators. *(With schedule_interval, since it's a top-level
>>>>>>>> dag kwarg, it will be easier.)* And execution_date was replaced
>>>>>>>> with `logical_date` for similar reasons -- avoiding user confusion.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> -1 from me.
>>>>>>>>>
>>>>>>>>> Though I agree in principle with the idea of consolidation, I
>>>>>>>>> don't think we should be doing this yet until we understand the
>>>>>>>>> implications completely.
>>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>>> unless there is really no alternative.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> So far, seems all in favor.
>>>>>>>>>>
>>>>>>>>>> I will just highlight, in case it's not clear.... when we release
>>>>>>>>>> this (presumably 2.4), basically every single dag in existence will start
>>>>>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>>>>>>> existence will need to be updated.
>>>>>>>>>>
>>>>>>>>>> Thankfully, for most people, this should be an easy update since
>>>>>>>>>> DAG is kwargs only, IIRC.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from
>>>>>>>>>>>> me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Jed Cunningham <je...@apache.org>.
I'm not in favor of reusing `schedule_interval` for datasets (or folding
`timetable` into it either), primarily because it doesn't feel like a good
name to describe what it does. If we don't have the appetite to
consolidate, my vote is we stick with 3 separate params. So preferably #1,
if not then #2 for me.

I'm less concerned about the "everyone has to update every DAG" issue, as
it only becomes a problem for Airflow 3, folks can freely ignore it until
then, and when they do update it it's a simple find/replace.


On Thu, Aug 4, 2022 at 2:17 PM Andrew Gibbs <a....@gmail.com>
wrote:

> I'll just +1 that IMO a UI reminder is a great option.
>
> 2.3.0 added an alert when robots.txt was being scanned, which was great,
> because I ignored it for a while, but eventually caved and looked into it.
> This I would see as more of the same. I currently get loads of warnings
> about things in my config file that have moved; but typically only if I
> have to do a `airflow db upgrade` or `airflow dags backfill`. I see the
> alert, think "I must fix those", and then the Homer effect kicks in, and
> something else takes up the space in my brain. Having them as a banner in
> the GUI would 100% get me to sort it sooner or later.
>
> On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vi...@astronomer.io.invalid>
> wrote:
>
>> Hi Daniel,
>>
>> I completely agree with wanting to consolidate, but I am very concerned
>> about people needing to change their existing code.
>> As Ping and Felix also said, I worry that people ignore deprecation
>> warnings, especially since the original code authors don't really look at
>> logs or other pipeline artifacts unless something is broken.
>>
>> With respect to the options you outlined, I am personally leaning towards
>> (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48 has
>> added it in main) and take List[Dataset] in `schedule_interval`". I could
>> be convinced with (3), since I think that "timetable" is relatively recent
>> and have not seen as much usage for this invocation yet. I am definitely
>> curious about other perspectives on this as well.
>>
>> I have a hard time with option (1) of deprecating "schedule interval",
>> because of the historical nature of it and the code changes that people may
>> have to make to deal with it.
>>
>> Thanks for raising this topic and the discussion,
>>
>> Vikram
>>
>>
>> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> I agree with those (including myself) who have suggested that it would
>>> be great to have a deprecation fix tool.
>>>
>>> I also think that UI notifications would be valuable.  Another option
>>> would be to provide a less intrusive indicator somewhere (like when chrome
>>> lets you know you need to update), and then you click on it and it takes
>>> you to a page that lists out all the deprecation warnings.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> I know we are distracting a bit from the main subject, but I think it's
>>>> worth it.
>>>>
>>>> I think this is a fact of life that people don't remove deprecations.
>>>> But we have a choice of "not doing anything about it" and complaining. Or
>>>> doing something that might improve it.
>>>>
>>>> I usually err on the side of "doing something" when I see a problem. I
>>>> am a doer :).
>>>>
>>>> And I have a possibly controversial proposal. I got the inspiration
>>>> from node.js old versions. Whenever an old version of Node is being used,
>>>> you have a 20 seconds delay artificially added and information that in
>>>> order to remove those 20 seconds you need to upgrade.
>>>>
>>>> Why don't we do that for some really old/serious deprecations ? Maybe
>>>> not 20 seconds. Maybe adding an unremovable RED warning in the UI if you
>>>> have any deprecation unhandled. Something that is annoying enough that
>>>> will cause people to make an action but not intrusive enough to break
>>>> things.
>>>>
>>>> I think this is a very basic psychological trait:
>>>>
>>>> If it does not cost the user anything to leave the deprecations, there
>>>> will be a big number of people who will do that.
>>>> We need to make sure that deprecations are costing our users "a little"
>>>> and that they are aware that they can remove the cost (and know how - this
>>>> is super important that we only show such a message with clear instructions
>>>> on what to do).
>>>>
>>>> Such a message might be annoying in many ways:
>>>>
>>>> * it can obstruct significant part of the screen
>>>> * It can be flashy and had very annoying colours (think bold pink fonts
>>>> on flash-green background for example)
>>>> * It can not allow to do anything before the user closes it with a
>>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>>> And when they click they can be greeted with "You know - you can actually
>>>> remove this warning by doing THIS".
>>>> * and we have probably more options - our UX specialists might have
>>>> some good ideas here
>>>>
>>>> For anyone who complains about "people do not act on deprecation." -
>>>> yeah this is true.
>>>> But one universal truth there is that if we won't act, this will change
>>>> nothing in the behaviour of our users - this is 100% certain.
>>>>
>>>> If we do something - it might improve it. We will not get a 100%
>>>> success for sure, but we can at least do something to improve it.
>>>>
>>>> WDYT?
>>>>
>>>> J.
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
>>>> wrote:
>>>>
>>>>> Hi Ping,
>>>>>
>>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>>> started the airflint project.
>>>>> I totally agree with what you said that often people ignore these
>>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>>> similar to the airflint project and also be based on the python refactor
>>>>> package, but I would keep it out of this discussion for now.
>>>>>
>>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>>> also tend to having one schedule arg, because of simplicity for the
>>>>> dag author.
>>>>>
>>>>> Best,
>>>>> Felix
>>>>>
>>>>>
>>>>>
>>>>> Sent from Proton Mail for iOS
>>>>>
>>>>>
>>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>>
>>>>> I agree with Vikram, we should be very careful about deprecating
>>>>> existing params even if we have warnings around it. Not sure if this is a
>>>>> general case, but I notice that people ignore warnings all the time until
>>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>>> help users to batch update these deprecating params just like python's
>>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>>> version.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ping
>>>>>
>>>>>
>>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>>
>>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>>> daniel.standish@astronomer.io> wrote:
>>>>>>
>>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>>
>>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>>> could provide a little more detail about your concerns?
>>>>>>>
>>>>>>> There are certainly alternatives, and of course each of them has
>>>>>>> tradeoffs.
>>>>>>>
>>>>>>> Here are the options I see:
>>>>>>>
>>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>>> timetable; the proposal under discussion in this thread)
>>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>>
>>>>>>> Which would you propose we go with?
>>>>>>>
>>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>>> territory since this is still a discussion thread), my vote would
>>>>>>> definitely be to go with 3.
>>>>>>>
>>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>>>>> misleading names is confusing, and can be a negative when looking at
>>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>>> it can be bothersome, certainly.
>>>>>>>
>>>>>>> Just to look at / compare feasibility.
>>>>>>>
>>>>>>> Deprecating schedule_interval, it would be more prominent than some
>>>>>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>>>>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>>>>>> *task*, so there is less to change with than some other
>>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>>> any time soon.
>>>>>>>
>>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>>> custom operators. *(With schedule_interval, since it's a top-level
>>>>>>> dag kwarg, it will be easier.)* And execution_date was replaced
>>>>>>> with `logical_date` for similar reasons -- avoiding user confusion.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> -1 from me.
>>>>>>>>
>>>>>>>> Though I agree in principle with the idea of consolidation, I don't
>>>>>>>> think we should be doing this yet until we understand the implications
>>>>>>>> completely.
>>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>>> unless there is really no alternative.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> So far, seems all in favor.
>>>>>>>>>
>>>>>>>>> I will just highlight, in case it's not clear.... when we release
>>>>>>>>> this (presumably 2.4), basically every single dag in existence will start
>>>>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>>>>>> existence will need to be updated.
>>>>>>>>>
>>>>>>>>> Thankfully, for most people, this should be an easy update since
>>>>>>>>> DAG is kwargs only, IIRC.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>>
>>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from
>>>>>>>>>>> me. `schedule` seems like a nice catch-all.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Andrew Gibbs <a....@gmail.com>.
I'll just +1 that IMO a UI reminder is a great option.

2.3.0 added an alert when robots.txt was being scanned, which was great,
because I ignored it for a while, but eventually caved and looked into it.
This I would see as more of the same. I currently get loads of warnings
about things in my config file that have moved; but typically only if I
have to do a `airflow db upgrade` or `airflow dags backfill`. I see the
alert, think "I must fix those", and then the Homer effect kicks in, and
something else takes up the space in my brain. Having them as a banner in
the GUI would 100% get me to sort it sooner or later.

On Thu, Aug 4, 2022 at 9:32 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> Hi Daniel,
>
> I completely agree with wanting to consolidate, but I am very concerned
> about people needing to change their existing code.
> As Ping and Felix also said, I worry that people ignore deprecation
> warnings, especially since the original code authors don't really look at
> logs or other pipeline artifacts unless something is broken.
>
> With respect to the options you outlined, I am personally leaning towards
> (4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48 has
> added it in main) and take List[Dataset] in `schedule_interval`". I could
> be convinced with (3), since I think that "timetable" is relatively recent
> and have not seen as much usage for this invocation yet. I am definitely
> curious about other perspectives on this as well.
>
> I have a hard time with option (1) of deprecating "schedule interval",
> because of the historical nature of it and the code changes that people may
> have to make to deal with it.
>
> Thanks for raising this topic and the discussion,
>
> Vikram
>
>
> On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> I agree with those (including myself) who have suggested that it would be
>> great to have a deprecation fix tool.
>>
>> I also think that UI notifications would be valuable.  Another option
>> would be to provide a less intrusive indicator somewhere (like when chrome
>> lets you know you need to update), and then you click on it and it takes
>> you to a page that lists out all the deprecation warnings.
>>
>>
>> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> I know we are distracting a bit from the main subject, but I think it's
>>> worth it.
>>>
>>> I think this is a fact of life that people don't remove deprecations.
>>> But we have a choice of "not doing anything about it" and complaining. Or
>>> doing something that might improve it.
>>>
>>> I usually err on the side of "doing something" when I see a problem. I
>>> am a doer :).
>>>
>>> And I have a possibly controversial proposal. I got the inspiration from
>>> node.js old versions. Whenever an old version of Node is being used, you
>>> have a 20 seconds delay artificially added and information that in order to
>>> remove those 20 seconds you need to upgrade.
>>>
>>> Why don't we do that for some really old/serious deprecations ? Maybe
>>> not 20 seconds. Maybe adding an unremovable RED warning in the UI if you
>>> have any deprecation unhandled. Something that is annoying enough that
>>> will cause people to make an action but not intrusive enough to break
>>> things.
>>>
>>> I think this is a very basic psychological trait:
>>>
>>> If it does not cost the user anything to leave the deprecations, there
>>> will be a big number of people who will do that.
>>> We need to make sure that deprecations are costing our users "a little"
>>> and that they are aware that they can remove the cost (and know how - this
>>> is super important that we only show such a message with clear instructions
>>> on what to do).
>>>
>>> Such a message might be annoying in many ways:
>>>
>>> * it can obstruct significant part of the screen
>>> * It can be flashy and had very annoying colours (think bold pink fonts
>>> on flash-green background for example)
>>> * It can not allow to do anything before the user closes it with a
>>> REALLY small 'x' that it is difficult to point at (think cookie banners).
>>> And when they click they can be greeted with "You know - you can actually
>>> remove this warning by doing THIS".
>>> * and we have probably more options - our UX specialists might have some
>>> good ideas here
>>>
>>> For anyone who complains about "people do not act on deprecation." -
>>> yeah this is true.
>>> But one universal truth there is that if we won't act, this will change
>>> nothing in the behaviour of our users - this is 100% certain.
>>>
>>> If we do something - it might improve it. We will not get a 100% success
>>> for sure, but we can at least do something to improve it.
>>>
>>> WDYT?
>>>
>>> J.
>>>
>>>
>>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
>>> wrote:
>>>
>>>> Hi Ping,
>>>>
>>>> I had this idea about auto-fixing deprecation warnings too when I
>>>> started the airflint project.
>>>> I totally agree with what you said that often people ignore these
>>>> warnings and therefore a cli tool would come in very handy. It could look
>>>> similar to the airflint project and also be based on the python refactor
>>>> package, but I would keep it out of this discussion for now.
>>>>
>>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>>> also tend to having one schedule arg, because of simplicity for the
>>>> dag author.
>>>>
>>>> Best,
>>>> Felix
>>>>
>>>>
>>>>
>>>> Sent from Proton Mail for iOS
>>>>
>>>>
>>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>>
>>>> Hi Daniel,
>>>>
>>>> +1 for ‘schedule' and thanks for bringing this up.
>>>>
>>>> I agree with Vikram, we should be very careful about deprecating
>>>> existing params even if we have warnings around it. Not sure if this is a
>>>> general case, but I notice that people ignore warnings all the time until
>>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>>> help users to batch update these deprecating params just like python's
>>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>>> version.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Ping
>>>>
>>>>
>>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> In case you forgot about us @Vikram, gentle nudge
>>>>>
>>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>>> daniel.standish@astronomer.io> wrote:
>>>>>
>>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>>
>>>>>> Vikram, what kind of implications are you thinking of? Maybe you
>>>>>> could provide a little more detail about your concerns?
>>>>>>
>>>>>> There are certainly alternatives, and of course each of them has
>>>>>> tradeoffs.
>>>>>>
>>>>>> Here are the options I see:
>>>>>>
>>>>>> 1. consolidate to schedule (deprecate schedule_interval and
>>>>>> timetable; the proposal under discussion in this thread)
>>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has
>>>>>> added it in main) and take List[Dataset] in `schedule_interval`
>>>>>>
>>>>>> Which would you propose we go with?
>>>>>>
>>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>>> territory since this is still a discussion thread), my vote would
>>>>>> definitely be to go with 3.
>>>>>>
>>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>>>> misleading names is confusing, and can be a negative when looking at
>>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>>> it can be bothersome, certainly.
>>>>>>
>>>>>> Just to look at / compare feasibility.
>>>>>>
>>>>>> Deprecating schedule_interval, it would be more prominent than some
>>>>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>>>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>>>>> *task*, so there is less to change with than some other
>>>>>> deprecations. And being kwargs-only, it's an easy find and replace that
>>>>>> could be automated with an `upgrade-check` script. It's very easy to
>>>>>> detect, either to warn about, or ultimately, after upgrade, to fail the DAG
>>>>>> parse (which forces user to fix). Perhaps we could even experiment with a
>>>>>> deprecation fix tool, so that users could get help fixing deprecations, not
>>>>>> necessarily in the context of an upgrade, since we're not looking at 3.0
>>>>>> any time soon.
>>>>>>
>>>>>> We could compare with execution_date, if execution_date is to be
>>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>>> difficult thing for users to handle. For one, potentially higher
>>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>>> custom operators. *(With schedule_interval, since it's a top-level
>>>>>> dag kwarg, it will be easier.)* And execution_date was replaced with
>>>>>> `logical_date` for similar reasons -- avoiding user confusion.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> -1 from me.
>>>>>>>
>>>>>>> Though I agree in principle with the idea of consolidation, I don't
>>>>>>> think we should be doing this yet until we understand the implications
>>>>>>> completely.
>>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>>> unless there is really no alternative.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> So far, seems all in favor.
>>>>>>>>
>>>>>>>> I will just highlight, in case it's not clear.... when we release
>>>>>>>> this (presumably 2.4), basically every single dag in existence will start
>>>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>>>>> existence will need to be updated.
>>>>>>>>
>>>>>>>> Thankfully, for most people, this should be an easy update since
>>>>>>>> DAG is kwargs only, IIRC.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>>
>>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>>
>>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>>
>>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from
>>>>>>>>>> me. `schedule` seems like a nice catch-all.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
Hi Daniel,

I completely agree with wanting to consolidate, but I am very concerned
about people needing to change their existing code.
As Ping and Felix also said, I worry that people ignore deprecation
warnings, especially since the original code authors don't really look at
logs or other pipeline artifacts unless something is broken.

With respect to the options you outlined, I am personally leaning towards
(4) i.e. "don't deprecate timetable, but remove schedule_on (AIP-48 has
added it in main) and take List[Dataset] in `schedule_interval`". I could
be convinced with (3), since I think that "timetable" is relatively recent
and have not seen as much usage for this invocation yet. I am definitely
curious about other perspectives on this as well.

I have a hard time with option (1) of deprecating "schedule interval",
because of the historical nature of it and the code changes that people may
have to make to deal with it.

Thanks for raising this topic and the discussion,

Vikram


On Thu, Aug 4, 2022 at 10:24 AM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> I agree with those (including myself) who have suggested that it would be
> great to have a deprecation fix tool.
>
> I also think that UI notifications would be valuable.  Another option
> would be to provide a less intrusive indicator somewhere (like when chrome
> lets you know you need to update), and then you click on it and it takes
> you to a page that lists out all the deprecation warnings.
>
>
> On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> I know we are distracting a bit from the main subject, but I think it's
>> worth it.
>>
>> I think this is a fact of life that people don't remove deprecations. But
>> we have a choice of "not doing anything about it" and complaining. Or doing
>> something that might improve it.
>>
>> I usually err on the side of "doing something" when I see a problem. I am
>> a doer :).
>>
>> And I have a possibly controversial proposal. I got the inspiration from
>> node.js old versions. Whenever an old version of Node is being used, you
>> have a 20 seconds delay artificially added and information that in order to
>> remove those 20 seconds you need to upgrade.
>>
>> Why don't we do that for some really old/serious deprecations ? Maybe not
>> 20 seconds. Maybe adding an unremovable RED warning in the UI if you have
>> any deprecation unhandled. Something that is annoying enough that
>> will cause people to make an action but not intrusive enough to break
>> things.
>>
>> I think this is a very basic psychological trait:
>>
>> If it does not cost the user anything to leave the deprecations, there
>> will be a big number of people who will do that.
>> We need to make sure that deprecations are costing our users "a little"
>> and that they are aware that they can remove the cost (and know how - this
>> is super important that we only show such a message with clear instructions
>> on what to do).
>>
>> Such a message might be annoying in many ways:
>>
>> * it can obstruct significant part of the screen
>> * It can be flashy and had very annoying colours (think bold pink fonts
>> on flash-green background for example)
>> * It can not allow to do anything before the user closes it with a REALLY
>> small 'x' that it is difficult to point at (think cookie banners). And when
>> they click they can be greeted with "You know - you can actually remove
>> this warning by doing THIS".
>> * and we have probably more options - our UX specialists might have some
>> good ideas here
>>
>> For anyone who complains about "people do not act on deprecation." - yeah
>> this is true.
>> But one universal truth there is that if we won't act, this will change
>> nothing in the behaviour of our users - this is 100% certain.
>>
>> If we do something - it might improve it. We will not get a 100% success
>> for sure, but we can at least do something to improve it.
>>
>> WDYT?
>>
>> J.
>>
>>
>> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
>> wrote:
>>
>>> Hi Ping,
>>>
>>> I had this idea about auto-fixing deprecation warnings too when I
>>> started the airflint project.
>>> I totally agree with what you said that often people ignore these
>>> warnings and therefore a cli tool would come in very handy. It could look
>>> similar to the airflint project and also be based on the python refactor
>>> package, but I would keep it out of this discussion for now.
>>>
>>> Regarding the schedule arg, I don’t have a strong opinion there but
>>> also tend to having one schedule arg, because of simplicity for the dag
>>> author.
>>>
>>> Best,
>>> Felix
>>>
>>>
>>>
>>> Sent from Proton Mail for iOS
>>>
>>>
>>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>>
>>> Hi Daniel,
>>>
>>> +1 for ‘schedule' and thanks for bringing this up.
>>>
>>> I agree with Vikram, we should be very careful about deprecating
>>> existing params even if we have warnings around it. Not sure if this is a
>>> general case, but I notice that people ignore warnings all the time until
>>> it starts to break. Thus I am thinking if airflow can have a cli tool to
>>> help users to batch update these deprecating params just like python's
>>> futurize tool. This can greatly ease the burden of upgrading the airflow
>>> version.
>>>
>>>
>>> Thanks,
>>>
>>> Ping
>>>
>>>
>>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>>> In case you forgot about us @Vikram, gentle nudge
>>>>
>>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>>> daniel.standish@astronomer.io> wrote:
>>>>
>>>>> Well, this is a discussion thread, so let's discuss!
>>>>>
>>>>> Vikram, what kind of implications are you thinking of? Maybe you could
>>>>> provide a little more detail about your concerns?
>>>>>
>>>>> There are certainly alternatives, and of course each of them has
>>>>> tradeoffs.
>>>>>
>>>>> Here are the options I see:
>>>>>
>>>>> 1. consolidate to schedule (deprecate schedule_interval and timetable;
>>>>> the proposal under discussion in this thread)
>>>>> 2. do nothing (keep 3 different scheduling params)
>>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added
>>>>> it in main) and take List[Dataset] in `schedule_interval`
>>>>>
>>>>> Which would you propose we go with?
>>>>>
>>>>> If we have a veto against 1 (and again, we're in hypothetical
>>>>> territory since this is still a discussion thread), my vote would
>>>>> definitely be to go with 3.
>>>>>
>>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>>> misleading names is confusing, and can be a negative when looking at
>>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>>> it can be bothersome, certainly.
>>>>>
>>>>> Just to look at / compare feasibility.
>>>>>
>>>>> Deprecating schedule_interval, it would be more prominent than some
>>>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>>>> *task*, so there is less to change with than some other deprecations.
>>>>> And being kwargs-only, it's an easy find and replace that could be
>>>>> automated with an `upgrade-check` script. It's very easy to detect, either
>>>>> to warn about, or ultimately, after upgrade, to fail the DAG parse (which
>>>>> forces user to fix). Perhaps we could even experiment with a deprecation
>>>>> fix tool, so that users could get help fixing deprecations, not necessarily
>>>>> in the context of an upgrade, since we're not looking at 3.0 any time soon.
>>>>>
>>>>> We could compare with execution_date, if execution_date is to be
>>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>>> difficult thing for users to handle. For one, potentially higher
>>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>>> custom operators. *(With schedule_interval, since it's a top-level
>>>>> dag kwarg, it will be easier.)* And execution_date was replaced with
>>>>> `logical_date` for similar reasons -- avoiding user confusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> -1 from me.
>>>>>>
>>>>>> Though I agree in principle with the idea of consolidation, I don't
>>>>>> think we should be doing this yet until we understand the implications
>>>>>> completely.
>>>>>> I am really not in favor of deprecation of the existing params,
>>>>>> unless there is really no alternative.
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> So far, seems all in favor.
>>>>>>>
>>>>>>> I will just highlight, in case it's not clear.... when we release
>>>>>>> this (presumably 2.4), basically every single dag in existence will start
>>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>>>> existence will need to be updated.
>>>>>>>
>>>>>>> Thankfully, for most people, this should be an easy update since DAG
>>>>>>> is kwargs only, IIRC.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> +1 to consolidating and "schedule".
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>>
>>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>>
>>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from
>>>>>>>>> me. `schedule` seems like a nice catch-all.
>>>>>>>>>
>>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
I agree with those (including myself) who have suggested that it would be
great to have a deprecation fix tool.

I also think that UI notifications would be valuable.  Another option would
be to provide a less intrusive indicator somewhere (like when chrome lets
you know you need to update), and then you click on it and it takes you to
a page that lists out all the deprecation warnings.


On Thu, Aug 4, 2022 at 1:03 AM Jarek Potiuk <ja...@potiuk.com> wrote:

> I know we are distracting a bit from the main subject, but I think it's
> worth it.
>
> I think this is a fact of life that people don't remove deprecations. But
> we have a choice of "not doing anything about it" and complaining. Or doing
> something that might improve it.
>
> I usually err on the side of "doing something" when I see a problem. I am
> a doer :).
>
> And I have a possibly controversial proposal. I got the inspiration from
> node.js old versions. Whenever an old version of Node is being used, you
> have a 20 seconds delay artificially added and information that in order to
> remove those 20 seconds you need to upgrade.
>
> Why don't we do that for some really old/serious deprecations ? Maybe not
> 20 seconds. Maybe adding an unremovable RED warning in the UI if you have
> any deprecation unhandled. Something that is annoying enough that
> will cause people to make an action but not intrusive enough to break
> things.
>
> I think this is a very basic psychological trait:
>
> If it does not cost the user anything to leave the deprecations, there
> will be a big number of people who will do that.
> We need to make sure that deprecations are costing our users "a little"
> and that they are aware that they can remove the cost (and know how - this
> is super important that we only show such a message with clear instructions
> on what to do).
>
> Such a message might be annoying in many ways:
>
> * it can obstruct significant part of the screen
> * It can be flashy and had very annoying colours (think bold pink fonts on
> flash-green background for example)
> * It can not allow to do anything before the user closes it with a REALLY
> small 'x' that it is difficult to point at (think cookie banners). And when
> they click they can be greeted with "You know - you can actually remove
> this warning by doing THIS".
> * and we have probably more options - our UX specialists might have some
> good ideas here
>
> For anyone who complains about "people do not act on deprecation." - yeah
> this is true.
> But one universal truth there is that if we won't act, this will change
> nothing in the behaviour of our users - this is 100% certain.
>
> If we do something - it might improve it. We will not get a 100% success
> for sure, but we can at least do something to improve it.
>
> WDYT?
>
> J.
>
>
> On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
> wrote:
>
>> Hi Ping,
>>
>> I had this idea about auto-fixing deprecation warnings too when I started
>> the airflint project.
>> I totally agree with what you said that often people ignore these
>> warnings and therefore a cli tool would come in very handy. It could look
>> similar to the airflint project and also be based on the python refactor
>> package, but I would keep it out of this discussion for now.
>>
>> Regarding the schedule arg, I don’t have a strong opinion there but
>> also tend to having one schedule arg, because of simplicity for the dag
>> author.
>>
>> Best,
>> Felix
>>
>>
>>
>> Sent from Proton Mail for iOS
>>
>>
>> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>>
>> Hi Daniel,
>>
>> +1 for ‘schedule' and thanks for bringing this up.
>>
>> I agree with Vikram, we should be very careful about deprecating existing
>> params even if we have warnings around it. Not sure if this is a general
>> case, but I notice that people ignore warnings all the time until it starts
>> to break. Thus I am thinking if airflow can have a cli tool to help users
>> to batch update these deprecating params just like python's futurize tool.
>> This can greatly ease the burden of upgrading the airflow version.
>>
>>
>> Thanks,
>>
>> Ping
>>
>>
>> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> In case you forgot about us @Vikram, gentle nudge
>>>
>>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>>> daniel.standish@astronomer.io> wrote:
>>>
>>>> Well, this is a discussion thread, so let's discuss!
>>>>
>>>> Vikram, what kind of implications are you thinking of? Maybe you could
>>>> provide a little more detail about your concerns?
>>>>
>>>> There are certainly alternatives, and of course each of them has
>>>> tradeoffs.
>>>>
>>>> Here are the options I see:
>>>>
>>>> 1. consolidate to schedule (deprecate schedule_interval and timetable;
>>>> the proposal under discussion in this thread)
>>>> 2. do nothing (keep 3 different scheduling params)
>>>> 3. consolidate to schedule_interval (deprecate timetable)
>>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added
>>>> it in main) and take List[Dataset] in `schedule_interval`
>>>>
>>>> Which would you propose we go with?
>>>>
>>>> If we have a veto against 1 (and again, we're in hypothetical territory
>>>> since this is still a discussion thread), my vote would definitely be to go
>>>> with 3.
>>>>
>>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>>> `schedule` definitely seems better. From a new user's perspective, having
>>>> misleading names is confusing, and can be a negative when looking at
>>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>>> it can be bothersome, certainly.
>>>>
>>>> Just to look at / compare feasibility.
>>>>
>>>> Deprecating schedule_interval, it would be more prominent than some
>>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>>> *task*, so there is less to change with than some other deprecations.
>>>> And being kwargs-only, it's an easy find and replace that could be
>>>> automated with an `upgrade-check` script. It's very easy to detect, either
>>>> to warn about, or ultimately, after upgrade, to fail the DAG parse (which
>>>> forces user to fix). Perhaps we could even experiment with a deprecation
>>>> fix tool, so that users could get help fixing deprecations, not necessarily
>>>> in the context of an upgrade, since we're not looking at 3.0 any time soon.
>>>>
>>>> We could compare with execution_date, if execution_date is to be
>>>> removed in 3.0. I think the deprecation of execution_date will be a more
>>>> difficult thing for users to handle. For one, potentially higher
>>>> cardinality, since it is used in tasks. But also, the use of execution_date
>>>> can be harder to detect, since it can be in templates and elsewhere in
>>>> custom operators. *(With schedule_interval, since it's a top-level dag
>>>> kwarg, it will be easier.)* And execution_date was replaced with
>>>> `logical_date` for similar reasons -- avoiding user confusion.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka
>>>> <vi...@astronomer.io.invalid> wrote:
>>>>
>>>>> -1 from me.
>>>>>
>>>>> Though I agree in principle with the idea of consolidation, I don't
>>>>> think we should be doing this yet until we understand the implications
>>>>> completely.
>>>>> I am really not in favor of deprecation of the existing params, unless
>>>>> there is really no alternative.
>>>>>
>>>>>
>>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>>> <da...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> So far, seems all in favor.
>>>>>>
>>>>>> I will just highlight, in case it's not clear.... when we release
>>>>>> this (presumably 2.4), basically every single dag in existence will start
>>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>>> existence will need to be updated.
>>>>>>
>>>>>> Thankfully, for most people, this should be an easy update since DAG
>>>>>> is kwargs only, IIRC.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> +1 to consolidating and "schedule".
>>>>>>>
>>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> +1 for ‘schedule', and another +1 to the importance of
>>>>>>>> consolidating to one being more important than the name of that one
>>>>>>>>
>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>>
>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from me.
>>>>>>>> `schedule` seems like a nice catch-all.
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>>
>>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>>
>>>>>>>>
>>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Jarek Potiuk <ja...@potiuk.com>.
I know we are distracting a bit from the main subject, but I think it's
worth it.

I think this is a fact of life that people don't remove deprecations. But
we have a choice of "not doing anything about it" and complaining. Or doing
something that might improve it.

I usually err on the side of "doing something" when I see a problem. I am a
doer :).

And I have a possibly controversial proposal. I got the inspiration from
node.js old versions. Whenever an old version of Node is being used, you
have a 20 seconds delay artificially added and information that in order to
remove those 20 seconds you need to upgrade.

Why don't we do that for some really old/serious deprecations ? Maybe not
20 seconds. Maybe adding an unremovable RED warning in the UI if you have
any deprecation unhandled. Something that is annoying enough that
will cause people to make an action but not intrusive enough to break
things.

I think this is a very basic psychological trait:

If it does not cost the user anything to leave the deprecations, there will
be a big number of people who will do that.
We need to make sure that deprecations are costing our users "a little" and
that they are aware that they can remove the cost (and know how - this is
super important that we only show such a message with clear instructions on
what to do).

Such a message might be annoying in many ways:

* it can obstruct significant part of the screen
* It can be flashy and had very annoying colours (think bold pink fonts on
flash-green background for example)
* It can not allow to do anything before the user closes it with a REALLY
small 'x' that it is difficult to point at (think cookie banners). And when
they click they can be greeted with "You know - you can actually remove
this warning by doing THIS".
* and we have probably more options - our UX specialists might have some
good ideas here

For anyone who complains about "people do not act on deprecation." - yeah
this is true.
But one universal truth there is that if we won't act, this will change
nothing in the behaviour of our users - this is 100% certain.

If we do something - it might improve it. We will not get a 100% success
for sure, but we can at least do something to improve it.

WDYT?

J.


On Thu, Aug 4, 2022 at 6:52 AM Felix Uellendall <fe...@pm.me.invalid>
wrote:

> Hi Ping,
>
> I had this idea about auto-fixing deprecation warnings too when I started
> the airflint project.
> I totally agree with what you said that often people ignore these warnings
> and therefore a cli tool would come in very handy. It could look similar to
> the airflint project and also be based on the python refactor package, but
> I would keep it out of this discussion for now.
>
> Regarding the schedule arg, I don’t have a strong opinion there but
> also tend to having one schedule arg, because of simplicity for the dag
> author.
>
> Best,
> Felix
>
>
>
> Sent from Proton Mail for iOS
>
>
> On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:
>
> Hi Daniel,
>
> +1 for ‘schedule' and thanks for bringing this up.
>
> I agree with Vikram, we should be very careful about deprecating existing
> params even if we have warnings around it. Not sure if this is a general
> case, but I notice that people ignore warnings all the time until it starts
> to break. Thus I am thinking if airflow can have a cli tool to help users
> to batch update these deprecating params just like python's futurize tool.
> This can greatly ease the burden of upgrading the airflow version.
>
>
> Thanks,
>
> Ping
>
>
> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
>
>> In case you forgot about us @Vikram, gentle nudge
>>
>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
>> daniel.standish@astronomer.io> wrote:
>>
>>> Well, this is a discussion thread, so let's discuss!
>>>
>>> Vikram, what kind of implications are you thinking of? Maybe you could
>>> provide a little more detail about your concerns?
>>>
>>> There are certainly alternatives, and of course each of them has
>>> tradeoffs.
>>>
>>> Here are the options I see:
>>>
>>> 1. consolidate to schedule (deprecate schedule_interval and timetable;
>>> the proposal under discussion in this thread)
>>> 2. do nothing (keep 3 different scheduling params)
>>> 3. consolidate to schedule_interval (deprecate timetable)
>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added
>>> it in main) and take List[Dataset] in `schedule_interval`
>>>
>>> Which would you propose we go with?
>>>
>>> If we have a veto against 1 (and again, we're in hypothetical territory
>>> since this is still a discussion thread), my vote would definitely be to go
>>> with 3.
>>>
>>> The param `schedule_interval` as catch all is not *sooo *bad, but
>>> `schedule` definitely seems better. From a new user's perspective, having
>>> misleading names is confusing, and can be a negative when looking at
>>> adoption. And for a tenured user, it can nag at you. And dealing with
>>> deprecations as an Airflow user, well it's sort of part of the job, though
>>> it can be bothersome, certainly.
>>>
>>> Just to look at / compare feasibility.
>>>
>>> Deprecating schedule_interval, it would be more prominent than some
>>> other deprecations because it's a DAG param. That said, I think it's pretty
>>> straightforward to handle. For one, it's at the cardinality of DAG and not
>>> *task*, so there is less to change with than some other deprecations.
>>> And being kwargs-only, it's an easy find and replace that could be
>>> automated with an `upgrade-check` script. It's very easy to detect, either
>>> to warn about, or ultimately, after upgrade, to fail the DAG parse (which
>>> forces user to fix). Perhaps we could even experiment with a deprecation
>>> fix tool, so that users could get help fixing deprecations, not necessarily
>>> in the context of an upgrade, since we're not looking at 3.0 any time soon.
>>>
>>> We could compare with execution_date, if execution_date is to be removed
>>> in 3.0. I think the deprecation of execution_date will be a more difficult
>>> thing for users to handle. For one, potentially higher cardinality, since
>>> it is used in tasks. But also, the use of execution_date can be harder to
>>> detect, since it can be in templates and elsewhere in custom operators. *(With
>>> schedule_interval, since it's a top-level dag kwarg, it will be easier.)*
>>> And execution_date was replaced with `logical_date` for similar reasons --
>>> avoiding user confusion.
>>>
>>>
>>>
>>>
>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vi...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> -1 from me.
>>>>
>>>> Though I agree in principle with the idea of consolidation, I don't
>>>> think we should be doing this yet until we understand the implications
>>>> completely.
>>>> I am really not in favor of deprecation of the existing params, unless
>>>> there is really no alternative.
>>>>
>>>>
>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>>> <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> So far, seems all in favor.
>>>>>
>>>>> I will just highlight, in case it's not clear.... when we release this
>>>>> (presumably 2.4), basically every single dag in existence will start
>>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>>> existence will need to be updated.
>>>>>
>>>>> Thankfully, for most people, this should be an easy update since DAG
>>>>> is kwargs only, IIRC.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi
>>>>> <br...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> +1 to consolidating and "schedule".
>>>>>>
>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> +1 for ‘schedule', and another +1 to the importance of consolidating
>>>>>>> to one being more important than the name of that one
>>>>>>>
>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>>
>>>>>>> Consolidating to a single scheduling parameter is a big +1 from me.
>>>>>>> `schedule` seems like a nice catch-all.
>>>>>>>
>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>>> jedcunningham@apache.org> wrote:
>>>>>>>
>>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>>
>>>>>>>
>>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Felix Uellendall <fe...@pm.me.INVALID>.
Hi Ping,

I had this idea about auto-fixing deprecation warnings too when I started the airflint project.
I totally agree with what you said that often people ignore these warnings and therefore a cli tool would come in very handy. It could look similar to the airflint project and also be based on the python refactor package, but I would keep it out of this discussion for now.

Regarding the schedule arg, I don’t have a strong opinion there but also tend to having one schedule arg, because of simplicity for the dag author.

Best,
Felix

Sent from Proton Mail for iOS

On Thu, Aug 4, 2022 at 00:23, Ping Zhang <pi...@umich.edu> wrote:

> Hi Daniel,
>
> +1 for ‘schedule' and thanks for bringing this up.
>
> I agree with Vikram, we should be very careful about deprecating existing params even if we have warnings around it. Not sure if this is a general case, but I notice that people ignore warnings all the time until it starts to break. Thus I am thinking if airflow can have a cli tool to help users to batch update these deprecating params just like python's futurize tool. This can greatly ease the burden of upgrading the airflow version.
>
> Thanks,
>
> Ping
>
> On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish <da...@astronomer.io.invalid> wrote:
>
>> In case you forgot about us @Vikram, gentle nudge
>>
>> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <da...@astronomer.io> wrote:
>>
>>> Well, this is a discussion thread, so let's discuss!
>>>
>>> Vikram, what kind of implications are you thinking of? Maybe you could provide a little more detail about your concerns?
>>>
>>> There are certainly alternatives, and of course each of them has tradeoffs.
>>>
>>> Here are the options I see:
>>>
>>> 1. consolidate to schedule (deprecate schedule_interval and timetable; the proposal under discussion in this thread)
>>> 2. do nothing (keep 3 different scheduling params)
>>> 3. consolidate to schedule_interval (deprecate timetable)
>>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added it in main) and take List[Dataset] in `schedule_interval`
>>>
>>> Which would you propose we go with?
>>>
>>> If we have a veto against 1 (and again, we're in hypothetical territory since this is still a discussion thread), my vote would definitely be to go with 3.
>>>
>>> The param `schedule_interval` as catch all is not sooo bad, but `schedule` definitely seems better. From a new user's perspective, having misleading names is confusing, and can be a negative when looking at adoption. And for a tenured user, it can nag at you. And dealing with deprecations as an Airflow user, well it's sort of part of the job, though it can be bothersome, certainly.
>>>
>>> Just to look at / compare feasibility.
>>>
>>> Deprecating schedule_interval, it would be more prominent than some other deprecations because it's a DAG param. That said, I think it's pretty straightforward to handle. For one, it's at the cardinality of DAG and not task, so there is less to change with than some other deprecations. And being kwargs-only, it's an easy find and replace that could be automated with an `upgrade-check` script. It's very easy to detect, either to warn about, or ultimately, after upgrade, to fail the DAG parse (which forces user to fix). Perhaps we could even experiment with a deprecation fix tool, so that users could get help fixing deprecations, not necessarily in the context of an upgrade, since we're not looking at 3.0 any time soon.
>>>
>>> We could compare with execution_date, if execution_date is to be removed in 3.0. I think the deprecation of execution_date will be a more difficult thing for users to handle. For one, potentially higher cardinality, since it is used in tasks. But also, the use of execution_date can be harder to detect, since it can be in templates and elsewhere in custom operators. (With schedule_interval, since it's a top-level dag kwarg, it will be easier.) And execution_date was replaced with `logical_date` for similar reasons -- avoiding user confusion.
>>>
>>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vi...@astronomer.io.invalid> wrote:
>>>
>>>> -1 from me.
>>>>
>>>> Though I agree in principle with the idea of consolidation, I don't think we should be doing this yet until we understand the implications completely.
>>>> I am really not in favor of deprecation of the existing params, unless there is really no alternative.
>>>>
>>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish <da...@astronomer.io.invalid> wrote:
>>>>
>>>>> So far, seems all in favor.
>>>>>
>>>>> I will just highlight, in case it's not clear.... when we release this (presumably 2.4), basically every single dag in existence will start emitting deprecation warnings, and prior to 3.0, basically every dag in existence will need to be updated.
>>>>>
>>>>> Thankfully, for most people, this should be an easy update since DAG is kwargs only, IIRC.
>>>>>
>>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi <br...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> +1 to consolidating and "schedule".
>>>>>>
>>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl <dr...@astronomer.io.invalid> wrote:
>>>>>>
>>>>>>> +1 for ‘schedule', and another +1 to the importance of consolidating to one being more important than the name of that one
>>>>>>>
>>>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <jo...@astronomer.io.INVALID> wrote:
>>>>>>>>
>>>>>>>> Consolidating to a single scheduling parameter is a big +1 from me. `schedule` seems like a nice catch-all.
>>>>>>>>
>>>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <je...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> +1 on moving to a single `schedule` param.

Re: [DISCUSS] consolidate dag scheduling params

Posted by Ping Zhang <pi...@umich.edu>.
Hi Daniel,

+1 for ‘schedule' and thanks for bringing this up.

I agree with Vikram, we should be very careful about deprecating existing
params even if we have warnings around it. Not sure if this is a general
case, but I notice that people ignore warnings all the time until it starts
to break. Thus I am thinking if airflow can have a cli tool to help users
to batch update these deprecating params just like python's futurize tool.
This can greatly ease the burden of upgrading the airflow version.


Thanks,

Ping


On Wed, Aug 3, 2022 at 3:13 PM Daniel Standish
<da...@astronomer.io.invalid> wrote:

> In case you forgot about us @Vikram, gentle nudge
>
> On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
> daniel.standish@astronomer.io> wrote:
>
>> Well, this is a discussion thread, so let's discuss!
>>
>> Vikram, what kind of implications are you thinking of?  Maybe you could
>> provide a little more detail about your concerns?
>>
>> There are certainly alternatives, and of course each of them has
>> tradeoffs.
>>
>> Here are the options I see:
>>
>> 1. consolidate to schedule (deprecate schedule_interval and timetable;
>> the proposal under discussion in this thread)
>> 2. do nothing (keep 3 different scheduling params)
>> 3. consolidate to schedule_interval (deprecate timetable)
>> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added it
>> in main) and take List[Dataset] in `schedule_interval`
>>
>> Which would you propose we go with?
>>
>> If we have a veto against 1 (and again, we're in hypothetical territory
>> since this is still a discussion thread), my vote would definitely be to go
>> with 3.
>>
>> The param `schedule_interval` as catch all is not *sooo *bad, but
>> `schedule` definitely seems better.  From a new user's perspective, having
>> misleading names is confusing, and can be a negative when looking at
>> adoption.  And for a tenured user, it can nag at you.  And dealing with
>> deprecations as an Airflow user, well it's sort of part of the job, though
>> it can be bothersome, certainly.
>>
>> Just to look at / compare feasibility.
>>
>> Deprecating schedule_interval, it would be more prominent than some other
>> deprecations because it's a DAG param.  That said, I think it's pretty
>> straightforward to handle.  For one, it's at the cardinality of DAG and not
>> *task*, so there is less to change with than some other deprecations.
>> And being kwargs-only, it's an easy find and replace that could be
>> automated with an `upgrade-check` script.  It's very easy to detect, either
>> to warn about, or ultimately, after upgrade, to fail the DAG parse (which
>> forces user to fix).  Perhaps we could even experiment with a deprecation
>> fix tool, so that users could get help fixing deprecations, not necessarily
>> in the context of an upgrade, since we're not looking at 3.0 any time
>> soon.
>>
>> We could compare with execution_date, if execution_date is to be removed
>> in 3.0.  I think the deprecation of execution_date will be a more difficult
>> thing for users to handle.  For one, potentially higher cardinality, since
>> it is used in tasks.  But also, the use of execution_date can be harder to
>> detect, since it can be in templates and elsewhere in custom operators.  *(With
>> schedule_interval, since it's a top-level dag kwarg, it will be easier.)*
>> And execution_date was replaced with `logical_date` for similar reasons --
>> avoiding user confusion.
>>
>>
>>
>>
>> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vi...@astronomer.io.invalid>
>> wrote:
>>
>>> -1 from me.
>>>
>>> Though I agree in principle with the idea of consolidation, I don't
>>> think we should be doing this yet until we understand the implications
>>> completely.
>>> I am really not in favor of deprecation of the existing params, unless
>>> there is really no alternative.
>>>
>>>
>>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>>> <da...@astronomer.io.invalid> wrote:
>>>
>>>> So far, seems all in favor.
>>>>
>>>> I will just highlight, in case it's not clear.... when we release this
>>>> (presumably 2.4), basically every single dag in existence will start
>>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>>> existence will need to be updated.
>>>>
>>>> Thankfully, for most people, this should be an easy update since DAG is
>>>> kwargs only, IIRC.
>>>>
>>>>
>>>>
>>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi <br...@astronomer.io.invalid>
>>>> wrote:
>>>>
>>>>> +1 to consolidating and "schedule".
>>>>>
>>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>>> <dr...@astronomer.io.invalid> wrote:
>>>>>
>>>>>> +1 for ‘schedule', and another +1 to the importance of consolidating
>>>>>> to one being more important than the name of that one
>>>>>>
>>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>>
>>>>>> Consolidating to a single scheduling parameter is a big +1 from me.
>>>>>> `schedule` seems like a nice catch-all.
>>>>>>
>>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>>> jedcunningham@apache.org> wrote:
>>>>>>
>>>>>>> +1 on moving to a single `schedule` param.
>>>>>>>
>>>>>>
>>>>>>

Re: [DISCUSS] consolidate dag scheduling params

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
In case you forgot about us @Vikram, gentle nudge

On Mon, Aug 1, 2022 at 8:40 AM Daniel Standish <
daniel.standish@astronomer.io> wrote:

> Well, this is a discussion thread, so let's discuss!
>
> Vikram, what kind of implications are you thinking of?  Maybe you could
> provide a little more detail about your concerns?
>
> There are certainly alternatives, and of course each of them has tradeoffs.
>
> Here are the options I see:
>
> 1. consolidate to schedule (deprecate schedule_interval and timetable; the
> proposal under discussion in this thread)
> 2. do nothing (keep 3 different scheduling params)
> 3. consolidate to schedule_interval (deprecate timetable)
> 4. don't deprecate timetable, but remove schedule_on (AIP-48 has added it
> in main) and take List[Dataset] in `schedule_interval`
>
> Which would you propose we go with?
>
> If we have a veto against 1 (and again, we're in hypothetical territory
> since this is still a discussion thread), my vote would definitely be to go
> with 3.
>
> The param `schedule_interval` as catch all is not *sooo *bad, but
> `schedule` definitely seems better.  From a new user's perspective, having
> misleading names is confusing, and can be a negative when looking at
> adoption.  And for a tenured user, it can nag at you.  And dealing with
> deprecations as an Airflow user, well it's sort of part of the job, though
> it can be bothersome, certainly.
>
> Just to look at / compare feasibility.
>
> Deprecating schedule_interval, it would be more prominent than some other
> deprecations because it's a DAG param.  That said, I think it's pretty
> straightforward to handle.  For one, it's at the cardinality of DAG and not
> *task*, so there is less to change with than some other deprecations.
> And being kwargs-only, it's an easy find and replace that could be
> automated with an `upgrade-check` script.  It's very easy to detect, either
> to warn about, or ultimately, after upgrade, to fail the DAG parse (which
> forces user to fix).  Perhaps we could even experiment with a deprecation
> fix tool, so that users could get help fixing deprecations, not necessarily
> in the context of an upgrade, since we're not looking at 3.0 any time
> soon.
>
> We could compare with execution_date, if execution_date is to be removed
> in 3.0.  I think the deprecation of execution_date will be a more difficult
> thing for users to handle.  For one, potentially higher cardinality, since
> it is used in tasks.  But also, the use of execution_date can be harder to
> detect, since it can be in templates and elsewhere in custom operators.  *(With
> schedule_interval, since it's a top-level dag kwarg, it will be easier.)*
> And execution_date was replaced with `logical_date` for similar reasons --
> avoiding user confusion.
>
>
>
>
> On Fri, Jul 29, 2022 at 2:42 PM Vikram Koka <vi...@astronomer.io.invalid>
> wrote:
>
>> -1 from me.
>>
>> Though I agree in principle with the idea of consolidation, I don't think
>> we should be doing this yet until we understand the implications
>> completely.
>> I am really not in favor of deprecation of the existing params, unless
>> there is really no alternative.
>>
>>
>> On Fri, Jul 29, 2022 at 2:37 PM Daniel Standish
>> <da...@astronomer.io.invalid> wrote:
>>
>>> So far, seems all in favor.
>>>
>>> I will just highlight, in case it's not clear.... when we release this
>>> (presumably 2.4), basically every single dag in existence will start
>>> emitting deprecation warnings, and prior to 3.0, basically every dag in
>>> existence will need to be updated.
>>>
>>> Thankfully, for most people, this should be an easy update since DAG is
>>> kwargs only, IIRC.
>>>
>>>
>>>
>>> On Fri, Jul 29, 2022, 2:27 PM Brent Bovenzi <br...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> +1 to consolidating and "schedule".
>>>>
>>>> On Fri, Jul 29, 2022, 10:12 PM Drew Hubl
>>>> <dr...@astronomer.io.invalid> wrote:
>>>>
>>>>> +1 for ‘schedule', and another +1 to the importance of consolidating
>>>>> to one being more important than the name of that one
>>>>>
>>>>> On Jul 29, 2022, at 1:58 PM, Josh Fell <
>>>>> josh.d.fell@astronomer.io.INVALID> wrote:
>>>>>
>>>>> Consolidating to a single scheduling parameter is a big +1 from me.
>>>>> `schedule` seems like a nice catch-all.
>>>>>
>>>>> On Fri, Jul 29, 2022 at 9:10 AM Jed Cunningham <
>>>>> jedcunningham@apache.org> wrote:
>>>>>
>>>>>> +1 on moving to a single `schedule` param.
>>>>>>
>>>>>
>>>>>

Re: Best Materials For Starting out with Airflow.

Posted by Leah Cole <co...@google.com.INVALID>.
Hey Mustapha!

+1 to what Dennis and Jarek sent already. We also have a medium blog
<https://medium.com/apache-airflow> and there's great content from all
years of the Airflow summit <https://airflowsummit.org/sessions/2022/>
available
on YouTube (that page has this years, but there's a link at the top to go
to previous years.) I also wanted to send a quick reminder that if you (or
anyone else) notice info that's missing in the docs or corrections that
need to be made, docs changes are a great place to make contributions
<https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst>! :)

Leah

On Mon, Aug 1, 2022 at 4:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Also check out https://github.com/jghoman/awesome-apache-airflow
>
> On Mon, Aug 1, 2022 at 6:25 PM Dennis Akpenyi <de...@gmail.com>
> wrote:
>
>> Hello Mustapha,
>>
>> Apache Airflow docs (
>> https://airflow.apache.org/docs/apache-airflow/stable/index.html) is a
>> great place to start. And Marc Lamberti (Astronomer Academy) also has great
>> Airflow introductory videos:
>> https://academy.astronomer.io/page/apache-airflow.
>>
>> Check out these YouTube channels for more:
>> https://www.youtube.com/c/Astronomer,
>> https://www.youtube.com/c/MarcLamberti.
>>
>> I hope you find these helpful.
>>
>> Best regards,
>>
>> Dennis
>>
>> On Mon, Aug 1, 2022 at 5:05 PM Mustapha Adekunle <dk...@gmail.com>
>> wrote:
>>
>>> Hello Airflow community,
>>>
>>> I am in the process of collating the best materials for learning
>>> airflow.
>>>
>>> I would appreciate if you all can suggest videos, courses, books,
>>> playlist, article or sorts of that you fill tick this Criteria.
>>>
>>> Looking forward to hearing from you all.
>>>
>>> Best,
>>> Mustapha
>>>
>>

-- 

Leah Cole (she/her) | Developer Programs Engineer, Data Analytics |
coleleah@google.com | +1 (925) 257-2112
*My working hours may not be your working hours. Please ping me anytime if
you'd like a status update on anything we are working on together - my goal
is to never be a blocker for you. *

Re: Best Materials For Starting out with Airflow.

Posted by Jarek Potiuk <ja...@potiuk.com>.
Also check out https://github.com/jghoman/awesome-apache-airflow

On Mon, Aug 1, 2022 at 6:25 PM Dennis Akpenyi <de...@gmail.com>
wrote:

> Hello Mustapha,
>
> Apache Airflow docs (
> https://airflow.apache.org/docs/apache-airflow/stable/index.html) is a
> great place to start. And Marc Lamberti (Astronomer Academy) also has great
> Airflow introductory videos:
> https://academy.astronomer.io/page/apache-airflow.
>
> Check out these YouTube channels for more:
> https://www.youtube.com/c/Astronomer,
> https://www.youtube.com/c/MarcLamberti.
>
> I hope you find these helpful.
>
> Best regards,
>
> Dennis
>
> On Mon, Aug 1, 2022 at 5:05 PM Mustapha Adekunle <dk...@gmail.com>
> wrote:
>
>> Hello Airflow community,
>>
>> I am in the process of collating the best materials for learning airflow.
>>
>> I would appreciate if you all can suggest videos, courses, books,
>> playlist, article or sorts of that you fill tick this Criteria.
>>
>> Looking forward to hearing from you all.
>>
>> Best,
>> Mustapha
>>
>

Re: Best Materials For Starting out with Airflow.

Posted by Dennis Akpenyi <de...@gmail.com>.
Hello Mustapha,

Apache Airflow docs (
https://airflow.apache.org/docs/apache-airflow/stable/index.html) is a
great place to start. And Marc Lamberti (Astronomer Academy) also has great
Airflow introductory videos:
https://academy.astronomer.io/page/apache-airflow.

Check out these YouTube channels for more:
https://www.youtube.com/c/Astronomer, https://www.youtube.com/c/MarcLamberti
.

I hope you find these helpful.

Best regards,

Dennis

On Mon, Aug 1, 2022 at 5:05 PM Mustapha Adekunle <dk...@gmail.com>
wrote:

> Hello Airflow community,
>
> I am in the process of collating the best materials for learning airflow.
>
> I would appreciate if you all can suggest videos, courses, books,
> playlist, article or sorts of that you fill tick this Criteria.
>
> Looking forward to hearing from you all.
>
> Best,
> Mustapha
>

Best Materials For Starting out with Airflow.

Posted by Mustapha Adekunle <dk...@gmail.com>.
Hello Airflow community,

I am in the process of collating the best materials for learning airflow.

I would appreciate if you all can suggest videos, courses, books, playlist,
article or sorts of that you fill tick this Criteria.

Looking forward to hearing from you all.

Best,
Mustapha