You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <da...@astronomer.io.INVALID> on 2021/12/28 00:08:53 UTC

[DISCUSS] deprecate `days_ago` dates utility

I recall some time ago we removed `days_ago` from all  example dags.  Not
sure why we didn't also deprecate it.

For reference, `days_ago(N)` returns utcnow minus N days.

There's a PR to make it return a value in the default timezone, so that
when you use it in an expression for dag `start_date`, the dag will be in
the default timezone.

I don't want to get into the merits of that here.  But even assuming that
this would be desirable, there's still some ambiguity we'd have to
resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
timedelta(N)` would do) or should we return now minus N days (as
pendulum.now().add(days=-N)  would do)?  Because of DST the two
different approaches result in values that differ by 1 hour.

What I *do* want to explore here is whether folks think we can / should
just deprecate the function entirely.  Personally this would be my
preference.  Using `days_ago(5)` is not much more convenient than
`dttm.add(days=-N)`.   And the latter has the benefit that it is
unambiguous, doesn't make assumptions, and doesn't get in the way between
user and library.

So my proposal would be, don't change the behavior of `days_ago` and
deprecate it with removal targeted in 3.0.

Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Jarek Potiuk <ja...@potiuk.com>.
Good point Bas.

While I am all for deprecation, I agree this one is very much "hard-wired"
into many users of Airflow, for many likely this is almost a  muscle-memory
or copying from existing DAGs.
It's not enough to mandate it but we also should be empathetic and provide
a viable replacement.

Unless there are good reasons (loudly thinking - performance? scheduling
complexity? the delete/recreate scenario and edge cases involved? ) I like
the proposal of start_date defaulting to DAG creation date.
It is fully backwards-compatible, might be a nice feature of 2.3.0 and it
is much more user-friendly than having to figure out artificial date and
"catchup=False" to achieve the same.
It does break a little the premise of a bit unwritten rule - "everything
needed to create a DAG needs to be in the dag python code" - because the
"creation date" is only stored in the database, but actually the start-date
is not really a "DAG" structural property when you think of it, so this is
not a real "DAG structure validation". Since all our DAGs are now stored in
serialized form in the DB in order to be scheduled, using "creation_date"
to fill the "start_date" makes perfect sense actually.

J,

On Tue, Feb 1, 2022 at 9:55 AM Bas Harenslak <ba...@astronomer.io.invalid>
wrote:

> Regardless the behaviour of days_ago(), a lot of people use it so we’ll
> definitely need to document it well with some good examples of alternatives.
>
> That said, I think the usage of days_ago() is actually a side-effect of
> users that don’t really need their DAGs to start at X days ago, but want
> their DAGs to “just run”. Airflow requiring a start_date forces people to
> set something which they often do using days_ago(). Having Airflow default
> the start_date to the date a DAG was added would take away the need for
> days_ago().
>
> Bas
>
> On 1 Feb 2022, at 05:33, Tzu-ping Chung <tp...@astronomer.io.INVALID> wrote:
>
> I was brought here by
> https://github.com/apache/airflow/pull/20508#issuecomment-1026414890
> Also +1 to deprecation from me. Since the function cannot be safely used
> in start_date and end_date, the only sensible way for a general,
> non-advanced user to use the function is in a task Python callable (e.g.
> @task function). But importing Airflow in a task callable is always a bad
> practice since it can slow things down way too much, and a more lightweight
> solution (e.g. Pendulum as Daniel mentioned) is much preferred. Conversely,
> by having the function in Airflow core, we are somewhat suggesting the
> function can be used in DAG definition, which is bad. The presence of the
> function does not provide any advantages.
>
> TP
>
> On Jan 6 2022, at 12:11 pm, Josh Fell <jo...@astronomer.io.invalid>
> wrote:
>
>
> +1 for deprecation as well.
>
> `days_ago()` was removed from example DAGs and other documentation since
> it was mainly being used for dynamic `start_date` values which is not a
> best practice in DAG authoring. Seemed to create more confusion and odd
> behavior than value.
>
> On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> I'd be for deprecating it. It's too easy to use with too much too
> loose and too little value. I see no real "business" value in it.
>
> On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
> >
> > Yeah that's correct. Sorry, I should have used `pendulum.today`.   But
> yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while
> `days_ago` uses timedelta it's the same when there's no DST is involved)
> >
> >
> > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> >>
> >> days_ago is not just the same as utcnow minus N days, it is always
> "truncated" to the start of the day, so it's closer to
> "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
> >>
> >>
> >> On 28 December 2021 00:08:53 GMT, Daniel Standish <
> daniel.standish@astronomer.io.INVALID> wrote:
> >>>
> >>> I recall some time ago we removed `days_ago` from all  example dags.
> Not sure why we didn't also deprecate it.
> >>>
> >>> For reference, `days_ago(N)` returns utcnow minus N days.
> >>>
> >>> There's a PR to make it return a value in the default timezone, so
> that when you use it in an expression for dag `start_date`, the dag will be
> in the default timezone.
> >>>
> >>> I don't want to get into the merits of that here.  But even assuming
> that this would be desirable, there's still some ambiguity we'd have to
> resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
> timedelta(N)` would do) or should we return now minus N days (as
> pendulum.now().add(days=-N)  would do)?  Because of DST the two different
> approaches result in values that differ by 1 hour.
> >>>
> >>> What I do want to explore here is whether folks think we can / should
> just deprecate the function entirely.  Personally this would be my
> preference.  Using `days_ago(5)` is not much more convenient than
> `dttm.add(days=-N)`.   And the latter has the benefit that it is
> unambiguous, doesn't make assumptions, and doesn't get in the way between
> user and library.
> >>>
> >>> So my proposal would be, don't change the behavior of `days_ago` and
> deprecate it with removal targeted in 3.0.
> >>>
>
>
>

Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Bas Harenslak <ba...@astronomer.io.INVALID>.
Regardless the behaviour of days_ago(), a lot of people use it so we’ll definitely need to document it well with some good examples of alternatives.

That said, I think the usage of days_ago() is actually a side-effect of users that don’t really need their DAGs to start at X days ago, but want their DAGs to “just run”. Airflow requiring a start_date forces people to set something which they often do using days_ago(). Having Airflow default the start_date to the date a DAG was added would take away the need for days_ago().

Bas

> On 1 Feb 2022, at 05:33, Tzu-ping Chung <tp...@astronomer.io.INVALID> wrote:
> 
> I was brought here by https://github.com/apache/airflow/pull/20508#issuecomment-1026414890 <https://github.com/apache/airflow/pull/20508#issuecomment-1026414890>
> Also +1 to deprecation from me. Since the function cannot be safely used in start_date and end_date, the only sensible way for a general, non-advanced user to use the function is in a task Python callable (e.g. @task function). But importing Airflow in a task callable is always a bad practice since it can slow things down way too much, and a more lightweight solution (e.g. Pendulum as Daniel mentioned) is much preferred. Conversely, by having the function in Airflow core, we are somewhat suggesting the function can be used in DAG definition, which is bad. The presence of the function does not provide any advantages.
> 
> TP
> 
> On Jan 6 2022, at 12:11 pm, Josh Fell <jo...@astronomer.io.invalid> wrote:
> 
> +1 for deprecation as well. 
> 
> `days_ago()` was removed from example DAGs and other documentation since it was mainly being used for dynamic `start_date` values which is not a best practice in DAG authoring. Seemed to create more confusion and odd behavior than value.
> 
> On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <jarek@potiuk.com <ma...@potiuk.com>> wrote:
> I'd be for deprecating it. It's too easy to use with too much too
> loose and too little value. I see no real "business" value in it.
> 
> On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
> >
> > Yeah that's correct. Sorry, I should have used `pendulum.today`.   But yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while `days_ago` uses timedelta it's the same when there's no DST is involved)
> >
> >
> > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <ash@apache.org <ma...@apache.org>> wrote:
> >>
> >> days_ago is not just the same as utcnow minus N days, it is always "truncated" to the start of the day, so it's closer to "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
> >>
> >>
> >> On 28 December 2021 00:08:53 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
> >>>
> >>> I recall some time ago we removed `days_ago` from all  example dags.  Not sure why we didn't also deprecate it.
> >>>
> >>> For reference, `days_ago(N)` returns utcnow minus N days.
> >>>
> >>> There's a PR to make it return a value in the default timezone, so that when you use it in an expression for dag `start_date`, the dag will be in the default timezone.
> >>>
> >>> I don't want to get into the merits of that here.  But even assuming that this would be desirable, there's still some ambiguity we'd have to resolve.  Namely, should we return `now minus N 24-hour periods` (as `now - timedelta(N)` would do) or should we return now minus N days (as pendulum.now().add(days=-N)  would do)?  Because of DST the two different approaches result in values that differ by 1 hour.
> >>>
> >>> What I do want to explore here is whether folks think we can / should just deprecate the function entirely.  Personally this would be my preference.  Using `days_ago(5)` is not much more convenient than `dttm.add(days=-N)`.   And the latter has the benefit that it is unambiguous, doesn't make assumptions, and doesn't get in the way between user and library.
> >>>
> >>> So my proposal would be, don't change the behavior of `days_ago` and deprecate it with removal targeted in 3.0.
> >>>


Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Tzu-ping Chung <tp...@astronomer.io.INVALID>.
I was brought here by https://github.com/apache/airflow/pull/20508#issuecomment-1026414890

Also +1 to deprecation from me. Since the function cannot be safely used in start_date and end_date, the only sensible way for a general, non-advanced user to use the function is in a task Python callable (e.g. @task function). But importing Airflow in a task callable is always a bad practice since it can slow things down way too much, and a more lightweight solution (e.g. Pendulum as Daniel mentioned) is much preferred. Conversely, by having the function in Airflow core, we are somewhat suggesting the function can be used in DAG definition, which is bad. The presence of the function does not provide any advantages.

TP
On Jan 6 2022, at 12:11 pm, Josh Fell <jo...@astronomer.io.invalid> wrote:
>
> +1 for deprecation as well.
>
> `days_ago()` was removed from example DAGs and other documentation since it was mainly being used for dynamic `start_date` values which is not a best practice in DAG authoring. Seemed to create more confusion and odd behavior than value.
> On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <jarek@potiuk.com (mailto:jarek@potiuk.com)> wrote:
> > I'd be for deprecating it. It's too easy to use with too much too
> > loose and too little value. I see no real "business" value in it.
> >
> > On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
> > <da...@astronomer.io.invalid> wrote:
> > >
> > > Yeah that's correct. Sorry, I should have used `pendulum.today`. But yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while `days_ago` uses timedelta it's the same when there's no DST is involved)
> > >
> > >
> > > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > >>
> > >> days_ago is not just the same as utcnow minus N days, it is always "truncated" to the start of the day, so it's closer to "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
> > >>
> > >>
> > >> On 28 December 2021 00:08:53 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
> > >>>
> > >>> I recall some time ago we removed `days_ago` from all example dags. Not sure why we didn't also deprecate it.
> > >>>
> > >>> For reference, `days_ago(N)` returns utcnow minus N days.
> > >>>
> > >>> There's a PR to make it return a value in the default timezone, so that when you use it in an expression for dag `start_date`, the dag will be in the default timezone.
> > >>>
> > >>> I don't want to get into the merits of that here. But even assuming that this would be desirable, there's still some ambiguity we'd have to resolve. Namely, should we return `now minus N 24-hour periods` (as `now - timedelta(N)` would do) or should we return now minus N days (as pendulum.now().add(days=-N) would do)? Because of DST the two different approaches result in values that differ by 1 hour.
> > >>>
> > >>> What I do want to explore here is whether folks think we can / should just deprecate the function entirely. Personally this would be my preference. Using `days_ago(5)` is not much more convenient than `dttm.add(days=-N)`. And the latter has the benefit that it is unambiguous, doesn't make assumptions, and doesn't get in the way between user and library.
> > >>>
> > >>> So my proposal would be, don't change the behavior of `days_ago` and deprecate it with removal targeted in 3.0.
> > >>>
>
>


Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Josh Fell <jo...@astronomer.io.INVALID>.
+1 for deprecation as well.

`days_ago()` was removed from example DAGs and other documentation since it
was mainly being used for dynamic `start_date` values which is not a best
practice in DAG authoring. Seemed to create more confusion and odd behavior
than value.

On Tue, Dec 28, 2021 at 7:00 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I'd be for deprecating it. It's too easy to use with too much too
> loose and too little value. I see no real "business" value in it.
>
> On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
> <da...@astronomer.io.invalid> wrote:
> >
> > Yeah that's correct. Sorry, I should have used `pendulum.today`.   But
> yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while
> `days_ago` uses timedelta it's the same when there's no DST is involved)
> >
> >
> > On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> >>
> >> days_ago is not just the same as utcnow minus N days, it is always
> "truncated" to the start of the day, so it's closer to
> "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
> >>
> >>
> >> On 28 December 2021 00:08:53 GMT, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
> >>>
> >>> I recall some time ago we removed `days_ago` from all  example dags.
> Not sure why we didn't also deprecate it.
> >>>
> >>> For reference, `days_ago(N)` returns utcnow minus N days.
> >>>
> >>> There's a PR to make it return a value in the default timezone, so
> that when you use it in an expression for dag `start_date`, the dag will be
> in the default timezone.
> >>>
> >>> I don't want to get into the merits of that here.  But even assuming
> that this would be desirable, there's still some ambiguity we'd have to
> resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
> timedelta(N)` would do) or should we return now minus N days (as
> pendulum.now().add(days=-N)  would do)?  Because of DST the two different
> approaches result in values that differ by 1 hour.
> >>>
> >>> What I do want to explore here is whether folks think we can / should
> just deprecate the function entirely.  Personally this would be my
> preference.  Using `days_ago(5)` is not much more convenient than
> `dttm.add(days=-N)`.   And the latter has the benefit that it is
> unambiguous, doesn't make assumptions, and doesn't get in the way between
> user and library.
> >>>
> >>> So my proposal would be, don't change the behavior of `days_ago` and
> deprecate it with removal targeted in 3.0.
> >>>
>

Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Jarek Potiuk <ja...@potiuk.com>.
I'd be for deprecating it. It's too easy to use with too much too
loose and too little value. I see no real "business" value in it.

On Tue, Dec 28, 2021 at 5:27 PM Daniel Standish
<da...@astronomer.io.invalid> wrote:
>
> Yeah that's correct. Sorry, I should have used `pendulum.today`.   But yeah also equivalent to `pendulum.today('UTC').add(days=-N)` (while `days_ago` uses timedelta it's the same when there's no DST is involved)
>
>
> On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>>
>> days_ago is not just the same as utcnow minus N days, it is always "truncated" to the start of the day, so it's closer to "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
>>
>>
>> On 28 December 2021 00:08:53 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
>>>
>>> I recall some time ago we removed `days_ago` from all  example dags.  Not sure why we didn't also deprecate it.
>>>
>>> For reference, `days_ago(N)` returns utcnow minus N days.
>>>
>>> There's a PR to make it return a value in the default timezone, so that when you use it in an expression for dag `start_date`, the dag will be in the default timezone.
>>>
>>> I don't want to get into the merits of that here.  But even assuming that this would be desirable, there's still some ambiguity we'd have to resolve.  Namely, should we return `now minus N 24-hour periods` (as `now - timedelta(N)` would do) or should we return now minus N days (as pendulum.now().add(days=-N)  would do)?  Because of DST the two different approaches result in values that differ by 1 hour.
>>>
>>> What I do want to explore here is whether folks think we can / should just deprecate the function entirely.  Personally this would be my preference.  Using `days_ago(5)` is not much more convenient than `dttm.add(days=-N)`.   And the latter has the benefit that it is unambiguous, doesn't make assumptions, and doesn't get in the way between user and library.
>>>
>>> So my proposal would be, don't change the behavior of `days_ago` and deprecate it with removal targeted in 3.0.
>>>

Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
Yeah that's correct. Sorry, I should have used `pendulum.today`.   But yeah
also equivalent to `pendulum.today('UTC').add(days=-N)` *(while `days_ago`
uses timedelta it's the same when there's no DST is involved)*


On Tue, Dec 28, 2021, 1:59 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> days_ago is not just the same as utcnow minus N days, it is always
> "truncated" to the start of the day, so it's closer to
> "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”
>
>
> On 28 December 2021 00:08:53 GMT, Daniel Standish
> <da...@astronomer.io.INVALID> wrote:
>>
>> I recall some time ago we removed `days_ago` from all  example dags.  Not
>> sure why we didn't also deprecate it.
>>
>> For reference, `days_ago(N)` returns utcnow minus N days.
>>
>> There's a PR to make it return a value in the default timezone, so that
>> when you use it in an expression for dag `start_date`, the dag will be in
>> the default timezone.
>>
>> I don't want to get into the merits of that here.  But even assuming that
>> this would be desirable, there's still some ambiguity we'd have to
>> resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
>> timedelta(N)` would do) or should we return now minus N days (as
>> pendulum.now().add(days=-N)  would do)?  Because of DST the two
>> different approaches result in values that differ by 1 hour.
>>
>> What I *do* want to explore here is whether folks think we can / should
>> just deprecate the function entirely.  Personally this would be my
>> preference.  Using `days_ago(5)` is not much more convenient than
>> `dttm.add(days=-N)`.   And the latter has the benefit that it is
>> unambiguous, doesn't make assumptions, and doesn't get in the way between
>> user and library.
>>
>> So my proposal would be, don't change the behavior of `days_ago` and
>> deprecate it with removal targeted in 3.0.
>>
>>

Re: [DISCUSS] deprecate `days_ago` dates utility

Posted by Ash Berlin-Taylor <as...@apache.org>.
days_ago is not just the same as utcnow minus N days, it is always "truncated" to the start of the day, so it's closer to "utcnow().replace(hour=0, minute=0, second=0) - timedelta(n)”


On 28 December 2021 00:08:53 GMT, Daniel Standish <da...@astronomer.io.INVALID> wrote:
>I recall some time ago we removed `days_ago` from all  example dags.  Not
>sure why we didn't also deprecate it.
>
>For reference, `days_ago(N)` returns utcnow minus N days.
>
>There's a PR to make it return a value in the default timezone, so that
>when you use it in an expression for dag `start_date`, the dag will be in
>the default timezone.
>
>I don't want to get into the merits of that here.  But even assuming that
>this would be desirable, there's still some ambiguity we'd have to
>resolve.  Namely, should we return `now minus N 24-hour periods` (as `now -
>timedelta(N)` would do) or should we return now minus N days (as
>pendulum.now().add(days=-N)  would do)?  Because of DST the two
>different approaches result in values that differ by 1 hour.
>
>What I *do* want to explore here is whether folks think we can / should
>just deprecate the function entirely.  Personally this would be my
>preference.  Using `days_ago(5)` is not much more convenient than
>`dttm.add(days=-N)`.   And the latter has the benefit that it is
>unambiguous, doesn't make assumptions, and doesn't get in the way between
>user and library.
>
>So my proposal would be, don't change the behavior of `days_ago` and
>deprecate it with removal targeted in 3.0.