Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2017/11/13 18:09:05 UTC

Making Airflow Timezone aware

Hi All,

I just want to make you aware that I am creating patches that make Airflow timezone aware. The gist of the idea is that Airflow will use and store UTC everywhere internally. This allows you to have start_date = datetime(2017, 1, 1, tzinfo="Europe/Amsterdam") and Airflow will properly take care of daylight saving time (DST). If you are using cron, we will make sure to always run at the exact time you specify (end of interval, of course), even when DST is in effect; e.g. 8.00am is always 8.00am, regardless of whether a DST switch has happened. DAGs that don’t have a timezone associated get a default timezone that is configurable.
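
To make the "store UTC, think in local time" idea concrete, here is a minimal sketch. It is not Airflow code: it uses Python's standard zoneinfo module (Python 3.9+, and it assumes the system has the IANA timezone database installed), and to_utc is a hypothetical helper name. Note that the tzinfo argument has to be a tzinfo object rather than a plain string.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def to_utc(local_dt: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC, the form that would be stored."""
    if local_dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    return local_dt.astimezone(timezone.utc)


# 08:00 Amsterdam wall-clock time, once in winter (CET) and once in summer (CEST).
winter = datetime(2017, 1, 1, 8, 0, tzinfo=AMSTERDAM)
summer = datetime(2017, 7, 1, 8, 0, tzinfo=AMSTERDAM)

print(to_utc(winter))  # 2017-01-01 07:00:00+00:00
print(to_utc(summer))  # 2017-07-01 06:00:00+00:00
```

The same local 08:00 maps to two different UTC instants, which is why the timezone has to travel with the DAG rather than being baked into a fixed offset.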

In AIRFLOW-288 I am tracking what needs to be done, but I am 80% there. As the patches are invasive, particularly in tests (basically everything needs a timezone) and less so in other areas, I would like to draw special attention to a couple of places where this has impact.

1. All database DateTime fields are converted to timezone-aware Timestamp fields. This impacts MySQL deployments in particular, as MySQL was storing DateTime fields, which cannot be made timezone aware. Also, to make sure conversion happens properly, we set the connection time zone to UTC. This is supported by Postgres and MySQL. However, it is not supported by SQL Server. So if you are running outside of UTC, you need to take special care when upgrading.

2. Thou shalt not use datetime.now() and datetime.utcnow() when writing code for core Airflow (operators, sensors, scheduler, etc.); in DAGs you can still use them. Both create naive datetimes (yes, even utcnow()). You can use utcnow() from airflow.utils.timezone instead. As you will not be able to store naive datetime fields anymore, you will notice soon enough.
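
The difference between naive and aware datetimes can be illustrated with the standard library alone. This is only a sketch: is_naive and aware_utcnow are hypothetical names used for illustration and are not the Airflow helpers themselves.

```python
from datetime import datetime, timezone


def is_naive(dt: datetime) -> bool:
    """Return True when dt carries no usable UTC offset."""
    return dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None


def aware_utcnow() -> datetime:
    """Hypothetical helper: the current time in UTC, with tzinfo attached."""
    return datetime.now(timezone.utc)


naive_now = datetime.now()   # tzinfo is None, just as with datetime.utcnow()
aware_now = aware_utcnow()   # tzinfo is timezone.utc

print(is_naive(naive_now))   # True
print(is_naive(aware_now))   # False

# Mixing the two kinds in an ordering comparison fails with a TypeError.
try:
    naive_now < aware_now
except TypeError as exc:
    print("cannot compare:", exc)
```

A naive value gives no way to tell which offset was meant, which is why it cannot safely be written to a timezone-aware column.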

Finally, and that is the main reason for this email, I am looking for feedback and testers. The PR (https://github.com/apache/incubator-airflow/pull/2781) doesn’t pass the tests yet, but you can see that I am working hard on that ;-).

Cheers
Bolke

Re: Making Airflow Timezone aware

Posted by Bolke de Bruin <bd...@gmail.com>.
Tests are passing now on all tested databases. Please try it out. I do expect some quirks, mainly in the user interface.

If you are using MariaDB/MySQL, it is highly advised to set "explicit_defaults_for_timestamp = on" in your my.cnf [1].

https://github.com/apache/incubator-airflow/pull/2781

I have tried to keep the commit history as clean as possible, but this is a major change, mostly in *tests* and *views*. Core has been updated here and there, but I made no changes to the scheduler code.

Bolke

[1] https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp



Re: Making Airflow Timezone aware

Posted by "Daniel (Daniel Lamblin) [BDP - Seoul]" <la...@coupang.com>.
I agree it's a good idea.

Instead of posters I've been using time.is; e.g. time.is/gmt, time.is/utc and time.is/z are the same thing. It's also got time.is/unix, and https://time.is/compare/now_in_KST/PST/UTC for offsets and a table.




Re: Making Airflow Timezone aware

Posted by Rob Goretsky <ro...@gmail.com>.
This will be huge for my team at MLB.com!  Really appreciate your work on this, Bolke!  We will finally be able to take down the posters we've all hung up at our desks that show the current GMT offset!  Let us know how/when we can try it out!

-rob 


Re: Making Airflow Timezone aware

Posted by George Leslie-Waksman <ge...@cloverhealth.com.INVALID>.
Really happy to hear this moving forward. Thanks Bolke!


Re: Making Airflow Timezone aware

Posted by Bolke de Bruin <bd...@gmail.com>.
See inline answers below.

Sent from my iPad

> On 14 Nov 2017, at 16:33, Heistermann, Till <Ti...@blue-yonder.com> wrote:
> 
> Hi Bolke,
> 
> This looks great.
> 
> We have had the requirement to run DAGs in different local time zones for a while, so far we worked around the limitation on dag-level to automate most of our DST switches.
> 
> How would the approach behave in the DST-Switch corner cases?
> 
> For the regular case, I understand that if start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and the schedule is "30 8 * * *", the DST switch would work as expected, and the DAG would get scheduled at 7:30 am UTC in European winter and 6:30 am UTC in European summer.

Actually no. For cron-defined schedules we will always use local time, but naive. This means your 8.30 schedule will always happen at 8.30 local time, regardless of DST.

> 
> However, if start_date=datetime(2017, 1, 1, 2, 30, 0, tzinfo="Europe/Amsterdam") and the schedule is "30 2 * * *", would we skip a nightly run in March and have two nightly runs in October?
> This seems like the correct thing to do from a time zone logic point of view, although I can imagine that there are many operational use cases where the user wants something different.

I have to verify what happens. I think what will happen is that it will run at 3.30, as we convert to naive local time (DST-unaware), add the interval, and convert back to UTC. UTC will then translate to 3.30 local time, which is btw equal to 2.30 local time.
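
The convert, add, convert-back sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the scheduler's code: it uses the standard zoneinfo module (assuming the IANA timezone database is installed), a fixed one-day step stands in for the cron expression, and next_daily_run is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def next_daily_run(previous_utc: datetime, tz: ZoneInfo) -> datetime:
    """Step one day ahead on the local wall clock and return the result in UTC."""
    # 1. Convert to local time and drop the tzinfo (naive, DST-unaware).
    naive_local = previous_utc.astimezone(tz).replace(tzinfo=None)
    # 2. Add the interval on the wall clock.
    next_naive_local = naive_local + timedelta(days=1)
    # 3. Re-attach the timezone and convert back to UTC.
    return next_naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


# Last run before the 2017 spring switch: 25 March, 02:30 Amsterdam time (CET).
previous = datetime(2017, 3, 25, 2, 30, tzinfo=AMSTERDAM).astimezone(timezone.utc)
following = next_daily_run(previous, AMSTERDAM)

print(following)                        # 2017-03-26 01:30:00+00:00
print(following.astimezone(AMSTERDAM))  # 2017-03-26 03:30:00+02:00
```

On 26 March 2017 the wall-clock time 02:30 does not exist in Amsterdam. With zoneinfo, the nonexistent time is interpreted using the offset from before the switch, so the run lands on 01:30 UTC, which displays as 03:30 local time. Other timezone libraries may resolve the gap differently.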

Execution_date will be in UTC. The DAG will store time zone information so you can decide yourself what you want to do with that.


> 
> If start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and the schedule is timedelta(days=14), would a DST switch actually occur?
> There is some ambiguity in this case, depending on the timedelta(days=14) being understood as either “14 days in local calendar” or 14*24*60*60 seconds on the system clock.
> I’m not sure what the expected behaviour should be in this case.

For timedeltas, DST is in effect. It is assumed here that you want to run X hours later, not at a specific time. Obviously, if you want to keep the old behavior (and this is the default), keep your timezone at UTC.
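
One reading of "X hours later" is that the interval is added to the absolute (UTC) time, so the local wall-clock time shifts when a DST switch falls inside the interval. A small sketch of that reading, using the standard zoneinfo module (assuming the IANA timezone database is installed) and an illustrative start date chosen to straddle the 2017 spring switch:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")

# Illustrative start: 19 March 2017, 08:30 Amsterdam time (CET, UTC+1).
start_local = datetime(2017, 3, 19, 8, 30, tzinfo=AMSTERDAM)
start_utc = start_local.astimezone(timezone.utc)

# Add the interval in UTC, i.e. exactly 14 * 24 hours of elapsed time.
next_utc = start_utc + timedelta(days=14)
next_local = next_utc.astimezone(AMSTERDAM)

print(next_utc)    # 2017-04-02 07:30:00+00:00
print(next_local)  # 2017-04-02 09:30:00+02:00
```

The arithmetic is done in UTC on purpose: in Python, adding a timedelta directly to an aware local datetime moves the wall clock and keeps 08:30, which would correspond to the "14 days in local calendar" reading instead.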


Re: Making Airflow Timezone aware

Posted by "Heistermann, Till" <Ti...@blue-yonder.com>.
Hi Bolke,

This looks great.

We have had the requirement to run DAGs in different local time zones for a while; so far we have worked around the limitation at the DAG level to automate most of our DST switches.

How would the approach behave in the DST-Switch corner cases?

For the regular case, I understand that if start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo=“Europe/Amsterdam”)  and the  schedule is “30 8 * * *”, the DST switch would work as expected, and the dag would get scheduled at 7:30 am UTC in European Winter and 6:30 UTC in European Summer.

However, if start_date=datetime(2017, 1, 1, 2, 30, 0, tzinfo=“Europe/Amsterdam”)  and the schedule is “30 2 * * *”, would we skip a nightly run in March and have two nightly runs in October?
This seems like the correct thing to do from a time zone logic point of view, although I can imagine that there are many operational use cases where the user wants something different.

If start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and the schedule is timedelta(days=14), would a DST switch actually occur?
There is some ambiguity in this case, depending on whether timedelta(days=14) is understood as “14 days in the local calendar” or as 14*24*60*60 seconds on the system clock.
I’m not sure what the expected behaviour should be in this case.
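
The two readings can be compared directly in plain Python. The sketch below takes a hypothetical run at 08:30 local time on 2017-03-20 (chosen so that the next interval spans the March switch) and adds 14 days in both ways. It again assumes the standard-library zoneinfo module and is not a statement about what the patch does.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: zoneinfo is used for illustration only.
AMSTERDAM = ZoneInfo("Europe/Amsterdam")
INTERVAL = timedelta(days=14)

# Hypothetical run date shortly before the 2017-03-26 DST switch.
start = datetime(2017, 3, 20, 8, 30, tzinfo=AMSTERDAM)

# Reading 1, "14 days in the local calendar": Python adds the timedelta to the
# wall-clock fields and keeps tzinfo, so the local time of day is preserved.
calendar_next = start + INTERVAL

# Reading 2, 14*24*60*60 elapsed seconds: do the arithmetic in UTC, then convert back.
elapsed_next = (start.astimezone(timezone.utc) + INTERVAL).astimezone(AMSTERDAM)

# Real elapsed time for reading 1, measured in UTC.
calendar_elapsed = calendar_next.astimezone(timezone.utc) - start.astimezone(timezone.utc)

print(calendar_next.isoformat())  # 2017-04-03T08:30:00+02:00
print(elapsed_next.isoformat())   # 2017-04-03T09:30:00+02:00
print(calendar_elapsed)           # 13 days, 23:00:00
```

Under the first reading the run stays at 08:30 local time but the interval is an hour short; under the second the interval is exact but the local run time drifts to 09:30.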

Cheers,
Till




Re: Making Airflow Timezone aware

Posted by Ash Berlin-Taylor <as...@firemirror.com>.
This sounds like an awesome change!

I'm happy to review (will take a look tomorrow) but won't be a suitable tester as all our DAGs operate in UTC.

-ash

