Posted to commits@airflow.apache.org by "George Leslie-Waksman (JIRA)" <ji...@apache.org> on 2018/01/08 20:56:00 UTC

[jira] [Commented] (AIRFLOW-1930) start_date and execution_date should default to timezone.utcnow() not to func.now()

    [ https://issues.apache.org/jira/browse/AIRFLOW-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317049#comment-16317049 ] 

George Leslie-Waksman commented on AIRFLOW-1930:
------------------------------------------------

1. This is a strong point, and I empathize with the maintainability issue. I do, however, worry that allowing race conditions due to clock skew will cost more, in bugs that are hard to track down and fix, than maintaining `now` functions for different databases.

2. I don't understand why we need to worry about database configuration. If the column is timezone-aware and each time is written with the timezone it is actually in, won't using tz-aware datetime objects take care of the rest for us? If the DB is in PST and knows it, things should "just work". If the DB is in PST and thinks it's in EST, I don't see why that should be Airflow's responsibility to figure out.

3. For me, it's less about added value and more about decreased risk. Clock skew is rarely an issue, but it does happen, and we want Airflow to be resilient to it. Time servers go down, NTP fails, light only travels so fast. Celery will certainly complain, but it won't necessarily do anything to mitigate the problem. This creates a possibility where a scheduler's clock could be running fast, a worker's clock could be running slow, and we could end up with tasks that start (and finish) before they are even scheduled (according to the metadata db). Or, similarly, you could have tasks finish before their dependencies start (again according to the metadata db).
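
To make point 2 concrete, here is a minimal sketch using only the Python standard library. The fixed-offset `PST` and `EST` zones and the sample instant are assumptions for illustration, not anything taken from Airflow or from a real database configuration.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones, assumed for illustration (no DST handling).
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")

# One instant, expressed in UTC and in PST.
instant_utc = datetime(2018, 1, 8, 20, 56, tzinfo=timezone.utc)
instant_pst = instant_utc.astimezone(PST)

# A correctly labelled tz-aware value compares equal to the UTC instant,
# so the zone it was written in does not matter.
same_instant = instant_pst == instant_utc

# A PST wall-clock reading that is wrongly labelled as EST is a different
# instant; the error is exactly the difference between the two offsets.
mislabelled = instant_pst.replace(tzinfo=EST)
error = mislabelled - instant_utc

print(same_instant)  # True
print(error)         # -1 day, 21:00:00 (i.e. minus three hours)
```

The second case is the "DB is in PST and thinks it's in EST" situation: the values are consistently wrong, and nothing on the client side can detect that from the data alone.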

I would think we want to use a single source of truth for time, if at all possible. So, I'd say we want to use server time for everything.
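
A minimal sketch of the skew scenario from point 3, and of why a single clock avoids it. The offsets, the event timeline, and the helper `local_reading` are hypothetical. The sketch assumes the scheduler's clock is 30 seconds ahead and the worker's clock is 30 seconds behind, which is the direction that produces the inversion.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical clock offsets relative to true time (illustrative values only).
SCHEDULER_SKEW = timedelta(seconds=30)   # scheduler clock runs ahead
WORKER_SKEW = timedelta(seconds=-30)     # worker clock runs behind
DB_SKEW = timedelta(seconds=7)           # the single clock may be off as well


def local_reading(true_time, skew):
    """Timestamp a host would record if its clock is off by `skew`."""
    return true_time + skew


# True order of events: scheduled, started 5 s later, finished 10 s later.
t0 = datetime(2018, 1, 8, 20, 0, 0, tzinfo=timezone.utc)
true_events = [t0, t0 + timedelta(seconds=5), t0 + timedelta(seconds=10)]

# Each host stamps its own events with its own clock.
scheduled_at = local_reading(true_events[0], SCHEDULER_SKEW)
started_at = local_reading(true_events[1], WORKER_SKEW)
finished_at = local_reading(true_events[2], WORKER_SKEW)
skewed_order_ok = scheduled_at <= started_at <= finished_at

# Every event stamped by one clock: the recorded order matches the true
# order, whatever that clock's own offset is.
single_source = [local_reading(t, DB_SKEW) for t in true_events]
single_order_ok = single_source == sorted(single_source)

print(skewed_order_ok)  # False: the task "finished" before it was scheduled
print(single_order_ok)  # True
```

The single clock does not need to be accurate for this to hold, only shared: a constant offset shifts every timestamp equally and leaves the ordering intact.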

In what situations won't `sql_utcnow` work?

> start_date and execution_date should default to timezone.utcnow() not to func.now()
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1930
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1930
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Bolke de Bruin
>            Assignee: Bolke de Bruin
>             Fix For: 1.9.1
>
>
> func.now() defaults to the time zone of the database, while we assume every date in the db is UTC. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)