Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/30 00:34:02 UTC

[GitHub] [airflow] gor-obr opened a new issue #7999: Incorrect DAG scheduling after DST

gor-obr opened a new issue #7999: Incorrect DAG scheduling after DST
URL: https://github.com/apache/airflow/issues/7999
 
 
   **Apache Airflow version**: 1.10.9, 2.0.0dev
   
   **What happened**:
   
   If a DAG's cron expression has a "non-trivial" hours field, the scheduler does not schedule the DAG correctly immediately after a DST switch. Here, "non-trivial" means the hours field is either a range (`0 7-8 * * *`) or a list of values (`0 7,9 * * *`).
   
   If the DST switch occurs in the morning (between 2 and 3 AM on March 8th), the following executions will occur:
   
   ```
   0 7-8 * * *
   "2020-03-07T07:00:00-08:00"
   "2020-03-07T08:00:00-08:00" # DST switch after this run
   "2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM
   
   0 7,9 * * *
   
   "2020-03-07T07:00:00-08:00"
   "2020-03-07T09:00:00-08:00"
   "2020-03-08T07:00:00-07:00" # DST switch after this run
   "2020-03-08T08:00:00-07:00" # 8 AM instead of 7 AM
   ```
   
   The cause is the method `is_fixed_time_schedule` in `dag.py`, which tests whether a cron expression is "fixed" (i.e. "execute exactly at this time") or "relative" (i.e. "execute every n hours"). The method relies on a rather crude test: it calculates two consecutive run times and checks whether both their hours and their minutes are the same:
   
   ```python
   now = datetime.now()
   cron = croniter(self._schedule_interval, now)
   
   start = cron.get_next(datetime)
   cron_next = cron.get_next(datetime)
   
   if cron_next.minute == start.minute and cron_next.hour == start.hour:
       return True
   
   return False
   ```
   
   This test fails for the examples above: the DAG runs at two different hours each day, so the hours of two consecutive executions are never the same.
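
To illustrate why the check misclassifies such schedules, here is a minimal standard-library sketch. It does not use `croniter`; the helpers `next_fire_times` and `looks_fixed` are hypothetical names invented for this illustration, and they only model crons of the form `0 <hours> * * *`. The comparison itself mirrors the one quoted above.

```python
from datetime import datetime, timedelta


def next_fire_times(hours, after, count=2):
    """Return the next `count` fire times of a daily '0 <hours> * * *' cron.

    Hypothetical stand-in for croniter: `hours` is the set of hours in the
    cron's hours field, and all datetimes are naive wall-clock values.
    """
    times = []
    # Start at the first full hour strictly after `after`.
    candidate = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while len(times) < count:
        if candidate.hour in hours:
            times.append(candidate)
        candidate += timedelta(hours=1)
    return times


def looks_fixed(hours, now):
    """Same comparison as the quoted check: two consecutive runs, same hour and minute."""
    start, cron_next = next_fire_times(hours, now)
    return cron_next.minute == start.minute and cron_next.hour == start.hour


now = datetime(2020, 3, 7, 0, 30)
print(looks_fixed({7}, now))     # 0 7 * * *    -> True
print(looks_fixed({7, 8}, now))  # 0 7-8 * * *  -> False
print(looks_fixed({7, 9}, now))  # 0 7,9 * * *  -> False
```

Under these assumptions, only the single-hour schedule is recognized as "fixed", even though all three run at fixed wall-clock times every day.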
   
   Based on this, the method `following_schedule` in `dag.py` concludes that the DAG should run "every n hours" and explicitly works around DST. It calculates the amount of time that needs to pass until the next run and adds that amount to the previous run, thus ignoring DST.
   
   ```python
   # We assume that DST transitions happen on the minute/hour
   if not self.is_fixed_time_schedule():
       # relative offset (eg. every 5 minutes)
       delta = cron.get_next(datetime) - naive
       following = dttm.in_timezone(self.timezone).add_timedelta(delta)
   else:
       # absolute (e.g. 3 AM)
       naive = cron.get_next(datetime)
       tz = pendulum.timezone(self.timezone.name)
       following = timezone.make_aware(naive, tz)
   ```
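
The arithmetic behind the first example (`0 7-8 * * *`) can be sketched with the standard library alone. This is an illustration rather than Airflow's code: it replaces `pendulum` with two fixed-offset timezones (`PST` and `PDT`, names chosen here) that match the `-08:00` and `-07:00` offsets in the output above, and it hard-codes the naive run times instead of asking `croniter` for them.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets standing in for a real DST-aware timezone.
PST = timezone(timedelta(hours=-8))  # offset before the switch
PDT = timezone(timedelta(hours=-7))  # offset after the switch

# Previous run: last execution before the DST switch.
prev_run = datetime(2020, 3, 7, 8, 0, tzinfo=PST)
naive_prev = datetime(2020, 3, 7, 8, 0)
# Next wall-clock match of "0 7-8 * * *" after the previous run.
naive_next = datetime(2020, 3, 8, 7, 0)

# "Relative" branch: add the naive difference as elapsed time.
delta = naive_next - naive_prev  # 23 hours
relative_following = (prev_run + delta).astimezone(PDT)

# "Absolute" branch: attach the post-switch offset to the naive time.
absolute_following = naive_next.replace(tzinfo=PDT)

print(relative_following.isoformat())  # 2020-03-08T08:00:00-07:00
print(absolute_following.isoformat())  # 2020-03-08T07:00:00-07:00
```

The 23-hour delta is correct on the wall clock, but because the day of the switch loses an hour, adding it as elapsed time lands at 8 AM local time instead of 7 AM.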
   
   Note that only the first execution (or first few executions) will be affected. After them, the calculation will stabilize and will work correctly going forward.
   
   This describes the same issue as https://issues.apache.org/jira/browse/AIRFLOW-7039
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] boring-cyborg[bot] commented on issue #7999: Incorrect DAG scheduling after DST

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #7999: Incorrect DAG scheduling after DST
URL: https://github.com/apache/airflow/issues/7999#issuecomment-605727273
 
 
   Thanks for opening your first issue here! Be sure to follow the issue template!
   


[GitHub] [airflow] mik-laj edited a comment on issue #7999: Incorrect DAG scheduling after DST

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #7999: Incorrect DAG scheduling after DST
URL: https://github.com/apache/airflow/issues/7999#issuecomment-605729478
 
 
   Do you have any proposition for solutions or workarounds for this problem??


[GitHub] [airflow] mik-laj commented on issue #7999: Incorrect DAG scheduling after DST

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #7999: Incorrect DAG scheduling after DST
URL: https://github.com/apache/airflow/issues/7999#issuecomment-605729478
 
 
   Do you have any suggestions for solutions?


[GitHub] [airflow] gor-obr commented on issue #7999: Incorrect DAG scheduling after DST

Posted by GitBox <gi...@apache.org>.
gor-obr commented on issue #7999: Incorrect DAG scheduling after DST
URL: https://github.com/apache/airflow/issues/7999#issuecomment-605740563
 
 
   > Do you have any proposition for solutions or workarounds for this problem??
   
   I added broader context with some possible solutions to the description of the bug. As I don't have any experience with the project, someone from the community is probably better suited to evaluate the alternatives.
   
