You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2019/10/29 18:49:47 UTC

[GitHub] [airflow] dstandish commented on a change in pull request #5787: [AIRFLOW-5172] Add choice of interval edge scheduling

dstandish commented on a change in pull request #5787: [AIRFLOW-5172] Add choice of interval edge scheduling
URL: https://github.com/apache/airflow/pull/5787#discussion_r340259297
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -648,7 +648,7 @@ def create_dag_run(self, dag, session=None):
             if dag.schedule_interval == '@once':
                 period_end = next_run_date
             elif next_run_date:
-                period_end = dag.following_schedule(next_run_date)
+                period_end = dag.period_end(next_run_date)
 
 Review comment:
   it seems there is an inconsistency in the language here.  
   
   The function is called `period_end`, yet sometimes it returns the start of the interval. 
   
   The name `period_end` for this variable reflects an assumption that dags are scheduled after the _end_ of the interval.  
   
   That's why the code checks that `period_end <= timezone.utcnow()`: end of interval is in the past.
   
   But language of this PR is in conflict with that.  
   
   The parameter `schedule_at_interval_end` implies that the _interval_ doesn't change, but _where we schedule_ does.  So, we may schedule at start or end of "the interval".  But as written, if `schedule_at_interval_end=False`, it will in general be the case that `period_end==excecution_date`, which implies that exec date is the end of "the interval" and not the start, and this is a contradiction.
   
   It seems that what `period_end` represents in this code is more like `run_after_dttm` -- the datetime before which the dag may not be scheduled.  When `schedule_at_interval_end` is True, we can run after exec date + 1 interval; otherwise, we can run after exec date.  
   
   So in this bit of code, we probably don't even need `period_end()` because we could do this:
   ```
               run_after_dttm = None
               if dag.schedule_interval == '@once':
                   run_after_dttm = next_run_date
               elif next_run_date and not self.schedule_at_interval_end:
                   run_after_dttm = next_run_date
               elif next_run_date and self.schedule_at_interval_end:
                   run_after_dttm = dag.following_schedule(next_run_date)
   ```
   And this:
   ```
               if next_run_date and run_after_dttm and run_after_dttm <= timezone.utcnow():
   ```
   
   But elsewhere, it seems that `period_end()` function is used to mean `min_run_date` or `target_run_date` or `run_after_date`.  Perhaps a name like this would be clearer.
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services