Posted to dev@airflow.apache.org by Milan van der Meer <mi...@realimpactanalytics.com> on 2017/01/27 10:35:46 UTC

Behavior scheduler

Currently the behavior of the scheduler is counterintuitive, which is
confirmed by the discussions on the Gitter channel on this subject (see also:
https://issues.apache.org/jira/browse/AIRFLOW-271).

*Current behavior:*

start_date = 22/01/2017 09:15
schedule_interval = '15 9 * * *'
current_date = 25/01/2017 15:00

The DAG gets scheduled each day at 09:15 until 24/01 (including 24/01). On
26/01 at 09:15, the DAG gets scheduled for 25/01 09:15.

*Expected behavior:*

start_date = 22/01/2017 09:15
schedule_interval = '15 9 * * *'
current_date = 25/01/2017 15:00

The DAG gets scheduled each day at 09:15 until 25/01 (including 25/01). On
26/01 at 09:15, the DAG gets scheduled for 26/01 09:15.
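
The difference between the two behaviors can be illustrated with a minimal
sketch. It does not use Airflow itself: a fixed one-day timedelta stands in
for the cron expression '15 9 * * *', and the function name
scheduled_execution_dates and its run_at_end flag are hypothetical names
chosen for illustration only.

```python
from datetime import datetime, timedelta

# Assumption: a fixed one-day step stands in for the cron expression '15 9 * * *'.
INTERVAL = timedelta(days=1)


def scheduled_execution_dates(start_date, now, run_at_end):
    """Return the execution dates of all runs triggered up to `now`.

    run_at_end=True models the current behavior: the run stamped with
    execution date D is triggered only once the interval starting at D
    has closed, i.e. at D + INTERVAL.
    run_at_end=False models the expected, cron-like behavior: the run
    stamped D is triggered at D itself.
    """
    dates = []
    execution_date = start_date
    while True:
        trigger_time = execution_date + INTERVAL if run_at_end else execution_date
        if trigger_time > now:
            break
        dates.append(execution_date)
        execution_date += INTERVAL
    return dates


start = datetime(2017, 1, 22, 9, 15)
now = datetime(2017, 1, 25, 15, 0)

current = scheduled_execution_dates(start, now, run_at_end=True)
expected = scheduled_execution_dates(start, now, run_at_end=False)

print(current[-1])   # 2017-01-24 09:15:00
print(expected[-1])  # 2017-01-25 09:15:00
```

Under the current behavior the run stamped 25/01 09:15 is only triggered on
26/01 at 09:15, which is why it is missing from the first list.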

=========================================================================

The expected behavior is how you would think it works.
But mainly, with the expected behavior, you are still free to adjust your
program to get the current behavior with the use of the macro
*{{ yesterday_ds }}*.

Of course, you can use *{{ tomorrow_ds }}* to get the expected behavior, but
then it becomes even more difficult to reason about your scheduler's
behavior.

Kind regards,
Milan

Re: Behavior scheduler

Posted by Bolke de Bruin <bd...@gmail.com>.
Hi,

(Welcome to the world of touchbar that sends your email right away). 

I agree with you that for people coming from a cron world this behaviour is counterintuitive. I also know that Airflow has a long history of running at the end of the interval, “when data is available”. However, I think we need to reconsider this or make it configurable (per DAG / if using cron syntax / globally), as it makes certain schedules quite difficult to achieve. One such example was raised by Daniel van der Ende (and not answered). With an interval of “0 20 * * MON-FRI”, in our Airflow slang we would expect it to run Tuesday - Saturday; however, that is not the case. It will actually run MON-FRI, where the MON execution_date will be the Friday before. This is hardly what one would expect and difficult to fix mentally.

Moreover, in this case “yesterday_ds” will not give you the previous execution_date, but the current execution_date - 1 day, which is just an arbitrary date I would say. My suggestion would be:

1. Add previous_ds and next_ds to the templates. These should reflect previous_schedule and next_schedule (already available).
2. Create a per-DAG “schedule_type” that can be “start” or “end”.
3. Have two variables that determine the default for timedelta-based schedules (“end”) and cron-syntax-based schedules (“start”).
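
A rough sketch of what these suggestions would mean for the MON-FRI example,
written without Airflow or a cron library. The helpers next_weekday_slot,
previous_weekday_slot and trigger_time are hypothetical, hand-rolled stand-ins
for the schedule “0 20 * * MON-FRI”; schedule_type, previous_ds and next_ds are
only the proposals from this mail, not existing options or macros.

```python
from datetime import datetime, timedelta


def next_weekday_slot(moment):
    """Next Monday-Friday 20:00 slot strictly after `moment`."""
    candidate = moment.replace(hour=20, minute=0, second=0, microsecond=0)
    if candidate <= moment:
        candidate += timedelta(days=1)
    while candidate.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        candidate += timedelta(days=1)
    return candidate


def previous_weekday_slot(moment):
    """Latest Monday-Friday 20:00 slot strictly before `moment`."""
    candidate = moment.replace(hour=20, minute=0, second=0, microsecond=0)
    if candidate >= moment:
        candidate -= timedelta(days=1)
    while candidate.weekday() > 4:
        candidate -= timedelta(days=1)
    return candidate


def trigger_time(execution_date, schedule_type):
    """When the run stamped `execution_date` would start (hypothetical setting)."""
    if schedule_type == "start":
        return execution_date
    if schedule_type == "end":
        return next_weekday_slot(execution_date)
    raise ValueError("schedule_type must be 'start' or 'end'")


friday = datetime(2017, 1, 27, 20, 0)
monday = datetime(2017, 1, 30, 20, 0)

# With "end" the Friday execution_date only runs on Monday evening.
print(trigger_time(friday, "end"))    # 2017-01-30 20:00:00
print(trigger_time(friday, "start"))  # 2017-01-27 20:00:00

# For the Monday execution_date, "execution_date - 1 day" is a Sunday,
# while the previous scheduled slot is the Friday before.
yesterday_ds = (monday - timedelta(days=1)).strftime("%Y-%m-%d")
previous_ds = previous_weekday_slot(monday).strftime("%Y-%m-%d")
next_ds = next_weekday_slot(monday).strftime("%Y-%m-%d")
print(yesterday_ds, previous_ds, next_ds)  # 2017-01-29 2017-01-27 2017-01-31
```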

This might start a religious war, I am aware of that, but the number of new users coming from more “cron-like” systems warrants it, and trying to educate them all might not be so convenient.

Bolke






Re: Behavior scheduler

Posted by Bolke de Bruin <bd...@gmail.com>.
Hi Milan,

We have discussed this over Gitter. I agree with you that for people coming from a cron world this behaviour is counterintuitive.