Posted to dev@airflow.apache.org by Kyle Hamlin <ha...@gmail.com> on 2018/04/18 17:56:29 UTC

Bit confused about start_date and schedule_interval related to daily/weekly DAG

I'm a bit confused with how the scheduler catches up in relation to
start_date and schedule_interval. I have one dag that runs hourly:

dag = DAG(
    dag_id='hourly_dag',
    start_date=days_ago(1),
    schedule_interval='@hourly',
    default_args=ARGS)

When I start this DAG fresh it will catch up 24 hours + however many hours
have passed in the current day all the way up to the most recent hour. This
makes sense to me.

Now if I have a daily DAG:

dag = DAG(
    dag_id='daily_dag',
    start_date=days_ago(1),
    schedule_interval='0 5 * * *',
    default_args=ARGS)

Starting this DAG fresh will run yesterday's execution. This is fine since
I use the execution_date (ds_nodash) to have the task be lagged by one day.
What I can't seem to wrap my head around is how I would get this DAG to run
for the current day. I've tried passing days_ago(0), but the tasks never
seem to start.

In addition to all that, I have a weekly DAG that must also use the
execution_date, but it needs the current week's execution_date.

*How do I get a DAG that is not hourly to have an execution_date of the
current day or week?*

Re: Bit confused about start_date and schedule_interval related to daily/weekly DAG

Posted by Ruiqin Yang <yr...@gmail.com>.
Hi Kyle,
The execution_date of a DAG run always lags by one schedule interval: one
day for your daily DAG and one week for your weekly DAG. Under the hood,
Airflow calculates the execution_date and the next execution_date of the
run, and only schedules it once the current timestamp is later than the
*next execution_date.* That is also why a daily DAG with
start_date=days_ago(0) does not start today: its first interval has not
ended yet.
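
To make the rule concrete, here is a minimal sketch in plain Python. It is
not Airflow's scheduler code: the function name due_execution_dates is made
up for illustration, and it assumes a fixed timedelta interval instead of a
cron expression such as '0 5 * * *'. It only models the idea that a run is
triggered once its interval has fully elapsed.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date, interval, now):
    """Return the execution_dates whose interval has fully elapsed by `now`.

    Hypothetical helper: a run stamped with `execution_date` becomes due
    only when `execution_date + interval` (the next execution_date) has
    been reached.
    """
    due = []
    execution_date = start_date
    while execution_date + interval <= now:
        due.append(execution_date)
        execution_date += interval
    return due


# Timestamp of the original question, used here as "now".
now = datetime(2018, 4, 18, 17, 56)
midnight_today = datetime(2018, 4, 18)
midnight_yesterday = midnight_today - timedelta(days=1)

# Daily DAG started yesterday: only yesterday's run is due.
daily_from_yesterday = due_execution_dates(
    midnight_yesterday, timedelta(days=1), now)

# Daily DAG started today: nothing is due until tomorrow.
daily_from_today = due_execution_dates(
    midnight_today, timedelta(days=1), now)

# Hourly DAG started yesterday: 24 runs plus the full hours of today.
hourly_from_yesterday = due_execution_dates(
    midnight_yesterday, timedelta(hours=1), now)

print(daily_from_yesterday)
print(daily_from_today)
print(len(hourly_from_yesterday))
```

Under these assumptions the daily DAG started "today" yields an empty list,
which matches the observation that tasks never seem to start with
days_ago(0).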

If you need a date other than `ds` or `ds_nodash`, you can explore the other
default variables here
<https://airflow.apache.org/code.html#default-variables>.
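
As a rough illustration of what "the current day or week" means for a run,
the sketch below computes the end of a run's interval in plain Python. The
helper names (ds_nodash, run_period_end) and the example dates are
assumptions for illustration only; inside a real template you would use
whichever variables the linked page lists for your Airflow version (for
example `tomorrow_ds_nodash`, or `macros.ds_add(ds, 7)` for a weekly DAG).

```python
from datetime import datetime, timedelta


def ds_nodash(dt):
    """Format a datetime as YYYYMMDD, the same shape as Airflow's ds_nodash."""
    return dt.strftime("%Y%m%d")


def run_period_end(execution_date, interval):
    """Hypothetical helper: the end of the interval a run covers.

    For a fixed interval this is the moment the run is actually triggered,
    so it falls in the "current" day or week.
    """
    return execution_date + interval


# Example execution_dates (assumed values, 05:00 as in '0 5 * * *').
daily_execution_date = datetime(2018, 4, 17, 5, 0)
weekly_execution_date = datetime(2018, 4, 9, 5, 0)

daily_current = ds_nodash(
    run_period_end(daily_execution_date, timedelta(days=1)))
weekly_current = ds_nodash(
    run_period_end(weekly_execution_date, timedelta(weeks=1)))

print(ds_nodash(daily_execution_date), daily_current)
print(ds_nodash(weekly_execution_date), weekly_current)
```

The design choice here is to keep start_date and schedule_interval as they
are and shift the date inside the task, rather than trying to force the
scheduler to stamp a run with the current day.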

Cheers,
Kevin Y
