Posted to dev@airflow.apache.org by Kyle Hamlin <ha...@gmail.com> on 2018/04/18 17:56:29 UTC
Bit confused about start_date and schedule_interval related to
daily/weekly DAG
I'm a bit confused about how the scheduler catches up in relation to
start_date and schedule_interval. I have one DAG that runs hourly:
dag = DAG(
    dag_id='hourly_dag',
    start_date=days_ago(1),
    schedule_interval='@hourly',
    default_args=ARGS)
When I start this DAG fresh it will catch up 24 hours + however many hours
have passed in the current day all the way up to the most recent hour. This
makes sense to me.
Now if I have a daily DAG:
dag = DAG(
    dag_id='daily_dag',
    start_date=days_ago(1),
    schedule_interval='0 5 * * *',
    default_args=ARGS)
Starting this DAG fresh will run yesterday's execution. This is fine since
I use the execution_date (ds_nodash) to have the task be lagged by one day.
What I can't seem to wrap my head around is how I would get this DAG to run
for the current day. I've tried passing days_ago(0), but the tasks never
seem to start.
In addition to all that, I have a weekly DAG that must also use the
execution_date, but it needs the current week's execution_date.
*How do I get a DAG that is not hourly to have an execution_date of the
current day or week?*
Re: Bit confused about start_date and schedule_interval related to
daily/weekly DAG
Posted by Ruiqin Yang <yr...@gmail.com>.
Hi Kyle,
The execution_date of a DAG run always lags by one schedule interval: one
day for your daily DAG and one week for your weekly DAG. Under the hood,
Airflow calculates the execution_date and the next execution_date of the
task, and only schedules the task once the current timestamp is later than
the *next execution_date.*
If you need a date other than `ds` or `ds_nodash`, you can explore the other
default variables listed here:
<https://airflow.apache.org/code.html#default-variables>.
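
To make the lag concrete, here is a minimal sketch of the rule described
above, written with plain `datetime` rather than Airflow itself. The helper
names (`run_fires_at`, `latest_started_run`, `current_period_ds_nodash`) are
hypothetical, and a fixed `timedelta` stands in for the cron schedule, so
treat it as an approximation of the scheduler's behavior, not its actual code.

```python
from datetime import datetime, timedelta


def run_fires_at(execution_date, interval):
    """Earliest moment the run stamped `execution_date` can start.

    The run covers [execution_date, execution_date + interval) and is only
    scheduled once that whole period has passed.
    """
    return execution_date + interval


def latest_started_run(now, first_execution_date, interval):
    """execution_date of the most recent run that has fired, or None."""
    if now < run_fires_at(first_execution_date, interval):
        return None  # the first period has not finished yet
    completed_periods = (now - first_execution_date) // interval
    return first_execution_date + (completed_periods - 1) * interval


def current_period_ds_nodash(execution_date, interval):
    """Date stamp (YYYYMMDD) of the period *after* execution_date."""
    return (execution_date + interval).strftime("%Y%m%d")


if __name__ == "__main__":
    now = datetime(2018, 4, 18, 17, 56)
    daily = timedelta(days=1)

    # start_date one day back: first execution_date is 2018-04-17 05:00,
    # and that run fired at 2018-04-18 05:00.
    print(latest_started_run(now, datetime(2018, 4, 17, 5), daily))

    # start_date today: first execution_date is 2018-04-18 05:00, which
    # cannot fire before 2018-04-19 05:00, so nothing has started yet.
    print(latest_started_run(now, datetime(2018, 4, 18, 5), daily))

    # Shifting the stamp forward by one interval yields the current day.
    print(current_period_ds_nodash(datetime(2018, 4, 17, 5), daily))
```

Under these assumptions, a start_date of today means the first run cannot
fire until tomorrow at 05:00, which would explain why the tasks never seem to
start. Inside a template, the same forward shift may be available through
variables such as `tomorrow_ds_nodash` for the daily case; check the default
variables list above for the names your Airflow version provides.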
Cheers,
Kevin Y