Posted to users@airflow.apache.org by Chia-Hung Lin <cl...@googlemail.com.INVALID> on 2023/05/23 20:56:45 UTC
Schedule to run at specific time range for the past start end date question
I have been experimenting with Airflow (v2.6.1) and settings such as
catchup, start_date, and end_date, but I can't achieve the effect I am
after. So here is my question.
Scenario
I want to schedule a dag that covers a period of time in the past, but
have its runs happen on a specific schedule in the future. For instance,
I want to backfill the data between start date 2023-01-01 00:00:00 and
end date 2023-01-05 22:00:00. However, I also need the dag to run only
within a specific time frame, such as 22-23 every day and 0-2 the next
day. All dates and timestamps are in UTC.
My attempt (the code may not be correct because I do not have the
source at hand)
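from datetime import datetime

from airflow import DAG

args = {
    "start_date": datetime(2023, 1, 1, 0, 0, 0),
    "end_date": datetime(2023, 1, 5, 22, 0, 0),
    # ...
}

dag = DAG(
    "my_dag",
    default_args=args,
    catchup=True,  # I also tested with False
    # Every 10 minutes during hours 22, 23, 0, 1 and 2
    schedule="*/10 22-23,0-2 * * *",
    # ...
)

my_task(dag) >> another_task(dag)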
The problem I encountered
When I set catchup=False, the dag doesn't run. However, setting
catchup=True causes the dag to run immediately, which is the effect I
want to avoid. What I want is for the dag to run only during a specific
time frame (22-23 and the next day's 0-2, every 10 minutes) every day
after my dag is deployed to the Airflow server.
In that case, how should I configure the dag to achieve the effect I am
looking for? Please let me know if my explanation is not clear. I
appreciate any suggestions and advice.
Many thanks
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@airflow.apache.org
For additional commands, e-mail: users-help@airflow.apache.org
Re: Schedule to run at specific time range for the past start end date question
Posted by Daniel Standish <da...@astronomer.io.INVALID>.
I think I'm a bit confused about exactly what your use case is. But it
might be easiest to just create the dag with the schedule you need on a
go-forward basis. Then you can manually trigger the dag for the backfill
periods that you need, or create an ad hoc copy of the dag with restricted
start / end dates and catchup=True for the backfill.
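
The two-dag approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the dag id my_dag_backfill, the go-forward start date, and the build_dags helper are hypothetical, and the cron string and backfill dates are taken from the original question.

```python
from datetime import datetime, timezone

# Cron from the question: every 10 minutes during hours 22, 23, 0, 1 and 2 (UTC).
WINDOW_CRON = "*/10 22-23,0-2 * * *"

# Go-forward dag: catchup disabled, so past intervals are not backfilled.
FORWARD_KWARGS = {
    "dag_id": "my_dag",
    "schedule": WINDOW_CRON,
    "start_date": datetime(2023, 5, 24, tzinfo=timezone.utc),  # assumed deployment date
    "catchup": False,
}

# Ad hoc copy for the backfill: restricted start/end dates with catchup enabled.
BACKFILL_KWARGS = {
    "dag_id": "my_dag_backfill",  # hypothetical name
    "schedule": WINDOW_CRON,
    "start_date": datetime(2023, 1, 1, tzinfo=timezone.utc),
    "end_date": datetime(2023, 1, 5, 22, tzinfo=timezone.utc),
    "catchup": True,
}


def build_dags():
    """Create both dag objects.

    Airflow is imported here so the settings above can be inspected
    without Airflow installed.
    """
    from airflow import DAG

    return DAG(**FORWARD_KWARGS), DAG(**BACKFILL_KWARGS)


# In a real dag file the objects must be module-level names so that
# Airflow can discover them, for example:
# forward_dag, backfill_dag = build_dags()
```

Note that the backfill copy still creates its runs as soon as it is unpaused, so it would only be enabled when those runs are actually wanted.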
Related but probably not relevant to your specific case is the events
timetable, which lets you define a schedule that runs on specific
pre-determined dates that are not achievable through cron.
https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html#eventstimetable
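
For reference, a minimal sketch of the events timetable might look like this. The dates, the description, and the build_timetable helper are made up for illustration; only EventsTimetable itself comes from the Airflow documentation linked above.

```python
from datetime import datetime, timezone

# Hypothetical, irregular run times (UTC) that a single cron
# expression would not describe.
EVENT_DATES = [
    datetime(2023, 1, 1, 22, 0, tzinfo=timezone.utc),
    datetime(2023, 1, 3, 0, 30, tzinfo=timezone.utc),
    datetime(2023, 1, 5, 1, 45, tzinfo=timezone.utc),
]


def build_timetable():
    """Wrap the dates in an EventsTimetable.

    The imports are local so the date list can be inspected without
    Airflow installed; pendulum ships as an Airflow dependency.
    """
    import pendulum
    from airflow.timetables.events import EventsTimetable

    return EventsTimetable(
        event_dates=[pendulum.instance(d) for d in EVENT_DATES],
        description="Irregular runs (illustrative)",
    )
```

The resulting object would be passed as the schedule argument of the dag, e.g. schedule=build_timetable().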
Re: Schedule to run at specific time range for the past start end date question
Posted by Charul Badoni <ch...@chartboost.com.INVALID>.
Hi Chia,
To run only within a specific time range, you can use BranchPythonOperator
and add a Python function that only triggers downstream tasks at a given
time of the day. At all other times you can raise AirflowSkipException.
Hope this helps!
Regards,
Charul
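
A rough sketch of the time-of-day check suggested above, covering only the AirflowSkipException part. The names RUN_HOURS, in_run_window, and skip_outside_window are hypothetical, and the window (hours 22, 23, 0, 1 and 2 UTC) is taken from the original question. The callable is assumed to run as the first task of the dag, for example through a PythonOperator.

```python
from datetime import datetime, timezone

# Hours (UTC) in which the dag is allowed to do work: 22:00 through 02:59.
RUN_HOURS = frozenset({22, 23, 0, 1, 2})


def in_run_window(now: datetime) -> bool:
    """Return True when the timezone-aware `now` falls inside the window."""
    return now.astimezone(timezone.utc).hour in RUN_HOURS


def skip_outside_window(now=None):
    """Task callable: raise AirflowSkipException outside the window.

    `now` defaults to the current UTC time; it is a parameter only so
    the check can be exercised with fixed timestamps.
    """
    now = now or datetime.now(timezone.utc)
    if not in_run_window(now):
        # Imported lazily so in_run_window can be used without Airflow.
        from airflow.exceptions import AirflowSkipException

        raise AirflowSkipException(f"{now:%H:%M} UTC is outside the run window")
```

When the task is skipped this way, downstream tasks with the default trigger rule are skipped as well, so the rest of the dag does nothing outside the window.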