Posted to users@airflow.apache.org by Chia-Hung Lin <cl...@googlemail.com.INVALID> on 2023/05/23 20:56:45 UTC

Schedule to run at specific time range for the past start end date question

I have been experimenting with Airflow (v2.6.1) settings such as
catchup, start_date, and end_date, but I can't achieve the effect I am
after. So here is my question.

Scenario
I want to schedule a DAG that covers a period of time in the past but
runs within a specific daily time window in the future. For instance, I
want to backfill the data between start date 2023-01-01 00:00:00 and
end date 2023-01-05 22:00:00. However, I also need the DAG to run only
within a specific time window: every day from 22 to 23 and the next
day from 0 to 2. All dates and timestamps are in UTC.

My attempt (the code may not be correct because I do not have the
source at hand)
from datetime import datetime

from airflow import DAG

args = {
    "start_date": datetime(2023, 1, 1, 0, 0, 0),
    "end_date": datetime(2023, 1, 5, 22, 0, 0),
    # ...
}

dag = DAG(
    "my_dag",
    default_args=args,
    catchup=True,  # I also tested with False
    schedule="*/10 22-23,0-2 * * *",
    # ...
)

my_task(dag) >> another_task(dag)

The problem I encountered
When catchup=False is set, the DAG does not run at all. However, setting
catchup=True causes the DAG to run immediately, which is the
effect I want to avoid. What I want is for the DAG to run only during a
specific time window (22-23 and the next day 0-2, every 10 minutes) every
day after it is deployed to the Airflow server.
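
To make the schedule in the question concrete, the sketch below lists the wall-clock times that the cron expression "*/10 22-23,0-2 * * *" matches, first for one day and then for the 2023-01-01 to 2023-01-05 22:00 window. It uses only the standard library and hard-codes the two cron fields instead of parsing them; `matches` and `fire_times` are hypothetical helper names. The count is not necessarily the exact number of DAG runs Airflow would create, but it gives a sense of the backlog that catchup=True has to work through right after deployment.

```python
from datetime import datetime, timedelta

ACTIVE_HOURS = {22, 23, 0, 1, 2}  # hour field "22-23,0-2"
STEP_MINUTES = 10                 # minute field "*/10"


def matches(ts):
    """True if ts falls on a minute selected by the cron expression."""
    return ts.hour in ACTIVE_HOURS and ts.minute % STEP_MINUTES == 0


def fire_times(start, end):
    """Matching timestamps in [start, end]; start is assumed minute-aligned."""
    out = []
    ts = start
    while ts <= end:
        if matches(ts):
            out.append(ts)
        ts += timedelta(minutes=1)
    return out


# One full day: five active hours, six matches per hour.
per_day = fire_times(datetime(2023, 1, 2, 0, 0), datetime(2023, 1, 2, 23, 59))

# The backfill window from the question (naive datetimes read as UTC).
window = fire_times(datetime(2023, 1, 1, 0, 0, 0), datetime(2023, 1, 5, 22, 0, 0))

print(len(per_day), len(window))
print(window[0], window[-1])
```

The window ends at 22:00 on the last day, so only the first match of that evening is included.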

In this case, how should I configure the DAG to achieve
the effect I am looking for? Please let me know if my explanation is
not clear. I appreciate any suggestions and advice.

Many thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@airflow.apache.org
For additional commands, e-mail: users-help@airflow.apache.org


Re: Schedule to run at specific time range for the past start end date question

Posted by Daniel Standish <da...@astronomer.io.INVALID>.
I'm a bit confused about exactly what your use case is, but it
might be easiest to just create the DAG with the schedule you need on a
go-forward basis. Then you can manually trigger the DAG for the backfill
periods that you need, or create an ad hoc copy of the DAG with a restricted
start/end and catchup=True for the backfill.
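
As a rough illustration of the two-DAG idea above, the sketch below collects the constructor arguments for a go-forward DAG and for an ad hoc backfill copy as plain dictionaries, so it runs without Airflow installed. The helper `dag_kwargs`, the id "my_dag_backfill", and the go-forward start date are assumptions made for illustration; the keys are the `dag_id`, `schedule`, `start_date`, `end_date`, and `catchup` arguments of Airflow's DAG class.

```python
from datetime import datetime

CRON = "*/10 22-23,0-2 * * *"  # schedule string from the original post


def dag_kwargs(dag_id, start_date, end_date=None, catchup=False):
    """Hypothetical helper: gather DAG constructor arguments in a dict."""
    kwargs = {
        "dag_id": dag_id,
        "schedule": CRON,
        "start_date": start_date,
        "catchup": catchup,
    }
    if end_date is not None:
        kwargs["end_date"] = end_date
    return kwargs


# Go-forward DAG: no end date, no catchup. The start date is a placeholder.
forward = dag_kwargs("my_dag", start_date=datetime(2023, 5, 24))

# Ad hoc copy restricted to the past window, with catchup enabled.
backfill = dag_kwargs(
    "my_dag_backfill",
    start_date=datetime(2023, 1, 1, 0, 0, 0),
    end_date=datetime(2023, 1, 5, 22, 0, 0),
    catchup=True,
)

# With Airflow installed, each dict would be passed on as DAG(**forward)
# and DAG(**backfill); that step is left out to keep the sketch standalone.
print(forward)
print(backfill)
```

The naive datetimes mirror the original post. Note that the backfill copy would still start working through its past intervals as soon as it is unpaused, so it only separates the backfill from the regular schedule rather than delaying it.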

Related but probably not relevant to your specific case is the events
timetable, which lets you define a schedule that runs on specific
pre-determined dates that are not achievable through cron.
https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/timetable.html#eventstimetable

Re: Schedule to run at specific time range for the past start end date question

Posted by Charul Badoni <ch...@chartboost.com.INVALID>.
Hi Chia,

To run only within a specific time range, you can use BranchPythonOperator
with a Python function that triggers downstream tasks only at a given time
of day. For all other times you can raise AirflowSkipException. Hope
this helps!

Regards,
Charul
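
The time-of-day check suggested in this reply might look like the sketch below. It is a plain function with no Airflow import so that it runs on its own; the task ids "do_work" and "skip_work" are hypothetical. It returns a task id instead of raising AirflowSkipException, because a BranchPythonOperator callable selects the downstream path through its return value. Passing the function as `python_callable` and obtaining the run's logical date from the task context are assumed and not shown.

```python
from datetime import datetime

ACTIVE_HOURS = {22, 23, 0, 1, 2}  # same hours as the cron field "22-23,0-2" (UTC)

# Hypothetical task ids; replace them with the ids of real downstream tasks.
RUN_TASK_ID = "do_work"
SKIP_TASK_ID = "skip_work"


def choose_branch(logical_date):
    """Return the task id to follow for a run with the given UTC timestamp."""
    if logical_date.hour in ACTIVE_HOURS:
        return RUN_TASK_ID
    return SKIP_TASK_ID


print(choose_branch(datetime(2023, 1, 1, 22, 30)))  # inside the window
print(choose_branch(datetime(2023, 1, 2, 12, 0)))   # outside the window
```

With the cron schedule from the question the DAG already fires only inside the window, so a check like this mainly matters when the DAG runs on a broader schedule or is triggered manually.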
