Posted to users@airflow.apache.org by Reed Villanueva <rv...@ucera.org> on 2019/10/23 21:17:36 UTC

How to stop airflow running dags (at unscheduled times) automatically when initially turned on?

How to stop airflow running dags (at unscheduled times) automatically when
initially turned on?

My DAGs look like this:

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    'owner': 'rvillanueva',
    'depends_on_past': False,
    'start_date': datetime(2019, 10, 13),
    'email': ['me@co.com'],
    'email_on_failure': True,
    # 'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

# max_active_runs is a DAG-level argument, so it is passed to DAG() here;
# default_args are applied to the DAG's tasks, not to the DAG itself.
dag = DAG('mydag', default_args=default_args, catchup=False,
          max_active_runs=3, schedule_interval="10 22 * * *")
...

I had thought that setting depends_on_past=False or catchup=False would be
enough to stop this, but DAGs still run once as soon as they are turned on
in the webserver UI (in some cases causing them to overlap with runs at
their actual scheduled times).

Is there any way to stop this?

-- 
This electronic message is intended only for the named recipient, and may
contain information that is confidential or privileged. If you are not the
intended recipient, you are hereby notified that any disclosure, copying,
distribution or use of the contents of this message is strictly prohibited.
If you have received this message in error or are not the named recipient,
please notify us immediately by contacting the sender at the electronic mail
address noted above, and delete and destroy all copies of this message.
Thank you.

Re: How to stop airflow running dags (at unscheduled times) automatically when initially turned on?

Posted by Reed Villanueva <rv...@ucera.org>.
Reading what the comments there say and fiddling with the webserver UI,
this does appear to be the same problem.
Thank you for showing me this.

On Wed, Oct 23, 2019 at 11:18 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> You might be hitting this bug
> https://issues.apache.org/jira/browse/AIRFLOW-3369
>
> -a

Re: How to stop airflow running dags (at unscheduled times) automatically when initially turned on?

Posted by Ash Berlin-Taylor <as...@apache.org>.
You might be hitting this bug https://issues.apache.org/jira/browse/AIRFLOW-3369

-a

> On 24 Oct 2019, at 00:04, Reed Villanueva <rv...@ucera.org> wrote:
> 
> Interesting.
> Is there no way to turn off this "checking if the DAG ran at its scheduled time yesterday, else run" behavior (e.g. in the case where I have just run "airflow resetdb" for testing reasons)? That was the reason I had the catchup=False setting in the DAG object (I may be misinterpreting the answer from here: https://stackoverflow.com/a/43122799/8236733).


Re: How to stop airflow running dags (at unscheduled times) automatically when initially turned on?

Posted by Reed Villanueva <rv...@ucera.org>.
Interesting.
Is there no way to turn off this "checking if the DAG ran at its scheduled
time yesterday, else run" behavior (e.g. in the case where I have just run
"airflow resetdb" for testing reasons)? That was the reason I had the
catchup=False setting in the DAG object (I may be misinterpreting the
answer from here: https://stackoverflow.com/a/43122799/8236733).
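Whether catchup=False should suppress that first run is the crux here. As a rough illustration, assuming a simplified model of daily scheduling in Airflow 1.10 (the helper `runs_due_on_unpause` is hypothetical, not an Airflow API, and it ignores timezones and the real scheduler's edge cases), catchup only decides how many missed intervals are filled in: catchup=True queues one run per interval missed since start_date, while catchup=False keeps only the latest completed interval, which is still run as soon as the DAG is switched on.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def runs_due_on_unpause(start_date, now, hour, minute, catchup):
    """Execution dates a daily 'MM HH * * *' DAG has due when it is switched
    on at `now` (hypothetical helper; simplified model, naive datetimes as UTC)."""
    # The first execution date is the first HH:MM tick at or after start_date.
    x = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if x < start_date:
        x += DAY
    # A run for execution date x is due once its interval [x, x + 1 day) ends.
    due = []
    while x + DAY <= now:
        due.append(x)
        x += DAY
    # catchup=True: one run per missed interval (throttled by max_active_runs).
    # catchup=False: only the most recent missed interval, which still runs.
    return due if catchup else due[-1:]


now = datetime(2019, 10, 23, 21, 17)  # roughly when the question was posted
start = datetime(2019, 10, 13)
print(len(runs_due_on_unpause(start, now, 22, 10, catchup=True)))  # 9
print(runs_due_on_unpause(start, now, 22, 10, catchup=False))
# [datetime.datetime(2019, 10, 21, 22, 10)]
```

In this model, catchup=False reduces nine backlog runs to one, but not to zero, which matches the single run observed when the DAG is turned on.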

On Wed, Oct 23, 2019 at 11:50 AM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> Airflow isn’t like crontab, which just runs at a specific time. Airflow
> checks whether it successfully ran a DAG between the current time and the
> last time it was due to be scheduled.
>
> So in your example your start date is 2019-10-13 and your schedule is
> every day at 22:10; therefore, when you start the DAG, a DAG Run will be
> kicked off immediately, as there should have been a scheduled run today (or
> yesterday) at 22:10.
>
> If you do not want this, set your start date to the first DAG Run date you
> wish to occur (e.g. at time of writing, 2019-10-23 would not run till
> “tomorrow”).

RE: How to stop airflow running dags (at unscheduled times) automatically when initially turned on?

Posted by "Shaw, Damian P. " <da...@credit-suisse.com>.
Airflow isn’t like crontab, which just runs at a specific time. Airflow checks whether it successfully ran a DAG between the current time and the last time it was due to be scheduled.

So in your example your start date is 2019-10-13 and your schedule is every day at 22:10; therefore, when you start the DAG, a DAG Run will be kicked off immediately, as there should have been a scheduled run today (or yesterday) at 22:10.

If you do not want this, set your start date to the first DAG Run date you wish to occur (e.g. at time of writing, 2019-10-23 would not run till “tomorrow”).
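A minimal sketch of the behavior described above, assuming a simplified model of the Airflow 1.10 scheduler for a daily "10 22 * * *" schedule with catchup=False (naive datetimes treated as UTC; `first_run` is a hypothetical helper, not an Airflow API). The key point it encodes is that a run for a given execution_date fires at the end of its interval, so an interval that has already ended is run as soon as the DAG is switched on.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def tick(dttm, hour, minute):
    """The HH:MM schedule tick on the same calendar day as dttm."""
    return dttm.replace(hour=hour, minute=minute, second=0, microsecond=0)


def first_run(start_date, now, hour, minute):
    """Return (execution_date, fires_at) of the first run of a daily
    'MM HH * * *' DAG with catchup=False that is switched on at `now`
    (hypothetical helper; simplified model, naive datetimes as UTC)."""
    # First execution date allowed by start_date: first tick at or after it.
    first_allowed = tick(start_date, hour, minute)
    if first_allowed < start_date:
        first_allowed += DAY
    # Latest execution date whose one-day interval has already ended by `now`.
    latest_done = tick(now, hour, minute)
    if latest_done > now:
        latest_done -= DAY
    latest_done -= DAY
    # catchup=False skips older intervals but never goes before start_date.
    execution_date = max(first_allowed, latest_done)
    # The run fires at the END of its interval, not at its execution date.
    return execution_date, execution_date + DAY


now = datetime(2019, 10, 23, 21, 17)  # roughly when the question was posted
for start in (datetime(2019, 10, 13), datetime(2019, 10, 23)):
    execution_date, fires_at = first_run(start, now, 22, 10)
    when = "immediately" if fires_at <= now else f"at {fires_at}"
    print(f"start_date={start:%Y-%m-%d}: execution_date={execution_date}, runs {when}")
# start_date=2019-10-13: execution_date=2019-10-21 22:10:00, runs immediately
# start_date=2019-10-23: execution_date=2019-10-23 22:10:00, runs at 2019-10-24 22:10:00
```

In this model, start_date=2019-10-13 leaves a completed interval (ending 2019-10-22 22:10) that is run straight away, while start_date=2019-10-23 leaves nothing due until 2019-10-24 22:10, which is the "tomorrow" case above.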


=============================================================================== 
Please access the attached hyperlink for an important electronic communications disclaimer: 
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html 
===============================================================================