Posted to commits@airflow.apache.org by "TERESA YAN (JIRA)" <ji...@apache.org> on 2016/06/23 18:44:16 UTC

[jira] [Comment Edited] (AIRFLOW-271) schedule_interval at a particular time behaves strangely

    [ https://issues.apache.org/jira/browse/AIRFLOW-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346946#comment-15346946 ] 

TERESA YAN edited comment on AIRFLOW-271 at 6/23/16 6:43 PM:
-------------------------------------------------------------

I only specify the start_date property in default_args. I changed the start_date slightly to include the hour and minute, and reset the meta db before running it. My code looks like this right now:

{code}
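from datetime import datetime, timedelta

from airflow import DAG

# email_list is defined elsewhere in the script
default_args = {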
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2016,6,20,13,1),
    'email': email_list,
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=2),
    'provide_context': True
}

dag = DAG('feed_scheduler_template', default_args=default_args, schedule_interval="01 13 * * *",
          dagrun_timeout=timedelta(minutes=1))

{code}

I ran my code on June 23 during hour 07. It immediately created logs for 6/20 and 6/21, but not for 6/22. Only after the clock passed 6/23 13:01 did it create the log for ts 2016-06-22T13:01:00, so it looks like there is a one-day delay. It has not created a log for 2016-06-23T13:01:00 so far, although the machine time is already "Thu Jun 23 18:41:23 UTC 2016".

Here are the logs:
ls -l
-rw-rw-r-- 1 data data 4511 Jun 23 07:53 2016-06-20T13:01:00
-rw-rw-r-- 1 data data 4511 Jun 23 07:53 2016-06-21T13:01:00
-rw-rw-r-- 1 data data 4511 Jun 23 13:01 2016-06-22T13:01:00

I am attaching the main Python script to the ticket.
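
One reading that would fit these timestamps, offered only as a sketch: assume the scheduler starts the run stamped D once the whole period from D to D plus one interval has elapsed. That rule is an assumption here, not something confirmed in this ticket, and the helper name due_execution_dates is hypothetical (it is not an Airflow function). Under that assumption, the run stamped 2016-06-22T13:01:00 becomes due at 6/23 13:01, and the run stamped 2016-06-23T13:01:00 would not be due until 6/24 13:01.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date, interval, now):
    """Return the execution dates whose schedule period has ended by `now`.

    Assumption (not confirmed in this ticket): a run stamped `d` becomes
    eligible only once `d + interval` has passed.
    """
    due = []
    d = start_date
    while d + interval <= now:
        due.append(d)
        d += interval
    return due


start = datetime(2016, 6, 20, 13, 1)  # start_date from default_args
daily = timedelta(days=1)             # "01 13 * * *" fires once per day

# Scheduler started on June 23 at 07:53 (timestamp of the first two logs)
for d in due_execution_dates(start, daily, datetime(2016, 6, 23, 7, 53)):
    print(d.isoformat())

# Machine time quoted above: Thu Jun 23 18:41 UTC 2016
for d in due_execution_dates(start, daily, datetime(2016, 6, 23, 18, 41)):
    print(d.isoformat())
```

The first loop lists only 6/20 and 6/21; the second adds 6/22, which is the same set of log files shown above.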


> schedule_interval at a particular time behaves strangely
> --------------------------------------------------------
>
>                 Key: AIRFLOW-271
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-271
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.1
>            Reporter: TERESA YAN
>             Fix For: Airflow 1.7.1.2
>
>
> I have created a dag with the following configs in a python dag script.
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016,6,20),
>     'email': email_list,
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'retries': 3,
>     'retry_delay': timedelta(minutes=2),
>     'provide_context': True
> }
> dag = DAG('feed_scheduler_template', default_args=default_args, schedule_interval="01 16 * * *")
> When I run the scheduler, it behaves strangely. For example, today is 6/20 19:30 (I cleared the db before running the scheduler) and start_date is 6/20.
> It starts runs for the following three timestamps, as shown in the logs directory:
> data@dp-i-54a2648f:~/airflow/logs/feed_scheduler_template $ ls -l send
> total 12
> -rw-rw-r-- 1 data data 3099 Jun 22 19:30 2016-06-20T00:00:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-20T16:01:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-21T16:01:00
> The questions are:
> 1.  Why does 2016-06-20T00:00:00 (hour 0, minute 0) get executed, when I only want 16:01?
> 2.  I never get the 2016-06-22T16:01:00 run, although my machine time has already passed 16:01 on June 22.
> Any ideas?
> Thanks so much
