Posted to commits@airflow.apache.org by "TERESA YAN (JIRA)" <ji...@apache.org> on 2016/06/23 18:44:16 UTC
[jira] [Comment Edited] (AIRFLOW-271) schedule_interval at a
particular time behaves strangely
[ https://issues.apache.org/jira/browse/AIRFLOW-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346946#comment-15346946 ]
TERESA YAN edited comment on AIRFLOW-271 at 6/23/16 6:43 PM:
-------------------------------------------------------------
I only specify the start_date property in default_args. I changed the start_date slightly to include the hour and minute, and I reset the meta db before running it. My code currently looks like this:
{code}
from datetime import datetime, timedelta
from airflow import DAG

# email_list is defined elsewhere in the script
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2016, 6, 20, 13, 1),
    'email': email_list,
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'retry_delay': timedelta(minutes=2),
    'provide_context': True
}
dag = DAG('feed_scheduler_template', default_args=default_args,
          schedule_interval="01 13 * * *",
          dagrun_timeout=timedelta(minutes=1))
{code}
I ran my code on June 23 at hour 07. It immediately created logs for 6/20 and 6/21, but not for 6/22. Only after the clock passed 6/23 13:01 did it create the log for ts 2016-06-22T13:01:00, so it looks like there is a one-day delay? It has not created a log for 2016-06-23T13:01:00 so far, although the machine time is already "Thu Jun 23 18:41:23 UTC 2016".
here are the logs
ls -l
-rw-rw-r-- 1 data data 4511 Jun 23 07:53 2016-06-20T13:01:00
-rw-rw-r-- 1 data data 4511 Jun 23 07:53 2016-06-21T13:01:00
-rw-rw-r-- 1 data data 4511 Jun 23 13:01 2016-06-22T13:01:00
I am attaching the main Python script to the ticket.
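One possible reading of the listing above, offered as a sketch and not as a confirmed diagnosis: assume the scheduler starts the run stamped with execution date D only once the whole period from D to D plus one day has ended. Under that assumption the timestamps in the logs line up. The helper name `triggered_execution_dates` is hypothetical and uses only the Python standard library, not Airflow's API.

```python
from datetime import datetime, timedelta


def triggered_execution_dates(start_date, interval, now):
    """Return the execution dates whose period has fully elapsed by `now`.

    Assumption (not confirmed in this thread): a run stamped D is only
    started once D + interval has passed.
    """
    dates = []
    execution_date = start_date
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


start = datetime(2016, 6, 20, 13, 1)  # start_date from default_args
daily = timedelta(days=1)             # "01 13 * * *" fires once per day

# Scheduler started on June 23 at 07:53 (mtime of the first two logs).
at_start = triggered_execution_dates(start, daily, datetime(2016, 6, 23, 7, 53))

# The clock reaches 13:01 on June 23.
after_1301 = triggered_execution_dates(start, daily, datetime(2016, 6, 23, 13, 1))

# Machine time quoted in the comment.
at_1841 = triggered_execution_dates(start, daily, datetime(2016, 6, 23, 18, 41))

for label, dates in [("07:53", at_start), ("13:01", after_1301), ("18:41", at_1841)]:
    print(label, [d.isoformat() for d in dates])
```

If this assumption holds, the run stamped 2016-06-23T13:01:00 would not be expected before 2016-06-24 13:01.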
> schedule_interval at a particular time behaves strangely
> --------------------------------------------------------
>
> Key: AIRFLOW-271
> URL: https://issues.apache.org/jira/browse/AIRFLOW-271
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: Airflow 1.7.1
> Reporter: TERESA YAN
> Fix For: Airflow 1.7.1.2
>
>
> I have created a dag with the following configs in a python dag script.
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016, 6, 20),
>     'email': email_list,
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'retries': 3,
>     'retry_delay': timedelta(minutes=2),
>     'provide_context': True
> }
> dag = DAG('feed_scheduler_template', default_args=default_args, schedule_interval="01 16 * * *")
> When I run the scheduler, it behaves strangely. For example, today is 6/20 19:30 (I clear the db when I run the scheduler) and start_date is 6/20.
> It starts runs for the following three timestamps, as shown in the logs directory:
> data@dp-i-54a2648f:~/airflow/logs/feed_scheduler_template $ ls -l send
> total 12
> -rw-rw-r-- 1 data data 3099 Jun 22 19:30 2016-06-20T00:00:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-20T16:01:00
> -rw-rw-r-- 1 data data 3100 Jun 22 19:30 2016-06-21T16:01:00
> My questions are:
> 1. Why does 2016-06-20T00:00:00 (hour 0, minute 0) get executed when I only want 16:01?
> 2. I never get the 2016-06-22T16:01:00 run, although my machine time has already passed 16:01 on June 22.
> Any ideas?
> Thanks so much.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)