Posted to commits@airflow.apache.org by "Kamil Bregula (Jira)" <ji...@apache.org> on 2020/03/29 21:33:00 UTC

[jira] [Commented] (AIRFLOW-7039) Specific DAG Schedule & DST Results in Skipped DAG Run

    [ https://issues.apache.org/jira/browse/AIRFLOW-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070545#comment-17070545 ] 

Kamil Bregula commented on AIRFLOW-7039:
----------------------------------------

[~potiuk] You are assigned to this task. What is your progress on it?

> Specific DAG Schedule & DST Results in Skipped DAG Run
> ------------------------------------------------------
>
>                 Key: AIRFLOW-7039
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7039
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.7
>         Environment: Amazon Linux 2 AMI
>            Reporter: Peter Kim
>            Assignee: Jarek Potiuk
>            Priority: Critical
>              Labels: timezone
>
> *Scenario:* 
> The EC2 instance running Airflow is in Eastern Time (America/New_York).
> airflow.cfg > [core] > default_timezone = America/New_York (adjusts for DST automatically and correctly).
> On the Monday morning after Daylight Saving Time took effect, a handful of DAG runs were not executed as expected. The strange part is that these DAGs were the only jobs that misbehaved; all other DAGs ran normally. Additionally, only the first expected run after the DST change was skipped; subsequent runs later that day were scheduled successfully.
> Here is the pattern observed:
> DAG schedule that skipped its first run:  (0 <hour_a>,<hour_b> * * 1,2,3,4,5)
> i.e. it is scheduled Monday through Friday, with two distinct runs per day (see the sketch below).
> DAGs that run once per day M-F, and DAGs that run at two times but not M-F, did not experience this issue.
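>
> For reference, here is a minimal DAG sketch (not taken from the report; the dag_id, task_id, and hour values are placeholders for the redacted ones, and the import paths assume Airflow 1.10.x) matching the affected pattern: two runs per day, Monday through Friday, with a timezone-aware start date:
>
> import pendulum
> from datetime import datetime
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
>
> local_tz = pendulum.timezone("America/New_York")
>
> dag = DAG(
>     dag_id="example_two_runs_weekdays",        # placeholder dag_id
>     schedule_interval="0 6,10 * * 1,2,3,4,5",  # placeholder hours; weekdays only
>     start_date=datetime(2020, 1, 1, tzinfo=local_tz),
>     catchup=False,
> )
>
> noop = DummyOperator(task_id="noop", dag=dag)  # placeholder task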
>  
> Based on the logs, the missed run appears never to have been seen by the scheduler at all (see below):
>  
>  
> 2020-03-06 6:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 06:31:01,220] \{logging_mixin.py:112} INFO - [2020-03-06 06:31:01,220] \{settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=697
> [2020-03-06 06:31:01,222] \{scheduler_job.py:153} INFO - Started process (PID=697) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,228] \{scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 06:31:01,228] \{logging_mixin.py:112} INFO - [2020-03-06 06:31:01,228] \{dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,238] \{scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 06:31:01,305] \{scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 06:31:01,348] \{logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
>   result = self._query(query)
> [2020-03-06 06:31:01,362] \{scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-05T15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,366] \{scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-05 15:30:00+00:00: scheduled__2020-03-05T15:30:00+00:00, externally triggered: False>
> [2020-03-06 06:31:01,389] \{scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 06:31:01,395] \{scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-05 15:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 06:31:01,414] \{scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.192 seconds
> 2020-03-06 10:30 AM ET (BEFORE DST, EXPECTED BEHAVIOR):
> [2020-03-06 10:30:00,083] \{logging_mixin.py:112} INFO - [2020-03-06 10:30:00,082] \{settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=16194
> [2020-03-06 10:30:00,085] \{scheduler_job.py:153} INFO - Started process (PID=16194) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,090] \{scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-06 10:30:00,090] \{logging_mixin.py:112} INFO - [2020-03-06 10:30:00,090] \{dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,099] \{scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-06 10:30:00,159] \{scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-06 10:30:00,193] \{logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
>   result = self._query(query)
> [2020-03-06 10:30:00,207] \{scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-06T11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,212] \{scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-06 11:30:00+00:00: scheduled__2020-03-06T11:30:00+00:00, externally triggered: False>
> [2020-03-06 10:30:00,232] \{scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-06 10:30:00,236] \{scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-06 11:30:00+00:00 [scheduled]> in ORM
> [2020-03-06 10:30:00,251] \{scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.166 seconds
>  
> 2020-03-09 6:30 AM ET (AFTER DST, *UNEXPECTED BEHAVIOR*):
> [2020-03-09 06:31:55,240] \{logging_mixin.py:112} INFO - [2020-03-09 06:31:55,240] \{settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=23336
> [2020-03-09 06:31:55,242] \{scheduler_job.py:153} INFO - Started process (PID=23336) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,247] \{scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 06:31:55,247] \{logging_mixin.py:112} INFO - [2020-03-09 06:31:55,247] \{dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,257] \{scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 06:31:55,312] \{scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 06:31:55,338] \{scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 06:31:55,343] \{scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.101 seconds
> 2020-03-09 10:30 AM ET (AFTER DST, EXPECTED BEHAVIOR):
> [2020-03-09 10:30:54,487] \{logging_mixin.py:112} INFO - [2020-03-09 10:30:54,487] \{settings.py:254} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=6421
> [2020-03-09 10:30:54,490] \{scheduler_job.py:153} INFO - Started process (PID=6421) to work on /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,498] \{scheduler_job.py:1539} INFO - Processing file /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py for tasks to queue
> [2020-03-09 10:30:54,499] \{logging_mixin.py:112} INFO - [2020-03-09 10:30:54,499] \{dagbag.py:403} INFO - Filling up the DagBag from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,508] \{scheduler_job.py:1551} INFO - DAG(s) dict_keys(['<DAG_NAME>']) retrieved from /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py
> [2020-03-09 10:30:54,585] \{scheduler_job.py:1262} INFO - Processing <DAG_NAME>
> [2020-03-09 10:30:54,621] \{logging_mixin.py:112} WARNING - /home/ec2-user/venv/lib64/python3.7/site-packages/pymysql/cursors.py:170: Warning: (1300, "Invalid utf8mb4 character string: '80047D'")
>   result = self._query(query)
> [2020-03-09 10:30:54,634] \{scheduler_job.py:1272} INFO - Created <DagRun <DAG_NAME> @ 2020-03-09T10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,638] \{scheduler_job.py:740} INFO - Examining DAG run <DagRun <DAG_NAME> @ 2020-03-09 10:30:00+00:00: scheduled__2020-03-09T10:30:00+00:00, externally triggered: False>
> [2020-03-09 10:30:54,660] \{scheduler_job.py:440} INFO - Skipping SLA check for <DAG: <DAG_NAME>> because no tasks in DAG have SLAs
> [2020-03-09 10:30:54,665] \{scheduler_job.py:1613} INFO - Creating / updating <TaskInstance: <DAG_NAME>.<TASK_NAME> 2020-03-09 10:30:00+00:00 [scheduled]> in ORM
> [2020-03-09 10:30:54,683] \{scheduler_job.py:161} INFO - Processing /home/ec2-user/airflow/s3fuse/dags/<DAG_NAME>.py took 0.193 seconds
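>
> To make the timestamps above easier to compare, here is a small sketch (not part of the original report; assumes pendulum 2.x) showing that the same 06:30 local wall-clock time maps to 11:30 UTC before the transition and 10:30 UTC after it, the one-hour shift visible in the execution dates above:
>
> import pendulum
>
> ny = "America/New_York"
> before = pendulum.datetime(2020, 3, 6, 6, 30, tz=ny)  # Friday, EST (UTC-5)
> after = pendulum.datetime(2020, 3, 9, 6, 30, tz=ny)   # Monday, EDT (UTC-4)
>
> print(before.in_timezone("UTC"))  # 2020-03-06T11:30:00+00:00 (matches the 2020-03-06 run above)
> print(after.in_timezone("UTC"))   # 2020-03-09T10:30:00+00:00 (matches the 2020-03-09 run above)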



--
This message was sent by Atlassian Jira
(v8.3.4#803005)