Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/19 11:33:36 UTC
[GitHub] [airflow] stromal opened a new issue, #27131: Scheduled runtime and actual runtime totally different
stromal opened a new issue, #27131:
URL: https://github.com/apache/airflow/issues/27131
### Apache Airflow version
2.4.1
### What happened
# Situation
- I have a DAG that is scheduled to run at 1:30 am UTC
- but the logs show that it started running at 2022-10-19, 08:09:12 UTC
- It ran for about the same duration as before and delivered the data correctly from point A to B
- I haven't modified this DAG file for days
- there is nothing else scheduled for this time period at all
- If I look at the previous logs, sometimes it starts running at 1:30 am UTC and sometimes at completely different, random times
- (these are Saturday and Sunday runs,
- I am the only one with access to Airflow and the files,
- no one else modified them,
- and I definitely did not touch or trigger them on Saturday and Sunday)
# System
- ubuntu 20.04 LTS
- Docker Airflow
- AWS EC2
- x86 AMD / x86 Intel CPU
# Dockerfile
```
FROM apache/airflow:2.4.1-python3.8
# ExternalPythonOperator
# Required to switch this parameter in order to use your own venv via ExternalPythonOperator
ENV PIP_USER=false
# Python venv setup that you have created locally
RUN python3 -m venv /opt/airflow/my_virtual_env
# Install dependencies:
COPY airflow/requirements.txt .
# Copy over the folder that contains my own local pip package
COPY airflow/my_own_pip_package_folder /opt/airflow/my_own_pip_package_folder
# Copy over whatever file
COPY my_file.py /opt/airflow/my_file.py
# Install packages into my own virtual environment inside the Docker image
RUN /opt/airflow/my_virtual_env/bin/pip install -r requirements.txt
RUN /opt/airflow/my_virtual_env/bin/pip install /opt/airflow/my_own_pip_package_folder
RUN /opt/airflow/my_virtual_env/bin/pip install 'apache-airflow==2.4.1' --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.1/constraints-3.8.txt"
```
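One thing worth double-checking with a setup like this (my own illustration, not part of the original report): the venv interpreter used by `ExternalPythonOperator` should match the image's Python major.minor version, which is 3.8 for the `apache/airflow:2.4.1-python3.8` image above. A minimal sketch of such a check, using a hypothetical `compatible` helper:

```python
def compatible(image_ver: tuple, venv_ver: tuple) -> bool:
    # Matching major.minor (3.8 for the image above) avoids surprises
    # when the external interpreter deserializes task arguments.
    return image_ver[:2] == venv_ver[:2]

# e.g. image Python 3.8.13 vs venv Python 3.8.10
print(compatible((3, 8, 13), (3, 8, 10)))  # True
print(compatible((3, 8, 13), (3, 9, 1)))   # False
```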
# My DAG File
```
"""
Example DAG demonstrating the usage of the TaskFlow API to execute Python functions natively and within a
virtual environment.
"""
from __future__ import annotations
import logging
import os
import shutil
import sys
import tempfile
import time
from pprint import pprint
import pendulum
from airflow import DAG
from airflow.decorators import task
log = logging.getLogger(__name__)
PYTHON = sys.executable
BASE_DIR = tempfile.gettempdir()
with DAG(
    dag_id='my_dag',
    schedule='30 1 * * *',  # schedule is in UTC, e.g. 11:12 UTC = 12:12 pm BST = 12:12 pm UK time
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=['Redshift'],
) as dag:

    @task.external_python(task_id="my_dag_task", python='/opt/airflow/my_virtual_env/bin/python3')
    def go():
        """
        Example function that will be performed in a virtual environment.

        Importing at the module level ensures that it will not attempt to import the
        library before it is installed.
        """
        # my imports
        # my code

    external_python_task = go()
```
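One point worth noting about the schedule above (my own illustration, not from the report): with `schedule='30 1 * * *'`, Airflow queues each run only after its data interval has *ended*, so the run with logical date 2022-10-18T01:30 is not expected to start at 01:30 on the 18th, but at the earliest around 01:30 on the 19th. A rough sketch of that rule, assuming a daily interval (this is not Airflow's actual scheduler code):

```python
from datetime import datetime, timedelta

def earliest_start(logical_date: datetime, interval: timedelta) -> datetime:
    # A scheduled run with logical date T covers the interval [T, T + interval)
    # and only becomes eligible to run once that interval has ended.
    return logical_date + interval

run = earliest_start(datetime(2022, 10, 18, 1, 30), timedelta(days=1))
print(run)  # 2022-10-19 01:30:00
```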
# Logs
## Incorrect time run log, but ran without failure (Wednesday early morning)
```
4d37309d53a3
*** Reading local file: /opt/airflow/logs/dag_id=my_dag/run_id=scheduled__2022-10-18T01:30:00+00:00/task_id=my_dag_task/attempt=1.log
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [queued]>
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [queued]>
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1383} INFO - Executing <Task(_PythonExternalDecoratedOperator): my_dag_task> on 2022-10-18 01:30:00+00:00
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:54} INFO - Started process 58 to run task
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'my_dag', 'my_dag_task', 'scheduled__2022-10-18T01:30:00+00:00', '--job-id', '235', '--raw', '--subdir', 'DAGS_FOLDER/my_dag.py', '--cfg-path', '/tmp/tmpps6y22oi']
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:83} INFO - Job 235: Subtask my_dag_task
[2022-10-19, 08:09:12 UTC] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/my_dag.py
[2022-10-19, 08:09:12 UTC] {task_command.py:384} INFO - Running <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [running]> on host 4d37309d53a3
[2022-10-19, 08:09:13 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=my_dag
AIRFLOW_CTX_TASK_ID=my_dag_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-18T01:30:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-10-18T01:30:00+00:00
[2022-10-19, 08:09:16 UTC] {process_utils.py:179} INFO - Executing cmd: /opt/airflow/my_virtual_env/bin/python3 /tmp/tmdkd47ow0r/script.py /tmp/tmdkd47ow0r/script.in /tmp/tmdkd47ow0r/script.out /tmp/tmdkd47ow0r/string_args.txt
[2022-10-19, 08:09:16 UTC] {process_utils.py:183} INFO - Output:
[2022-10-19, 08:09:18 UTC] {process_utils.py:187} INFO - WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
[2022-10-19, 08:09:22 UTC] {process_utils.py:187} INFO - startdate: 2022-10-18T00:00:00.000 enddate: 2022-10-18T23:59:59.000
[2022-10-19, 08:09:22 UTC] {process_utils.py:187} INFO - [[34m2022-10-19 08:09:22,282[0m] {[34mcredentials.py:[0m1108} INFO[0m - Found credentials from IAM Role: BI-Read-Creds-2[0m
[2022-10-19, 08:09:30 UTC] {python.py:177} INFO - Done. Returned value was: [True, 'schema.table_sql']
[2022-10-19, 08:09:30 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=my_dag, task_id=my_dag_task, execution_date=20221018T013000, start_date=20221019T080912, end_date=20221019T080930
[2022-10-19, 08:09:31 UTC] {local_task_job.py:164} INFO - Task exited with return code 0
[2022-10-19, 08:09:31 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
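For reference, my own arithmetic on the timestamps in the log above: even accounting for Airflow's interval-based scheduling (a daily run becomes eligible one day after its logical date), this run started well after it became eligible:

```python
from datetime import datetime, timedelta, timezone

logical = datetime(2022, 10, 18, 1, 30, tzinfo=timezone.utc)     # run_id logical date
started = datetime(2022, 10, 19, 8, 9, 12, tzinfo=timezone.utc)  # first log timestamp

# With a daily schedule, the run is eligible once its data interval ends.
eligible = logical + timedelta(days=1)
print(started - eligible)  # 6:39:12
```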
## Correct time run log, ran without failure (Tuesday evening)
```
9adeff0b1844
*** Reading local file: /opt/airflow/logs/dag_id=my_dag/run_id=scheduled__2022-10-17T01:30:00+00:00/task_id=my_dag_task/attempt=1.log
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [queued]>
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [queued]>
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1383} INFO - Executing <Task(_PythonExternalDecoratedOperator): my_dag_task> on 2022-10-17 01:30:00+00:00
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:54} INFO - Started process 23538 to run task
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'my_dag', 'my_dag_task', 'scheduled__2022-10-17T01:30:00+00:00', '--job-id', '212', '--raw', '--subdir', 'DAGS_FOLDER/my_dag.py', '--cfg-path', '/tmp/tmp4aa6swro']
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:83} INFO - Job 212: Subtask my_dag_task
[2022-10-18, 01:30:00 UTC] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/my_dag.py
[2022-10-18, 01:30:01 UTC] {task_command.py:384} INFO - Running <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [running]> on host 9adeff0b1844
[2022-10-18, 01:30:01 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=my_dag
AIRFLOW_CTX_TASK_ID=my_dag_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-17T01:30:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-10-17T01:30:00+00:00
[2022-10-18, 01:30:02 UTC] {process_utils.py:179} INFO - Executing cmd: /opt/airflow/my_virtual_env/bin/python3 /tmp/tmdebmcqmsm/script.py /tmp/tmdebmcqmsm/script.in /tmp/tmdebmcqmsm/script.out /tmp/tmdebmcqmsm/string_args.txt
[2022-10-18, 01:30:02 UTC] {process_utils.py:183} INFO - Output:
[2022-10-18, 01:30:03 UTC] {process_utils.py:187} INFO - WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
[2022-10-18, 01:30:06 UTC] {process_utils.py:187} INFO - startdate: 2022-10-17T00:00:00.000 enddate: 2022-10-17T23:59:59.000
[2022-10-18, 01:30:06 UTC] {process_utils.py:187} INFO - [[34m2022-10-18 01:30:06,223[0m] {[34mcredentials.py:[0m1108} INFO[0m - Found credentials from ......[0m
[2022-10-18, 01:30:17 UTC] {python.py:177} INFO - Done. Returned value was: [True, 'schema.table_sql']
[2022-10-18, 01:30:17 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=my_dag, task_id=my_dag_task, execution_date=20221017T013000, start_date=20221018T013000, end_date=20221018T013017
[2022-10-18, 01:30:17 UTC] {local_task_job.py:164} INFO - Task exited with return code 0
[2022-10-18, 01:30:17 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
### What you think should happen instead
The DAG run should start at the scheduled time, even if it would fail.
### How to reproduce
_No response_
### Operating System
Ubuntu 20.04 LTS
### Versions of Apache Airflow Providers
Image extended: `apache/airflow:2.4.1-python3.8`
### Deployment
Other Docker-based deployment
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk closed issue #27131: Scheduled runtime and actual runtime totally different
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #27131: Scheduled runtime and actual runtime totally different
URL: https://github.com/apache/airflow/issues/27131
[GitHub] [airflow] potiuk commented on issue #27131: Scheduled runtime and actual runtime totally different
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #27131:
URL: https://github.com/apache/airflow/issues/27131#issuecomment-1291208983
I think this is a misunderstanding. I do not fully understand what you tried to do and what was wrong, because you mention both "scheduled" runs and "triggering", and those two are completely different. "Triggered" tasks run when you trigger them; scheduled tasks run when their schedule comes. Regarding the first two runs, I think there was a case that simply enabled the task to run twice (this is what **could** have happened).
All the other runs I assume you triggered manually, so they - correctly - started as soon as you triggered them, without looking at the schedule (this is how triggered tasks work).
Unless I completely misunderstood what you wanted to explain, in which case I suggest you try to explain it more precisely.
Converting it to a discussion until more information is provided.