Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/21 22:45:00 UTC

[jira] [Commented] (AIRFLOW-5071) Thousands of "Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?"

    [ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349502#comment-17349502 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

tazimmerman commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-846297186


   I'm able to reproduce this using the standard `TimeDeltaSensor` with Airflow 1.10.14 and the `DaskExecutor`. The logs aren't helpful (at least to someone unfamiliar with the source code), but it's clear that there is a "hiccup" where the occasional retry fails.
   
   For example, the attempts before & after this one are successful, and on this try, there is an ominous message:
   
   `Dependencies not met for <TaskInstance: daily.wait_until_9_30_pm 2021-05-20T05:00:00+00:00 [up_for_retry]>, dependency 'Not In Retry Period' FAILED: Task is not ready for retry yet but will be retried automatically. Current date is 2021-05-21T21:42:46.765694+00:00 and task will be retried at 2021-05-21T21:47:45.333113+00:00.`
   
   ```
   --------------------------------------------------------------------------------
   [2021-05-21 16:41:43,452] {taskinstance.py:881} INFO - Starting attempt 1 of 2
   [2021-05-21 16:41:43,452] {taskinstance.py:882} INFO -
   --------------------------------------------------------------------------------
   [2021-05-21 16:41:43,457] {taskinstance.py:901} INFO - Executing <Task(TimeDeltaSensor): wait_until_9_30_pm> on 2021-05-20T05:00:00+00:00
   [2021-05-21 16:41:43,461] {standard_task_runner.py:54} INFO - Started process 129312 to run task
   [2021-05-21 16:41:43,489] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'daily', 'wait_until_9_30_pm', '2021-05-20T05:00:00+00:00', '--job_id', '25008', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER//home/troy/miniconda3/envs/test/daily.py', '--cfg_path', '/tmp/tmp24yafd42']
   [2021-05-21 16:41:43,490] {standard_task_runner.py:78} INFO - Job 25008: Subtask wait_until_9_30_pm
   [2021-05-21 16:41:43,512] {logging_mixin.py:112} INFO - Running <TaskInstance: daily.wait_until_9_30_pm 2021-05-20T05:00:00+00:00 [running]> on localhost
   [2021-05-21 16:41:43,526] {time_delta_sensor.py:45} INFO - Checking if the time (2021-05-22 02:30:00+00:00) has come
   [2021-05-21 16:41:43,533] {taskinstance.py:1141} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
   [2021-05-21 16:41:48,428] {local_task_job.py:102} INFO - Task exited with return code 0
   [2021-05-21 16:42:46,765] {taskinstance.py:662} INFO - Dependencies not met for <TaskInstance: daily.wait_until_9_30_pm 2021-05-20T05:00:00+00:00 [up_for_retry]>, dependency 'Not In Retry Period' FAILED: Task is not ready for retry yet but will be retried automatically. Current date is 2021-05-21T21:42:46.765694+00:00 and task will be retried at 2021-05-21T21:47:45.333113+00:00.
   [2021-05-21 16:42:46,773] {local_task_job.py:90} INFO - Task is not able to be run
   [2021-05-21 16:47:53,044] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: daily.wait_until_9_30_pm 2021-05-20T05:00:00+00:00 [queued]>
   [2021-05-21 16:47:53,059] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: daily.wait_until_9_30_pm 2021-05-20T05:00:00+00:00 [queued]>
   [2021-05-21 16:47:53,059] {taskinstance.py:880} INFO -
   ```
   
   This happens every hour or so with `poke_interval=300`. I was hoping to switch most of our sensors to `mode="reschedule"`, but now I'm afraid it'll just mean even more alert emails.
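
   For reference, the sensor setup is roughly the following. This is a minimal sketch; only the DAG id, task id, `poke_interval`, and retry count match the logs above, while the `delta` and schedule values are placeholders for illustration:

   ```python
   # Minimal reproduction sketch (Airflow 1.10.x-style imports).
   # dag_id/task_id/poke_interval/retries match the log lines above; the
   # delta and schedule values are assumed for illustration.
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.sensors.time_delta_sensor import TimeDeltaSensor

   with DAG(
       dag_id="daily",
       start_date=datetime(2021, 5, 19),
       schedule_interval="@daily",
       catchup=False,
   ) as dag:
       # In reschedule mode the sensor releases its worker slot between pokes:
       # it exits with UP_FOR_RESCHEDULE and the scheduler re-queues it for
       # the next poke, as seen in the log output above.
       wait_until_9_30_pm = TimeDeltaSensor(
           task_id="wait_until_9_30_pm",
           delta=timedelta(hours=21, minutes=30),  # assumed target offset
           mode="reschedule",
           poke_interval=300,  # 5 minutes, as noted above
           retries=1,          # consistent with "attempt 1 of 2" in the log
       )
   ```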


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Thousands of "Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?"
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing thousands of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because the flag to send email on failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.
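>
> For reference, a minimal `airflow.cfg` excerpt for that kind of setup would look roughly like this (the connection strings are placeholders, not the actual values):
> ```
> [core]
> executor = CeleryExecutor
>
> [celery]
> # Redis as the Celery broker, a database as the result backend (placeholder URLs)
> broker_url = redis://localhost:6379/0
> result_backend = db+postgresql://airflow:airflow@localhost:5432/airflow
> ```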



--
This message was sent by Atlassian Jira
(v8.3.4#803005)