Posted to commits@airflow.apache.org by "Adam Angeli (Jira)" <ji...@apache.org> on 2020/02/25 08:17:00 UTC

[jira] [Comment Edited] (AIRFLOW-5071) Thousands of: Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?

    [ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044214#comment-17044214 ] 

Adam Angeli edited comment on AIRFLOW-5071 at 2/25/20 8:16 AM:
---------------------------------------------------------------

Another variant of this involves {{_execute_helper}} from {{SchedulerJob}}.  It changes the state of tasks to _queued_ and adds them to the executor's {{queued_tasks}}.  It then heartbeats the executor, which may not process all of {{queued_tasks}} if there aren't open slots.  It follows up by calling {{_change_state_for_tasks_failed_to_execute}}, which resets the task state to _scheduled_ for anything that couldn't be processed.  But it leaves those tasks in {{queued_tasks}}, and they will eventually make their way to a Celery worker while still in the _scheduled_ state.  It looks like this may be fixed in {{1.10.7}}+, where the task instance is removed from {{queued_tasks}} after its state is set back to _scheduled_.  As a stopgap, you could try increasing your {{core.parallelism}} setting to avoid the scenario where there are no open slots.

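To make that sequence concrete, here is a minimal, self-contained sketch of the race.  This is not the actual {{SchedulerJob}} code: {{TaskInstance}}, {{Executor}}, and {{execute_helper}} are simplified stand-ins, the database is modelled as a plain {{state}} attribute, and step 4 inlines what {{_change_state_for_tasks_failed_to_execute}} does.

```
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    key: str
    state: str = "scheduled"


@dataclass
class Executor:
    parallelism: int
    queued_tasks: dict = field(default_factory=dict)
    running: set = field(default_factory=set)

    def heartbeat(self):
        # Only as many tasks as there are open slots get sent to workers;
        # the rest stay behind in queued_tasks.
        open_slots = max(self.parallelism - len(self.running), 0)
        for key in list(self.queued_tasks)[:open_slots]:
            self.queued_tasks.pop(key)
            self.running.add(key)  # eventually runs on a Celery worker


def execute_helper(executor, task_instances):
    for ti in task_instances:
        ti.state = "queued"                  # 1. task state -> queued
        executor.queued_tasks[ti.key] = ti   # 2. handed to the executor

    executor.heartbeat()                     # 3. may not drain queued_tasks

    # 4. _change_state_for_tasks_failed_to_execute: whatever the executor
    #    did not pick up is reset to scheduled ...
    for ti in task_instances:
        if ti.key in executor.queued_tasks:
            ti.state = "scheduled"
            # ... but the entry is NOT removed from queued_tasks, so a later
            # heartbeat still ships it to a worker while the task state says
            # scheduled. The fix referenced above also drops the entry:
            # executor.queued_tasks.pop(ti.key)


tis = [TaskInstance(f"task_{i}") for i in range(3)]
execute_helper(Executor(parallelism=1), tis)
print([(ti.key, ti.state) for ti in tis])
# [('task_0', 'queued'), ('task_1', 'scheduled'), ('task_2', 'scheduled')]
# task_1/task_2 are back to "scheduled" yet still sit in queued_tasks.
```

With {{parallelism=1}}, {{task_1}} and {{task_2}} end up reset to _scheduled_ yet still present in {{queued_tasks}}, which is exactly the inconsistency that later surfaces as the "Was the task killed externally?" error.  For the stopgap, {{parallelism}} lives under {{[core]}} in {{airflow.cfg}} and can also be set via the {{AIRFLOW__CORE__PARALLELISM}} environment variable.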

> Thousands of: Executor reports task instance X finished (success) although the task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>         Attachments: image-2020-01-27-18-10-29-124.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing thousands of daily messages like the following in the logs:
>  
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because the flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)