Posted to commits@airflow.apache.org by "Shantanu (JIRA)" <ji...@apache.org> on 2019/08/02 22:19:00 UTC
[jira] [Created] (AIRFLOW-5102) Workers fail to shutdown jobs after failed heartbeats
Shantanu created AIRFLOW-5102:
---------------------------------
Summary: Workers fail to shutdown jobs after failed heartbeats
Key: AIRFLOW-5102
URL: https://issues.apache.org/jira/browse/AIRFLOW-5102
Project: Apache Airflow
Issue Type: Bug
Components: worker
Affects Versions: 1.10.3
Reporter: Shantanu
Assignee: Shantanu
If a LocalTaskJob fails to heartbeat successfully for longer than scheduler_zombie_task_threshold seconds, it should shut itself down: [https://github.com/apache/airflow/blob/f34e13a/airflow/jobs/local_task_job.py#L109]
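The intended shutdown check can be sketched as follows. This is a minimal, hypothetical illustration, not Airflow's code: `run_monitor`, the simulated clock ticks, and the use of `ConnectionError` to stand in for a database failure are all assumptions. The job remembers the time of its last *successful* heartbeat and shuts down once that age exceeds the threshold.

```python
from typing import Callable, Iterable, Optional


def run_monitor(
    heartbeat: Callable[[], None],
    ticks: Iterable[float],
    threshold: float,
    start: float = 0.0,
) -> Optional[float]:
    """Hypothetical monitor loop (not Airflow's implementation).

    Returns the simulated time at which the job would shut itself down,
    or None if the threshold is never exceeded. `ticks` are simulated
    timestamps (seconds) at which a heartbeat is attempted; `threshold`
    plays the role of scheduler_zombie_task_threshold.
    """
    last_success = start
    for now in ticks:
        try:
            heartbeat()
            last_success = now  # only a heartbeat that returns normally counts
        except ConnectionError:
            pass  # failed heartbeat: leave last_success unchanged
        if now - last_success > threshold:
            return now  # too long without a successful heartbeat: shut down
    return None


def healthy_heartbeat() -> None:
    pass


def failing_heartbeat() -> None:
    raise ConnectionError("metadata database unreachable")


ticks = range(0, 400, 60)  # 0, 60, ..., 360
print(run_monitor(healthy_heartbeat, ticks, threshold=300))  # None
print(run_monitor(failing_heartbeat, ticks, threshold=300))  # 360
```

The check only works if a failed heartbeat actually raises into the loop; a heartbeat that returns normally is always counted as a success.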
However, at some point, a change was made to catch exceptions inside the heartbeat: [https://github.com/apache/airflow/blob/f34e13a/airflow/jobs/base_job.py#L194]
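Because the exception never reaches LocalTaskJob, it now treats every heartbeat as successful and keeps resetting its last-heartbeat time, so the threshold is never exceeded.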
In effect, zombie tasks never shut themselves down. If the scheduler then reschedules the task, two instances of it could be running concurrently.
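Why the caller can no longer tell a failed heartbeat from a successful one can be shown in a few lines. This is a hypothetical sketch, not Airflow's code: the function names are invented, and `ConnectionError` merely stands in for whatever database error the real heartbeat catches.

```python
import logging

log = logging.getLogger("heartbeat_sketch")


def write_heartbeat_row() -> None:
    """Hypothetical stand-in for the database update a heartbeat performs."""
    raise ConnectionError("metadata database unreachable")


def heartbeat_swallowing() -> None:
    """Catches and logs its own errors, so it always returns normally."""
    try:
        write_heartbeat_row()
    except ConnectionError as exc:
        log.error("Heartbeat got an exception: %s", exc)


def heartbeat_propagating() -> None:
    """Lets the error reach the caller."""
    write_heartbeat_row()


def caller_sees_failure(heartbeat) -> bool:
    """Mimics a caller that wraps heartbeat() in its own try/except."""
    try:
        heartbeat()
    except ConnectionError:
        return True
    return False


print(caller_sees_failure(heartbeat_swallowing))   # False: failure is invisible
print(caller_sees_failure(heartbeat_propagating))  # True
```

With the swallowing variant, a caller that updates its last-heartbeat timestamp whenever `heartbeat()` returns will do so on every iteration, which is exactly how a zombie task keeps running.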