Posted to commits@airflow.apache.org by "Chris Riccomini (JIRA)" <ji...@apache.org> on 2016/07/28 18:02:20 UTC

[jira] [Commented] (AIRFLOW-374) Kill task instances that haven't been able to heartbeat for a while

    [ https://issues.apache.org/jira/browse/AIRFLOW-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15397918#comment-15397918 ] 

Chris Riccomini commented on AIRFLOW-374:
-----------------------------------------

> The task can monitor the time since last heartbeat and kill itself to prevent such cases.

In your example, if the task can't reach the DB, how does it know how long it has been since its last heartbeat? Are you planning to keep a local in-memory variable that tracks the last successful heartbeat write?

> Kill task instances that haven't been able to heartbeat for a while
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-374
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-374
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>             Fix For: Airflow 1.8
>
>
> A task run by the LocalTaskJob periodically updates a timestamp to indicate that the task is still alive and running. If the task is unable to update this timestamp for a long time (for example, due to DB connection errors), the scheduler may reschedule the task to run again. In such a case, it's possible that two instances of the task are running. The task can monitor the time since last heartbeat and kill itself to prevent such cases.
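
The self-kill idea described above could be sketched roughly as follows. This is only an illustration, not Airflow's actual implementation: the HeartbeatWatchdog class, its beat() method, and the max_silence_secs parameter are hypothetical names, and write_heartbeat stands in for whatever call updates the timestamp in the DB. The sketch assumes the time of the last successful write is kept in local memory, so the check still works while the DB is unreachable.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical helper: remembers the last successful heartbeat locally."""

    def __init__(self, max_silence_secs, clock=time.monotonic):
        self.max_silence_secs = max_silence_secs
        self._clock = clock
        # Treat creation time as the first successful heartbeat.
        self._last_success = clock()

    def beat(self, write_heartbeat):
        """Attempt one heartbeat write.

        Returns True if the task may keep running, False if it has gone
        longer than max_silence_secs without a successful write.
        """
        try:
            write_heartbeat()  # assumed to raise on DB connection errors
            self._last_success = self._clock()
        except Exception:
            # The write failed; keep the previous timestamp so the
            # silence interval continues to grow.
            pass
        return self._clock() - self._last_success <= self.max_silence_secs
```

A task loop would call beat() on every heartbeat interval and terminate itself once it returns False. Using a monotonic clock rather than wall-clock time keeps the check from being thrown off by system clock adjustments.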


