Posted to issues@flink.apache.org by "Zhijiang Wang (JIRA)" <ji...@apache.org> on 2016/09/30 02:32:21 UTC

[jira] [Comment Edited] (FLINK-4715) TaskManager should commit suicide after cancellation failure

    [ https://issues.apache.org/jira/browse/FLINK-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534797#comment-15534797 ] 

Zhijiang Wang edited comment on FLINK-4715 at 9/30/16 2:31 AM:
---------------------------------------------------------------

Yes, we have already hit this problem many times in production, because the user code cannot be controlled. If the task thread is blocked waiting for a synchronized lock, or is stuck in a similar way, it cannot be cancelled. Our approach is this: if the job master fails to cancel the task after many attempts, it tells the task manager to exit.
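
To make the failure mode concrete, here is a minimal Java sketch of a thread that cannot be cancelled because it is blocked on a synchronized lock, together with a retry loop that gives up after a fixed number of attempts. This is not Flink's implementation. The class name, `cancelWithRetries`, `runDemo`, the attempt count and the wait time are all hypothetical, and a flag stands in for the point where a real process would exit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Flink code: shows why interrupt() cannot cancel a
// thread blocked on a monitor, and how a bounded retry loop can detect that.
public class CancellationWatchdogSketch {

    /**
     * Interrupts the task thread up to maxAttempts times, waiting waitMillis
     * after each attempt. Returns true if the thread terminated, false if
     * every attempt failed.
     */
    public static boolean cancelWithRetries(Thread task, int maxAttempts, long waitMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            task.interrupt();
            task.join(waitMillis);
            if (!task.isAlive()) {
                return true;
            }
        }
        return false;
    }

    /** Runs the scenario; returns true if the (simulated) fatal handler fired. */
    public static boolean runDemo() throws InterruptedException {
        final Object lock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        final AtomicBoolean fatalHandlerCalled = new AtomicBoolean(false);

        // Stand-in for faulty user code: takes the lock and keeps holding it.
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // Not expected in this demo.
                }
            }
        });
        holder.setDaemon(true);
        holder.start();
        lockHeld.await();

        // The task blocks on the monitor; interrupt() only sets its interrupt
        // flag and does not wake a thread waiting to enter a synchronized block.
        Thread task = new Thread(() -> {
            synchronized (lock) {
                // Never reached while the holder keeps the lock.
            }
        });
        task.setDaemon(true);
        task.start();

        boolean cancelled = cancelWithRetries(task, 3, 100);
        if (!cancelled) {
            // Assumption: a real task manager would terminate its JVM here.
            // The sketch records the decision instead of exiting.
            fatalHandlerCalled.set(true);
        }

        // Clean up so the demo itself does not leak threads.
        release.countDown();
        task.join(1000);
        holder.join(1000);
        return fatalHandlerCalled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fatal handler invoked: " + runDemo());
    }
}
```

Bounding the number of attempts is what turns an unbounded wait into a decision point: once the limit is reached, the only way left to release the stuck thread's resources is to end the process that hosts it.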


was (Author: zjwang):
Yes, we have already hit this problem many times in production, because the user code cannot be controlled. If the task thread is blocked waiting for a synchronized lock, or is stuck in a similar way, it cannot be cancelled, and once the job master has failed to cancel the task many times, it tells the task manager to exit.

> TaskManager should commit suicide after cancellation failure
> ------------------------------------------------------------
>
>                 Key: FLINK-4715
>                 URL: https://issues.apache.org/jira/browse/FLINK-4715
>             Project: Flink
>          Issue Type: Improvement
>          Components: TaskManager
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>             Fix For: 1.2.0
>
>
> In case of a failed cancellation, e.g. the task cannot be cancelled after a given time, the {{TaskManager}} should kill itself. That way we guarantee that there is no resource leak. 
> This behaviour acts as a safety-net against faulty user code.


