Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2022/12/27 02:11:00 UTC

[jira] [Commented] (FLINK-30505) Close the connection between TM and JM when task executor failed

    [ https://issues.apache.org/jira/browse/FLINK-30505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652087#comment-17652087 ] 

Xintong Song commented on FLINK-30505:
--------------------------------------

I don't see how the proposed change makes a difference. The exception in the 2nd screenshot is not the _real reason_ for the TM failure. It says practically the same thing as the exception in the 1st screenshot: that the TM is no longer reachable. To understand the real reason, you need to check the TM/K8s logs anyway.

> Close the connection between TM and JM when task executor failed
> ----------------------------------------------------------------
>
>                 Key: FLINK-30505
>                 URL: https://issues.apache.org/jira/browse/FLINK-30505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>    Affects Versions: 1.16.0
>            Reporter: Yongming Zhang
>            Priority: Major
>             Fix For: 1.17.0
>
>
> When the resource manager detects that a task executor has failed, it closes its connection with the task executor. At this point, jobs running on this TM fail for other reasons (no longer reachable, or heartbeat timeout).
> !https://intranetproxy.alipay.com/skylark/lark/0/2022/png/336411/1672047809511-a4b8b5d9-f11f-483c-a113-b42290a33250.png|width=1160,id=uc24b1166!
> If the connection between the task executor and the job master is also closed when the resource manager detects that a task executor has failed, the real reason for the task executor failure will appear in "Root Exception". This will make it easier for users to find problems.
> !https://intranetproxy.alipay.com/skylark/lark/0/2022/png/336411/1672048733572-2b5b7be4-087d-46ae-9c8d-6ad5a1344019.png|width=1141,id=u947d8c4e!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)