Posted to issues@flink.apache.org by "Yang Wang (Jira)" <ji...@apache.org> on 2020/01/02 03:18:00 UTC

[jira] [Commented] (FLINK-15449) Retain lost task managers on Flink UI

    [ https://issues.apache.org/jira/browse/FLINK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006553#comment-17006553 ] 

Yang Wang commented on FLINK-15449:
-----------------------------------

I think this is a valid user experience improvement. However, if we retain all the lost TaskManagers, it will cost more memory in the JobManager. When TaskManagers fail over frequently, the JobManager may run out of memory (OOM). If we add a threshold for removing lost TaskManagers, it will not make much difference compared to the current behavior.

I want to share how to debug a lost TaskManager today. First, find which NodeManager the lost TaskManager was running on. Then use the pattern http://{RM_Address:PORT}/node/containerlogs/{container_id}/{user} to construct the log URL. The log URL remains usable until the application finishes.
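
As a minimal sketch of that URL construction, the pattern can be filled in with a small helper. The function name and the sample host, container id, and user below are hypothetical placeholders, not part of Flink or YARN; substitute the values from your own cluster.

```python
from urllib.parse import quote


def build_container_log_url(address: str, container_id: str, user: str) -> str:
    """Fill in the container log URL pattern described above.

    `address` is the host:port part of the pattern, `container_id` is the
    YARN container id of the lost TaskManager, and `user` is the user who
    submitted the application.
    """
    # Percent-encode the path segments so unusual characters cannot break the URL.
    return "http://{}/node/containerlogs/{}/{}".format(
        address, quote(container_id, safe=""), quote(user, safe="")
    )


if __name__ == "__main__":
    # All values here are made-up examples (assumptions), not real hosts or containers.
    print(
        build_container_log_url(
            "example-host:8042",
            "container_1577000000000_0001_01_000002",
            "flink",
        )
    )
```

The container id of a lost TaskManager can usually be found in the JobManager log, which is where the address and id passed to this helper would come from.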

> Retain lost task managers on Flink UI
> -------------------------------------
>
>                 Key: FLINK-15449
>                 URL: https://issues.apache.org/jira/browse/FLINK-15449
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.1
>            Reporter: Victor Wong
>            Priority: Major
>
> With Flink on Yarn, our TaskManagers are sometimes killed because of OOM, heartbeat timeout, or other reasons, and it is not convenient to check the logs of the lost TaskManager.
> Can we retain the lost task managers on Flink UI, and provide the log service through Yarn (we can redirect the URL of log/stdout to Yarn container log/stdout)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)