
Posted to dev@flink.apache.org by "future (Jira)" <ji...@apache.org> on 2021/09/29 03:55:00 UTC

[jira] [Created] (FLINK-24401) TM cannot exit after Metaspace OOM

future created FLINK-24401:
------------------------------

             Summary: TM cannot exit after Metaspace OOM
                 Key: FLINK-24401
                 URL: https://issues.apache.org/jira/browse/FLINK-24401
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.13.0, 1.12.0
            Reporter: future
             Fix For: 1.13.3, 1.14.1
         Attachments: image-2021-09-29-11-45-48-098.png, image-2021-09-29-11-47-47-157.png

Hi masters, from the code and logs we can see that an ordinary OutOfMemoryError terminates the JVM directly (terminateJVM), whereas a Metaspace OutOfMemoryError triggers a graceful shutdown. The code comment explains: {{_it does not usually require more class loading to fail again with the Metaspace OutOfMemoryError_}}.

However, we ran into the following case: after a Metaspace OutOfMemoryError, the TM hit {{_java.lang.NoClassDefFoundError: Could not initialize class org.apache.flink.runtime.taskexecutor.TaskManagerRunner$Result_}}, which left it unable to exit. It keeps retrying, keeps hitting NoClassDefFoundError because class loading keeps failing, and stays in this state until the TM is killed manually.

I would like to add a catch for Throwable in the onFatalError method and call terminateJVM() directly in the catch block. Is there any problem with this strategy?
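
To make the proposal concrete, here is a minimal standalone sketch of the idea. It is not the actual TaskManagerRunner code: the class FatalErrorHandlerSketch, the two injected Runnable actions, and the message-based Metaspace check are all hypothetical stand-ins chosen so the behavior can be exercised without really halting a JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the fatal error handling discussed above; not Flink code. */
final class FatalErrorHandlerSketch {
    private final Runnable gracefulShutdown; // stands in for the graceful shutdown path
    private final Runnable terminateJvm;     // stands in for terminateJVM()

    FatalErrorHandlerSketch(Runnable gracefulShutdown, Runnable terminateJvm) {
        this.gracefulShutdown = gracefulShutdown;
        this.terminateJvm = terminateJvm;
    }

    /** Assumption: a Metaspace OOM is recognized by its message text. */
    static boolean isMetaspaceOom(Throwable t) {
        return t instanceof OutOfMemoryError
                && t.getMessage() != null
                && t.getMessage().contains("Metaspace");
    }

    void onFatalError(Throwable error) {
        boolean terminateDirectly =
                error instanceof OutOfMemoryError && !isMetaspaceOom(error);
        if (terminateDirectly) {
            // Ordinary OOM: no attempt at a graceful shutdown.
            terminateJvm.run();
            return;
        }
        try {
            // Metaspace OOM and other fatal errors: try the graceful path first.
            gracefulShutdown.run();
        } catch (Throwable t) {
            // Proposed change: if the graceful path itself fails (for example with
            // NoClassDefFoundError because class loading is broken), stop retrying
            // and terminate the process.
            terminateJvm.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger terminations = new AtomicInteger();
        FatalErrorHandlerSketch handler = new FatalErrorHandlerSketch(
                // Simulates the class-loading failure seen during shutdown.
                () -> { throw new NoClassDefFoundError("Could not initialize class (simulated)"); },
                terminations::incrementAndGet);
        handler.onFatalError(new OutOfMemoryError("Metaspace"));
        System.out.println("terminate calls: " + terminations.get());
    }
}
```

In a real process the terminate action could be something like {{() -> Runtime.getRuntime().halt(1)}}, which skips shutdown hooks and therefore should not need further class loading; whether halt or System.exit is the right choice here is an open question for the fix.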

 

[code link |https://github.com/apache/flink/blob/4fe9f525a92319acc1e3434bebed601306f7a16f/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L312]

picture:

!image-2021-09-29-11-45-48-098.png|width=663,height=343!

 

!image-2021-09-29-11-47-47-157.png!

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)