Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2022/04/07 18:22:00 UTC

[jira] [Comment Edited] (FLINK-25022) ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API

    [ https://issues.apache.org/jira/browse/FLINK-25022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453396#comment-17453396 ] 

Chesnay Schepler edited comment on FLINK-25022 at 4/7/22 6:21 PM:
------------------------------------------------------------------

master: 1f212f2ef04e36e0248098a26e7db43a6d65796a
1.14: 5207fe560dc6a054beb0eb0a25af009215ca9f23
1.13: 59d19caf3687dfd3dcaadc14cc11c6bbdf33198e
1.12: fdd52a787260e2d4dd97473a74e7e45222dbd099


was (Author: zentol):
master: 1f212f2ef04e36e0248098a26e7db43a6d65796a
1.14: 5207fe560dc6a054beb0eb0a25af009215ca9f23
1.13: 59d19caf3687dfd3dcaadc14cc11c6bbdf33198e 

> ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-25022
>                 URL: https://issues.apache.org/jira/browse/FLINK-25022
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / REST
>    Affects Versions: 1.14.0, 1.12.5, 1.13.3
>            Reporter: Nico Kruber
>            Assignee: Chesnay Schepler
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.15.0, 1.12.8, 1.13.6, 1.14.3
>
>
> If a job is submitted using the REST API's {{/jars/:jarid/run}} endpoint, the user code has to be executed on the JobManager, which does this in one of its pooled dispatcher threads such as {{Flink-DispatcherRestEndpoint-thread-*}}.
> If the user code uses thread locals and does not clean them up, they may remain in the thread, holding references to the job's {{ChildFirstClassLoader}} and thus leaking it.
> We saw this at the JM with the {{jsoniter}} Scala library, which [creates ThreadLocal instances|https://github.com/plokhotnyuk/jsoniter-scala/blob/95c7053cfaa558877911f3448382f10d53c4fcbf/jsoniter-scala-core/jvm/src/main/scala/com/github/plokhotnyuk/jsoniter_scala/core/package.scala] but never removes them; however, this can happen with any user code or, worse, with any library used by user code.
>  
> There are a few *workarounds*, e.g. putting the library into Flink's lib/ folder or submitting via the Flink CLI, but depending on the circumstances these may not be possible.
>  
> A *proper fix* should happen in Flink by guarding the dispatcher threads against such leaks. We could, for example, spawn a separate thread to execute the user's {{main()}} method and, once the job is submitted, let that thread exit so that all of its thread locals are destroyed along with it.
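
The leak mechanism described in the issue can be reproduced in isolation. The following is a minimal Java sketch, not Flink code: a single-thread executor stands in for the pooled dispatcher threads, an empty {{URLClassLoader}} stands in for the job's {{ChildFirstClassLoader}}, and the names {{ThreadLocalLeakDemo}}, {{CACHE}} and {{leaksAcrossTasks}} are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {

    // Hypothetical stand-in for a library-level ThreadLocal that user code sets
    // but never clears (this is NOT jsoniter's actual code).
    static final ThreadLocal<ClassLoader> CACHE = new ThreadLocal<>();

    /**
     * Runs two "jobs" on the same pooled thread and reports whether the second
     * one can still see the class loader stored by the first one.
     */
    static boolean leaksAcrossTasks() throws Exception {
        // A single pooled thread stands in for the dispatcher's REST thread pool.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Stand-in for the job's user-code class loader.
        try (URLClassLoader jobClassLoader = new URLClassLoader(new URL[0])) {
            // "Job 1": user code stores a reference and returns without calling remove().
            pool.submit(() -> CACHE.set(jobClassLoader)).get();
            // "Job 2": a later task on the same thread still finds that reference,
            // so the class loader stays reachable for as long as the thread lives.
            return pool.submit(() -> CACHE.get() == jobClassLoader).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("class loader still referenced: " + leaksAcrossTasks());
    }
}
```

Running {{main}} prints "class loader still referenced: true": the second task sees the first task's value because both ran on the same pooled thread, which is exactly how a job's class loader outlives the job on the JM.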

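The fix proposed in the issue, running the user's {{main()}} in a separate, short-lived thread, could look roughly like the following Java sketch. It is an illustration under assumptions, not the change that was committed for FLINK-25022; {{IsolatedMainRunner}}, {{runIsolated}}, {{USER_STATE}} and the thread name are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class IsolatedMainRunner {

    // Hypothetical ThreadLocal that "user code" sets and never removes.
    static final ThreadLocal<Object> USER_STATE = new ThreadLocal<>();

    /**
     * Runs the given user code on a fresh, short-lived thread and waits for it.
     * Thread locals the user code leaves behind live on that thread only and can
     * be garbage-collected once it has terminated and is no longer referenced.
     * A failure in the user code is reported to the caller, wrapped in an Exception.
     */
    static void runIsolated(Runnable userMain, ClassLoader userClassLoader) throws Exception {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread runner = new Thread(() -> {
            try {
                userMain.run();
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "user-main-runner"); // hypothetical thread name
        // User code typically expects its own class loader as the context class loader.
        runner.setContextClassLoader(userClassLoader);
        runner.start();
        runner.join(); // the calling (pooled) thread only waits; it never runs user code
        Throwable t = failure.get();
        if (t != null) {
            throw new Exception("user main() failed", t);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated user main(): set a thread local and "forget" to remove it.
        runIsolated(() -> USER_STATE.set(new Object()), IsolatedMainRunner.class.getClassLoader());
        // The caller, standing in for a pooled dispatcher thread, never sees the value.
        System.out.println("caller thread has value: " + (USER_STATE.get() != null));
    }
}
```

Running {{main}} prints "caller thread has value: false". This only covers thread locals on the thread that runs {{main()}}; threads that the user code starts itself are not cleaned up by this approach.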


--
This message was sent by Atlassian Jira
(v8.20.1#820001)