Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2022/11/09 09:41:00 UTC

[jira] [Closed] (FLINK-29927) AkkaUtils#getAddress may cause memory leak

     [ https://issues.apache.org/jira/browse/FLINK-29927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chesnay Schepler closed FLINK-29927.
------------------------------------
    Resolution: Fixed

master: 304122eadc52dd6ee8c04d9777d97eb66aec5e0e
1.16: 313e30483e9770d18461b5dc655da423d465b7e3
1.15: 6feaa440ffce2afc6b0222c49de598b94e60c825

> AkkaUtils#getAddress may cause memory leak
> ------------------------------------------
>
>                 Key: FLINK-29927
>                 URL: https://issues.apache.org/jira/browse/FLINK-29927
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / RPC
>    Affects Versions: 1.16.0, 1.15.2
>            Reporter: Gen Luo
>            Assignee: Chesnay Schepler
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.17.0, 1.15.3, 1.16.1
>
>         Attachments: RemoteAddressExtensionLeaking.png
>
>
> We found a slow memory leak in the JM. When MetricFetcherImpl retrieves metrics, it always calls MetricQueryServiceRetriever#retrieveService first. That method acquires the address of a task manager, which uses AkkaUtils#getAddress internally. The getAddress method is implemented like this:
> {code:java}
>     public static Address getAddress(ActorSystem system) {
>         return new RemoteAddressExtension().apply(system).getAddress();
>     }
> {code}
> and RemoteAddressExtension#apply is implemented like this:
> {code:scala}
>   def apply(system: ActorSystem): T = {
>     java.util.Objects.requireNonNull(system, "system must not be null!").registerExtension(this)
>   }
> {code}
> This means every call of AkkaUtils#getAddress registers a new extension with the ActorSystem, and that extension is never released until the ActorSystem exits.
> Most usages of the method are called only once during initialization, but as described above, MetricFetcherImpl also uses it. This happens periodically while users have the WebUI open, or whenever users call the REST API directly to get metrics. This means the memory may keep leaking.
> The leak may have been introduced in FLINK-23662 when porting the Scala version of AkkaUtils to Java, though I'm not sure whether the Scala version had the same issue.
> The leak seems very slow. We observed it on a job that had been running for more than one month with only 1G of memory for the job manager. So I suppose it's not urgent, but it still needs to be fixed.
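
The mechanism described above can be illustrated with a small, self-contained sketch. It does not use Akka: ToySystem and ToyExtensionId are hypothetical stand-ins, and the sketch assumes the registry keys its entries by extension-id instance and keeps them for the lifetime of the system. Under that assumption, creating a new id per call adds one entry per call, while reusing a single shared id keeps the registry at one entry. This shows one plausible remedy and is not necessarily what the fix commits listed above do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExtensionLeakSketch {

    /** Toy stand-in for an actor system's extension registry (not Akka's API). */
    static final class ToySystem {
        // Entries are keyed by the id instance and never removed,
        // mimicking "never released until the ActorSystem exits".
        private final Map<ToyExtensionId, String> extensions = new ConcurrentHashMap<>();

        String registerExtension(ToyExtensionId id) {
            return extensions.computeIfAbsent(id, k -> k.createExtension(this));
        }

        int extensionCount() {
            return extensions.size();
        }
    }

    /** Toy extension id; equals/hashCode are identity-based, so each instance is a distinct key. */
    static final class ToyExtensionId {
        String createExtension(ToySystem system) {
            return "toy-address"; // placeholder value, purely illustrative
        }

        String apply(ToySystem system) {
            return system.registerExtension(this);
        }
    }

    // Leaky pattern: a fresh id per call, so every call registers a new entry.
    static String leakyGetAddress(ToySystem system) {
        return new ToyExtensionId().apply(system);
    }

    // Possible remedy: one shared id, so repeated calls hit the same entry.
    private static final ToyExtensionId SHARED_ID = new ToyExtensionId();

    static String sharedGetAddress(ToySystem system) {
        return SHARED_ID.apply(system);
    }

    /** Calls one of the two variants repeatedly and reports the registry size. */
    static int countAfterCalls(boolean leaky, int calls) {
        ToySystem system = new ToySystem();
        for (int i = 0; i < calls; i++) {
            if (leaky) {
                leakyGetAddress(system);
            } else {
                sharedGetAddress(system);
            }
        }
        return system.extensionCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky:  " + countAfterCalls(true, 1000));  // prints "leaky:  1000"
        System.out.println("shared: " + countAfterCalls(false, 1000)); // prints "shared: 1"
    }
}
```

In this toy model the registry grows linearly with the number of calls in the leaky variant, which matches the slow growth reported for periodic metric fetches.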



--
This message was sent by Atlassian Jira
(v8.20.10#820010)