Posted to issues@flink.apache.org by "Chesnay Schepler (Jira)" <ji...@apache.org> on 2022/06/24 13:45:00 UTC

[jira] [Commented] (FLINK-28248) Metaspace memory is leaking when repeatedly submitting Beam batch pipelines via the REST API

    [ https://issues.apache.org/jira/browse/FLINK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558502#comment-17558502 ] 

Chesnay Schepler commented on FLINK-28248:
------------------------------------------

Please attach the heap dump, or, if that is not possible, investigate the GC root that prevents the classloaders from being garbage collected.

At first glance, this is likely caused by a leaked thread.
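
A quick way to check for leaked threads from inside the JVM is a sketch along the following lines (an illustration, not Flink tooling; the name match on ChildFirstClassLoader is an assumption to adjust against what the dump actually shows):

{code:java}
import java.util.Map;

// Sketch: list live threads whose context classloader looks like a Flink
// user-code classloader. Run inside the suspect JVM; any thread reported
// here is a GC root that keeps that classloader, and its metaspace, alive.
public class LeakedThreadProbe {
    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            ClassLoader loader = thread.getContextClassLoader();
            // Illustrative name match; adapt it to the classloader in the dump.
            if (loader != null
                    && loader.getClass().getName().contains("ChildFirstClassLoader")) {
                System.out.printf("suspect thread: %s (daemon=%b, state=%s)%n",
                        thread.getName(), thread.isDaemon(), thread.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
{code}

In the Eclipse Memory Analyzer, the equivalent check is to run "Path to GC Roots" on a ChildFirstClassLoader instance and look for Thread objects among the roots.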

> Metaspace memory is leaking when repeatedly submitting Beam batch pipelines via the REST API
> --------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28248
>                 URL: https://issues.apache.org/jira/browse/FLINK-28248
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core
>    Affects Versions: 1.14.4
>            Reporter: Arkadiusz Gasinski
>            Priority: Major
>         Attachments: image-2022-06-24-14-45-51-689.png, image-2022-06-24-14-51-47-909.png, image-2022-06-24-15-07-43-035.png
>
>
> We have a Flink cluster running in session mode on k8s/OpenShift that executes our Apache Beam pipelines. Some of these pipelines are streaming pipelines and run continuously; others are batch pipelines submitted periodically whenever there is load to be processed.
> We believe that the batch pipelines cause the issue. We submit one to several batch jobs every 5 minutes. For each job, a new instance of the ChildFirstClassLoader is created, and it looks like these instances are not closed properly after the job finishes.
> Attached is a screenshot from the Eclipse Memory Analyzer's Leak Suspects report. When the heap dump was captured, there were 2 streaming and several batch jobs running, plus over 100 finished batch jobs.
> !image-2022-06-24-14-45-51-689.png!
> In our current setup, we allocate 8 GB for the metaspace:
> !image-2022-06-24-14-51-47-909.png!
>
> And the top components from the memory analyzer:
> !image-2022-06-24-15-07-43-035.png!
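
For context on the suspected mechanism (an assumption following the comment above, not a confirmed diagnosis for these pipelines): a single thread started from job code and never stopped is enough to pin the job's ChildFirstClassLoader, since the thread references the classloader both through its context classloader and through the classes of the code it runs. A minimal sketch of the pattern, with hypothetical names:

{code:java}
// Hypothetical sketch of the leak pattern, not code from the reported jobs:
// a thread started inside a job inherits the job's ChildFirstClassLoader as
// its context classloader. If it is never stopped, the classloader and all
// classes it loaded stay reachable after the job finishes, so their
// metaspace is never reclaimed.
public class LeakyJobCode {

    public static void startBackgroundWork() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(60_000); // e.g. a forgotten poller or cache refresher
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "leaky-background-worker");
        // Daemon status does not help here: a live daemon thread still pins
        // the classloader for as long as the JVM runs.
        worker.setDaemon(true);
        worker.start();
        // Missing: nothing ever interrupts or joins this thread when the job
        // closes, so it survives the job and holds the classloader.
    }
}
{code}

For reference, the 8 GB metaspace mentioned above would typically correspond to taskmanager.memory.jvm-metaspace.size in flink-conf.yaml; raising it only delays the OOM if classloaders keep accumulating.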



--
This message was sent by Atlassian Jira
(v8.20.7#820007)