Posted to user@flink.apache.org by Chesnay Schepler <ch...@apache.org> on 2021/03/01 08:30:49 UTC
Re: Suspected classloader leak in Flink 1.11.1
I'd suggest taking a heap dump and investigating what is referencing
these classloaders; chances are that some thread isn't being cleaned up.
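For reference, besides jmap -dump or jcmd <pid> GC.heap_dump on the
command line, a dump can also be triggered from inside the JVM via the
HotSpot diagnostic MBean. A minimal sketch (the output file name is just
an example):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        // Proxy to the HotSpot diagnostic MBean of the current JVM.
        HotSpotDiagnosticMXBean mx = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        File out = new File("heap.hprof");
        if (out.exists()) {
            out.delete(); // dumpHeap refuses to overwrite an existing file
        }
        // true = dump only live (reachable) objects; this forces a full GC first
        mx.dumpHeap(out.getAbsolutePath(), true);
        System.out.println("wrote " + out.length() + " bytes to " + out.getName());
    }
}
```

The resulting .hprof file can then be opened in VisualVM or Eclipse MAT
to see exactly what is holding the ChildFirstClassLoader instances.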
On 2/28/2021 3:46 PM, Kezhu Wang wrote:
> Hi Tamir,
>
> You could check
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code
> for known class loading issues.
>
> Besides this, I think GC.class_histogram (even filtered) could help us
> list suspected objects.
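(A small illustration of that: the histogram can be taken with jcmd and
filtered for the suspect loaders. The sketch below runs it against its
own JVM and assumes jcmd from the JDK is on the PATH; in practice you
would pass the Flink client's or task manager's pid.)

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.stream.Collectors;

public class ClassHistogram {
    // Runs GC.class_histogram against the given pid, prints the lines that
    // mention class loaders, and returns how many such lines were found.
    static long printLoaderLines(long pid) throws Exception {
        Process p = new ProcessBuilder("jcmd", Long.toString(pid), "GC.class_histogram")
                .redirectErrorStream(true)
                .start();
        List<String> hits;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            hits = r.lines()
                    // e.g. "ChildFirstClassLoader" on a Flink client / TM
                    .filter(line -> line.contains("ClassLoader"))
                    .collect(Collectors.toList());
        }
        p.waitFor();
        hits.forEach(System.out::println);
        return hits.size();
    }

    public static void main(String[] args) throws Exception {
        // Target our own JVM here; substitute the pid of the leaking process.
        printLoaderLines(ProcessHandle.current().pid());
    }
}
```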
>
>
> Best,
> Kezhu Wang
>
>
> On February 28, 2021 at 21:25:07, Tamir Sagi
> (tamir.sagi@niceactimize.com) wrote:
>
>>
>> Hey all,
>>
>> We are encountering memory issues on a Flink client and task
>> managers, which I would like to raise here.
>>
>> We are running Flink as a session cluster (version 1.11.1) on
>> Kubernetes, submitting batch jobs from a Flink client inside a Spring
>> Boot application (using RestClusterClient).
>>
>> As jobs are submitted and run, one after another, we see that the
>> metaspace memory (with a max size of 1 GB) keeps increasing, along
>> with a linear (though more moderate) increase in the heap memory. We
>> do see the GC working on the heap and releasing some resources.
>>
>> By analyzing the memory of the client Java application with profiling
>> tools, we saw many instances of Flink's ChildFirstClassLoader (roughly
>> as many as the number of jobs that were run), and therefore many
>> instances of the same class, each loaded by a different class loader
>> instance (as shown in the attached screenshot). The same applies to
>> the Flink task manager memory.
>>
>> We would expect to see a single class loader instance. We therefore
>> suspect that the increase is caused by class loaders not being cleaned
>> up.
>>
>> Does anyone have any insights about this issue, or ideas on how to
>> proceed with the investigation?
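(As Chesnay's reply at the top suggests, a classic cause of such a leak
is a thread that outlives its job and pins the loader through its
context class loader. A self-contained illustration of that effect, not
Flink code, just the general pattern:)

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderLeakDemo {

    // Returns {pinnedWhileThreadAlive, collectedAfterCleanup}.
    static boolean[] run() throws Exception {
        URLClassLoader loader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<>(loader);

        // A thread that is never shut down pins the loader through its
        // context class loader -- the kind of reference a heap dump reveals.
        Thread leaked = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
                // exit on interrupt
            }
        });
        leaked.setContextClassLoader(loader);
        leaked.setDaemon(true);
        leaked.start();

        loader = null; // drop our own strong reference
        System.gc();
        boolean pinned = ref.get() != null; // still reachable via the thread

        leaked.interrupt();
        leaked.join();
        leaked = null; // the Thread object itself also references the loader

        // With no strong references left, a few GC cycles should clear it.
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        boolean collected = ref.get() == null;
        return new boolean[] { pinned, collected };
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = run();
        System.out.println("pinned while thread alive: " + r[0]);
        System.out.println("collected after cleanup:   " + r[1]);
    }
}
```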
>>
>>
>> *Flink Client application (VisualVm)*
>>
>>
>>
>> [VisualVM screenshot: many instances of
>> com.fasterxml.jackson.databind.PropertyMetadata, each loaded by a
>> different org.apache.flink.util.ChildFirstClassLoader instance (17
>> loaders shown), each with a retained size of 120 bytes.]
>>
>> We have tried different GCs, but with the same results.
>>
>>
>> _*Task Manager*_
>>
>>
>> Total size: 4 GB
>>
>> Metaspace: 1 GB
>>
>> Off-heap: 512 MB
>>
>>
>> Screenshot from the task manager: 612 MB are occupied and not being
>> released.
>>
>>
>> We used the jcmd tool and attached 3 files:
>>
>> 1. Thread print
>> 2. VM.metaspace output
>> 3. VM.classloader output
>>
>> In addition, we have tried calling GC manually, but it did not change
>> much.
>>
>> Thank you
>>
>>
>>
>>
>>