Posted to user@flink.apache.org by Chesnay Schepler <ch...@apache.org> on 2021/03/01 08:30:49 UTC

Re: Suspected classloader leak in Flink 1.11.1

I'd suggest taking a heap dump and investigating what is referencing 
these classloaders; chances are that some thread isn't being cleaned up.
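
As a rough starting point, here is a minimal sketch that could be run 
inside the affected JVM. It lists live threads whose context class loader 
looks like a ChildFirstClassLoader and can write a heap dump for offline 
analysis. The class name ClassLoaderLeakProbe and its method names are 
made up for illustration, and the heap dump part assumes a HotSpot JVM. 
A thread can also pin a class loader in other ways (thread locals, the 
class of its Runnable), so an empty result does not rule out a leak.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink.
public class ClassLoaderLeakProbe {

    /**
     * Returns the names of live threads whose context class loader's class
     * name contains the given fragment, e.g. "ChildFirstClassLoader".
     */
    public static List<String> threadsHolding(String loaderNameFragment) {
        List<String> names = new ArrayList<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ClassLoader cl = t.getContextClassLoader();
            if (cl != null && cl.getClass().getName().contains(loaderNameFragment)) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Writes a heap dump of reachable objects (HotSpot JVMs only). */
    public static void dumpHeap(String file) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // 'true' dumps only live objects; the call fails if the file already exists.
        bean.dumpHeap(file, true);
    }

    public static void main(String[] args) throws IOException {
        threadsHolding("ChildFirstClassLoader").forEach(System.out::println);
        if (args.length > 0) {
            dumpHeap(args[0]); // e.g. /tmp/client.hprof
        }
    }
}
```

In the resulting dump, looking at the path to GC roots for one of the 
ChildFirstClassLoader instances should show what keeps it alive.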

On 2/28/2021 3:46 PM, Kezhu Wang wrote:
> Hi Tamir,
>
> You could check 
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code 
> for known class loading issues.
>
> Besides this, I think GC.class_histogram (even filtered) could help us 
> list suspected objects.
>
>
> Best,
> Kezhu Wang
>
>
> On February 28, 2021 at 21:25:07, Tamir Sagi 
> (tamir.sagi@niceactimize.com) wrote:
>
>>
>> Hey all,
>>
>> We are encountering memory issues on a Flink client and task 
>> managers, which I would like to raise here.
>>
>> We are running Flink (version 1.11.1) as a session cluster on 
>> Kubernetes, submitting batch jobs from a Flink client embedded in a 
>> Spring Boot application (using RestClusterClient).
>>
>> When jobs are submitted and run one after another, we see that 
>> metaspace usage (with a max size of 1GB) keeps increasing, along 
>> with a linear, though more moderate, increase in heap memory. We do 
>> see GC working on the heap and releasing some resources.
>>
>> By analyzing the memory of the client Java application with profiling 
>> tools, we saw that there are many instances of Flink's 
>> ChildFirstClassLoader (perhaps as many as the number of jobs that 
>> were run), and therefore many instances of the same class, each from 
>> a different instance of the class loader (as shown in the attached 
>> screenshot). The same applies to the Flink task manager memory.
>>
>> We would expect to see one instance of the class loader. Therefore, we 
>> suspect that the reason for the increase is class loaders not being 
>> cleaned up.
>>
>> Does anyone have some insights about this issue, or ideas on how to 
>> proceed with the investigation?
>>
>>
>> *Flink Client application (VisualVM)*
>>
>>
>>
>> [Screenshot: VisualVM heap view with columns Shallow Size, Objects and 
>> Retained Size, listing 17 com.fasterxml.jackson.databind.PropertyMetadata 
>> entries (size 120 each, 0%) alongside 17 separate 
>> org.apache.flink.util.ChildFirstClassLoader instances.]
>>
>> We have tried different garbage collectors, with the same results.
>>
>>
>> _*Task Manager*_
>>
>>
>> Total size 4GB
>>
>> Metaspace 1GB
>>
>> Off-heap 512MB
>>
>>
>> Screenshot from the task manager: 612MB are occupied and not being released.
>>
>>
>> We used the jcmd tool and attached 3 files:
>>
>>  1. Threads print
>>  2. VM.metaspace output
>>  3. VM.classloader
>>
>> In addition, we have tried calling GC manually, but it did not change 
>> much.
>>
>> Thank you