Posted to issues@flink.apache.org by "Rohit Singh (JIRA)" <ji...@apache.org> on 2018/03/26 16:32:00 UTC

[jira] [Commented] (FLINK-9080) Flink Scheduler goes OOM, suspecting a memory leak

    [ https://issues.apache.org/jira/browse/FLINK-9080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414093#comment-16414093 ] 

Rohit Singh commented on FLINK-9080:
------------------------------------

Based on the Flink documentation, I tried adding the job jar to the Flink lib directory of the scheduler and the task managers to avoid dynamic class loading:

https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html

This results in the following error:
{code:java}
Class=o.a.f.r.e.ExecutionGraph Msg=Source: Custom Source -> Sink: Unnamed (1/1) (3f12f6953a235eb43f07cdf7966b5fcf) switched from RUNNING to FAILED.
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.
at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:235) ~[iot-mirror-device.jar:na]
at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:95) ~[iot-mirror-device.jar:na]
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231) ~[iot-mirror-device.jar:na]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718) ~[iot-mirror-device.jar:na]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
Caused by: java.lang.ClassCastException: cannot assign instance of org.apache.commons.collections.map.LinkedMap to field org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.pendingOffsetsToCommit of type org.apache.commons.collections.map.LinkedMap in instance of org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133) ~[na:1.8.0_91]
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305) ~[na:1.8.0_91]
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91]
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) ~[na:1.8.0_91]
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) ~[na:1.8.0_91]
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:290) ~[iot-mirror-device.jar:na]
{code}
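
A ClassCastException that names the same class on both sides ("cannot assign instance of X to field ... of type X") generally means that X was loaded by two different class loaders, which the JVM treats as two unrelated types. In this setup that could happen if the jar is present in the Flink lib directory and is also shipped with the job, though that is an assumption and not something the trace confirms. The sketch below reproduces the effect in isolation; DuplicateClassDemo, Payload, and IsolatingLoader are hypothetical names used only for illustration and are not Flink classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class DuplicateClassDemo {

    /** Hypothetical stand-in for a library class such as LinkedMap. */
    public static class Payload {}

    /** Loader that defines one named class itself instead of delegating to its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length); // second copy of the class
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Payload through a fresh IsolatingLoader. */
    public static Class<?> loadSecondCopy() {
        String name = Payload.class.getName();
        try {
            return new IsolatingLoader(DuplicateClassDemo.class.getClassLoader(), name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if casting an instance of the second copy to Payload fails. */
    public static boolean castFails() {
        try {
            Object other = loadSecondCopy().getDeclaredConstructor().newInstance();
            Payload p = (Payload) other; // same name, different defining loader
            return false;
        } catch (ClassCastException e) {
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> second = loadSecondCopy();
        System.out.println("same name:  " + second.getName().equals(Payload.class.getName()));
        System.out.println("same class: " + (second == Payload.class));
        System.out.println("cast fails: " + castFails());
    }
}
```

The two Class objects share a name but differ in their defining loader, so the assignment is rejected. Deserialization behaves the same way when the field type and the deserialized instance come from different loaders.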
 

 

> Flink Scheduler goes OOM, suspecting a memory leak
> --------------------------------------------------
>
>                 Key: FLINK-9080
>                 URL: https://issues.apache.org/jira/browse/FLINK-9080
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.4.0
>            Reporter: Rohit Singh
>            Priority: Critical
>         Attachments: Top Level packages.JPG, Top level classes.JPG, classesloaded vs unloaded.png
>
>
> Running Flink version 1.4.0 on Mesos, with the scheduler running along with the job manager in a single container, whereas the task managers run in separate containers.
> A couple of jobs were running continuously, and the Flink scheduler was working properly along with the task managers. Due to some change in data, one of the jobs started failing continuously. In the meantime, there was a surge in Flink scheduler memory usage, and the scheduler eventually died of OOM.
>  
> A memory dump analysis was done.
> The findings were as follows: !Top Level packages.JPG! !Top level classes.JPG!
>  * The majority of the top loaded packages retaining heap pointed to Flinkuserclassloader, glassfish (jersey library), and Finalizer classes. (Top level package image)
>  * The top level classes were of Flinkuserclassloader. (Top level class image)
>  * The number of classes unloaded was quite low compared with the number loaded, in spite of adding the JVM options -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled (PFA the classes loaded vs unloaded graph; the scheduler was restarted 3 times).
>  * There were also custom classes that were duplicated during subsequent class uploads.
> PFA all the images of the heap dump. Can you suggest some pointers on how to overcome this issue?
>  
>  
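
The loaded vs unloaded class counts mentioned in the description can also be read from inside a JVM through the standard java.lang.management API. The following is a minimal sketch of that, not something Flink provides; the class name ClassLoadStats and the output format are assumptions for illustration.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadStats {

    /** Returns a one-line summary of the class loading counters of this JVM. */
    public static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long totalLoaded = bean.getTotalLoadedClassCount(); // loaded since JVM start
        long unloaded = bean.getUnloadedClassCount();       // unloaded since JVM start
        int currentlyLoaded = bean.getLoadedClassCount();   // loaded right now
        return String.format("totalLoaded=%d unloaded=%d currentlyLoaded=%d",
                totalLoaded, unloaded, currentlyLoaded);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If the unloaded counter stays near zero while the total keeps growing across job restarts, that would be consistent with user-code class loaders being retained, which matches the pattern described in the findings above.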



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)