Posted to user@flink.apache.org by bat man <ti...@gmail.com> on 2021/03/05 08:23:53 UTC

java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi,

Getting the below OOM. The job failed 4-5 times and then recovered.

java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.checkThrowSourceExecutionException(SourceStreamTask.java:212)
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.performDefaultAction(SourceStreamTask.java:132)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:298)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:403)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Is there any way I can debug this? The job started running fine after a few
restarts; what could be the reason behind this?

Thanks,
Hemant

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by bat man <ti...@gmail.com>.
The Java options should not have the double quotes; that was the issue. I
was able to generate the heap dump, and based on the dump I have made some
changes in the code to fix this issue.

This worked -

env.java.opts: -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/dump.hprof
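For anyone who does not want to wait for an OOM, a dump can also be triggered programmatically from inside the JVM; below is a minimal sketch (the dump path and class name are illustrative, not from the thread). It uses the HotSpotDiagnostic MXBean, which is the same mechanism jcmd's GC.heap_dump uses:

```java
// Sketch: trigger a heap dump programmatically, without waiting for an OOM.
// The dump path below is illustrative.
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws Exception {
        // Write into a fresh temp directory; dumpHeap fails if the file exists.
        Path dir = Files.createTempDirectory("heapdumps");
        Path dump = dir.resolve("manual-dump.hprof");

        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        // live=true forces a full GC first, so only reachable objects are dumped
        diag.dumpHeap(dump.toString(), true);

        System.out.println("dump written: " + Files.exists(dump));
        System.out.println("dump non-empty: " + (Files.size(dump) > 0));
    }
}
```

The resulting .hprof file can be opened in the same tools as a dump produced by -XX:+HeapDumpOnOutOfMemoryError.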

Thanks.


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Xintong Song <to...@gmail.com>.
Hi Hemant,
I don't see any problem in your settings. Any exceptions suggesting why TM
containers are not coming up?

Thank you~

Xintong Song




Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Tamir Sagi <Ta...@niceactimize.com>.
Hey Bruce Wayne,

I can suggest 3 options to get the heap dump:


  1.  Don't define a file name in the dump path, just set the folder. The JVM will automatically create the hprof file in case of OOM:
env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"

  2.  If #1 still does not work for some reason, or you don't want to wait until the error occurs, you can dump the heap at any time with a CLI tool:
install openjdk-11-jdk and use the jcmd tool to dump the heap: jcmd <process id> GC.heap_dump /tmp/heap-dump.hprof

  3.  If you don't want to deal with CLI tools, connect VisualVM (here<https://visualvm.github.io/>) to the Java process:
- Download the application
- Run the java process with the following JVM args (The port number can be any available number):
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=9010
  -Dcom.sun.management.jmxremote.local.only=false
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false
  -Dcom.sun.management.jmxremote.rmi.port=9010
  -Djava.rmi.server.hostname=localhost
Note: place these flags before the -jar argument --> java <args> -jar <jar-name>.jar
Then connect with VisualVM by adding a JMX connection, where the address in your case is
localhost:9010

You can watch the heap in real time + create heap dump from visual VM
Note: If you are running the Flink application on top of Docker/Kubernetes you need a port forwarding.
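If attaching a UI is not practical, the heap numbers VisualVM charts can also be read from inside the JVM via the standard MemoryMXBean; a minimal sketch (the class name is illustrative):

```java
// Sketch: read the heap figures VisualVM charts, directly from the JVM.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class Main {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();

        System.out.println("heap used (bytes): " + heap.getUsed());
        System.out.println("heap committed (bytes): " + heap.getCommitted());
        // getMax() may return -1 if no explicit limit was configured
        System.out.println("heap max (bytes): " + heap.getMax());
        System.out.println("used positive: " + (heap.getUsed() > 0));
    }
}
```

Logging these values periodically can show whether used heap keeps climbing toward the max before the OOM hits.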


Tamir

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by bat man <ti...@gmail.com>.
Hi Xintong Song,
I tried using the java options to generate heap dump referring to docs[1]
in flink-conf.yaml, however after adding this the task manager containers
are not coming up. Note that I am using EMR. Am I doing anything wrong here?

env.java.opts: "-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/dump.hprof"

Thanks,
Hemant






Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Xintong Song <to...@gmail.com>.
Hi Hemant,

This exception generally suggests that JVM is running out of heap memory.
Per the official documentation [1], the amount of live data barely fits
into the Java heap having little free space for new allocations.

You can try to increase the heap size following these guides [2].
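For reference, increasing the TaskManager heap via flink-conf.yaml might look like the following; the values are illustrative only, and guide [2] explains which options apply to your Flink version:

```yaml
# Illustrative sizes -- tune these for your workload.
# Total memory of the TaskManager process:
taskmanager.memory.process.size: 4096m
# Or control the task heap directly:
taskmanager.memory.task.heap.size: 2048m
```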

If a memory leak is suspected, to further understand where the memory is
consumed, you may need to dump the heap on OOMs and look for unexpected
memory usage with profiling tools.

Thank you~

Xintong Song


[1]
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html

[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup.html


