Posted to users@tomcat.apache.org by Brian Braun <ja...@gmail.com> on 2023/11/16 20:26:37 UTC

Java/Tomcat is being killed by the Linux OOM killer for using a huge amount of RAM. How can I know what was going on inside my app (& Tomcat & the JVM) to make that happen?

Hello,

First of all, this is my stack:

- Ubuntu 22.04.3 on x86/64 with 2GB of physical RAM, which has been enough
for years.
- Java 11.0.20.1+1-post-Ubuntu-0ubuntu122.04 / openjdk 11.0.20.1 2023-08-24
- Tomcat 9.0.58 (JAVA_OPTS="-Djava.awt.headless=true -Xmx900m -Xms16m
......")
- My app, which I developed myself and which has been running without any
OOM crashes for years

Well, a couple of weeks ago my website started crashing about every 5-7
days. Between crashes the RAM usage is fine and very steady (as it has been
for years) and it uses just about 50% of the "Max memory" (according to
what the Tomcat Manager server status shows). The three G1 heap spaces
(Eden, Survivor, Old Gen) are steady and low. There are no leaks as far as
I can tell, and I haven't made any significant changes to my app in recent
months.

When my website crashes, I can see in the Ubuntu log that some process has
invoked the "oom-killer", which then finds the process using the most RAM
(Tomcat/Java in this case) and kills it. This is what I see in the log when
it was Nginx that invoked the OOM-killer:

Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597771]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nginx.service,mems_allowed=0,global_oom,task_memcg=/system.slice/tomcat9.service,task=java,pid=470,uid=998
Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597932] Out of memory:
Killed process 470 (java) total-vm:4553056kB, anon-rss:1527944kB,
file-rss:2872kB, shmem-rss:0kB, UID:998 pgtables:3628kB oom_score_adj:0

I would like to be able to know what was happening inside the JVM when it
was using too much RAM and deserved to be killed. Was it a problem in Java
not associated with Tomcat or my app? Was it Tomcat itself that ate too
much RAM? I doubt it. Was it my application? If it was my application (and
I have to assume it was), how/why was it using all that RAM? What were the
objects, threads, etc that were involved in the crash? What part of the
heap memory was using all that RAM?

This can happen at any time, like at 4 am, so I cannot run to the computer
to see what was going on at that moment. I need some way to get a detailed
log of what was going on when the crash took place.

So my question is, what tool should I use to investigate these crashes? I
have started trying to make "New Relic" work since it seems that this
service could help me, but I am having some problems making it work and I
still don't know if this would be a solution in the first place. So, while
I struggle with New Relic, I would appreciate your suggestions.

Thanks in advance!

Re: Java/Tomcat is being killed by the Linux OOM killer for using a huge amount of RAM. How can I know what was going on inside my app (& Tomcat & the JVM) to make that happen?

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Brian,

On 11/16/23 15:26, Brian Braun wrote:
> First of all, this is my stack:
> 
> - Ubuntu 22.04.3 on x86/64 with 2GB of physical RAM, which has been enough
> for years.
> - Java 11.0.20.1+1-post-Ubuntu-0ubuntu122.04 / openjdk 11.0.20.1 2023-08-24
> - Tomcat 9.0.58 (JAVA_OPTS="-Djava.awt.headless=true -Xmx900m -Xms16m
> ......")

Don't bother setting a 16M initial heap and a maximum of 900M. Just set 
them both to 900M. That will cause the JVM to request all of that heap 
up front and lessen the chances of a native OOME.
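
For example, in whatever file sets JAVA_OPTS on your install (bin/setenv.sh,
or /etc/default/tomcat9 if you are using the Ubuntu package; the exact
location depends on how Tomcat was installed), something like:

JAVA_OPTS="-Djava.awt.headless=true -Xms900m -Xmx900m"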

There are certainly still plenty of reasons the process could use more 
heap than that, of course.

> - My app, which I developed myself and which has been running without any
> OOM crashes for years
> 
> Well, a couple of weeks ago my website started crashing about every 5-7
> days. Between crashes the RAM usage is fine and very steady (as it has been
> for years) and it uses just about 50% of the "Max memory" (according to
> what the Tomcat Manager server status shows). The three G1 heap spaces
> (Eden, Survivor, Old Gen) are steady and low. There are no leaks as far as
> I can tell, and I haven't made any significant changes to my app in recent
> months.

I think your problem is native-heap and not Java-heap.

What does 'top' say? You are looking for the "RES" (Resident Size) and 
"VIRT" (Virtual Size) numbers. That's what the process is REALLY using.

How big is your physical RAM? What does this output while running your 
application (after fixing the heap at 900M)?

$ free -m

What else is running on the machine?

Do you have swap enabled?
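
("swapon --show", or the Swap line of the "free -m" output above, will
tell you.)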

> When my website crashes, I can see in the Ubuntu log that some process has
> invoked the "oom-killer", which then finds the process using the most RAM
> (Tomcat/Java in this case) and kills it. This is what I see in the log when
> it was Nginx that invoked the OOM-killer:
> 
> Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597771]
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=nginx.service,mems_allowed=0,global_oom,task_memcg=/system.slice/tomcat9.service,task=java,pid=470,uid=998
> Nov 15 15:23:54 ip-172-31-89-211 kernel: [366008.597932] Out of memory:
> Killed process 470 (java) total-vm:4553056kB, anon-rss:1527944kB,
> file-rss:2872kB, shmem-rss:0kB, UID:998 pgtables:3628kB oom_score_adj:0
> 
> I would like to be able to know what was happening inside the JVM when it
> was using too much RAM and deserved to be killed. Was it a problem in Java
> not associated with Tomcat or my app? Was it Tomcat itself that ate too
> much RAM? I doubt it. Was it my application? If it was my application (and
> I have to assume it was), how/why was it using all that RAM? What were the
> objects, threads, etc that were involved in the crash? What part of the
> heap memory was using all that RAM?

Probably native heap. Java 11 is mature and there are likely no leaks in 
the JVM itself. If your code were using too much Java heap, you'd see 
OutOfMemoryErrors thrown inside the JVM rather than the Linux oom-killer 
stepping in.

But certain native libraries can leak. I seem to recall libgzip or 
something like that can leak if you aren't careful. My guess is that you 
are actually just running very very close to what your hardware can support.

Do you actually need 900M of heap to run your application? We ran for 
years at $work with a 64M heap and only expanded it when we started 
getting enough concurrent users to /have/ to expand the heap.

> This can happen at any time, like at 4 am, so I cannot run to the computer
> to see what was going on at that moment. I need some way to get a detailed
> log of what was going on when the crash took place.
> 
> So my question is, what tool should I use to investigate these crashes? I
> have started trying to make "New Relic" work since it seems that this
> service could help me, but I am having some problems making it work and I
> still don't know if this would be a solution in the first place. So, while
> I struggle with New Relic, I would appreciate your suggestions.

You can get a lot of information by configuring your application to dump 
the heap on OOME, but you aren't getting an OOME so that's kind of off 
the table.
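
(For reference, that would be something like the following flags in 
JAVA_OPTS; the dump path here is only an example and must be writable by 
the Tomcat user. HeapDumpPath can be a directory or a file name.

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/tomcat9
)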

I would enable GC logging for sure. That will tell you the status of the 
Java heap, but not the native memory spaces. But you may find that the 
process is performing a GC when it dies or you can see what was 
happening up to the point of the kill.
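
With Java 11's unified logging that would be something like this in 
JAVA_OPTS (the path and rotation settings are only an example; adjust them 
for your install):

-Xlog:gc*:file=/var/log/tomcat9/gc.log:time,uptime,level,tags:filecount=5,filesize=10m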

Is there any pattern to when it crashes? For example... is 04:00 a 
popular time for it to die? Maybe you have a process that runs 
periodically that needs a lot of RAM temporarily.
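
(If you want to check for such jobs, "crontab -l" for the relevant users, 
the /etc/cron.* directories, and "systemctl list-timers" will show what is 
scheduled on a standard Ubuntu setup.)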

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org