Posted to users@solr.apache.org by Dominic Humphries <do...@adzuna.com.INVALID> on 2021/09/02 15:28:32 UTC

Slow GC causing high heap usage

We're trying to upgrade from 8.3.1 to 8.8.1, but my pre-release testing has
shown some performance issues. The GC log suggests a possible cause, visible
in these graphs:

8.3.1 graphs: https://imgur.com/a/ZM9wdob
8.8.1 graphs: https://imgur.com/a/UzMinwJ

The test cycle here is 2 mins with requests; 2 mins no requests; 2 mins
requests. You can see the 8.3 gives what I'd expect - fairly consistent
heap usage, fairly fast & consistent GC durations.
8.8 however shows steadily increasing heap usage in the first request cycle
and a big spike in one of the GC durations.

This broad pattern repeats - with sometimes more than one slow GC operation
for 8.8 - reliably each time I test. Is there any setting that might have
been configured badly that could cause this? Or is it a bug?

Thanks

Dominic

Re: Slow GC causing high heap usage

Posted by Walter Underwood <wu...@wunderwood.org>.
Yes, use a bigger heap and the generational collector. With CMS, we had an 8G heap
with 2G in the short-life generation. Nearly all of the memory allocated to handle a
search request becomes garbage at the end of that request, so Solr can make use of a
LOT of new-generation space.

With Solr 8.6, Java 11, and G1, we’ve moved to 16 G for all JVMs. We had some excessive
GC problems at 8G.

Our clusters don’t do facets, but they do some heavy searching. Our biggest cluster is
60 million docs in 8 shards, with an average query length of 25 terms. We use 36-CPU
EC2 instances.

-Xms16g
-Xmx16g
-Xss256k
-XX:+AlwaysPreTouch
-XX:+ExplicitGCInvokesConcurrent
-XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem
-XX:+UseG1GC
-XX:+UseLargePages
-XX:-OmitStackTraceInFastThrow
-XX:MaxGCPauseMillis=250

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Slow GC causing high heap usage

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/3/2021 10:08 AM, Shawn Heisey wrote:
> If my install ever gets around to doing GC (its heap is a lot bigger 
> than it really needs to be) then I might know whether Shenandoah is 
> beneficial. 

Followup:

Something interesting... it seems that Shenandoah will do a collection 
every five minutes even if activity is so low that GC wouldn't 
normally be necessary.

After 15 minutes of runtime, I let gceasy tackle my tiny little log:

https://www.dropbox.com/s/0ivey9lrnrzpim1/gceasy-shenandoah.png?dl=0

As you can see, the current heap size is very small.  The GC pauses are 
also very small.  This Solr install is for my mail server, specifically 
for dovecot.

I also came across another collector called ZGC that is available in 
Java 11.  I know even less about that one than I did about Shenandoah.
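
For anyone who wants to experiment with it, a minimal GC_TUNE sketch follows. It relies on two facts about OpenJDK 11: ZGC is still experimental there, so it has to be unlocked explicitly, and it is only available on Linux/x64 builds. It became a production feature in JDK 15. This is untested with Solr and is only meant to show which flags are involved.

```shell
# Sketch only: enable ZGC on OpenJDK 11 (experimental, Linux/x64 builds).
# On JDK 15 and later the UnlockExperimentalVMOptions flag is not needed.
GC_TUNE=" \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+UseZGC \
"
```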

An interesting read:

https://blogs.oracle.com/javamagazine/understanding-the-jdks-new-superfast-garbage-collectors

There is a commercial JVM available that promises near zero pauses even 
on terabyte size heaps.  I have no idea how much it costs.  I bet it's 
very pricy:

https://www.azul.com/products/prime/

Thanks,
Shawn


Re: Slow GC causing high heap usage

Posted by Shawn Heisey <el...@elyograg.org>.
On 9/3/2021 8:34 AM, Dominic Humphries wrote:
> Thanks for replying!
>
> it is indeed the same Java (OpenJDK 11) we're using for both. I tried
> upgrading to 14 and the heap *does* stay a little smaller, but we still see
> the large (>10sec) GC durations happening.
>
> I've not been able to get switching to CMS to work, unfortunately


What did you see with CMS?

One thing that I would try, if you haven't already, and assuming you 
have enough memory installed, is to increase the max heap size.  If the 
available headroom is high enough, Java will be far more likely to use 
the generation-specific collectors, which are much faster than a full 
GC, and some phases of those collectors are concurrent, meaning they do 
not pause the app (Solr in this case) while they run.

If you are running at least version 11u9, then you have access to a new 
collector that hasn't received a lot of testing from us yet -- 
Shenandoah.  I enabled Shenandoah on my own Solr install just now ... I 
had thought it required a newer Java version than 11, so I hadn't tried 
it yet.  This is what I have in /etc/default/solr.in.sh :

GC_TUNE=" \
   -XX:+AlwaysPreTouch \
   -XX:+UseNUMA \
   -XX:+UseShenandoahGC \
   -XX:+ParallelRefProcEnabled \
   -XX:ParallelGCThreads=6 \
"

My server is NUMA hardware, which is why I included that option.  I have 
12 CPU cores, so I told Java to use 6 threads for GC.  It is probably 
unnecessary to include ParallelRefProcEnabled but I like to be explicit 
in case defaults change in later versions of Java.

I would recommend trying Shenandoah if you have a new enough version of 
OpenJDK.  It's supposed to be the best low-pause option currently 
available.  And consider making your heap larger, if you have enough 
memory.   Make sure there's still enough un-allocated memory available 
for caching purposes -- you don't want to use all the memory in the 
machine for Solr's heap.
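
A rough way to check that headroom before raising the heap is sketched below. The function name headroom_mb is made up for illustration; it subtracts a planned heap size from the machine's total RAM to show what would be left for the OS disk cache and everything else.

```shell
# Hypothetical helper: memory (in MB) left over after the Solr heap.
# $1 = total RAM in kB (the unit /proc/meminfo reports)
# $2 = planned heap size in MB
headroom_mb() {
  echo $(( $1 / 1024 - $2 ))
}

# Usage on Linux, assuming a 16 GB heap is being considered:
#   headroom_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 16384
```

How much headroom is "enough" depends on the index size; ideally the parts of the index that are queried often fit in what remains.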

If my install ever gets around to doing GC (its heap is a lot bigger 
than it really needs to be) then I might know whether Shenandoah is 
beneficial.

Thanks,
Shawn


Re: Slow GC causing high heap usage

Posted by Dominic Humphries <do...@adzuna.com.INVALID>.
Thanks for replying!

it is indeed the same Java (OpenJDK 11) we're using for both. I tried
upgrading to 14 and the heap *does* stay a little smaller, but we still see
the large (>10sec) GC durations happening.

I've not been able to get switching to CMS to work, unfortunately

On Thu, 2 Sept 2021 at 16:52, Shawn Heisey <ap...@elyograg.org> wrote:

> On 9/2/2021 9:28 AM, Dominic Humphries wrote:
> > We're trying to upgrade from 8.3.1 to 8.8.1 but my pre-release testing
> has
> > shown us some performance issues. Examination of the GC log shows that
> the
> > possible cause may be here:
> >
> > 8.3.1 graphs: https://imgur.com/a/ZM9wdob
> > 8.8.1 graphs: https://imgur.com/a/UzMinwJ
> >
> > The test cycle here is 2 mins with requests; 2 mins no requests; 2 mins
> > requests. You can see the 8.3 gives what I'd expect - fairly consistent
> > heap usage, fairly fast & consistent GC durations.
> > 8.8 however shows steadily increasing heap usage in the first request
> cycle
> > and a big spike in one of the GC durations.
>
> With such a small heap, 12 seconds for a GC seems extremely excessive.
> It simply shouldn't take that long for a heap that size.  If your heap
> were 8GB or more, then I could understand that happening.  The general
> solution when there are full GCs is to increase the heap size so that
> Java is more likely to choose the faster generation-specific collections
> instead of full GC.
>
> My recommendation would be to upgrade or change your Java version.  If
> you're about to tell me that it's the same Java version in both cases,
> there could be differences in how the two versions work that cause the
> newer one to trigger a bug in Java's GC that doesn't happen with the
> older version.  The latest Java 8 should be OK.  If you want to go with
> a later Java version, whether or not that will work will depend on how
> old your Solr version is.  I would stick with the latest releases of the
> LTS Java versions -- 8, 11, and 17 when it gets released. Previously
> I would have recommended Oracle Java, but they changed their licensing
> so that most people must pay for it, so it would be better to go with
> one of the free alternatives.  OpenJDK would be a great option.  Avoid
> IBM's Java or any vendor that inherits from it -- IBM's Java causes
> known problems.
>
> You could try changing the GC to CMS, instead of the default of G1 that
> Solr now ships with.
>
>
> https://cwiki.apache.org/confluence/display/solr/shawnheisey#ShawnHeisey-CMS(ConcurrentMarkSweep)Collector
>
> That wiki page shows the environment variable as JVM_OPTS, because when
> I wrote it I had my own init script -- at the time, Solr didn't have
> one.  I believe that to use it in solr.in.sh or solr.in.cmd you would
> need to change that to GC_TUNE instead.  I will get around to changing
> it to reflect modern Solr versions.
>
> Thanks,
> Shawn
>
>

Re: Slow GC causing high heap usage

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/2/2021 9:28 AM, Dominic Humphries wrote:
> We're trying to upgrade from 8.3.1 to 8.8.1 but my pre-release testing has
> shown us some performance issues. Examination of the GC log shows that the
> possible cause may be here:
>
> 8.3.1 graphs: https://imgur.com/a/ZM9wdob
> 8.8.1 graphs: https://imgur.com/a/UzMinwJ
>
> The test cycle here is 2 mins with requests; 2 mins no requests; 2 mins
> requests. You can see the 8.3 gives what I'd expect - fairly consistent
> heap usage, fairly fast & consistent GC durations.
> 8.8 however shows steadily increasing heap usage in the first request cycle
> and a big spike in one of the GC durations.

With such a small heap, 12 seconds for a GC seems extremely excessive.  
It simply shouldn't take that long for a heap that size.  If your heap 
were 8GB or more, then I could understand that happening.  The general 
solution when there are full GCs is to increase the heap size so that 
Java is more likely to choose the faster generation-specific collections 
instead of full GC.
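
To find pauses like that in a GC log without a graphing tool, something along these lines may help. It is a sketch that assumes the unified logging format produced by -Xlog:gc* on Java 9 and later, where each completed pause is logged on a line containing "Pause" and ending with its duration in ms. The function name long_pauses is made up for illustration.

```shell
# Hypothetical helper: print GC pause lines longer than a threshold.
# $1 = GC log file, $2 = threshold in milliseconds
long_pauses() {
  awk -v limit="$2" '
    # Completed pauses end with a duration such as "35.210ms".
    /Pause/ && $NF ~ /ms$/ {
      ms = $NF
      sub(/ms$/, "", ms)
      if (ms + 0 > limit + 0) print
    }
  ' "$1"
}

# Usage, e.g. everything over one second:
#   long_pauses /srv/solr/logs/solr_gc.log 1000
```

If the lines it prints say "Pause Full", the collector has fallen back to a full GC, which is the case a larger heap is meant to avoid.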

My recommendation would be to upgrade or change your Java version.  If 
you're about to tell me that it's the same Java version in both cases, 
there could be differences in how the two versions work that cause the 
newer one to trigger a bug in Java's GC that doesn't happen with the 
older version.  The latest Java 8 should be OK.  If you want to go with 
a later Java version, whether or not that will work will depend on how 
old your Solr version is.  I would stick with the latest releases of the 
LTS Java versions -- 8, 11, and 17 when it gets released. Previously 
I would have recommended Oracle Java, but they changed their licensing 
so that most people must pay for it, so it would be better to go with 
one of the free alternatives.  OpenJDK would be a great option.  Avoid 
IBM's Java or any vendor that inherits from it -- IBM's Java causes 
known problems.

You could try changing the GC to CMS, instead of the default of G1 that 
Solr now ships with.

https://cwiki.apache.org/confluence/display/solr/shawnheisey#ShawnHeisey-CMS(ConcurrentMarkSweep)Collector

That wiki page shows the environment variable as JVM_OPTS, because when 
I wrote it I had my own init script -- at the time, Solr didn't have 
one.  I believe that to use it in solr.in.sh or solr.in.cmd you would 
need to change that to GC_TUNE instead.  I will get around to changing 
it to reflect modern Solr versions.
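
As a minimal sketch of that change, the fragment below shows only the flag that selects CMS, placed in GC_TUNE; the tuning flags from the wiki page would go alongside it. One caveat: CMS was deprecated in Java 9 and removed in JDK 14, so this only applies when running Java 8 or 11.

```shell
# Sketch only: select CMS through GC_TUNE in solr.in.sh.
# Applies to Java 8 and 11; the collector no longer exists in JDK 14+.
GC_TUNE=" \
  -XX:+UseConcMarkSweepGC \
"
```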

Thanks,
Shawn


Re: Slow GC causing high heap usage

Posted by Dominic Humphries <do...@adzuna.com.INVALID>.
AFAIK they're the defaults that come from the installation:

java -server -Xmx1944m -XX:+UseG1GC -XX:+PerfDisableSharedMem
-XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages
-XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
-Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes=
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983 -Dsolr.log.dir=/srv/solr/logs
-Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
-XX:-OmitStackTraceInFastThrow -Djetty.home=/usr/local/solr/server
-Dsolr.solr.home=/srv/solr/data -Dsolr.data.home=
-Dsolr.install.dir=/usr/local/solr
-Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf
-Dlog4j.configurationFile=/srv/solr/log4j2.xml
-Dsolr.disable.shardsWhitelist=true -Xss256k -jar start.jar --module=http
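
A command line like this is easier to read once the system properties are filtered out. The sketch below keeps only the heap, stack, GC logging, and -XX options; the function name gc_flags is made up for illustration.

```shell
# Hypothetical helper: read a java command line on stdin and keep only
# the memory, GC logging, and -XX options, one per line.
gc_flags() {
  tr -s '[:space:]' '\n' | grep -E '^-(Xm[sx]|Xss|Xlog:gc|XX:)'
}

# Usage against a running Solr, where SOLR_PID is a placeholder for its PID:
#   ps -o args= -p "$SOLR_PID" | gc_flags
```

Applied to the command line above, the first thing this surfaces is -Xmx1944m, the small heap that the replies in this thread focus on.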

On Thu, 2 Sept 2021 at 17:35, Walter Underwood <wu...@wunderwood.org>
wrote:

> What are your JVM settings (heap and GC)?

Re: Slow GC causing high heap usage

Posted by Walter Underwood <wu...@wunderwood.org>.
What are your JVM settings (heap and GC)?

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
