Posted to solr-user@lucene.apache.org by Adam Harrison-Fuller <ah...@mintel.com> on 2018/04/11 10:01:18 UTC

Solr OOM Crashes / JVM tuning advice

Hey all,

I was wondering if I could get some JVM/GC tuning advice to resolve an
issue that we are experiencing.

Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
render would be greatly appreciated.

Our SolrCloud nodes are throwing OOM exceptions under load.
This issue has only started manifesting itself over the last few months
during which time the only change I can discern is an increase in index
size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
index is currently 58G and the server has 46G of physical RAM and runs
nothing other than the Solr node.

The JVM is invoked with the following JVM options:
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
-XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
-XX:NewRatio=3 -XX:OldPLABSize=16
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
-XX:ParallelGCThreads=4
-XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
-XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

These values were decided upon several years ago by a colleague, based on
suggestions from this mailing list when the index size was ~25G.

I have imported the GC logs into GCViewer and attached a link to a
screenshot showing the lead-up to an OOM crash.  Interestingly, the young
generation space is almost empty before the repeated GCs and subsequent
crash.
https://imgur.com/a/Wtlez

I was considering slowly increasing the amount of heap available to the JVM
until the crashes stop; any other suggestions?  I'm looking to get the
nodes stable without the GC taking forever to run.

Additional information can be provided on request.

Cheers!
Adam



Re: Solr OOM Crashes / JVM tuning advice

Posted by Joe Obernberger <jo...@gmail.com>.
Just as a side note, when Solr goes OOM and kills itself, and if you're 
running HDFS, you are guaranteed to have write.lock files left over.  If 
you're running lots of shards/replicas, you may have many files that you 
need to go into HDFS and delete before restarting.
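
For anyone hit by this, here is a rough sketch of how the leftover locks can
be found and cleared from HDFS before restarting (the /solr path below is
only an example; adjust it to wherever your index directories live):

  # list every write.lock under the index root and save the paths
  hdfs dfs -ls -R /solr | awk '{print $NF}' | grep 'write.lock$' > locks.txt
  # review locks.txt, make sure the Solr nodes are stopped, then remove them
  xargs -n 1 hdfs dfs -rm < locks.txt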

-Joe


On 4/11/2018 10:46 AM, Shawn Heisey wrote:
> On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote:
>> I was wondering if I could get some JVM/GC tuning advice to resolve an
>> issue that we are experiencing.
>>
>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
>> render would be greatly appreciated.
>>
>> Our Solr cloud nodes are having issues throwing OOM exceptions under load.
>> This issue has only started manifesting itself over the last few months
>> during which time the only change I can discern is an increase in index
>> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
>> index is currently 58G and the server has 46G of physical RAM and runs
>> nothing other than the Solr node.
> The advice I see about tuning your garbage collection won't help you.
> GC tuning can do absolutely nothing about OutOfMemoryError problems.
> Better tuning might *delay* the OOM, but it can't prevent it.
>
> You need to figure out exactly what resource is running out.  Hopefully
> one of the solr logfiles will have the actual OutOfMemoryError exception
> information.  It might not be the heap.
>
> Once you know what resource is running out and causing the OOM, then we
> can look deeper.
>
> A side note:  The OOM is not *technically* causing a crash, even though
> that might be the visible behavior.  When Solr is started on a
> non-windows system with the included scripts, it runs with a parameter
> that calls a script on OOM. That script *very intentionally* kills
> Solr.  This is done because program operation when OOM hits is
> unpredictable, and there's a decent chance that if it keeps running,
> your index will get corrupted.  That could happen anyway, but with quick
> action to kill the program, it's less likely.
>
>> The JVM is invoked with the following JVM options:
>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
>> -XX:NewRatio=3 -XX:OldPLABSize=16
>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
>> -XX:ParallelGCThreads=4
>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>> -XX:TargetSurvivorRatio=90
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> Solr 5.5.2 includes GC tuning options in its default configuration.
> Unless you'd like to switch to G1, you might want to let Solr's start
> script handle that for you instead of overriding the options.  The
> defaults are substantially similar to what you have defined.
>
>> I have imported the GC logs into GCViewer and attached a link to a
>> screenshot showing the lead up to a OOM crash.  Interestingly the young
>> generation space is almost empty before the repeated GC's and subsequent
>> crash.
>> https://imgur.com/a/Wtlez
> Can you share the actual GC logfile?  You'll need to use a file sharing
> site to do that, attachments almost never work on the mailing list.
>
> The info in the summary to the right of the graph seems to support your
> contention that there is plenty of heap, so the OutOfMemoryError is
> probably not related to heap memory.  You're going to have to look at
> your logfiles to see what the root cause is.
>
> Thanks,
> Shawn
>


Re: Solr OOM Crashes / JVM tuning advice

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote:
> I was wondering if I could get some JVM/GC tuning advice to resolve an
> issue that we are experiencing.
>
> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> render would be greatly appreciated.
>
> Our Solr cloud nodes are having issues throwing OOM exceptions under load.
> This issue has only started manifesting itself over the last few months
> during which time the only change I can discern is an increase in index
> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
> index is currently 58G and the server has 46G of physical RAM and runs
> nothing other than the Solr node.

The advice I see about tuning your garbage collection won't help you. 
GC tuning can do absolutely nothing about OutOfMemoryError problems. 
Better tuning might *delay* the OOM, but it can't prevent it.

You need to figure out exactly what resource is running out.  Hopefully
one of the solr logfiles will have the actual OutOfMemoryError exception
information.  It might not be the heap.

Once you know what resource is running out and causing the OOM, then we
can look deeper.

A side note:  The OOM is not *technically* causing a crash, even though
that might be the visible behavior.  When Solr is started on a
non-windows system with the included scripts, it runs with a parameter
that calls a script on OOM. That script *very intentionally* kills
Solr.  This is done because program operation when OOM hits is
unpredictable, and there's a decent chance that if it keeps running,
your index will get corrupted.  That could happen anyway, but with quick
action to kill the program, it's less likely.
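
As a hedged illustration (not the literal contents of Solr's oom_solr.sh), a
hook of this kind boils down to something like the script below; the two
arguments are assumed to be the Solr port and a log directory, matching how
it is invoked in the options above, and the process-matching pattern is only
a guess:

  #!/bin/bash
  # hypothetical sketch of an -XX:OnOutOfMemoryError hook; Solr ships its own
  SOLR_PORT=$1
  SOLR_LOGS_DIR=$2
  # find the Jetty/Solr process started on that port (pattern is an assumption)
  SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
  # record why Solr is being killed, then kill it hard before the index can be corrupted
  echo "$(date) OutOfMemoryError, killing Solr pid $SOLR_PID" \
    >> "$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$(date +%s).log"
  kill -9 "$SOLR_PID"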

> The JVM is invoked with the following JVM options:
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3 -XX:OldPLABSize=16
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
> -XX:ParallelGCThreads=4
> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Solr 5.5.2 includes GC tuning options in its default configuration. 
Unless you'd like to switch to G1, you might want to let Solr's start
script handle that for you instead of overriding the options.  The
defaults are substantially similar to what you have defined.
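
In practice that means removing the hand-rolled flags from wherever they are
injected and, if you do want to override anything, doing it through the
include file the start script reads. A sketch, assuming Solr 5.x started via
bin/solr with an include file such as /etc/default/solr.in.sh (the exact path
depends on how Solr was installed):

  # leave GC_TUNE unset to take the CMS defaults baked into bin/solr, or
  # set it explicitly, e.g. to experiment with G1 (values are illustrative):
  GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"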

> I have imported the GC logs into GCViewer and attached a link to a
> screenshot showing the lead up to a OOM crash.  Interestingly the young
> generation space is almost empty before the repeated GC's and subsequent
> crash.
> https://imgur.com/a/Wtlez

Can you share the actual GC logfile?  You'll need to use a file sharing
site to do that, attachments almost never work on the mailing list.

The info in the summary to the right of the graph seems to support your
contention that there is plenty of heap, so the OutOfMemoryError is
probably not related to heap memory.  You're going to have to look at
your logfiles to see what the root cause is.

Thanks,
Shawn


Re: Solr OOM Crashes / JVM tuning advice

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/11/2018 9:23 AM, Adam Harrison-Fuller wrote:
> In addition, here is the GC log leading up to the crash.
>
> https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_20180410_1009.zip?dl=0

I pulled that log into the http://gceasy.io website. This is a REALLY
nice way to look at GC logs.  I do still use gcviewer for in-depth
analysis, but for quick examination and locating trouble spots, that
website is top-notch.

Based on what I saw, I don't think you're running out of heap.  It's
more likely that it's too many file handles or limits on the number of
processes/threads that can be started.  Most operating systems limit
both of these to 1024 by default.
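
A few quick Linux checks for those limits, run as the user that owns the Solr
process (the start.jar pattern below assumes the stock Jetty-based start
script and is only an example):

  ulimit -n    # max open files for this shell/user
  ulimit -u    # max user processes (threads count against this)
  SOLR_PID=$(pgrep -f start.jar | head -1)
  cat /proc/$SOLR_PID/limits          # limits actually applied to the running JVM
  ls /proc/$SOLR_PID/fd | wc -l       # file descriptors currently open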

If you're on a non-windows OS (which I think is probably likely, given
that you have said Solr is crashing), change to the logs directory and
type this command:

grep OutOfMemoryError `ls -1tr solr.log*`

If it finds anything, it should give you the most recent error last.

Thanks,
Shawn


Re: Solr OOM Crashes / JVM tuning advice

Posted by Kevin Risden <kr...@apache.org>.
I'm going to share how I debugged a similar OOM crash; solving it had
nothing to do with increasing heap.

https://risdenk.github.io/2017/12/18/ambari-infra-solr-ranger.html

This is specifically for Apache Ranger and how to fix it but you can treat
it just like any application using Solr.

There were a few things that caused issues "out of the blue":

   - Document TTL
      - The documents getting deleted after some time would trigger OOM
      (due to caches taking up too much heap)
   - Extra query load
      - caches again taking up too much memory
   - Extra inserts
      - too many commits refreshing caches and again going OOM

Many of these can be reduced by using docValues for fields that you
typically sort or filter on.
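
For example, on a managed schema a sort/filter field can be switched to
docValues through the Schema API; the collection name, field name and field
type below are placeholders, a classic schema.xml needs docValues="true"
added to the field definition by hand instead, and either way the collection
must be reindexed afterwards:

  # replace an existing field definition, turning docValues on
  curl -X POST -H 'Content-type:application/json' \
    http://localhost:8983/solr/mycollection/schema -d '{
      "replace-field": {"name":"price", "type":"tdouble", "stored":true, "docValues":true}
    }'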

Kevin Risden

On Wed, Apr 11, 2018 at 6:01 PM, Deepak Goel <de...@gmail.com> wrote:

> A few observations:
>
> 1. The Old Gen Heap on 9th April is about 6GB occupied which then runs up
> to 9+GB on 10th April (It steadily increases throughout the day)
> 2. The Old Gen GC is never able to reclaim any free memory
>
>
>
> Deepak
> "Please stop cruelty to Animals, help by becoming a Vegan"
> +91 73500 12833
> deicool@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller <
> aharrison-fuller@mintel.com> wrote:
>
> > In addition, here is the GC log leading up to the crash.
> >
> > https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_
> > 20180410_1009.zip?dl=0
> >
> > Thanks!
> >
> > Adam
> >
> > On 11 April 2018 at 16:18, Adam Harrison-Fuller <
> > aharrison-fuller@mintel.com
> > > wrote:
> >
> > > Thanks for the advice so far.
> > >
> > > The directoryFactory is set to ${solr.directoryFactory:solr.
> > NRTCachingDirectoryFactory}.
> > >
> > >
> > > The servers workload is predominantly queries with updates taking place
> > > once a day.  It seems the servers are more likely to go down whilst the
> > > servers are indexing but not exclusively so.
> > >
> > > I'm having issues locating the actual out of memory exception.  I can
> > tell
> > > that it has ran out of memory as its called the oom_killer script which
> > as
> > > left a log file in the logs directory.  I cannot find the actual
> > exception
> > > in the solr.log or our solr_gc.log, any suggestions?
> > >
> > > Cheers,
> > > Adam
> > >
> > >
> > > On 11 April 2018 at 15:49, Walter Underwood <wu...@wunderwood.org>
> > wrote:
> > >
> > >> For readability, I’d use -Xmx12G instead of
> -XX:MaxHeapSize=12884901888.
> > >> Also, I always use a start size the same as the max size, since
> servers
> > >> will eventually grow to the max size. So:
> > >>
> > >> -Xmx12G -Xms12G
> > >>
> > >> wunder
> > >> Walter Underwood
> > >> wunder@wunderwood.org
> > >> http://observer.wunderwood.org/  (my blog)
> > >>
> > >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <
> sujaybawaskar@gmail.com>
> > >> wrote:
> > >> >
> > >> > What is directory factory defined in solrconfig.xml? Your JVM heap
> > >> should
> > >> > be tuned up with respect to that.
> > >> > How solr is being use,  is it more updates and less query or less
> > >> updates
> > >> > more queries?
> > >> > What is OOM error? Is it frequent GC or Error 12?
> > >> >
> > >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
> > >> > aharrison-fuller@mintel.com> wrote:
> > >> >
> > >> >> Hey Jesus,
> > >> >>
> > >> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
> > >> them.
> > >> >>
> > >> >> Cheers!
> > >> >> Adam
> > >> >>
> > >> >> On 11 April 2018 at 11:22, Jesus Olivan <je...@letgo.com>
> > >> wrote:
> > >> >>
> > >> >>> Hi Adam,
> > >> >>>
> > >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of
> physical
> > >> RAM,
> > >> >>> your JVM can afford more RAM without threading penalties due to
> > >> outside
> > >> >>> heap RAM lacks.
> > >> >>>
> > >> >>> Another good one would be to increase
> -XX:CMSInitiatingOccupancyFrac
> > >> tion
> > >> >>> =50
> > >> >>> to 75. I think that CMS collector works better when Old generation
> > >> space
> > >> >> is
> > >> >>> more populated.
> > >> >>>
> > >> >>> I usually use to set Survivor spaces to lesser size. If you want
> to
> > >> try
> > >> >>> SurvivorRatio to 6, i think performance would be improved.
> > >> >>>
> > >> >>> Another good practice for me would be to set an static NewSize
> > instead
> > >> >>> of -XX:NewRatio=3.
> > >> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb
> > (one
> > >> >> third
> > >> >>> of total heap space is recommended).
> > >> >>>
> > >> >>> Finally, my best results after a deep JVM I+D related to Solr,
> came
> > >> >>> removing ScavengeBeforeRemark flag and applying this new one: +
> > >> >>> ParGCCardsPerStrideChunk.
> > >> >>>
> > >> >>> However, It would be a good one to set ParallelGCThreads and
> > >> >>> *ConcGCThreads *to their optimal value, and we need you system CPU
> > >> number
> > >> >>> to know it. Can you provide this data, please?
> > >> >>>
> > >> >>> Regards
> > >> >>>
> > >> >>>
> > >> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
> > >> >>> aharrison-fuller@mintel.com
> > >> >>>> :
> > >> >>>
> > >> >>>> Hey all,
> > >> >>>>
> > >> >>>> I was wondering if I could get some JVM/GC tuning advice to
> resolve
> > >> an
> > >> >>>> issue that we are experiencing.
> > >> >>>>
> > >> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice
> you
> > >> can
> > >> >>>> render would be greatly appreciated.
> > >> >>>>
> > >> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions
> > under
> > >> >>> load.
> > >> >>>> This issue has only started manifesting itself over the last few
> > >> months
> > >> >>>> during which time the only change I can discern is an increase in
> > >> index
> > >> >>>> size.  They are running Solr 5.5.2 on OpenJDK version
> "1.8.0_101".
> > >> The
> > >> >>>> index is currently 58G and the server has 46G of physical RAM and
> > >> runs
> > >> >>>> nothing other than the Solr node.
> > >> >>>>
> > >> >>>> The JVM is invoked with the following JVM options:
> > >> >>>> -XX:CMSInitiatingOccupancyFraction=50
> > -XX:CMSMaxAbortablePrecleanTim
> > >> e=
> > >> >>> 6000
> > >> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> > >> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
> > >> >>> -XX:+ManagementServer
> > >> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> > >> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16
> > >> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
> > >> >>>> /data/gnpd/solr/logs
> > >> >>>> -XX:ParallelGCThreads=4
> > >> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> > >> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
> > >> -XX:+PrintGCDateStamps
> > >> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> > >> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> > >> >>>> -XX:TargetSurvivorRatio=90
> > >> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+
> UseCompressedClassPointers
> > >> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> > >> >>>>
> > >> >>>> These values were decided upon serveral years by a colleague
> based
> > >> upon
> > >> >>>> some suggestions from this mailing group with an index size ~25G.
> > >> >>>>
> > >> >>>> I have imported the GC logs into GCViewer and attached a link to
> a
> > >> >>>> screenshot showing the lead up to a OOM crash.  Interestingly the
> > >> young
> > >> >>>> generation space is almost empty before the repeated GC's and
> > >> >> subsequent
> > >> >>>> crash.
> > >> >>>> https://imgur.com/a/Wtlez
> > >> >>>>
> > >> >>>> I was considering slowly increasing the amount of heap available
> to
> > >> the
> > >> >>> JVM
> > >> >>>> slowly until the crashes, any other suggestions?  I'm looking at
> > >> trying
> > >> >>> to
> > >> >>>> get the nodes stable without having issues with the GC taking
> > forever
> > >> >> to
> > >> >>>> run.
> > >> >>>>
> > >> >>>> Additional information can be provided on request.
> > >> >>>>
> > >> >>>> Cheers!
> > >> >>>> Adam
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >
> > >> >
> > >> > --
> > >> > Thanks,
> > >> > Sujay P Bawaskar
> > >> > M:+91-77091 53669
> > >>
> > >>
> > >
> >

Re: Solr OOM Crashes / JVM tuning advice

Posted by Deepak Goel <de...@gmail.com>.
A few observations:

1. The Old Gen heap on 9th April is about 6GB occupied, which then runs up
to 9+GB on 10th April (it steadily increases throughout the day).
2. The Old Gen GC is never able to reclaim any free memory.



Deepak
"Please stop cruelty to Animals, help by becoming a Vegan"
+91 73500 12833
deicool@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller <
aharrison-fuller@mintel.com> wrote:

> In addition, here is the GC log leading up to the crash.
>
> https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_
> 20180410_1009.zip?dl=0
>
> Thanks!
>
> Adam
>
> On 11 April 2018 at 16:18, Adam Harrison-Fuller <
> aharrison-fuller@mintel.com
> > wrote:
>
> > Thanks for the advice so far.
> >
> > The directoryFactory is set to ${solr.directoryFactory:solr.
> NRTCachingDirectoryFactory}.
> >
> >
> > The servers workload is predominantly queries with updates taking place
> > once a day.  It seems the servers are more likely to go down whilst the
> > servers are indexing but not exclusively so.
> >
> > I'm having issues locating the actual out of memory exception.  I can
> tell
> > that it has ran out of memory as its called the oom_killer script which
> as
> > left a log file in the logs directory.  I cannot find the actual
> exception
> > in the solr.log or our solr_gc.log, any suggestions?
> >
> > Cheers,
> > Adam
> >
> >
> > On 11 April 2018 at 15:49, Walter Underwood <wu...@wunderwood.org>
> wrote:
> >
> >> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888.
> >> Also, I always use a start size the same as the max size, since servers
> >> will eventually grow to the max size. So:
> >>
> >> -Xmx12G -Xms12G
> >>
> >> wunder
> >> Walter Underwood
> >> wunder@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <su...@gmail.com>
> >> wrote:
> >> >
> >> > What is directory factory defined in solrconfig.xml? Your JVM heap
> >> should
> >> > be tuned up with respect to that.
> >> > How solr is being use,  is it more updates and less query or less
> >> updates
> >> > more queries?
> >> > What is OOM error? Is it frequent GC or Error 12?
> >> >
> >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
> >> > aharrison-fuller@mintel.com> wrote:
> >> >
> >> >> Hey Jesus,
> >> >>
> >> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
> >> them.
> >> >>
> >> >> Cheers!
> >> >> Adam
> >> >>
> >> >> On 11 April 2018 at 11:22, Jesus Olivan <je...@letgo.com>
> >> wrote:
> >> >>
> >> >>> Hi Adam,
> >> >>>
> >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical
> >> RAM,
> >> >>> your JVM can afford more RAM without threading penalties due to
> >> outside
> >> >>> heap RAM lacks.
> >> >>>
> >> >>> Another good one would be to increase -XX:CMSInitiatingOccupancyFrac
> >> tion
> >> >>> =50
> >> >>> to 75. I think that CMS collector works better when Old generation
> >> space
> >> >> is
> >> >>> more populated.
> >> >>>
> >> >>> I usually use to set Survivor spaces to lesser size. If you want to
> >> try
> >> >>> SurvivorRatio to 6, i think performance would be improved.
> >> >>>
> >> >>> Another good practice for me would be to set an static NewSize
> instead
> >> >>> of -XX:NewRatio=3.
> >> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb
> (one
> >> >> third
> >> >>> of total heap space is recommended).
> >> >>>
> >> >>> Finally, my best results after a deep JVM I+D related to Solr, came
> >> >>> removing ScavengeBeforeRemark flag and applying this new one: +
> >> >>> ParGCCardsPerStrideChunk.
> >> >>>
> >> >>> However, It would be a good one to set ParallelGCThreads and
> >> >>> *ConcGCThreads *to their optimal value, and we need you system CPU
> >> number
> >> >>> to know it. Can you provide this data, please?
> >> >>>
> >> >>> Regards
> >> >>>
> >> >>>
> >> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
> >> >>> aharrison-fuller@mintel.com
> >> >>>> :
> >> >>>
> >> >>>> Hey all,
> >> >>>>
> >> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve
> >> an
> >> >>>> issue that we are experiencing.
> >> >>>>
> >> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you
> >> can
> >> >>>> render would be greatly appreciated.
> >> >>>>
> >> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions
> under
> >> >>> load.
> >> >>>> This issue has only started manifesting itself over the last few
> >> months
> >> >>>> during which time the only change I can discern is an increase in
> >> index
> >> >>>> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".
> >> The
> >> >>>> index is currently 58G and the server has 46G of physical RAM and
> >> runs
> >> >>>> nothing other than the Solr node.
> >> >>>>
> >> >>>> The JVM is invoked with the following JVM options:
> >> >>>> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTim
> >> e=
> >> >>> 6000
> >> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> >> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
> >> >>> -XX:+ManagementServer
> >> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> >> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16
> >> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
> >> >>>> /data/gnpd/solr/logs
> >> >>>> -XX:ParallelGCThreads=4
> >> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> >> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
> >> -XX:+PrintGCDateStamps
> >> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> >> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> >> >>>> -XX:TargetSurvivorRatio=90
> >> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> >> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >> >>>>
> >> >>>> These values were decided upon serveral years by a colleague based
> >> upon
> >> >>>> some suggestions from this mailing group with an index size ~25G.
> >> >>>>
> >> >>>> I have imported the GC logs into GCViewer and attached a link to a
> >> >>>> screenshot showing the lead up to a OOM crash.  Interestingly the
> >> young
> >> >>>> generation space is almost empty before the repeated GC's and
> >> >> subsequent
> >> >>>> crash.
> >> >>>> https://imgur.com/a/Wtlez
> >> >>>>
> >> >>>> I was considering slowly increasing the amount of heap available to
> >> the
> >> >>> JVM
> >> >>>> slowly until the crashes, any other suggestions?  I'm looking at
> >> trying
> >> >>> to
> >> >>>> get the nodes stable without having issues with the GC taking
> forever
> >> >> to
> >> >>>> run.
> >> >>>>
> >> >>>> Additional information can be provided on request.
> >> >>>>
> >> >>>> Cheers!
> >> >>>> Adam
> >> >>>>
> >> >>>
> >> >>
> >> >
> >> >
> >> > --
> >> > Thanks,
> >> > Sujay P Bawaskar
> >> > M:+91-77091 53669
> >>
> >>
> >
>

Re: Solr OOM Crashes / JVM tuning advice

Posted by Adam Harrison-Fuller <ah...@mintel.com>.
In addition, here is the GC log leading up to the crash.

https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_20180410_1009.zip?dl=0

Thanks!

Adam

On 11 April 2018 at 16:18, Adam Harrison-Fuller <aharrison-fuller@mintel.com
> wrote:

> Thanks for the advice so far.
>
> The directoryFactory is set to ${solr.directoryFactory:solr.NRTCachingDirectoryFactory}.
>
>
> The servers workload is predominantly queries with updates taking place
> once a day.  It seems the servers are more likely to go down whilst the
> servers are indexing but not exclusively so.
>
> I'm having issues locating the actual out of memory exception.  I can tell
> that it has ran out of memory as its called the oom_killer script which as
> left a log file in the logs directory.  I cannot find the actual exception
> in the solr.log or our solr_gc.log, any suggestions?
>
> Cheers,
> Adam
>
>
> On 11 April 2018 at 15:49, Walter Underwood <wu...@wunderwood.org> wrote:
>
>> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888.
>> Also, I always use a start size the same as the max size, since servers
>> will eventually grow to the max size. So:
>>
>> -Xmx12G -Xms12G
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <su...@gmail.com>
>> wrote:
>> >
>> > What is directory factory defined in solrconfig.xml? Your JVM heap
>> should
>> > be tuned up with respect to that.
>> > How solr is being use,  is it more updates and less query or less
>> updates
>> > more queries?
>> > What is OOM error? Is it frequent GC or Error 12?
>> >
>> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
>> > aharrison-fuller@mintel.com> wrote:
>> >
>> >> Hey Jesus,
>> >>
>> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
>> them.
>> >>
>> >> Cheers!
>> >> Adam
>> >>
>> >> On 11 April 2018 at 11:22, Jesus Olivan <je...@letgo.com>
>> wrote:
>> >>
>> >>> Hi Adam,
>> >>>
>> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical
>> RAM,
>> >>> your JVM can afford more RAM without threading penalties due to
>> outside
>> >>> heap RAM lacks.
>> >>>
>> >>> Another good one would be to increase -XX:CMSInitiatingOccupancyFrac
>> tion
>> >>> =50
>> >>> to 75. I think that CMS collector works better when Old generation
>> space
>> >> is
>> >>> more populated.
>> >>>
>> >>> I usually use to set Survivor spaces to lesser size. If you want to
>> try
>> >>> SurvivorRatio to 6, i think performance would be improved.
>> >>>
>> >>> Another good practice for me would be to set an static NewSize instead
>> >>> of -XX:NewRatio=3.
>> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb (one
>> >> third
>> >>> of total heap space is recommended).
>> >>>
>> >>> Finally, my best results after a deep JVM I+D related to Solr, came
>> >>> removing ScavengeBeforeRemark flag and applying this new one: +
>> >>> ParGCCardsPerStrideChunk.
>> >>>
>> >>> However, It would be a good one to set ParallelGCThreads and
>> >>> *ConcGCThreads *to their optimal value, and we need you system CPU
>> number
>> >>> to know it. Can you provide this data, please?
>> >>>
>> >>> Regards
>> >>>
>> >>>
>> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
>> >>> aharrison-fuller@mintel.com
>> >>>> :
>> >>>
>> >>>> Hey all,
>> >>>>
>> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve
>> an
>> >>>> issue that we are experiencing.
>> >>>>
>> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you
>> can
>> >>>> render would be greatly appreciated.
>> >>>>
>> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions under
>> >>> load.
>> >>>> This issue has only started manifesting itself over the last few
>> months
>> >>>> during which time the only change I can discern is an increase in
>> index
>> >>>> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".
>> The
>> >>>> index is currently 58G and the server has 46G of physical RAM and
>> runs
>> >>>> nothing other than the Solr node.
>> >>>>
>> >>>> The JVM is invoked with the following JVM options:
>> >>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTim
>> e=
>> >>> 6000
>> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
>> >>> -XX:+ManagementServer
>> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
>> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16
>> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
>> >>>> /data/gnpd/solr/logs
>> >>>> -XX:ParallelGCThreads=4
>> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
>> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
>> -XX:+PrintGCDateStamps
>> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>> >>>> -XX:TargetSurvivorRatio=90
>> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>> >>>>
>> >>>> These values were decided upon serveral years by a colleague based
>> upon
>> >>>> some suggestions from this mailing group with an index size ~25G.
>> >>>>
>> >>>> I have imported the GC logs into GCViewer and attached a link to a
>> >>>> screenshot showing the lead up to a OOM crash.  Interestingly the
>> young
>> >>>> generation space is almost empty before the repeated GC's and
>> >> subsequent
>> >>>> crash.
>> >>>> https://imgur.com/a/Wtlez
>> >>>>
>> >>>> I was considering slowly increasing the amount of heap available to
>> the
>> >>> JVM
>> >>>> slowly until the crashes, any other suggestions?  I'm looking at
>> trying
>> >>> to
>> >>>> get the nodes stable without having issues with the GC taking forever
>> >> to
>> >>>> run.
>> >>>>
>> >>>> Additional information can be provided on request.
>> >>>>
>> >>>> Cheers!
>> >>>> Adam
>> >>>>
>> >>>
>> >>
>> >
>> >
>> > --
>> > Thanks,
>> > Sujay P Bawaskar
>> > M:+91-77091 53669
>>
>>
>



Re: Solr OOM Crashes / JVM tuning advice

Posted by Adam Harrison-Fuller <ah...@mintel.com>.
Thanks for the advice so far.

The directoryFactory is set to
${solr.directoryFactory:solr.NRTCachingDirectoryFactory}.

The servers' workload is predominantly queries, with updates taking place
once a day.  It seems the servers are more likely to go down whilst they
are indexing, but not exclusively so.

I'm having issues locating the actual out-of-memory exception.  I can tell
that it has run out of memory as it has called the oom_killer script, which
has left a log file in the logs directory.  I cannot find the actual exception
in the solr.log or our solr_gc.log; any suggestions?

Cheers,
Adam


On 11 April 2018 at 15:49, Walter Underwood <wu...@wunderwood.org> wrote:

> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888.
> Also, I always use a start size the same as the max size, since servers
> will eventually grow to the max size. So:
>
> -Xmx12G -Xms12G
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <su...@gmail.com>
> wrote:
> >
> > What is directory factory defined in solrconfig.xml? Your JVM heap should
> > be tuned up with respect to that.
> > How solr is being use,  is it more updates and less query or less updates
> > more queries?
> > What is OOM error? Is it frequent GC or Error 12?
> >
> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
> > aharrison-fuller@mintel.com> wrote:
> >
> >> Hey Jesus,
> >>
> >> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to
> them.
> >>
> >> Cheers!
> >> Adam
> >>
> >> On 11 April 2018 at 11:22, Jesus Olivan <je...@letgo.com> wrote:
> >>
> >>> Hi Adam,
> >>>
> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical
> RAM,
> >>> your JVM can afford more RAM without threading penalties due to outside
> >>> heap RAM lacks.
> >>>
> >>> Another good one would be to increase -XX:
> CMSInitiatingOccupancyFraction
> >>> =50
> >>> to 75. I think that CMS collector works better when Old generation
> space
> >> is
> >>> more populated.
> >>>
> >>> I usually use to set Survivor spaces to lesser size. If you want to try
> >>> SurvivorRatio to 6, i think performance would be improved.
> >>>
> >>> Another good practice for me would be to set an static NewSize instead
> >>> of -XX:NewRatio=3.
> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb (one
> >> third
> >>> of total heap space is recommended).
> >>>
> >>> Finally, my best results after a deep JVM I+D related to Solr, came
> >>> removing ScavengeBeforeRemark flag and applying this new one: +
> >>> ParGCCardsPerStrideChunk.
> >>>
> >>> However, It would be a good one to set ParallelGCThreads and
> >>> *ConcGCThreads *to their optimal value, and we need you system CPU
> number
> >>> to know it. Can you provide this data, please?
> >>>
> >>> Regards
> >>>
> >>>
> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
> >>> aharrison-fuller@mintel.com
> >>>> :
> >>>
> >>>> Hey all,
> >>>>
> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve an
> >>>> issue that we are experiencing.
> >>>>
> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you
> can
> >>>> render would be greatly appreciated.
> >>>>
> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions under
> >>> load.
> >>>> This issue has only started manifesting itself over the last few
> months
> >>>> during which time the only change I can discern is an increase in
> index
> >>>> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".
> The
> >>>> index is currently 58G and the server has 46G of physical RAM and runs
> >>>> nothing other than the Solr node.
> >>>>
> >>>> The JVM is invoked with the following JVM options:
> >>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:
> CMSMaxAbortablePrecleanTime=
> >>> 6000
> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
> >>> -XX:+ManagementServer
> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16
> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
> >>>> /data/gnpd/solr/logs
> >>>> -XX:ParallelGCThreads=4
> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> >>>> -XX:TargetSurvivorRatio=90
> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >>>>
> >>>> These values were decided upon serveral years by a colleague based
> upon
> >>>> some suggestions from this mailing group with an index size ~25G.
> >>>>
> >>>> I have imported the GC logs into GCViewer and attached a link to a
> >>>> screenshot showing the lead up to a OOM crash.  Interestingly the
> young
> >>>> generation space is almost empty before the repeated GC's and
> >> subsequent
> >>>> crash.
> >>>> https://imgur.com/a/Wtlez
> >>>>
> >>>> I was considering slowly increasing the amount of heap available to
> the
> >>> JVM
> >>>> slowly until the crashes, any other suggestions?  I'm looking at
> trying
> >>> to
> >>>> get the nodes stable without having issues with the GC taking forever
> >> to
> >>>> run.
> >>>>
> >>>> Additional information can be provided on request.
> >>>>
> >>>> Cheers!
> >>>> Adam
> >>>>
> >>>
> >>
> >
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
>
>



Re: Solr OOM Crashes / JVM tuning advice

Posted by Walter Underwood <wu...@wunderwood.org>.
One other note on the JVM options, even though those aren’t the cause of the problem.

Don’t run four GC threads when you have four processors. That can use 100% of CPU just doing GC.

With four processors, I’d run one thread.
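
Concretely, that would mean replacing the 4-and-4 settings quoted earlier
with something along these lines (illustrative values, worth benchmarking
rather than taking as gospel):

  -XX:ParallelGCThreads=1 -XX:ConcGCThreads=1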

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 11, 2018, at 7:49 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> 
> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. Also, I always use a start size the same as the max size, since servers will eventually grow to the max size. So:
> 
> -Xmx12G -Xms12G
> 
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <su...@gmail.com> wrote:
>> 
>> What is directory factory defined in solrconfig.xml? Your JVM heap should
>> be tuned up with respect to that.
>> How solr is being use,  is it more updates and less query or less updates
>> more queries?
>> What is OOM error? Is it frequent GC or Error 12?
>> 
>> On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <
>> aharrison-fuller@mintel.com> wrote:
>> 
>>> Hey Jesus,
>>> 
>>> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to them.
>>> 
>>> Cheers!
>>> Adam
>>> 
>>> On 11 April 2018 at 11:22, Jesus Olivan <je...@letgo.com> wrote:
>>> 
>>>> Hi Adam,
>>>> 
>>>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical RAM,
>>>> your JVM can afford more RAM without threading penalties due to outside
>>>> heap RAM lacks.
>>>> 
>>>> Another good one would be to increase -XX:CMSInitiatingOccupancyFraction
>>>> =50
>>>> to 75. I think that CMS collector works better when Old generation space
>>> is
>>>> more populated.
>>>> 
>>>> I usually use to set Survivor spaces to lesser size. If you want to try
>>>> SurvivorRatio to 6, i think performance would be improved.
>>>> 
>>>> Another good practice for me would be to set an static NewSize instead
>>>> of -XX:NewRatio=3.
>>>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb (one
>>> third
>>>> of total heap space is recommended).
>>>> 
>>>> Finally, my best results after a deep JVM I+D related to Solr, came
>>>> removing ScavengeBeforeRemark flag and applying this new one: +
>>>> ParGCCardsPerStrideChunk.
>>>> 
>>>> However, It would be a good one to set ParallelGCThreads and
>>>> *ConcGCThreads *to their optimal value, and we need you system CPU number
>>>> to know it. Can you provide this data, please?
>>>> 
>>>> Regards
>>>> 
>>>> 
>>>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <
>>>> aharrison-fuller@mintel.com
>>>>> :
>>>> 
>>>>> Hey all,
>>>>> 
>>>>> I was wondering if I could get some JVM/GC tuning advice to resolve an
>>>>> issue that we are experiencing.
>>>>> 
>>>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
>>>>> render would be greatly appreciated.
>>>>> 
>>>>> Our Solr cloud nodes are having issues throwing OOM exceptions under
>>>> load.
>>>>> This issue has only started manifesting itself over the last few months
>>>>> during which time the only change I can discern is an increase in index
>>>>> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
>>>>> index is currently 58G and the server has 46G of physical RAM and runs
>>>>> nothing other than the Solr node.
>>>>> 
>>>>> The JVM is invoked with the following JVM options:
>>>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=
>>>> 6000
>>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>>>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888
>>>> -XX:+ManagementServer
>>>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
>>>>> -XX:NewRatio=3 -XX:OldPLABSize=16
>>>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
>>>>> /data/gnpd/solr/logs
>>>>> -XX:ParallelGCThreads=4
>>>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
>>>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>>>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>>>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>>>>> -XX:TargetSurvivorRatio=90
>>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>>>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>> 
>>>>> These values were decided upon serveral years by a colleague based upon
>>>>> some suggestions from this mailing group with an index size ~25G.
>>>>> 
>>>>> I have imported the GC logs into GCViewer and attached a link to a
>>>>> screenshot showing the lead up to a OOM crash.  Interestingly the young
>>>>> generation space is almost empty before the repeated GC's and
>>> subsequent
>>>>> crash.
>>>>> https://imgur.com/a/Wtlez
>>>>> 
>>>>> I was considering slowly increasing the amount of heap available to the
>>>> JVM
>>>>> slowly until the crashes, any other suggestions?  I'm looking at trying
>>>> to
>>>>> get the nodes stable without having issues with the GC taking forever
>>> to
>>>>> run.
>>>>> 
>>>>> Additional information can be provided on request.
>>>>> 
>>>>> Cheers!
>>>>> Adam
>>>>> 
>>>> 
>>> 
>> 
>> 
>> -- 
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
> 


Re: Solr OOM Crashes / JVM tuning advice

Posted by Walter Underwood <wu...@wunderwood.org>.
For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. Also, I always use a start size the same as the max size, since servers will eventually grow to the max size. So:

-Xmx12G -Xms12G
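
If you are using Solr's own start scripts, roughly the same thing can be
expressed in solr.in.sh rather than as raw flags (a sketch; SOLR_HEAP sets
both the start and max size):

  SOLR_HEAP="12g"
  # or spell the flags out explicitly:
  SOLR_JAVA_MEM="-Xms12g -Xmx12g"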

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Solr OOM Crashes / JVM tuning advice

Posted by Sujay Bawaskar <su...@gmail.com>.
Which directory factory is defined in solrconfig.xml? Your JVM heap should
be tuned with respect to that.
How is Solr being used: is it more updates and fewer queries, or fewer updates
and more queries?
What is the OOM error? Is it frequent GC, or error 12 (the OS failing to
allocate memory)?
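For reference, that setting lives in solrconfig.xml; a sketch of the stock 5.x default (MMapDirectory-backed factories keep the index in the OS page cache rather than on the JVM heap, which changes how much heap you actually need):

<!-- solrconfig.xml (sketch): the stock default; the index is served from the
     OS page cache, so the heap mainly holds caches, indexing buffers and
     per-request structures. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>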



-- 
Thanks,
Sujay P Bawaskar
M:+91-77091 53669

Re: Solr OOM Crashes / JVM tuning advice

Posted by Emir Arnautović <em...@sematext.com>.
Hi Adam,
From Solr’s point of view, you should probably check your caches, mostly filterCache, fieldCache and fieldValueCache.
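Those are set in solrconfig.xml; a sketch with purely illustrative sizes (each filterCache entry can cost roughly maxDoc/8 bytes, so an oversized filterCache on a large index eats heap quickly):

<!-- solrconfig.xml (illustrative sizes only; the Lucene fieldCache itself is
     not configured here -- it grows with sorting/faceting on non-docValues
     fields) -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="64"/>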

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/





Re: Solr OOM Crashes / JVM tuning advice

Posted by Adam Harrison-Fuller <ah...@mintel.com>.
Hey Jesus,

Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to them.

Cheers!
Adam



Re: Solr OOM Crashes / JVM tuning advice

Posted by Jesus Olivan <je...@letgo.com>.
Hi Adam,

IMHO you could try increasing the heap to 20 GB (with 46 GB of physical RAM,
your JVM can afford a larger heap without running short of RAM outside the
heap).

Another good change would be to raise -XX:CMSInitiatingOccupancyFraction from
50 to 75. I think the CMS collector works better when the old generation is
more populated before a cycle starts.

I usually set the survivor spaces to a smaller size. If you try
-XX:SurvivorRatio=6, I think performance would improve.

Another good practice, for me, is to set a static NewSize instead
of -XX:NewRatio=3.
You could try -XX:NewSize=7000m and -XX:MaxNewSize=7000m (one third
of the total heap space is the usual recommendation).

Finally, my best results after a deep round of JVM R&D for Solr came from
removing the CMSScavengeBeforeRemark flag and adding
ParGCCardsPerStrideChunk.

However, it would be good to set ParallelGCThreads and
ConcGCThreads to their optimal values, and we need your system's CPU count
to know what that is. Can you provide this data, please?
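Put together, the changes above might land in bin/solr.in.sh roughly like this. This is a sketch only: GC_TUNE is the stock variable name in the Solr 5.x start script, the heap size shown is the existing 12 GB, the ParGCCardsPerStrideChunk value of 4096 is purely illustrative since no value is given in this thread, and on many JDK 8 builds that option also needs -XX:+UnlockDiagnosticVMOptions.

# sketch for bin/solr.in.sh -- CMSScavengeBeforeRemark is deliberately left out,
# as suggested above
SOLR_HEAP="12g"
GC_TUNE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:NewSize=7000m -XX:MaxNewSize=7000m \
  -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=90 \
  -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 \
  -XX:+ParallelRefProcEnabled \
  -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=4096"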

Regards


2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <aharrison-fuller@mintel.com
>:

> Hey all,
>
> I was wondering if I could get some JVM/GC tuning advice to resolve an
> issue that we are experiencing.
>
> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> render would be greatly appreciated.
>
> Our Solr cloud nodes are having issues throwing OOM exceptions under load.
> This issue has only started manifesting itself over the last few months
> during which time the only change I can discern is an increase in index
> size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
> index is currently 58G and the server has 46G of physical RAM and runs
> nothing other than the Solr node.
>
> The JVM is invoked with the following JVM options:
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3 -XX:OldPLABSize=16
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000
> /data/gnpd/solr/logs
> -XX:ParallelGCThreads=4
> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> These values were decided upon serveral years by a colleague based upon
> some suggestions from this mailing group with an index size ~25G.
>
> I have imported the GC logs into GCViewer and attached a link to a
> screenshot showing the lead up to a OOM crash.  Interestingly the young
> generation space is almost empty before the repeated GC's and subsequent
> crash.
> https://imgur.com/a/Wtlez
>
> I was considering slowly increasing the amount of heap available to the JVM
> slowly until the crashes, any other suggestions?  I'm looking at trying to
> get the nodes stable without having issues with the GC taking forever to
> run.
>
> Additional information can be provided on request.
>
> Cheers!
> Adam
>
> --
>
> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN
> Registered in
> England: Number 1475918. | VAT Number: GB 232 9342 72
>
> Contact details for
> our other offices can be found at http://www.mintel.com/office-locations
> <http://www.mintel.com/office-locations>.
>
> This email and any attachments
> may include content that is confidential, privileged
> or otherwise
> protected under applicable law. Unauthorised disclosure, copying,
> distribution
> or use of the contents is prohibited and may be unlawful. If
> you have received this email in error,
> including without appropriate
> authorisation, then please reply to the sender about the error
> and delete
> this email and any attachments.
>
>