Posted to solr-user@lucene.apache.org by Cas Rusnov <ca...@manzama.com> on 2016/06/15 21:05:53 UTC

Long STW GCs with Solr Cloud

I know this has been discussed in the past (although not too recently),
but the advice in those threads has failed us, so here we are.

Some basics:

We're running Solr 6 (6.0.0 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 -
nknize - 2016-04-01 14:41:49) on Java 8 (OpenJDK Runtime Environment,
build 1.8.0_72-internal-b15), on what are at this point some rather large
cloud instances (8 CPUs / 40 GB RAM).

Our general cluster layout is 3 nodes per shard and 6 shards. We have
three collections, but one primary collection is heavily used and is the
source of the GC situation we're seeing. There are roughly 55M documents
in this collection.

Our test load is multiple large, complicated queries which facet across
multiple fields.

After trying many of the off-the-shelf configurations (including CMS
configurations, but excluding G1GC, whose warnings we're still taking
seriously), numerous tweaks, rumors, and various instance sizes, most of
which resulted in frequent 30+ second STW GCs regardless of heap size and
new-space size, we settled on the following configuration. It leads to
occasional higher pauses but mostly stays between 10 and 20 second STWs
every few minutes, which is almost acceptable:

-XX:+AggressiveOpts
-XX:+UnlockDiagnosticVMOptions
-XX:+UseAdaptiveSizePolicy
-XX:+UseLargePages
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:MaxGCPauseMillis=15000
-XX:MaxNewSize=12000m
-XX:ParGCCardsPerStrideChunk=4096
-XX:ParallelGCThreads=16
-Xms31000m
-Xmx31000m
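
For anyone reproducing this, one way to feed the flags in is Solr's
solr.in.sh. A sketch, assuming the stock Solr 6 GC_TUNE and SOLR_JAVA_MEM
variables (check your own solr.in.sh for the exact names):

GC_TUNE=" \
-XX:+AggressiveOpts \
-XX:+UnlockDiagnosticVMOptions \
-XX:+UseAdaptiveSizePolicy \
-XX:+UseLargePages \
-XX:+UseParallelGC \
-XX:+UseParallelOldGC \
-XX:MaxGCPauseMillis=15000 \
-XX:MaxNewSize=12000m \
-XX:ParGCCardsPerStrideChunk=4096 \
-XX:ParallelGCThreads=16 \
"
# heap min and max pinned to the same value to avoid resize pauses
SOLR_JAVA_MEM="-Xms31000m -Xmx31000m"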

Note that HugeTable is working on the instances and allocates
approximately the size of the Java instance, and Java doesn't produce the
error that would indicate HugeTable failed. Getting this working provided
a marginal improvement in performance.
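
(For anyone verifying the same thing, a quick Linux sanity check; this
assumes a static huge-page pool and a HotSpot JVM, and the exact warning
text varies:)

grep -i hugepages /proc/meminfo                       # HugePages_Total/_Free/_Rsvd: is a pool reserved and in use?
java -XX:+UseLargePages -version 2>&1 | grep -i warn  # prints nothing if the JVM got its large pages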

Mostly we're wondering if we missed something in the configuration, and
whether anyone has experienced something similar. Thanks for any help!

-- 

Cas Rusnov,

Engineer
Manzama <http://www.manzama.com>

Re: Long STW GCs with Solr Cloud

Posted by Walter Underwood <wu...@wunderwood.org>.
I try to adjust the new generation size so that it can handle all the allocations needed for HTTP requests. Those short-lived objects should never come from tenured space.

Even without facets, I run a pretty big new generation, 2 GB in an 8 GB heap.
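
Concretely, that looks something like the following in the JVM arguments
(the sizes are what I run, not a recommendation; tune to your request mix):

-Xms8g -Xmx8g                      # fixed 8 GB heap
-XX:NewSize=2g -XX:MaxNewSize=2g   # pin the new generation at 2 GB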

The tenured space will always grow in Solr, because objects ejected from cache have been around a while. Caches create garbage in tenured space.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Re: Long STW GCs with Solr Cloud

Posted by Jeff Wartes <jw...@whitepages.com>.
For what it’s worth, I looked into reducing the allocation footprint of CollapsingQParserPlugin a bit, but without success. See https://issues.apache.org/jira/browse/SOLR-9125

As it happened, I was collapsing on a field with such high cardinality that the chances of a query even doing much collapsing of interest were pretty low. That allowed me to use a vastly stripped-down version of CollapsingQParserPlugin with a *much* lower memory footprint, in exchange for collapsed document heads essentially being picked at random. (That is, when collapsing two documents, the one that gets returned is random.)

If that’s of interest, I could probably throw the code someplace public.




Re: Long STW GCs with Solr Cloud

Posted by Cas Rusnov <ca...@manzama.com>.
Hey, thanks for your reply.

Running the suggested CMS config from Shawn, we're getting some nodes
with 30+ second pauses, I gather due to the large heap. Interestingly,
the scenario Jeff described is remarkably similar to ours (we use field
collapsing), including the performance aspects, and we are getting
concurrent mode failures both due to new-space allocation failures and
due to promotion failures. I suspect there's a lot of garbage building
up. We're going to run tests with field collapsing disabled and see if
that makes a difference.

Cas



Re: Long STW GCs with Solr Cloud

Posted by Jeff Wartes <jw...@whitepages.com>.
Check your gc log for CMS “concurrent mode failure” messages. 
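
Assuming GC logging goes to a file (Solr's start script points -Xloggc at
a file such as solr_gc.log; the name here is illustrative), something like:

grep -c "concurrent mode failure" solr_gc.log          # how often it happens
grep -n "concurrent mode failure" solr_gc.log | tail   # line numbers of the most recent ones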

If a concurrent CMS collection fails, it does a stop-the-world pause while it cleans up using a *single thread*. This means the stop-the-world CMS collection in the failure case is typically several times slower than a concurrent CMS collection. The single-thread business means it will also be several times slower than the Parallel collector, which is probably what you’re seeing. I understand that it needs to stop the world in this case, but I really wish the CMS failure would fall back to a Parallel collector run instead.
The Parallel collector is always going to be the fastest at getting rid of garbage, but only because it stops all the application threads while it runs, so it’s got less complexity to deal with. That said, it’s probably not going to be orders of magnitude faster than a (successfully) concurrent CMS collection.

Regardless, the bigger the heap, the bigger the pause.

If your application is generating a lot of garbage, or can generate a lot of garbage very suddenly, CMS concurrent mode failures are more likely. You can turn down the -XX:CMSInitiatingOccupancyFraction value in order to give the CMS collection more of a head start at the cost of more frequent collections. If that doesn’t work, you can try using a bigger heap, but you may eventually find yourself trying to figure out what about your query load generates so much garbage (or causes garbage spikes) and trying to address that. Even G1 won’t protect you from highly unpredictable garbage generation rates.
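
For example (the 65 below is purely illustrative; pick a value based on
your own occupancy curve):

-XX:CMSInitiatingOccupancyFraction=65 \
-XX:+UseCMSInitiatingOccupancyOnly   # use the fraction on every cycle instead of the JVM's own heuristic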

In my case, for example, I found that a very small subset of my queries were using the CollapsingQParserPlugin, which requires quite a lot of memory allocations, especially on a large index. Although generally this was fine, if I got several of these rare queries in a very short window, it would always spike enough garbage to cause CMS concurrent mode failures. The single-threaded concurrent-mode failure would then take long enough that the ZK heartbeat would fail, and things would just go downhill from there.





Re: Long STW GCs with Solr Cloud

Posted by Cas Rusnov <ca...@manzama.com>.
Hey Shawn! Thanks for replying.

Yes I meant HugePages not HugeTable, brain fart. I will give the
transparent off option a go.

I have attempted to use your CMS configs as-is, and also the default
settings, and the cluster dies under our load: a node will get a 35-60s
GC STW, the others in the shard will take over the load, and they will in
turn get long STWs until the shard dies. That's why, in a fit of
desperation, I tried out ParallelGC and found it to be half-way
acceptable. I will run a test using your configs (and the defaults) again
just to be sure, since I'm certain the machine config has changed since
we used your unaltered settings.

Thanks!
Cas



Re: Long STW GCs with Solr Cloud

Posted by Ere Maijala <er...@helsinki.fi>.
On 16.6.2016 at 1.41, Shawn Heisey wrote:
> If you want to continue avoiding G1, you should definitely be using
> CMS.  My recommendation right now would be to try the G1 settings on my
> wiki page under the heading "Current experiments" or the CMS settings
> just below that.

For what it's worth, we're currently running Shawn's G1 settings 
slightly modified for our workload on Java 1.8.0_91 25.91-b14:

GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=16m \
-XX:MaxGCPauseMillis=200 \
-XX:+UnlockExperimentalVMOptions \
-XX:G1NewSizePercent=3 \
-XX:ParallelGCThreads=12 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

It seems that our highly varying loads (day vs. night) caused some 
issues leading to long pauses until I added G1NewSizePercent (which 
requires -XX:+UnlockExperimentalVMOptions). Things are running smoothly 
now, and there are reports that the warnings regarding G1 in the Lucene 
tests no longer appear with newer Java versions, but it's of course up 
to you whether you're willing to take the chance.

Regards,
Ere

Re: Long STW GCs with Solr Cloud

Posted by Ere Maijala <er...@helsinki.fi>.
On 17.6.2016 at 11.05, Bernd Fehling wrote:
> -XX:G1NewSizePercent
> <Garbage First Garbage Collector Tuning (oracle.com)>
> ... Sets the percentage of the heap to use as the minimum for the young generation size.
>     The default value is 5 percent of your Java heap. ...
>
> So you are reducing the young generation size to get a smoother-running system.
> This is strange, like shrinking the bottle below the bottleneck.

True, but it works. Perhaps that's because the default is too large for 
our heap size (> 10 GB). In any case, these settings allow us to run 
with an average pause under 150 ms and a max pause under 2 s, while we 
previously struggled with pauses exceeding 20 s at worst. All this was 
inspired by 
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase.
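
In case it helps anyone compare, those pause numbers can be pulled out of
the GC log roughly like this (assumes -XX:+PrintGCApplicationStoppedTime
is enabled and GNU grep; the log file name is whatever your -Xloggc says):

# average and max stop-the-world time, in seconds
grep -o 'stopped: [0-9.]*' solr_gc.log \
  | awk '{ sum += $2; if ($2 > max) max = $2; n++ }
         END { if (n) printf "avg %.4fs  max %.4fs  (%d stops)\n", sum/n, max, n }'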

Regards,
Ere

Re: Long STW GCs with Solr Cloud

Posted by Bernd Fehling <be...@uni-bielefeld.de>.

On 17.06.2016 at 09:06, Ere Maijala wrote:
> For what it's worth, we're currently running Shawn's G1 settings slightly modified for our workload on Java 1.8.0_91 25.91-b14:
> 
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=16m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UnlockExperimentalVMOptions \
> -XX:G1NewSizePercent=3 \
> -XX:ParallelGCThreads=12 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "

-XX:G1NewSizePercent
<Garbage First Garbage Collector Tuning (oracle.com)>
... Sets the percentage of the heap to use as the minimum for the young generation size.
    The default value is 5 percent of your Java heap. ...

So you are reducing the young generation size to get a smoother-running system.
This is strange, like shrinking the bottle below the bottleneck.

Just my 2 cents.

Regards
Bernd


Re: Long STW GCs with Solr Cloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/15/2016 3:05 PM, Cas Rusnov wrote:
> After trying many of the off the shelf configurations (including CMS
> configurations but excluding G1GC, which we're still taking the
> warnings about seriously), numerous tweaks, rumors, various instance
> sizes, and all the rest, most of which regardless of heap size and
> newspace size resulted in frequent 30+ second STW GCs, we settled on
> the following configuration which leads to occasional high GCs but
> mostly stays between 10-20 second STWs every few minutes (which is
> almost acceptable): -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions
> -XX:+UseAdaptiveSizePolicy -XX:+UseLargePages -XX:+UseParallelGC
> -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=15000 -XX:MaxNewSize=12000m
> -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=16 -Xms31000m
> -Xmx31000m

You mentioned something called "HugeTable" ... I assume you're talking
about huge pages.  If that's what you're talking about, have you also
turned off transparent huge pages?  If you haven't, you might want to
completely disable huge pages in your OS.  There's evidence that the
transparent option can affect performance.
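
On most Linux systems you can check and change this at runtime; the sysfs
path below is the common one, but it varies slightly across distributions
(e.g. redhat_transparent_hugepage on older RHEL):

cat /sys/kernel/mm/transparent_hugepage/enabled            # the bracketed value is active: [always] madvise never
echo never > /sys/kernel/mm/transparent_hugepage/enabled   # as root; add to an init script to survive reboots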

I assume you've probably looked at my GC info at the following URL:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

The parallel collector is most definitely not a good choice.  It does
not optimize for latency.  It's my understanding that it actually
prefers full GCs, because it is optimized for throughput.  Solr thrives
on good latency; throughput doesn't matter very much.

If you want to continue avoiding G1, you should definitely be using
CMS.  My recommendation right now would be to try the G1 settings on my
wiki page under the heading "Current experiments" or the CMS settings
just below that.

The out-of-the-box GC tuning included with Solr 6 is probably a better
option than the parallel collector you've got configured now.

Thanks,
Shawn