Posted to solr-user@lucene.apache.org by Marc Des Garets <ma...@192.com> on 2013/04/10 17:48:03 UTC

migration solr 3.5 to 4.1 - JVM GC problems

Hi,

I run multiple solr indexes in a single tomcat (1 webapp per index). All
the indexes are solr 3.5 and I have upgraded a few of them to solr 4.1
(about half of them).

The JVM behavior is now radically different and doesn't seem to make
sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.

The perm gen went from 410Mb to 600Mb.

The eden space usage is a lot bigger and the survivor space usage is
100% all the time.

I don't really understand what is happening. GC behavior really doesn't
seem right.

My jvm settings:
-d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
-XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m

I have tried NewRatio=1 and SurvivorRatio=3, hoping to keep the Survivor
space from being 100% full all the time, without success.
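An editorial aside, not part of the original mail: the generation sizes those two ratios would imply can be worked out directly. G1 treats NewRatio and SurvivorRatio only as hints and resizes regions dynamically, which is one likely reason the observed spaces don't match the flags. A minimal sketch of the arithmetic:

```python
# Back-of-the-envelope generation sizing for -Xmx40g -XX:NewRatio=1
# -XX:SurvivorRatio=3, as a throughput collector would interpret them.
# (G1 treats these ratios as hints and resizes regions on the fly.)

def generation_sizes(heap_gb, new_ratio, survivor_ratio):
    # NewRatio = old/young, so young = heap / (1 + NewRatio)
    young = heap_gb / (1 + new_ratio)
    # SurvivorRatio = eden/survivor, and there are two survivor spaces:
    # young = eden + 2 * survivor  =>  survivor = young / (SurvivorRatio + 2)
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    old = heap_gb - young
    return eden, survivor, old

print(generation_sizes(40, 1, 3))  # -> (12.0, 4.0, 20.0)
```

So the flags ask for a 12Gb eden, two 4Gb survivors and a 20Gb old gen, quite different from what jmap reports below.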

Here is what jmap is giving me:
Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 42949672960 (40960.0MB)
   NewSize          = 1363144 (1.2999954223632812MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5452592 (5.1999969482421875MB)
   NewRatio         = 1
   SurvivorRatio    = 3
   PermSize         = 754974720 (720.0MB)
   MaxPermSize      = 763363328 (728.0MB)
   G1HeapRegionSize = 16777216 (16.0MB)

Heap Usage:
G1 Heap:
   regions  = 2560
   capacity = 42949672960 (40960.0MB)
   used     = 23786449912 (22684.526359558105MB)
   free     = 19163223048 (18275.473640441895MB)
   55.382144432514906% used
G1 Young Generation:
Eden Space:
   regions  = 674
   capacity = 20619198464 (19664.0MB)
   used     = 11307843584 (10784.0MB)
   free     = 9311354880 (8880.0MB)
   54.841334418226204% used
Survivor Space:
   regions  = 115
   capacity = 1929379840 (1840.0MB)
   used     = 1929379840 (1840.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 732
   capacity = 20401094656 (19456.0MB)
   used     = 10549226488 (10060.526359558105MB)
   free     = 9851868168 (9395.473640441895MB)
   51.70911985792612% used
Perm Generation:
   capacity = 754974720 (720.0MB)
   used     = 514956504 (491.10079193115234MB)
   free     = 240018216 (228.89920806884766MB)
   68.20844332377116% used

The Survivor space even went up to 3.6Gb but was still 100% used.

I have disabled all caches.

Obviously I am getting very bad GC performance.

Any idea as to what could be wrong and why this could be happening?
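An editorial aside, not part of the original mail: to see why the survivor space pins at 100%, GC logging with tenuring output would help. HotSpot 6/7 era flags (the log path is a placeholder):

```text
-verbose:gc -Xloggc:/var/log/solr-gc.log
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
```

PrintTenuringDistribution prints a per-age histogram of survivor contents after each young GC, which shows whether objects are being promoted prematurely.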


Thanks,

Marc


This transmission is strictly confidential, possibly legally privileged, and intended solely for the addressee. 
Any views or opinions expressed within it are those of the author and do not necessarily represent those of 
192.com Ltd or any of its subsidiary companies. If you are not the intended recipient then you must 
not disclose, copy or take any action in reliance of this transmission. If you have received this 
transmission in error, please notify the sender as soon as possible. No employee or agent is authorised 
to conclude any binding agreement on behalf 192.com Ltd with another party by email without express written 
confirmation by an authorised employee of the company. http://www.192.com (Tel: 08000 192 192). 
192.com Ltd is incorporated in England and Wales, company number 07180348, VAT No. GB 103226273.

Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Otis Gospodnetic <ot...@gmail.com>.
Marc,

Re smaller index sizes - it's the stored field compression that didn't
exist in 3.x.
See https://issues.apache.org/jira/browse/SOLR-4375

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Thu, Apr 11, 2013 at 10:53 AM, Marc Des Garets
<ma...@192.com> wrote:
> Same config. I compared both; some defaults changed, like ramBufferSize,
> which I've set as in 3.5 (same with other things).
>
> It becomes even more strange to me. Now I have changed the jvm settings
> to this:
> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=6
> -XX:SurvivorRatio=2 -XX:G1ReservePercent=10 -XX:MaxGCPauseMillis=100
> -XX:InitiatingHeapOccupancyPercent=30 -XX:PermSize=728m -XX:MaxPermSize=728m
>
> So the Eden space is just 6Gb, survivor space is still weird (80Mb) and
> full 100% of the time, and old gen is 34Gb.
>
> I now get GCs of just 0.07 sec every 30 sec to 1 min. Very regular, like this:
> [GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]
>
> Just 30% of the total heap is used.
>
> After a while it does:
> [GC pause (young) (initial-mark) 11603M->11391M(40960M), 0.1099990 secs]
> [GC concurrent-root-region-scan-start]
> [GC concurrent-root-region-scan-end, 0.0172380]
> [GC concurrent-mark-start]
> [GC concurrent-mark-end, 0.4824210 sec]
> [GC remark, 0.0248680 secs]
> [GC cleanup 11476M->11476M(40960M), 0.0116420 secs]
>
> Which looks pretty good. If I am not mistaken, concurrent-mark isn't
> stop-the-world. remark is stop-the-world but is just 0.02 sec, and GC
> cleanup is also stop-the-world but is just 0.01 sec.
>
> By the look of it I could have a 20g heap rather than 40... Now I am
> waiting to see what happens when it clears the old gen, but that will
> take a while because it is growing slowly.
>
> Still mysterious to me but it looks like it's going to all work out.
>
> On 04/11/2013 03:06 PM, Jack Krupansky wrote:
>> Same config? Compare it with the new example config and see what settings
>> are different or changed. There may have been some defaults that changed. Read
>> the comments in the new config.
>>
>> If you had just taken or merged the new config, then I would suggest making
>> sure that the update log is not enabled (or make sure you do hard commits
>> relatively frequently rather than only soft commits).
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Marc Des Garets
>> Sent: Thursday, April 11, 2013 3:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: migration solr 3.5 to 4.1 - JVM GC problems
>>
>> Big heap because of a very large number of requests across more than 60 indexes
>> and hundreds of millions of documents (all indexes together). My problem
>> is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
>> or 2 min and 20Gb of the heap is used.
>>
>> With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
>> (its capacity changed to 6Mb at some point) and I have 2 sec GCs
>> every minute.
>>
>> There must be something that has changed in 4.1 compared to 3.5 to cause
>> this behavior. It's the same requests, same schemas (except 4 fields
>> changed from sint to tint) and same config.
>>
>> On 04/10/2013 07:38 PM, Shawn Heisey wrote:
>>> On 4/10/2013 9:48 AM, Marc Des Garets wrote:
>>>> The JVM behavior is now radically different and doesn't seem to make
>>>> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>>>>
>>>> The perm gen went from 410Mb to 600Mb.
>>>>
>>>> The eden space usage is a lot bigger and the survivor space usage is
>>>> 100% all the time.
>>>>
>>>> I don't really understand what is happening. GC behavior really doesn't
>>>> seem right.
>>>>
>>>> My jvm settings:
>>>> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
>>>> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
>>> As Otis has already asked, why do you have a 40GB heap?  The only way I
>>> can imagine that you would actually NEED a heap that big is if your
>>> index size is measured in hundreds of gigabytes.  If you really do need
>>> a heap that big, you will probably need to go with a JVM like Zing.  I
>>> don't know how much Zing costs, but they claim to be able to make any
>>> heap size perform well under any load.  It is Linux-only.
>>>
>>> I was running into extreme problems with GC pauses with my own setup,
>>> and that was only with an 8GB heap.  I was using the CMS collector and
>>> NewRatio=1.  Switching to G1 didn't help at all - it might have even
>>> made the problem worse.  I never did try the Zing JVM.
>>>
>>> After a lot of experimentation (which I will admit was not done very
>>> methodically) I found JVM options that have reduced the GC pause problem
>>> greatly.  Below is what I am using now on Solr 4.2.1 with a total
>>> per-server index size of about 45GB.  This works properly on CentOS 6
>>> with Oracle Java 7u17; UseLargePages may require special kernel tuning
>>> on other operating systems:
>>>
>>> -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
>>> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
>>>
>>> These options could probably use further tuning, but I haven't had time
>>> for the kind of testing that will be required.
>>>
>>> If you decide to pay someone to make the problem go away instead:
>>>
>>> http://www.azulsystems.com/products/zing/whatisit
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
>>

Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Marc Des Garets <ma...@192.com>.
Same config. I compared both; some defaults changed, like ramBufferSize,
which I've set as in 3.5 (same with other things).

It becomes even more strange to me. Now I have changed the jvm settings
to this:
-d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=6
-XX:SurvivorRatio=2 -XX:G1ReservePercent=10 -XX:MaxGCPauseMillis=100
-XX:InitiatingHeapOccupancyPercent=30 -XX:PermSize=728m -XX:MaxPermSize=728m

So the Eden space is just 6Gb, survivor space is still weird (80Mb) and
full 100% of the time, and old gen is 34Gb.

I now get GCs of just 0.07 sec every 30 sec to 1 min. Very regular, like this:
[GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]

Just 30% of the total heap is used.

After a while it does:
[GC pause (young) (initial-mark) 11603M->11391M(40960M), 0.1099990 secs]
[GC concurrent-root-region-scan-start]
[GC concurrent-root-region-scan-end, 0.0172380]
[GC concurrent-mark-start]
[GC concurrent-mark-end, 0.4824210 sec]
[GC remark, 0.0248680 secs]
[GC cleanup 11476M->11476M(40960M), 0.0116420 secs]

Which looks pretty good. If I am not mistaken, concurrent-mark isn't
stop-the-world. remark is stop-the-world but is just 0.02 sec, and GC
cleanup is also stop-the-world but is just 0.01 sec.

By the look of it I could have a 20g heap rather than 40... Now I am
waiting to see what happens when it clears the old gen, but that will
take a while because it is growing slowly.

Still mysterious to me but it looks like it's going to all work out.
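An editorial aside, not part of the original mail: pause lines in this format are easy to track programmatically. A minimal sketch, assuming the exact log shapes quoted above:

```python
import re

# Matches young-pause lines such as:
#   [GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]
#   [GC pause (young) (initial-mark) 11603M->11391M(40960M), 0.1099990 secs]
PAUSE_RE = re.compile(
    r"\[GC pause \(young\)(?: \(initial-mark\))? "
    r"(\d+)M->(\d+)M\((\d+)M\), ([0-9.]+) secs\]"
)

def summarize(line):
    """Return reclaimed MB, heap occupancy after the pause, and pause length."""
    m = PAUSE_RE.search(line)
    if not m:
        return None
    before, after, total, secs = m.groups()
    return {
        "reclaimed_mb": int(before) - int(after),
        "heap_used_pct": 100.0 * int(after) / int(total),
        "pause_secs": float(secs),
    }

print(summarize("[GC pause (young) 16214M->10447M(40960M), 0.0738720 secs]"))
```

For the sample line this works out to about 5.7Gb reclaimed and roughly 25% of the 40Gb heap still in use after the 0.07 sec pause.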

On 04/11/2013 03:06 PM, Jack Krupansky wrote:
> Same config? Compare it with the new example config and see what settings
> are different or changed. There may have been some defaults that changed. Read
> the comments in the new config.
>
> If you had just taken or merged the new config, then I would suggest making
> sure that the update log is not enabled (or make sure you do hard commits
> relatively frequently rather than only soft commits).
>
> -- Jack Krupansky
>
> -----Original Message----- 
> From: Marc Des Garets
> Sent: Thursday, April 11, 2013 3:07 AM
> To: solr-user@lucene.apache.org
> Subject: Re: migration solr 3.5 to 4.1 - JVM GC problems
>
> Big heap because of a very large number of requests across more than 60 indexes
> and hundreds of millions of documents (all indexes together). My problem
> is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
> or 2 min and 20Gb of the heap is used.
>
> With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
> (its capacity changed to 6Mb at some point) and I have 2 sec GCs
> every minute.
>
> There must be something that has changed in 4.1 compared to 3.5 to cause
> this behavior. It's the same requests, same schemas (except 4 fields
> changed from sint to tint) and same config.
>
> On 04/10/2013 07:38 PM, Shawn Heisey wrote:
>> On 4/10/2013 9:48 AM, Marc Des Garets wrote:
>>> The JVM behavior is now radically different and doesn't seem to make
>>> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>>>
>>> The perm gen went from 410Mb to 600Mb.
>>>
>>> The eden space usage is a lot bigger and the survivor space usage is
>>> 100% all the time.
>>>
>>> I don't really understand what is happening. GC behavior really doesn't
>>> seem right.
>>>
>>> My jvm settings:
>>> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
>>> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
>> As Otis has already asked, why do you have a 40GB heap?  The only way I
>> can imagine that you would actually NEED a heap that big is if your
>> index size is measured in hundreds of gigabytes.  If you really do need
>> a heap that big, you will probably need to go with a JVM like Zing.  I
>> don't know how much Zing costs, but they claim to be able to make any
>> heap size perform well under any load.  It is Linux-only.
>>
>> I was running into extreme problems with GC pauses with my own setup,
>> and that was only with an 8GB heap.  I was using the CMS collector and
>> NewRatio=1.  Switching to G1 didn't help at all - it might have even
>> made the problem worse.  I never did try the Zing JVM.
>>
>> After a lot of experimentation (which I will admit was not done very
>> methodically) I found JVM options that have reduced the GC pause problem
>> greatly.  Below is what I am using now on Solr 4.2.1 with a total
>> per-server index size of about 45GB.  This works properly on CentOS 6
>> with Oracle Java 7u17; UseLargePages may require special kernel tuning
>> on other operating systems:
>>
>> -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
>> -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
>> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
>>
>> These options could probably use further tuning, but I haven't had time
>> for the kind of testing that will be required.
>>
>> If you decide to pay someone to make the problem go away instead:
>>
>> http://www.azulsystems.com/products/zing/whatisit
>>
>> Thanks,
>> Shawn
>>
>>
>>
>



Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Jack Krupansky <ja...@basetechnology.com>.
Same config? Compare it with the new example config and see what settings
are different or changed. There may have been some defaults that changed. Read
the comments in the new config.

If you had just taken or merged the new config, then I would suggest making
sure that the update log is not enabled (or make sure you do hard commits
relatively frequently rather than only soft commits).
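An editorial aside: in the Solr 4.x example solrconfig.xml the relevant section looks roughly like this (a sketch based on the stock example config; the 15-second maxTime is illustrative, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Comment this out (or leave it absent) to disable the transaction log. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- If the updateLog stays on, hard-commit often so the log is truncated. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With the update log enabled and no hard commits, the log grows unboundedly on the heap and in tlog files, which can look exactly like a mysterious memory increase after a 3.x upgrade.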

-- Jack Krupansky

-----Original Message----- 
From: Marc Des Garets
Sent: Thursday, April 11, 2013 3:07 AM
To: solr-user@lucene.apache.org
Subject: Re: migration solr 3.5 to 4.1 - JVM GC problems

Big heap because of a very large number of requests across more than 60 indexes
and hundreds of millions of documents (all indexes together). My problem
is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
or 2 min and 20Gb of the heap is used.

With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
(its capacity changed to 6Mb at some point) and I have 2 sec GCs
every minute.

There must be something that has changed in 4.1 compared to 3.5 to cause
this behavior. It's the same requests, same schemas (except 4 fields
changed from sint to tint) and same config.

On 04/10/2013 07:38 PM, Shawn Heisey wrote:
> On 4/10/2013 9:48 AM, Marc Des Garets wrote:
>> The JVM behavior is now radically different and doesn't seem to make
>> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>>
>> The perm gen went from 410Mb to 600Mb.
>>
>> The eden space usage is a lot bigger and the survivor space usage is
>> 100% all the time.
>>
>> I don't really understand what is happening. GC behavior really doesn't
>> seem right.
>>
>> My jvm settings:
>> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
>> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
> As Otis has already asked, why do you have a 40GB heap?  The only way I
> can imagine that you would actually NEED a heap that big is if your
> index size is measured in hundreds of gigabytes.  If you really do need
> a heap that big, you will probably need to go with a JVM like Zing.  I
> don't know how much Zing costs, but they claim to be able to make any
> heap size perform well under any load.  It is Linux-only.
>
> I was running into extreme problems with GC pauses with my own setup,
> and that was only with an 8GB heap.  I was using the CMS collector and
> NewRatio=1.  Switching to G1 didn't help at all - it might have even
> made the problem worse.  I never did try the Zing JVM.
>
> After a lot of experimentation (which I will admit was not done very
> methodically) I found JVM options that have reduced the GC pause problem
> greatly.  Below is what I am using now on Solr 4.2.1 with a total
> per-server index size of about 45GB.  This works properly on CentOS 6
> with Oracle Java 7u17; UseLargePages may require special kernel tuning
> on other operating systems:
>
> -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
> -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
>
> These options could probably use further tuning, but I haven't had time
> for the kind of testing that will be required.
>
> If you decide to pay someone to make the problem go away instead:
>
> http://www.azulsystems.com/products/zing/whatisit
>
> Thanks,
> Shawn
>
>
>




Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Marc Des Garets <ma...@192.com>.
I have 45 solr 4.1 indexes. Sizes vary between 600Mb and 20Gb.

- 1 is 20Gb (80 million docs)
- 1 is 5.1Gb (24 million docs)
- 1 is 5.6Gb (26 million docs)
- 1 is 6.5Gb (28 million docs)
- 11 others are about 2.2Gb (6-7 million docs).
- 20 others are about 600Mb (2.5 million docs)
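An editorial tally, not in the original mail: the list above covers 35 of the 45 indexes, and taking the "6-7 million docs" entries at their midpoint it sums to roughly 73Gb and 280 million documents:

```python
# Index inventory as listed above: (count, size_gb, docs_in_millions)
indexes = [
    (1, 20.0, 80),
    (1, 5.1, 24),
    (1, 5.6, 26),
    (1, 6.5, 28),
    (11, 2.2, 6.5),  # "6-7 million docs" taken as 6.5
    (20, 0.6, 2.5),
]

total_gb = sum(n * gb for n, gb, _ in indexes)
total_docs_m = sum(n * docs for n, _, docs in indexes)
print(round(total_gb, 1), round(total_docs_m, 1))  # -> 73.4 279.5
```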

That reminds me of something. The 4.1 indexes are about half the size of
the 3.5 indexes. For example, the one which is 20Gb with solr 4.1 was 43Gb
with solr 3.5. Maybe there is something there?

There are roughly 200 queries per second.


On 04/11/2013 11:07 AM, Furkan KAMACI wrote:
> Hi Marc;
>
> Could I ask what your index size is and what your performance is in queries
> per second?
>
> 2013/4/11 Marc Des Garets <ma...@192.com>
>
>> Big heap because of a very large number of requests across more than 60 indexes
>> and hundreds of millions of documents (all indexes together). My problem
>> is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
>> or 2 min and 20Gb of the heap is used.
>>
>> With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
>> (its capacity changed to 6Mb at some point) and I have 2 sec GCs
>> every minute.
>>
>> There must be something that has changed in 4.1 compared to 3.5 to cause
>> this behavior. It's the same requests, same schemas (except 4 fields
>> changed from sint to tint) and same config.
>>
>> On 04/10/2013 07:38 PM, Shawn Heisey wrote:
>>> On 4/10/2013 9:48 AM, Marc Des Garets wrote:
>>>> The JVM behavior is now radically different and doesn't seem to make
>>>> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>>>>
>>>> The perm gen went from 410Mb to 600Mb.
>>>>
>>>> The eden space usage is a lot bigger and the survivor space usage is
>>>> 100% all the time.
>>>>
>>>> I don't really understand what is happening. GC behavior really doesn't
>>>> seem right.
>>>>
>>>> My jvm settings:
>>>> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
>>>> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
>>> As Otis has already asked, why do you have a 40GB heap?  The only way I
>>> can imagine that you would actually NEED a heap that big is if your
>>> index size is measured in hundreds of gigabytes.  If you really do need
>>> a heap that big, you will probably need to go with a JVM like Zing.  I
>>> don't know how much Zing costs, but they claim to be able to make any
>>> heap size perform well under any load.  It is Linux-only.
>>>
>>> I was running into extreme problems with GC pauses with my own setup,
>>> and that was only with an 8GB heap.  I was using the CMS collector and
>>> NewRatio=1.  Switching to G1 didn't help at all - it might have even
>>> made the problem worse.  I never did try the Zing JVM.
>>>
>>> After a lot of experimentation (which I will admit was not done very
>>> methodically) I found JVM options that have reduced the GC pause problem
>>> greatly.  Below is what I am using now on Solr 4.2.1 with a total
>>> per-server index size of about 45GB.  This works properly on CentOS 6
>>> with Oracle Java 7u17; UseLargePages may require special kernel tuning
>>> on other operating systems:
>>>
>>> -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
>>> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
>>>
>>> These options could probably use further tuning, but I haven't had time
>>> for the kind of testing that will be required.
>>>
>>> If you decide to pay someone to make the problem go away instead:
>>>
>>> http://www.azulsystems.com/products/zing/whatisit
>>>
>>> Thanks,
>>> Shawn
>>>
>>>
>>>
>>



Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Marc;

Could I ask what your index size is and what your performance is in queries
per second?

2013/4/11 Marc Des Garets <ma...@192.com>

> Big heap because of a very large number of requests across more than 60 indexes
> and hundreds of millions of documents (all indexes together). My problem
> is with solr 4.1. All is perfect with 3.5. I have 0.05 sec GCs every 1
> or 2 min and 20Gb of the heap is used.
>
> With the 4.1 indexes it uses 30Gb-33Gb, the survivor space is all weird
> (its capacity changed to 6Mb at some point) and I have 2 sec GCs
> every minute.
>
> There must be something that has changed in 4.1 compared to 3.5 to cause
> this behavior. It's the same requests, same schemas (except 4 fields
> changed from sint to tint) and same config.
>
> On 04/10/2013 07:38 PM, Shawn Heisey wrote:
> > On 4/10/2013 9:48 AM, Marc Des Garets wrote:
> >> The JVM behavior is now radically different and doesn't seem to make
> >> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
> >>
> >> The perm gen went from 410Mb to 600Mb.
> >>
> >> The eden space usage is a lot bigger and the survivor space usage is
> >> 100% all the time.
> >>
> >> I don't really understand what is happening. GC behavior really doesn't
> >> seem right.
> >>
> >> My jvm settings:
> >> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
> >> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
> > As Otis has already asked, why do you have a 40GB heap?  The only way I
> > can imagine that you would actually NEED a heap that big is if your
> > index size is measured in hundreds of gigabytes.  If you really do need
> > a heap that big, you will probably need to go with a JVM like Zing.  I
> > don't know how much Zing costs, but they claim to be able to make any
> > heap size perform well under any load.  It is Linux-only.
> >
> > I was running into extreme problems with GC pauses with my own setup,
> > and that was only with an 8GB heap.  I was using the CMS collector and
> > NewRatio=1.  Switching to G1 didn't help at all - it might have even
> > made the problem worse.  I never did try the Zing JVM.
> >
> > After a lot of experimentation (which I will admit was not done very
> > methodically) I found JVM options that have reduced the GC pause problem
> > greatly.  Below is what I am using now on Solr 4.2.1 with a total
> > per-server index size of about 45GB.  This works properly on CentOS 6
> > with Oracle Java 7u17; UseLargePages may require special kernel tuning
> > on other operating systems:
> >
> > -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
> > -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
> > -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
> >
> > These options could probably use further tuning, but I haven't had time
> > for the kind of testing that will be required.
> >
> > If you decide to pay someone to make the problem go away instead:
> >
> > http://www.azulsystems.com/products/zing/whatisit
> >
> > Thanks,
> > Shawn
> >
> >
> >
>
>

Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Marc Des Garets <ma...@192.com>.
Big heap because we serve a very large number of requests across more than
60 indexes and hundreds of millions of documents (all indexes together). My
problem is with solr 4.1. All is perfect with 3.5: I get 0.05 sec GCs every
1 or 2 minutes and 20Gb of the heap is used.

With the 4.1 indexes it uses 30Gb-33Gb, the survivor space behaves
strangely (its capacity dropped to 6Mb at some point) and I have 2 sec GCs
every minute.

Something must have changed between 3.5 and 4.1 to cause this behavior.
It's the same requests, same schemas (except 4 fields changed from sint to
tint) and same config.

On 04/10/2013 07:38 PM, Shawn Heisey wrote:
> On 4/10/2013 9:48 AM, Marc Des Garets wrote:
>> The JVM behavior is now radically different and doesn't seem to make
>> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>>
>> The perm gen went from 410Mb to 600Mb.
>>
>> The eden space usage is a lot bigger and the survivor space usage is
>> 100% all the time.
>>
>> I don't really understand what is happening. GC behavior really doesn't
>> seem right.
>>
>> My jvm settings:
>> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
>> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
> As Otis has already asked, why do you have a 40GB heap?  The only way I 
> can imagine that you would actually NEED a heap that big is if your 
> index size is measured in hundreds of gigabytes.  If you really do need 
> a heap that big, you will probably need to go with a JVM like Zing.  I 
> don't know how much Zing costs, but they claim to be able to make any 
> heap size perform well under any load.  It is Linux-only.
>
> I was running into extreme problems with GC pauses with my own setup, 
> and that was only with an 8GB heap.  I was using the CMS collector and 
> NewRatio=1.  Switching to G1 didn't help at all - it might have even 
> made the problem worse.  I never did try the Zing JVM.
>
> After a lot of experimentation (which I will admit was not done very 
> methodically) I found JVM options that have reduced the GC pause problem 
> greatly.  Below is what I am using now on Solr 4.2.1 with a total 
> per-server index size of about 45GB.  This works properly on CentOS 6 
> with Oracle Java 7u17; UseLargePages may require special kernel tuning 
> on other operating systems:
>
> -Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 
> -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled 
> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
>
> These options could probably use further tuning, but I haven't had time 
> for the kind of testing that will be required.
>
> If you decide to pay someone to make the problem go away instead:
>
> http://www.azulsystems.com/products/zing/whatisit
>
> Thanks,
> Shawn
>
>
>



Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/10/2013 9:48 AM, Marc Des Garets wrote:
> The JVM behavior is now radically different and doesn't seem to make
> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>
> The perm gen went from 410Mb to 600Mb.
>
> The eden space usage is a lot bigger and the survivor space usage is
> 100% all the time.
>
> I don't really understand what is happening. GC behavior really doesn't
> seem right.
>
> My jvm settings:
> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m

As Otis has already asked, why do you have a 40GB heap?  The only way I 
can imagine that you would actually NEED a heap that big is if your 
index size is measured in hundreds of gigabytes.  If you really do need 
a heap that big, you will probably need to go with a JVM like Zing.  I 
don't know how much Zing costs, but they claim to be able to make any 
heap size perform well under any load.  It is Linux-only.

I was running into extreme problems with GC pauses with my own setup, 
and that was only with an 8GB heap.  I was using the CMS collector and 
NewRatio=1.  Switching to G1 didn't help at all - it might have even 
made the problem worse.  I never did try the Zing JVM.

After a lot of experimentation (which I will admit was not done very 
methodically) I found JVM options that have reduced the GC pause problem 
greatly.  Below is what I am using now on Solr 4.2.1 with a total 
per-server index size of about 45GB.  This works properly on CentOS 6 
with Oracle Java 7u17; UseLargePages may require special kernel tuning 
on other operating systems:

-Xmx6144M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 
-XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled 
-XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts

These options could probably use further tuning, but I haven't had time 
for the kind of testing that will be required.
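Since the setup in question runs everything in a single Tomcat, flags like these would typically go in bin/setenv.sh, which catalina.sh sources on startup if it exists. A sketch assuming a standard Tomcat 6/7 layout; the heap size is the one quoted above, and adding a matching -Xms is my own addition:

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh if present.
# CMS settings from above, plus an explicit -Xms to match -Xmx so the
# heap is fully committed at startup.
CATALINA_OPTS="$CATALINA_OPTS \
 -Xms6144M -Xmx6144M \
 -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 \
 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 \
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled \
 -XX:+UseLargePages -XX:+AggressiveOpts"
export CATALINA_OPTS
```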

If you decide to pay someone to make the problem go away instead:

http://www.azulsystems.com/products/zing/whatisit

Thanks,
Shawn


Re: migration solr 3.5 to 4.1 - JVM GC problems

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi Marc,

Why such a big heap?  Do you really need it?  You disabled all caches,
so the JVM really shouldn't need much memory.  Have you tried with
-Xmx20g or even -Xmx8g?  Aha, survivor is getting to 100% so you kept
increasing -Xmx?

Have you tried just not using any of these:
-XX:+UseG1GC -XX:NewRatio=1 -XX:SurvivorRatio=3 -XX:PermSize=728m
-XX:MaxPermSize=728m ?

My hunch is that there is a leak somewhere, because without caches you
shouldn't need a 40GB heap.
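Before settling on a smaller -Xmx, the generation occupancies can be watched live with jstat, which ships with the JDK. A diagnostic sketch; <pid> is the Tomcat process id:

```shell
# Sample GC utilization every 5 seconds, 20 samples.
# S0/S1 = survivor spaces, E = eden, O = old gen, P = permgen (% used);
# YGC/FGC = young/full GC counts, GCT = total GC time in seconds.
jstat -gcutil <pid> 5s 20
```

If the O column climbs steadily and never drops after full GCs, that would support the leak theory; if it sawtooths back down, the heap is simply oversized for the workload.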

Otis
--
SOLR Performance Monitoring - http://sematext.com/spm/index.html
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Apr 10, 2013 at 11:48 AM, Marc Des Garets
<ma...@192.com> wrote:
> Hi,
>
> I run multiple solr indexes in 1 single tomcat (1 webapp per index). All
> the indexes are solr 3.5 and I have upgraded few of them to solr 4.1
> (about half of them).
>
> The JVM behavior is now radically different and doesn't seem to make
> sense. I was using ConcMarkSweepGC. I am now trying the G1 collector.
>
> The perm gen went from 410Mb to 600Mb.
>
> The eden space usage is a lot bigger and the survivor space usage is
> 100% all the time.
>
> I don't really understand what is happening. GC behavior really doesn't
> seem right.
>
> My jvm settings:
> -d64 -server -Xms40g -Xmx40g -XX:+UseG1GC -XX:NewRatio=1
> -XX:SurvivorRatio=3 -XX:PermSize=728m -XX:MaxPermSize=728m
>
> I have tried NewRatio=1 and SurvivorRatio=3 hoping to get the Survivor
> space to not be 100% full all the time without success.
>
> Here is what jmap is giving me:
> Heap Configuration:
>    MinHeapFreeRatio = 40
>    MaxHeapFreeRatio = 70
>    MaxHeapSize      = 42949672960 (40960.0MB)
>    NewSize          = 1363144 (1.2999954223632812MB)
>    MaxNewSize       = 17592186044415 MB
>    OldSize          = 5452592 (5.1999969482421875MB)
>    NewRatio         = 1
>    SurvivorRatio    = 3
>    PermSize         = 754974720 (720.0MB)
>    MaxPermSize      = 763363328 (728.0MB)
>    G1HeapRegionSize = 16777216 (16.0MB)
>
> Heap Usage:
> G1 Heap:
>    regions  = 2560
>    capacity = 42949672960 (40960.0MB)
>    used     = 23786449912 (22684.526359558105MB)
>    free     = 19163223048 (18275.473640441895MB)
>    55.382144432514906% used
> G1 Young Generation:
> Eden Space:
>    regions  = 674
>    capacity = 20619198464 (19664.0MB)
>    used     = 11307843584 (10784.0MB)
>    free     = 9311354880 (8880.0MB)
>    54.841334418226204% used
> Survivor Space:
>    regions  = 115
>    capacity = 1929379840 (1840.0MB)
>    used     = 1929379840 (1840.0MB)
>    free     = 0 (0.0MB)
>    100.0% used
> G1 Old Generation:
>    regions  = 732
>    capacity = 20401094656 (19456.0MB)
>    used     = 10549226488 (10060.526359558105MB)
>    free     = 9851868168 (9395.473640441895MB)
>    51.70911985792612% used
> Perm Generation:
>    capacity = 754974720 (720.0MB)
>    used     = 514956504 (491.10079193115234MB)
>    free     = 240018216 (228.89920806884766MB)
>    68.20844332377116% used
>
> The Survivor space even went up to 3.6Gb but was still 100% used.
>
> I have disabled all caches.
>
> Obviously I am getting very bad GC performance.
>
> Any idea as to what could be wrong and why this could be happening?
>
>
> Thanks,
>
> Marc
>
>