Posted to solr-user@lucene.apache.org by Annette Newton <an...@servicetick.com> on 2012/11/28 17:06:00 UTC

Permanently Full Old Generation...

Hi,

 

I'm hoping someone can help me with an issue we are encountering with
SolrCloud.

 

We are seeing strange GC behaviour after running SolrCloud under quite
heavy insert load for a period of time.  The old generation becomes full and
no amount of garbage collection will free up the memory.  I have attached a
memory profile; as you can see, it gets progressively worse as the day goes
on, to the point where we are constantly doing full garbage collections.
The only way I have found to resolve this is to reload the core, after which
subsequent garbage collections reclaim the used space - that's what happened
at 3pm on the memory profile.  All the nodes eventually display the same
behaviour.

 

We have multiple threads adding batches of up to 100 documents at a
time.  I have also attached our schema and config.

 

We are running 4 shards, each with a single replica, with a 3-node ZooKeeper
setup.  The 8 Solr instances are AWS High-Memory Double Extra Large boxes
with 34.2 GB of memory and 4 virtual cores.

 

Thanks in advance

 

Annette Newton


Re: Permanently Full Old Generation...

Posted by Walter Underwood <wu...@wunderwood.org>.
We are running 1.6 update 37. That was released on the same day as your version, so it should have the same bug fixes. We use these options in production; it is very stable:

export CATALINA_OPTS="$CATALINA_OPTS -d64"                                                                    # 64-bit JVM
export CATALINA_OPTS="$CATALINA_OPTS -Xms4096m -Xmx6144m"                                                     # 4GB initial, 6GB max heap
export CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=256m"                                                    # cap the permanent generation
export CATALINA_OPTS="$CATALINA_OPTS -XX:NewSize=2048m"                                                       # 2GB young generation
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:+DoEscapeAnalysis -XX:+UseCompressedOops"    # CMS old-gen collector
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled"                          # parallel young-gen collection and CMS remark
export CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"                  # GC logging
export CATALINA_OPTS="$CATALINA_OPTS -XX:-TraceClassUnloading"                                                # no class-unloading trace output

We do indexing and searching on separate machines. At Netflix, I found that the indexing load had a big effect on search speed, so I've separated the functions since then.

wunder
Search Guy, Chegg.com

On Nov 30, 2012, at 8:43 AM, Andy Kershaw wrote:

> We are currently operating at reduced load which is why the ParNew
> collections are not a problem. I don't know how long they were taking
> before though. Thanks for the warning about index formats.
> 
> Our JVM is:
> 
> Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)
> 
> We are currently running more tests but it takes a while before the issues
> become apparent.
> 
> Andy Kershaw
> 
> On 29 November 2012 18:31, Walter Underwood <wu...@wunderwood.org> wrote:
> 
>> Several suggestions.
>> 
>> 1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are
>> already in an overload state and the variance of the response times goes
>> way up. You'll have very noisy benchmark data.
>> 
>> 2. Do not force manual GCs during a benchmark.
>> 
>> 3. Do not force merge (optimise). That is a very expensive operation and
>> will cause slowdowns.
>> 
>> 4. Make eden big enough to hold all data allocated during a request for
>> all simultaneous requests. All that stuff is garbage after the end of the
>> request. If eden fills up, it will be allocated from the tenured space and
>> cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like
>> setting the size better than setting ratios.
>> 
>> 5. What version of the JVM are you using?
>> 
>> wunder
>> 
>> On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:
>> 
>>> On 11/29/2012 10:44 AM, Andy Kershaw wrote:
>>>> Annette is away until Monday so I am looking into this in the meantime.
>>>> Looking at the times of the Full GC entries at the end of the log, I
>> think
>>>> they are collections we started manually through jconsole to try and
>> reduce
>>>> the size of the old generation. This only seemed to have an effect when
>> we
>>>> reloaded the core first though.
>>>> 
>>>> It is my understanding that the eden size is deliberately smaller to
>> keep
>>>> the ParNew collection time down. If it takes too long then the node is
>>>> flagged as down.
>>> 
>>> Your ParNew collections are taking less than 1 second (some WAY less
>> than one second) to complete and the CMS collections are taking far longer
>> -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable
>> with Java, so if there has to be a collection, you definitely want it to be
>> on the young generation (ParNew).
>>> 
>>> Controversial idea coming up, nothing concrete to back it up.  This
>> means that you might want to wait for a committer to weigh in:  I have seen
>> a lot of recent development work relating to SolrCloud and shard stability.
>> You may want to check out branch_4x from SVN and build that, rather than
>> use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on
>> what I saw for 3.x releases, it should be released relatively soon.
>>> 
>>> The above advice is a bad idea if you have to be able to upgrade from
>> one 4.1 snapshot to a later one without reindexing. There is a possibility
>> that the 4.1 index format will change before release and require a reindex,
>> it has happened at least twice already.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>> 
>> --
>> Walter Underwood
>> wunder@wunderwood.org
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Andy Kershaw
> 
> Technical Developer
> 
> ServiceTick Ltd
> 
> 
> 
> T: +44(0)1603 618326
> 
> M: +44 (0)7876 556833
> 
> 
> 
> Seebohm House, 2-4 Queen Street, Norwich, England, NR2 4SQ
> 
> www.ServiceTick.com <http://www.servicetick.com/>
> 
> www.SessionCam.com <http://www.sessioncam.com/>
> 
> 
> 
> *This message is confidential and is intended to be read solely by the
> addressee. If you have received this message by mistake, please delete it
> and do not copy it to anyone else. Internet communications are not secure
> and may be intercepted or changed after they are sent. ServiceTick Ltd does
> not accept liability for any such changes.*

--
Walter Underwood
wunder@wunderwood.org




Re: Permanently Full Old Generation...

Posted by Andy Kershaw <an...@servicetick.com>.
We are currently operating at reduced load which is why the ParNew
collections are not a problem. I don't know how long they were taking
before though. Thanks for the warning about index formats.

Our JVM is:

Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) 64-Bit Server VM (build 23.5-b02, mixed mode)

We are currently running more tests but it takes a while before the issues
become apparent.

Andy Kershaw

On 29 November 2012 18:31, Walter Underwood <wu...@wunderwood.org> wrote:

> Several suggestions.
>
> 1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are
> already in an overload state and the variance of the response times goes
> way up. You'll have very noisy benchmark data.
>
> 2. Do not force manual GCs during a benchmark.
>
> 3. Do not force merge (optimise). That is a very expensive operation and
> will cause slowdowns.
>
> 4. Make eden big enough to hold all data allocated during a request for
> all simultaneous requests. All that stuff is garbage after the end of the
> request. If eden fills up, it will be allocated from the tenured space and
> cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like
> setting the size better than setting ratios.
>
> 5. What version of the JVM are you using?
>
> wunder
>
> On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:
>
> > On 11/29/2012 10:44 AM, Andy Kershaw wrote:
> >> Annette is away until Monday so I am looking into this in the meantime.
> >> Looking at the times of the Full GC entries at the end of the log, I
> think
> >> they are collections we started manually through jconsole to try and
> reduce
> >> the size of the old generation. This only seemed to have an effect when
> we
> >> reloaded the core first though.
> >>
> >> It is my understanding that the eden size is deliberately smaller to
> keep
> >> the ParNew collection time down. If it takes too long then the node is
> >> flagged as down.
> >
> > Your ParNew collections are taking less than 1 second (some WAY less
> than one second) to complete and the CMS collections are taking far longer
> -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable
> with Java, so if there has to be a collection, you definitely want it to be
> on the young generation (ParNew).
> >
> > Controversial idea coming up, nothing concrete to back it up.  This
> means that you might want to wait for a committer to weigh in:  I have seen
> a lot of recent development work relating to SolrCloud and shard stability.
>  You may want to check out branch_4x from SVN and build that, rather than
> use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on
> what I saw for 3.x releases, it should be released relatively soon.
> >
> > The above advice is a bad idea if you have to be able to upgrade from
> one 4.1 snapshot to a later one without reindexing. There is a possibility
> that the 4.1 index format will change before release and require a reindex,
> it has happened at least twice already.
> >
> > Thanks,
> > Shawn
> >
>
> --
> Walter Underwood
> wunder@wunderwood.org
>
>
>
>


-- 
Andy Kershaw

Technical Developer

ServiceTick Ltd



T: +44(0)1603 618326

M: +44 (0)7876 556833



Seebohm House, 2-4 Queen Street, Norwich, England, NR2 4SQ

www.ServiceTick.com <http://www.servicetick.com/>

www.SessionCam.com <http://www.sessioncam.com/>



*This message is confidential and is intended to be read solely by the
addressee. If you have received this message by mistake, please delete it
and do not copy it to anyone else. Internet communications are not secure
and may be intercepted or changed after they are sent. ServiceTick Ltd does
not accept liability for any such changes.*

Re: Permanently Full Old Generation...

Posted by Walter Underwood <wu...@wunderwood.org>.
Several suggestions.

1. Adjust the traffic load for about 75% CPU. When you hit 100%, you are already in an overload state and the variance of the response times goes way up. You'll have very noisy benchmark data.

2. Do not force manual GCs during a benchmark.

3. Do not force merge (optimise). That is a very expensive operation and will cause slowdowns.

4. Make eden big enough to hold all data allocated during a request for all simultaneous requests. All that stuff is garbage after the end of the request. If eden fills up, it will be allocated from the tenured space and cause that to grow unnecessarily. We use an 8GB heap and 2GB eden. I like setting the size better than setting ratios (see the sketch after this list).

5. What version of the JVM are you using?
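
To make point 4 concrete, here is a minimal sketch of explicit young-generation sizing in the CATALINA_OPTS style used elsewhere in this thread. The sizes are illustrative, not a recommendation; -XX:NewSize/-XX:MaxNewSize govern the whole young generation (eden plus the survivor spaces):

export CATALINA_OPTS="$CATALINA_OPTS -Xms8192m -Xmx8192m"                      # fixed 8GB heap
export CATALINA_OPTS="$CATALINA_OPTS -XX:NewSize=2048m -XX:MaxNewSize=2048m"   # explicit 2GB young generation

The shorthand -Xmn2048m sets both NewSize and MaxNewSize, whereas -XX:NewRatio only fixes the old/young proportion and lets the actual sizes follow the heap.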

wunder

On Nov 29, 2012, at 10:15 AM, Shawn Heisey wrote:

> On 11/29/2012 10:44 AM, Andy Kershaw wrote:
>> Annette is away until Monday so I am looking into this in the meantime.
>> Looking at the times of the Full GC entries at the end of the log, I think
>> they are collections we started manually through jconsole to try and reduce
>> the size of the old generation. This only seemed to have an effect when we
>> reloaded the core first though.
>> 
>> It is my understanding that the eden size is deliberately smaller to keep
>> the ParNew collection time down. If it takes too long then the node is
>> flagged as down.
> 
> Your ParNew collections are taking less than 1 second (some WAY less than one second) to complete and the CMS collections are taking far longer -- 6 seconds seems to be a common number in the GC log.  GC is unavoidable with Java, so if there has to be a collection, you definitely want it to be on the young generation (ParNew).
> 
> Controversial idea coming up, nothing concrete to back it up.  This means that you might want to wait for a committer to weigh in:  I have seen a lot of recent development work relating to SolrCloud and shard stability.  You may want to check out branch_4x from SVN and build that, rather than use 4.0.  I don't have any idea what the timeline for 4.1 is, but based on what I saw for 3.x releases, it should be released relatively soon.
> 
> The above advice is a bad idea if you have to be able to upgrade from one 4.1 snapshot to a later one without reindexing. There is a possibility that the 4.1 index format will change before release and require a reindex, it has happened at least twice already.
> 
> Thanks,
> Shawn
> 

--
Walter Underwood
wunder@wunderwood.org




Re: Permanently Full Old Generation...

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/29/2012 10:44 AM, Andy Kershaw wrote:
> Annette is away until Monday so I am looking into this in the meantime.
> Looking at the times of the Full GC entries at the end of the log, I think
> they are collections we started manually through jconsole to try and reduce
> the size of the old generation. This only seemed to have an effect when we
> reloaded the core first though.
>
> It is my understanding that the eden size is deliberately smaller to keep
> the ParNew collection time down. If it takes too long then the node is
> flagged as down.

Your ParNew collections are taking less than 1 second (some WAY less 
than one second) to complete and the CMS collections are taking far 
longer -- 6 seconds seems to be a common number in the GC log.  GC is 
unavoidable with Java, so if there has to be a collection, you 
definitely want it to be on the young generation (ParNew).

Controversial idea coming up, nothing concrete to back it up.  This 
means that you might want to wait for a committer to weigh in:  I have 
seen a lot of recent development work relating to SolrCloud and shard 
stability.  You may want to check out branch_4x from SVN and build that, 
rather than use 4.0.  I don't have any idea what the timeline for 4.1 
is, but based on what I saw for 3.x releases, it should be released 
relatively soon.

The above advice is a bad idea if you have to be able to upgrade from 
one 4.1 snapshot to a later one without reindexing. There is a 
possibility that the 4.1 index format will change before release and 
require a reindex, it has happened at least twice already.
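
For anyone who does go that route, a rough sketch of the checkout and build (the SVN URL and ant targets below reflect the Lucene/Solr source layout of the time and are worth double-checking against the branch's own README):

svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x branch_4x
cd branch_4x/solr
ant ivy-bootstrap   # only needed once, if Ivy is not already installed
ant dist            # builds the Solr artifacts under dist/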

Thanks,
Shawn


Re: Permanently Full Old Generation...

Posted by Andy Kershaw <an...@servicetick.com>.
Thanks for responding Shawn.

Annette is away until Monday so I am looking into this in the meantime.
Looking at the times of the Full GC entries at the end of the log, I think
they are collections we started manually through jconsole to try and reduce
the size of the old generation. This only seemed to have an effect when we
reloaded the core first though.

It is my understanding that the eden size is deliberately smaller to keep
the ParNew collection time down. If it takes too long then the node is
flagged as down.

On 29 November 2012 15:28, Shawn Heisey <so...@elyograg.org> wrote:

> > My jvm settings:
> >
> >
> > -Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
> > -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> > -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
> > -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
> > -XX:CMSIncrementalDutyCycle=75
> >
> > I turned off IncrementalPacing, and enabled
> > CMSInitiatingOccupancyFraction,
> > after issues with nodes being reported as down due to large Garbage
> > collection pauses.  The problem with the memory profile was visible
> before
> > the drop down to 1.2GB (this was when I reloaded the core), my concern
> was
> > that the collection of the old generation didn't seem to free any of the
> > heap, and we went from occasionally collecting to always collecting the
> > old
> > gen.
> >
> > Please see the attached gc log.
>
> I am on the train for my morning commute, so I have some time, but no
> access to the log or graph.
>
> Confession time: GC logs make me go glassy eyed and babble incoherently,
> but I did take a look at it. I saw 18 CMS collections and three entries
> near the end that said Full GC. It looks like these collections take 6 to
> 8 seconds. That is pretty nasty, but probably unavoidable, so the goal is
> to make them happen extremely infrequently - do young generation
> collections instead.
>
> The thing that seems to make GC less of a problem for solr is maximizing
> the young generation memory pool. Based on the available info, I would
> start with making NewRatio 1 instead of 2.  This will increase the eden
> size and decrease the old gen size. You may even want to use an explicit
> -Xmn of 6144.  If that doesn't help, you might actually need 6GB or so of
> old gen heap, so try increasing the overall heap size to 9 or 10 GB and
> going back to a NewRatio of 2.
>
> Thanks,
> Shawn
>

RE: Permanently Full Old Generation...

Posted by Shawn Heisey <so...@elyograg.org>.
> My jvm settings:
>
>
> -Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
> -XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
> -XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
> -XX:CMSIncrementalDutyCycle=75
>
> I turned off IncrementalPacing, and enabled
> CMSInitiatingOccupancyFraction,
> after issues with nodes being reported as down due to large Garbage
> collection pauses.  The problem with the memory profile was visible before
> the drop down to 1.2GB (this was when I reloaded the core), my concern was
> that the collection of the old generation didn't seem to free any of the
> heap, and we went from occasionally collecting to always collecting the
> old
> gen.
>
> Please see the attached gc log.

I am on the train for my morning commute, so I have some time, but no
access to the log or graph.

Confession time: GC logs make me go glassy eyed and babble incoherently,
but I did take a look at it. I saw 18 CMS collections and three entries
near the end that said Full GC. It looks like these collections take 6 to
8 seconds. That is pretty nasty, but probably unavoidable, so the goal is
to make them happen extremely infrequently - do young generation
collections instead.

The thing that seems to make GC less of a problem for solr is maximizing
the young generation memory pool. Based on the available info, I would
start with making NewRatio 1 instead of 2.  This will increase the eden
size and decrease the old gen size. You may even want to use an explicit
-Xmn of 6144.  If that doesn't help, you might actually need 6GB or so of
old gen heap, so try increasing the overall heap size to 9 or 10 GB and
going back to a NewRatio of 2.
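
Spelled out against the 8GB settings quoted above, those suggestions look roughly like this (illustrative only - pick one, don't combine them):

-Xms8192M -Xmx8192M -XX:NewRatio=1
    (young and old generation each get about half of the 8GB heap)

-Xms8192M -Xmx8192M -Xmn6144M
    (explicit 6GB young generation, leaving roughly 2GB of old gen)

-Xms10240M -Xmx10240M -XX:NewRatio=2
    (bigger heap; NewRatio=2 then leaves roughly 6.7GB for the old gen)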

Thanks,
Shawn



RE: Permanently Full Old Generation...

Posted by Annette Newton <an...@servicetick.com>.
My jvm settings:


-Xmx8192M -Xms8192M -XX:+CMSScavengeBeforeRemark -XX:NewRatio=2
-XX:+CMSParallelRemarkEnabled -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly -XX:-CMSIncrementalPacing
-XX:CMSIncrementalDutyCycle=75

I turned off IncrementalPacing and enabled CMSInitiatingOccupancyFraction
after issues with nodes being reported as down due to long garbage
collection pauses.  The problem with the memory profile was visible before
the drop down to 1.2GB (which is when I reloaded the core); my concern was
that collecting the old generation didn't seem to free any of the heap, and
we went from occasionally collecting the old gen to collecting it
constantly.

Please see the attached gc log.

-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: 28 November 2012 16:48
To: solr-user@lucene.apache.org
Subject: Re: Permanently Full Old Generation...

On 11/28/2012 9:06 AM, Annette Newton wrote:
> We are seeing strange GC behaviour after running SolrCloud under
> quite heavy insert load for a period of time. The old generation
> becomes full and no amount of garbage collection will free up the
> memory. I have attached a memory profile; as you can see, it gets
> progressively worse as the day goes on, to the point where we are
> constantly doing full garbage collections. The only way I have
> found to resolve this is to reload the core, after which subsequent
> garbage collections reclaim the used space - that's what happened
> at 3pm on the memory profile. All the nodes eventually display the
> same behaviour.
>
> We have multiple threads adding batches of up to 100 documents at a
> time. I have also attached our schema and config.
>
> We are running 4 shards, each with a single replica, with a 3-node
> ZooKeeper setup. The 8 Solr instances are AWS High-Memory Double
> Extra Large boxes with 34.2 GB of memory and 4 virtual cores.
>

Looking at that jconsole graph, I am not seeing a problem. It looks fairly
normal to me, especially for heavy indexing.

Solr has been up for at least 29 hours, just based on the graph I can see,
which may not reflect the full JVM uptime - click on VM Summary to see that.
In that time, you've only spent a total of two minutes in garbage
collection, which does not seem problematic to me. Also, your Old Gen is not
full - it's only using 1.2GB out of 5.5GB available. A full GC
(ConcurrentMarkSweep) would only be automatically triggered when the Old Gen
reaches that 5.5GB mark.

Are you seeing actual performance problems, or are you just concerned about
what you see when you watch the memory?

General note about memory and Solr: I have had very good luck with the
following java memory options. Each machine handles three 22GB index shards
on one JVM.

-Xms4096M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks,
Shawn


Re: Permanently Full Old Generation...

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/28/2012 9:06 AM, Annette Newton wrote:
> We are seeing strange GC behaviour after running SolrCloud under
> quite heavy insert load for a period of time. The old generation
> becomes full and no amount of garbage collection will free up the
> memory. I have attached a memory profile; as you can see, it gets
> progressively worse as the day goes on, to the point where we are
> constantly doing full garbage collections. The only way I have
> found to resolve this is to reload the core, after which subsequent
> garbage collections reclaim the used space - that's what happened
> at 3pm on the memory profile. All the nodes eventually display the
> same behaviour.
>
> We have multiple threads adding batches of up to 100 documents at a
> time. I have also attached our schema and config.
>
> We are running 4 shards, each with a single replica, with a 3-node
> ZooKeeper setup. The 8 Solr instances are AWS High-Memory Double
> Extra Large boxes with 34.2 GB of memory and 4 virtual cores.
>

Looking at that jconsole graph, I am not seeing a problem. It looks 
fairly normal to me, especially for heavy indexing.

Solr has been up for at least 29 hours, just based on the graph I can 
see, which may not reflect the full JVM uptime - click on VM Summary to 
see that. In that time, you've only spent a total of two minutes in 
garbage collection, which does not seem problematic to me. Also, your 
Old Gen is not full - it's only using 1.2GB out of 5.5GB available. A 
full GC (ConcurrentMarkSweep) would only be automatically triggered when 
the Old Gen reaches that 5.5GB mark.
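
(For what it's worth, that 5.5GB figure is roughly what you'd expect from an
8GB heap with -XX:NewRatio=2, which hands about two thirds of the heap to the
old generation.)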

Are you seeing actual performance problems, or are you just concerned 
about what you see when you watch the memory?

General note about memory and Solr: I have had very good luck with the 
following java memory options. Each machine handles three 22GB index 
shards on one JVM.

-Xms4096M
-Xmx8192M
-XX:NewRatio=1
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled

Thanks,
Shawn


RE: Permanently Full Old Generation...

Posted by Annette Newton <an...@servicetick.com>.
Hi,

 


Class histogram from the heap dump (as browsed in the heap-dump viewer at localhost:7000):

Class                                                                             Instance Count   Total Size
class [C                                                                                 3410607    699552656
class [Lorg.apache.lucene.util.fst.FST$Arc;                                               319813    332605520
class [Ljava.lang.Object;                                                                 898433    170462152
class [Ljava.util.HashMap$Entry;                                                          856551    149091216
class java.util.HashMap$Entry                                                            2802560    100892160
class java.lang.String                                                                   3295405     52726480
class java.util.HashMap                                                                   750750     42042000
class org.apache.lucene.util.fst.FST                                                      319896     39027312
class org.apache.lucene.index.FieldInfo                                                   748801     35942448
class [B                                                                                 1032862     28932353
class java.util.LinkedHashMap$Entry                                                       516680     26867360
class org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader                           319831     24307156
class java.util.Collections$UnmodifiableMap                                               749522     23984704
class java.util.HashMap$FrontCache                                                        863640     17272800
class org.apache.lucene.util.BytesRef                                                     709550     11352800
class java.util.LinkedHashMap                                                             109810      7137650
class [I                                                                                  320987      5268068
class org.apache.lucene.analysis.core.WhitespaceTokenizer                                  59540      5239520
class java.lang.Integer                                                                  1149625      4598500
class org.apache.lucene.util.AttributeSource$State                                        169168      2706688
class java.util.TreeMap$Node                                                               19876      1371444
class [Lorg.apache.lucene.util.AttributeSource$State;                                      56394      1353456
class org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl                     56394      1127880
class org.apache.lucene.analysis.util.CharacterUtils$CharacterBuffer                       52888       951984
class org.apache.lucene.analysis.Analyzer$TokenStreamComponents                            56404       902464
class java.lang.Class                                                                       5706       867312
class org.apache.lucene.util.fst.FST$Arc                                                   14017       686833
class java.util.HashMap$Values                                                             56409       451272
class org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl                       56380       451040
class org.apache.lucene.analysis.tokenattributes.PositionIncrementAttributeImpl            56396       225584
class [Ljava.lang.Integer;                                                                      1       161048
class java.util.concurrent.ConcurrentHashMap$HashEntry                                      3456        96768
class [[C

 

This is a sample of the heap dump I took when we encountered the problem previously.
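
For reference, a histogram and dump like this can be produced with the standard JDK tools (the PID and file names below are placeholders; port 7000 is jhat's default browsing port, consistent with the localhost:7000 viewer noted above):

jmap -histo:live <solr-pid> > histo.txt                  # live-object class histogram
jmap -dump:live,format=b,file=solr.hprof <solr-pid>      # full binary heap dump
jhat -port 7000 solr.hprof                               # browse the dump at http://localhost:7000/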

 

-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: 28 November 2012 16:23
To: solr-user@lucene.apache.org
Subject: Re: Permanently Full Old Generation...

 

Have you done a Java heap dump to see what the most common objects are?

 

-- Jack Krupansky

 

From: Annette Newton 

Sent: Wednesday, November 28, 2012 11:06 AM

To: solr-user@lucene.apache.org

Cc: Andy Kershaw 

Subject: Permanently Full Old Generation...

 

Hi,

 

 

I’m hoping someone can help me with an issue we are encountering with SolrCloud.

 

 

We are seeing strange GC behaviour after running SolrCloud under quite heavy insert load for a period of time.  The old generation becomes full and no amount of garbage collection will free up the memory.  I have attached a memory profile; as you can see, it gets progressively worse as the day goes on, to the point where we are constantly doing full garbage collections.  The only way I have found to resolve this is to reload the core, after which subsequent garbage collections reclaim the used space - that’s what happened at 3pm on the memory profile.  All the nodes eventually display the same behaviour.

 

 

We have multiple threads adding batches of up to 100 documents at a time.  I have also attached our schema and config.

 

 

We are running 4 shards, each with a single replica, with a 3-node ZooKeeper setup.  The 8 Solr instances are AWS High-Memory Double Extra Large boxes with 34.2 GB of memory and 4 virtual cores.

 

 

Thanks in advance

 

 

Annette Newton


Re: Permanently Full Old Generation...

Posted by Jack Krupansky <ja...@basetechnology.com>.
Have you done a Java heap dump to see what the most common objects are?

-- Jack Krupansky

From: Annette Newton 
Sent: Wednesday, November 28, 2012 11:06 AM
To: solr-user@lucene.apache.org 
Cc: Andy Kershaw 
Subject: Permanently Full Old Generation...

Hi,

 

I’m hoping someone can help me with an issue we are encountering with SolrCloud.

 

We are seeing strange GC behaviour after running SolrCloud under quite heavy insert load for a period of time.  The old generation becomes full and no amount of garbage collection will free up the memory.  I have attached a memory profile; as you can see, it gets progressively worse as the day goes on, to the point where we are constantly doing full garbage collections.  The only way I have found to resolve this is to reload the core, after which subsequent garbage collections reclaim the used space - that’s what happened at 3pm on the memory profile.  All the nodes eventually display the same behaviour.

 

We have multiple threads adding batches of up to 100 documents at a time.  I have also attached our schema and config.

 

We are running 4 shards, each with a single replica, with a 3-node ZooKeeper setup.  The 8 Solr instances are AWS High-Memory Double Extra Large boxes with 34.2 GB of memory and 4 virtual cores.

 

Thanks in advance

 

Annette Newton