Posted to solr-user@lucene.apache.org by Prasanna R <pl...@gmail.com> on 2012/11/13 02:35:22 UTC
Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap
We have been using Solr in a custom setup where we generate results for
user queries by expanding each query into a large boolean query consisting
of multiple prefix queries. We have recently run into GC issues: the
old/tenured generation becomes nearly 100% full, leading to near-constant
full GC cycles.
We are running Solr 3.1 on servers with 13G of heap. The jmap live object
histogram is as follows:
 num   #instances       #bytes  class name
----------------------------------------------
   1:    27441222   1550723760  [Ljava.lang.Object;
   2:    23546318    879258496  [C
   3:    23813405    762028960  java.lang.String
   4:    22700095    726403040  org.apache.lucene.search.BooleanQuery
   5:    27431515    658356360  java.util.ArrayList
   6:    22911883    549885192  org.apache.lucene.search.BooleanClause
   7:    21651039    519624936  org.apache.lucene.index.Term
   8:     6876651    495118872  org.apache.lucene.index.FieldsReader$LazyField
   9:    11354214    363334848  org.apache.lucene.search.PrefixQuery
  10:     4281624    137011968  java.util.HashMap$Entry
  11:     3466680     83200320  org.apache.lucene.search.TermQuery
  12:     1987450     79498000  org.apache.lucene.search.PhraseQuery
  13:      631994     70148624  [Ljava.util.HashMap$Entry;
.....
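As a rough cross-check of the numbers in this histogram, here is a small sketch (not part of the original post) that sums the #bytes column for the thirteen listed rows and for the org.apache.lucene.search classes. The class and method names (HistogramTotals, bytesFor, bytesPerInstance) are made up for illustration; only the row data comes from the histogram.

```java
/** Sums the #bytes column of the jmap rows listed in the histogram. */
class HistogramTotals {

    // Rows copied from the histogram: "#instances #bytes class-name".
    static final String[] ROWS = {
        "27441222 1550723760 [Ljava.lang.Object;",
        "23546318 879258496 [C",
        "23813405 762028960 java.lang.String",
        "22700095 726403040 org.apache.lucene.search.BooleanQuery",
        "27431515 658356360 java.util.ArrayList",
        "22911883 549885192 org.apache.lucene.search.BooleanClause",
        "21651039 519624936 org.apache.lucene.index.Term",
        "6876651 495118872 org.apache.lucene.index.FieldsReader$LazyField",
        "11354214 363334848 org.apache.lucene.search.PrefixQuery",
        "4281624 137011968 java.util.HashMap$Entry",
        "3466680 83200320 org.apache.lucene.search.TermQuery",
        "1987450 79498000 org.apache.lucene.search.PhraseQuery",
        "631994 70148624 [Ljava.util.HashMap$Entry;",
    };

    /** Total bytes across rows whose class name starts with the given prefix. */
    static long bytesFor(String classPrefix) {
        long total = 0;
        for (String row : ROWS) {
            String[] cols = row.trim().split("\\s+");
            if (cols[2].startsWith(classPrefix)) {
                total += Long.parseLong(cols[1]);
            }
        }
        return total;
    }

    /** Bytes divided by instance count for an exactly named class, or -1 if not listed. */
    static long bytesPerInstance(String className) {
        for (String row : ROWS) {
            String[] cols = row.trim().split("\\s+");
            if (cols[2].equals(className)) {
                return Long.parseLong(cols[1]) / Long.parseLong(cols[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("all listed rows:            " + bytesFor(""));
        System.out.println("org.apache.lucene.search.*: " + bytesFor("org.apache.lucene.search."));
        System.out.println("bytes per BooleanQuery:     "
                + bytesPerInstance("org.apache.lucene.search.BooleanQuery"));
    }
}
```

By this arithmetic the thirteen listed rows account for 6,874,593,376 bytes (about 6.9 GB of the 13G heap), of which 1,802,321,400 bytes (about 1.8 GB) are org.apache.lucene.search objects at 24 to 40 bytes each. These are shallow sizes, so the strings, terms and lists hanging off each query are counted in their own rows.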
I have looked at the Solr cache settings multiple times but cannot figure
out how or why so many BooleanQuery and BooleanClause instances stay alive.
These objects are live and are not collected even when traffic is disabled
and a manual GC is triggered, which indicates that something is holding
references to them.
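To illustrate what "holding onto references" means for the collector, here is a minimal sketch that uses stand-in classes rather than Lucene or Solr code. FakeQuery, RETAINED and the two parse methods are hypothetical; the static list plays the role of whatever long-lived structure (a cache, a field on a singleton, a listener list) might be keeping the query trees reachable.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class RetainedQueryDemo {

    /** Hypothetical stand-in for an expanded query tree; not a Lucene class. */
    static final class FakeQuery {
        final List<String> clauses = new ArrayList<>();

        FakeQuery(String userInput) {
            // One prefix-style clause per whitespace-separated term.
            for (String term : userInput.trim().split("\\s+")) {
                clauses.add(term + "*");
            }
        }
    }

    // Hypothetical leak: a long-lived collection that is never cleared.
    static final List<FakeQuery> RETAINED = new ArrayList<>();

    /** Builds a query and also stores it in the long-lived list. */
    static WeakReference<FakeQuery> parseAndRetain(String input) {
        FakeQuery q = new FakeQuery(input);
        RETAINED.add(q);
        return new WeakReference<>(q);
    }

    /** Builds a query and drops the only strong reference on return. */
    static WeakReference<FakeQuery> parseOnly(String input) {
        return new WeakReference<>(new FakeQuery(input));
    }

    public static void main(String[] args) {
        WeakReference<FakeQuery> retained = parseAndRetain("solr gc");
        WeakReference<FakeQuery> dropped = parseOnly("solr gc");

        System.gc(); // only a request; the JVM is free to ignore it

        System.out.println("retained query still reachable: " + (retained.get() != null));
        System.out.println("dropped query still reachable:  " + (dropped.get() != null));
    }
}
```

The retained query can never be cleared while RETAINED still refers to it, no matter how many collections run. The dropped one is eligible for collection immediately, although System.gc() does not guarantee when that happens. Objects that survive a forced full GC with no traffic therefore point to a strong reference chain from some GC root.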
Can anyone provide more details on the circumstances under which these
objects stay alive and/or are cached? If they are cached, is the caching
configurable?
Any and all tips/suggestions/pointers will be much appreciated.
Thanks,
Prasanna
Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap
Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,
Yeah, a large heap can be problematic like that. :)
But if there is some sort of leak (and if I had to bet, I'd put my money
on your custom QP, knowing what I know about this situation), you could also
start Solr with a much smaller heap and grab a heap snapshot as soon as
you see some number of those objects appearing towards the top of the jmap
output. That should be enough to trace them to their roots.
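One possible way to grab such a snapshot from inside the JVM is sketched below, assuming a HotSpot JVM on Java 7 or later, where com.sun.management.HotSpotDiagnosticMXBean is available. The HeapSnapshot class and the file name are made up for illustration, and the code dumps the JVM it runs in, so it would have to be called from code running inside Solr (or the same MBean invoked over JMX).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapSnapshot {

    /**
     * Writes an HPROF heap dump of the current JVM to the given path.
     * With liveOnly=true only reachable objects are dumped.
     * The target file must not already exist.
     */
    static File dump(String path, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, liveOnly);
        return new File(path);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass a path as the first argument to override.
        String path = args.length > 0
                ? args[0]
                : new File(System.getProperty("java.io.tmpdir"),
                           "heap-" + System.currentTimeMillis() + ".hprof").getPath();
        File f = dump(path, true);
        System.out.println("wrote " + f.getAbsolutePath() + " (" + f.length() + " bytes)");
    }
}
```

The resulting .hprof file can be opened in jhat or YourKit to follow the BooleanQuery instances back to whatever holds them. With a much smaller heap the file stays small enough to analyze comfortably.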
Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html
Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap
Posted by Prasanna R <pl...@gmail.com>.
We do have a custom query parser that is responsible for expanding the user
input query into a bunch of prefix, phrase, and regular boolean queries in a
manner similar to DisMax.
Analyzing the heap with jhat/YourKit is on my list of things to do, but I
haven't gotten around to it yet. Our big heap size (13G) makes it a
little difficult to do a full-blown heap dump analysis.
Thanks a ton for the reply Otis!
Prasanna
Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap
Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,
I've never seen this. You don't have a custom query parser or anything
else custom, do you?
Have you tried dumping and analyzing the heap? YourKit has a 7-day eval, or
you can use things like jhat, which may already be included on your machine
(see http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html).
Otis
--
Performance Monitoring - http://sematext.com/spm/index.html