Posted to solr-user@lucene.apache.org by Prasanna R <pl...@gmail.com> on 2012/11/13 02:35:22 UTC

Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap

We have been using Solr in a custom setup where we generate results for
user queries by expanding each query into a large boolean query consisting
of multiple prefix queries. We have recently had GC issues: the old/tenured
generation becomes nearly 100% full, leading to near-constant full GC
cycles.

We are running Solr 3.1 on servers with 13G of heap. The jmap live object
histogram is as follows:

num     #instances         #bytes  class name
----------------------------------------------
   1:      27441222     1550723760  [Ljava.lang.Object;
   2:      23546318      879258496  [C
   3:      23813405      762028960  java.lang.String
   4:      22700095      726403040  org.apache.lucene.search.BooleanQuery
   5:      27431515      658356360  java.util.ArrayList
   6:      22911883      549885192  org.apache.lucene.search.BooleanClause
   7:      21651039      519624936  org.apache.lucene.index.Term
   8:       6876651      495118872  org.apache.lucene.index.FieldsReader$LazyField
   9:      11354214      363334848  org.apache.lucene.search.PrefixQuery
  10:       4281624      137011968  java.util.HashMap$Entry
  11:       3466680       83200320  org.apache.lucene.search.TermQuery
  12:       1987450       79498000  org.apache.lucene.search.PhraseQuery
  13:        631994       70148624  [Ljava.util.HashMap$Entry;
.....
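
For reference, a histogram like the one above can be summarized per package
with a few lines of Java. This is only a sketch: the names HistoSummary,
bytesByClass and bytesWithPrefix are made up for illustration, and it
assumes the four-column layout shown above (rank, instances, bytes, class
name) with each row on a single line. Anything that does not match that
shape is skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or the JDK.
class HistoSummary {
    /** Parses "rank: instances bytes class" rows into class name -> bytes. */
    static Map<String, Long> bytesByClass(List<String> lines) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // A data row has exactly four fields and the first ends with ':'.
            if (f.length != 4 || !f[0].endsWith(":")) {
                continue; // header, separator, or wrapped row
            }
            out.put(f[3], Long.parseLong(f[2]));
        }
        return out;
    }

    /** Sums the bytes of all classes whose name starts with the prefix. */
    static long bytesWithPrefix(Map<String, Long> byClass, String prefix) {
        long total = 0;
        for (Map.Entry<String, Long> e : byClass.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Rows copied from the histogram in this post.
        List<String> rows = Arrays.asList(
            "   4:      22700095      726403040  org.apache.lucene.search.BooleanQuery",
            "   6:      22911883      549885192  org.apache.lucene.search.BooleanClause",
            "   7:      21651039      519624936  org.apache.lucene.index.Term",
            "   9:      11354214      363334848  org.apache.lucene.search.PrefixQuery",
            "  11:       3466680       83200320  org.apache.lucene.search.TermQuery",
            "  12:       1987450       79498000  org.apache.lucene.search.PhraseQuery");
        Map<String, Long> byClass = bytesByClass(rows);
        System.out.println(bytesWithPrefix(byClass, "org.apache.lucene.search."));
    }
}
```

For the rows listed here, the org.apache.lucene.search classes alone add up
to about 1.8 GB, not counting the strings, lists and arrays they reference.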

I have looked at the Solr cache settings multiple times but cannot figure
out how or why so many BooleanQuery and BooleanClause instances stay alive.
They are not collected even when traffic is disabled and a manual GC is
triggered, which indicates that something is holding references to them.
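
As a generic illustration of "something holding references": objects
survive a full GC whenever they are reachable from a long-lived collection
such as a cache. The sketch below shows a size-bounded LRU map built on the
JDK's LinkedHashMap. BoundedCache is a hypothetical name and this is not
Solr's cache implementation (Solr's own caches are sized in
solrconfig.xml). Nothing in this thread establishes whether a cache, the
custom parser, or something else is the actual holder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache: evicts the least recently used entry once
// more than maxEntries are stored, so keys and values can be collected.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order (LRU)
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(3);
        for (int i = 0; i < 5; i++) {
            cache.put("query-" + i, i); // stand-ins for cached query objects
        }
        System.out.println(cache.keySet());
    }
}
```

An unbounded map in the same position would keep every key and value
reachable indefinitely, which is the kind of pattern a heap dump analysis
would expose.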

Can anyone provide more details on the circumstances under which these
objects stay alive and/or are cached? If they are cached, is the caching
configurable?

Any and all tips/suggestions/pointers will be much appreciated.

Thanks,

Prasanna

Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

Yeah, a large heap can be problematic like that. :)
But if there is some sort of leak (and if I had to bet, I'd put my money
on your custom query parser, knowing what I know about this situation), you
could also start Solr with a much smaller heap and grab a heap snapshot as
soon as you see some number of those objects appearing towards the top of
the jmap histogram. That should be enough to trace them to their roots.
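
The "as soon as you see those objects towards the top" condition could be
checked mechanically against the output of jmap -histo:live <pid>. The
sketch below is one way to do it, assuming the four-column histogram layout
shown in this thread. DumpTrigger and watchedClassInTop are hypothetical
names. The dump itself would still be taken separately, for example with
jmap -dump:live,format=b,file=<file> <pid>.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or the JDK.
class DumpTrigger {
    /**
     * Returns true if any histogram row ranked topN or better names a
     * class that starts with the given prefix.
     */
    static boolean watchedClassInTop(List<String> histoLines, String prefix, int topN) {
        for (String line : histoLines) {
            String[] f = line.trim().split("\\s+");
            // Data rows look like "rank: instances bytes class".
            if (f.length != 4 || !f[0].endsWith(":")) {
                continue;
            }
            int rank = Integer.parseInt(f[0].substring(0, f[0].length() - 1));
            if (rank <= topN && f[3].startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // First rows of the histogram posted in this thread.
        List<String> rows = Arrays.asList(
            "   1:      27441222     1550723760  [Ljava.lang.Object;",
            "   2:      23546318      879258496  [C",
            "   3:      23813405      762028960  java.lang.String",
            "   4:      22700095      726403040  org.apache.lucene.search.BooleanQuery");
        if (watchedClassInTop(rows, "org.apache.lucene.search.", 5)) {
            System.out.println("Lucene query objects near the top: take a heap dump now");
        }
    }
}
```

With a small heap the dump stays small enough for jhat or YourKit to load,
which is the point of the suggestion above.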

Otis
--
Solr Performance Monitoring - http://sematext.com/spm/index.html


On Tue, Nov 13, 2012 at 5:18 PM, Prasanna R <pl...@gmail.com> wrote:

> We do have a custom query parser that is responsible for expanding the user
> input query into a bunch of prefix, phrase and regular boolean queries in a
> manner similar to that done by DisMax.
>
> Analyzing heap with jhat/YourKit is on my list of things to do but I
> haven't gotten around to doing it yet. Our big heap size (13G) makes it a
> little difficult to do a full blown heap dump analysis.
>
> Thanks a ton for the reply Otis!
>
> Prasanna

Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap

Posted by Prasanna R <pl...@gmail.com>.

We do have a custom query parser that is responsible for expanding the user
input query into a bunch of prefix, phrase and regular boolean queries in a
manner similar to that done by DisMax.

Analyzing heap with jhat/YourKit is on my list of things to do but I
haven't gotten around to doing it yet. Our big heap size (13G) makes it a
little difficult to do a full blown heap dump analysis.

Thanks a ton for the reply Otis!

Prasanna

On Mon, Nov 12, 2012 at 5:42 PM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> I've never seen this.  You don't have a custom query parser or anything
> else custom, do you?
> Have you tried dumping and analyzing heap?  YourKit has a 7 day eval, or
> you can use things like jhat, which may be included on your machine already
> (see http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html).
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm/index.html

Re: Solr GC issues - Too many BooleanQuery & BooleanClause objects in heap

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

I've never seen this.  You don't have a custom query parser or anything
else custom, do you?
Have you tried dumping and analyzing heap?  YourKit has a 7 day eval, or
you can use things like jhat, which may be included on your machine already
(see http://docs.oracle.com/javase/6/docs/technotes/tools/share/jhat.html ).

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html

