Posted to solr-user@lucene.apache.org by Luis Neves <lu...@co.sapo.pt> on 2007/07/25 12:55:31 UTC

OOM when autowarming is enabled

Hello all.

We are having some issues with one of our Solr instances when autowarming is 
enabled. The index has about 2.2M documents and is 2GB in size, so it's not 
particularly big. Solr runs with "-Xmx1024M -Xms1024M".

We are constantly inserting and updating the index, about 20 new/updated 
documents per minute, with a commit every 10 minutes.
These are our cache settings:

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
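For context on what autowarmCount does: when a commit opens a new searcher, each cache with a non-zero autowarmCount re-executes up to that many of its most recently used entries against the new searcher before it starts serving. The sketch below is a simplified model of that idea, not Solr's LRUCache; the class name WarmableLruCache and the regenerator function are hypothetical. With the settings above, up to 256 queryResultCache entries are re-run after every commit, and (as the stack trace shows) a query containing a function query builds a FieldCache entry on the new searcher while doing so.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified model of an LRU cache with autowarming (hypothetical, not Solr's LRUCache).
class WarmableLruCache<K, V> {
    private final int maxSize;
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > WarmableLruCache.this.maxSize;
            }
        };
    }

    V get(K key) { return map.get(key); }

    void put(K key, V value) { map.put(key, value); }

    int size() { return map.size(); }

    // Fill this (new) cache from the old one: take up to autowarmCount of the most
    // recently used keys and recompute their values against the new searcher.
    void warm(WarmableLruCache<K, V> old, int autowarmCount, Function<K, V> regenerator) {
        List<K> keys = new ArrayList<>(old.map.keySet()); // LRU first, MRU last
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            put(key, regenerator.apply(key)); // each call re-runs the cached query
        }
    }
}
```

Lowering autowarmCount reduces how many queries are regenerated, but in this model a single regenerated query that needs the FieldCache is still enough to build the whole entry.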

When the autowarming is disabled there are no OOM errors, but the first search 
after a commit takes ~10 seconds and that is too long.

I've enabled the "-XX:+HeapDumpOnOutOfMemoryError" flag, so if this happens again I 
will be able to produce a heap dump for analysis. In the meantime, is there any 
setting we can tweak that is easier on memory and still makes the first search 
after a commit return in a reasonable time?

Thanks!

--
Luis Neves



StackTrace:
Error during auto-warming of key:org.apache.solr.search.QueryResultKey@ce951c5e:java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:159)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:165)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:153)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:429)
at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:380)
at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:383)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:56)
at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:57)
at org.apache.solr.search.function.LinearFloatFunction.getValues(LinearFloatFunction.java:49)
at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:100)
at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:78)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:233)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:143)
at org.apache.lucene.search.Searcher.search(Searcher.java:118)
at org.apache.lucene.search.Searcher.search(Searcher.java:97)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:888)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:805)
at org.apache.solr.search.SolrIndexSearcher.access$1(SolrIndexSearcher.java:709)
at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:251)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:193)
at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1385)
at org.apache.solr.core.SolrCore$1.call(SolrCore.java:488)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)

Re: OOM when autowarming is enabled

Posted by Yonik Seeley <yo...@apache.org>.
On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
> Yonik Seeley wrote:
> > On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
> >> This turned out to be a bad idea ... for some reason using the BoostQuery instead
> >> of the BoostFunction slows the search to a crawl.
> >
> > Dismax throws bq in with the main query, so it can't really be cached
> > separately, which means iterating over all the terms in [* TO
> > NOW/DAY-3MONTH] on every query is expensive.
>
> Ok.
>
> > You could try lowering the resolution of EntryDate to lower the number
> > of unique terms (but that would require reindexing).  That would speed
> > up a range query, or lower the memory usage of the FieldCache entry.
> >
> > Solr could also somehow be smarter about the FieldCache and only cache
> > the ordinal and not the actual values (this could apply to sorting
> > too).  Lucene's FieldCache doesn't currently support that though, so
> > it would require some hacking.
> >
> > If you didn't want date math, date faceting, or date ranges, you could
> > simply store a date as a classic integer (number of seconds since
> > epoch).  Function queries would still work on this, and the FieldCache
> > would be 4 bytes per doc.
>
> I will do a combination of both: I will add a new int field to the index and use
> it to hold the number of weeks since epoch (week resolution is good enough for
> freshness in our case).

For type "integer" (indexed as straight text... not "sint"), the
FieldCache values are stored directly into the int[] array (instead of
being indexes into a string array), so lowering the resolution to
weekly won't make a difference in memory (it will be 4 bytes per doc
regardless), but *will* speed up first-time generation of this number.
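To illustrate the distinction above: an int FieldCache entry is one int per document no matter how many distinct values exist, but building it means parsing each unique term once, so fewer unique terms means less work the first time. The sketch below is a simplified model, not Lucene's FieldCacheImpl; the class name IntFieldCacheModel is hypothetical, and the term-to-documents map only stands in for walking the real term dictionary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model (hypothetical, not Lucene's FieldCacheImpl) of filling an int
// FieldCache entry: visit each unique term once, parse it, and copy the value into
// the slot of every document containing that term.
class IntFieldCacheModel {
    int parseCalls = 0;

    int[] build(int maxDoc, Map<String, List<Integer>> postings) {
        int[] values = new int[maxDoc]; // 4 bytes * maxDoc regardless of term count
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            int v = Integer.parseInt(e.getKey());
            parseCalls++;
            for (int doc : e.getValue()) values[doc] = v;
        }
        return values;
    }

    // Term -> docs map where each doc's term is secondsSinceEpoch[doc] / divisor
    // (divisor 1 = one-second resolution, 604800 = one-week resolution).
    static Map<String, List<Integer>> postings(long[] secondsSinceEpoch, long divisor) {
        Map<String, List<Integer>> p = new TreeMap<>();
        for (int doc = 0; doc < secondsSinceEpoch.length; doc++) {
            String term = Long.toString(secondsSinceEpoch[doc] / divisor);
            p.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
        }
        return p;
    }
}
```

For a week of documents indexed one per minute, the second-resolution field needs 10080 parses and the week-resolution field only a couple, while both arrays have the same length.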

-Yonik

Re: OOM when autowarming is enabled

Posted by Luis Neves <lu...@co.sapo.pt>.
Yonik Seeley wrote:
> On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
>> This turned out to be a bad idea ... for some reason using the BoostQuery instead
>> of the BoostFunction slows the search to a crawl.
> 
> Dismax throws bq in with the main query, so it can't really be cached
> separately, which means iterating over all the terms in [* TO
> NOW/DAY-3MONTH] on every query is expensive.

Ok.

> You could try lowering the resolution of EntryDate to lower the number
> of unique terms (but that would require reindexing).  That would speed
> up a range query, or lower the memory usage of the FieldCache entry.
> 
> Solr could also somehow be smarter about the FieldCache and only cache
> the ordinal and not the actual values (this could apply to sorting
> too).  Lucene's FieldCache doesn't currently support that though, so
> it would require some hacking.
> 
> If you didn't want date math, date faceting, or date ranges, you could
> simply store a date as a classic integer (number of seconds since
> epoch).  Function queries would still work on this, and the FieldCache
> would be 4 bytes per doc.

I will do a combination of both: I will add a new int field to the index and use 
it to hold the number of weeks since epoch (week resolution is good enough for 
freshness in our case).
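A minimal sketch of computing the "weeks since epoch" value at indexing time, assuming the entry date is available as milliseconds since 1970-01-01T00:00:00Z; the class and method names are hypothetical. Since 1970-01-01 was a Thursday, these weeks run Thursday to Wednesday rather than following calendar weeks, which should not matter for a freshness boost.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: whole weeks since 1970-01-01T00:00:00Z, for a plain
// "integer" field used only for freshness boosting.
class WeeksSinceEpoch {
    static final long MILLIS_PER_WEEK = TimeUnit.DAYS.toMillis(7);

    static int weeksSinceEpoch(long epochMillis) {
        // floorDiv rounds pre-1970 dates down instead of toward zero
        return Math.toIntExact(Math.floorDiv(epochMillis, MILLIS_PER_WEEK));
    }
}
```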

Thanks!

--
Luis Neves


Re: OOM when autowarming is enabled

Posted by Yonik Seeley <yo...@apache.org>.
On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
> Luis Neves wrote:
>
> > The objective is to boost the documents by "freshness" ...  this is
> > probably the cause of the memory abuse since all the "EntryDate" values
> > are unique.
> > I will try to use something like:
> > <str name="bq">EntryDate:[* TO NOW/DAY-3MONTH]^1.5</str>
>
> This turned out to be a bad idea ... for some reason using the BoostQuery instead
> of the BoostFunction slows the search to a crawl.

Dismax throws bq in with the main query, so it can't really be cached
separately, which means iterating over all the terms in [* TO
NOW/DAY-3MONTH] on every query is expensive.

You could try lowering the resolution of EntryDate to lower the number
of unique terms (but that would require reindexing).  That would speed
up a range query, or lower the memory usage of the FieldCache entry.
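To show why lowering the resolution helps a range query: the work grows with the number of distinct indexed values that fall inside the range. The sketch below only counts those distinct values for a given rounding; it is a simplified model rather than Lucene's RangeQuery, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Counts how many distinct terms a range query over [lo, hi] would have to visit if
// each timestamp were indexed rounded down to `resolution` seconds (simplified model).
class RangeTermCount {
    static int uniqueTermsInRange(long[] epochSeconds, long lo, long hi, long resolution) {
        Set<Long> terms = new HashSet<>();
        for (long t : epochSeconds) {
            long rounded = Math.floorDiv(t, resolution) * resolution;
            if (rounded >= lo && rounded <= hi) terms.add(rounded);
        }
        return terms.size();
    }
}
```

With one document every 3 seconds over 10 days, a 5-day range covers 144000 distinct terms at one-second resolution but only 5 at one-day resolution.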

Solr could also somehow be smarter about the FieldCache and only cache
the ordinal and not the actual values (this could apply to sorting
too).  Lucene's FieldCache doesn't currently support that though, so
it would require some hacking.
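A hypothetical sketch of the "ordinals only" idea: keep one int per document (the rank of its value among all unique values) and drop the values themselves, which is all rord() needs. This is not an existing Lucene or Solr API; the class name is made up, and the sort below only stands in for the fact that Lucene's term dictionary is already in sorted order.

```java
import java.util.Arrays;

// Hypothetical "ordinals only" cache: one int per document; the unique values are
// needed while building but are not kept afterwards.
class OrdinalOnlyCache {
    final int[] ords;   // ords[doc] in 1..numTerms, or 0 if the doc has no value
    final int numTerms;

    OrdinalOnlyCache(String[] valuePerDoc) {
        String[] unique = Arrays.stream(valuePerDoc)
                .filter(v -> v != null).distinct().sorted().toArray(String[]::new);
        numTerms = unique.length;
        ords = new int[valuePerDoc.length];
        for (int doc = 0; doc < valuePerDoc.length; doc++) {
            if (valuePerDoc[doc] != null) {
                ords[doc] = Arrays.binarySearch(unique, valuePerDoc[doc]) + 1;
            }
        }
        // `unique` is not stored in a field, so only the int[] stays resident.
    }

    // Reverse ordinal: the largest value gets 1; docs without a value get numTerms + 1.
    int rord(int doc) {
        return numTerms + 1 - ords[doc];
    }
}
```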

If you didn't want date math, date faceting, or date ranges, you could
simply store a date as a classic integer (number of seconds since
epoch).  Function queries would still work on this, and the FieldCache
would be 4 bytes per doc.
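A minimal sketch of the integer-date suggestion, assuming the date is available as epoch milliseconds; the helper names are hypothetical. Two things worth keeping in mind: the int[] entry costs 4 bytes per document (about 8.8 MB for 2.2M documents, ignoring the array header), and a signed 32-bit count of seconds overflows after 2038-01-19T03:14:07Z, so the conversion below fails loudly rather than wrapping.

```java
// Hypothetical helpers: store an entry date as int seconds since 1970-01-01T00:00:00Z.
class SecondsSinceEpoch {
    static int toIntSeconds(long epochMillis) {
        long seconds = Math.floorDiv(epochMillis, 1000L);
        // Throws ArithmeticException past 2038-01-19T03:14:07Z (Integer.MAX_VALUE seconds).
        return Math.toIntExact(seconds);
    }

    // Size of the int[] FieldCache entry for such a field, ignoring the array header.
    static long intCacheBytes(long maxDoc) {
        return 4L * maxDoc;
    }
}
```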

-Yonik

Re: OOM when autowarming is enabled

Posted by Luis Neves <lu...@co.sapo.pt>.
Luis Neves wrote:

> The objective is to boost the documents by "freshness" ...  this is 
> probably the cause of the memory abuse since all the "EntryDate" values 
> are unique.
> I will try to use something like:
> <str name="bq">EntryDate:[* TO NOW/DAY-3MONTH]^1.5</str>

This turned out to be a bad idea ... for some reason using the BoostQuery instead 
of the BoostFunction slows the search to a crawl.

--
Luis Neves


Re: OOM when autowarming is enabled

Posted by Luis Neves <lu...@co.sapo.pt>.
Yonik Seeley wrote:
> On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
>> We are having some issues with one of our Solr instances when autowarming is
>> enabled. The index has about 2.2M documents and is 2GB in size, so it's not
>> particularly big. Solr runs with "-Xmx1024M -Xms1024M".
> 
> "Big" is relative to what you are trying to do (faceting, sorting, etc).

Good point. We don't use faceting or sorting in this particular index.

> From the stack trace it looks like a function query is the last
> straw... it causes a FieldCache entry to be populated, just like
> sorting would.  Depending on the number of unique terms in the field,
> and the number of fields you sort on or do function queries on, it can
> take quite a bit of memory.

I see ... we use the DismaxQueryHandler and the bf parameter is set like:

<str name="bf">linear(recip(rord(EntryDate),1,1000,1000),11,0)</str>
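To make the shape of this boost concrete, the sketch below evaluates it using the documented Solr function definitions recip(x,m,a,b) = a/(m*x+b) and linear(x,m,c) = m*x+c, where rord(EntryDate) is 1 for the newest date and grows for older ones. The class name is hypothetical. Computing rord requires the ordinal of every EntryDate value, which is why the stack trace runs through ReverseOrdFieldSource.getValues into FieldCacheImpl.getStringIndex.

```java
// Evaluates linear(recip(rord(EntryDate),1,1000,1000),11,0) for a given rord value.
class FreshnessBoost {
    static float recip(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    static float linear(float x, float m, float c) {
        return m * x + c;
    }

    static float boost(int rord) {
        return linear(recip(rord, 1, 1000, 1000), 11, 0);
    }
}
```

The newest document gets a boost of about 11, the 1000th newest gets 5.5, and with 2.2M unique dates the oldest gets about 0.005. Assuming each new or updated document receives its own EntryDate, rord effectively counts documents, so at roughly 20 documents per minute the boost halves within about 50 minutes.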

The objective is to boost the documents by "freshness" ...  this is probably the 
cause of the memory abuse since all the "EntryDate" values are unique.
I will try to use something like:
<str name="bq">EntryDate:[* TO NOW/DAY-3MONTH]^1.5</str>
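One detail about this range: in Solr date math, NOW/DAY-3MONTH rounds down to the start of the current day (UTC) and then subtracts three months, and [* TO X] matches everything up to and including X. As written, the bq therefore matches entries that are three or more months old, so the ^1.5 would favour older documents; a freshness boost would presumably use [NOW/DAY-3MONTH TO *]. The sketch below models those semantics with java.time; it is not Solr's DateMathParser, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Rough model of NOW/DAY-3MONTH and the two possible range directions.
class DateMathRange {
    static Instant nowDayMinus3Months(Instant now) {
        return ZonedDateTime.ofInstant(now, ZoneOffset.UTC)
                .truncatedTo(ChronoUnit.DAYS) // "/DAY": round down to the start of the day
                .minusMonths(3)               // "-3MONTH"
                .toInstant();
    }

    // EntryDate:[* TO NOW/DAY-3MONTH] : entries at or before the boundary.
    static boolean matchesAsWritten(Instant entryDate, Instant now) {
        return !entryDate.isAfter(nowDayMinus3Months(now));
    }

    // EntryDate:[NOW/DAY-3MONTH TO *] : entries at or after the boundary.
    static boolean matchesLastThreeMonths(Instant entryDate, Instant now) {
        return !entryDate.isBefore(nowDayMinus3Months(now));
    }
}
```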

Thanks!!

--
Luis Neves

Re: OOM when autowarming is enabled

Posted by Yonik Seeley <yo...@apache.org>.
On 7/25/07, Luis Neves <lu...@co.sapo.pt> wrote:
> We are having some issues with one of our Solr instances when autowarming is
> enabled. The index has about 2.2M documents and is 2GB in size, so it's not
> particularly big. Solr runs with "-Xmx1024M -Xms1024M".

"Big" is relative to what you are trying to do (faceting, sorting, etc).
From the stack trace it looks like a function query is the last
straw... it causes a FieldCache entry to be populated, just like
sorting would.  Depending on the number of unique terms in the field,
and the number of fields you sort on or do function queries on, it can
take quite a bit of memory.
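A back-of-the-envelope estimate of how much memory "quite a bit" can be for the kind of entry getStringIndex builds in the stack trace: an int[] of ordinals (one per document) plus a String[] holding every unique term. The per-String overhead, reference size and average term length are assumptions that depend on the JVM and on how the date is indexed (20 characters is roughly the length of a timestamp such as 2007-07-25T12:55:31Z); the class name is hypothetical.

```java
// Rough size of a string-ordinal FieldCache entry. All inputs are assumptions.
class StringIndexEstimate {
    static long bytes(long maxDoc, long uniqueTerms, int avgChars,
                      int referenceBytes, int perStringOverhead) {
        long ordinals = 4L * maxDoc;                                      // int[] of ordinals
        long lookupArray = (long) referenceBytes * (uniqueTerms + 1);     // String[] (slot 0 for "no value")
        long strings = uniqueTerms * (perStringOverhead + 2L * avgChars); // UTF-16 chars plus object overhead
        return ordinals + lookupArray + strings;
    }
}
```

With 2.2M documents, 2.2M unique EntryDate values, 20 characters each, 4-byte references and about 40 bytes of overhead per String, this comes to roughly 190 MB for a single field, compared with about 8.8 MB for a plain int[]. Since the new searcher's entry is built while the old searcher is typically still open, the transient peak during warming can be higher.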

-Yonik