Posted to solr-user@lucene.apache.org by Brian Ecker <br...@gmail.com> on 2019/05/02 10:15:20 UTC

Re: Determining Solr heap requirements and analyzing memory usage

Just to update here in order to help others who might run into similar
issues in the future: the problem is resolved. It was caused by the
queryResultCache, which was easy to determine by analyzing a heap dump.
In our setup we had the following config:

<queryResultCache class="solr.FastLRUCache" maxRamMB="3072"
                  autowarmCount="0"/>

In reality, maxRamMB="3072" did not behave as expected, and this cache was
using *way* more memory (about 6-8 times that amount). See the following
screenshot from Eclipse MAT (http://oi63.tinypic.com/epn341.jpg). Notice in
the left window that ramBytes, Solr's internal estimate of how much memory
this cache is currently using, is 1894333464 bytes (~1.9 GB). Now notice
that the highlighted line, the ConcurrentLRUCache used internally by the
FastLRUCache backing the queryResultCache, actually retains 12212779160
bytes (~12.2 GB). On further investigation, I realized that this cache is a
map from a query, with all its associated objects, as the key to a very
simple object containing an array of document (integer) ids as the value.
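
Conceptually, each entry in the cache has roughly this shape (a simplified
Java sketch with hypothetical class names; the real Solr classes are, I
believe, QueryResultKey on the key side and DocList/DocSlice on the value
side):

import java.util.List;
import java.util.Map;

import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;

// Hypothetical illustration of what a queryResultCache entry holds.
class CacheShapeSketch {

    // Key: the main query plus everything that affects which documents
    // match and how they are ordered. These nested objects make keys heavy.
    static class QueryKey {
        Query query;          // parsed main query
        List<Query> filters;  // parsed filter queries
        Sort sort;            // sort specification
    }

    // Value: essentially just the matching internal document ids.
    static class DocIdList {
        int[] docs;           // lightweight compared to the key
    }

    Map<QueryKey, DocIdList> queryResultCache;
}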

Looking into the lucene-solr source, I found the following line in the
calculation of ramBytesUsed:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/util/ConcurrentLRUCache.java#L605.
Surprisingly, the query objects used as keys in the queryResultCache do not
implement Accountable as far as I can tell. That lines up very well with
our observed memory usage, because the heap dump also shows that the keys
in the cache use substantially more memory than the values and fully
account for the extra usage. It was quite surprising to me that such keys
fall back to a flat estimate of 192 bytes (LRUCache.DEFAULT_RAM_BYTES_USED),
because I can't imagine a case where the keys in the queryResultCache would
be that small. I suspect that in almost all cases the keys are actually
larger than the values for the queryResultCache, though that's probably not
true for all usages of a FastLRUCache.
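
To illustrate, the per-entry accounting behaves roughly like this (a
simplified sketch of the linked logic, not the actual ConcurrentLRUCache
code):

import org.apache.lucene.util.Accountable;

class RamAccountingSketch {
    // Fallback estimate, mirroring LRUCache.DEFAULT_RAM_BYTES_USED (192
    // bytes), charged to any object that does not implement Accountable.
    static final long DEFAULT_RAM_BYTES_USED = 192L;

    // Keys that don't implement Accountable (like the queryResultCache
    // keys) are charged a flat 192 bytes no matter how large they really
    // are, so ramBytes can badly undercount the cache's real footprint.
    static long estimateEntrySize(Object key, Object value) {
        long bytes = 0L;
        bytes += (key instanceof Accountable)
                ? ((Accountable) key).ramBytesUsed()
                : DEFAULT_RAM_BYTES_USED;
        bytes += (value instanceof Accountable)
                ? ((Accountable) value).ramBytesUsed()
                : DEFAULT_RAM_BYTES_USED;
        return bytes;
    }
}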

We solved our memory usage issue by drastically reducing maxRamMB and
budgeting the actual maximum usage as roughly maxRamMB * 8. It would be
quite useful to have this behavior at least documented somewhere.
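
For example, since the overhead we observed was roughly 8x (that factor is
specific to our queries, so treat it as a rough rule of thumb rather than a
constant), keeping the cache's real footprint near the original 3 GB target
meant configuring something along these lines:

<queryResultCache class="solr.FastLRUCache" maxRamMB="384"
                  autowarmCount="0"/>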

-Brian

On Tue, Apr 23, 2019 at 9:49 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 4/23/2019 11:48 AM, Brian Ecker wrote:
> > I see. The other files I meant to attach were the GC log (
> > https://pastebin.com/raw/qeuQwsyd), the heap histogram (
> > https://pastebin.com/raw/aapKTKTU), and the screenshot from top (
> > http://oi64.tinypic.com/21r0bk.jpg).
>
> I have no idea what to do with the histogram.  I doubt it's all that
> useful anyway, as it wouldn't have any information about which parts of
> the system are using the most memory.
>
> The GC log is not complete.  It only covers 2 min 47 sec 674 ms of time.
>   To get anything useful out of a GC log, it would probably need to
> cover hours of runtime.
>
> But if you are experiencing OutOfMemoryError, then either you have run
> into something where a memory leak exists, or there's something about
> your index or your queries that needs more heap than you have allocated.
>   Memory leaks are not super common in Solr, but they have happened.
>
> Tuning GC will never help OOME problems.
>
> The screenshot looks like it matches the info below.
>
> > I'll work on getting the heap dump, but would it also be sufficient to
> use
> > say a 5GB dump from when it's half full and then extrapolate to the
> > contents of the heap when it's full? That way the dump would be a bit
> > easier to work with.
>
> That might be useful.  The only way to know for sure is to take a look
> at it to see if the part of the code using lots of heap is detectable.
>
> > There are around 2,100,000 documents.
> <snip>
> > The data takes around 9GB on disk.
>
> Ordinarily, I would expect that level of data to not need a whole lot of
> heap.  10GB would be more than I would think necessary, but if your
> queries are big consumers of memory, I could be wrong.  I ran indexes
> with 30 million documents taking up 50GB of disk space on an 8GB heap.
> I probably could have gone lower with no problems.
>
> I have absolutely no idea what kind of requirements the spellcheck
> feature has.  I've never used that beyond a few test queries.  If the
> query information you sent is complete, I wouldn't expect the
> non-spellcheck parts to require a whole lot of heap.  So perhaps
> spellcheck is the culprit here.  Somebody else will need to comment on
> that.
>
> Thanks,
> Shawn
>

Re: Determining Solr heap requirements and analyzing memory usage

Posted by Erick Erickson <er...@gmail.com>.
Brian: 

Many thanks for letting us know what you found. I’ll attach this to SOLR-13003, which is about this exact issue but doesn’t contain this information. This is a great help.
