Posted to solr-user@lucene.apache.org by "Kelly, Frank" <fr...@here.com> on 2017/02/09 13:19:26 UTC

Solr Heap Dump: Any suggestions on what to look for?

Got a heap dump on an Out of Memory error.
Analyzing the dump now in VisualVM.

Seeing a lot of byte[] arrays (77% of our 8GB Heap) in

  *   TreeMap$Entry
  *   FieldCacheImpl$SortedDocValues
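
(For anyone reproducing this: we captured the dump with the standard
JDK tool, e.g.

    jmap -dump:format=b,file=heap.bin <solr-pid>

and listed the offending arrays in VisualVM's OQL console with a query
along the lines of

    select b from [B b where b.length > 1048576

where [B is the JVM name for byte[] and the 1 MB length threshold is
just an arbitrary cutoff.)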

We're considering switching over to DocValues but would rather be definitive about the root cause before we experiment with DocValues and require a reindex of our 200M document index in each of our 4 data centers.

Any suggestions on what I should look for in this heap dump to get a definitive root cause?

Cheers!

-Frank





Frank Kelly

Principal Software Engineer



HERE

5 Wayside Rd, Burlington, MA 01803, USA

42° 29' 7" N 71° 11' 32" W


Re: Solr Heap Dump: Any suggestions on what to look for?

Posted by "Kelly, Frank" <fr...@here.com>.
To clarify:

Where I wrote that we put docValues="true" on the schema, I should
have said we put docValues="true" on the id field only.

-Frank

On 2/10/17, 10:27 AM, "Kelly, Frank" <fr...@here.com> wrote:

>Thanks Shawn,
>
>Yeah, we think we have identified the root cause thanks to some of
>the suggestions here.
>
>Originally we stopped using deleteByQuery as we saw it caused some
>large CPU spikes and Solr pauses (see
>https://issues.apache.org/jira/browse/LUCENE-7049), and switched to
>using a search followed by deleteById. It worked fine on our (small)
>test collections.
>
>But with 200M documents it appears that deleteById causes the heap to
>increase dramatically (we guess the fieldCache gets populated with a
>large number of object ids?). To confirm our suspicion we put
>docValues="true" on the schema and began to reindex, and heap memory
>usage dropped significantly - in fact heap memory usage on the Solr
>VMs dropped by half.
>
>Can someone confirm (or deny) our suspicion that deleteById results in
>some on-heap caching of the unique key (id?)?
>
>
>Cheers!
>
>-Frank
>
>P.S. Interestingly, when I searched the wiki for docs on deleteById I
>did not find any:
>https://cwiki.apache.org/confluence/dosearchsite.action?where=solr&spaceSearch=true&queryString=deleteById
>
>
>P.P.S. Separately, we are also turning off the filterCache. We know
>from usage and plugin stats that it is not in use, but it's best to
>turn it off entirely for risk reduction.
>
> 
>Frank Kelly
>Principal Software Engineer
> 
>HERE 
>5 Wayside Rd, Burlington, MA 01803, USA
>42° 29' 7" N 71° 11' 32" W
> 
> 
>
>
>
>On 2/9/17, 11:00 AM, "Shawn Heisey" <ap...@elyograg.org> wrote:
>
>>On 2/9/2017 6:19 AM, Kelly, Frank wrote:
>>> Got a heap dump on an Out of Memory error.
>>> Analyzing the dump now in VisualVM.
>>>
>>> Seeing a lot of byte[] arrays (77% of our 8GB Heap) in
>>>
>>>   * TreeMap$Entry
>>>   * FieldCacheImpl$SortedDocValues
>>>
>>> We're considering switching over to DocValues but would rather be
>>> definitive about the root cause before we experiment with DocValues
>>> and require a reindex of our 200M document index in each of our 4
>>> data centers.
>>>
>>> Any suggestions on what I should look for in this heap dump to get a
>>> definitive root cause?
>>>
>>
>>When the large allocations are byte[] arrays, the cause is probably a
>>low-level class, likely in Lucene.  Solr will have almost no influence
>>on these
>>memory allocations, except by changing the schema to enable docValues,
>>which changes the particular Lucene code that is called.  Note that
>>wiping the index and rebuilding it from scratch is necessary when you
>>enable docValues.
>>
>>Another possible source of problems like this is the filterCache.  A 200
>>million document index (assuming it's all on the same machine) results
>>in filterCache entries that are 25 million bytes each.  In Solr
>>examples, the filterCache defaults to a size of 512.  If a cache that
>>size on a 200 million document index fills up, it will require nearly 13
>>gigabytes of heap memory.
>>
>>Thanks,
>>Shawn
>>
>


Re: Solr Heap Dump: Any suggestions on what to look for?

Posted by "Kelly, Frank" <fr...@here.com>.
Thanks Shawn,

Yeah, we think we have identified the root cause thanks to some of
the suggestions here.

Originally we stopped using deleteByQuery as we saw it caused some
large CPU spikes and Solr pauses (see
https://issues.apache.org/jira/browse/LUCENE-7049), and switched to
using a search followed by deleteById. It worked fine on our (small)
test collections.
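
For concreteness, a minimal SolrJ sketch of the search-then-delete
pattern (the core URL, query, and batch size below are placeholders,
not our real values):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrDocument;

    public class SearchThenDeleteById {
      public static void main(String[] args) throws Exception {
        // Placeholder core URL.
        SolrClient client = new HttpSolrClient.Builder(
            "http://localhost:8983/solr/mycollection").build();
        try {
          // 1. Search for the ids of the documents to remove.
          SolrQuery q = new SolrQuery("type:stale");  // placeholder query
          q.setFields("id");
          q.setRows(1000);                            // bounded batches
          List<String> ids = new ArrayList<>();
          for (SolrDocument doc : client.query(q).getResults()) {
            ids.add((String) doc.getFieldValue("id"));
          }
          // 2. Delete by id rather than by query (cf. LUCENE-7049).
          if (!ids.isEmpty()) {
            client.deleteById(ids);
            client.commit();
          }
        } finally {
          client.close();
        }
      }
    }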

But with 200M documents it appears that deleteById causes the heap to
increase dramatically (we guess the fieldCache gets populated with a
large number of object ids?). To confirm our suspicion we put
docValues="true" on the schema and began to reindex, and heap memory
usage dropped significantly - in fact heap memory usage on the Solr
VMs dropped by half.

Can someone confirm (or deny) our suspicion that deleteById results in
some on-heap caching of the unique key (id?)?


Cheers!

-Frank

P.S. Interestingly, when I searched the wiki for docs on deleteById I
did not find any:
https://cwiki.apache.org/confluence/dosearchsite.action?where=solr&spaceSearch=true&queryString=deleteById


P.P.S. Separately, we are also turning off the filterCache. We know
from usage and plugin stats that it is not in use, but it's best to
turn it off entirely for risk reduction.
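
A sketch of that change in solrconfig.xml, assuming the stock cache
declaration is still in place, is to shrink it to nothing:

    <filterCache class="solr.FastLRUCache"
                 size="0"
                 initialSize="0"
                 autowarmCount="0"/>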

 
Frank Kelly
Principal Software Engineer
 
HERE 
5 Wayside Rd, Burlington, MA 01803, USA
42° 29' 7" N 71° 11' 32" W
 



On 2/9/17, 11:00 AM, "Shawn Heisey" <ap...@elyograg.org> wrote:

>On 2/9/2017 6:19 AM, Kelly, Frank wrote:
>> Got a heap dump on an Out of Memory error.
>> Analyzing the dump now in VisualVM.
>>
>> Seeing a lot of byte[] arrays (77% of our 8GB Heap) in
>>
>>   * TreeMap$Entry
>>   * FieldCacheImpl$SortedDocValues
>>
>> We're considering switching over to DocValues but would rather be
>> definitive about the root cause before we experiment with DocValues
>> and require a reindex of our 200M document index in each of our 4
>> data centers.
>>
>> Any suggestions on what I should look for in this heap dump to get a
>> definitive root cause?
>>
>
>When the large allocations are byte[] arrays, the cause is probably a
>low-level class, likely in Lucene.  Solr will have almost no influence
>on these
>memory allocations, except by changing the schema to enable docValues,
>which changes the particular Lucene code that is called.  Note that
>wiping the index and rebuilding it from scratch is necessary when you
>enable docValues.
>
>Another possible source of problems like this is the filterCache.  A 200
>million document index (assuming it's all on the same machine) results
>in filterCache entries that are 25 million bytes each.  In Solr
>examples, the filterCache defaults to a size of 512.  If a cache that
>size on a 200 million document index fills up, it will require nearly 13
>gigabytes of heap memory.
>
>Thanks,
>Shawn
>


Re: Solr Heap Dump: Any suggestions on what to look for?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 2/9/2017 6:19 AM, Kelly, Frank wrote:
> Got a heap dump on an Out of Memory error.
> Analyzing the dump now in VisualVM.
>
> Seeing a lot of byte[] arrays (77% of our 8GB Heap) in
>
>   * TreeMap$Entry
>   * FieldCacheImpl$SortedDocValues
>
> We're considering switching over to DocValues but would rather be
> definitive about the root cause before we experiment with DocValues
> and require a reindex of our 200M document index in each of our 4
> data centers.
>
> Any suggestions on what I should look for in this heap dump to get a
> definitive root cause?
>

When the large allocations are byte[] arrays, the cause is probably a
low-level class, likely in Lucene.  Solr will have almost no influence
on these
memory allocations, except by changing the schema to enable docValues,
which changes the particular Lucene code that is called.  Note that
wiping the index and rebuilding it from scratch is necessary when you
enable docValues.

Another possible source of problems like this is the filterCache.  A 200
million document index (assuming it's all on the same machine) results
in filterCache entries that are 25 million bytes each.  In Solr
examples, the filterCache defaults to a size of 512.  If a cache that
size on a 200 million document index fills up, it will require nearly 13
gigabytes of heap memory.
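
Spelled out: each filterCache entry is a bitset with one bit per
document, so 200,000,000 docs / 8 = 25,000,000 bytes per entry, and
512 entries * 25,000,000 bytes/entry = 12,800,000,000 bytes, which is
nearly 13 gigabytes.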

Thanks,
Shawn