Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2020/07/12 04:02:39 UTC

Re: Solr heap Old generation grows and it is not recovered by G1GC

On 6/25/2020 2:08 PM, Odysci wrote:
> I have a SolrCloud setup with a 12GB heap and I've been trying to optimize it
> to avoid OOM errors. My index has about 30 million docs and about 80GB
> total, 2 shards, 2 replicas.

Have you seen the full OutOfMemoryError exception text?  OOME can be 
caused by problems that are not actually memory-related.  Unless the 
error specifically mentions "heap space" we might be chasing the wrong 
thing here.

> When the queries return a smallish number of docs (say, below 1000), the
> heap behavior seems "normal". Monitoring the GC log I see that the young
> generation grows, then when GC kicks in it goes considerably down, and the
> old generation grows just a bit.
> 
> However, at some point I have a query that returns over 300K docs (for a
> total size of approximately 1GB). At this very point the OLD generation
> size grows (almost by 2GB), and it stays high from then on.
> Even as new queries are executed, the OLD generation size does not go down,
> despite multiple GC calls afterwards.

Assuming the OOME exceptions were indeed caused by running out of heap, 
then the following paragraphs will apply:

G1 has this concept called "humongous allocations".  To earn that 
designation, a memory allocation must be at least half the G1 heap 
region size.  You have set the region size to 4 megabytes, so any 
allocation of 2 megabytes or larger is humongous.  Humongous 
allocations bypass the new
generation entirely and go directly into the old generation.  The max 
value that can be set for the G1 region size is 32MB.  If you increase 
the region size and the behavior changes, then humongous allocations 
could be something to investigate.

In the versions of Java that I have used, humongous allocations can only 
be reclaimed as garbage by a full GC.  I do not know if Oracle has 
changed this so the smaller collections will do it or not.
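
For illustration only (the 16m value is just an example to experiment 
with, and the flags should be checked against your Solr and JVM 
versions), the region size can be raised through the GC tuning variable 
in solr.in.sh:

    GC_TUNE="-XX:+UseG1GC -XX:G1HeapRegionSize=16m"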

Were any of those multiple GCs a Full GC?  If they were, then there is 
probably little or no garbage to collect.  You've gotten a reply from 
"Zisis T." with some possible causes for this.  I do not have anything 
to add.

I did not know about any problems with maxRamMB ... but if I were 
attempting to limit cache sizes, I would do so by the size values, not a 
specific RAM size.  The size values you have chosen (8192 and 16384) 
will most likely result in a total cache size well beyond the limits 
you've indicated with maxRamMB.  So if there are any bugs in the code 
with the maxRamMB parameter, you might end up using a LOT of memory that 
you didn't expect to be using.
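
The filterCache is the clearest case: an entry can be roughly maxDoc/8 
bytes when stored as a full bitset, which is on the order of a few 
megabytes per entry for an index of this size, so 16384 entries could in 
theory grow far beyond a 12GB heap.  As a sketch only (the cache class 
and the counts are placeholders, and older Solr versions use 
solr.FastLRUCache or solr.LRUCache instead of CaffeineCache), a 
definition limited by entry count alone would look something like this 
in solrconfig.xml:

    <filterCache class="solr.CaffeineCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>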

Thanks,
Shawn

Re: Solr heap Old generation grows and it is not recovered by G1GC

Posted by Erik Hatcher <er...@gmail.com>.
What kind of statistics?  Are these stats that you could perhaps get from faceting or the stats component instead of gathering docs and accumulating stats yourself?
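
For example (a SolrJ sketch only -- the collection name and the numeric
field "price" are placeholders), the stats component can compute the
aggregates server-side without fetching any documents:

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.FieldStatsInfo;

    // Returns min/max/sum/mean/count etc. for the field, computed inside Solr.
    static FieldStatsInfo fieldStats(SolrClient client)
        throws SolrServerException, IOException {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);                     // no documents returned, only aggregates
      q.set("stats", "true");
      q.set("stats.field", "price");
      return client.query("mycollection", q).getFieldStatsInfo().get("price");
    }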



> On Jul 14, 2020, at 8:51 AM, Odysci <od...@gmail.com> wrote:
> 
> Hi Erick,
> 
> I agree. The 300K docs in one search is an anomaly.
> But we do use 'fq' to return a large number of docs for the purposes of
> generating statistics for the whole index. We do use CursorMark extensively.
> Thanks!
> 
> Reinaldo


Re: Solr heap Old generation grows and it is not recovered by G1GC

Posted by Odysci <od...@gmail.com>.
Hi Erick,

I agree. The 300K docs in one search is an anomaly.
But we do use 'fq' to return a large number of docs for the purposes of
generating statistics for the whole index. We do use CursorMark extensively.
Thanks!

Reinaldo

On Tue, Jul 14, 2020 at 8:55 AM Erick Erickson <er...@gmail.com>
wrote:

> I’d add that you’re abusing Solr horribly by returning 300K documents in a
> single go.
>
> Solr is built to return the top N docs where N is usually quite small, <
> 100. If you allow
> an unlimited number of docs to be returned, you’re simply kicking the can
> down
> the road, somebody will ask for 1,000,000 docs sometime and you’ll be back
> where
> you started.
>
> I _strongly_ recommend you do one of two things for such large result sets:
>
> 1> Use Streaming. Perhaps Streaming Expressions will do what you want
>     without you having to process all those docs on the client if you’re
>     doing some kind of analytics.
>
> 2> if you really, truly need all 300K docs, try getting them in chunks
>      using CursorMark.
>
> Best,
> Erick

Re: Solr heap Old generation grows and it is not recovered by G1GC

Posted by Erick Erickson <er...@gmail.com>.
I’d add that you’re abusing Solr horribly by returning 300K documents in a single go.

Solr is built to return the top N docs where N is usually quite small, < 100. If you allow
an unlimited number of docs to be returned, you’re simply kicking the can down
the road, somebody will ask for 1,000,000 docs sometime and you’ll be back where
you started.

I _strongly_ recommend you do one of two things for such large result sets:

1> Use Streaming. Perhaps Streaming Expressions will do what you want
    without you having to process all those docs on the client if you’re 
    doing some kind of analytics.

2> if you really, truly need all 300K docs, try getting them in chunks
     using CursorMark.
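
A rough SolrJ sketch of option 2> (the collection name, page size and
sort field are placeholders; cursorMark requires a sort that includes
the uniqueKey field):

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    static void walkResults(SolrClient client) throws SolrServerException, IOException {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(1000);                                  // page size, not the total
      q.setSort(SolrQuery.SortClause.asc("id"));
      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query("mycollection", q);
        for (SolrDocument doc : rsp.getResults()) {
          // accumulate your statistics here instead of holding all docs in memory
        }
        String next = rsp.getNextCursorMark();
        if (cursor.equals(next)) break;                 // cursor stopped advancing: done
        cursor = next;
      }
    }

For option 1>, a streaming expression such as rollup(search(...)) run
against the /export handler can push the aggregation into Solr entirely.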

Best,
Erick

> On Jul 13, 2020, at 10:03 PM, Odysci <od...@gmail.com> wrote:
> 
> Shawn,
> 
> thanks for the extra info.
> The OOM errors were indeed because of heap space. In my case most of the GC
> calls were not full GC. Only when heap was really near the top, a full GC
> was done.
> I'll try out your suggestion of increasing the G1 heap region size. I've
> been using 4m, and from what you said, a 2m allocation would be considered
> humongous. My test cases have a few allocations that are definitely bigger
> than 2m (estimating based on the number of docs returned), but most of them
> are not.
> 
> When I was using maxRamMB, the size used was "compatible" with the size
> values, assuming the avg 2KB docs that our index has.
> As far as I could tell in my runs, removing maxRamMB did change the GC
> behavior for the better. That is, now, heap goes up and down as expected,
> and before (with maxRamMB) it seemed to increase continuously.
> Thanks
> 
> Reinaldo


Re: Solr heap Old generation grows and it is not recovered by G1GC

Posted by Odysci <od...@gmail.com>.
Shawn,

thanks for the extra info.
The OOM errors were indeed because of heap space. In my case most of the GC
calls were not full GC. Only when heap was really near the top, a full GC
was done.
I'll try out your suggestion of increasing the G1 heap region size. I've
been using 4m, and from what you said, a 2m allocation would be considered
humongous. My test cases have a few allocations that are definitely bigger
than 2m (estimating based on the number of docs returned), but most of them
are not.
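
As a rough check on that estimate (assuming a response gets built into one
contiguous buffer or array): with 4m regions the humongous threshold is 2m,
and at the ~2KB average document size mentioned below, an allocation covering
on the order of 1,000 docs would already cross it, so the 300K-doc result
would be well past it under that assumption.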

When I was using maxRamMB, the size used was "compatible" with the size
values, assuming the avg 2KB docs that our index has.
As far as I could tell in my runs, removing maxRamMB did change the GC
behavior for the better. That is, now, heap goes up and down as expected,
and before (with maxRamMB) it seemed to increase continuously.
Thanks

Reinaldo
