Posted to solr-user@lucene.apache.org by Shawn Heisey <ap...@elyograg.org> on 2020/07/12 04:02:39 UTC
Re: Solr heap Old generation grows and it is not recovered by G1GC
On 6/25/2020 2:08 PM, Odysci wrote:
> I have a solrcloud setup with 12GB heap and I've been trying to optimize it
> to avoid OOM errors. My index has about 30 million docs and about 80GB
> total, 2 shards, 2 replicas.
Have you seen the full OutOfMemoryError exception text? OOME can be
caused by problems that are not actually memory-related. Unless the
error specifically mentions "heap space" we might be chasing the wrong
thing here.
> When the queries return a smallish number of docs (say, below 1000), the
> heap behavior seems "normal". Monitoring the GC log I see that the young
> generation grows, then goes considerably down when GC kicks in. And the
> old generation grows just a bit.
>
> However, at some point I have a query that returns over 300K docs (for a
> total size of approximately 1GB). At this very point the OLD generation
> size grows (almost by 2GB), and it remains high for all remaining time.
> Even as new queries are executed, the OLD generation size does not go down,
> despite multiple GC calls done afterwards.
Assuming the OOME exceptions were indeed caused by running out of heap,
then the following paragraphs will apply:
G1 has this concept called "humongous allocations". In order to reach
this designation, a memory allocation must get to half of the G1 heap
region size. You have set this to 4 megabytes, so any allocation of 2
megabytes or larger is humongous. Humongous allocations bypass the new
generation entirely and go directly into the old generation. The max
value that can be set for the G1 region size is 32MB. If you increase
the region size and the behavior changes, then humongous allocations
could be something to investigate.
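For concreteness, the region size is controlled by a single JVM flag. A sketch of what the experiment might look like in solr.in.sh (the GC_TUNE variable matches stock Solr start scripts; verify the exact mechanism against your version):

```shell
# solr.in.sh (or wherever you pass JVM options to Solr):
# raise the G1 region size so a 2-4 MB result buffer is no longer
# "humongous" -- allocations below half the region size stay in the
# young generation. 32m is the maximum G1 accepts.
GC_TUNE="-XX:+UseG1GC \
  -XX:G1HeapRegionSize=8m"
```

With an 8m region, the humongous threshold moves from 2 MB to 4 MB, which is enough to change the behavior if mid-sized response buffers are the culprit.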
In the versions of Java that I have used, humongous allocations can only
be reclaimed as garbage by a full GC. I do not know if Oracle has
changed this so the smaller collections will do it or not.
Were any of those multiple GCs a Full GC? If they were, then there is
probably little or no garbage to collect. You've gotten a reply from
"Zisis T." with some possible causes for this. I do not have anything
to add.
I did not know about any problems with maxRamMB ... but if I were
attempting to limit cache sizes, I would do so by the size values, not a
specific RAM size. The size values you have chosen (8192 and 16384)
will most likely result in a total cache size well beyond the limits
you've indicated with maxRamMB. So if there are any bugs in the code
with the maxRamMB parameter, you might end up using a LOT of memory that
you didn't expect to be using.
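Limiting by entry count rather than maxRamMB looks like this in solrconfig.xml. This is a sketch, not a recommendation: the cache class varies by Solr version (CaffeineCache in 8.x, FastLRUCache/LRUCache earlier) and 512 is an illustrative number only:

```xml
<!-- solrconfig.xml: bound the filter cache by entry count only.
     Each filterCache entry can cost up to maxDoc/8 bytes (a bitset
     over the whole index), so choose the count with that in mind. -->
<filterCache class="solr.CaffeineCache"
             size="512"
             autowarmCount="0"/>
```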
Thanks,
Shawn
Re: Solr heap Old generation grows and it is not recovered by G1GC
Posted by Erik Hatcher <er...@gmail.com>.
What kind of statistics? Are these stats that you could perhaps get from faceting or the stats component instead of gathering docs and accumulating stats yourself?
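To make that concrete: a request like the following asks Solr to compute the aggregates server-side and return zero documents. Only a sketch — the base URL, collection name, and field name are placeholders, not details from this thread:

```python
# Sketch: aggregate stats via Solr's StatsComponent instead of fetching
# every matching document and accumulating stats client-side.
from urllib.parse import urlencode

def stats_query_url(base_url, collection, q, field):
    """Build a /select request that returns min/max/sum/mean/stddev for
    `field` over all docs matching `q`, without returning the docs."""
    params = {
        "q": q,
        "rows": 0,             # no documents in the response
        "stats": "true",
        "stats.field": field,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"

url = stats_query_url("http://localhost:8983/solr", "mycollection", "*:*", "price")
```

The heap cost of such a query is bounded by the stats accumulators, not by the size of the result set.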
> On Jul 14, 2020, at 8:51 AM, Odysci <od...@gmail.com> wrote:
>
> Hi Erick,
>
> I agree. The 300K docs in one search is an anomaly.
> But we do use 'fq' to return a large number of docs for the purposes of
> generating statistics for the whole index. We do use CursorMark extensively.
> Thanks!
>
> Reinaldo
Re: Solr heap Old generation grows and it is not recovered by G1GC
Posted by Odysci <od...@gmail.com>.
Hi Erick,
I agree. The 300K docs in one search is an anomaly.
But we do use 'fq' to return a large number of docs for the purposes of
generating statistics for the whole index. We do use CursorMark extensively.
Thanks!
Reinaldo
Re: Solr heap Old generation grows and it is not recovered by G1GC
Posted by Erick Erickson <er...@gmail.com>.
I’d add that you’re abusing Solr horribly by returning 300K documents in a single go.
Solr is built to return the top N docs where N is usually quite small, < 100. If you allow
an unlimited number of docs to be returned, you’re simply kicking the can down
the road, somebody will ask for 1,000,000 docs sometime and you’ll be back where
you started.
I _strongly_ recommend you do one of two things for such large result sets:
1> Use Streaming. Perhaps Streaming Expressions will do what you want
without you having to process all those docs on the client if you’re
doing some kind of analytics.
2> if you really, truly need all 300K docs, try getting them in chunks
using CursorMark.
Best,
Erick
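A minimal sketch of option 2>. The `fetch` callable below is a stand-in for whatever HTTP client you use against Solr's /select handler; nothing here beyond the cursorMark/nextCursorMark protocol itself comes from the thread:

```python
# Deep paging with cursorMark. `fetch` stands in for an HTTP call to
# Solr's /select handler (wt=json) and must return the decoded response.
def iterate_with_cursor(fetch, q, sort, rows=500):
    """Yield matching docs in `rows`-sized chunks. `sort` must include the
    uniqueKey field (e.g. "id asc"), which cursorMark requires."""
    cursor = "*"
    while True:
        resp = fetch({"q": q, "sort": sort, "rows": rows, "cursorMark": cursor})
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            return
        cursor = next_cursor
```

Because each chunk is only `rows` docs, no single allocation comes close to the humongous threshold discussed earlier in the thread.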
Re: Solr heap Old generation grows and it is not recovered by G1GC
Posted by Odysci <od...@gmail.com>.
Shawn,
thanks for the extra info.
The OOM errors were indeed because of heap space. In my case most of the GC
calls were not full GC; a full GC was done only when the heap was really
near the top.
I'll try out your suggestion of increasing the G1 heap region size. I've
been using 4m, and from what you said, a 2m allocation would be considered
humongous. My test cases have a few allocations that are definitely bigger
than 2m (estimating based on the number of docs returned), but most of them
are not.
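The estimate described here can be made explicit. A back-of-envelope sketch (the ~2 KB average doc size is Reinaldo's own figure from this message; the one-buffer-per-response assumption is mine):

```python
# G1 calls an allocation "humongous" when it is at least half a region.
region_size = 4 * 1024 * 1024            # -XX:G1HeapRegionSize=4m
humongous_threshold = region_size // 2   # 2 MiB

# If a response were built in a single buffer (an assumption, not a
# measured fact), 300K docs at ~2 KB each is far past that threshold:
avg_doc_bytes = 2 * 1024
docs_returned = 300_000
approx_bytes = docs_returned * avg_doc_bytes   # about 0.6 GB
```

So even a small fraction of such a response held in one contiguous buffer lands in the old generation directly.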
When I was using maxRamMB, the size used was "compatible" with the size
values, assuming the avg 2 KB docs that our index has.
As far as I could tell in my runs, removing maxRamMB did change the GC
behavior for the better. That is, now, heap goes up and down as expected,
and before (with maxRamMB) it seemed to increase continuously.
Thanks
Reinaldo