Posted to solr-user@lucene.apache.org by Caroline Hind <ca...@dbdept.com.au> on 2015/06/30 08:39:08 UTC

Some guidance on memory requirements/usage/tuning

Hi, 

I am very new to SOLR, and would appreciate some guidance if anyone has the time to offer it. 

We have very recently upgraded from Solr 4.1 to 5.2.1, and at the same time increased the physical RAM from 24 GB to 96 GB. We run multiple cores on this one server, approximately 20 in total, but primarily we have one that is huge in comparison to all of the others. This very large core consists of nearly 62 million documents, and the index is around 45 GB in size. (Is that index unreasonably large? Should it be sharded?) 

I'm really unfamiliar with how we should be configuring our JVM. Currently we have it set to a maximum of 48 GB; up until yesterday it was set to 24 GB, and we've been seeing the dreaded OOME messages from time to time. The type of queries that are run can return anything from 1 million to 9.5 million documents, and typically run for anything from 20 to 45 minutes. 

I'd appreciate any suggestions, pointers to articles I should be reading to learn more, criticisms, etc. 

Thanks, 
Caroline 

Re: Some guidance on memory requirements/usage/tuning

Posted by Alessandro Benedetti <be...@gmail.com>.
Am I wrong, or has the default directory factory been the
"NRTCachingDirectoryFactory" since Solr 4.x?
If I remember correctly, this factory creates a Directory implementation
built on top of an MMapDirectory.
In that case we should rely on the operating system's memory mapping to
manage the index in memory.
This means we give the Solr JVM little memory and leave most of the RAM
to the operating system, which will page parts of the index in and out
of memory as needed.
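
For reference, the stock solrconfig.xml usually declares this factory
roughly like so (a sketch from memory, please check the actual config):

  <directoryFactory name="DirectoryFactory"
                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>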

If my assumptions are correct and the user is on a recent Solr, it
would be bad practice to assign such a big heap to the Solr JVM (I
assume garbage collection would face long pauses and other problems as
well).
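
As a rough sketch of the opposite direction (the "8g" below is only an
illustrative starting point, not a recommendation), solr.in.sh would
then look something like:

  # modest heap for Solr; leave the rest of the 96 GB to the OS page cache
  SOLR_HEAP="8g"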

The one thing that could invalidate my assumptions is the OOME
messages, but as Toke wisely said, these can derive from heavy use of
sorting, field grouping and field caching (field faceting on
non-DocValues fields?).
On the other hand, Erick's point is quite important, and I hope that no
such monster rows parameter is actually being used (it would make no
sense).
For deep paging the suggested approach is to use a cursor mark (as Erick
suggested), and if we need to stream all the results it can be a good
option to look at the streaming component and the export feature.
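
Roughly, a cursor-based walk over the full result set would look like
this (collection and field names here are only placeholders):

  # first request: cursorMark=* ; the sort must include the uniqueKey field
  curl 'http://localhost:8983/solr/bigcore/select?q=*:*&rows=1000&sort=id+asc&cursorMark=*'
  # each follow-up request passes the nextCursorMark returned by the previous response
  curl 'http://localhost:8983/solr/bigcore/select?q=*:*&rows=1000&sort=id+asc&cursorMark=<nextCursorMark>'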

Please correct me if I said anything wrong!

Cheers

2015-06-30 13:37 GMT+01:00 Erick Erickson <er...@gmail.com>:

> bq: The type of queries that are run can return anything from 1
> million to 9.5 million documents, and typically run for anything from
> 20 to 45 minutes.
>
> Uhhh, are you literally setting the &rows parameter to over 9.5M and
> getting that many docs all at once? Or is that just numFound and
> you're _really_ returning just a relatively few docs? Because if
> you're returning 9.5M rows, that's really an anti-pattern for Solr.
> There are other ways to do some of this (cursor mark, streaming
> aggregation,  export). But before we go there I want to be sure I'm
> understanding the use-case.
>
> Because I agree with Toke, the performance numbers you give are waaaay
> out of what I would expect, so clearly I don't get something about
> your setup.
>
> Best,
> Erick
>
> On Tue, Jun 30, 2015 at 3:43 AM, Toke Eskildsen <te...@statsbiblioteket.dk>
> wrote:
> > On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
> >> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
> >> time increased the physical RAM from 24Gb to 96Gb. We run multiple
> >> cores on this one server, approximately 20 in total, but primarily we
> >> have one that is huge in comparison to all of the others. This very
> >> large core consists of nearly 62 million documents, and the index is
> >> around 45Gb in size. (Is that index unreasonably large, should it be
> >> sharded?)
> >
> > The size itself sounds fine, but your performance numbers below are
> > worrying. As always it is hard to give advice on setups:
> >
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >
> >> I'm really unfamiliar with how we should be configuring our JVM.
> >> Currently we have it set to a maximum of 48Gb, up until yesterday it
> >> was set to 24Gb and we've been seeing the dreaded OOME messages from
> >> time to time.
> >
> > There is a shift in pointer size when one passes the 32GB mark for JVM
> > memory. Your 48GB allocation gives you about the same amount of heap as
> > a 32GB allocation would:
> >
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> > Consider running two Solrs on the same machine instead. Maybe one for
> > the large collection and one for the rest?
> >
> > Anyway, OOMs with ~32GB of heap for 62M documents indicate that you are
> > doing heavy sorting, grouping or faceting on fields that do not have
> > DocValues enabled. Could you describe what you do in that regard?
> >
> >> The type of queries that are run can return anything from
> >> 1 million to 9.5 million documents, and typically run for anything from
> >> 20 to 45 minutes.
> >
> > Such response times are a thousand times higher than what most people
> > are seeing. There might be a perfectly fine reason for those response
> > times, but I suggest we sanity check them: Could you show us a typical
> > query and tell us how many concurrent queries you normally serve?
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: Some guidance on memory requirements/usage/tuning

Posted by Erick Erickson <er...@gmail.com>.
bq: The type of queries that are run can return anything from 1
million to 9.5 million documents, and typically run for anything from
20 to 45 minutes.

Uhhh, are you literally setting the &rows parameter to over 9.5M and
getting that many docs all at once? Or is that just numFound and
you're _really_ returning just a relatively few docs? Because if
you're returning 9.5M rows, that's really an anti-pattern for Solr.
There are other ways to do some of this (cursor mark, streaming
aggregation,  export). But before we go there I want to be sure I'm
understanding the use-case.
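
For reference, the /export handler is the usual way to stream a complete
sorted result set without deep paging; it requires docValues on the
fields being sorted and returned. A sketch with placeholder names:

  curl 'http://localhost:8983/solr/bigcore/export?q=*:*&sort=id+asc&fl=id,price'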

Because I agree with Toke, the performance numbers you give are waaaay
out of what I would expect, so clearly I don't get something about
your setup.

Best,
Erick

On Tue, Jun 30, 2015 at 3:43 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
>> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
>> time increased the physical RAM from 24Gb to 96Gb. We run multiple
>> cores on this one server, approximately 20 in total, but primarily we
>> have one that is huge in comparison to all of the others. This very
>> large core consists of nearly 62 million documents, and the index is
>> around 45Gb in size. (Is that index unreasonably large, should it be
>> sharded?)
>
> The size itself sounds fine, but your performance numbers below are
> worrying. As always it is hard to give advice on setups:
> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
>> I'm really unfamiliar with how we should be configuring our JVM.
>> Currently we have it set to a maximum of 48Gb, up until yesterday it
>> was set to 24Gb and we've been seeing the dreaded OOME messages from
>> time to time.
>
> There is a shift in pointer size when one passes the 32GB mark for JVM
> memory. Your 48GB allocation gives you about the same amount of heap as
> a 32GB allocation would:
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> Consider running two Solrs on the same machine instead. Maybe one for
> the large collection and one for the rest?
>
> Anyway, OOMs with ~32GB of heap for 62M documents indicate that you are
> doing heavy sorting, grouping or faceting on fields that do not have
> DocValues enabled. Could you describe what you do in that regard?
>
>> The type of queries that are run can return anything from
>> 1 million to 9.5 million documents, and typically run for anything from
>> 20 to 45 minutes.
>
> Such response times are a thousand times higher than what most people
> are seeing. There might be a perfectly fine reason for those response
> times, but I suggest we sanity check them: Could you show us a typical
> query and tell us how many concurrent queries you normally serve?
>
> - Toke Eskildsen, State and University Library, Denmark
>
>

Re: Some guidance on memory requirements/usage/tuning

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2015-06-30 at 16:39 +1000, Caroline Hind wrote:
> We have very recently upgraded from SOLR 4.1 to 5.2.1, and at the same
> time increased the physical RAM from 24Gb to 96Gb. We run multiple
> cores on this one server, approximately 20 in total, but primarily we
> have one that is huge in comparison to all of the others. This very
> large core consists of nearly 62 million documents, and the index is
> around 45Gb in size. (Is that index unreasonably large, should it be
> sharded?) 

The size itself sounds fine, but your performance numbers below are
worrying. As always it is hard to give advice on setups:
https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

> I'm really unfamiliar with how we should be configuring our JVM.
> Currently we have it set to a maximum of 48Gb, up until yesterday it
> was set to 24Gb and we've been seeing the dreaded OOME messages from
> time to time.

There is a shift in pointer size when one passes the 32GB mark for JVM
memory. Your 48GB allocation gives you about the same amount of heap as
a 32GB allocation would:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
Consider running two Solrs on the same machine instead. Maybe one for
the large collection and one for the rest?
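
As a quick sanity check on the pointer-size point (assuming a HotSpot
JVM), something like this shows whether compressed oops are still in
effect for a given heap size:

  java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops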

Anyway, OOMs with ~32GB of heap for 62M documents indicate that you are
doing heavy sorting, grouping or faceting on fields that do not have
DocValues enabled. Could you describe what you do in that regard?
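
For what it is worth, enabling DocValues on such a field is a one-line
schema change along these lines (the field name is only a placeholder,
and it requires a full reindex):

  <field name="category" type="string" indexed="true" stored="true" docValues="true"/>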

> The type of queries that are run can return anything from
> 1 million to 9.5 million documents, and typically run for anything from
> 20 to 45 minutes. 

Such response times are a thousand times higher than what most people
are seeing. There might be a perfectly fine reason for those response
times, but I suggest we sanity check them: Could you show us a typical
query and tell us how many concurrent queries you normally serve?

- Toke Eskildsen, State and University Library, Denmark