Posted to solr-user@lucene.apache.org by Dominique Bejean <do...@eolya.fr> on 2017/12/02 15:43:09 UTC

Solr JVM best practices

Hi,

I would like some advice on best practices related to heap size, MMap,
direct memory, GC algorithm and OS swap.

This is a vast subject, and sorry for the long question, but all these
items are linked when it comes to a stable Solr environment.

My understanding and questions.

About JVM heap size setting

The JVM heap size depends on the use case, so the only general advice is to
keep it as small as possible in order to avoid GC issues. The heap size can
be kept to a minimum mainly by:

   - Optimizing the schema: remove unused fields, and do not index or store
     fields unless it is necessary
   - Enabling docValues on fields used for faceting, sorting and grouping
   - Not oversizing the Solr caches
   - Being careful with the rows and fl query parameters


Any other advice is welcome :)


About MMap setting

According to the great article
“http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
by Uwe Schindler, the only tasks to be done at the OS level are to check
that “ulimit -v” and “ulimit -m” both report “unlimited” and to increase
the vm.max_map_count setting from its default of 65536.

I suppose the best value depends on the available off-heap memory. I
generally set it to 262144. Is that a good value, or is there a better way
to determine it?


About Direct Memory

According to a response from Uwe Schindler (again) on the Solr mailing list,
I understand that MMapDirectory does not use direct memory.

The only places where I have read that the MaxDirectMemorySize JVM setting
has to be set for Solr are a Cloudera blog post and Solr mailing list
threads about using Solr with HDFS.

Is it necessary to change the default MaxDirectMemorySize JVM setting? If
yes, how should the appropriate value be determined?


About OS Swap setting

Linux generally starts swapping when less than 30% of the memory is free.
To prevent the OS from working against Solr's off-heap memory management, I
usually set the OS swappiness value to 0. Can you confirm this is a good
idea?


About CMS GC vs G1 GC

The default Solr settings use CMS GC.

According to the post from Shawn Heisey in the old Solr wiki
(https://wiki.apache.org/solr/ShawnHeisey), can we consider that G1 GC can
definitely be used with Solr for heap sizes above roughly 4 GB?


Regards

Dominique

-- 
Dominique Béjean
06 08 46 12 43

Re: Solr JVM best practices

Posted by Walter Underwood <wu...@wunderwood.org>.
We decided to go with modern technology for the new cluster. CMS has been around for a very long time, maybe more than ten years.

These are the GC settings where we still use CMS. Instead of setting up a lot of ratios, I specify the sizes of the GC areas. That seems a lot more clear to me. We did some benchmarking and increasing the new space to 2G reduced the growth of tenured space. Most of Solr’s allocations have a lifetime of a single HTTP request.

-Xms8g
-Xmx8g
-XX:NewSize=2g
-XX:MaxPermSize=256m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+ExplicitGCInvokesConcurrent

The last flag was because something was invoking a full GC to get accurate memory usage. That was annoying.
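
A flag list like the one above only helps if the running JVM really picked it up. As a minimal sketch (not part of the original setup; the class name JvmFlagCheck is made up for illustration), the standard RuntimeMXBean can list the -X and -XX options a JVM was started with. It reports on its own JVM; for a Solr node the same bean would have to be read over JMX.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the heap and GC related options of the current JVM.
final class JvmFlagCheck {

    /** Returns every startup argument that begins with -X (covers -Xms, -Xmx and all -XX: options). */
    static List<String> heapAndGcArguments() {
        List<String> result = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-X")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String arg : heapAndGcArguments()) {
            System.out.println(arg);
        }
    }
}
```

Started with the flags above, the output should include -XX:+ExplicitGCInvokesConcurrent; if it does not, the options were probably set in a place the start script does not read.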

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 2, 2017, at 8:18 AM, Dominique Bejean <do...@eolya.fr> wrote:
> 
> Hi Walter,
> 
> Thank you for this response. Did you use CMS before G1? Were there any GC
> issues fixed by G1?
> 
> Dominique


Re: Solr JVM best practices

Posted by Dominique Bejean <do...@eolya.fr>.
Hi Walter,

Thank you for this response. Did you use CMS before G1? Were there any GC
issues fixed by G1?

Dominique


On Sat, Dec 2, 2017 at 17:13, Walter Underwood <wu...@wunderwood.org> wrote:

> We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131.
>
> This has been solid in production with a 32 node Solr Cloud cluster. We do
> not do faceting.

-- 
Dominique Béjean
06 08 46 12 43

Re: Solr JVM best practices

Posted by Walter Underwood <wu...@wunderwood.org>.
We use an 8G heap and G1 with Shawn Heisey’s settings. Java 8, update 131.

This has been solid in production with a 32 node Solr Cloud cluster. We do not do faceting.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Solr JVM best practices

Posted by Dominique Bejean <do...@eolya.fr>.
Thank you, Shawn, for replying to each item.

I am starting to get a better grasp of all this tricky JVM stuff.

Dominique

-- 
Dominique Béjean
06 08 46 12 43

Re: Solr JVM best practices

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/2/2017 8:43 AM, Dominique Bejean wrote:
> I would like some advice on best practices related to heap size, MMap,
> direct memory, GC algorithm and OS swap.

For the most part, there is no generic advice we can give you for these 
things.  What you need is going to be highly dependent on exactly what 
you are doing with Solr and how much index data you have.  There are no 
formulas for calculating these values based on information about your setup.

Experienced Solr users can make *guesses* if you provide some 
information, but those guesses might turn out to be completely wrong.

https://lucidworks.com/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

> About JVM heap size setting
> 
> The JVM heap size depends on the use case, so the only general advice is to
> keep it as small as possible in order to avoid GC issues. The heap size can
> be kept to a minimum mainly by:

The max heap size should be as large as you need, and no larger. 
Figuring out what you need may require trial and error on an 
installation that has all the index data and is receiving production 
queries.

On this wiki page, I wrote a small section about one way you MIGHT be 
able to figure out what heap size you need:

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F
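
One way to collect numbers for that trial and error is to sample heap usage over time. The sketch below is an illustration only, not the method from the wiki page; the class name HeapSnapshot is made up. It uses the standard MemoryMXBean and reports on its own JVM, so for a Solr node the same values would have to be read over JMX. The low points reached right after collections give a rough idea of how much heap is really live.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: formats one sample of heap usage in megabytes.
final class HeapSnapshot {

    static String describe(MemoryUsage heap) {
        long mb = 1024L * 1024L;
        // getMax() returns -1 when no maximum is defined for the heap.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + " MB";
        return "used=" + (heap.getUsed() / mb) + " MB"
                + ", committed=" + (heap.getCommitted() / mb) + " MB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe(heap));
    }
}
```

A single sample says little; what matters is the trend of the post-collection low points under production queries.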

>    - Optimizing the schema: remove unused fields, and do not index or store
>      fields unless it is necessary
>    - Enabling docValues on fields used for faceting, sorting and grouping
>    - Not oversizing the Solr caches
>    - Being careful with the rows and fl query parameters

These are good ideas.  But sometimes you find that you need a lot of 
fields, and you need a lot of them to be stored.  The schema and config 
should be designed around what you need Solr to do.  Designing them for 
the lowest possible memory usage might result in a config that doesn't 
do what you want.

> About MMap setting
> 
> According to the great article
> “http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html”
> by Uwe Schindler, the only tasks to be done at the OS level are to check
> that “ulimit -v” and “ulimit -m” both report “unlimited” and to increase
> the vm.max_map_count setting from its default of 65536.

The default directory implementation that recent Solr versions use is 
NRTCachingDirectoryFactory.  This wraps another implementation with a 
small memory cache.  The implementation that is wrapped by default DOES 
use MMAP.

The amount of memory used for caching MMAP access cannot be configured 
in the application.  The OS handles that caching completely 
automatically, without any configuration at all.  All modern operating 
systems are designed so that the disk cache can use *all* available 
memory in the system.  This is because the cache will instantly give up 
memory if a program requests it.  The cache never keeps memory that 
programs want.
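
Note that vm.max_map_count limits the number of memory mappings a process may hold, not the amount of memory that is mapped or cached (many current kernels default it to 65530). The sketch below is an illustration under stated assumptions: it is Linux-only, the class name MapCountCheck is made up, and it expects the Solr process id as its first argument (it falls back to inspecting itself). It compares the number of mappings currently in use with the limit.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: compares a process's mapping count with vm.max_map_count.
final class MapCountCheck {

    /** Parses the content of /proc/sys/vm/max_map_count. */
    static long parseLimit(String text) {
        return Long.parseLong(text.trim());
    }

    static String report(long mappingsInUse, long limit) {
        long percent = limit > 0 ? (mappingsInUse * 100) / limit : 0;
        return mappingsInUse + " of " + limit + " mappings in use (" + percent + "%)";
    }

    public static void main(String[] args) throws IOException {
        // "self" makes procfs resolve to the JVM running this check.
        String pid = args.length > 0 ? args[0] : "self";
        Path limitFile = Paths.get("/proc/sys/vm/max_map_count");
        Path mapsFile = Paths.get("/proc", pid, "maps");
        if (!Files.isReadable(limitFile) || !Files.isReadable(mapsFile)) {
            System.out.println("procfs files are not readable; this check is Linux-only");
            return;
        }
        long limit = parseLimit(new String(Files.readAllBytes(limitFile), StandardCharsets.UTF_8));
        // Each line of /proc/<pid>/maps describes one mapping.
        long inUse = Files.readAllLines(mapsFile, StandardCharsets.UTF_8).size();
        System.out.println(report(inUse, limit));
    }
}
```

If the percentage stays low with all cores loaded, raising the limit further is unlikely to change anything; if it gets close to 100%, a higher value is worth testing.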

> I suppose the best value depends on the available off-heap memory. I
> generally set it to 262144. Is that a good value, or is there a better way
> to determine it?

Solr doesn't use any off heap memory as far as I'm aware.  There was a 
fork of Solr for a short time named heliosearch, which DID use off-heap 
memory.  Java itself will use some off-heap memory for its own 
operation.  I do not know whether that is configurable, and if so, how 
it's done.

> About Direct Memory
> 
> According to a response from Uwe Schindler (again) on the Solr mailing list,
> I understand that MMapDirectory does not use direct memory.
> 
> The only places where I have read that the MaxDirectMemorySize JVM setting
> has to be set for Solr are a Cloudera blog post and Solr mailing list
> threads about using Solr with HDFS.
> 
> Is it necessary to change the default MaxDirectMemorySize JVM setting? If
> yes, how should the appropriate value be determined?

I have never heard of this "direct memory."  Solr probably doesn't use 
it.  I really have no idea what happens when the index is in HDFS. 
You'd have to ask somebody who knows Hadoop.
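
For readers who want to see the distinction raised in the question: the JVM accounts for direct buffers (created with ByteBuffer.allocateDirect and capped by -XX:MaxDirectMemorySize) separately from memory-mapped buffers (created with FileChannel.map). The sketch below is an illustration only, with a made-up class name BufferPoolReport. It lists both pools through the standard BufferPoolMXBean; on HotSpot they are normally named "direct" and "mapped". It reports on its own JVM, so for Solr the beans would have to be read over JMX.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one line per NIO buffer pool known to the JVM.
final class BufferPoolReport {

    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            lines.add(pool.getName()
                    + ": buffers=" + pool.getCount()
                    + ", bytesUsed=" + pool.getMemoryUsed());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

If the "direct" pool of a Solr JVM stays small while the "mapped" pool tracks the index size, that supports leaving MaxDirectMemorySize at its default for a local, non-HDFS index.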

> About OS Swap setting
> 
> Linux generally starts swapping when less than 30% of the memory is free.
> To prevent the OS from working against Solr's off-heap memory management, I
> usually set the OS swappiness value to 0. Can you confirm this is a good
> idea?

If the OS starts swapping, performance of everything on the machine is 
going to drop significantly.  Setting swappiness to 0 is probably a good 
idea.  Most Linux distributions default to 60 here, which means the OS 
is going to aggressively start swapping anything it thinks isn't being 
used, even before memory pressure becomes extreme.
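
Before and after changing swappiness, it can help to check whether the Solr process has actually been swapped out. The sketch below is an illustration under stated assumptions: it is Linux-only, the class name SwapCheck is made up, and it expects the Solr process id as its first argument (it falls back to inspecting itself). It prints the current vm.swappiness value and the VmSwap field of /proc/<pid>/status.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: reports swappiness and how much of a process is swapped out.
final class SwapCheck {

    /** Returns the VmSwap value in kB from the lines of /proc/<pid>/status, or -1 if absent. */
    static long swappedKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmSwap:")) {
                String[] parts = line.trim().split("\\s+"); // e.g. ["VmSwap:", "2048", "kB"]
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        Path swappiness = Paths.get("/proc/sys/vm/swappiness");
        Path status = Paths.get("/proc", pid, "status");
        if (!Files.isReadable(swappiness) || !Files.isReadable(status)) {
            System.out.println("procfs files are not readable; this check is Linux-only");
            return;
        }
        String value = new String(Files.readAllBytes(swappiness), StandardCharsets.UTF_8).trim();
        System.out.println("vm.swappiness = " + value);
        System.out.println("VmSwap (kB)   = " + swappedKb(Files.readAllLines(status, StandardCharsets.UTF_8)));
    }
}
```

A VmSwap value that stays at 0 kB under load suggests swapping is not what is hurting Solr, whatever the swappiness setting.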

> About CMS GC vs G1 GC
> 
> The default Solr settings use CMS GC.
> 
> According to the post from Shawn Heisey in the old Solr wiki
> (https://wiki.apache.org/solr/ShawnHeisey), can we consider that G1 GC can
> definitely be used with Solr for heap sizes above roughly 4 GB?

I've never had any problems with G1, and my experiments suggest that it 
does a better job of reducing GC pauses than CMS does, if it is tuned 
correctly.  Just enabling G1 isn't much better than Java's defaults, and 
Solr's CMS settings are definitely better than untuned G1.
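
Comparing collectors needs measurements from the same workload. As a rough sketch (an illustration only, with a made-up class name GcStats), the standard GarbageCollectorMXBean exposes how often each collector ran and how much time it accumulated. On HotSpot the names typically look like "G1 Young Generation" and "G1 Old Generation" with G1, or "ParNew" and "ConcurrentMarkSweep" with CMS. The accumulated time is only a proxy for pause length; GC logs remain the more reliable source. The sketch reports on its own JVM, so for Solr the beans would have to be read over JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one line per garbage collector active in the current JVM.
final class GcStats {

    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both counters are cumulative since JVM start; -1 means "not available".
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Sampling these counters at fixed intervals on a CMS node and on a G1 node under the same query load gives a first comparison before digging into the GC logs.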

Thanks,
Shawn