Posted to solr-user@lucene.apache.org by Johannes Siegert <jo...@marktjagd.de> on 2014/02/05 17:18:19 UTC

Re: high memory usage with small data set

Hi Erick,

thanks for your reply.

What exactly do you mean by "Do your used entries in your caches 
increase in parallel?"?

I update the indices every hour and commit the changes. So a new 
searcher with empty or autowarmed caches should be created and the old 
one should be removed.

Johannes

On 30.01.2014 15:08, Erick Erickson wrote:
> Do the used entries in your caches increase in parallel? That would be the case
> if you aren't updating your index, and it would explain the growth. BTW, take a
> look at your cache statistics (from the admin page) and check the cache hit
> ratios. If they are very small (and my guess is that with 1,500 boolean
> operations you aren't getting significant re-use), then you're just wasting
> space; try the cache=false option.
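>
> For what it's worth, cache=false is a local param on the filter query
> itself. A minimal sketch (the field name and values here are made up;
> substitute your real filter):
>
>     fq={!cache=false}category:(books OR music OR games)
>
> That way the filter is computed per request and never stored in the
> filterCache, so a huge one-off filter doesn't evict entries that would
> actually be re-used.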
>
> Also, how are you measuring memory? It's easy to be misled because virtual
> memory can be included in the numbers; see:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
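>
> To watch what the heap itself is doing (as opposed to what top reports as
> virtual size), jstat against the Tomcat pid is enough; a sketch, assuming
> a 5-second sampling interval:
>
>     jstat -gcutil <tomcat-pid> 5000
>
> That prints per-generation utilization percentages every 5 seconds, so
> you can see whether old gen keeps climbing between the hourly commits.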
>
> Best,
> Erick
>
> On Wed, Jan 29, 2014 at 7:49 AM, Johannes Siegert
> <jo...@marktjagd.de> wrote:
>> Hi,
>>
>> we are using Apache Solr Cloud in a production environment. When the
>> maximum heap space is reached, Solr response times slow down for a short
>> period because the garbage collector is running.
>>
>> We use the following configuration:
>>
>> - Apache Tomcat as the web server running the Solr web application
>> - 13 indices with about 1,500,000 entries (300 MB)
>> - 5 servers with one replica per index (5 GB max heap space)
>> - All indices have the following caches (see the solrconfig.xml sketch
>> below):
>>     - the largest document cache holds 4096 entries; the other indices
>> have between 64 and 1536 entries
>>     - the largest query result cache holds 1024 entries; the other
>> indices have between 64 and 768
>>     - the largest filter cache holds 1536 entries; the other indices
>> have between 64 and 1024
>> - the directory factory implementation is NRTCachingDirectoryFactory
>> - the indices are updated once per hour (no auto-commit)
>> - about 5000 requests per hour per server
>> - large filter queries (up to 15000 bytes and 1500 boolean operations)
>> - many facet queries (30%)
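>>
>> The cache sections in our solrconfig.xml look roughly like this (sizes
>> differ per index as listed above; the class and autowarmCount values
>> here are illustrative, not exact):
>>
>>     <filterCache class="solr.FastLRUCache"
>>                  size="1536" initialSize="512" autowarmCount="128"/>
>>     <queryResultCache class="solr.LRUCache"
>>                  size="1024" initialSize="256" autowarmCount="64"/>
>>     <documentCache class="solr.LRUCache"
>>                  size="4096" initialSize="1024" autowarmCount="0"/>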
>>
>> Behaviour:
>>
>> Heap usage started at 512 MB. Over several days it grew until the 5 GB
>> maximum was reached. At that moment the problem described above occurred.
>> From then on, heap usage stays between 50 and 90 percent. No
>> OutOfMemoryException occurs.
>>
>> Questions:
>>
>>
>> 1. Why does Solr use 5 GB of RAM for this small amount of data?
>> 2. What impact do the large filter queries have on RAM usage?
>>
>> Thanks!
>>
>> Johannes Siegert
>>

Re: high memory usage with small data set

Posted by Erick Erickson <er...@gmail.com>.
Check the admin page for the number of used cache entries as time passes.
I'm wondering if you're consuming lots of memory in a way that isn't
apparent at first: your caches might be filling up over time...
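
The same numbers are available without the UI. Something like this (host
and core name are placeholders) dumps the size, hit ratio, and cumulative
evictions of every cache as JSON:

    curl "http://localhost:8983/solr/<core>/admin/mbeans?cat=CACHE&stats=true&wt=json"

If the reported size keeps growing across your hourly commits, that's the
place to look.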


FWIW,

Erick