Posted to java-user@lucene.apache.org by kiwi clive <ki...@yahoo.com.INVALID> on 2015/08/04 17:41:22 UTC

Lucene Searcher Caching and Performance

Hi Guys,
We have an index/query server that contains several thousand fairly hefty indexes. Each searcher is shared between many 'user-threads', and once opened we keep the searcher in a cache that is refreshed according to how often it is used. Due to memory limitations on the server, we need some kind of LRU mechanism to drop unused searchers to make way for newer ones.
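For illustration, a minimal sketch of an LRU searcher cache along these lines, assuming a 4.x/5.x-era Lucene API with SearcherManager; the SearcherCache class, its maxOpen limit and the path-keyed map are hypothetical names, not part of the setup described above:

    import java.io.IOException;
    import java.nio.file.Paths;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.SearcherManager;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    /** Hypothetical LRU cache of SearcherManagers, one per index directory. */
    public class SearcherCache {

        private final int maxOpen;
        private final Map<String, SearcherManager> managers;

        public SearcherCache(int maxOpen) {
            this.maxOpen = maxOpen;
            // accessOrder=true: iteration order is least-recently-used first,
            // so removeEldestEntry() sees the LRU entry after every access.
            this.managers = new LinkedHashMap<String, SearcherManager>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, SearcherManager> eldest) {
                    if (size() > SearcherCache.this.maxOpen) {
                        try {
                            // Evict and close the least-recently-used manager,
                            // releasing its file handles and heap.
                            eldest.getValue().close();
                        } catch (IOException e) {
                            // log and continue; eviction must not fail the caller
                        }
                        return true;
                    }
                    return false;
                }
            };
        }

        /** Returns the shared manager for an index, opening it on first use. */
        public synchronized SearcherManager manager(String indexPath) throws IOException {
            SearcherManager m = managers.get(indexPath);
            if (m == null) {
                m = new SearcherManager(FSDirectory.open(Paths.get(indexPath)), null);
                managers.put(indexPath, m);
            }
            return m;
        }

        /** Typical per-query usage: acquire, search, always release. */
        public TopDocs search(String indexPath, Query query) throws IOException {
            SearcherManager m = manager(indexPath);
            IndexSearcher searcher = m.acquire();
            try {
                return searcher.search(query, 10);
            } finally {
                m.release(searcher);
            }
        }
    }

Because SearcherManager reference-counts the underlying reader, a searcher acquired just before its manager is evicted remains usable until the owning thread calls release().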
We are seeing load spikes when we get hit by queries that try to open several non-cached searchers at the same time (or within a small delta of each other). This appears to be the disks struggling to open all the appropriate files for that period, and it takes a little while for the server to return to normal operating limits thereafter.
Given that upgrading hardware/memory is not currently an option, we need a way to smooth over these spikes, even if it is at the cost of slowing query performance overall. 

It strikes me that if we could cache all of our searchers on the machine (i.e. have all of our indexes 'open for business'), possibly having to alter kernel parameters to cater for the large number of file handles, and without caching many query results, this might solve the problem without pushing memory usage too high. However, the larger number of searchers held on the heap is going to steal space from the file cache that Lucene relies on, so is there a recommended mechanism for doing this?
So is there a way to minimize the searcher cache memory footprint, possibly keeping more searchers in memory at the cost of storing less data?
Any insight would be most appreciated.
Thanks,
Clive

  

Re: Lucene Searcher Caching and Performance

Posted by "McKinley, James T" <ja...@cengage.com>.
Hi Clive,

We essentially do what you're suggesting: we create a single index searcher (as well as the directory reader it uses) for each partition, and that searcher is shared amongst all threads. We also perform various index operations (searching, browsing terms, etc.) for a while to "warm up" Lucene's internal data structures as well as the Linux OS file caches before putting the partition server in service. I don't know if this is the "recommended" method, but it seems to work for us.
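For illustration, a minimal sketch of this kind of per-partition warm-up, assuming a Lucene 5.x API; PartitionWarmup, openAndWarm and fieldsToWarm are illustrative names, and the right fields and queries to warm with depend entirely on the workload:

    import java.io.IOException;
    import java.nio.file.Paths;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MatchAllDocsQuery;
    import org.apache.lucene.store.FSDirectory;

    public class PartitionWarmup {

        /** Open one shared searcher for a partition and touch it before going live. */
        public static IndexSearcher openAndWarm(String indexPath, String[] fieldsToWarm)
                throws IOException {
            DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexPath)));
            IndexSearcher searcher = new IndexSearcher(reader);

            // A cheap query pulls index files into the OS page cache.
            searcher.search(new MatchAllDocsQuery(), 10);

            // Walking part of the term dictionary warms Lucene's term index structures.
            for (String field : fieldsToWarm) {
                Terms terms = MultiFields.getTerms(reader, field);
                if (terms == null) {
                    continue;
                }
                TermsEnum te = terms.iterator();
                int visited = 0;
                while (visited < 10000 && te.next() != null) {
                    visited++;  // just iterating faults the term blocks into memory
                }
            }
            return searcher;  // share this single instance across all query threads
        }
    }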

Jim


RE: Lucene Searcher Caching and Performance

Posted by Lutz Fechner <LF...@hubwoo.com>.
Hi,

I'm not sure what Lucene version you are using or how you cache your Readers.

But as far as I know you can adjust memory consumption per reader by setting the "TermInfos index divisor" to something larger than 1 when opening the Searcher.
At least that works for 2.9.4.
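For illustration, a sketch of what that might look like against the 2.9.x API; the divisor value of 4 is an arbitrary example, and the exact open() overload (and whether a null deletion policy is accepted there) should be checked against the javadocs of the version in use:

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class DivisorExample {

        public static IndexSearcher openWithDivisor(File indexDir) throws IOException {
            // A divisor of 4 loads only every 4th entry of the term index into RAM,
            // cutting the per-reader term index footprint roughly fourfold at the
            // cost of somewhat slower term lookups.
            IndexReader reader = IndexReader.open(
                    FSDirectory.open(indexDir),
                    null,   // deletion policy (not needed for a read-only reader)
                    true,   // read-only
                    4);     // termInfosIndexDivisor
            return new IndexSearcher(reader);
        }
    }

Larger divisors trade term-lookup speed for a smaller per-searcher heap footprint, which is what the original question is after.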

" it takes a little while for the server to return to normal operating limits thereafter " - Did you check your VM behavior? 
Just adding more mem to it might not always work (or even make the problem worse) - depending on VM (vendor and version), and GC in place. 

Regards

Lutz
