Posted to solr-user@lucene.apache.org by Manohar Sripada <ma...@gmail.com> on 2014/12/04 17:48:48 UTC

Question on Solr Caching

Hi,

I am working on implementing Solr in my product. I have a few questions on
caching.

1. Do the posting lists and term lists of the index reside in memory? If
not, how can I load them into memory? I don't want to load the entire data
set, as with the DocumentCache. Nor do I want to use RAMDirectoryFactory, as
the data will be lost on restart.

2. For the FilterCache, there is a way to specify in the query whether the
filter should be cached or not. Similarly, is there a way I can specify
the list of stored fields to be loaded into the DocumentCache? I know the
DocumentCache is not associated with a query; I'm just curious.

3. Similarly, is there a way I can specify the list of fields to be cached
in the FieldCache?

Thanks,
Manohar

Re: Question on Solr Caching

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/8/2014 2:42 AM, Manohar Sripada wrote:
> Can you please point me to a wiki page that describes (in detail) the
> differences between MMapDirectoryFactory and NRTCachingDirectoryFactory? I
> found this blog post
> <http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> very
> helpful for understanding MMapDirectory. I would like to know about
> NRTCachingDirectoryFactory in the same detail.
> 
> Also, when I ran the REST request solr/admin/cores?action=STATUS, I got
> the result below (partial output only). I have set the DirectoryFactory
> to NRTCachingDirectoryFactory in solrconfig.xml, but the element below
> also shows MMapDirectory. Does this mean NRTCachingDirectory is using
> MMapDirectory internally?
> 
> <str name="directory">
> org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/instance/solr/collection1_shard2_replica1/data/index
> lockFactory=NativeFSLockFactory@/instance/solr/collection1_shard2_replica1/data/index;
> maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
> 
> What do maxCacheMB and maxMergeSizeMB indicate? How can I control them?

NRTCachingDirectoryFactory creates instances of NRTCachingDirectory.
This is a wrapper on top of another Directory implementation. Normally
(on a 64-bit JVM) it wraps MMapDirectory, so you get all the MMap
advantages. The javadoc for NRTCachingDirectory says that it "Wraps a
RAMDirectory around any provided delegate directory, to be used during
NRT search."

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/store/NRTCachingDirectory.html

Further down in that javadoc, the constructor documentation has this to
say: "We will cache a newly created output if 1) it's a flush or a merge
and the estimated size of the merged segment is <= maxMergeSizeMB, and
2) the total cached bytes is <= maxCachedMB"

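Basically, if a newly flushed or merged segment is small enough, and
the cache still has room for it, it won't be written to disk right away;
it is kept in RAM instead. Anything too large, or anything that would
push the cache past maxCachedMB, goes straight to the wrapped directory,
and the cached files are written out to that directory when the index is
committed. This helps Near Real Time search, where frequent reopens
produce many tiny segments that would otherwise each be written to disk.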
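The admission rule quoted from the javadoc can be written out as a small predicate. This is a hypothetical model, not Lucene's code: the class and method names are made up, and it assumes the "total cached bytes" check counts the new output as well. The limits used in main are the ones from the STATUS output quoted above (maxMergeSizeMB=4.0, maxCacheMB=48.0).

```java
// Hypothetical model of the NRTCachingDirectory admission rule quoted from
// the javadoc; NrtCacheRule and shouldCache are illustrative names only.
public class NrtCacheRule {
    static final double MB = 1024.0 * 1024.0;

    /**
     * True if a new flush/merge output should be kept in RAM: it must be no
     * larger than maxMergeSizeMB, and the cache total including it must stay
     * within maxCachedMB (assumption: the new output is counted).
     */
    static boolean shouldCache(long estimatedBytes, long cachedBytes,
                               double maxMergeSizeMB, double maxCachedMB) {
        return estimatedBytes <= (long) (maxMergeSizeMB * MB)
                && cachedBytes + estimatedBytes <= (long) (maxCachedMB * MB);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Limits taken from the STATUS output in this thread.
        System.out.println(shouldCache(2 * mb, 10 * mb, 4.0, 48.0)); // true
        System.out.println(shouldCache(6 * mb, 0, 4.0, 48.0));       // false: segment too large
        System.out.println(shouldCache(3 * mb, 46 * mb, 4.0, 48.0)); // false: cache would overflow
    }
}
```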

This DirectoryFactory implementation is the default in 4.x, so, as I
understand it, it's critically important for Solr to have a replayable
transaction log; without it, any data that is cached in RAM will be
lost if the program crashes or exits. The main Solr example *does* have
the transaction log enabled.

Thanks,
Shawn


Re: Question on Solr Caching

Posted by Manohar Sripada <ma...@gmail.com>.
Thanks, Shawn.

Can you please point me to a wiki page that describes (in detail) the
differences between MMapDirectoryFactory and NRTCachingDirectoryFactory? I
found this blog post
<http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html> very
helpful for understanding MMapDirectory. I would like to know about
NRTCachingDirectoryFactory in the same detail.

Also, when I ran the REST request solr/admin/cores?action=STATUS, I got
the result below (partial output only). I have set the DirectoryFactory
to NRTCachingDirectoryFactory in solrconfig.xml, but the element below
also shows MMapDirectory. Does this mean NRTCachingDirectory is using
MMapDirectory internally?

<str name="directory">
org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/instance/solr/collection1_shard2_replica1/data/index
lockFactory=NativeFSLockFactory@/instance/solr/collection1_shard2_replica1/data/index;
maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>

What do maxCacheMB and maxMergeSizeMB indicate? How can I control them?


Thanks,
Manohar

On Fri, Dec 5, 2014 at 11:04 AM, Shawn Heisey <ap...@elyograg.org> wrote:


Re: Question on Solr Caching

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/4/2014 10:06 PM, Manohar Sripada wrote:
> If you use MMapDirectory, Lucene will map the files into memory off heap
> and the OS's disk cache will cache the files in memory for you. Don't use
> RAMDirectory; it's not better than MMapDirectory for any use I'm aware of.
>
>> Does that mean the inverted index will also be cached in the OS disk
> cache? The reason I am asking is that Solr searches this inverted index
> first to get the data. What if we could keep it in memory?

If you have enough memory, the operating system will cache *everything*.
It does so by simply loading the data that's on the disk into RAM ...
it is not aware that certain parts are the inverted index; it simply
caches whatever data gets read. A subsequent read will come out of
memory, and the disk heads will never even move. If certain data in the
index is never accessed, then it will not get cached.

http://en.wikipedia.org/wiki/Page_cache
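To make the "whatever gets read is cached" point concrete, here is a toy page cache in Java. It is only a model under stated assumptions: eviction is plain LRU (real kernels use more elaborate policies), and the class name and page numbers are hypothetical. With enough capacity, every page read once is served from memory afterwards, and a page that is never read never enters the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy page cache (hypothetical model, not kernel code): a page is cached only
// after it has been read; later reads of a cached page come from memory.
// Eviction is plain LRU here, which is a simplification.
public class PageCacheModel {
    private final int capacityPages;
    private final LinkedHashMap<Long, Boolean> cached;
    int diskReads = 0;
    int cacheHits = 0;

    PageCacheModel(int capacityPages) {
        this.capacityPages = capacityPages;
        // accessOrder = true: iteration order is least recently used first.
        this.cached = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                // Evict the least recently used page once "RAM" is full.
                return size() > PageCacheModel.this.capacityPages;
            }
        };
    }

    /** Reads one page; returns true if it was served from memory. */
    boolean read(long page) {
        if (cached.get(page) != null) { // get() also refreshes the LRU order
            cacheHits++;
            return true;
        }
        diskReads++;                    // miss: the disk has to be touched
        cached.put(page, Boolean.TRUE); // whatever gets read is cached
        return false;
    }

    boolean isCached(long page) {
        return cached.containsKey(page);
    }

    public static void main(String[] args) {
        PageCacheModel cache = new PageCacheModel(100); // "enough memory"
        for (long p : new long[] {0, 1, 2, 0, 1, 2}) {
            cache.read(p);
        }
        System.out.println(cache.diskReads + " disk reads, " + cache.cacheHits + " hits"); // 3 disk reads, 3 hits
        System.out.println("page 7 cached: " + cache.isCached(7)); // false: never read
    }
}
```

Shrinking the capacity below the working set (say, two pages while cycling through three) turns every read into a disk read, which is the "not enough memory" case.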

Thanks,
Shawn


Re: Question on Solr Caching

Posted by Manohar Sripada <ma...@gmail.com>.
Thanks, Michael, for the response.

> If you use MMapDirectory, Lucene will map the files into memory off heap
> and the OS's disk cache will cache the files in memory for you. Don't use
> RAMDirectory; it's not better than MMapDirectory for any use I'm aware of.

Does that mean the inverted index will also be cached in the OS disk
cache? The reason I am asking is that Solr searches this inverted index
first to get the data. What if we could keep it in memory?

Thanks,
Manohar



On Thu, Dec 4, 2014 at 10:54 PM, Michael Della Bitta <
michael.della.bitta@appinions.com> wrote:


Re: Question on Solr Caching

Posted by Michael Della Bitta <mi...@appinions.com>.
Hi, Manohar,

> 1. Do the posting lists and term lists of the index reside in memory? If
not, how can I load them into memory? I don't want to load the entire data
set, as with the DocumentCache. Nor do I want to use RAMDirectoryFactory, as
the data will be lost on restart.


If you use MMapDirectory, Lucene will map the files into memory off heap
and the OS's disk cache will cache the files in memory for you. Don't
use RAMDirectory; it's not better than MMapDirectory for any use I'm
aware of.
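MMapDirectory is built on java.nio's FileChannel.map, so the underlying mechanism can be sketched with the JDK alone. This minimal sketch maps a file read-only (the mapping lives outside the Java heap) and optionally calls MappedByteBuffer.load(), which asks the OS to make the mapped pages resident; that is a best-effort hint, not a guarantee. The temporary file stands in for an index file, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of memory-mapped reading (MmapSketch/sumMapped are hypothetical names).
public class MmapSketch {
    /** Maps the file read-only and sums its bytes through the mapping. */
    static long sumMapped(Path file, boolean preload) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Off-heap mapping of the whole file; no copy into the Java heap.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                buf.load(); // best-effort hint to make the pages resident in RAM
            }
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // page faults are served via the OS page cache
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("hypothetical-index", ".bin");
        f.toFile().deleteOnExit();
        Files.write(f, new byte[] {1, 2, 3, 4});
        System.out.println(sumMapped(f, true));  // 10
        System.out.println(sumMapped(f, false)); // 10
    }
}
```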

> 2. For the FilterCache, there is a way to specify in the query whether
the filter should be cached or not.

If you add {!cache=false} to the beginning of your filter query, it will
bypass the cache. I'm fairly certain the result will not be cached
afterwards either.
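For reference, this is roughly what such a request looks like when built from Java. The host, core name, and the inStock field are hypothetical; the point is only that {!cache=false} is a local-params prefix on the fq value and gets URL-encoded along with it. The URLEncoder.encode(String, Charset) overload needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a /select URL whose filter query carries the {!cache=false}
// local parameter. Host, core name and the inStock field are hypothetical.
public class NoCacheFilterQuery {
    static String selectUrl(String base, String q, String fq) {
        // URLEncoder escapes '{', '!', '=', '}' and ':' as %XX sequences.
        return base + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = selectUrl("http://localhost:8983/solr/collection1",
                "*:*", "{!cache=false}inStock:true");
        // http://localhost:8983/solr/collection1/select?q=*%3A*&fq=%7B%21cache%3Dfalse%7DinStock%3Atrue
        System.out.println(url);
    }
}
```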

> Similarly, is there a way I can specify the list of stored fields to
be loaded into the DocumentCache?

If you have lazy field loading enabled (the enableLazyFieldLoading
setting in solrconfig.xml), the DocumentCache will only hold the fields
you asked for with the fl parameter.
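A toy model of what lazy loading means for a cached document follows. It is not Solr's implementation, and the class and field names are hypothetical: the fields named in the request (think of the fl parameter) are read up front, and any other stored field is read only if something asks for it later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy lazy document (hypothetical model): requested fields are read eagerly,
// every other stored field is read on first access only.
public class LazyDocSketch {
    private final Map<String, String> loaded = new HashMap<>();
    private final Function<String, String> readStoredField; // stands in for a disk read
    final Set<String> diskReads = new HashSet<>();

    LazyDocSketch(Set<String> requestedFields, Function<String, String> readStoredField) {
        this.readStoredField = readStoredField;
        for (String field : requestedFields) {
            load(field); // fields named in the request are read eagerly
        }
    }

    private String load(String field) {
        diskReads.add(field);
        String value = readStoredField.apply(field);
        loaded.put(field, value);
        return value;
    }

    /** Returns a field value, reading it from "disk" only on first access. */
    String get(String field) {
        return loaded.containsKey(field) ? loaded.get(field) : load(field);
    }

    boolean isLoaded(String field) {
        return loaded.containsKey(field);
    }

    public static void main(String[] args) {
        Map<String, String> stored = Map.of("id", "42", "title", "Solr caching", "body", "long text");
        LazyDocSketch doc = new LazyDocSketch(Set.of("id", "title"), stored::get);
        System.out.println(doc.isLoaded("body")); // false: not requested
        System.out.println(doc.get("body"));      // long text
        System.out.println(doc.isLoaded("body")); // true: loaded on demand
    }
}
```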

> 3. Similarly, is there a way I can specify the list of fields to be
cached in the FieldCache? Thanks, Manohar

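You basically don't have much control over the FieldCache in Solr other
than warming it with queries, for example newSearcher or firstSearcher
listener queries that sort or facet on the fields you care about.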
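A toy model of why warming helps follows. It is not Lucene code, the names and data are hypothetical, and it assumes a single-valued field. The inverted index maps a term to document IDs, but sorting needs document ID to value, so the first sort on a field "uninverts" it into a per-document array that is then reused. A warming query that sorts on the field pays that cost before real users do.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy FieldCache (hypothetical model): entries are built by uninverting
// term -> docIds into docId -> term on first use, then reused.
public class UninvertSketch {
    private final Map<String, String[]> entries = new HashMap<>();
    int builds = 0; // how many times a field had to be uninverted

    /** Turns term -> docIds (inverted) into docId -> term (uninverted). */
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // null: document has no value
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                valueByDoc[doc] = e.getKey(); // assumes a single-valued field
            }
        }
        return valueByDoc;
    }

    /** The first call for a field builds the entry; later calls reuse it. */
    String[] get(String field, Map<String, List<Integer>> postings, int maxDoc) {
        return entries.computeIfAbsent(field, f -> {
            builds++;
            return uninvert(postings, maxDoc);
        });
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> category = Map.of(
                "books", List.of(0, 2),
                "music", List.of(1));
        UninvertSketch fieldCache = new UninvertSketch();
        fieldCache.get("category", category, 4);                   // warming query
        String[] values = fieldCache.get("category", category, 4); // user sort: reused
        System.out.println(Arrays.toString(values) + " builds=" + fieldCache.builds);
        // [books, music, books, null] builds=1
    }
}
```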

You should check out this wiki page; it will probably answer some of your questions:

https://wiki.apache.org/solr/SolrCaching

I hope that helps!

Michael