Posted to solr-user@lucene.apache.org by Ken Krugler <kk...@transpac.com> on 2014/04/19 15:51:39 UTC

Re: When not to use NRTCachingDirectory and what to use instead.

On Jul 10, 2013, at 9:16am, Shawn Heisey <so...@elyograg.org> wrote:

> On 7/10/2013 9:59 AM, Tom Burton-West wrote:
>> The Javadoc for NRTCachingDirectory (
>> http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
>>  says:
>> 
>>  "This class is likely only useful in a near real-time context, where
>> indexing rate is lowish but reopen rate is highish, resulting in many tiny
>> files being written..."
>> 
>> It seems like we have exactly the opposite use case, so we would like
>> advice on what directory implementation to use instead.
>> 
>> We are doing offline batch indexing, so no searches are being done.  So we
>> don't need NRT.  We also have a high indexing rate as we are trying to
>> index 3 billion pages as quickly as possible.
>> 
>> I am not clear what determines the reopen rate.   Is it only related to
>> searching or is it involved in indexing as well?
>> 
>>  Does the NRTCachingDirectory have any benefit for indexing under the use
>> case noted above?
>> 
>> I'm guessing we should just use the solr.StandardDirectoryFactory instead.
>>  Is this correct?
> 
> The NRT directory object in Solr uses the MMap implementation as its default delegate.  

The code I see seems to be using an FSDirectory, or is there another layer of wrapping going on here?

    return new NRTCachingDirectory(FSDirectory.open(new File(path)), maxMergeSizeMB, maxCachedMB);

> I would use MMapDirectoryFactory (the default for most of the 3.x releases) for testing whether you can get any improvement from moving away from the default.  The advantages of memory mapping are not something you'd want to give up.

Tom - did you ever get any useful results from testing here? I'm also curious about the impact of various xxxDirectoryFactory implementations for batch indexing.

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






Re: When not to use NRTCachingDirectory and what to use instead.

Posted by Tom Burton-West <tb...@umich.edu>.
Hi Ken,

Given the comments, which seemed to describe using NRT for the opposite of
our use case, I just set our Solr 4 to use solr.MMapDirectoryFactory.
I didn't bother to test whether NRT would be better for our use case, mostly
because it didn't sound like there was an advantage and I've been focused
on other things relating to Solr 4. I'd love to hear any results from
someone who has a batch indexing use case and has tested
various xxxDirectoryFactory implementations. Please let me know your
results if you do end up doing some testing.

Tom


On Sat, Apr 19, 2014 at 9:51 AM, Ken Krugler <kk...@transpac.com> wrote:

>
> Tom - did you ever get any useful results from testing here? I'm also
> curious about the impact of various xxxDirectoryFactory implementations for
> batch indexing.
>
> Thanks,
>
> -- Ken

Re: When not to use NRTCachingDirectory and what to use instead.

Posted by Jeff Wartes <jw...@whitepages.com>.

On 4/19/14, 6:51 AM, "Ken Krugler" <kk...@transpac.com> wrote:
>
>The code I see seems to be using an FSDirectory, or is there another
>layer of wrapping going on here?
>
>    return new NRTCachingDirectory(FSDirectory.open(new File(path)),
>maxMergeSizeMB, maxCachedMB);


I was also curious about this subject. Not enough to test anything, but
enough to look at the code too.

FSDirectory.open picks one of MMapDirectory, SimpleFSDirectory and
NIOFSDirectory in that order of preference based on what it thinks your
system will support.
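
For anyone following along, the selection can be sketched roughly as below.
This is a paraphrase of how I read the Lucene 4.x source, not the actual code:
the class name, method name and boolean flags are hypothetical stand-ins for
the platform checks the real method performs (org.apache.lucene.util.Constants
and MMapDirectory.UNMAP_SUPPORTED), and other releases may differ, so check
the FSDirectory source for your version.

```java
// Hypothetical sketch of the choice FSDirectory.open makes in Lucene 4.x.
// It returns the class name as a String so it runs without Lucene on the classpath.
class DirectoryChoiceSketch {

    static String pick(boolean windows, boolean linuxOrSolaris,
                       boolean jre64Bit, boolean unmapSupported) {
        // Memory mapping is preferred on a 64-bit JRE that can unmap buffers.
        if ((windows || linuxOrSolaris) && jre64Bit && unmapSupported) {
            return "MMapDirectory";
        }
        // Windows without usable mmap falls back to SimpleFSDirectory.
        if (windows) {
            return "SimpleFSDirectory";
        }
        // Everything else gets NIOFSDirectory.
        return "NIOFSDirectory";
    }

    public static void main(String[] args) {
        System.out.println(pick(false, true, true, true));   // 64-bit Linux
        System.out.println(pick(true, false, false, true));  // 32-bit Windows
        System.out.println(pick(false, true, false, true));  // 32-bit Linux
    }
}
```

If that reading is right, a typical 64-bit Linux indexing box ends up with an
MMapDirectory underneath the NRTCachingDirectory wrapper, which would
reconcile Shawn's comment with the FSDirectory.open call Ken quoted.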

There's still the possibility that the added caching functionality slows
down bulk index operations, but setting that aside, it does look like
NRTCachingDirectoryFactory is almost always the best choice.