Posted to solr-user@lucene.apache.org by Eric Torti <er...@gmail.com> on 2015/10/07 16:48:51 UTC

Is solr.StandardDirectoryFactory an MMapDirectory?

Hello,

I'm running a 5.2.1 SolrCloud cluster and I see that one of my cores
is configured in solrconfig.xml to use

<directoryFactory
class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
name="DirectoryFactory"/>

I'm just starting to grasp different strategies for Directory
implementation. Can I assume that solr.StandardDirectoryFactory is an
MMapDirectory as described by Uwe Schindler in this post about the use
of virtual memory?
[http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]

Thanks!

Best,

Eric Torti

Re: Is solr.StandardDirectoryFactory an MMapDirectory?

Posted by Eric Torti <er...@gmail.com>.
Ok, thanks Shawn!

That makes sense. We'll be experimenting with it.

Best,
Eric

On Wed, Oct 7, 2015 at 5:54 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 10/7/2015 12:00 PM, Eric Torti wrote:
>> Can we read "high reopen rate" as "frequent soft commits"? (In our
>> case, hard commits do not open a searcher. But soft commits do).
>>
>> Considering it does mean "frequent soft commits", I'd say that it
>> doesn't fit our setup because we have an index rate of about 10
>> updates/s and we perform a soft commit every 15 minutes. So our scenario
>> is not near real time in that sense. In light of this, do you think
>> using NRTCachingDirectory is still a good choice?
>
> The NRT factory achieves high speed in NRT situations by flushing very
> small updates to RAM instead of the disk.  As more updates come in,
> older index segments sitting in RAM will eventually be flushed to disk,
> so a sustained flood of updates doesn't really achieve a speed increase,
> but a short burst of updates will be searchable *very* quickly.
>
> NRTCachingDirectoryFactory was chosen for Solr examples (and I think
> it's the Solr default) because it has no real performance downsides and
> is likely to be noticeably faster than the standard
> factory in NRT situations.
>
> The only problem with it is that small index segments from recent
> updates might only exist in RAM, and not get flushed to disk, so they
> would be lost if Solr dies or is killed suddenly.  This is part of why
> the updateLog feature exists -- when Solr is started, the transaction
> logs will be replayed, inserting/replacing (at a minimum) all documents
> indexed since the last hard commit.  Once the replay has finished, no
> data will have been lost.  This does require a defined uniqueKey to operate
> correctly.
>
> Thanks,
> Shawn
>

Re: Is solr.StandardDirectoryFactory an MMapDirectory?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/7/2015 12:00 PM, Eric Torti wrote:
> Can we read "high reopen rate" as "frequent soft commits"? (In our
> case, hard commits do not open a searcher. But soft commits do).
>
> Considering it does mean "frequent soft commits", I'd say that it
> doesn't fit our setup because we have an index rate of about 10
> updates/s and we perform a soft commit every 15 minutes. So our scenario
> is not near real time in that sense. In light of this, do you think
> using NRTCachingDirectory is still a good choice?

The NRT factory achieves high speed in NRT situations by flushing very
small updates to RAM instead of the disk.  As more updates come in,
older index segments sitting in RAM will eventually be flushed to disk,
so a sustained flood of updates doesn't really achieve a speed increase,
but a short burst of updates will be searchable *very* quickly.

NRTCachingDirectoryFactory was chosen for Solr examples (and I think
it's the Solr default) because it has no real performance downsides and
is likely to be noticeably faster than the standard
factory in NRT situations.

The only problem with it is that small index segments from recent
updates might only exist in RAM, and not get flushed to disk, so they
would be lost if Solr dies or is killed suddenly.  This is part of why
the updateLog feature exists -- when Solr is started, the transaction
logs will be replayed, inserting/replacing (at a minimum) all documents
indexed since the last hard commit.  Once the replay has finished, no
data will have been lost.  This does require a defined uniqueKey to operate
correctly.

Thanks,
Shawn
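
For reference, the commit setup discussed in this thread (hard commits
that do not open a searcher, soft commits every 15 minutes) could be
sketched in solrconfig.xml roughly as follows. This is an illustrative
sketch, not the poster's actual configuration: only the 15-minute soft
commit interval comes from the thread, while the 15-second hard commit
interval and the solr.autoCommit.maxTime property name are assumptions
in the style of the Solr example configs.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes segments to disk and rolls the transaction
       log, but does not open a new searcher. The 15000 ms default is
       an assumed value, not one given in the thread. -->
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: opens a new searcher so recent updates become
       visible. 900000 ms = 15 minutes, as described in the thread. -->
  <autoSoftCommit>
    <maxTime>900000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With intervals like these, the searcher is reopened only four times an
hour, which is far from the "highish reopen rate" the NRTCachingDirectory
docs describe.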


Re: Is solr.StandardDirectoryFactory an MMapDirectory?

Posted by Eric Torti <er...@gmail.com>.
Correcting:

When I mentioned high non-JVM memory usage, what I probably meant was
high virtual memory allocation.

On Wed, Oct 7, 2015 at 3:00 PM, Eric Torti <er...@gmail.com> wrote:
> Thanks, Shawn.
>
>> After a look at the code, I found that StandardDirectoryFactory should
>> use MMap if the OS and Java version support it.  If support isn't there,
>> it will use conventional file access methods.  As far as I know, all
>> 64-bit Java versions and 64-bit operating systems will support MMap.
>
> Considering our JVM is 64-bit, that probably explains why we're
> experiencing MMapDirectory-like behaviour on our cluster (i.e. high
> non-JVM memory use).
>
> As to NRTCachingDirectoryFactory, when looking up the docs we were
> unsure what it means to have a "highish reopen rate".
>
>> public class NRTCachingDirectory
>
>> This class is likely only useful in a near-real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written.
>
> Can we read "high reopen rate" as "frequent soft commits"? (In our
> case, hard commits do not open a searcher. But soft commits do).
>
> Considering it does mean "frequent soft commits", I'd say that it
> doesn't fit our setup because we have an index rate of about 10
> updates/s and we perform a soft commit every 15 minutes. So our scenario
> is not near real time in that sense. In light of this, do you think
> using NRTCachingDirectory is still a good choice?
>
> Best,
>
> Eric
>
>
>
> On Wed, Oct 7, 2015 at 12:08 PM, Shawn Heisey <ap...@elyograg.org> wrote:
>> On 10/7/2015 8:48 AM, Eric Torti wrote:
>>> <directoryFactory
>>> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
>>> name="DirectoryFactory"/>
>>>
>>> I'm just starting to grasp different strategies for Directory
>>> implementation. Can I assume that solr.StandardDirectoryFactory is an
>>> MMapDirectory as described by Uwe Schindler in this post about the use
>>> of virtual memory?
>>> [http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
>>
>> After a look at the code, I found that StandardDirectoryFactory should
>> use MMap if the OS and Java version support it.  If support isn't there,
>> it will use conventional file access methods.  As far as I know, all
>> 64-bit Java versions and 64-bit operating systems will support MMap.
>>
>> The factory you *should* be using is NRTCachingDirectoryFactory, and you
>> should enable the updateLog to ensure data reliability.
>>
>> Thanks,
>> Shawn
>>

Re: Is solr.StandardDirectoryFactory an MMapDirectory?

Posted by Eric Torti <er...@gmail.com>.
Thanks, Shawn.

> After a look at the code, I found that StandardDirectoryFactory should
> use MMap if the OS and Java version support it.  If support isn't there,
> it will use conventional file access methods.  As far as I know, all
> 64-bit Java versions and 64-bit operating systems will support MMap.

Considering our JVM is 64-bit, that probably explains why we're
experiencing MMapDirectory-like behaviour on our cluster (i.e. high
non-JVM memory use).

As to NRTCachingDirectoryFactory, when looking up the docs we were
unsure what it means to have a "highish reopen rate".

> public class NRTCachingDirectory

> This class is likely only useful in a near-real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written.

Can we read "high reopen rate" as "frequent soft commits"? (In our
case, hard commits do not open a searcher. But soft commits do).

Considering it does mean "frequent soft commits", I'd say that it
doesn't fit our setup because we have an index rate of about 10
updates/s and we perform a soft commit every 15 minutes. So our scenario
is not near real time in that sense. In light of this, do you think
using NRTCachingDirectory is still a good choice?

Best,

Eric



On Wed, Oct 7, 2015 at 12:08 PM, Shawn Heisey <ap...@elyograg.org> wrote:
> On 10/7/2015 8:48 AM, Eric Torti wrote:
>> <directoryFactory
>> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
>> name="DirectoryFactory"/>
>>
>> I'm just starting to grasp different strategies for Directory
>> implementation. Can I assume that solr.StandardDirectoryFactory is an
>> MMapDirectory as described by Uwe Schindler in this post about the use
>> of virtual memory?
>> [http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]
>
> After a look at the code, I found that StandardDirectoryFactory should
> use MMap if the OS and Java version support it.  If support isn't there,
> it will use conventional file access methods.  As far as I know, all
> 64-bit Java versions and 64-bit operating systems will support MMap.
>
> The factory you *should* be using is NRTCachingDirectoryFactory, and you
> should enable the updateLog to ensure data reliability.
>
> Thanks,
> Shawn
>

Re: Is solr.StandardDirectoryFactory an MMapDirectory?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/7/2015 8:48 AM, Eric Torti wrote:
> <directoryFactory
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"
> name="DirectoryFactory"/>
>
> I'm just starting to grasp different strategies for Directory
> implementation. Can I assume that solr.StandardDirectoryFactory is an
> MMapDirectory as described by Uwe Schindler in this post about the use
> of virtual memory?
> [http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html]

After a look at the code, I found that StandardDirectoryFactory should
use MMap if the OS and Java version support it.  If support isn't there,
it will use conventional file access methods.  As far as I know, all
64-bit Java versions and 64-bit operating systems will support MMap.

The factory you *should* be using is NRTCachingDirectoryFactory, and you
should enable the updateLog to ensure data reliability.

Thanks,
Shawn
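
As a rough illustration of the suggestion above, the two changes might
look like this in solrconfig.xml. This is a minimal sketch rather than a
tested configuration: it keeps the property substitution from the
original config and only swaps the fallback class, and the solr.ulog.dir
property is an assumption taken from the style of the Solr example
configs.

```xml
<!-- Uses the solr.directoryFactory system property if it is set;
     otherwise falls back to the NRT caching factory instead of
     solr.StandardDirectoryFactory. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<!-- The updateLog is declared inside the updateHandler section.
     An empty solr.ulog.dir (assumed property name) leaves the log
     in its default location. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```

Note that if solr.directoryFactory is passed as a system property at
startup, that value takes precedence over the fallback shown here.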