Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2013/09/24 13:51:50 UTC

Soft commit and flush

I am struggling to get a deep understanding of soft commit.
I have read Erick's post
<http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/>
which helped me a lot with when and why we should call each type of commit.
But I still can't understand what exactly happens when we call soft commit:
is the new data flushed, fsynced, or held in RAM?
I tried to test it myself and got two different behaviours:
a. When only 1 document had been added to the index, the soft commit did not
cause the index files to change.
b. When there was a big change (about 100,000 docs added, ~5MB tlog file),
calling the soft commit DID change the index files, so I guess that the soft
commit caused an fsync.

My conclusion is that soft commit always flushes the data, but because of
the implementation of NRTCachingDirectoryFactory, the data is only written
to disk when it gets too big.

Can someone please correct me?



--
View this message in context: http://lucene.472066.n3.nabble.com/Soft-commit-and-flush-tp4091726.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Soft commit and flush

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

I believe data is not fsynced to disk until a hard commit (and even
then disks can lie to you and tell you data is safe even though it's
still in the disk cache waiting to really be written to the medium),
which is why you can lose it between hard commits.  Soft commits just
make newly added docs visible in search results.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm




Re: Soft commit and flush

Posted by Erick Erickson <er...@gmail.com>.
bq:  If so, using soft commit without calling hard commit could cause OOM

No. Aside from anything you have configured for auto (hard) commit, the
ramBufferSizeMB setting in solrconfig.xml causes the in-memory structures to
be flushed out to the segments when their size reaches this limit. That won't
_close_ the current segment, so it won't be permanent, but it'll limit memory
consumption.
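
As a rough illustration of the size-triggered flush described above, here is a
toy Python model. It is not Solr or Lucene code: the class name RamBuffer, its
methods, and the byte-based limit are all hypothetical. It only sketches the
idea that a size threshold bounds memory use whether or not a commit is issued.

```python
class RamBuffer:
    """Toy model (hypothetical): buffer docs in memory, flush at a size limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.pending = []            # documents still held in memory
        self.pending_bytes = 0       # total size of the pending documents
        self.flushed_segments = []   # stand-in for data written out to segments

    def add(self, doc: bytes):
        """Buffer one document; flush automatically once the limit is reached."""
        self.pending.append(doc)
        self.pending_bytes += len(doc)
        if self.pending_bytes >= self.limit_bytes:
            self._flush()

    def _flush(self):
        """Move everything buffered so far out of memory."""
        self.flushed_segments.append(list(self.pending))
        self.pending.clear()
        self.pending_bytes = 0
```

In this sketch no commit of any kind is called, yet the buffered size drops
back to zero each time the limit is reached, which is the memory-bounding
effect the reply describes.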

Best,
Erick

On Mon, Oct 7, 2013 at 9:40 AM, Guido Medina <gu...@temetra.com> wrote:
> Out of Memory Exception is well known as OOM.
>
> Guido.

Re: Soft commit and flush

Posted by Guido Medina <gu...@temetra.com>.
Out of Memory Exception is well known as OOM.

Guido.

On 07/10/13 14:11, adfel70 wrote:
> Sorry, by "OOE" I meant Out of memory exception...


Re: Soft commit and flush

Posted by adfel70 <ad...@gmail.com>.
Sorry, by "OOE" I meant Out of memory exception...




Re: Soft commit and flush

Posted by Erick Erickson <er...@gmail.com>.
bq: Does the NRTCachingDirectoryFactory relevant for both types of commit, or
just for hard commit

Don't know the code deeply, but NRT == Near Real Time == soft commit, I'd guess.

bq: If soft commit does not flush...

Soft commit flushes the transaction log. On restart, if the content of the
tlog isn't in the index, then it's replayed to catch up the index. OOE? Out
Of Energy? You can optionally set up soft commits to fsync the tlog if you
want to eliminate the remote possibility that you have an op-system (not JVM)
crash between the time the JVM passes the write off to the op system and the
op system writes the bits to disk.
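
The gap between "handed to the operating system" and "written to disk" can be
sketched in a few lines of Python. This is only an analogy for the JVM case
discussed above, not how Solr writes its tlog; the function name
append_record and the durable flag are made up for illustration.

```python
import os


def append_record(path, record: bytes, durable: bool):
    """Append a record to a file; optionally force it to stable storage."""
    with open(path, "ab") as f:
        f.write(record)           # bytes may still sit in the process's own buffer
        f.flush()                 # hand the bytes to the operating system
        if durable:
            os.fsync(f.fileno())  # ask the OS to push them out to the device
```

With durable=False the data survives a crash of the process but can still be
lost if the operating system itself goes down before it writes its cache out.
With durable=True that window is closed, at the cost of a slower write.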

Best,
Erick

On Mon, Oct 7, 2013 at 2:57 AM, adfel70 <ad...@gmail.com> wrote:
> I understand the bottom line that soft commits are about visibility, hard
> commits are about durability. I am just trying to gain better understanding
> what happens under the hood...
> 2 more related questions you made me think of:
> 1. Does the NRTCachingDirectoryFactory relevant for both types of commit, or
> just for hard commit?
> 2. If soft commit does not flush - all data exists in RAM until we call hard
> commit? If so, using soft commit without calling hard commit could cause OOE
> ... ?

Re: Soft commit and flush

Posted by adfel70 <ad...@gmail.com>.
I understand the bottom line that soft commits are about visibility and hard
commits are about durability. I am just trying to gain a better understanding
of what happens under the hood...
Two more related questions you made me think of:
1. Is the NRTCachingDirectoryFactory relevant for both types of commit, or
just for hard commit?
2. If soft commit does not flush, does all the data stay in RAM until we call
hard commit? If so, could using soft commit without ever calling hard commit
cause an OOE?




Re: Soft commit and flush

Posted by Erick Erickson <er...@gmail.com>.
Why do you care? Curiosity, or are you trying to find a
behavior you can count on?

I ask because "soft commits are about visibility, hard commits are
about durability", meaning you can't count on a soft commit
writing anything to disk at all. I suspect that in your tests the soft
commit had nothing to do with the changes on disk; those were
just a consequence of indexing more data triggering a flush
to disk, and would have happened even if you hadn't done a soft
commit.

Hard commits are what you can control writes to disk with,
not soft commits.
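
For anyone who wants to issue the two kinds of commit explicitly, Solr's
update handler accepts the request parameters commit=true and softCommit=true.
The small Python helper below only builds the request URL and sends nothing.
The helper name commit_url, the host and port, and the core name collection1
are assumptions for illustration and will differ per installation.

```python
from urllib.parse import urlencode


def commit_url(base_url, core, soft):
    """Build an update-handler URL asking for a soft or a hard commit."""
    # softCommit=true asks only for visibility; commit=true asks for a hard commit.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{base_url}/solr/{core}/update?{urlencode(params)}"


# Hypothetical host and core name, shown only as an example.
hard = commit_url("http://localhost:8983", "collection1", soft=False)
soft = commit_url("http://localhost:8983", "collection1", soft=True)
```

Only the first of these two requests is the one to rely on when the goal is
getting data onto disk, as the reply above says.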

Best,
Erick

On Tue, Sep 24, 2013 at 3:56 PM, Shawn Heisey <so...@elyograg.org> wrote:
> On 9/24/2013 5:51 AM, adfel70 wrote:
>>
>> My conclusion is that soft commit always flushes the data, but because of
>> the implementation of NRTCachingDirectoryFactory, the data will be written
>> to the disk when its getting too big.
>
>
> The NRTCachingDirectoryFactory (which creates NRTCachingDirectory instances)
> used by default in newer Solr versions has default settings for some of its
> parameters that show up in the solr log:
>
> maxCacheMB=48.0 maxMergeSizeMB=4.0

Re: Soft commit and flush

Posted by Shawn Heisey <so...@elyograg.org>.
On 9/24/2013 5:51 AM, adfel70 wrote:
> My conclusion is that soft commit always flushes the data, but because of
> the implementation of NRTCachingDirectoryFactory, the data will be written
> to the disk when its getting too big.

The NRTCachingDirectoryFactory (which creates NRTCachingDirectory 
instances) used by default in newer Solr versions has default settings 
for some of its parameters that show up in the solr log:

maxCacheMB=48.0 maxMergeSizeMB=4.0

The constructor javadocs for NRTCachingDirectory show what circumstances 
will cause the directory to use RAM instead of flushing to disk:

http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/store/NRTCachingDirectory.html#NRTCachingDirectory%28org.apache.lucene.store.Directory,%20double,%20double%29

"We will cache a newly created output if 1) it's a flush or a merge and 
the estimated size of the merged segment is <= maxMergeSizeMB, and 2) 
the total cached bytes is <= maxCachedMB"
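
Read literally, the javadoc sentence quoted above is a two-part condition. The
Python function below is a sketch of that sentence only, not the Lucene
implementation. The function name should_cache and its argument names are
hypothetical, and the defaults are simply the two values reported in the Solr
log above.

```python
def should_cache(is_flush_or_merge, estimated_size_mb, total_cached_mb,
                 max_merge_size_mb=4.0, max_cached_mb=48.0):
    """Return True if a newly created output would be kept in the RAM cache."""
    return (
        is_flush_or_merge
        and estimated_size_mb <= max_merge_size_mb   # condition 1 in the javadoc
        and total_cached_mb <= max_cached_mb         # condition 2 in the javadoc
    )


# Illustrative sizes only: a tiny flush fits the cache, a 5 MB one does not.
small = should_cache(True, estimated_size_mb=0.01, total_cached_mb=0.0)
large = should_cache(True, estimated_size_mb=5.0, total_cached_mb=0.0)
```

Under these assumed sizes the small output stays in RAM and the large one goes
straight to disk, which would be consistent with the two behaviours reported
in the original post, although the thread does not confirm that this is what
happened there.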

Thanks,
Shawn