Posted to user@lucenenet.apache.org by Andrew Stephens <An...@nu-ins.com> on 2015/06/18 13:07:16 UTC

Questions about commit and optimise

At various times through the day my desktop application adds collections of documents to a Lucene index (typically a handful each time, but can be up to a few thousand). After I’ve added each one using AddDocument(), I then call Commit(). These actions *usually* happen seconds/minutes apart throughout the day, so would you recommend that I also include a call to Optimise()? (The documentation says I should optimize after indexing).

Having said this, it’s possible that occasionally a number of these actions might be triggered immediately one after the other, so what would happen if I called AddDocument() and Commit() (and Optimise, if I included this) while Lucene was still committing/optimizing from the last time? Will it handle this gracefully or fall over?
The reason I ask is because I recently tried adding a large number of documents to Lucene, but I accidentally had the Commit() inside the foreach loop, which threw an IOException (I forget the exact message, but it was something about not being able to access one of the Lucene files). This makes me think you can’t call Commit() if Lucene is already doing something. Is this correct? If so, can I handle this in some way, e.g. waiting for Lucene to finish before calling Commit()?

Lastly, am I right in saying that most of the commonly-used Lucene methods are non-blocking? What would happen if the user closed the application while a commit or optimise was in progress?



Andrew Stephens | Senior Software Engineer
Andrew.Stephens@nu-ins.com
T: +44 (0) 1978 661304  |  F: +44 (0) 1978 664301
W: www.nu-ins.com

Unit 74 Clywedog Road South, Wrexham Industrial Estate, Wrexham,  LL13 9XS | Nu Instruments Ltd is registered in England, No.: 3046042. Registered Office: Seacourt Tower, West Way, Oxford OX2 0FB. VAT No.: GB 616 3733 45





Re: Questions about commit and optimise

Posted by Simon Svensson <si...@devhost.se>.
Hi,

There's no need to optimize your index at all. When in doubt, never 
optimize. If you have issues with segment sizes, switch merge 
policies. Deleted documents are pruned automatically when segments are 
merged.

Most Lucene methods are blocking. Some operations start background 
threads to complete their work, but the only one I can think of is the 
ConcurrentMergeScheduler. A merge that fails because the application 
was terminated only means that the merge will be done at a later date. 
The original data is still there.

Calling Commit in a tight loop should work. It's a blocking call that 
shouldn't return until everything is persisted to disk. The IOException 
you mention could have been a first-chance exception (if you ran with a 
debugger like Visual Studio), which happens when older index files are 
removed. Files currently in use by a reader will cause that IOException, 
and it's fine to ignore. They will be removed at a later time, when they 
are no longer locked.
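
Committing after every document works, but committing once per batch 
avoids the repeated flush to disk. A minimal sketch of that pattern 
follows. It assumes the caller owns a single long-lived IndexWriter and 
passes it in; the BatchIndexer class and AddBatch method are 
hypothetical names for illustration, not part of Lucene.NET. Only 
AddDocument and Commit are Lucene calls.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical helper, not part of Lucene.NET.
public static class BatchIndexer
{
    // Adds every document in the batch, then commits once.
    // Assumes 'writer' is an open IndexWriter owned by the caller.
    public static void AddBatch(IndexWriter writer, IEnumerable<Document> batch)
    {
        foreach (var doc in batch)
        {
            // Buffers the document; nothing is durable yet.
            writer.AddDocument(doc);
        }

        // Blocking call: returns once the whole batch is persisted.
        writer.Commit();
    }
}
```

If the application is terminated before Commit returns, the whole 
batch is lost rather than a single document, so the batch size is a 
trade-off between throughput and how much work you can afford to redo.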

To answer your two questions regarding terminating your application 
suddenly:

1. Terminating your application during a commit will result in all 
changes since your last commit being lost. In your case, that is the 
last call to AddDocument.

2. Terminating your application during an optimize will result in no 
lost data. An optimize will take an existing commit, and create a new 
commit with new segments. The previous segments are still present, and 
will be used when opening the index next time.

// Simon
