Posted to user@lucenenet.apache.org by Andrew Stephens <An...@nu-ins.com> on 2015/06/10 14:00:12 UTC

How to handle change to index definition?

Let’s say I’ve been writing documents to the index for some time, then I decide to make a change to the index. This could be adding or removing a field, or changing how one of the fields is configured (e.g. changing the Field.Store or Field.Index parameter).

Assuming that I need this change to apply to old documents, is the only course of action to delete the index then add all the documents from scratch? And is “_writer.DeleteAll()” sufficient or would I need to delete the FS directory?

(I’ve used RavenDB in the past, which is built on Lucene, and that detects when you’ve changed the index definition, and applies the changes. I presume it must be doing something similar, i.e. deleting and recreating the index on a background thread).



Andrew Stephens | Senior Software Engineer
Andrew.Stephens@nu-ins.com
T: +44 (0) 1978 661304  |  F: +44 (0) 1978 664301  |  W: www.nu-ins.com

Unit 74 Clywedog Road South, Wrexham Industrial Estate, Wrexham,  LL13 9XS | Nu Instruments Ltd is registered in England, No.: 3046042. Registered Office: Seacourt Tower, West Way, Oxford OX2 0FB. VAT No.: GB 616 3733 45

This message is confidential and may contain privileged information and is protected by copyright. If you are not the intended recipient you should not copy or disclose this message to anyone but should kindly notify the sender and delete the message. Opinions, conclusions and other information in this message which do not relate to the official business of Nu Instruments Ltd shall be understood as neither given nor endorsed by it. Neither the Company nor the sender accepts any responsibility or liability for any loss or damage arising from the presence of any computer virus or similar harmful code contained in this email or attachment/s.  It is your responsibility to scan this email and any attachments. The Company reserves the right to access and disclose all messages sent over its email system.




Re: How to handle change to index definition?

Posted by Simon Svensson <si...@devhost.se>.
Hi,

Go with UpdateDocument and do incremental indexing. It may be
unnecessary now, but such a process scales better as the number of
documents grows. It will still work when indexing takes hours and you
can't lock the index from other writers during that time.

Calling Commit will indeed persist any changes you've made. You'll also
need to reopen your IndexReader to see the committed changes. I see no
issues with calling Commit in your scenario. Optimize is often
unnecessary (and causes more harm than good), so you can skip calling
it. Flush is also unnecessary: it only writes changes to disk without
committing them, so you would still need a call to Commit (and a Commit
will Flush anyhow).

// Simon
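
[Editor's sketch] The commit-then-reopen point can be illustrated roughly as follows. This assumes the Lucene.NET 3.0.3 API that was current when this thread was written (later releases changed these signatures); the in-memory RAMDirectory, the "id" field and the class name are illustrative assumptions, not anything from the poster's application.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class CommitAndReopenSketch
{
    // Hypothetical helper: builds a document with a single untokenized "id" field.
    static Document MakeDoc(string id)
    {
        var doc = new Document();
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    static void Main()
    {
        var dir = new RAMDirectory(); // assumption: in-memory index for the demo
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(MakeDoc("1"));
            writer.Commit(); // persists the change; Commit flushes as part of its work

            // A reader is a point-in-time snapshot of the last commit it opened.
            IndexReader reader = IndexReader.Open(dir, true);

            writer.AddDocument(MakeDoc("2"));
            writer.Commit();

            // The existing reader does not see the second commit yet.
            Console.WriteLine("Before reopen: " + reader.NumDocs());

            // Reopen returns a new reader if the index changed, else the same instance.
            IndexReader newer = reader.Reopen();
            if (newer != reader)
            {
                reader.Dispose();
                reader = newer;
            }

            Console.WriteLine("After reopen: " + reader.NumDocs());
            reader.Dispose();
        }
    }
}
```

Neither Optimize nor Flush appears in the sketch, in line with the advice above.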



RE: How to handle change to index definition?

Posted by Andrew Stephens <An...@nu-ins.com>.
Thanks for the reply. The number of documents will typically be in the hundreds (perhaps up to a couple of thousand), so I assume a complete reindex in one go should be fairly quick. Would you recommend IndexWriter.UpdateDocument rather than DeleteAll()?

As an aside, whenever I add a document to the index I call _writer.Commit(). My understanding is that it forces the document to be added to the index immediately (there are times where my app queries the index straight afterwards, and I need to see the new document in the results). Documents are added very infrequently (usually minutes apart), so I guess calling Commit() each time is okay in this scenario? What about Optimize() and Flush() - would I ever need to use those?








Re: How to handle change to index definition?

Posted by Simon Svensson <si...@devhost.se>.
Hi,

You'll need to reindex all those old documents, either by calling
IndexWriter.UpdateDocument for every document or, as you describe, by
calling IndexWriter.DeleteAll and then reindexing everything. Do you need
incremental reindexing, or can you accept doing a complete reindex at once?

If you go the UpdateDocument route, your index will end up with a lot of
documents marked as deleted, alongside the new versions of those
documents. The deleted documents will not be returned by searches, but
they occupy disk space. They can be removed using
IndexWriter.ExpungeDeletes, or you could optimize your index, which would
also remove them. (DeleteAll, by contrast, drops the existing segments
outright rather than marking individual documents as deleted.)

// Simon
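
[Editor's sketch] The two reindexing routes described above might look roughly like this. It assumes the Lucene.NET 3.0.3 API; the Record type, the "id" and "title" fields, and every method name on ReindexSketch are hypothetical and only stand in for whatever the application actually stores. The key assumption is that each document carries a unique, untokenized key field that UpdateDocument can match on.

```csharp
using System.Collections.Generic;
using Lucene.Net.Documents;
using Lucene.Net.Index;

static class ReindexSketch
{
    // Hypothetical source record; a real application would load these from its own store.
    public class Record
    {
        public string Id;
        public string Title;
    }

    // Builds a document using the NEW field configuration.
    static Document ToDocument(Record r)
    {
        var doc = new Document();
        // The key must be indexed as a single term so that Term("id", ...) matches it exactly.
        doc.Add(new Field("id", r.Id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("title", r.Title, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }

    // Route 1: incremental. UpdateDocument deletes any document matching the term,
    // then adds the new version, so it can run alongside normal indexing.
    public static void ReindexIncrementally(IndexWriter writer, IEnumerable<Record> records)
    {
        foreach (var r in records)
        {
            writer.UpdateDocument(new Term("id", r.Id), ToDocument(r));
        }
        writer.Commit();
    }

    // Route 2: complete reindex in one go. Nothing is visible to readers until Commit.
    public static void ReindexFromScratch(IndexWriter writer, IEnumerable<Record> records)
    {
        writer.DeleteAll();
        foreach (var r in records)
        {
            writer.AddDocument(ToDocument(r));
        }
        writer.Commit();
    }

    // Optional: reclaim the space held by documents marked as deleted.
    public static void ReclaimDeletedSpace(IndexWriter writer)
    {
        writer.ExpungeDeletes();
        writer.Commit();
    }
}
```

Either route works with the writer that is already open, and readers opened before the final Commit continue to see the old documents until they are reopened.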

