Posted to java-user@lucene.apache.org by Robert Stewart <Ro...@INFONGEN.COM> on 2008/08/07 21:44:53 UTC

updating existing field values

Given an existing index, is there any way to update the value of a particular field across all documents, without deleting/re-indexing documents?  For instance, I have a date field and need to offset all dates after a change to the stored time zone (subtract 12 hours from each value).  The sort order of the field values would not change, and the postings should not need to change, only the values of the fields.  I do not believe there is any API to do it, but is there some lower-level way to do it (modifying files manually)?  I only ask because I have a large index and I don't want to re-index all documents.

Re: updating existing field values

Posted by Michael McCandless <lu...@mikemccandless.com>.
It's also possible to create your own code that uses FieldsReader and  
FieldsWriter (both in org.apache.lucene.index) directly to create new  
fdx/fdt files, with the date fixed on each document.  You'd have to  
enumerate over all segments in your index, create a FieldsReader  
instance on each, create a FieldsWriter writing to a new segment name,  
iterate through the docs fixing your date, calling  
FieldsWriter.writeField per field then flushDocument to finish the  
document.

I think that should work, but I've never tried it!

Your code would have to be in org.apache.lucene.index since these  
classes are package private.

Mike

Andrzej Bialecki wrote:

> Robert Stewart wrote:
>> Given an existing index, is there any way to update the value of a
>> particular field across all documents, without deleting/re-indexing
>> documents?  For instance, if I have a date field, and I need to
>> offset all dates based on change to stored time zone (subtract 12
>> hours from each value).  The sort order of field values would not
>> change, and the postings should not need to change, only values of
>> fields.  I do not believe there is any API to do it, but is there
>> some lower level way to do it (modifying files manually)?  I only ask
>> because I have a large index and I don't want to re-index all
>> documents.
>
>
> Perhaps you could do this using a FilterIndexReader implementation,  
> i.e. wrap your original index within this filter and then invoke  
> IndexWriter.addIndexes(IndexReader[]). If you override  
> FilterIndexReader.document(int, FieldSelector), which retrieves the  
> stored fields of a document, then you can change the values of  
> stored fields only, just before they are added to the new index.  
> Note that this works ONLY for non-indexed fields that are stored  
> (i.e. Store.YES, Index.NO) - if a field is both stored and indexed  
> then you need to modify the postings too, which requires overriding  
> other methods.
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: updating existing field values

Posted by Andrzej Bialecki <ab...@getopt.org>.
Robert Stewart wrote:
> Given an existing index, is there any way to update the value of a
> particular field across all documents, without deleting/re-indexing
> documents?  For instance, if I have a date field, and I need to
> offset all dates based on change to stored time zone (subtract 12
> hours from each value).  The sort order of field values would not
> change, and the postings should not need to change, only values of
> fields.  I do not believe there is any API to do it, but is there
> some lower level way to do it (modifying files manually)?  I only ask
> because I have a large index and I don't want to re-index all
> documents.


Perhaps you could do this using a FilterIndexReader implementation, i.e. 
wrap your original index within this filter and then invoke 
IndexWriter.addIndexes(IndexReader[]). If you override 
FilterIndexReader.document(int, FieldSelector), which retrieves the 
stored fields of a document, then you can change the values of stored 
fields only, just before they are added to the new index. Note that this 
works ONLY for non-indexed fields that are stored (i.e. Store.YES, 
Index.NO) - if a field is both stored and indexed then you need to 
modify the postings too, which requires overriding other methods.
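
As a hedged illustration of the value change itself (not of the Lucene plumbing): the sketch below shows the kind of transformation an overridden document(int, FieldSelector) could apply to each stored date before the document is added to the new index. It assumes the field holds a fixed-width, second-resolution timestamp string of the form yyyyMMddHHmmss, the same layout Lucene's DateTools produces at second resolution; the thread does not say how the dates are actually stored. The class name StoredDateShifter and the method shiftHours are hypothetical names used only for this example, and it relies on java.time, so it needs Java 8 or later.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class StoredDateShifter {

    // Assumption: stored dates look like "20080807214453" (yyyyMMddHHmmss).
    // Adjust the pattern if the index uses a different layout.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss");

    /**
     * Returns the stored value shifted by the given number of hours.
     * A negative value moves the timestamp back in time.
     */
    public static String shiftHours(String storedValue, long hours) {
        LocalDateTime parsed = LocalDateTime.parse(storedValue, FORMAT);
        return parsed.plusHours(hours).format(FORMAT);
    }

    public static void main(String[] args) {
        // Subtract 12 hours, as in the original question.
        System.out.println(shiftHours("20080807214453", -12)); // prints 20080807094453
    }
}
```

Because every value moves by the same amount and the strings stay fixed-width, the relative order of the values is unchanged, which matches the assumption in the original question.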

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

