Posted to dev@lucene.apache.org by JMA <mr...@comcast.net> on 2006/07/14 10:09:24 UTC

Adding fields to an existing index fails?

I have an existing index.  I want to add new fields to each document in the index.  
Ideally I would do it in-place on the same index, but let's assume I can create a new, separate index.
Easy I thought:

<code>
IndexReader reader = IndexReader.open(originalIndex);
int maxdocs = reader.maxDoc();

IndexWriter writer = new IndexWriter("indexWithMoreFields", new StandardAnalyzer(), true);

for (int i=0; i<maxdocs; i++) {
   Document d = reader.document(i);

   Field f = new Field("extraField", text_for_extra_field, true, true, true);
   d.add(f);
   writer.addDocument(d);
}
writer.close();
reader.close();
</code>

This fails in two ways:
1) Not all documents in the original index appear to be copied.
2) Some of the fields from the original index are dropped when copying the document.

I suspect it has something to do with the original index not storing all of the original fields (some are indexed and tokenized, but not stored), and perhaps this is not the proper way to loop through an index, so I can understand that my approach is incorrect.  However, this seems like a very common need, and I do not want to reindex every document from scratch just to add a field.

Help please.

Thanks,
JMA


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Adding fields to an existing index fails?

Posted by karl wettin <ka...@gmail.com>.
On Fri, 2006-07-14 at 04:09 -0400, JMA wrote:
> I have an existing index.  I want to add new fields to each document in the index.

>    Document d = reader.document(i);
>
>    Field f = new Field("extraField", text_for_extra_field, true, true, true);
>    d.add(f);
>    writer.addDocument(d);

Your problem is this: Lucene is an index, not a database. A lot of
semantic information is lost when you pass a document to the writer. If
you want to add data to an existing document in the index, you really
need to delete the old document and add a new one built from the
original source data.
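
To make the point concrete, here is a minimal sketch of why the copy
loop drops fields. It is a toy model in plain Java, not Lucene's API:
the class ToyIndex and all of its methods are hypothetical names made up
for illustration. It only assumes that an "indexed" field ends up as
search terms while a "stored" field keeps its original value, and that
fetching a document returns the stored values only.

```java
import java.util.*;

/** Toy model of stored vs. indexed fields. Hypothetical; NOT Lucene's API. */
public class ToyIndex {
    // "field:term" -> ids of the documents containing it (result of indexing)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> field name -> original value (result of storing)
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Starts a new, empty document and returns its id. */
    public int newDocument() {
        stored.add(new LinkedHashMap<>());
        return stored.size() - 1;
    }

    public int size() {
        return stored.size();
    }

    /** Adds a field; the original value survives only if store is true. */
    public void addField(int doc, String name, String value, boolean store, boolean index) {
        if (store) {
            stored.get(doc).put(name, value);
        }
        if (index) {
            for (String token : value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(name + ":" + token, k -> new TreeSet<>()).add(doc);
            }
        }
    }

    /** Returns the stored fields only; indexed-only fields are not here. */
    public Map<String, String> document(int doc) {
        return Collections.unmodifiableMap(stored.get(doc));
    }

    public Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(), Collections.emptySet());
    }

    /** Mirrors the copy loop from the question: copy what document() returns, add one field. */
    public static ToyIndex copyWithExtraField(ToyIndex source, String name, String value) {
        ToyIndex copy = new ToyIndex();
        for (int i = 0; i < source.size(); i++) {
            int id = copy.newDocument();
            for (Map.Entry<String, String> e : source.document(i).entrySet()) {
                copy.addField(id, e.getKey(), e.getValue(), true, true);
            }
            copy.addField(id, name, value, true, true);
        }
        return copy;
    }

    public static void main(String[] args) {
        ToyIndex original = new ToyIndex();
        int id = original.newDocument();
        original.addField(id, "title", "Lucene in Action", true, true);
        original.addField(id, "body", "full text of the book", false, true); // indexed, not stored

        ToyIndex copy = copyWithExtraField(original, "extraField", "new value");

        System.out.println(original.search("body", "book")); // [0]
        System.out.println(copy.search("body", "book"));     // []
        System.out.println(copy.document(0).keySet());       // [title, extraField]
    }
}
```

In this model the "body" field is searchable in the original but is gone
from the copy, because its text was never kept anywhere that document()
could return it. That is the reason to rebuild each document from the
source data rather than from the index.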

It would be possible to analyze the index and re-create the information,
but as far as I know, you will not find that functionality in the API.
Basically you would have to extract the tokens (unless the data is
stored and you know for a fact what analyzer you used for each
document), extract boost values, take a look at the vectors, etc. It's
quite a messy procedure.
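
As a rough illustration of what "extract the tokens" would involve, the
sketch below inverts a term-to-positions map back into a token sequence.
Again this is a toy in plain Java, not Lucene code: TokenRecovery,
analyze and rebuild are hypothetical names, the analyzer is a simple
lowercase-and-split stand-in, and it assumes term positions are
available at all, which is not a given for a real index.

```java
import java.util.*;

/** Toy sketch of rebuilding tokens from positional postings. Hypothetical; NOT Lucene's API. */
public class TokenRecovery {

    /** Stand-in analyzer: lowercases, splits on whitespace, drops stop words but keeps their positions as gaps. */
    public static Map<String, List<Integer>> analyze(String text, Set<String> stopWords) {
        Map<String, List<Integer>> positions = new TreeMap<>();
        String[] words = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (stopWords.contains(words[pos])) {
                continue; // the stop word itself is never written to the index
            }
            positions.computeIfAbsent(words[pos], k -> new ArrayList<>()).add(pos);
        }
        return positions;
    }

    /** Inverts term -> positions into an ordered token string; "_" marks positions with no surviving term. */
    public static String rebuild(Map<String, List<Integer>> positions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            for (int p : e.getValue()) {
                byPosition.put(p, e.getKey());
            }
        }
        if (byPosition.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p <= byPosition.lastKey(); p++) {
            if (p > 0) {
                sb.append(' ');
            }
            sb.append(byPosition.getOrDefault(p, "_"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "over"));
        Map<String, List<Integer>> postings =
                analyze("The Quick brown fox jumps over the lazy Dog", stop);
        System.out.println(rebuild(postings)); // _ quick brown fox jumps _ _ lazy dog
    }
}
```

Even in this best case you get back the analyzed tokens, not the
original text: case is gone and the stop words are gaps. Boosts and
per-field options would need the same kind of reconstruction.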


Please try to create a new email rather than replying to and changing
the subject of an old thread next time. And please post questions of
this kind to the users list rather than the dev list.

