Posted to java-user@lucene.apache.org by Rod Giles <Ro...@ventyx.com> on 2007/10/04 02:29:28 UTC

FW: Eliminating duplicate documents when indexing

Duplicate Documents In An Index

The updateDocument method of IndexWriter indicates that a delete by term
occurs before the updated document is added (i.e. the document is
replaced in the index, not duplicated). Has anyone been able to get this
process to work? The term that I am using holds a unique key that comes
directly from the primary key of a database table. But my updated
documents are still consistently duplicated in my index when the writer
is eventually flushed. I have tried using Lucene 2.2 and the nightly
build lucene-2007-10-02_02-29-37 (which does include the
setRAMBufferSizeMB() method for IndexWriter).
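
To make the expected behaviour concrete, here is a minimal sketch of the
delete-then-add semantics described above. It is a toy in-memory model
written against plain Java collections, not Lucene code: the class name
UpdateModel, the helper doc(), and the field names "id" and "body" are all
assumptions made for illustration. It only shows what "replaced, not
duplicated" should mean for a unique key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of delete-by-term followed by add. NOT Lucene code. */
public class UpdateModel {
    // Each "document" is modelled as a map of field name to field value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Removes every document whose field equals value exactly, then adds doc. */
    public void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    public int numDocs() {
        return docs.size();
    }

    /** Hypothetical helper that builds a two-field document. */
    public static Map<String, String> doc(String id, String body) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("body", body);
        return d;
    }

    public static void main(String[] args) {
        UpdateModel index = new UpdateModel();
        index.updateDocument("id", "42", doc("42", "first version"));
        index.updateDocument("id", "42", doc("42", "second version"));
        System.out.println(index.numDocs()); // prints 1: one document per key
    }
}
```

In this model the delete step matches on exact string equality. In Lucene
the delete matches against the indexed term, so whether the real call
behaves like the model depends on how the key field was indexed.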

 

 

Optimizing An Index

Also, in the nightly build lucene-2007-10-02_02-29-37, the optimize()
method of IndexWriter appears to have been broken. My existing code,
which worked with Lucene 2.2, consistently throws an
IllegalArgumentException when this method is executed.

 

 

 


