You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Rod Giles <Ro...@ventyx.com> on 2007/10/04 02:29:28 UTC
FW: Eliminating duplicate documents when indexing
Duplicate Documents In An Index
The updateDocument method of Index Writer indicates that a delete term
occurs before the update
document takes place (i.e. the document is replaced in the index, but
not duplicated). Has anyone
been able to get this process to work? The term that I am using has a
unique key that comes directly
from the primary key of a database table. But, my updated documents
are still consistently duplicated
in my index when the writer is eventually flushed. I am tried using
Lucene 2.2 and nightly build
Lucene-2007-10-02_02-29-37 (which correctly includes the
setRamBufferSizeMB() method for Index Writer).
Optimizing An Index
Also, in the nightly build lucene-2007-10-02_02-29-37, the optimize()
method of Index Writer appears to
have been broken. My existing code, which worked with Lucene 2.2,
consistently throws an illegal argument
exception when this method is executed.
DISCLAIMER:
Please note that our email and web site addresses have changed.
************************************************************************
This email message and all attachments transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. Please DO NOT forward this email outside of the recipient's Company unless expressly authorized to do so herein. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
Any views expressed in this email message are those of the individual sender except where the sender specifically states them to be the views of Ventyx.
************************************************************************