You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Venkat Rangan <ve...@clearwellsystems.com> on 2009/07/14 18:08:36 UTC

MergePolicy option to retain deleted docID positions

Hi,

 

We modified Lucene 2.4.1 sources to add a MergePolicy option to insert
empty documents in places where a document was deleted during the merge
phase. The motivation was to retain the same document locations (docID)
when there are deletions, so an external tracking database can persist
the docIDs and not be concerned about the IDs becoming invalid upon a
merge. This option in combination with a "no-optimize" allows an index
to be tolerant of a small number of deletions without requiring another
parallel immutable ID space. While it is possible to create a field
called ID within each document which is immutable, large retrievals
require a relatively expensive search of the IDs, which an immutable ID
space will avoid. This also helps in constructing a filter bit map to a
search, where the bit map was created using other business logic.

 

Our implementation has added a few unit tests to confirm its correct
operation.

 

Is there an interest in pursuing this as a useful capability?

 

Thanks,

 

Venkat Rangan

Clearwell Systems Inc.

(650) 526 0639

http://www.clearwellsystems.com <http://www.clearwellsystems.com/> 

- Delivering  Intelligent eDiscovery