You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Jack McBane <ja...@gmail.com> on 2005/10/06 21:16:31 UTC

change of document ids accross optimize

I know that in general if you optimize an index the document id that a
HitCollector would see is not guarenteed to stay the same since any deleted
documents would get removed from the index completely and all later
documents pushed up to fill in the gaps. I'm wondering though, are the
document ids are preserved accross an optimize if there were no deletions
(which could be checked with a call to IndexReader.hasDeletions())?

Similarly, if you populate an index using IndexWriter.addIndexes(Directory[])
are the document ids preserved except with offsets added? That is, if you
have two indices with 10 documents each (so that each has document ids 0-9)
and combine them in this way, does document 0 in the new index correspond to
document 0 in the Directory[0] index, ..., document[9] correspond to
document 9 in Directory[0], document[10] correspond to document 0 in
Directory[1],...,document[17] correspond to document 9 in Directory[1].

I've been creating a prototype of something that for a first pass assumed
that documents are preserved as described above, but some of my results are
starting to look like that is not the case.

Thanks.

Re: change of document ids accross optimize

Posted by Yonik Seeley <ys...@gmail.com>.
ids can also change as the result of an add(), not just optimize(). An add
can trigger a segment merge which can squeeze out deleted docs and thus
change the ids. I think everything else you said is pretty much correct.

On 10/6/05, Jack McBane <ja...@gmail.com> wrote:
>
> I know that in general if you optimize an index the document id that a
> HitCollector would see is not guarenteed to stay the same since any
> deleted
> documents would get removed from the index completely and all later
> documents pushed up to fill in the gaps. I'm wondering though, are the
> document ids are preserved accross an optimize if there were no deletions
> (which could be checked with a call to IndexReader.hasDeletions())?
>
> Similarly, if you populate an index using IndexWriter.addIndexes
> (Directory[])
> are the document ids preserved except with offsets added? That is, if you
> have two indices with 10 documents each (so that each has document ids
> 0-9)
> and combine them in this way, does document 0 in the new index correspond
> to
> document 0 in the Directory[0] index, ..., document[9] correspond to
> document 9 in Directory[0], document[10] correspond to document 0 in
> Directory[1],...,document[17] correspond to document 9 in Directory[1].
>
> I've been creating a prototype of something that for a first pass assumed
> that documents are preserved as described above, but some of my results
> are
> starting to look like that is not the case.
>
> Thanks.
>
>


--
-Yonik
Now hiring -- http://tinyurl.com/7m67g