You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Tom Mortimer <to...@lemurconsulting.com> on 2002/09/17 23:24:10 UTC

delete / optimize question

Hi all,

I'm new-ish to Lucene, and having a few problems with document deletion.  
In particular, the point at which a deleted document is no longer visible to
an IndexReader. Is the following scenario sane?

1. Open an IndexReader and delete all docs with Term("name", "Bob"), then
   close the reader.

2. Open an IndexWriter and add various "non-Bob" documents

3. then add a new document with a Term("name", "Bob").  

4. Call optimize()

5. Open an IndexReader and get docFreq for "Bob"


I'd expect the final doc freq to be 1, as all the "Bob" docs should have
been deleted except for the one added in step 3, but instead I'm getting
freq > 1.  

Do I in fact need to do the optimize() step immediately after the deletions,
and before adding any more docs?  This could be expensive with a large
index, and document additions and deletions required in random order.

Thanks for any help!

Tom



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: delete / optimize question

Posted by Joshua O'Madadhain <jm...@ics.uci.edu>.
On Tue, 17 Sep 2002, Tom Mortimer wrote:

> I'm new-ish to Lucene, and having a few problems with document deletion.  
> In particular, the point at which a deleted document is no longer visible to
> an IndexReader. Is the following scenario sane?
> 
> 1. Open an IndexReader and delete all docs with Term("name", "Bob"), then
>    close the reader.
> 
> 2. Open an IndexWriter and add various "non-Bob" documents
> 
> 3. then add a new document with a Term("name", "Bob").  
> 
> 4. Call optimize()
> 
> 5. Open an IndexReader and get docFreq for "Bob"
> 
> I'd expect the final doc freq to be 1, as all the "Bob" docs should have
> been deleted except for the one added in step 3, but instead I'm getting
> freq > 1.  
> 
> Do I in fact need to do the optimize() step immediately after the deletions,
> and before adding any more docs?  This could be expensive with a large
> index, and document additions and deletions required in random order.

I *think* that what you think should happen is what indeed should be
happening.  However, I have a few suggestions for sanity checks to make
sure that you're seeing what you think you're seeing:

(1) Call docFreq("Bob") before and after each step.  This will make sure
that (for instance) your supposed "non-Bob" documents are indeed
"non-Bob", which if false would be a subtle sort of "gotcha".

(2) Double-check that you use the same index in all cases.

(3) Are you closing the index before you call optimize()?  (According to
the docs, you shouldn't.)

(4) Are you closing the IndexWriter after you call optimize() and before
you call docFreq()?  (close() does flush changes, although I don't know
whether it should be necessary after optimize().)

Anyway, good luck.

Joshua

 jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
  Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
 It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.




--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>