Posted to java-user@lucene.apache.org by Joseph Ottinger <jo...@enigmastation.com> on 2003/03/05 17:59:11 UTC

IndexReader.delete(int) not working for me

I've got a versioning content system where I want to replace documents in
a lucene repository. To do so, according to the FAQ and the mailing list
archives, I need to open an IndexReader, look for the document in
question, delete it via the IndexReader, and then add it.

This shouldn't replace the document per se - it should, however, free the
index entry (for reuse by documents added later) as I understand it. It
should also mark the document as deleted. A query still may return the
document (again, as I understand it), requiring a filter to make sure
deleted documents aren't returned.

If I'm offbase in my understanding, I apologize - this is the best I can
tell.

In my removeDocument() method (names and parameters are obscured to remove
cruft not germane to the problem at hand), I iterate through the
IndexReader's documents (because there are non-indexed identifiers used).
When I hit a document that contains the correct identifiers, I use
ir.delete(idx), and output a log message that I'm deleting the document.

This part works as expected. (A log message for one entry is spit out.)

Now, however, when I search for documents, things go awry. I'm using the
standard analyzer (StandardAnalyzer, I should say), and
IndexSearcher(String). I then use code like the following:

// Filter that admits only documents not marked as deleted.
Hits hits = searcher.search(query, new Filter() {
  public BitSet bits(IndexReader ir) throws IOException {
    BitSet bs = new BitSet(ir.maxDoc());
    for (int idx = 0; idx < ir.maxDoc(); idx++) {
      // Set the bit only for live (non-deleted) documents.
      bs.set(idx, !ir.isDeleted(idx));
    }
    return bs;
  }
});

(I also have a log message to output the salient information about the
document and whether it's been deleted.)

Here's where the problem evinces itself: *every* document here says that
it's not deleted, even though the removeDocument() method mentioned above
doesn't show all of the documents returned here. It's almost like there
are two IndexReaders in action, one noting the deleted documents, and the
other not. It's very confusing to me. Can anyone give me any pointers?

---------------------------------------------------------
Joseph B. Ottinger                 joeo@enigmastation.com
http://enigmastation.com                    IT Consultant


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: IndexReader.delete(int) not working for me

Posted by Joseph Ottinger <jo...@enigmastation.com>.
Okay, I found the problem: it was a stupid coder. To wit, here's the
salient code:
Document d=indexReader.document(i);
// Bug: this compares the Field object itself, not its string value
if(d.getField("key").equals(node.getKey())) {
   ...
}

The error, of course, is that getField.equals() is comparing FIELDS and
not string values. When I changed this to pull the stringValue() out of
getField(), everything worked as expected. Turns out my logging actually
was spitting out the *wrong* message somewhere else, which deceived
me^Wthe stupid coder into thinking the removal was occurring when it was
not.
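
For anyone who hits the same thing, here is a minimal sketch of the pitfall. It assumes a hypothetical stand-in Field class (not Lucene's own) that, like the real one as far as I can tell, does not override equals(); the class and method names are made up for illustration.

```java
public class FieldEqualsPitfall {
    // Hypothetical stand-in for a stored field: a name plus a string value.
    // This is NOT Lucene's Field class; it only mimics stringValue().
    static final class Field {
        private final String name;
        private final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }

        String stringValue() {
            return value;
        }
    }

    // Buggy check: compares the Field object itself against a String.
    // With the default Object.equals() this is an identity test, so it
    // is false for any String argument.
    static boolean matchesBuggy(Field f, String key) {
        return f.equals(key);
    }

    // Fixed check: compares the field's string value against the key.
    static boolean matchesFixed(Field f, String key) {
        return f != null && key.equals(f.stringValue());
    }

    public static void main(String[] args) {
        Field f = new Field("key", "node-42");
        System.out.println(matchesBuggy(f, "node-42")); // prints "false"
        System.out.println(matchesFixed(f, "node-42")); // prints "true"
    }
}
```

The buggy form compiles without complaint because equals() accepts any Object, which is why the mistake is easy to miss.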

Now everything's working fine. Thank you for your time.

On Wed, 5 Mar 2003, Doug Cutting wrote:

> Joseph Ottinger wrote:
> > Then this means that my IndexReader.delete(i) isn't working properly. What
> > would be the common causes for this? My log shows the documents being
> > deleted, so something's going wrong at that point.
>
> Are you closing the IndexReader after doing the deletes?  This is
> required for the deletions to be saved.
>
> What makes you think that the delete is not working properly?
>
> Doug


Re: IndexReader.delete(int) not working for me

Posted by Joseph Ottinger <jo...@enigmastation.com>.
Okay, I think I've done something stupid here: on closer examination, it
looks like my comparison to find the specific documents to delete is
failing. Let me look further at that.

On Wed, 5 Mar 2003, Doug Cutting wrote:
> Joseph Ottinger wrote:
> > Then this means that my IndexReader.delete(i) isn't working properly. What
> > would be the common causes for this? My log shows the documents being
> > deleted, so something's going wrong at that point.
>
> Are you closing the IndexReader after doing the deletes?  This is
> required for the deletions to be saved.
>
> What makes you think that the delete is not working properly?
>
> Doug


Re: IndexReader.delete(int) not working for me

Posted by Doug Cutting <cu...@lucene.com>.
Joseph Ottinger wrote:
> Then this means that my IndexReader.delete(i) isn't working properly. What
> would be the common causes for this? My log shows the documents being
> deleted, so something's going wrong at that point.

Are you closing the IndexReader after doing the deletes?  This is 
required for the deletions to be saved.

What makes you think that the delete is not working properly?

Doug




Re: IndexReader.delete(int) not working for me

Posted by Joseph Ottinger <jo...@enigmastation.com>.
Then this means that my IndexReader.delete(i) isn't working properly. What
would be the common causes for this? My log shows the documents being
deleted, so something's going wrong at that point.

On Wed, 5 Mar 2003, Doug Cutting wrote:

> Joseph Ottinger wrote:
> > This shouldn't replace the document per se - it should, however, free the
> > index entry (for reuse by documents added later) as I understand it. It
> > should also mark the document as deleted. A query still may return the
> > document (again, as I understand it), requiring a filter to make sure
> > deleted documents aren't returned.
>
> Search results do not include deleted documents, so you do not need to
> explicitly filter for them.  After a document is deleted, the space
> consumed by it may not be reclaimed for a while, and some term
> statistics may not be updated immediately, but Lucene never returns
> references to deleted documents.
>
> Doug


Re: IndexReader.delete(int) not working for me

Posted by Doug Cutting <cu...@lucene.com>.
Joseph Ottinger wrote:
> I've got a versioning content system where I want to replace documents in
> a lucene repository. To do so, according to the FAQ and the mailing list
> archives, I need to open an IndexReader, look for the document in
> question, delete it via the IndexReader, and then add it.
> 
> This shouldn't replace the document per se - it should, however, free the
> index entry (for reuse by documents added later) as I understand it. It
> should also mark the document as deleted. A query still may return the
> document (again, as I understand it), requiring a filter to make sure
> deleted documents aren't returned.

Search results do not include deleted documents, so you do not need to
explicitly filter for them.  After a document is deleted, the space
consumed by it may not be reclaimed for a while, and some term
statistics may not be updated immediately, but Lucene never returns
references to deleted documents.

Doug


