Posted to java-user@lucene.apache.org by David Townsend <da...@magus.co.uk> on 2004/12/03 17:05:10 UTC

hits.length() changes during delete process.

I have a delete script:

IndexSearcher searcher = new IndexSearcher(reader);

Hits hits = searcher.search(query);
log.info("there are " + hits.length() + " hits");

for (int i = 0; i < hits.length(); i++) {
  log.info(hits.length() + " " + i + " " + hits.id(i));
  reader.delete(hits.id(i));
}

which iterates through the results of a search and deletes each hit.  I keep getting an ArrayIndexOutOfBoundsException.  The reason turns out to be that hits.length() changes during the iteration, in large regular steps, i.e.:

The hits length is initially 10003.

After 100 deletions hits.length() changes to 9903.
After 200 deletions hits.length() changes to 9803.

It then changes after 200, 400, 800, 1600 and 3200 deletions.

So the short question is: should the Hits object be changing, and what is the best way to delete all the results of a search? (It's a range query, so I can't use delete(Term term).)

cheers.

David

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: hits.length() changes during delete process.

Posted by Morus Walter <mo...@tanto.de>.
David Townsend writes:
> 
> So the short question is: should the Hits object be changing, and what is the best way to delete all the results of a search? (It's a range query, so I can't use delete(Term term).)
> 
The Hits object caches only part of the hits (initially the first 100, I
think). This cache is extended when further hits are accessed, which is
done by repeating the search. Since you have already deleted some of the
hits by that point, the repeated search returns fewer results and your
Hits object changes.
You should be able to get around this either by scanning the Hits object
from end to start instead of start to end, or by deleting with a different
index reader. In the latter case the searcher should not see the deletions.
Reversing the order might be preferable, since it implies only one
repetition of the search.
(Both suggestions untested.)
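
The caching behaviour described above can be reproduced with a small toy
model. The sketch below is plain Java and uses no Lucene classes: ToyIndex
and ToyHits are hypothetical stand-ins, every document is assumed to match
the query, and the initial cache size of 100 and the doubling on each
re-fetch are assumptions chosen to match the numbers in the original post.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: none of these classes are Lucene classes.
public class LazyHitsDemo {

    // Hypothetical stand-in for an index in which every document matches the query.
    static class ToyIndex {
        final int size;
        final BitSet deleted = new BitSet();

        ToyIndex(int size) { this.size = size; }

        void delete(int doc) { deleted.set(doc); }

        int liveCount() { return size - deleted.cardinality(); }

        // "Runs the search": returns the first n documents that are not deleted.
        List<Integer> search(int n) {
            List<Integer> docs = new ArrayList<>();
            for (int doc = deleted.nextClearBit(0); doc < size && docs.size() < n;
                    doc = deleted.nextClearBit(doc + 1)) {
                docs.add(doc);
            }
            return docs;
        }
    }

    // Hypothetical stand-in for a result list that caches only a prefix of the hits.
    static class ToyHits {
        private final ToyIndex index;
        private List<Integer> cache = new ArrayList<>();
        private int length;

        ToyHits(ToyIndex index) {
            this.index = index;
            fetch(100);  // assumed initial cache size
        }

        // Repeats the search; deletions made since the last fetch become visible.
        private void fetch(int n) {
            cache = index.search(n);
            length = index.liveCount();
        }

        int length() { return length; }

        int id(int i) {
            if (i >= cache.size()) {
                fetch(2 * Math.max(i, cache.size()));  // assumed doubling policy
            }
            return cache.get(i);  // throws if the repeated search came up short
        }
    }

    // Start-to-end deletion, as in the original loop. Returns documents left alive.
    static int deleteForward(ToyIndex index) {
        ToyHits hits = new ToyHits(index);
        for (int i = 0; i < hits.length(); i++) {
            index.delete(hits.id(i));
        }
        return index.liveCount();
    }

    // End-to-start deletion: the first access caches every hit before any delete.
    static int deleteBackward(ToyIndex index) {
        ToyHits hits = new ToyHits(index);
        for (int i = hits.length() - 1; i >= 0; i--) {
            index.delete(hits.id(i));
        }
        return index.liveCount();
    }
}
```

Under these assumptions, deleteForward on a ToyIndex of 10003 documents
fails with an IndexOutOfBoundsException part-way through, while
deleteBackward removes every document after a single repeated search.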

The "best" way would probably be to avoid the Hits object altogether and
delete the documents at the level where the Hits object is created. Have a
look at the sources for details. (Also untested; I never needed more than
term-based deletions.)
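
One way to read this suggestion is as a two-phase pattern: first record
the id of every matching document through a per-document callback, then
delete the recorded ids once the search has finished. The sketch below
only illustrates that pattern in plain Java. DocCollector and ToyIndex are
hypothetical stand-ins, not Lucene types, and the query is modelled as a
simple predicate on the document id.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Sketch with hypothetical stand-in types; no Lucene classes are used here.
public class CollectThenDelete {

    // Hypothetical callback, playing the role of a per-document hit collector.
    interface DocCollector {
        void collect(int doc);
    }

    // Hypothetical index: documents 0..size-1 with a deletion bitmap.
    static class ToyIndex {
        final int size;
        final BitSet deleted = new BitSet();

        ToyIndex(int size) { this.size = size; }

        // Reports every live document accepted by the query to the collector.
        void search(IntPredicate query, DocCollector collector) {
            for (int doc = 0; doc < size; doc++) {
                if (!deleted.get(doc) && query.test(doc)) {
                    collector.collect(doc);
                }
            }
        }

        void delete(int doc) { deleted.set(doc); }

        int liveCount() { return size - deleted.cardinality(); }
    }

    // Phase 1 records matching ids; phase 2 deletes them. Returns the number deleted.
    static int deleteMatching(ToyIndex index, IntPredicate query) {
        BitSet matches = new BitSet(index.size);
        index.search(query, matches::set);  // nothing is deleted while searching
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            index.delete(doc);
        }
        return matches.cardinality();
    }
}
```

Because the full set of ids is fixed before the first delete, no cached
result list can shrink during the loop. A range condition such as
doc -> doc >= 1000 && doc < 5000 is handled the same way as any other
query.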

Morus
