Posted to java-user@lucene.apache.org by Ridwan Habbal <ri...@hotmail.com> on 2007/08/01 17:48:28 UTC

IndexReader deletes more than expected

Hi,

I got unexpected behavior while testing Lucene. To describe the problem briefly: using IndexWriter, I add docs with a field named ID whose values are consecutive (1, 2, 3, 4, etc.), then close the writer. I get a new IndexReader and call IndexReader.deleteDocuments(Term), where the term is simply new Term("ID", "1"), and then call close on the IndexReader. Things work out fine.

But suppose I add docs using IndexWriter, close the writer, and then create a new IndexReader to delete one of the docs already inserted, without closing that reader. While the IndexReader that performed the deletion is still open, I add more docs and then commit (close) the IndexWriter. When I search at this point, I get all the docs added in both phases (before and after calling deleteDocuments() on the IndexReader), because I haven't closed the IndexReader yet, although I have closed the IndexWriter. When I then close the IndexReader and query the index, it has deleted all the docs added between opening and closing it, in addition to the doc specified by the Term object (in this test case, ID=1).

I know that I can avoid this by closing the IndexReader directly after deleting docs, but what about running it in a multithreaded app such as a web application?

Here is the code:

IndexSearcher indexSearcher = new IndexSearcher(this.indexDirectory);
Hits hitsB4InsertAndClose = null;
hitsB4InsertAndClose = getAllAsHits(indexSearcher);
int beforeInsertAndClose = hitsB4InsertAndClose.length();

indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.close();
IndexSearcher indexSearcherDel = new IndexSearcher(this.indexDirectory);
indexSearcherDel.getIndexReader().deleteDocuments(new Term("ID","1"));

indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());
indexWriter.addDocument(getNewElement());

indexWriter.close();
Hits hitsAfterInsertAndClose = getAllAsHits(indexSearcher);
int AfterInsertAndClose = hitsAfterInsertAndClose.length();//This is 14
 
indexWriter.addDocument(getNewElement());
indexWriter.close();
Hits hitsAfterInsertAndAfterCloseb4Delete = getAllAsHits(indexSearcher);
int hitsAfterInsertButAndAfterCountb4Delete = hitsAfterInsertAndAfterCloseb4Delete.length();//This is 15


 
indexSearcherDel.close();
Hits hitsAfterInsertAndAfterClose = getAllAsHits(indexSearcher);
int hitsAfterInsertButAndAfterCount = hitsAfterInsertAndAfterClose.length();//This is 2

The two methods I use:

private Hits getAllAsHits(IndexSearcher indexSearcher){
try{
Analyzer analyzer = new StandardAnalyzer();
String defaultSearchField = "all";
QueryParser parser = new QueryParser(defaultSearchField, analyzer);
indexSearcher = new IndexSearcher(this.indexDirectory);
Hits hits = indexSearcher.search(parser.parse("+alias:mydoc"));
indexSearcher.close();
return hits;
}catch(IOException ex){
throw new RuntimeException(ex);
}catch(org.apache.lucene.queryParser.ParseException ex){
throw new RuntimeException(ex);
}

}

private Document getNewElement(){
Map<String, String> map = new HashMap();
map.put("ID", new Integer(insertCounter).toString());
map.put("name", "name"+insertCounter);
insertCounter++;
Document document = new Document();
for (Iterator iter = map.keySet().iterator(); iter.hasNext();) {
String key = (String) iter.next();
document.add(new Field(key, map.get(key), Store.YES, Index.TOKENIZED));
}
document.add(new Field("alias", "mydoc", Store.YES, Index.UN_TOKENIZED));
return document;
}

Any clue why it works that way? I expected it to delete only one doc.

RE: IndexReader deletes more than expected

Posted by Steven Parkes <st...@esseff.org>.
If I'm reading this correctly, there's something a little wonky here. In
your example code, you close the IndexWriter and then, without creating
a new IndexWriter, you call addDocument again. This shouldn't be
possible (what version of Lucene are you using?)

Assuming for the time being that you are creating the IndexWriter again,
the other issue here is that you shouldn't be able to have a reader and
a writer changing an index at the same time. There should be a lock
failure. This should occur either in the Index 

Might you be creating your IndexWriters (which you don't show) with the
create flag always set to true? That will wipe your index each time,
ignoring the locks and causing all sorts of weird results.
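
To make the lock-failure point concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API: ToyIndex, ToyModifier and openModifier are hypothetical names invented for illustration. The only assumption it encodes is the one above, namely that anything that modifies the index (a writer, or a reader doing deletes) has to hold a single write lock, so a second modifier cannot be opened until the first is closed.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy stand-in for an index guarded by one write lock (hypothetical, not Lucene's API). */
class ToyIndex {
    private final AtomicBoolean writeLocked = new AtomicBoolean(false);

    /** Opens a handle that holds the write lock until it is closed. */
    ToyModifier openModifier(String owner) {
        // compareAndSet succeeds only if nobody else currently holds the lock
        if (!writeLocked.compareAndSet(false, true)) {
            throw new IllegalStateException("write lock already held; cannot open " + owner);
        }
        return new ToyModifier(this);
    }

    void release() {
        writeLocked.set(false);
    }

    boolean isLocked() {
        return writeLocked.get();
    }
}

/** Stands in for either a writer or a reader that performs deletes. */
class ToyModifier implements AutoCloseable {
    private final ToyIndex index;
    private boolean closed;

    ToyModifier(ToyIndex index) {
        this.index = index;
    }

    @Override
    public void close() {
        // Release the lock once, even if close() is called twice
        if (!closed) {
            closed = true;
            index.release();
        }
    }
}

class WriteLockDemo {
    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyModifier deletingReader = index.openModifier("deleting reader");
        try {
            index.openModifier("writer"); // the reader still holds the lock
        } catch (IllegalStateException e) {
            System.out.println("Lock failure: " + e.getMessage());
        }
        deletingReader.close();
        index.openModifier("writer").close(); // fine once the reader is closed
        System.out.println("Writer opened after the reader was closed");
    }
}
```

In this model, the sequence in the original post (a reader with pending deletes left open while a writer adds documents) would fail at the point where the writer is opened, rather than silently losing documents later.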


Re: IndexReader deletes more than expected

Posted by Mark Miller <ma...@gmail.com>.
On 8/1/07, Ridwan Habbal <ri...@hotmail.com> wrote:
>
>  but what about running it in a multithreaded app such as a web
> application?  Here is the code:


If you are targeting a multithreaded webapp, then I strongly suggest you
look into using either Solr or the LuceneIndexAccessor code. You will want
to use some form of reference counting to manage your Readers and Writers.

Also, you can now use IndexWriter (Lucene 2.1 and greater, I think) to
delete. This allows for efficient mixing of deletes and adds by buffering
the deletes, and then opening an IndexReader to commit them later. This is
much more efficient than IndexModifier.
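
As a rough idea of what reference counting could look like, here is a minimal sketch in plain Java. It is not the LuceneIndexAccessor code and not a Lucene API: RefCounted and FakeReader are hypothetical names, and FakeReader merely stands in for a shared reader or searcher. The assumption is that each thread acquires the shared resource before using it and releases it afterwards, and that the resource is really closed only when the last reference is released.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Shares one closeable resource and closes it when the last user releases it (hypothetical helper). */
class RefCounted<T extends AutoCloseable> {
    private final T resource;
    // The creator holds the first reference
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    /** Borrows the resource; fails if it has already been closed. */
    T acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Gives back one reference; the last release closes the resource. */
    void release() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("released too many times");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    try {
                        resource.close();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
                return;
            }
        }
    }

    int refCount() {
        return refs.get();
    }
}

/** Stand-in for a shared reader; only records whether close() was called. */
class FakeReader implements AutoCloseable {
    private volatile boolean closed;

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class RefCountDemo {
    public static void main(String[] args) {
        FakeReader reader = new FakeReader();
        RefCounted<FakeReader> shared = new RefCounted<>(reader);

        shared.acquire();   // a request thread starts using the reader
        shared.release();   // the owner drops its reference (e.g. swapping in a newer reader)
        System.out.println("closed while still in use? " + reader.isClosed());

        shared.release();   // the request thread finishes; this is the last reference
        System.out.println("closed after last release? " + reader.isClosed());
    }
}
```

The point of the design is that no thread ever closes the reader directly; each one only releases its own reference, so a reader that is still serving a search cannot be closed out from under it.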

- Mark