Posted to dev@lucene.apache.org by Christoph Goller <go...@detego-software.de> on 2003/09/03 17:21:48 UTC

PATCH: IndexWriter

IndexWriter implements the method docCount(), which sums the document counts
stored in the SegmentInfos of the index. However, it returns incorrect values
once documents have been deleted from the index. The reason is that the
SegmentInfo.docCount values are updated incorrectly when segments are merged:
the new value is taken from the old SegmentInfos, which still count the deleted
documents. It would be better to take the value from the segment reader
instead. That way indexWriter.docCount() would return the same value as
indexReader.maxDoc().
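
A minimal sketch of the bookkeeping difference, written as a self-contained Java model rather than against Lucene itself (ModelSegment, MergedDocCount and their members are hypothetical names). Each modelled segment keeps a recorded count, standing in for SegmentInfo.docCount, next to its physical size and its deletions; the two methods mirror the line the patch removes and the line it adds.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of one segment; NOT Lucene's SegmentInfo or SegmentReader.
class ModelSegment {
    final int maxDoc;                     // documents physically stored (like reader.maxDoc())
    final int recordedDocCount;           // value kept in the "segments" file (like si.docCount)
    final BitSet deleted = new BitSet();  // documents deleted through an IndexReader

    ModelSegment(int maxDoc, int recordedDocCount) {
        this.maxDoc = maxDoc;
        this.recordedDocCount = recordedDocCount;
    }

    int numDocs() {                       // like reader.numDocs(): non-deleted documents
        return maxDoc - deleted.cardinality();
    }
}

class MergedDocCount {
    // Before the patch: mergedDocCount += si.docCount (deletions are ignored).
    static int beforePatch(List<ModelSegment> segments) {
        int merged = 0;
        for (ModelSegment s : segments) merged += s.recordedDocCount;
        return merged;
    }

    // After the patch: mergedDocCount += reader.numDocs(). Deleted documents are
    // not copied into the merged segment, so this equals its physical size.
    static int afterPatch(List<ModelSegment> segments) {
        int merged = 0;
        for (ModelSegment s : segments) merged += s.numDocs();
        return merged;
    }

    // Two segments of 60 and 40 documents; documents 0-49 (all in the first
    // segment) are deleted before the merge.
    static List<ModelSegment> example() {
        ModelSegment first = new ModelSegment(60, 60);
        ModelSegment second = new ModelSegment(40, 40);
        first.deleted.set(0, 50);         // marks documents 0..49 as deleted
        return Arrays.asList(first, second);
    }
}
```

With the example segments, beforePatch() records 100 documents for the merged segment even though only 10 + 40 = 50 documents survive the merge, whereas afterPatch() records 50.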

test and patch are attached,
Christoph


-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************

Re: PATCH: IndexWriter

Posted by Christoph Goller <go...@detego-software.de>.
My patch makes indexWriter.docCount() return the same values as
indexReader.maxDoc(). I have attached a patch to TestIndexWriter that
demonstrates the difference. Because of the bug in the current IndexWriter,
indexWriter.docCount() never reflects the changes that result from deleting
documents.

Christoph


Otis Gospodnetic wrote:
> Christoph,
> 
> The idea looks good, but the test fails for both the pre-patched and the
> patched versions of IndexWriter.
> 
> I converted your test to JUnit test and will check it into CVS shortly.
> If I made a mistake in it, please point it out.
> You can run 'ant test-unit' to see where the test fails.
> 
> Otis
> 
> --- Christoph Goller <go...@detego-software.de> wrote:
> 
>>[...]
>>
>>Index: IndexWriter.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
>>retrieving revision 1.14
>>diff -u -r1.14 IndexWriter.java
>>--- IndexWriter.java	12 Aug 2003 15:05:03 -0000	1.14
>>+++ IndexWriter.java	3 Sep 2003 14:55:33 -0000
>>@@ -355,7 +355,7 @@
>>       if ((reader.directory == this.directory) || // if we own the directory
>>           (reader.directory == this.ramDirectory))
>> 	segmentsToDelete.addElement(reader);	  // queue segment for deletion
>>-      mergedDocCount += si.docCount;
>>+      mergedDocCount += reader.numDocs();
>>     }
>>     if (infoStream != null) {
>>       infoStream.println();
>>
>>import java.io.IOException;
>>
>>import org.apache.lucene.analysis.WhitespaceAnalyzer;
>>import org.apache.lucene.document.Document;
>>import org.apache.lucene.document.Field;
>>import org.apache.lucene.index.IndexReader;
>>import org.apache.lucene.index.IndexWriter;
>>import org.apache.lucene.store.Directory;
>>import org.apache.lucene.store.RAMDirectory;
>>
>>/*
>> * Created on 03.09.2003
>> *
>> * To change the template for this generated file go to
>> * Window>Preferences>Java>Code Generation>Code and Comments
>> */
>>
>>/**
>> * 
>> * @author goller
>> */
>>public class IndexWriterDocCountTest {
>>    
>>    int docCount = 0;
>>  
>>      void addDoc(IndexWriter writer)
>>      {
>>        Document doc = new Document();
>>    
>>        doc.add(Field.Keyword("id","id" + docCount));
>>        doc.add(Field.UnStored("content","aaa"));
>>    
>>        try {
>>          writer.addDocument(doc);
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>        docCount++;
>>      }
>>    
>>    
>>
>>    public static void main(String[] args) {
>>        
>>        Directory dir = new RAMDirectory();
>>        IndexWriterDocCountTest test = new IndexWriterDocCountTest();
>>    
>>        IndexWriter writer = null;
>>        IndexReader reader = null;
>>        int i;
>>    
>>        try {
>>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>>      
>>          for (i = 0; i < 100; i++)
>>            test.addDoc(writer);
>>      
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>          reader = IndexReader.open(dir);
>>          for (i = 0; i < 50; i++)
>>            reader.delete(i);
>>          reader.close();
>>          System.out.println("doc #0-49 deleted");
>>          
>>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>>          System.out.println("docCount: " + writer.docCount());
>>          
>>          writer.optimize();
>>          System.out.println("optimized called");
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>    }
>>}
>>
>>
> ---------------------------------------------------------------------
> 
>>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 
> 
> __________________________________
> Do you Yahoo!?
> Yahoo! SiteBuilder - Free, easy-to-use web site design software
> http://sitebuilder.yahoo.com
> 


Re: PATCH: IndexWriter

Posted by Doug Cutting <cu...@lucene.com>.
I prefer option (A).  SegmentInfo.docCount is the number of documents
stored in the segment, regardless of deletions, unlike
IndexReader.numDocs(), which is the number of non-deleted documents in
an index.  So perhaps SegmentInfo.docCount should be renamed, but I
don't think we should change its semantics.  Deletions should not be
considered when deciding whether to automatically merge segments, which
is another place where SegmentInfo.docCount is used.
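
To illustrate why the stored count, rather than the number of live documents, can drive merging, here is a deliberately simplified merge trigger. It is an assumption for illustration, not Lucene's actual merge logic, and MergeTrigger, Info and firstSegmentToMerge are hypothetical names. Trailing segments smaller than a threshold are summed by their stored docCount, so deleting documents in them does not change the decision.

```java
import java.util.List;

// Hypothetical, simplified merge trigger for illustration only; this is not
// Lucene's actual merge logic.
class MergeTrigger {
    static final class Info {
        final int docCount;  // documents stored in the segment, deleted or not
        final int deleted;   // deleted documents; intentionally NOT consulted below

        Info(int docCount, int deleted) {
            this.docCount = docCount;
            this.deleted = deleted;
        }
    }

    // 'infos' is ordered oldest segment first. Walking back from the newest
    // segment, sum the stored sizes of segments smaller than 'threshold'.
    // Returns the index of the first segment to merge, or -1 if those small
    // segments do not yet add up to 'threshold'.
    static int firstSegmentToMerge(List<Info> infos, int threshold) {
        int sum = 0;
        int i = infos.size() - 1;
        while (i >= 0 && infos.get(i).docCount < threshold) {
            sum += infos.get(i).docCount;  // deletions do not reduce the size
            i--;
        }
        return sum >= threshold ? i + 1 : -1;
    }
}
```

For segments with stored counts 100, 5 and 5 and a threshold of 10, the trigger returns 1 (merge the last two segments) whether or not some of their documents have been deleted.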

Doug

Christoph Goller wrote:
> [...]


Re: PATCH: IndexWriter

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Christoph,

Option (A) seems to be the better thing to do after all.
Isn't that the IndexWriter patch that you ... ah, yes, you say that
yourself in your message.

Thanks again, I'll commit the patched IndexWriter now.

Otis

--- Christoph Goller <go...@detego-software.de> wrote:
> [...]

Re: PATCH: IndexWriter

Posted by Christoph Goller <go...@detego-software.de>.

Erik Hatcher wrote:
> [...]

Whether or not the reader has write privilege is not the question. The changes
to IndexReader necessary for option (B) only take effect if the reader has
already deleted documents and therefore already holds the write.lock on the
index. Instead of only writing the segment's .del file, the reader would also
write the changed segmentInfos; I would probably do this in
SegmentsReader.doClose().
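
A minimal model of that option (B) close path, under stated assumptions: ModelReader, holdsWriteLock, recordedDocCount and segmentsFileWrites are hypothetical names, the model covers a single segment, and writing the .del file itself is not modelled. The point it shows is that a reader which never deleted anything never writes to the index.

```java
import java.util.BitSet;

// Hypothetical model of a single-segment reader under option (B); not Lucene code.
class ModelReader {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();
    private boolean holdsWriteLock = false;   // acquired on the first deletion
    int recordedDocCount;                     // stands in for SegmentInfo.docCount
    int segmentsFileWrites = 0;               // how often "segments" was rewritten

    ModelReader(int maxDoc) {
        this.maxDoc = maxDoc;
        this.recordedDocCount = maxDoc;
    }

    void delete(int docNum) {
        holdsWriteLock = true;                // a deleting reader must own the lock
        deleted.set(docNum);
    }

    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    // Option (B): in addition to writing the deletions (not modelled here),
    // rewrite the per-segment count, but only if this reader deleted documents
    // and therefore already holds the write lock.
    void doClose() {
        if (!holdsWriteLock) return;          // read-only reader: write nothing
        recordedDocCount = numDocs();
        segmentsFileWrites++;
        holdsWriteLock = false;               // release the lock on close
    }
}
```

A reader that only searched closes without touching the "segments" file; a reader that deleted documents 0-49 of a 100-document segment records 50 on close.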

Christoph



Re: PATCH: IndexWriter

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I'd have to dig deeper to really understand this issue fully, but my 
first impression is that it would be wrong to write a file from an 
IndexReader - I'm not sure you could assume write access to an index.  
Although if you didn't have write access then you shouldn't be deleting 
documents either.  Are there any issues here related to whether you 
have write privilege to the index?


On Friday, September 12, 2003, at 04:33  AM, Christoph Goller wrote:

> [...]


Re: PATCH: IndexWriter

Posted by Christoph Goller <go...@detego-software.de>.
I thought things over and I now think there are two possible options
for coping with the indexWriter.docCount() bug. I cannot decide this
alone. Maybe voting is needed.

Problem:

writer.docCount() adds up the docCount values from segmentInfos.
Note that currently only IndexWriter writes segmentInfos (the "segments"
file); IndexReader only reads them. The problem is that segmentInfo.docCount
values are updated incorrectly in indexWriter.mergeSegments(). Information
about deleted documents is ignored, so the segmentInfo.docCount values of
newly merged segments are too large and do not reflect the real size of
those segments. This has two effects: first, writer.docCount() becomes
incorrect; second, the merge process, which uses the docCount values from
segmentInfos, is controlled by incorrect segment sizes.

Option (A)

This is the IndexWriter patch that I submitted. With this patch,
segmentInfo.docCount values represent the real size of the segments: a
deleted document is still physically present until its segment gets
merged. For every segment, the corresponding segmentInfo.docCount value is
the same value that a reader on this segment returns from reader.maxDoc().
Of course, this also means that for a reader and a writer on the whole
index, reader.maxDoc() == writer.docCount().

Option (B)

This option leaves IndexWriter as it was; IndexReader has to be changed
instead. Rather than only reading segmentInfos (the "segments" file),
IndexReader would have to write segmentInfos whenever documents have been
deleted. I would do that in reader.doClose(). The effect would be that for
every segment, segmentInfo.docCount would be the same value that a reader on
this segment returns from reader.numDocs(). For a reader and a writer on the
whole index we would then have reader.numDocs() == writer.docCount(). Here
segmentInfo.docCount values represent the number of valid documents in a
segment, i.e. those documents that have not been deleted.
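
The two options can be compared with a small hypothetical model (plain Java, not Lucene code; DocCountOptions and its methods are illustrative names). It computes what writer.docCount() would report before any merge: under option (A) the sum of the segments' sizes, matching reader.maxDoc(), and under option (B) the sum of their non-deleted documents, matching reader.numDocs().

```java
// Hypothetical model (not Lucene code): the value each option would record as
// SegmentInfo.docCount for a segment before it is merged.
class DocCountOptions {
    // Option (A): the recorded count is the physical size, like reader.maxDoc().
    static int optionA(int maxDoc, int deletedDocs) {
        return maxDoc;
    }

    // Option (B): IndexReader rewrites the count on close, like reader.numDocs().
    static int optionB(int maxDoc, int deletedDocs) {
        return maxDoc - deletedDocs;
    }

    // writer.docCount() sums the recorded counts over all segments.
    static int writerDocCount(int[] maxDocs, int[] deletedDocs, boolean useOptionB) {
        int total = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            total += useOptionB ? optionB(maxDocs[i], deletedDocs[i])
                                : optionA(maxDocs[i], deletedDocs[i]);
        }
        return total;
    }
}
```

For segments of 60 and 40 documents with 50 deletions in the first, option (A) gives 100 and option (B) gives 50.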

I am slightly in favour of option (A) since it is less work to do :-) and
it seems reasonable to use the real size of segments for controlling the
merge process. However, I can also implement option (B).

Christoph

Otis Gospodnetic wrote:
> [...]


Re: PATCH: TestIndexWriter

Posted by Christoph Goller <go...@detego-software.de>.
writer.docCount() adds up the docCount values from segmentInfos. The problem
is that these values are currently not updated when documents are deleted, and
that during a merge the values for the new segments are taken from the old
segmentInfos. My patch makes writer.docCount() return the same results as
reader.maxDoc(), which reflects the deletion of documents in a segment only
once that segment has been merged. This is the difference from
reader.numDocs(), which is updated immediately. Look at what is tested after
deleting 50 documents:

           writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
           assertEquals(100, writer.docCount());  // <--
           writer.close();

           reader = IndexReader.open(dir);
           assertEquals(100, reader.maxDoc());    // <--
           assertEquals(50, reader.numDocs());
           reader.close();

           writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
           writer.optimize();
           assertEquals(50, writer.docCount());
           writer.close();

           reader = IndexReader.open(dir);
           assertEquals(50, reader.maxDoc());
           assertEquals(50, reader.numDocs());
           reader.close();
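
The expected values in these assertions can be reproduced with a tiny in-memory model of the patched behaviour. This is a hypothetical sketch, not Lucene: TinyIndex and its methods are illustrative names, a single segment stands in for the whole index, and optimize() is modelled as a merge that drops deleted documents and records the surviving count.

```java
import java.util.BitSet;

// Hypothetical single-segment index model with the patched merge bookkeeping.
class TinyIndex {
    private int maxDoc = 0;                  // documents physically stored
    private BitSet deleted = new BitSet();   // deletions made through a "reader"
    private int recordedDocCount = 0;        // stands in for SegmentInfo.docCount

    void addDocument() { maxDoc++; recordedDocCount++; }

    void delete(int docNum) { deleted.set(docNum); }

    int writerDocCount() { return recordedDocCount; }                   // like writer.docCount()
    int readerMaxDoc()   { return maxDoc; }                             // like reader.maxDoc()
    int readerNumDocs()  { return maxDoc - deleted.cardinality(); }     // like reader.numDocs()

    // optimize(): merge into a new segment without the deleted documents and,
    // as in the patch, record the reader's numDocs() as the new docCount.
    void optimize() {
        int surviving = readerNumDocs();
        maxDoc = surviving;
        recordedDocCount = surviving;
        deleted = new BitSet();
    }

    // Mirrors the test: add 100, delete 0-49, check, optimize, check.
    // Returns the six asserted values in order.
    static int[] runTestSequence() {
        TinyIndex index = new TinyIndex();
        for (int i = 0; i < 100; i++) index.addDocument();
        for (int i = 0; i < 50; i++) index.delete(i);
        int[] r = new int[6];
        r[0] = index.writerDocCount();   // expected 100
        r[1] = index.readerMaxDoc();     // expected 100
        r[2] = index.readerNumDocs();    // expected 50
        index.optimize();
        r[3] = index.writerDocCount();   // expected 50
        r[4] = index.readerMaxDoc();     // expected 50
        r[5] = index.readerNumDocs();    // expected 50
        return r;
    }
}
```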


Maybe I did not think this through far enough. writer.docCount() could
instead return the same values as reader.numDocs(). However, this would
require changes to IndexReader: reader.doClose() would have to update the
values in segmentInfos, whereas IndexReader currently only reads
segmentInfos. This would also make sense for merging, bearing in mind that
the docCount values in segmentInfos are used to control the merge process.
I will think about it and may submit another patch soon. Please hold off on
committing my IndexWriter patch for a little while; it may become obsolete.

Christoph


Otis Gospodnetic wrote:
> Christoph,
> 
> Thank you for expanding the coverage of the test.
> However, this looks wrong to me:
> 
> -          assertEquals(50, writer.docCount());
> +          assertEquals(100, writer.docCount());
> 
> Aren't you trying to fix IndexWriter so that after adding 100 and
> deleting 50 documents, its docCount() method returns 50?
> The above suggests that the correct behaviour is to return 100, even
> though 50 have been deleted, and only 50 documents are left in the
> index.
> 
> Could you please clarify this for me, before I commit the patches to
> (Test)IndexWriter?
> 
> Thanks,
> Otis
> 
> 
> --- Christoph Goller <go...@detego-software.de> wrote:
> 
>>Sorry, here is the patch.
>>
>>[...]
>>
>>Index: TestIndexWriter.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/index/TestIndexWriter.java,v
>>retrieving revision 1.1
>>diff -u -r1.1 TestIndexWriter.java
>>--- TestIndexWriter.java	10 Sep 2003 12:58:37 -0000	1.1
>>+++ TestIndexWriter.java	10 Sep 2003 16:29:31 -0000
>>@@ -47,10 +47,23 @@
>>           reader.close();
>> 
>>           writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>>-          assertEquals(50, writer.docCount());
>>+          assertEquals(100, writer.docCount());
>>+          writer.close();
>>+          
>>+          reader = IndexReader.open(dir);
>>+          assertEquals(100, reader.maxDoc());
>>+          assertEquals(50, reader.numDocs());
>>+          reader.close();
>>+          
>>+          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>>           writer.optimize();
>>           assertEquals(50, writer.docCount());
>>           writer.close();
>>+          
>>+          reader = IndexReader.open(dir);
>>+          assertEquals(50, reader.maxDoc());
>>+          assertEquals(50, reader.numDocs());
>>+          reader.close();
>>         }
>>         catch (IOException e) {
>>           e.printStackTrace();
>>
>>

-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************


Re: PATCH: TestIndexWriter

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Christoph,

Thank you for expanding the coverage of the test.
However, this looks wrong to me:

-          assertEquals(50, writer.docCount());
+          assertEquals(100, writer.docCount());

Aren't you trying to fix IndexWriter so that after adding 100 and
deleting 50 documents, its docCount() method returns 50?
The above suggests that the correct behaviour is to return 100, even
though 50 have been deleted, and only 50 documents are left in the
index.

Could you please clarify this for me, before I commit the patches to
(Test)IndexWriter?

Thanks,
Otis
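
Otis's question turns on the difference between IndexReader.maxDoc(), which counts every document slot including ones only marked as deleted, and IndexReader.numDocs(), which counts live documents; Christoph's stated aim is for IndexWriter.docCount() to agree with maxDoc(). The following is a minimal pure-Java sketch of that distinction, not Lucene code: SegmentModel, delete(), compact() and the bit-set bookkeeping are hypothetical stand-ins, assuming deleted documents stay in a segment until a merge such as optimize() rewrites it.

```java
import java.util.BitSet;

// Hypothetical model, not Lucene's API: one segment whose deleted documents
// are only flagged in a bit set until the segment is rewritten.
class SegmentModel {
    private final int size;                      // document slots physically stored
    private final BitSet deleted = new BitSet(); // flags for deleted slots

    SegmentModel(int size) {
        this.size = size;
    }

    // Flag a document as deleted; its slot is kept until a rewrite.
    void delete(int docNum) {
        if (docNum < 0 || docNum >= size) {
            throw new IndexOutOfBoundsException("no such document: " + docNum);
        }
        deleted.set(docNum);
    }

    // Counterpart of maxDoc(): deleted slots still count.
    int maxDoc() {
        return size;
    }

    // Counterpart of numDocs(): only live documents count.
    int numDocs() {
        return size - deleted.cardinality();
    }

    // Counterpart of a merge/optimize: copy only the live documents.
    SegmentModel compact() {
        return new SegmentModel(numDocs());
    }

    public static void main(String[] args) {
        SegmentModel seg = new SegmentModel(100);
        for (int i = 0; i < 50; i++) {
            seg.delete(i);
        }
        System.out.println("before optimize: maxDoc=" + seg.maxDoc() + ", numDocs=" + seg.numDocs());
        SegmentModel optimized = seg.compact();
        System.out.println("after optimize:  maxDoc=" + optimized.maxDoc() + ", numDocs=" + optimized.numDocs());
    }
}
```

Running main should print maxDoc=100, numDocs=50 before compact() and 50 for both afterwards, the same pairs that the quoted TestIndexWriter patch asserts for reader.maxDoc() and reader.numDocs() around writer.optimize().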


--- Christoph Goller <go...@detego-software.de> wrote:
> Sorry, here is the patch.
> 
> Otis Gospodnetic wrote:
> > Christoph,
> > 
> > The idea looks good, but the test fails for both pre-patched as well as
> > patched version of IndexWriter.
> > 
> > I converted your test to JUnit test and will check it into CVS shortly.
> > If I made a mistake in it, please point it out.
> > You can run 'ant test-unit' to see where the test fails.
> > 
> > Otis
> > 
> > --- Christoph Goller <go...@detego-software.de> wrote:
> > 
> >>IndexWriter implements the method docCount() which reads the number
> >>of documents from the SegmentInfos of the index. However, it delivers
> >>incorrect values if documents get deleted from the index. The reason for
> >>this is that SegmentInfo.docCounts are updated in an incorrect way when
> >>segments get merged. The new value is taken from the old SegmentInfos.
> >>It would be better to take the value from the reader instead. In this
> >>way indexWriter.docCount() would deliver the same value as
> >>indexReader.maxDoc().
> >>
> >>test and patch are attached,
> >>Christoph
> >>
> >>
> >>
> >>Index: IndexWriter.java
> >>===================================================================
> >>RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
> >>retrieving revision 1.14
> >>diff -u -r1.14 IndexWriter.java
> >>--- IndexWriter.java	12 Aug 2003 15:05:03 -0000	1.14
> >>+++ IndexWriter.java	3 Sep 2003 14:55:33 -0000
> >>@@ -355,7 +355,7 @@
> >>       if ((reader.directory == this.directory) || // if we own the directory
> >>           (reader.directory == this.ramDirectory))
> >> 	segmentsToDelete.addElement(reader);	  // queue segment for deletion
> >>-      mergedDocCount += si.docCount;
> >>+      mergedDocCount += reader.numDocs();
> >>     }
> >>     if (infoStream != null) {
> >>       infoStream.println();
> >>
> >>>import java.io.IOException;
> >>
> >>import org.apache.lucene.analysis.WhitespaceAnalyzer;
> >>import org.apache.lucene.document.Document;
> >>import org.apache.lucene.document.Field;
> >>import org.apache.lucene.index.IndexReader;
> >>import org.apache.lucene.index.IndexWriter;
> >>import org.apache.lucene.store.Directory;
> >>import org.apache.lucene.store.RAMDirectory;
> >>
> >>/*
> >> * Created on 03.09.2003
> >> *
> >> * To change the template for this generated file go to
> >> * Window>Preferences>Java>Code Generation>Code and Comments
> >> */
> >>
> >>/**
> >> * 
> >> * @author goller
> >> */
> >>public class IndexWriterDocCountTest {
> >>    
> >>    int docCount = 0;
> >>  
> >>      void addDoc(IndexWriter writer)
> >>      {
> >>        Document doc = new Document();
> >>    
> >>        doc.add(Field.Keyword("id","id" + docCount));
> >>        doc.add(Field.UnStored("content","aaa"));
> >>    
> >>        try {
> >>          writer.addDocument(doc);
> >>        }
> >>        catch (IOException e) {
> >>          // TODO Auto-generated catch block
> >>          e.printStackTrace();
> >>        }
> >>        docCount++;
> >>      }
> >>    
> >>    
> >>
> >>    public static void main(String[] args) {
> >>        
> >>        Directory dir = new RAMDirectory();
> >>        IndexWriterDocCountTest test = new IndexWriterDocCountTest();
> >>    
> >>        IndexWriter writer = null;
> >>        IndexReader reader = null;
> >>        int i;
> >>    
> >>        try {
> >>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
> >>      
> >>          for (i = 0; i < 100; i++)
> >>            test.addDoc(writer);
> >>      
> >>          System.out.println("docCount: " + writer.docCount());
> >>          writer.close();
> >>          
> >>          reader = IndexReader.open(dir);
> >>          for (i = 0; i < 50; i++)
> >>            reader.delete(i);
> >>          reader.close();
> >>          System.out.println("doc #0-49 deleted");
> >>          
> >>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
> >>          System.out.println("docCount: " + writer.docCount());
> >>          
> >>          writer.optimize();
> >>          System.out.println("optimized called");
> >>          System.out.println("docCount: " + writer.docCount());
> >>          writer.close();
> >>          
> >>        }
> >>        catch (IOException e) {
> >>          // TODO Auto-generated catch block
> >>          e.printStackTrace();
> >>        }
> >>    }
> >>}
> >>
> >>
> Index: TestIndexWriter.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-lucene/src/test/org/apache/lucene/index/TestIndexWriter.java,v
> retrieving revision 1.1
> diff -u -r1.1 TestIndexWriter.java
> --- TestIndexWriter.java	10 Sep 2003 12:58:37 -0000	1.1
> +++ TestIndexWriter.java	10 Sep 2003 16:29:31 -0000
> @@ -47,10 +47,23 @@
>            reader.close();
>  
>            writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
> -          assertEquals(50, writer.docCount());
> +          assertEquals(100, writer.docCount());
> +          writer.close();
> +          
> +          reader = IndexReader.open(dir);
> +          assertEquals(100, reader.maxDoc());
> +          assertEquals(50, reader.numDocs());
> +          reader.close();
> +          
> +          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>            writer.optimize();
>            assertEquals(50, writer.docCount());
>            writer.close();
> +          
> +          reader = IndexReader.open(dir);
> +          assertEquals(50, reader.maxDoc());
> +          assertEquals(50, reader.numDocs());
> +          reader.close();
>          }
>          catch (IOException e) {
>            e.printStackTrace();
> 

PATCH: TestIndexWriter

Posted by Christoph Goller <go...@detego-software.de>.
Sorry, here is the patch.

Otis Gospodnetic wrote:
> Christoph,
> 
> The idea looks good, but the test fails for both pre-patched as well as
> patched version of IndexWriter.
> 
> I converted your test to JUnit test and will check it into CVS shortly.
> If I made a mistake in it, please point it out.
> You can run 'ant test-unit' to see where the test fails.
> 
> Otis
> 
> --- Christoph Goller <go...@detego-software.de> wrote:
> 
>>IndexWriter implements the method docCount() which reads the number
>>of documents from the SegmentInfos of the index. However, it delivers
>>incorrect values if documents get deleted from the index. The reason for
>>this is that SegmentInfo.docCounts are updated in an incorrect way when
>>segments get merged. The new value is taken from the old SegmentInfos.
>>It would be better to take the value from the reader instead. In this
>>way indexWriter.docCount() would deliver the same value as
>>indexReader.maxDoc().
>>
>>test and patch are attached,
>>Christoph
>>
>>
>>
>>Index: IndexWriter.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
>>retrieving revision 1.14
>>diff -u -r1.14 IndexWriter.java
>>--- IndexWriter.java	12 Aug 2003 15:05:03 -0000	1.14
>>+++ IndexWriter.java	3 Sep 2003 14:55:33 -0000
>>@@ -355,7 +355,7 @@
>>       if ((reader.directory == this.directory) || // if we own the directory
>>           (reader.directory == this.ramDirectory))
>> 	segmentsToDelete.addElement(reader);	  // queue segment for deletion
>>-      mergedDocCount += si.docCount;
>>+      mergedDocCount += reader.numDocs();
>>     }
>>     if (infoStream != null) {
>>       infoStream.println();
>>
>>>import java.io.IOException;
>>
>>import org.apache.lucene.analysis.WhitespaceAnalyzer;
>>import org.apache.lucene.document.Document;
>>import org.apache.lucene.document.Field;
>>import org.apache.lucene.index.IndexReader;
>>import org.apache.lucene.index.IndexWriter;
>>import org.apache.lucene.store.Directory;
>>import org.apache.lucene.store.RAMDirectory;
>>
>>/*
>> * Created on 03.09.2003
>> *
>> * To change the template for this generated file go to
>> * Window>Preferences>Java>Code Generation>Code and Comments
>> */
>>
>>/**
>> * 
>> * @author goller
>> */
>>public class IndexWriterDocCountTest {
>>    
>>    int docCount = 0;
>>  
>>      void addDoc(IndexWriter writer)
>>      {
>>        Document doc = new Document();
>>    
>>        doc.add(Field.Keyword("id","id" + docCount));
>>        doc.add(Field.UnStored("content","aaa"));
>>    
>>        try {
>>          writer.addDocument(doc);
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>        docCount++;
>>      }
>>    
>>    
>>
>>    public static void main(String[] args) {
>>        
>>        Directory dir = new RAMDirectory();
>>        IndexWriterDocCountTest test = new IndexWriterDocCountTest();
>>    
>>        IndexWriter writer = null;
>>        IndexReader reader = null;
>>        int i;
>>    
>>        try {
>>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>>      
>>          for (i = 0; i < 100; i++)
>>            test.addDoc(writer);
>>      
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>          reader = IndexReader.open(dir);
>>          for (i = 0; i < 50; i++)
>>            reader.delete(i);
>>          reader.close();
>>          System.out.println("doc #0-49 deleted");
>>          
>>          writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>>          System.out.println("docCount: " + writer.docCount());
>>          
>>          writer.optimize();
>>          System.out.println("optimized called");
>>          System.out.println("docCount: " + writer.docCount());
>>          writer.close();
>>          
>>        }
>>        catch (IOException e) {
>>          // TODO Auto-generated catch block
>>          e.printStackTrace();
>>        }
>>    }
>>}
>>
>>

-- 
*****************************************************************
* Dr. Christoph Goller       Tel.:   +49 89 203 45734           *
* Detego Software GmbH       Mobile: +49 179 1128469            *
* Keuslinstr. 13             Fax.:   +49 721 151516176          *
* 80798 München, Germany     Email:  goller@detego-software.de  *
*****************************************************************

Re: PATCH: IndexWriter

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Christoph,

The idea looks good, but the test fails for both pre-patched as well as
patched version of IndexWriter.

I converted your test to JUnit test and will check it into CVS shortly.
If I made a mistake in it, please point it out.
You can run 'ant test-unit' to see where the test fails.

Otis

--- Christoph Goller <go...@detego-software.de> wrote:
> 
> IndexWriter implements the method docCount() which reads the number
> of documents from the SegmentInfos of the index. However, it delivers
> incorrect values if documents get deleted from the index. The reason for
> this is that SegmentInfo.docCounts are updated in an incorrect way when
> segments get merged. The new value is taken from the old SegmentInfos.
> It would be better to take the value from the reader instead. In this
> way indexWriter.docCount() would deliver the same value as
> indexReader.maxDoc().
> 
> test and patch are attached,
> Christoph
> 
> 
> Index: IndexWriter.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
> retrieving revision 1.14
> diff -u -r1.14 IndexWriter.java
> --- IndexWriter.java	12 Aug 2003 15:05:03 -0000	1.14
> +++ IndexWriter.java	3 Sep 2003 14:55:33 -0000
> @@ -355,7 +355,7 @@
>        if ((reader.directory == this.directory) || // if we own the directory
>            (reader.directory == this.ramDirectory))
>  	segmentsToDelete.addElement(reader);	  // queue segment for deletion
> -      mergedDocCount += si.docCount;
> +      mergedDocCount += reader.numDocs();
>      }
>      if (infoStream != null) {
>        infoStream.println();
> > import java.io.IOException;
> 
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.RAMDirectory;
> 
> /*
>  * Created on 03.09.2003
>  *
>  * To change the template for this generated file go to
>  * Window>Preferences>Java>Code Generation>Code and Comments
>  */
> 
> /**
>  * 
>  * @author goller
>  */
> public class IndexWriterDocCountTest {
>     
>     int docCount = 0;
>   
>       void addDoc(IndexWriter writer)
>       {
>         Document doc = new Document();
>     
>         doc.add(Field.Keyword("id","id" + docCount));
>         doc.add(Field.UnStored("content","aaa"));
>     
>         try {
>           writer.addDocument(doc);
>         }
>         catch (IOException e) {
>           // TODO Auto-generated catch block
>           e.printStackTrace();
>         }
>         docCount++;
>       }
>     
>     
> 
>     public static void main(String[] args) {
>         
>         Directory dir = new RAMDirectory();
>         IndexWriterDocCountTest test = new IndexWriterDocCountTest();
>     
>         IndexWriter writer = null;
>         IndexReader reader = null;
>         int i;
>     
>         try {
>           writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), true);
>       
>           for (i = 0; i < 100; i++)
>             test.addDoc(writer);
>       
>           System.out.println("docCount: " + writer.docCount());
>           writer.close();
>           
>           reader = IndexReader.open(dir);
>           for (i = 0; i < 50; i++)
>             reader.delete(i);
>           reader.close();
>           System.out.println("doc #0-49 deleted");
>           
>           writer  = new IndexWriter(dir, new WhitespaceAnalyzer(), false);
>           System.out.println("docCount: " + writer.docCount());
>           
>           writer.optimize();
>           System.out.println("optimized called");
>           System.out.println("docCount: " + writer.docCount());
>           writer.close();
>           
>         }
>         catch (IOException e) {
>           // TODO Auto-generated catch block
>           e.printStackTrace();
>         }
>     }
> }
> 
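
The quoted IndexWriter.java diff changes one line of the merge loop: mergedDocCount used to add si.docCount, the count recorded in SegmentInfos when the segment was written, and now adds reader.numDocs(), the number of live documents the merge actually copies. Below is a hypothetical pure-Java model of just that bookkeeping; Seg, mergedDocCountOld and mergedDocCountNew are illustrative names, not Lucene classes, and splitting the 100 test documents into two segments of 50 is an arbitrary assumption (the real layout depends on the writer's merge settings).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the merge loop's bookkeeping; not Lucene code.
class MergeCountModel {

    // Stands in for a SegmentInfo plus a reader opened on that segment.
    static final class Seg {
        final int infoDocCount; // docCount stored in SegmentInfos when written
        final int deletedDocs;  // documents deleted after the segment was written

        Seg(int infoDocCount, int deletedDocs) {
            this.infoDocCount = infoDocCount;
            this.deletedDocs = deletedDocs;
        }

        // What reader.numDocs() would report: live documents only.
        int readerNumDocs() {
            return infoDocCount - deletedDocs;
        }
    }

    // Before the patch: mergedDocCount += si.docCount;
    static int mergedDocCountOld(List<Seg> segs) {
        int mergedDocCount = 0;
        for (Seg si : segs) {
            mergedDocCount += si.infoDocCount;
        }
        return mergedDocCount;
    }

    // After the patch: mergedDocCount += reader.numDocs();
    static int mergedDocCountNew(List<Seg> segs) {
        int mergedDocCount = 0;
        for (Seg si : segs) {
            mergedDocCount += si.readerNumDocs();
        }
        return mergedDocCount;
    }

    public static void main(String[] args) {
        // 100 documents in two segments; docs 0-49 (the whole first segment) deleted.
        List<Seg> segs = Arrays.asList(new Seg(50, 50), new Seg(50, 0));
        System.out.println("old bookkeeping: " + mergedDocCountOld(segs)); // 100
        System.out.println("new bookkeeping: " + mergedDocCountNew(segs)); // 50
    }
}
```

With documents 0-49 deleted, the old accumulation reports 100 for a merged segment that holds only 50 documents, which is the stale docCount() the thread describes; the patched accumulation reports 50.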