Posted to dev@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2003/10/12 19:02:18 UTC

Re: [PATCH] IndexWriter : controlling the number of Docs merged

Thanks Julien, I put your patch in Bugzilla, so we don't lose it.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23754

Otis


--- fp235-5 <ju...@lingway.com> wrote:
> Sorry, here is the patch ;-) 
> 
> 
> ---------- Start of original message -----------
> 
> From    : "fp235-5" <ju...@lingway.com>
> To      : "lucene-dev" <lu...@jakarta.apache.org>
> Cc      : 
> Date    : Sat, 20 Sep 2003 16:06:06 +0200
> Subject : [PATCH] IndexWriter : controlling the number of Docs merged
> 
> Hello, 
> 
> Someone made a suggestion yesterday about adding a variable to
> IndexWriter in order to control the number of Documents merged in
> RAMDirectory independently of the mergeFactor. (Sorry, I don't
> remember exactly who it was; the mail arrived at my office.)
> 
> I'm proposing a tiny modification of IndexWriter to add this
> functionality. A variable minMergeDocs specifies the number of
> Documents to be merged in memory before starting a new Segment. The
> mergeFactor still controls the number of Segments created in the
> Directory, so it is possible to avoid the problem of hitting the
> limit on the number of open files.
> 
> The diff file is attached.
> 
> As Dmitry and Erik noted, there are no true JUnit tests. I'd be happy
> to write a JUnit test for this feature. The problem is that the
> segmentInfos field is private in IndexWriter and can't be used to
> check the number and size of the Segments. I ran a test using the
> infoStream variable of IndexWriter, and everything seems to be OK.
> 
> Any comments / suggestions are welcome. 
> 
> Regards
> 
> Julien
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 
> Index: IndexWriter.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
> retrieving revision 1.15
> diff -u -r1.15 IndexWriter.java
> --- IndexWriter.java	15 Sep 2003 12:40:23 -0000	1.15
> +++ IndexWriter.java	20 Sep 2003 12:22:13 -0000
> @@ -249,6 +249,16 @@
>     *
>     * <p>This must never be less than 2.  The default value is 10.*/
>    public int mergeFactor = 10;
> +
> +  /** Determines the minimal number of documents required before merging
> +   * and starting a new Segment. Since Documents are merged in a
> +   * {@link org.apache.lucene.store.RAMDirectory}, large value gives faster
> +   * indexing. At the same time mergeFactor limits the number of files open in
> +   * a FSDirectory.
> +   *
> +   * <p> The default value is 10.*/
> +  public int minMergeDocs = 10;
> +
>  
>    /** Determines the largest number of documents ever merged by addDocument().
>     * Small values (e.g., less than 10,000) are best for interactive indexing,
> @@ -316,7 +326,7 @@
>  
>    /** Incremental segment merger.  */
>    private final void maybeMergeSegments() throws IOException {
> -    long targetMergeDocs = mergeFactor;
> +    long targetMergeDocs = minMergeDocs;
>      while (targetMergeDocs <= maxMergeDocs) {
>        // find segments smaller than current target size
>        int minSegment = segmentInfos.size();
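
To make the effect of the one-line change in maybeMergeSegments() easier to see, here is a rough, self-contained sketch in plain Java. It is a toy model, not Lucene code: the class and method names (MergeSimulation, simulate, maybeMerge) are made up for illustration. The part of the loop that the patch does not show is an assumption, as is the idea that every added document starts out as its own one-document segment. Segments are represented only by their document counts.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of incremental segment merging; hypothetical, not Lucene code. */
class MergeSimulation {

    /**
     * Adds numDocs documents one at a time and returns the resulting
     * segment sizes, oldest segment first.
     */
    static List<Integer> simulate(int numDocs, int minMergeDocs,
                                  int mergeFactor, int maxMergeDocs) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < numDocs; i++) {
            segments.add(1); // assumption: each new document is a 1-doc segment
            maybeMerge(segments, minMergeDocs, mergeFactor, maxMergeDocs);
        }
        return segments;
    }

    static void maybeMerge(List<Integer> segments, int minMergeDocs,
                           int mergeFactor, int maxMergeDocs) {
        // The patched line: the first merge target is minMergeDocs
        // instead of mergeFactor.
        long targetMergeDocs = minMergeDocs;
        while (targetMergeDocs <= maxMergeDocs) {
            // Walk back from the newest segment, summing the sizes of
            // segments smaller than the current target.
            int minSegment = segments.size();
            int mergeDocs = 0;
            while (--minSegment >= 0) {
                int docCount = segments.get(minSegment);
                if (docCount >= targetMergeDocs) {
                    break;
                }
                mergeDocs += docCount;
            }
            if (mergeDocs < targetMergeDocs) {
                break; // not enough small segments to merge yet
            }
            // Assumed merge step: replace the collected tail with one segment.
            segments.subList(minSegment + 1, segments.size()).clear();
            segments.add(mergeDocs);
            // Later targets still grow by mergeFactor.
            targetMergeDocs *= mergeFactor;
        }
    }
}
```

In this model, simulate(25, 10, 10, Integer.MAX_VALUE) returns [10, 10, 1, 1, 1, 1, 1]: single documents are collected until minMergeDocs of them exist, and mergeFactor only decides how quickly the target size grows after that.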




index summary facilities

Posted by Alex Aw Seat Kiong <al...@bigonthenet.com>.
Hi!

Does the Lucene indexer library include index summary facilities?
For example:
- the total number of records indexed
- a list of the sources (uid) that were indexed
- the total number of records deleted
- a list of the sources (uid) that were deleted

Also, how can I use the searcher to retrieve all the records that were
indexed or deleted? What kind of query string would do this?


Thanks,
AlexAw



----- Original Message ----- 
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Cc: <ju...@lingway.com>
Sent: Monday, October 13, 2003 1:02 AM
Subject: Re: [PATCH] IndexWriter : controlling the number of Docs merged


> Thanks Julien, I put your patch in Bugzilla, so we don't lose it.
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23754
>
> Otis