Posted to java-user@lucene.apache.org by Nizamul <ni...@rediff.co.in> on 2007/12/03 09:48:34 UTC

can we do partial optimization?

Hello,
I am very new to Lucene and I am facing a problem.
I have one very large index that is constantly being updated (adds and deletes) at regular intervals. After each round of updates I optimize the whole index (otherwise searches will be slow), but optimization takes time. So I was thinking of merging only the smaller segments (I guess that would be a good compromise between search time and optimization time). For example, suppose I have 10 segments:
1 of 10,000,000 docs
4 of 100,000 docs
4 of 10,000 docs
and 1 of 5 docs.

I want to merge the 9 smaller segments into one (I believe this would not take much time and searching would improve a lot), but I don't know how to do partial merging. Does Lucene allow it? Or can I extend IndexWriter and add an optimize method of my own where I can specify which .cfs files to choose for optimization?

Thanks and Regards,
Nizam

Re: can we do partial optimization?

Posted by Michael McCandless <lu...@mikemccandless.com>.
The current trunk of Lucene (unreleased 2.3-dev) has a new method on
IndexWriter: optimize(int maxNumSegments).  This method should do what
you want: you tell it how many segments to optimize down to, and it
will try to pick the lowest-cost merges to get the index to that
point.  It's very new (only committed a few days ago), plus the trunk
may have bugs, so tread carefully!
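
To put rough numbers on the trade-off, here is a minimal sketch in plain Java (no Lucene dependency). It models the segments from the original question as document counts and merges the smallest ones into a single segment so that at most maxNumSegments remain. The class PartialMergeSketch and its methods are hypothetical names used only for illustration. This is not how IndexWriter.optimize(int) actually selects merges, and it ignores deletions, byte sizes, and merge-factor limits.

```java
import java.util.Arrays;

/** Toy model of partial optimization. Illustrative only; not Lucene code. */
final class PartialMergeSketch {

    /** Outcome of one simulated merge. */
    static final class Plan {
        final long[] segmentsAfter; // doc counts per segment, ascending
        final long docsRewritten;   // docs copied into the merged segment

        Plan(long[] segmentsAfter, long docsRewritten) {
            this.segmentsAfter = segmentsAfter;
            this.docsRewritten = docsRewritten;
        }
    }

    /**
     * Merges the smallest segments into one so that at most
     * maxNumSegments segments remain. Assumption: merge cost is simply
     * the number of documents that have to be rewritten.
     */
    static Plan mergeSmallest(long[] docCounts, int maxNumSegments) {
        if (maxNumSegments < 1) {
            throw new IllegalArgumentException("maxNumSegments must be >= 1");
        }
        long[] sorted = docCounts.clone();
        Arrays.sort(sorted);
        if (sorted.length <= maxNumSegments) {
            return new Plan(sorted, 0L); // already small enough, nothing to merge
        }
        // Merging k segments into one removes k - 1 segments from the index.
        int toMerge = sorted.length - maxNumSegments + 1;
        long merged = 0L;
        for (int i = 0; i < toMerge; i++) {
            merged += sorted[i];
        }
        long[] after = new long[maxNumSegments];
        after[0] = merged;
        System.arraycopy(sorted, toMerge, after, 1, maxNumSegments - 1);
        Arrays.sort(after);
        return new Plan(after, merged);
    }

    public static void main(String[] args) {
        // Segment sizes taken from the original question.
        long[] segments = {
            10_000_000L,
            100_000L, 100_000L, 100_000L, 100_000L,
            10_000L, 10_000L, 10_000L, 10_000L,
            5L
        };
        Plan partial = mergeSmallest(segments, 2);
        Plan full = mergeSmallest(segments, 1);
        System.out.println("partial: rewrites " + partial.docsRewritten
                + " docs, leaves " + Arrays.toString(partial.segmentsAfter));
        System.out.println("full:    rewrites " + full.docsRewritten
                + " docs, leaves " + Arrays.toString(full.segmentsAfter));
    }
}
```

Under this toy model, going down to 2 segments rewrites 440,005 documents, against 10,440,005 for a full optimize, which is why a partial optimize can be so much cheaper when one segment dominates the index.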

If that doesn't seem to do the right merges for your index, it's also
very simple to create your own MergePolicy.  You can subclass the
default LogByteSizeMergePolicy and override the
"findMergesForOptimize" method.  This feature (separate MergePolicy)
is also only available in 2.3-dev (trunk).

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: can we do partial optimization?

Posted by Doron Cohen <DO...@il.ibm.com>.
It doesn't make sense to call optimize() after every document add.
In fact, Lucene already implements logic in the spirit of what you
describe when it decides to merge segments on the fly.

There are various ways to tell Lucene how often to flush
recently added/updated documents and what to merge.
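
The on-the-fly merging mentioned above can be pictured with a toy simulation. The sketch below is plain Java with no Lucene dependency, and it is a simplified model under stated assumptions, not Lucene's merge code: every flush produces a segment of the same size, and whenever the newest mergeFactor segments are equal in size they are merged into one. LogMergeSketch and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of merge-on-the-fly behaviour. Illustrative only; not Lucene code. */
final class LogMergeSketch {
    private final int mergeFactor;
    private final List<Long> segments = new ArrayList<>(); // oldest first
    private long docsRewritten = 0L;

    LogMergeSketch(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        this.mergeFactor = mergeFactor;
    }

    /** Simulates flushing one new segment, then cascades any merges it triggers. */
    void flush(long flushedDocs) {
        segments.add(flushedDocs);
        // Assumption: equal size stands in for "same level" in this toy model.
        while (tailIsUniform()) {
            long merged = 0L;
            for (int i = 0; i < mergeFactor; i++) {
                merged += segments.remove(segments.size() - 1);
            }
            segments.add(merged);
            docsRewritten += merged;
        }
    }

    /** True when the newest mergeFactor segments all have the same size. */
    private boolean tailIsUniform() {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long newest = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i).longValue() != newest) {
                return false;
            }
        }
        return true;
    }

    /** Returns a copy of the current segment sizes, oldest first. */
    List<Long> segments() {
        return new ArrayList<>(segments);
    }

    /** Total documents copied by all merges so far. */
    long docsRewritten() {
        return docsRewritten;
    }

    public static void main(String[] args) {
        LogMergeSketch sim = new LogMergeSketch(10);
        for (int i = 0; i < 1234; i++) {
            sim.flush(1000L); // each flush adds a 1,000-document segment
        }
        System.out.println("segments: " + sim.segments());
        System.out.println("docs rewritten by merges: " + sim.docsRewritten());
    }
}
```

With a mergeFactor of 10, 1,234 flushes of 1,000 documents leave ten segments in this model: one of 1,000,000, two of 100,000, three of 10,000, and four of 1,000. The point of the sketch is that the segment count stays small without ever rewriting the whole index in a single pass.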

But it pays to check the simple things first. For example: are
you closing and reopening the IndexWriter after each document
add (don't)? Are you deleting with IndexReader or
IndexWriter (use IndexWriter if you can)?

A good start is to go through the Lucene FAQ again,
http://wiki.apache.org/lucene-java/LuceneFAQ
and in addition see this wiki page on performance:
http://wiki.apache.org/lucene-java/BasicsOfPerformance

Good luck, and let us know how it went!
Doron

