Posted to java-user@lucene.apache.org by Marc Sturlese <ma...@gmail.com> on 2010/11/12 16:18:05 UTC

performance merging indexes with addIndexesNoOptimize

I am running some tests on merging indexes and have a performance question.
I am doing the merge in a simple way, something like:

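      // indexList: List<String> of index paths; w: an IndexWriter opened
      // on the target index (Lucene 2.9/3.0 API)
      FSDirectory[] indexes = new FSDirectory[indexList.size()];
      for (int i = 0; i < indexList.size(); i++) {
        indexes[i] = FSDirectory.open(new File(indexList.get(i)));
      }
      w.addIndexesNoOptimize(indexes); // adds the segments, merging as the MergePolicy dictates
      w.close(); // commits and, by default, waits for pending merges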

indexList.size() is 50, and the list contains the paths to the indexes. Each
of these 50 indexes contains 500,000 docs and is about 500 MB on disk. I have
noticed that 50% of the time is spent in w.addIndexesNoOptimize(indexes)
and the other 50% in w.close() (I suppose because close() commits and has to
wait for all the merges to complete).

I am wondering whether there is a way to do this faster. For example, merge
the 50 indexes into 25, these 25 into 12, these 12 into 6... until I end up
with a single big index. Could this be faster?
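A rough way to reason about the cascading idea is to count how many megabytes get rewritten. The sketch below is only a back-of-the-envelope model, not Lucene's merge code: it assumes every merge rewrites all bytes of its inputs exactly once, and it ignores deletions, CPU cost, and the extra open files and seeks a very wide merge needs. The class and method names (MergeCostModel and its methods) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical back-of-the-envelope model, not Lucene's merge code.
 * Assumption: every merge rewrites all bytes of its inputs exactly once.
 */
public class MergeCostModel {

  /** One merge of all indexes at once: every byte is written once. */
  public static long singleMergeWrites(int numIndexes, long sizeMb) {
    return numIndexes * sizeMb;
  }

  /** Pairwise cascade (50 -> 25 -> 13 -> ... -> 1); returns total MB written. */
  public static long cascadingMergeWrites(int numIndexes, long sizeMb) {
    List<Long> segments = new ArrayList<>(Collections.nCopies(numIndexes, sizeMb));
    long written = 0;
    while (segments.size() > 1) {
      List<Long> next = new ArrayList<>();
      for (int i = 0; i < segments.size(); i += 2) {
        if (i + 1 < segments.size()) {
          long merged = segments.get(i) + segments.get(i + 1);
          written += merged; // the merged output is written in full
          next.add(merged);
        } else {
          next.add(segments.get(i)); // odd one out is carried over untouched
        }
      }
      segments = next;
    }
    return written;
  }

  /** Number of pairwise rounds needed to end up with a single index. */
  public static int cascadingRounds(int numIndexes) {
    int rounds = 0;
    for (int n = numIndexes; n > 1; n = (n + 1) / 2) {
      rounds++;
    }
    return rounds;
  }

  public static void main(String[] args) {
    System.out.println("single merge, MB written:    " + singleMergeWrites(50, 500));
    System.out.println("cascading merge, MB written: " + cascadingMergeWrites(50, 500));
    System.out.println("cascading rounds:            " + cascadingRounds(50));
  }
}
```

Under that assumption, 50 indexes of about 500 MB each cost 25,000 MB of writes when merged in one pass, while the pairwise cascade (50, 25, 13, 7, 4, 2, 1) takes 6 rounds and writes 147,000 MB, because most bytes are rewritten once per round. Real timings depend on the merge policy and the hardware, so only a measurement can settle the question.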

Does anyone have experience with this? Any advice? 
Thanks in advance
-- 
View this message in context: http://lucene.472066.n3.nabble.com/performance-merging-indexes-with-addIndexesNoOptimize-tp1889378p1889378.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: performance merging indexes with addIndexesNoOptimize

Posted by Shai Erera <se...@gmail.com>.
That's right. In 3.x, though, you have to call addIndexes followed by
maybeMerge() if you want to achieve the same effect as
addIndexesNoOptimize.

Shai

On Friday, November 12, 2010, Marc Sturlese <ma...@gmail.com> wrote:



Re: performance merging indexes with addIndexesNoOptimize

Posted by Marc Sturlese <ma...@gmail.com>.
Thanks, that clarifies things. As far as I understand, if I have to optimize
the index right after merging it anyway, it makes no difference whether I use
the Lucene 3.x addIndexes or addIndexesNoOptimize, because the total time of
doing both steps will be the same in either case. Am I right?



Re: performance merging indexes with addIndexesNoOptimize

Posted by Shai Erera <se...@gmail.com>.
Ok, so a couple of clarifications:

addIndexes(Directory...) *does not* trigger any merges. It simply registers
the incoming directories in the target index, and returns. You can later
call maybeMerge() or optimize() as you see fit.

Compound files are irrelevant to addIndexes: it just adds the incoming
segments as they are. To later merge them into a compound file, you'd have to
change the MergePolicy settings to create compound files and then call
optimize(). This is true even if you're not using addIndexes: say you're
adding documents to an index with setUseCompoundFile(false) and commit a
couple of times; if you then call setUseCompoundFile(true) and optimize(),
the segments will be converted to compound files.

Calling w.optimize() will trigger merges and wait for all of them to
complete before the method returns. Therefore, calling close() or
close(false) afterwards makes no difference. Calling optimize(false) followed
immediately by close(false) means that essentially no merges will complete,
so that's not good either.

If you want optimize to finish, call optimize(). If you want to close the
IndexWriter as soon as possible, call close(false). Placing these two method
calls immediately one after the other makes sense only if you call either:
1) optimize(); close(); or
2) optimize(false); close();

In either case, you end up waiting for merges. If you don't want that, then
just don't call optimize(), and use close(false).

Shai

On Fri, Nov 12, 2010 at 6:56 PM, Marc Sturlese <ma...@gmail.com>wrote:


Re: performance merging indexes with addIndexesNoOptimize

Posted by Marc Sturlese <ma...@gmail.com>.
Thanks a lot, Shai. A couple of questions:

>> In Lucene 3x there is a new addIndexes which accepts Directory… that 
>> simply registers the new indexes in the index, without running merges. 
>> That makes addIndexes very fast. 
With the Lucene 3.x addIndexes that accepts Directory, if after the merge I
need to optimize the index into a compound file, what would happen? Would
that optimize be slower than if I used addIndexesNoOptimize?

>> But note that not running merges, or not letting them finish, is not
>> recommended long term. The approaches I've mentioned are good if you
>> want to quickly add new indexes and plan to run index optimization at
>> a later time.

Actually, I always build all the "small indexes" from scratch, and I have to
optimize my final (merged) index into a compound file. If I do:
w.optimize();
w.close(false);
would I get any benefit from the close(false)?

Thanks in advance




Re: performance merging indexes with addIndexesNoOptimize

Posted by Shai Erera <se...@gmail.com>.
In Lucene 3.x there is a new addIndexes that accepts Directory… and
simply registers the new indexes in the index, without running merges.
That makes addIndexes very fast.

Also, you can consider calling close(false) so as not to wait for merges.
That can speed things up as well.

But note that not running merges, or not letting them finish, is not
recommended long term. The approaches I've mentioned are good if you
want to quickly add new indexes and plan to run index optimization at
a later time.

Shai

On Friday, November 12, 2010, Marc Sturlese <ma...@gmail.com> wrote:
