Posted to java-user@lucene.apache.org by Gerard Sychay <Ge...@cchmc.org> on 2004/04/21 15:29:02 UTC

Re: Does a RAMDirectory ever need to merge segments... (performance issue)

I've always wondered about this too.  To put it another way, how does
mergeFactor affect an IndexWriter backed by a RAMDirectory?  Can I set
mergeFactor to the highest possible value (given the machine's RAM) in
order to avoid merging segments?
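
For concreteness, here is the kind of thing I mean: a sketch against
the Lucene 1.x API of that era, where these knobs are public fields on
IndexWriter (the values below are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    public class RamWriterTuning {
        public static void main(String[] args) throws Exception {
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter writer =
                new IndexWriter(ramDir, new StandardAnalyzer(), true);
            // In Lucene 1.x these are public fields, not setters:
            writer.mergeFactor = 1000;   // merge far less often
            writer.minMergeDocs = 1000;  // buffer this many docs before the first merge
            writer.close();
        }
    }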

>>> "Kevin A. Burton" <bu...@newsmonster.org> 04/20/04 04:40AM >>>
I've been benchmarking our indexer to find out if I can squeeze any
more performance out of it.

I noticed one problem with RAMDirectory... I'm storing documents in 
memory and then writing them to disk every once in a while. ...
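
(Concretely, the pattern is roughly the following; a sketch assuming
the Lucene 1.x API, where moreDocuments()/nextDocument() are stand-ins
and the flush threshold is invented:)

    Analyzer analyzer = new StandardAnalyzer();
    IndexWriter diskWriter =
        new IndexWriter(FSDirectory.getDirectory("/path/to/index", true),
                        analyzer, true);
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);

    int buffered = 0;
    while (moreDocuments()) {
        ramWriter.addDocument(nextDocument());
        if (++buffered >= 10000) {        // invented threshold
            ramWriter.close();
            // fold the RAM segments into the on-disk index
            diskWriter.addIndexes(new Directory[] { ramDir });
            ramDir = new RAMDirectory();  // start a fresh RAM index
            ramWriter = new IndexWriter(ramDir, analyzer, true);
            buffered = 0;
        }
    }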

IndexWriter.maybeMergeSegments is taking up 5% of total runtime, and
DocumentWriter.addDocument is taking up another 17% of total runtime.

Note that these don't add up to 100% because there are other tasks
taking up CPU before and after Lucene is called.

Anyway... I don't see why RAMDirectory is trying to merge segments.
Is there any way to prevent this?  I could just store the documents in
a big ArrayList until I'm ready to write them to a disk index, but I'm
not sure how efficient that would be.
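
(If I did go the ArrayList route it would look roughly like this:
plain Java, with an invented flush threshold, and diskWriter being the
on-disk IndexWriter:)

    List pending = new ArrayList();  // plain buffer: no analysis, no merging yet

    void buffer(Document doc) throws IOException {
        pending.add(doc);
        if (pending.size() >= 10000) {   // invented threshold
            for (Iterator it = pending.iterator(); it.hasNext(); ) {
                // all the Lucene work (analysis, merging) happens here
                diskWriter.addDocument((Document) it.next());
            }
            pending.clear();
        }
    }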

Anyone run into this before?




Re: Does a RAMDirectory ever need to merge segments... (performance issue)

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Gerard Sychay wrote:

>I've always wondered about this too.  To put it another way, how does
>mergeFactor affect an IndexWriter backed by a RAMDirectory?  Can I set
>mergeFactor to the highest possible value (given the machine's RAM) in
>order to avoid merging segments?
>  
>
Yes... actually I was thinking of increasing these vars on the
RAMDirectory's IndexWriter in the hope of avoiding this CPU overhead.

Also, I think the var you want is minMergeDocs, not mergeFactor.  The
only problem is that the source to maybeMergeSegments says:

>   private final void maybeMergeSegments() throws IOException {
>     long targetMergeDocs = minMergeDocs;
>     while (targetMergeDocs <= maxMergeDocs) {

So I guess to prevent this we would have to set minMergeDocs to
maxMergeDocs + 1... which makes no sense.  Also, by default
maxMergeDocs is Integer.MAX_VALUE, so that would have to be changed
first (an int can't hold Integer.MAX_VALUE + 1).
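
So the only way to short-circuit that loop would be something like
this (illustrative values; note these are ints, so you can't just add
1 to the default Integer.MAX_VALUE):

    writer.maxMergeDocs = 10000000;  // first lower it from Integer.MAX_VALUE
    writer.minMergeDocs = writer.maxMergeDocs + 1;  // loop condition false from the start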

Anyway... I'm still playing with this myself. It might be easier to just 
use an ArrayList of N documents if you know for sure how big your RAM 
dir will grow to.

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster