Posted to java-user@lucene.apache.org by Gerard Sychay <Ge...@cchmc.org> on 2004/04/21 15:29:02 UTC
Re: Does a RAMDirectory ever need to merge segments... (performance issue)
I've always wondered about this too. To put it another way, how does
mergeFactor affect an IndexWriter backed by a RAMDirectory? Can I set
mergeFactor to the highest possible value (given the machine's RAM) in
order to avoid merging segments?
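A sketch of what that tuning might look like (untested, and assuming the Lucene 1.4-era API, where mergeFactor, minMergeDocs, and maxMergeDocs are public fields on IndexWriter; the values here are illustrative, not recommendations):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class RamWriterTuning {
    public static void main(String[] args) throws Exception {
        RAMDirectory ramDir = new RAMDirectory();
        // true = create a new index in the RAMDirectory
        IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(), true);

        // Raise the thresholds so fewer merges happen in memory; the
        // practical ceiling is the machine's RAM, as noted above.
        writer.mergeFactor = 1000;    // segments combined per merge level
        writer.minMergeDocs = 10000;  // docs buffered before merging kicks in

        // ... writer.addDocument(...) calls go here ...
        writer.close();
    }
}
```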
>>> "Kevin A. Burton" <bu...@newsmonster.org> 04/20/04 04:40AM >>>
I've been benchmarking our indexer to find out if I can squeeze any
more performance out of it.
I noticed one problem with RAMDirectory... I'm storing documents in
memory and then writing them to disk every once in a while. ...
IndexWriter.maybeMergeSegments is taking up 5% of total runtime.
DocumentWriter.addDocument is taking up another 17% of total runtime.
Notice that these don't add up to 100% because there are other tasks
taking up CPU before and after Lucene is called.
Anyway... I don't see why RAMDirectory is trying to merge segments. Is
there any way to prevent this? I could just store them in a big
ArrayList until I'm ready to write them to a disk index, but I'm not
sure how efficient that would be.
Anyone run into this before?
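For what it's worth, the usual alternative to buffering Documents in an ArrayList is to keep batching into the RAMDirectory and then fold it into the on-disk index in one call via addIndexes(). A rough sketch (untested; assumes the Lucene 1.4-era API, and the index path and field name are made up for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchedIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical on-disk index location
        IndexWriter diskWriter = new IndexWriter(
                FSDirectory.getDirectory("/tmp/index", true),
                new StandardAnalyzer(), true);

        // Accumulate a batch entirely in memory first
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        for (int i = 0; i < 10000; i++) {
            Document doc = new Document();
            doc.add(Field.Text("body", "document " + i)); // hypothetical field
            ramWriter.addDocument(doc);
        }
        ramWriter.close();

        // One bulk merge to disk instead of many small in-memory merges
        diskWriter.addIndexes(new Directory[] { ramDir });
        diskWriter.close();
    }
}
```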
Re: Does a RAMDirectory ever need to merge segments... (performance issue)
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Gerard Sychay wrote:
>I've always wondered about this too. To put it another way, how does
>mergeFactor affect an IndexWriter backed by a RAMDirectory? Can I set
>mergeFactor to the highest possible value (given the machine's RAM) in
>order to avoid merging segments?
>
>
Yes... actually I was thinking of increasing these vars on the
RAMDirectory in the hope of avoiding this CPU overhead.
Also, I think the var you want is minMergeDocs, not mergeFactor. The
only problem is that the source to maybeMergeSegments says:
> private final void maybeMergeSegments() throws IOException {
> long targetMergeDocs = minMergeDocs;
> while (targetMergeDocs <= maxMergeDocs) {
So I guess to prevent this we would have to set minMergeDocs to
maxMergeDocs+1... which makes no sense. Also, by default maxMergeDocs
is Integer.MAX_VALUE, so that would have to be changed.
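To see why that combination disables merging, the quoted loop condition can be mirrored in a tiny standalone sketch (the class and method names are mine, not Lucene's): with minMergeDocs set above a lowered maxMergeDocs, the while loop is never entered.

```java
public class MergeLoopDemo {
    // Mirrors the condition quoted from maybeMergeSegments:
    //   long targetMergeDocs = minMergeDocs;
    //   while (targetMergeDocs <= maxMergeDocs) { ... }
    static boolean wouldEnterMergeLoop(long minMergeDocs, long maxMergeDocs) {
        long targetMergeDocs = minMergeDocs;
        return targetMergeDocs <= maxMergeDocs;
    }

    public static void main(String[] args) {
        long maxMergeDocs = 1000000; // lowered from the Integer.MAX_VALUE default
        // minMergeDocs = maxMergeDocs + 1 means the loop body never runs
        System.out.println(wouldEnterMergeLoop(maxMergeDocs + 1, maxMergeDocs)); // false
    }
}
```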
Anyway... I'm still playing with this myself. It might be easier to just
use an ArrayList of N documents if you know for sure how big your RAM
dir will grow to.
Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster