Posted to java-user@lucene.apache.org by "Kevin A. Burton" <bu...@newsmonster.org> on 2004/04/13 02:45:37 UTC
Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI
Not sure if this is a bug or expected behavior.
I took Doug's suggestion and migrated to a large BUFFER_SIZE of 1024^2. He mentioned that I might be able to squeeze 5-10% out of index merges this way.
I'm not sure if this is expected behavior, but this setting requires a LOT of memory. Without it the VM only grows to about 200M ... As soon as I enable it, the VM grows to 1.5G (which is the max heap) and runs out of memory.
Our indexes aren't THAT big so I'm not sure if something's wrong here or
if this is expected behavior.
If this is expected, I'm not sure the setting is worthwhile. There are other uses for that memory... perhaps just doing the whole merge in memory instead...
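For what it's worth, a rough back-of-the-envelope (the segment and files-per-segment counts below are made-up illustrations, not measured from any real index): if a 1 MB buffer is allocated per open stream, a merge that touches many segment files adds up fast.

```java
// Back-of-the-envelope for buffer memory during a merge. The segment and
// file counts are hypothetical; the point is that per-stream buffers scale
// with the number of files open at once, not with the index size itself.
class BufferMath {
    public static void main(String[] args) {
        long bufferSize = 1024L * 1024; // the 1024^2 BUFFER_SIZE above
        int segments = 100;             // hypothetical segments open mid-merge
        int filesPerSegment = 8;        // hypothetical files per segment
        long total = bufferSize * segments * filesPerSegment;
        System.out.println(total / (1024 * 1024) + " MB just for stream buffers");
        // prints: 800 MB just for stream buffers
    }
}
```

With the default 1 KB buffer the same arithmetic gives well under 1 MB, which would explain why the problem only appears with the large setting.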
Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
petite_abeille wrote:
>
> On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
>
>> He mentioned that I might be able to squeeze 5-10% out of index
>> merges this way.
>
>
> Talking of which... what strategy(ies) do people use to minimize
> downtime when updating an index?
>
This should probably be a wiki page.
Anyway... two thoughts I had on the subject a while back:
You maintain two disks (not RAID ... you get reliability through software).
Searches are load-balanced between the disks for performance. If one fails
you just stop using it.
When you want to do an index merge, you read from disk0 and write to
disk1. Then you take disk0 out of search rotation, add disk1, and
copy the contents of disk1 back to disk0. Users shouldn't notice much of
a performance hit during the merge because it will be VERY fast and
involves only reads from disk0.
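A toy model of that rotation (the disk names and version strings are placeholders; real code would serve Lucene indexes from these paths and load-balance queries across whichever disks are in rotation):

```java
import java.util.*;

// Toy model of the two-disk rotation sketched above. Disk "contents" are
// just version strings here; a real setup would point searchers at Lucene
// index directories on these disks.
class DiskRotation {
    final Map<String, String> disks = new HashMap<>();      // disk -> index version
    final List<String> searchRotation = new ArrayList<>();  // disks serving queries

    DiskRotation() {
        disks.put("disk0", "v1");
        disks.put("disk1", "v1");
        searchRotation.addAll(List.of("disk0", "disk1"));
    }

    // Merge: read from disk0, write the merged index to disk1, swap
    // rotation, then copy disk1 back onto disk0.
    void mergeAndRotate(String newVersion) {
        disks.put("disk1", newVersion);       // merge output lands on disk1
        searchRotation.remove("disk0");       // disk0 leaves search rotation...
        if (!searchRotation.contains("disk1"))
            searchRotation.add("disk1");      // ...disk1 serves the new index
        disks.put("disk0", newVersion);       // copy disk1 -> disk0
        searchRotation.add("disk0");          // disk0 rejoins, up to date
    }
}
```

At no point is zero disks in rotation, which is the whole point of the scheme.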
Kevin
Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
I'm actually pretty lazy about index updates and haven't had the need for
efficiency, since my requirement is that new documents be available on a
next-working-day basis.
I reindex everything from scratch every night (400,000 docs) and store it
in a timestamped index. When the reindexing is done, I alert a controller
about the new active index. I keep a few versions of the index in case of
a failure somewhere, and I can always send a message to the controller to
use an old index.
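The scheme above can be sketched like so (the directory-naming pattern and the retention count are assumptions for illustration, not details from the post):

```java
import java.time.LocalDate;
import java.util.*;

// Sketch of the nightly timestamped-index scheme described above. The
// "index-YYYY-MM-DD" naming and KEEP count are assumed, not prescribed.
class IndexRotator {
    static final int KEEP = 3;  // keep a few versions in case of failure

    // Build tonight's index under a dated name, prune old ones, and
    // return the name the controller should switch to.
    static String rebuild(List<String> existing, LocalDate today) {
        String fresh = "index-" + today;      // e.g. index-2004-04-13
        existing.add(fresh);
        while (existing.size() > KEEP)        // prune all but the most recent
            existing.remove(0);
        return fresh;                         // controller activates this one
    }
}
```

Rolling back to an older index is then just telling the controller to use one of the retained names.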
cheers,
sv
On Tue, 13 Apr 2004, petite_abeille wrote:
>
> On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
>
> > He mentioned that I might be able to squeeze 5-10% out of index merges
> > this way.
>
> Talking of which... what strategy(ies) do people use to minimize
> downtime when updating an index?
>
> My current "strategy" is as follows:
>
> (1) use a temporary RAMDirectory for ongoing updates.
> (2) perform a "copy on write" when flushing the RAMDirectory into the
> persistent index.
>
> The second step means that I create an offline copy of a live index
> before invoking addIndexes() and then replace the old index with the
> new, updated one. While this effectively increases the time it takes to
> update an index, it nonetheless reduces the *perceived* downtime.
>
> Thoughts? Alternative strategies?
>
> TIA.
>
> Cheers,
>
> PA.
index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)
Posted by petite_abeille <pe...@mac.com>.
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
> He mentioned that I might be able to squeeze 5-10% out of index merges
> this way.
Talking of which... what strategy(ies) do people use to minimize
downtime when updating an index?
My current "strategy" is as follows:
(1) use a temporary RAMDirectory for ongoing updates.
(2) perform a "copy on write" when flushing the RAMDirectory into the
persistent index.
The second step means that I create an offline copy of a live index
before invoking addIndexes() and then replace the old index with the
new, updated one. While this effectively increases the time it takes to
update an index, it nonetheless reduces the *perceived* downtime.
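Roughly, as a toy model (real code would use Lucene's RAMDirectory and IndexWriter.addIndexes(); here the "index" is just a set of document ids, to show only the copy-on-write shape):

```java
import java.util.*;

// Toy model of the copy-on-write flush above: updates buffer in memory,
// the flush merges them into an offline copy of the live index, and the
// live reference is swapped only at the very end, so searches never see
// a half-merged index.
class CopyOnWriteFlush {
    Set<String> liveIndex = new HashSet<>();  // persistent, searchable index
    Set<String> ramBuffer = new HashSet<>();  // temporary in-memory updates

    void flush() {
        Set<String> offlineCopy = new HashSet<>(liveIndex); // copy on write
        offlineCopy.addAll(ramBuffer);  // stands in for addIndexes()
        ramBuffer.clear();
        liveIndex = offlineCopy;        // swap in the updated index
    }
}
```

The extra copy is the cost; the cheap final swap is what keeps the perceived downtime small.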
Thoughts? Alternative strategies?
TIA.
Cheers,
PA.