Posted to java-user@lucene.apache.org by "Kevin A. Burton" <bu...@newsmonster.org> on 2004/04/13 02:45:37 UTC

Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI

Not sure if this is a bug or expected behavior. 

I took Doug's suggestion and migrated to a large BUFFER_SIZE of 1024^2.  
He mentioned that I might be able to squeeze 5-10% out of index 
merges this way.

I'm not sure if this is expected behavior, but it requires a LOT of 
memory.  Without this setting the VM only grows to about 200M ... As 
soon as I enable it, my VM climbs to 1.5G (the max heap) and runs out 
of memory.

Our indexes aren't THAT big so I'm not sure if something's wrong here or 
if this is expected behavior.

If this is expected I'm not sure this is valuable.  There are other uses 
for that memory... perhaps just doing the whole merge in memory...
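For a sense of scale, here is a back-of-the-envelope sketch of why a 1 MB per-file buffer can eat a heap this fast. The segment and file counts below are hypothetical, purely for illustration; the point is that the buffer is allocated per open stream, and a merge opens many streams at once.

```java
// Rough arithmetic: per-stream buffers multiply across every open file
// in a merge. Segment/file counts here are made up for illustration.
public class BufferMath {
    public static void main(String[] args) {
        long bufferSize = 1024L * 1024;   // 1024^2 bytes per open stream
        int segments = 100;               // hypothetical segment count
        int filesPerSegment = 10;         // terms, freqs, positions, fields...

        long streams = (long) segments * filesPerSegment;
        long total = streams * bufferSize;
        System.out.println("Open streams:  " + streams);
        System.out.println("Buffer memory: " + total / (1024 * 1024) + " MB");

        // The same streams with the default 1 KB buffer:
        long defaultTotal = streams * 1024;
        System.out.println("Default-size:  " + defaultTotal / 1024 + " KB");
    }
}
```

With these made-up counts the large buffers alone account for about 1 GB, while the defaults would need about 1 MB total, which lines up with the 200M-vs-1.5G behavior described above.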

Kevin

-- 

Please reply using PGP.

    http://peerfear.org/pubkey.asc    
    
    NewsMonster - http://www.newsmonster.org/
    
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
  IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
petite_abeille wrote:

>
> On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
>
>> He mentioned that I might be able to squeeze 5-10% out of index 
>> merges this way.
>
>
> Talking of which... what strategy(ies) do people use to minimize 
> downtime when updating an index?
>
This should probably be a wiki page.

Anyway... two thoughts I had on the subject a while back:

You maintain two disks (not RAID ... you get reliability through software).

Searches are load balanced between disks for performance reasons.  If 
one fails you just stop using it.

When you want to do an index merge you read from disk0 and write to 
disk1.  Then you take disk0 out of search rotation, add disk1, and 
copy the contents of disk1 back to disk0.  Users shouldn't notice much 
of a performance issue during the merge because it will be VERY fast 
and it's just reads from disk0.
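The rotation above can be sketched with plain file operations. This is a simulation with temp directories standing in for the two disks; the merge step is a placeholder, and all the names are made up.

```java
import java.io.IOException;
import java.nio.file.*;

// Simulation of the two-disk rotation: merge from disk0 into disk1,
// swap search traffic to disk1, then copy disk1 back to disk0.
public class TwoDiskRotation {
    static void copyDir(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path f : files) {
                Files.copy(f, dst.resolve(f.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rotation");
        Path disk0 = Files.createDirectories(root.resolve("disk0"));
        Path disk1 = Files.createDirectories(root.resolve("disk1"));
        Files.writeString(disk0.resolve("index.seg"), "old index");

        // 1. Merge: read from disk0, write the merged index to disk1.
        Files.writeString(disk1.resolve("index.seg"), "merged index");

        // 2. Take disk0 out of rotation; search disk1 only.
        Path active = disk1;

        // 3. Copy disk1 back to disk0 so both disks serve the new index.
        copyDir(active, disk0);

        System.out.println(Files.readString(disk0.resolve("index.seg")));
        // → merged index
    }
}
```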

Kevin



Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

Posted by Stephane James Vaucher <va...@cirano.qc.ca>.
I'm actually pretty lazy about index updates, and haven't had the need for 
efficiency, since my requirement is that new documents should be 
available on a next working day basis.

I reindex everything from scratch every night (400,000 docs) and store it 
in a timestamped index. When the reindexing is done, I alert a controller 
of the new active index. I keep a few versions of the index in case of 
a failure somewhere, and I can always send a message to the controller to 
use an old index.
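This scheme can be sketched as below: each nightly run writes into a timestamped directory, a "current" marker file tells the controller which index is active, and versions beyond the last few are pruned. The directory layout, marker file, and KEEP count are all assumptions of mine, not anything from the original setup.

```java
import java.io.IOException;
import java.nio.file.*;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.stream.*;

// Sketch of the nightly-rebuild scheme: timestamped index dirs, an
// atomically-updated "current" pointer, and pruning of old versions.
public class NightlyIndex {
    static final int KEEP = 3;  // hypothetical number of versions to retain

    static Path rebuild(Path root) throws IOException {
        String stamp = LocalDateTime.now()
            .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
        Path index = Files.createDirectories(root.resolve("index-" + stamp));
        // ... reindex all 400,000 documents into `index` here ...
        return index;
    }

    static void activate(Path root, Path index) throws IOException {
        // Write to a temp file, then rename: the controller never sees
        // a half-written pointer.
        Path tmp = root.resolve("current.tmp");
        Files.writeString(tmp, index.getFileName().toString());
        Files.move(tmp, root.resolve("current"),
                   StandardCopyOption.REPLACE_EXISTING);
    }

    static void prune(Path root) throws IOException {
        try (Stream<Path> dirs = Files.list(root)) {
            List<Path> indexes = dirs
                .filter(p -> p.getFileName().toString().startsWith("index-"))
                .sorted(Comparator.reverseOrder())   // newest first
                .collect(Collectors.toList());
            for (Path old : indexes.subList(Math.min(KEEP, indexes.size()),
                                            indexes.size())) {
                // delete `old` recursively; left as a no-op in this sketch
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("indexes");
        Path index = rebuild(root);
        activate(root, index);
        prune(root);
        System.out.println(Files.readString(root.resolve("current")));
    }
}
```

Rolling back to an old index is then just rewriting the "current" pointer to an earlier timestamped name.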

cheers,
sv

On Tue, 13 Apr 2004, petite_abeille wrote:

> 
> On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:
> 
> > He mentioned that I might be able to squeeze 5-10% out of index merges 
> > this way.
> 
> Talking of which... what strategy(ies) do people use to minimize 
> downtime when updating an index?
> 
> My current "strategy" is as follows:
> 
> (1) use a temporary RAMDirectory for ongoing updates.
> (2) perform a "copy on write" when flushing the RAMDirectory into the 
> persistent index.
> 
> The second step means that I create an offline copy of a live index 
> before invoking addIndexes() and then substitute the old index with the 
> new, updated, one. While this effectively increases the time it takes to 
> update an index, it nonetheless reduces the *perceived* downtime for it.
> 
> Thoughts? Alternative strategies?
> 
> TIA.
> 
> Cheers,
> 
> PA.
> 




index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

Posted by petite_abeille <pe...@mac.com>.
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote:

> He mentioned that I might be able to squeeze 5-10% out of index merges 
> this way.

Talking of which... what strategy(ies) do people use to minimize 
downtime when updating an index?

My current "strategy" is as follows:

(1) use a temporary RAMDirectory for ongoing updates.
(2) perform a "copy on write" when flushing the RAMDirectory into the 
persistent index.

The second step means that I create an offline copy of a live index 
before invoking addIndexes() and then substitute the old index with the 
new, updated, one. While this effectively increases the time it takes to 
update an index, it nonetheless reduces the *perceived* downtime for it.
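The two steps can be sketched with plain file operations. This simulation uses temp directories for the live and offline copies; the merge is a placeholder where, with Lucene, the addIndexes() call on the copy would go. The directory names are made up.

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of the copy-on-write flush: copy the live index aside, merge
// the RAM-buffered updates into the copy, then swap the copy in.
public class CopyOnWriteFlush {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("cow");
        Path live = Files.createDirectories(root.resolve("live"));
        Files.writeString(live.resolve("seg"), "v1");

        // 1. Make an offline copy of the live index.
        Path work = Files.createDirectories(root.resolve("work"));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(live)) {
            for (Path f : files)
                Files.copy(f, work.resolve(f.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
        }

        // 2. Merge the RAMDirectory updates into the copy (placeholder;
        //    searches keep hitting `live` untouched the whole time).
        Files.writeString(work.resolve("seg"), "v2");

        // 3. Substitute: retire the old index, promote the copy.
        Files.move(live, root.resolve("old"));
        Files.move(work, live);

        System.out.println(Files.readString(live.resolve("seg")));
        // → v2
    }
}
```

The swap in step 3 is two renames, so the window where no index answers is tiny compared to the merge itself, which is the reduced *perceived* downtime described above.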

Thoughts? Alternative strategies?

TIA.

Cheers,

PA.



