You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Kris Leite <kl...@imcsoftware.com> on 2009/06/24 18:53:46 UTC

updateDocument and high Memory Usage

I was wondering if anybody else that has been using updateDocument 
noticed it uses a large amount of memory when updating an existing document.

For example, when using updateDocument on an empty Lucene directory, the 
resulting 12K documents creates a 3MB index, the amount of memory the 
program uses is 270MB.  When the program is executed again, this time 
updating all 12K documents with the same exact same data, the program 
will take all available memory allocated to the JVM.  The largest I used 
was 1024MB.  No out of memory errors occurred but the memory also was 
not released after closing all the readers and writers.

I was extremely surprised by this.  By running the program multiple 
times making sure that the code path does not change when deleting the 
Lucene directory to effectively add verses update.  I get very 
consistent results.  Adding is very stable at 270MB, but updating 
existing documents maxes out the JVM memory allocation.

Is there a configuration option that can be used to adjust the memory 
usage? 

I have tried doing separate delete/add code and get similar results so 
it appears to be more of a Lucene delete document issue?

Any help would be greatly appreciated.

Thanks,
Kris


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: updateDocument and high Memory Usage

Posted by Michael McCandless <lu...@mikemccandless.com>.
Likely this is because under the hood when IndexWriter flushes your
deletes, it opens readers.  It closes the readers as soon as the
deletes are done, thus creating a fair amount of garbage (which looks
like memory used by the JVM).

How are you measuring the memory usage?  Likely it's mostly garbage,
ie, the JVM is taking advantage of the large heap by waiting longer
before collecting.  If you gave it a smallish heap (not so small that
you hit OOMs) then it should run just fine within that footprint, just
running GC more often.

It also looks like Lucene may not flush soon enough when lots of
deletes are buffered (there's another thread going on about this,
now), but I'm not sure that's having an appreciable effect in your
case.

Mike

On Wed, Jun 24, 2009 at 12:53 PM, Kris Leite<kl...@imcsoftware.com> wrote:
> I was wondering if anybody else that has been using updateDocument noticed
> it uses a large amount of memory when updating an existing document.
>
> For example, when using updateDocument on an empty Lucene directory, the
> resulting 12K documents creates a 3MB index, the amount of memory the
> program uses is 270MB.  When the program is executed again, this time
> updating all 12K documents with the same exact same data, the program will
> take all available memory allocated to the JVM.  The largest I used was
> 1024MB.  No out of memory errors occurred but the memory also was not
> released after closing all the readers and writers.
>
> I was extremely surprised by this.  By running the program multiple times
> making sure that the code path does not change when deleting the Lucene
> directory to effectively add verses update.  I get very consistent results.
>  Adding is very stable at 270MB, but updating existing documents maxes out
> the JVM memory allocation.
>
> Is there a configuration option that can be used to adjust the memory usage?
> I have tried doing separate delete/add code and get similar results so it
> appears to be more of a Lucene delete document issue?
>
> Any help would be greatly appreciated.
>
> Thanks,
> Kris
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org