You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Alon Muchnick <al...@datonics.com> on 2013/01/09 18:34:59 UTC

Fwd: extensive minor garbage collection when using RAMDirectory on Lucune 3.6.2

hi ,
after upgrading to Lucune 3.6.2 i noticed there is an extensive minor
garbage collection operations. once or twice a second , and the amount of
memory being freed is about 600 MB each time for a load of 60 searches per
second :

2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen:
630064K->544K(630336K)] 1405948K->776747K(1660992K), 0.0084250 secs]
[Times: user=0.03 sys=0.00, real=0.01 secs]
2013-01-09T18:57:24.785+0200: 174200.555: [GC [PSYoungGen:
629920K->704K(630336K)] 1406123K->777083K(1660992K), 0.0066510 secs]
[Times: user=0.04 sys=0.00, real=0.01 secs]

after some investigation , the caused seemed to be Lucune related , we are
using RAM directory (with about 70000 documents ) on top of tomcat 7.

so i took Lucune source code and integrated it to our application , then
run Jprofiler during some  load for a minute and checked the "application
call tree " for "Garbage collected objects" only .
i found that the 2 Lucune method that produce the most garbage collected
object are (*screen shot attached* ) :

(we have several indexes the below is from one of them )


   - 23.2% - 215 MB - 48,252 alloc.
   org.apache.lucene.search.ExactPhraseScorer.


   - 6.3% - 60,481 kB - 728,414 alloc.
   org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer



looking at the currently live object i see the top 2 classes instance are (
a screen shot is also attached )  :

 org.apache.lucene.search.BooleanScorer$Bucket (instance count )
1311699  (size)
41974368

 org.apache.lucene.search.ScoreDoc (instance count )45846  (size)1100304

we have a system with an earlier version  (tomcat 5.5 and Lucune 3.0.3 our
all other code / jars are the same) which handles the same load and has the
same indexes and the minor gc there occurs once every 1-2 second and the
amount of memory being freed is about 60 MB.

does this behavior is normal ?
does a search should really generate so much short lived objects ( about
5MB for a search ) ?
should we try and use some other RAM bases Index ? ( i know
InstantiatedIndex is deprecated )

thanks in advance .

Alon

Re: extensive minor garbage collection when using RAMDirectory on Lucune 3.6.2

Posted by Alon Muchnick <al...@datonics.com>.
attaching the second screen shot of live recorded objects .

thanks again

Alon


On Wed, Jan 9, 2013 at 7:34 PM, Alon Muchnick <al...@datonics.com> wrote:

> hi ,
> after upgrading to Lucune 3.6.2 i noticed there is an extensive minor
> garbage collection operations. once or twice a second , and the amount of
> memory being freed is about 600 MB each time for a load of 60 searches per
> second :
>
> 2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen:
> 630064K->544K(630336K)] 1405948K->776747K(1660992K), 0.0084250 secs]
> [Times: user=0.03 sys=0.00, real=0.01 secs]
> 2013-01-09T18:57:24.785+0200: 174200.555: [GC [PSYoungGen:
> 629920K->704K(630336K)] 1406123K->777083K(1660992K), 0.0066510 secs]
> [Times: user=0.04 sys=0.00, real=0.01 secs]
>
> after some investigation , the caused seemed to be Lucune related , we are
> using RAM directory (with about 70000 documents ) on top of tomcat 7.
>
> so i took Lucune source code and integrated it to our application , then
> run Jprofiler during some  load for a minute and checked the "application
> call tree " for "Garbage collected objects" only .
> i found that the 2 Lucune method that produce the most garbage collected
> object are (*screen shot attached* ) :
>
> (we have several indexes the below is from one of them )
>
>
>    - 23.2% - 215 MB - 48,252 alloc.
>    org.apache.lucene.search.ExactPhraseScorer.
>
>
>    - 6.3% - 60,481 kB - 728,414 alloc.
>    org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer
>
>
>
> looking at the currently live object i see the top 2 classes instance are
> ( a screen shot is also attached )  :
>
>  org.apache.lucene.search.BooleanScorer$Bucket (instance count ) 1311699  (size)
> 41974368
>
>  org.apache.lucene.search.ScoreDoc (instance count )45846  (size)1100304
>
> we have a system with an earlier version  (tomcat 5.5 and Lucune 3.0.3 our
> all other code / jars are the same) which handles the same load and has the
> same indexes and the minor gc there occurs once every 1-2 second and the
> amount of memory being freed is about 60 MB.
>
> does this behavior is normal ?
> does a search should really generate so much short lived objects ( about
> 5MB for a search ) ?
> should we try and use some other RAM bases Index ? ( i know
> InstantiatedIndex is deprecated )
>
> thanks in advance .
>
> Alon
>
>
>

Re: extensive minor garbage collection when using RAMDirectory on Lucune 3.6.2

Posted by Michael McCandless <lu...@mikemccandless.com>.
RAMDirectory generally has high GC cost for a large index because it always
allocates byte[1024] as its "pages".  See eg
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html

But, you are hitting lots of new gen garbage, which is different.

ExactPhraseScorer is new from 3.0.3 -> 3.6.2, and it does alloc two
int[4096] temp arrays, per segment, to do its work, and it gave a speedup
over the old ExactPhraseScorer.

BooleanScorer$Bucket existed in 3.0.3 as well, so I'm not sure why you
didn't see it then.  Though, it could be you previously got BooleanScorer2,
which is slower but creates fewer temp objects.  It creates an array of
2048 Bucket instances, each instance is maybe ~ 32 bytes, plus the pointer
so maybe ~40 bytes * 2048 per segment.

ScoreDoc is created to return the hits back after searching.  How many hits
do you pull for every search?

It's hard to see how this can be 5 MB per search.  But: how many segments
are in your index...?

Net/net I would not worry about this unless you see slower real-world
performance.  The GC seems to be doing its job (reaping lots of "died
young" objects) effectively ...

Mike McCandless

http://blog.mikemccandless.com


On Wed, Jan 9, 2013 at 12:34 PM, Alon Muchnick <al...@datonics.com> wrote:

> hi ,
> after upgrading to Lucune 3.6.2 i noticed there is an extensive minor
> garbage collection operations. once or twice a second , and the amount of
> memory being freed is about 600 MB each time for a load of 60 searches per
> second :
>
> 2013-01-09T18:57:24.350+0200: 174200.121: [GC [PSYoungGen:
> 630064K->544K(630336K)] 1405948K->776747K(1660992K), 0.0084250 secs]
> [Times: user=0.03 sys=0.00, real=0.01 secs]
> 2013-01-09T18:57:24.785+0200: 174200.555: [GC [PSYoungGen:
> 629920K->704K(630336K)] 1406123K->777083K(1660992K), 0.0066510 secs]
> [Times: user=0.04 sys=0.00, real=0.01 secs]
>
> after some investigation , the caused seemed to be Lucune related , we are
> using RAM directory (with about 70000 documents ) on top of tomcat 7.
>
> so i took Lucune source code and integrated it to our application , then
> run Jprofiler during some  load for a minute and checked the "application
> call tree " for "Garbage collected objects" only .
> i found that the 2 Lucune method that produce the most garbage collected
> object are (*screen shot attached* ) :
>
> (we have several indexes the below is from one of them )
>
>
>    - 23.2% - 215 MB - 48,252 alloc.
>    org.apache.lucene.search.ExactPhraseScorer.
>
>
>    - 6.3% - 60,481 kB - 728,414 alloc.
>    org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer
>
>
>
> looking at the currently live object i see the top 2 classes instance are
> ( a screen shot is also attached )  :
>
>  org.apache.lucene.search.BooleanScorer$Bucket (instance count ) 1311699  (size)
> 41974368
>
>  org.apache.lucene.search.ScoreDoc (instance count )45846  (size)1100304
>
> we have a system with an earlier version  (tomcat 5.5 and Lucune 3.0.3 our
> all other code / jars are the same) which handles the same load and has the
> same indexes and the minor gc there occurs once every 1-2 second and the
> amount of memory being freed is about 60 MB.
>
> does this behavior is normal ?
> does a search should really generate so much short lived objects ( about
> 5MB for a search ) ?
> should we try and use some other RAM bases Index ? ( i know
> InstantiatedIndex is deprecated )
>
> thanks in advance .
>
> Alon
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>