Posted to solr-user@lucene.apache.org by roz dev <ro...@gmail.com> on 2012/07/31 20:34:11 UTC

Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Hi All

I am using Solr 4 from trunk with Tomcat 6. I am noticing that when we are
indexing lots of data with 16 concurrent threads, the heap grows
continuously. It stays high, and ultimately most of the objects end up
being promoted to Old Gen. Eventually, Old Gen also fills up and we run
into excessive GC problems.

I took a heap dump and found that most of the memory is consumed by
CloseableThreadLocal, which holds a WeakHashMap of threads and their
state.

Most of the old gen is filled by these ThreadLocal entries, which eat up 3 GB
of heap, and the heap dump shows that all such entries involve the Snowball
filter. I looked into LUCENE-3841 and verified that my version of Solr 4
includes that code.

So I am wondering about the cause of this memory leak: is it due to some
other bug in Solr/Lucene?
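
For anyone puzzling over where the retention comes from: the shape of the dump below matches per-thread, per-field analyzer reuse, where every live thread keeps one cached analysis chain per field it has touched. The following is a minimal, stdlib-only model of that mechanism; the class and field names (and the 20,000-field count) are illustrative assumptions, not Lucene's actual implementation. Only the ~8 KB-per-stemmer figure comes from this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Simplified model of CloseableThreadLocal + per-field analyzer reuse.
public class PerFieldCacheModel {

    // Like CloseableThreadLocal's hardRefs: strongly holds each thread's
    // value until the thread itself becomes unreachable and is collected.
    static final Map<Thread, Map<String, byte[]>> hardRefs =
            Collections.synchronizedMap(new WeakHashMap<>());

    // Each indexing thread lazily caches one "analysis chain" per field.
    static void analyze(String field, int chainBytes) {
        hardRefs.computeIfAbsent(Thread.currentThread(), t -> new HashMap<>())
                .computeIfAbsent(field, f -> new byte[chainBytes]);
    }

    // Worst-case retained bytes: every thread caches every field's chain.
    static long retainedBytes(int threads, int fields, int bytesPerChain) {
        return (long) threads * fields * bytesPerChain;
    }

    public static void main(String[] args) {
        // 16 indexing threads x (hypothetically) 20,000 dynamic fields
        // x ~8 KB per Snowball stemmer is already ~2.6e9 bytes -- the same
        // order of magnitude as the 3.8 GB retained heap in the dump.
        System.out.println(retainedBytes(16, 20_000, 8_192)); // 2621440000
    }
}
```

The point of the sketch is that the cost is multiplicative: threads times fields times per-chain size, which is why many dynamic fields plus thread churn blows up even though each individual stemmer is small.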

Here is a brief snapshot of the heap dump showing the problem:

Class Name                                                                                         | Shallow Heap | Retained Heap
----------------------------------------------------------------------------------------------------------------------------------
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28                                 |           24 | 3,885,213,072
|- <class> class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340                |            0 |             0
|- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048                                         |           96 |       276,704
|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40           |           16 | 3,885,208,728
|  |- <class> class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0        |            0 |             0
|  |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50                        |           24 | 3,885,208,712
|  |  |- <class> class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918                   |            8 |             8
|  |  |- t java.lang.ThreadLocal @ 0x300c3eb68                                                     |           16 |            16
|  |  |  '- <class> class java.lang.ThreadLocal @ 0x2f80f0868 System Class                         |            8 |            24
|  |  |- hardRefs java.util.WeakHashMap @ 0x300c3eb78                                              |           48 | 3,885,208,656
|  |  |  |- <class> class java.util.WeakHashMap @ 0x2f8476c00 System Class                         |           16 |            16
|  |  |  |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8                                    |           80 | 2,200,016,960
|  |  |  |  |- <class> class java.util.WeakHashMap$Entry[] @ 0x2f84789e8                           |            0 |             0
|  |  |  |  |- [7] java.util.WeakHashMap$Entry @ 0x306a24950                                       |           40 |   318,502,920
|  |  |  |  |  |- <class> class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class             |            0 |             0
|  |  |  |  |  |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8                                 |           32 |            48
|  |  |  |  |  |- referent java.lang.Thread @ 0x30678c2c0 web-23                                   |          112 |           160
|  |  |  |  |  |- value java.util.HashMap @ 0x30678cbb0                                            |           48 |   318,502,880
|  |  |  |  |  |  |- <class> class java.util.HashMap @ 0x2f80b9428 System Class                    |           24 |            24
|  |  |  |  |  |  |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58                            |      131,088 |   318,502,832
|  |  |  |  |  |  |  |- <class> class java.util.HashMap$Entry[] @ 0x2f80bd9c8                      |            0 |             0
|  |  |  |  |  |  |  |- [10457] java.util.HashMap$Entry @ 0x30678cbe0                              |           32 |        40,864
|  |  |  |  |  |  |  |  |- <class> class java.util.HashMap$Entry @ 0x2f80bd400 System Class        |            0 |             0
|  |  |  |  |  |  |  |  |- key java.lang.String @ 0x30678cc00 prod_desc_keywd_en_CA                |           32 |            96
|  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 | 20,344
|  |  |  |  |  |  |  |  |- next java.util.HashMap$Entry @ 0x39a2c9100                              |           32 |        20,392
|  |  |  |  |  |  |  |  |  |- <class> class java.util.HashMap$Entry @ 0x2f80bd400 System Class     |            0 |             0
|  |  |  |  |  |  |  |  |  |- key java.lang.String @ 0x39a2c9120 3637994_fr_CA_cat_name_keywd      |           32 |           104
|  |  |  |  |  |  |  |  |  |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 | 20,256
|  |  |  |  |  |  |  |  |  |  |- <class> class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0 | 0 | 0
|  |  |  |  |  |  |  |  |  |  |- this$0 org.apache.solr.analysis.TokenizerChain @ 0x300bf6158      |           32 |        13,768
|  |  |  |  |  |  |  |  |  |  |- source org.apache.lucene.analysis.core.WhitespaceTokenizer @ 0x39a2c91a0 | 64 | 8,304
|  |  |  |  |  |  |  |  |  |  |- sink org.apache.lucene.analysis.snowball.SnowballFilter @ 0x39a2c96a8 | 48 | 10,736
----------------------------------------------------------------------------------------------------------------------------------



Any inputs are most welcome.

-Saroj

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Erick Erickson <er...@gmail.com>.
I wasn't in on tracking down the original issue, but I know
at least one client ran into a problem with weak references
that was caused by a JVM bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034

Here's the summary:

Parallel CMS (i.e., with more than one CMS marking thread) does not
enqueue all the dead Reference objects in the old gen.

If you have very large numbers of dead Reference objects (in this case
Finalizable objects) then running CMS with 2 marking threads appears
to cause only a small fraction of Reference objects to be identified.

The unmarked objects build up in old gen and eventually a STW Full GC
is triggered which enqueues the dead Reference objects. Running with
-XX:ConcGCThreads=1 also fixes the problem.

Best
Erick

On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
> Hi All
>
> I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
> when we are indexing lots of data with 16 concurrent threads, Heap grows
> continuously. It remains high and ultimately most of the stuff ends up
> being moved to Old Gen. Eventually, Old Gen also fills up and we start
> getting into excessive GC problem.
>
> I took a heap dump and found that most of the memory is consumed by
> CloseableThreadLocal which is holding a WeakHashMap of Threads and its
> state.
>
> Most of the old gen is full with ThreadLocal eating up 3GB of heap and heap
> dump shows that all such entries are using Snowball Filter. I looked into
> LUCENE-3841 and verified that my version of SOLR 4 has that code.
>
> So, I am wondering the reason for this memory leak - is it due to some
> other bug with Solr/Lucene?
>
> Here is a brief snapshot of HeapDump showing the problem
>
> [heap dump snipped]
>
>
>
> Any inputs are most welcome.
>
> -Saroj

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Robert Muir <rc...@gmail.com>.
On Thu, Aug 2, 2012 at 3:13 AM, Laurent Vaills <la...@gmail.com> wrote:
> Hi everyone,
>
> Is there any chance to get his backported for a 3.6.2 ?
>

Hello, I personally have no problem with it, but technically it's really
not a bugfix, just an optimization.

It also doesn't solve the actual problem if you have a Tomcat
thread-pool configuration recycling threads too fast. There will be
other performance problems.
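
To make the thread-recycling point concrete: when the container hands every request a freshly created thread, each new thread builds its own per-thread analyzer cache, and the caches of dead threads linger until GC clears them. A stdlib-only sketch of that effect follows; the names are invented (this is not Tomcat or Lucene code), and a plain HashMap stands in for the real weak map so the demo is deterministic.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ThreadChurnDemo {
    // Stands in for CloseableThreadLocal's hardRefs: one cache per thread.
    // In the real case the map is weak, but dead threads still pin their
    // caches until a GC cycle actually clears the entries.
    static final Map<Thread, Map<String, Object>> hardRefs =
            Collections.synchronizedMap(new HashMap<>());

    static void handleRequest() {
        // Each thread lazily builds its own per-field analyzer cache.
        hardRefs.computeIfAbsent(Thread.currentThread(), t -> new HashMap<>())
                .put("some_field", new Object());
    }

    static int cachesAfter(int requests, boolean newThreadPerRequest) {
        hardRefs.clear();
        try {
            if (newThreadPerRequest) {
                // Churn: every request runs on a brand-new thread.
                for (int i = 0; i < requests; i++) {
                    Thread t = new Thread(ThreadChurnDemo::handleRequest);
                    t.start();
                    t.join();
                }
            } else {
                // Reuse: one long-lived worker serves every request.
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < requests; i++) handleRequest();
                });
                worker.start();
                worker.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return hardRefs.size();
    }

    public static void main(String[] args) {
        System.out.println(cachesAfter(100, true));  // 100 caches pile up
        System.out.println(cachesAfter(100, false)); // 1 cache, reused
    }
}
```

This is why a pool that retires threads aggressively multiplies the per-thread cost even when the steady-state thread count is small.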

-- 
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Laurent Vaills <la...@gmail.com>.
Hi everyone,

Is there any chance of getting this backported to 3.6.2?

Regards,
Laurent

2012/8/2 Simon Willnauer <si...@gmail.com>

> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <ro...@gmail.com> wrote:
> > Thanks Robert for these inputs.
> >
> > Since we do not really Snowball analyzer for this field, we would not use
> > it for now. If this still does not address our issue, we would tweak
> thread
> > pool as per eks dev suggestion - I am bit hesitant to do this change yet
> as
> > we would be reducing thread pool which can adversely impact our
> throughput
> >
> > If Snowball Filter is being optimized for Solr 4 beta then it would be
> > great for us. If you have already filed a JIRA for this then please let
> me
> > know and I would like to follow it
>
> AFAIK Robert already created and issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night its already
> committed and backported so it will be in 4.0-BETA.
>
> simon
> >
> > Thanks again
> > Saroj
> >
> >
> >
> >
> >
> > On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rc...@gmail.com> wrote:
> >
> >> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
> >> > Hi All
> >> >
> >> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> >> that
> >> > when we are indexing lots of data with 16 concurrent threads, Heap
> grows
> >> > continuously. It remains high and ultimately most of the stuff ends up
> >> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> >> > getting into excessive GC problem.
> >>
> >> Hi: I don't claim to know anything about how tomcat manages threads,
> >> but really you shouldnt have all these objects.
> >>
> >> In general snowball stemmers should be reused per-thread-per-field.
> >> But if you have a lot of fields*threads, especially if there really is
> >> high thread churn on tomcat, then this could be bad with snowball:
> >> see eks dev's comment on
> https://issues.apache.org/jira/browse/LUCENE-3841
> >>
> >> I think it would be useful to see if you can tune tomcat's threadpool
> >> as he describes.
> >>
> >> separately: Snowball stemmers are currently really ram-expensive for
> >> stupid reasons.
> >> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> >> is about 8KB.
> >>
> >> I'll regenerate these and open a JIRA issue: as the snowball code
> >> generator in their svn was improved
> >> recently and each one now takes about 64 bytes instead (the Among's
> >> are static and reused).
> >>
> >> Still this wont really "solve your problem", because the analysis
> >> chain could have other heavy parts
> >> in initialization, but it seems good to fix.
> >>
> >> As a workaround until then you can also just use the "good old
> >> PorterStemmer" (PorterStemFilterFactory in solr).
> >> Its not exactly the same as using Snowball(English) but its pretty
> >> close and also much faster.
> >>
> >> --
> >> lucidimagination.com
> >>
>


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Dawid Weiss <da...@gmail.com>.
http://static1.blip.pl/user_generated/update_pictures/1758685.jpg

On Thu, Aug 2, 2012 at 8:32 AM, roz dev <ro...@gmail.com> wrote:
> wow!! That was quick.
>
> Thanks a ton.
>
>
> On Wed, Aug 1, 2012 at 11:07 PM, Simon Willnauer
> <si...@gmail.com>wrote:
>
>> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <ro...@gmail.com> wrote:
>> > Thanks Robert for these inputs.
>> >
>> > Since we do not really Snowball analyzer for this field, we would not use
>> > it for now. If this still does not address our issue, we would tweak
>> thread
>> > pool as per eks dev suggestion - I am bit hesitant to do this change yet
>> as
>> > we would be reducing thread pool which can adversely impact our
>> throughput
>> >
>> > If Snowball Filter is being optimized for Solr 4 beta then it would be
>> > great for us. If you have already filed a JIRA for this then please let
>> me
>> > know and I would like to follow it
>>
>> AFAIK Robert already created and issue here:
>> https://issues.apache.org/jira/browse/LUCENE-4279
>> and it seems fixed. Given the massive commit last night its already
>> committed and backported so it will be in 4.0-BETA.
>>
>> simon
>> >
>> > Thanks again
>> > Saroj
>> >
>> >
>> >
>> >
>> >
>> > On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rc...@gmail.com> wrote:
>> >
>> >> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
>> >> > Hi All
>> >> >
>> >> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
>> >> that
>> >> > when we are indexing lots of data with 16 concurrent threads, Heap
>> grows
>> >> > continuously. It remains high and ultimately most of the stuff ends up
>> >> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
>> >> > getting into excessive GC problem.
>> >>
>> >> Hi: I don't claim to know anything about how tomcat manages threads,
>> >> but really you shouldnt have all these objects.
>> >>
>> >> In general snowball stemmers should be reused per-thread-per-field.
>> >> But if you have a lot of fields*threads, especially if there really is
>> >> high thread churn on tomcat, then this could be bad with snowball:
>> >> see eks dev's comment on
>> https://issues.apache.org/jira/browse/LUCENE-3841
>> >>
>> >> I think it would be useful to see if you can tune tomcat's threadpool
>> >> as he describes.
>> >>
>> >> separately: Snowball stemmers are currently really ram-expensive for
>> >> stupid reasons.
>> >> each one creates a ton of Among objects, e.g. an EnglishStemmer today
>> >> is about 8KB.
>> >>
>> >> I'll regenerate these and open a JIRA issue: as the snowball code
>> >> generator in their svn was improved
>> >> recently and each one now takes about 64 bytes instead (the Among's
>> >> are static and reused).
>> >>
>> >> Still this wont really "solve your problem", because the analysis
>> >> chain could have other heavy parts
>> >> in initialization, but it seems good to fix.
>> >>
>> >> As a workaround until then you can also just use the "good old
>> >> PorterStemmer" (PorterStemFilterFactory in solr).
>> >> Its not exactly the same as using Snowball(English) but its pretty
>> >> close and also much faster.
>> >>
>> >> --
>> >> lucidimagination.com
>> >>
>>



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by roz dev <ro...@gmail.com>.
wow!! That was quick.

Thanks a ton.


On Wed, Aug 1, 2012 at 11:07 PM, Simon Willnauer
<si...@gmail.com>wrote:

> On Thu, Aug 2, 2012 at 7:53 AM, roz dev <ro...@gmail.com> wrote:
> > Thanks Robert for these inputs.
> >
> > Since we do not really Snowball analyzer for this field, we would not use
> > it for now. If this still does not address our issue, we would tweak
> thread
> > pool as per eks dev suggestion - I am bit hesitant to do this change yet
> as
> > we would be reducing thread pool which can adversely impact our
> throughput
> >
> > If Snowball Filter is being optimized for Solr 4 beta then it would be
> > great for us. If you have already filed a JIRA for this then please let
> me
> > know and I would like to follow it
>
> AFAIK Robert already created and issue here:
> https://issues.apache.org/jira/browse/LUCENE-4279
> and it seems fixed. Given the massive commit last night its already
> committed and backported so it will be in 4.0-BETA.
>
> simon
> >
> > Thanks again
> > Saroj
> >
> >
> >
> >
> >
> > On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rc...@gmail.com> wrote:
> >
> >> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
> >> > Hi All
> >> >
> >> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> >> that
> >> > when we are indexing lots of data with 16 concurrent threads, Heap
> grows
> >> > continuously. It remains high and ultimately most of the stuff ends up
> >> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> >> > getting into excessive GC problem.
> >>
> >> Hi: I don't claim to know anything about how tomcat manages threads,
> >> but really you shouldnt have all these objects.
> >>
> >> In general snowball stemmers should be reused per-thread-per-field.
> >> But if you have a lot of fields*threads, especially if there really is
> >> high thread churn on tomcat, then this could be bad with snowball:
> >> see eks dev's comment on
> https://issues.apache.org/jira/browse/LUCENE-3841
> >>
> >> I think it would be useful to see if you can tune tomcat's threadpool
> >> as he describes.
> >>
> >> separately: Snowball stemmers are currently really ram-expensive for
> >> stupid reasons.
> >> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> >> is about 8KB.
> >>
> >> I'll regenerate these and open a JIRA issue: as the snowball code
> >> generator in their svn was improved
> >> recently and each one now takes about 64 bytes instead (the Among's
> >> are static and reused).
> >>
> >> Still this wont really "solve your problem", because the analysis
> >> chain could have other heavy parts
> >> in initialization, but it seems good to fix.
> >>
> >> As a workaround until then you can also just use the "good old
> >> PorterStemmer" (PorterStemFilterFactory in solr).
> >> Its not exactly the same as using Snowball(English) but its pretty
> >> close and also much faster.
> >>
> >> --
> >> lucidimagination.com
> >>
>

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Simon Willnauer <si...@gmail.com>.
On Thu, Aug 2, 2012 at 7:53 AM, roz dev <ro...@gmail.com> wrote:
> Thanks Robert for these inputs.
>
> Since we do not really Snowball analyzer for this field, we would not use
> it for now. If this still does not address our issue, we would tweak thread
> pool as per eks dev suggestion - I am bit hesitant to do this change yet as
> we would be reducing thread pool which can adversely impact our throughput
>
> If Snowball Filter is being optimized for Solr 4 beta then it would be
> great for us. If you have already filed a JIRA for this then please let me
> know and I would like to follow it

AFAIK Robert already created an issue here:
https://issues.apache.org/jira/browse/LUCENE-4279
and it seems fixed. Given the massive commit last night, it's already
committed and backported, so it will be in 4.0-BETA.

simon
>
> Thanks again
> Saroj
>
>
>
>
>
> On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rc...@gmail.com> wrote:
>
>> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
>> > Hi All
>> >
>> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
>> that
>> > when we are indexing lots of data with 16 concurrent threads, Heap grows
>> > continuously. It remains high and ultimately most of the stuff ends up
>> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
>> > getting into excessive GC problem.
>>
>> Hi: I don't claim to know anything about how tomcat manages threads,
>> but really you shouldnt have all these objects.
>>
>> In general snowball stemmers should be reused per-thread-per-field.
>> But if you have a lot of fields*threads, especially if there really is
>> high thread churn on tomcat, then this could be bad with snowball:
>> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>>
>> I think it would be useful to see if you can tune tomcat's threadpool
>> as he describes.
>>
>> separately: Snowball stemmers are currently really ram-expensive for
>> stupid reasons.
>> each one creates a ton of Among objects, e.g. an EnglishStemmer today
>> is about 8KB.
>>
>> I'll regenerate these and open a JIRA issue: as the snowball code
>> generator in their svn was improved
>> recently and each one now takes about 64 bytes instead (the Among's
>> are static and reused).
>>
>> Still this wont really "solve your problem", because the analysis
>> chain could have other heavy parts
>> in initialization, but it seems good to fix.
>>
>> As a workaround until then you can also just use the "good old
>> PorterStemmer" (PorterStemFilterFactory in solr).
>> Its not exactly the same as using Snowball(English) but its pretty
>> close and also much faster.
>>
>> --
>> lucidimagination.com
>>
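
The LUCENE-4279 change Simon mentions follows the shape Robert describes above: build the immutable stemmer rule table once and share it, instead of allocating roughly 8 KB of Among objects per stemmer instance. A hedged, stdlib-only sketch of that pattern (invented class names, not the real snowball generator output):

```java
// Per-instance vs. statically shared rule tables. The sizes and names are
// illustrative; only the "share the immutable table" idea comes from the
// thread.
public class StemmerTableSharing {

    static int tableBuilds = 0;

    static int[][] buildRuleTable() {
        tableBuilds++;                // count how often the table is built
        return new int[1024][2];      // stands in for the ~8 KB of rules
    }

    // Before: every stemmer instance re-allocates its own copy of the rules.
    static class PerInstanceStemmer {
        final int[][] rules = buildRuleTable();
    }

    // After: the table is built once at class init and shared by all
    // instances; per-instance state is just a small mutable cursor.
    static class SharedTableStemmer {
        static final int[][] RULES = buildRuleTable();
        int cursor;
    }

    public static void main(String[] args) {
        tableBuilds = 0;
        for (int i = 0; i < 1000; i++) new PerInstanceStemmer();
        System.out.println("per-instance builds: " + tableBuilds); // 1000

        tableBuilds = 0;
        new SharedTableStemmer();
        new SharedTableStemmer();
        System.out.println("shared builds: " + tableBuilds); // at most 1
    }
}
```

With the shared table, each instance only pays for its own mutable stemming state, which matches the "about 64 bytes instead" figure Robert quotes for the regenerated stemmers.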


Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by roz dev <ro...@gmail.com>.
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not
use it for now. If this still does not address our issue, we will tweak the
thread pool as per eks dev's suggestion - I am a bit hesitant to make this
change yet, as reducing the thread pool can adversely impact our throughput.

If the Snowball Filter is being optimized for Solr 4 beta, it would be
great for us. If you have already filed a JIRA issue for this, please let me
know; I would like to follow it.

Thanks again
Saroj





On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir <rc...@gmail.com> wrote:

> On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
> > Hi All
> >
> > I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
> that
> > when we are indexing lots of data with 16 concurrent threads, Heap grows
> > continuously. It remains high and ultimately most of the stuff ends up
> > being moved to Old Gen. Eventually, Old Gen also fills up and we start
> > getting into excessive GC problem.
>
> Hi: I don't claim to know anything about how tomcat manages threads,
> but really you shouldnt have all these objects.
>
> In general snowball stemmers should be reused per-thread-per-field.
> But if you have a lot of fields*threads, especially if there really is
> high thread churn on tomcat, then this could be bad with snowball:
> see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
>
> I think it would be useful to see if you can tune tomcat's threadpool
> as he describes.
>
> separately: Snowball stemmers are currently really ram-expensive for
> stupid reasons.
> each one creates a ton of Among objects, e.g. an EnglishStemmer today
> is about 8KB.
>
> I'll regenerate these and open a JIRA issue: as the snowball code
> generator in their svn was improved
> recently and each one now takes about 64 bytes instead (the Among's
> are static and reused).
>
> Still this wont really "solve your problem", because the analysis
> chain could have other heavy parts
> in initialization, but it seems good to fix.
>
> As a workaround until then you can also just use the "good old
> PorterStemmer" (PorterStemFilterFactory in solr).
> Its not exactly the same as using Snowball(English) but its pretty
> close and also much faster.
>
> --
> lucidimagination.com
>

Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Jul 31, 2012 at 2:34 PM, roz dev <ro...@gmail.com> wrote:
> Hi All
>
> I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
> when we are indexing lots of data with 16 concurrent threads, Heap grows
> continuously. It remains high and ultimately most of the stuff ends up
> being moved to Old Gen. Eventually, Old Gen also fills up and we start
> getting into excessive GC problem.

Hi: I don't claim to know anything about how Tomcat manages threads,
but you really shouldn't have all these objects.

In general, snowball stemmers should be reused per-thread-per-field.
But if you have a lot of fields*threads, especially if there really is
high thread churn on Tomcat, then this could be bad with snowball:
see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841
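
(For illustration, the per-thread-per-field reuse described above can be
sketched in plain Java. This is a hypothetical stand-in, not Lucene's
actual PerFieldReuseStrategy code - the class, cache, and field names are
made up - but it shows why the number of live analyzer components scales
with threads * fields:)

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the per-thread-per-field reuse pattern: one cached
// components object per (thread, field) pair, so the number of live
// stemmers scales with threads * fields. All names are illustrative.
public class PerThreadPerFieldCache {
    static final AtomicInteger created = new AtomicInteger();
    // One map per thread, keyed by field name (cf. PerFieldReuseStrategy).
    static final ThreadLocal<Map<String, Object>> cache =
            ThreadLocal.withInitial(HashMap::new);

    static Object componentsFor(String field) {
        return cache.get().computeIfAbsent(field, f -> {
            created.incrementAndGet();
            return new Object(); // stand-in for tokenizer + stemmer chain
        });
    }

    public static void main(String[] args) throws InterruptedException {
        String[] fields = {"title", "body", "tags"};
        Runnable index = () -> {
            for (String f : fields) componentsFor(f); // first pass creates
            for (String f : fields) componentsFor(f); // second pass reuses
        };
        Thread t1 = new Thread(index);
        Thread t2 = new Thread(index);
        t1.start(); t2.start();
        t1.join(); t2.join();
        index.run(); // the main thread gets its own copies too
        System.out.println(created.get()); // 3 threads x 3 fields = 9
    }
}
```

Reuse within a thread keeps allocation down, but every new thread the
container spins up starts from an empty map - which is why thread churn
matters.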

I think it would be useful to see if you can tune Tomcat's thread pool
as he describes.
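
(For reference, that kind of thread-pool tuning might look like the
following in Tomcat 6's server.xml. The attribute names are standard
Tomcat Executor settings, but the values here are purely illustrative,
not a recommendation:)

```xml
<!-- Bound the pool and let idle threads die quickly, so fewer
     threads accumulate per-thread analyzer state over time. -->
<Executor name="tomcatThreadPool"
          namePrefix="catalina-exec-"
          maxThreads="16"
          minSpareThreads="4"
          maxIdleTime="30000"/>

<Connector port="8080" protocol="HTTP/1.1"
           executor="tomcatThreadPool"
           connectionTimeout="20000"/>
```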

Separately: Snowball stemmers are currently really RAM-expensive for
stupid reasons: each one creates a ton of Among objects, e.g. an
EnglishStemmer today is about 8KB.
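
(To see how that adds up, here is a back-of-the-envelope calculation.
The field count and cumulative thread count are made-up figures, chosen
only to show how the ~3.8GB in the original heap dump is plausible:)

```java
public class StemmerFootprint {
    public static void main(String[] args) {
        long perStemmer = 8L * 1024;  // ~8 KB per EnglishStemmer (estimate above)
        int stemmedFields = 50;       // hypothetical number of stemmed fields
        int threadsEverSeen = 10_000; // threads churned through the pool over time
        // Entries for dead threads linger until the WeakHashMap is purged,
        // so retained size tracks threads-ever-seen, not just live threads.
        long retainedBytes = perStemmer * stemmedFields * (long) threadsEverSeen;
        System.out.println(retainedBytes / (1024 * 1024) + " MB"); // prints "3906 MB"
    }
}
```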

I'll regenerate these and open a JIRA issue: the snowball code
generator in their svn was improved recently, and each one now takes
about 64 bytes instead (the Among objects are static and reused).

Still, this won't really "solve your problem", because the analysis
chain could have other heavy parts in initialization, but it seems
good to fix.

As a workaround until then, you can also just use the good old
PorterStemmer (PorterStemFilterFactory in Solr). It's not exactly the
same as using Snowball(English), but it's pretty close and also much
faster.
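
(In schema.xml terms, that swap might look like this - the surrounding
field type and analyzer chain are omitted, and any other filters in the
chain would stay as they are:)

```xml
<!-- Before: Snowball-based English stemming -->
<filter class="solr.SnowballPorterFilterFactory" language="English"/>

<!-- After: the lighter classic Porter stemmer -->
<filter class="solr.PorterStemFilterFactory"/>
```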

-- 
lucidimagination.com
