Posted to solr-user@lucene.apache.org by zhenglingyun <ko...@163.com> on 2015/12/15 12:03:59 UTC

solrcloud used a lot of memory and memory keep increasing during long time run

Hi, list

I’m new to Solr. Recently I encountered a “memory leak” problem with SolrCloud.

I have two 64GB servers running a SolrCloud cluster. In the SolrCloud cluster I have
one collection with about 400k docs. The index size of the collection is about
500MB. The memory allocated to Solr is 16GB.

Following is "ps aux | grep solr” :

/usr/java/jdk1.7.0_67-cloudera/bin/java -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true -Dsolr.hdfs.blockcache.direct.memory.allocation=true -Dsolr.hdfs.blockcache.blocksperbank=16384 -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264 -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -Xloggc:/var/log/solr/gc.log -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf -Dsolr.authentication.simple.anonymous.allowed=true -Dsolr.security.proxyuser.hue.hosts=* -Dsolr.security.proxyuser.hue.groups=* -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983 -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984 -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true -Dsolr.hdfs.blockcache.direct.memory.allocation=true -Dsolr.hdfs.blockcache.blocksperbank=16384 -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264 -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -Xloggc:/var/log/solr/gc.log -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf -Dsolr.authentication.simple.anonymous.allowed=true -Dsolr.security.proxyuser.hue.hosts=* -Dsolr.security.proxyuser.hue.groups=* -Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983 -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984 -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath /usr/lib/bigtop-tomcat/bin/bootstrap.jar -Dcatalina.base=/var/lib/solr/tomcat-deployment -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/ org.apache.catalina.startup.Bootstrap start


Solr version is solr4.4.0-cdh5.3.0
JDK version is 1.7.0_67

The soft commit time is 1.5s, and we have a real-time indexing/partial-update rate of about 100 docs per second.
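
For reference, each partial update is roughly of the following shape. This SolrJ sketch is only an illustration; the collection name, field names, and ZooKeeper hosts in it are placeholders, and the commitWithin call is optional (we currently rely on autoSoftCommit instead):

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;
    import java.util.Collections;

    public class PartialUpdateExample {
        public static void main(String[] args) throws Exception {
            CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
            solr.setDefaultCollection("my_collection");   // placeholder collection name

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-123");                               // existing document id
            doc.addField("price", Collections.singletonMap("set", 42));  // atomic "set" update of one field

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            req.setCommitWithin(1500);   // per-request visibility target, an alternative to a global 1.5s soft commit
            req.process(solr);

            solr.shutdown();
        }
    }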

When freshly started, Solr uses about 500MB of memory (the memory shown in the Solr UI panel).
After several days of running, Solr runs into long GC pauses and stops responding to user queries.

While Solr is running, its memory usage keeps increasing to some large value, then decreases to
a low level (because of GC), keeps increasing to a larger value again, decreases to a low level again … and keeps
increasing to ever larger values … until Solr stops responding and I restart it.


I don’t know how to solve this problem. Can you give me some advice?

Thanks.




Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by zhenglingyun <ko...@163.com>.
Yesterday I found some slow join operations in another collection whose "fromIndex" is the collection with many open searchers.
Those slow join operations are autowarmed when that collection is soft committed. The autowarm time is about 120s but the soft commit
time is 30s, so that collection keeps autowarming all the time. Every join operation opens a searcher on the “fromIndex” collection and doesn’t
close it until the join is finished. That is why I saw so many open searchers on the “fromIndex” collection, and those open searchers waste a lot of memory.
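
For reference, the problematic joins were roughly of the following shape. This SolrJ sketch is only an illustration; the collection names, field names, and ZooKeeper hosts are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CrossCollectionJoinExample {
        public static void main(String[] args) throws Exception {
            CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181/solr");
            solr.setDefaultCollection("orders");   // the "to" side of the join

            // {!join fromIndex=...} runs the inner query against the "users" collection,
            // so a searcher is opened on "users" while the join executes.
            SolrQuery q = new SolrQuery("{!join from=user_id to=user_id fromIndex=users}group:admin");
            QueryResponse rsp = solr.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());

            solr.shutdown();
        }
    }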

I removed those joins, and now the memory used by Solr is fine.

Thank you very much, Erick. I had been stuck on this problem for several weeks!
Thank you all.




> On Dec 22, 2015, at 13:24, Erick Erickson <er...@gmail.com> wrote:
> 
> bq: What do we gain from setting maxWarmingSearchers to a larger value
> 
> You really don't get _any_ value. That's in there as a safety valve to prevent run-away resource consumption. Getting this warning in your logs means you're mis-configuring your system. Increasing the value is almost totally useless. It simply makes little sense to have your soft commit take less time than your autowarming, that's a ton of wasted work for no purpose. It's highly unlikely that your users _really_ need 1.5 second latency, my bet is 10-15 seconds would be fine. You know best of course, but this kind of requirement is often something that people _think_ they need but really don't. It particularly amuses me when the time between when a document changes and any attempt is made to send it to solr is minutes, but the product manager insists that "Solr must show the doc within two seconds of sending it to the index".
> 
> It's often actually acceptable for your users to know "it may take up to a minute for the docs to be searchable". What's usually not acceptable is unpredictability. But again that's up to your product managers.
> 
> bq: You mean if my custom SearchComponent opens a searcher, it will exceed the limit set by maxWarmingSearchers?
> 
> Not at all. But if you don't close it properly (it's reference counted), then more and more searchers will stay open, chewing up memory. So you may just be failing to close them and seeing memory increase because of that.
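> 
> A minimal sketch of the pattern, assuming the component takes its own searcher reference from the core (rather than using the searcher already attached to the request):
> 
>     import org.apache.solr.core.SolrCore;
>     import org.apache.solr.search.SolrIndexSearcher;
>     import org.apache.solr.util.RefCounted;
> 
>     public class SearcherUseExample {
>         // If a component grabs its own searcher reference, it must decref() it;
>         // otherwise old searchers (and their caches) can never be closed and freed.
>         public static int countDocs(SolrCore core) {
>             RefCounted<SolrIndexSearcher> ref = core.getSearcher();
>             try {
>                 return ref.get().getIndexReader().numDocs();
>             } finally {
>                 ref.decref();   // release the reference so the searcher can be closed
>             }
>         }
>     }
> 
> (The searcher returned by rb.req.getSearcher() is managed by the request itself and does not need an explicit decref.)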
> 
> Best,
> Erick
> 
> On Mon, Dec 21, 2015 at 6:47 PM, zhenglingyun <konghuarukhr@163.com> wrote:
> Yes, I do have some custom “Tokenizer"s and “SearchComponent"s.
> 
> Here is the screenshot:
> 
> 
> The number of opened searchers keeps changing. This time it’s 10.
> 
> You mean if my custom SearchComponent opens a searcher, it will exceed
> the limit set by maxWarmingSearchers? I’ll check that, thanks!
> 
> I have to use a short commit interval. Our application needs a near-real-time search
> service, but I’m not sure whether Solr can support NRT search in other ways. Can
> you give me some advice?
> 
> The value of maxWarmingSearchers is copied from some example configs I think,
> I’ll try to set it back to 2.
> 
> What do we gain from setting maxWarmingSearchers to a larger value? I can't find
> the answer on Google or in the Apache Solr Reference Guide.
> 
> 
> 
> 
>> On Dec 22, 2015, at 00:34, Erick Erickson <erickerickson@gmail.com> wrote:
>> 
>> Do you have any custom components? Indeed, you shouldn't have
>> that many searchers open. But could we see a screenshot? That's
>> the best way to ensure that we're talking about the same thing.
>> 
>> Your autocommit settings are really hurting you. Your commit interval
>> should be as long as you can tolerate. At that kind of commit frequency,
>> your caches are of very limited usefulness anyway, so you can pretty
>> much shut them off. Every 1.5 seconds, they're invalidated totally.
>> 
>> Upping maxWarmingSearchers is almost always a mistake. That's
>> a safety valve that's there in order to prevent runaway resource
>> consumption and almost always means the system is mis-configured.
>> I'd put it back to 2 and tune the rest of the system to avoid it rather
>> than bumping it up.
>> 
>> Best,
>> Erick
>> 
>> On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <konghuarukhr@163.com> wrote:
>>> Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI: collection -> Plugins/Stats -> CORE
>>> 
>>> I think it’s abnormal!
>>> 
>>> Soft commit is set to 1.5s, but warmupTime takes about 3s.
>>> Does that lead to so many searchers?
>>> 
>>> maxWarmingSearchers is set to 4 in my solrconfig.xml;
>>> shouldn't that prevent Solr from creating more than 4 searchers?
>>> 
>>> 
>>> 
>>>> On Dec 21, 2015, at 14:43, zhenglingyun <konghuarukhr@163.com> wrote:
>>>> 
>>>> Thanks Erick for pointing out the memory change in a sawtooth pattern.
>>>> The problem that troubles me is that the bottom of the sawtooth keeps increasing.
>>>> And when the used capacity of the old generation exceeds the threshold set by CMS’s
>>>> CMSInitiatingOccupancyFraction, GC keeps running and uses a lot of CPU cycles,
>>>> but the used old-generation memory does not decrease.
>>>> 
>>>> Following Rahul’s advice, I decreased Xms and Xmx from 16G to 8G, and
>>>> changed the JVM parameters from
>>>>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>>>   -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>>>>   -XX:+CMSParallelRemarkEnabled
>>>> to
>>>>   -XX:NewRatio=3
>>>>   -XX:SurvivorRatio=4
>>>>   -XX:TargetSurvivorRatio=90
>>>>   -XX:MaxTenuringThreshold=8
>>>>   -XX:+UseConcMarkSweepGC
>>>>   -XX:+UseParNewGC
>>>>   -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>>>>   -XX:+CMSScavengeBeforeRemark
>>>>   -XX:PretenureSizeThreshold=64m
>>>>   -XX:+UseCMSInitiatingOccupancyOnly
>>>>   -XX:CMSInitiatingOccupancyFraction=50
>>>>   -XX:CMSMaxAbortablePrecleanTime=6000
>>>>   -XX:+CMSParallelRemarkEnabled
>>>>   -XX:+ParallelRefProcEnabled
>>>>   -XX:-CMSConcurrentMTEnabled
>>>> which is taken from bin/solr.in.sh
>>>> I hope this can reduce GC pause times and the number of full GCs.
>>>> And maybe the memory increasing problem will disappear if I’m lucky.
>>>> 
>>>> After several days' running, the memory on one of my two servers increased to 90% again…
>>>> (When Solr is started, the memory used by Solr is less than 1G.)
>>>> 
>>>> Following is the output of jstat -gccause -h5 <pid> 1000:
>>>> 
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>>>> 9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>>>> 0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774 Allocation Failure   No GC
>>>> 7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>>>> 7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>>>> 0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>>>> 8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>>>> 8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>>>> 0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966 Allocation Failure   No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995 Allocation Failure   No GC
>>>> 0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>>>> 0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>>>> 7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>>>> 7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>>>> 0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>>>> 8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205 Allocation Failure   No GC
>>>> 0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242 Allocation Failure   No GC
>>>> 9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>>>> 0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>>>> 0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>>>> 7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>>>> 7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475 Allocation Failure   No GC
>>>> 9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>>>> 9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>>>> 0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>>>> 0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>>> 8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>>>> 8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>>>> 0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654 Allocation Failure   No GC
>>>> 
>>>> Full GC seems unable to free any more garbage (or is garbage being produced as fast as GC frees it?).
>>>> On the other hand, the other replica of the collection on another server (the collection has two replicas)
>>>> uses 40% of old-generation memory and doesn’t trigger so many full GCs.
>>>> 
>>>> 
>>>> Following is the output of eclipse MAT leak suspects:
>>>> 
>>>> Problem Suspect 1
>>>> 
>>>> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 3,743,067,520 (64.12%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"
>>>> 
>>>> Keywords
>>>> java.lang.Object[]
>>>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>>>> org.apache.lucene.index.SegmentCoreReaders
>>>> 
>>>> Details »
>>>> Problem Suspect 2
>>>> 
>>>> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 970,614,912 (16.63%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"
>>>> 
>>>> Keywords
>>>> java.lang.Object[]
>>>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>>>> org.apache.lucene.index.StandardDirectoryReader
>>>> 
>>>> Details »
>>>> 
>>>> 
>>>> 
>>>> Class structure in above “Details":
>>>> 
>>>> java.lang.Thread @XXX
>>>>   <Java Local> java.util.ArrayList @XXXX
>>>>       elementData java.lang.Object[3141] @XXXX
>>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>>           …
>>>> a lot of org.apache.lucene.search.FieldCache$CacheEntry (1205 in Suspect 1, 2785 in Suspect 2)
>>>> 
>>>> Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry instances?
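>>>> 
>>>> (As a rough way to compare with what MAT reports, a sketch like the following could dump the live entries at runtime; it would have to run inside the Solr JVM, e.g. from a custom handler, and the wrapper class here is only for illustration.)
>>>> 
>>>>     import org.apache.lucene.search.FieldCache;
>>>> 
>>>>     public class FieldCacheDump {
>>>>         // Print every entry Lucene's FieldCache currently holds. Entries for
>>>>         // unreleased SegmentCoreReaders stay reachable until those readers close.
>>>>         public static void dump() {
>>>>             FieldCache.CacheEntry[] entries = FieldCache.DEFAULT.getCacheEntries();
>>>>             System.out.println("live FieldCache entries: " + entries.length);
>>>>             for (FieldCache.CacheEntry e : entries) {
>>>>                 System.out.println(e);
>>>>             }
>>>>         }
>>>>     }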
>>>> 
>>>> Thanks.
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Dec 16, 2015, at 00:44, Erick Erickson <erickerickson@gmail.com> wrote:
>>>>> 
>>>>> Rahul's comments were spot on. You can gain more confidence that this
>>>>> is normal if you attach a memory-reporting program (jconsole
>>>>> is one): you'll see the memory grow for quite a while, then garbage
>>>>> collection kicks in and you'll see it drop in a sawtooth pattern.
>>>>> 
>>>>> Best,
>>>>> Erick
>>>>> 
>>>>> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <konghuarukhr@163.com <ma...@163.com>> wrote:
>>>>>> Thank you very much.
>>>>>> I will try reducing the heap memory and check whether the memory still keeps increasing.
>>>>>> 
>>>>>>> On Dec 15, 2015, at 19:37, Rahul Ramesh <rr.iiitb@gmail.com> wrote:
>>>>>>> 
>>>>>>> You should actually decrease the Solr heap size. Let me explain a bit.
>>>>>>> 
>>>>>>> Solr requires relatively little heap memory for its own operation and more memory for
>>>>>>> storing data in main memory. This is because Solr uses mmap for accessing the
>>>>>>> index files.
>>>>>>> Please check the link
>>>>>>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
>>>>>>> understanding how Solr operates on files.
>>>>>>> 
>>>>>>> Solr has the typical problem of garbage collection once you set the heap size to a
>>>>>>> large value. It will have indeterminate pauses due to GC. The amount of
>>>>>>> heap memory required is difficult to tell. However the way we tuned this
>>>>>>> parameter is setting it to a low value and increasing it by 1Gb whenever
>>>>>>> OOM is thrown.
>>>>>>> 
>>>>>>> Please check the problem of having large Java Heap
>>>>>>> 
>>>>>>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>>>>>> 
>>>>>>> 
>>>>>>> Just for your reference, in our production setup, we have data of around
>>>>>>> 60Gb/node spread across 25 collections. We have configured 8GB as heap, and
>>>>>>> the rest of the memory we leave to the OS to manage. We do around 1000
>>>>>>> (search + Insert)/second on the data.
>>>>>>> 
>>>>>>> I hope this helps.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Rahul
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <konghuarukhr@163.com> wrote:
>>>>>>> 
>>>>>>>> Hi, list
>>>>>>>> 
>>>>>>>> I’m new to solr. Recently I encounter a “memory leak” problem with
>>>>>>>> solrcloud.
>>>>>>>> 
>>>>>>>> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
>>>>>>>> have
>>>>>>>> one collection with about 400k docs. The index size of the collection is
>>>>>>>> about
>>>>>>>> 500MB. Memory for solr is 16GB.
>>>>>>>> 
>>>>>>>> Following is "ps aux | grep solr” :
>>>>>>>> 
>>>>>>>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>>>>>>>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>>>>>>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>>>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>>>>> -Xloggc:/var/log/solr/gc.log
>>>>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>>>>>>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181 <http://bjzw-datacenter-hadoop-160.d.yourmall.cc:2181/>,
>>>>>>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181 <http://bjzw-datacenter-hadoop-163.d.yourmall.cc:2181/>,
>>>>>>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr <http://bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr>
>>>>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>>>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc <http://bjzw-datacenter-solr-15.d.yourmall.cc/> -Djetty.port=8983 -Dsolr.host=
>>>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc <http://bjzw-datacenter-solr-15.d.yourmall.cc/> -Dsolr.port=8983
>>>>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>>>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>>>>>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>>>>>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>>>>>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>>>>>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>>>>>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>>>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>>>>>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>>>>>>> -Xloggc:/var/log/solr/gc.log
>>>>>>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>>>>>>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181 <http://bjzw-datacenter-hadoop-160.d.yourmall.cc:2181/>,
>>>>>>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181 <http://bjzw-datacenter-hadoop-163.d.yourmall.cc:2181/>,
>>>>>>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr <http://bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr>
>>>>>>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>>>>>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>>>>>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>>>>>>> -Dsolr.security.proxyuser.hue.hosts=*
>>>>>>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc <http://bjzw-datacenter-solr-15.d.yourmall.cc/> -Djetty.port=8983 -Dsolr.host=
>>>>>>>> bjzw-datacenter-solr-15.d.yourmall.cc <http://bjzw-datacenter-solr-15.d.yourmall.cc/> -Dsolr.port=8983
>>>>>>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>>>>>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>>>>>>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>>>>>>>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>>>>>>>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>>>>>>>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>>>>>>>> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
>>>>>>>> org.apache.catalina.startup.Bootstrap start
>>>>>>>> 
>>>>>>>> 
>>>>>>>> solr version is solr4.4.0-cdh5.3.0
>>>>>>>> jdk version is 1.7.0_67
>>>>>>>> 
>>>>>>>> Soft commit time is 1.5s. And we have real time indexing/partialupdating
>>>>>>>> rate about 100 docs per second.
>>>>>>>> 
>>>>>>>> When fresh started, Solr will use about 500M memory(the memory show in
>>>>>>>> solr ui panel).
>>>>>>>> After several days running, Solr will meet with long time gc problems, and
>>>>>>>> no response to user query.
>>>>>>>> 
>>>>>>>> During solr running, the memory used by solr is keep increasing until some
>>>>>>>> large value, and decrease to
>>>>>>>> a low level(because of gc), and keep increasing until a larger value
>>>>>>>> again, then decrease to a low level again … and keep
>>>>>>>> increasing to an more larger value … until solr has no response and i
>>>>>>>> restart it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don’t know how to solve this problem. Can you give me some advices?
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 


Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by Erick Erickson <er...@gmail.com>.
bq: What can we benefit from set maxWarmingSearchers to a larger value

You really don't get _any_ value. That's in there as a safety valve to
prevent run-away resource consumption. Getting this warning in your logs
means you're mis-configuring your system. Increasing the value is almost
totally useless. It simply makes little sense to have your soft commit take
less time than your autowarming, that's a ton of wasted work for no
purpose. It's highly unlikely that your users _really_ need 1.5 second
latency, my bet is 10-15 seconds would be fine. You know best of course,
but this kind of requirement is often something that people _think_ they
need but really don't. It particularly amuses me when the time between when
a document changes and any attempt is made to send it to solr is minutes,
but the product manager insists that "Solr must show the doc within two
seconds of sending it to the index".

It's often actually acceptable for your users to know "it may take up to a
minute for the docs to be searchable". What's usually not acceptable is
unpredictability. But again that's up to your product managers.

bq: You mean if my customer SearchComponent open a searcher, it will exceed the
limit set by maxWarmingSearchers?

Not at all. but if you don't close it properly (it's reference counted),
then more and more searchers will stay open, chewing up memory. So you may
just be failing to close them and seeing memory increase because of that.

Best,
Erick

On Mon, Dec 21, 2015 at 6:47 PM, zhenglingyun <ko...@163.com> wrote:

> Yes, I do have some custom “Tokenizer"s and “SearchComponent"s.
>
> Here is the screenshot:
>
>
> The number of opened searchers keeps changing. This time it’s 10.
>
> You mean if my customer SearchComponent open a searcher, it will exceed
> the limit set by maxWarmingSearchers? I’ll check that, thanks!
>
> I have to do a short time commit. Our application needs a near real time
> searching
> service. But I’m not sure whether Solr can support NRT search in other
> ways. Can
> you give me some advices?
>
> The value of maxWarmingSearchers is copied from some example configs I
> think,
> I’ll try to set it back to 2.
>
> What can we benefit from set maxWarmingSearchers to a larger value? I
> don't find
> the answer on google and apache-solr-ref-guide.
>
>
>
>
> 在 2015年12月22日,00:34,Erick Erickson <er...@gmail.com> 写道:
>
> Do you have any custom components? Indeed, you shouldn't have
> that many searchers open. But could we see a screenshot? That's
> the best way to insure that we're talking about the same thing.
>
> Your autocommit settings are really hurting you. Your commit interval
> should be as long as you can tolerate. At that kind of commit frequency,
> your caches are of very limited usefulness anyway, so you can pretty
> much shut them off. Every 1.5 seconds, they're invalidated totally.
>
> Upping maxWarmingSearchers is almost always a mistake. That's
> a safety valve that's there in order to prevent runaway resource
> consumption and almost always means the system is mis-configured.
> I'd put it back to 2 and tune the rest of the system to avoid it rather
> than bumping it up.
>
> Best,
> Erick
>
> On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <ko...@163.com>
> wrote:
>
> Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI:
> collection -> Plugins/Stats -> CORE
>
> I think it’s abnormal!
>
> softcommit is set to 1.5s, but warmupTime needs about 3s
> Does it lead to so many Searchers?
>
> maxWarmingSearchers is set to 4 in my solrconfig.xml,
> doesn’t it will prevent Solr from creating more than 4 Searchers?
>
>
>
> 在 2015年12月21日,14:43,zhenglingyun <ko...@163.com> 写道:
>
> Thanks Erick for pointing out the memory change in a sawtooth pattern.
> The problem troubles me is that the bottom point of the sawtooth keeps
> increasing.
> And when the used capacity of old generation exceeds the threshold set by
> CMS’s
> CMSInitiatingOccupancyFraction, gc keeps running and uses a lot of CPU
> cycle
> but the used old generation memory does not decrease.
>
> After I take Rahul’s advice, I decrease the Xms and Xmx from 16G to 8G, and
> adjust the parameters of JVM from
>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>   -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>   -XX:+CMSParallelRemarkEnabled
> to
>   -XX:NewRatio=3
>   -XX:SurvivorRatio=4
>   -XX:TargetSurvivorRatio=90
>   -XX:MaxTenuringThreshold=8
>   -XX:+UseConcMarkSweepGC
>   -XX:+UseParNewGC
>   -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>   -XX:+CMSScavengeBeforeRemark
>   -XX:PretenureSizeThreshold=64m
>   -XX:+UseCMSInitiatingOccupancyOnly
>   -XX:CMSInitiatingOccupancyFraction=50
>   -XX:CMSMaxAbortablePrecleanTime=6000
>   -XX:+CMSParallelRemarkEnabled
>   -XX:+ParallelRefProcEnabled
>   -XX:-CMSConcurrentMTEnabled
> which is taken from bin/solr.in.sh
> I hope this can reduce gc pause time and full gc times.
> And maybe the memory increasing problem will disappear if I’m lucky.
>
> After several day's running, the memory on one of my two servers increased
> to 90% again…
> (When solr is started, the memory used by solr is less than 1G.)
>
> Following is the output of stat -gccause -h5 <pid> 1000:
>
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735
> Allocation Failure   No GC
> 9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735
> Allocation Failure   No GC
> 0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774
> Allocation Failure   No GC
> 7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848
> CMS Final Remark     No GC
> 7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848
> CMS Final Remark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908
> CMS Initial Mark     No GC
> 0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908
> CMS Initial Mark     No GC
> 8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936
> Allocation Failure   No GC
> 8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936
> Allocation Failure   No GC
> 0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966
> Allocation Failure   No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995
> Allocation Failure   No GC
> 0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081
> CMS Final Remark     No GC
> 0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081
> CMS Final Remark     No GC
> 7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144
> CMS Initial Mark     No GC
> 7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144
> CMS Initial Mark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177
> Allocation Failure   No GC
> 0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177
> Allocation Failure   No GC
> 8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205
> Allocation Failure   No GC
> 0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242
> Allocation Failure   No GC
> 9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333
> CMS Final Remark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333
> CMS Final Remark     No GC
> 0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400
> CMS Initial Mark     No GC
> 0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400
> CMS Initial Mark     No GC
> 7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440
> Allocation Failure   No GC
> 7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440
> Allocation Failure   No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475
> Allocation Failure   No GC
> 9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545
> CMS Final Remark     No GC
> 9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545
> CMS Final Remark     No GC
> 0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594
> CMS Initial Mark     No GC
> 0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594
> CMS Initial Mark     No GC
> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
>    LGCC                 GCC
> 8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624
> Allocation Failure   No GC
> 8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624
> Allocation Failure   No GC
> 0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654
> Allocation Failure   No GC
>
> Full gc seems can’t free any garbage any more (Or the garbage produced is
> as fast as gc freed?)
> On the other hand, another replication of the collection on another
> server(the collection has two replications)
> uses 40% of old generation memory, and doesn’t trigger so many full gc.
>
>
> Following is the output of eclipse MAT leak suspects:
>
> Problem Suspect 1
>
> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by
> "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy
> 3,743,067,520 (64.12%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.SegmentCoreReaders
>
> Details »
> Problem Suspect 2
>
> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader",
> loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978"
> occupy 970,614,912 (16.63%) bytes. These instances are referenced from one
> instance of "java.lang.Object[]", loaded by "<system class loader>"
>
> Keywords
> java.lang.Object[]
> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
> org.apache.lucene.index.StandardDirectoryReader
>
> Details »
>
>
>
> Class structure in above “Details":
>
> java.lang.Thread @XXX
>   <Java Local> java.util.ArrayList @XXXX
>       elementData java.lang.Object[3141] @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>           …
> a lot of org.apache.lucene.search.FieldCache$CacheEntry (1205 in Suspect
> 1, 2785 in Suspect 2)
>
> Does these lots of org.apache.lucene.search.FieldCache$CacheEntry normal?
>
> Thanks.
>
>
>
>
> 在 2015年12月16日,00:44,Erick Erickson <er...@gmail.com> 写道:
>
> Rahul's comments were spot on. You can gain more confidence that this
> is normal if if you try attaching a memory reporting program (jconsole
> is one) you'll see the memory grow for quite a while, then garbage
> collection kicks in and you'll see it drop in a sawtooth pattern.
>
> Best,
> Erick
>
> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <ko...@163.com>
> wrote:
>
> Thank you very much.
> I will try reduce the heap memory and check if the memory still keep
> increasing or not.
>
> 在 2015年12月15日,19:37,Rahul Ramesh <rr...@gmail.com> 写道:
>
> You should actually decrease solr heap size. Let me explain a bit.
>
> Solr requires very less heap memory for its operation and more memory for
> storing data in main memory. This is because solr uses mmap for storing the
> index files.
> Please check the link
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> for
> understanding how solr operates on files .
>
> Solr has typical problem of Garbage collection once you the heap size to a
> large value. It will have indeterminate pauses due to GC. The amount of
> heap memory required is difficult to tell. However the way we tuned this
> parameter is setting it to a low value and increasing it by 1Gb whenever
> OOM is thrown.
>
> Please check the problem of having large Java Heap
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
>
> Just for your reference, in our production setup, we have data of around
> 60Gb/node spread across 25 collections. We have configured 8GB as heap and
> the rest of the memory we will leave it to OS to manage. We do around 1000
> (search + Insert)/second on the data.
>
> I hope this helps.
>
> Regards,
> Rahul
>
>
>
> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <ko...@163.com>
> wrote:
>
> Hi, list
>
> I’m new to solr. Recently I encounter a “memory leak” problem with
> solrcloud.
>
> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
> have
> one collection with about 400k docs. The index size of the collection is
> about
> 500MB. Memory for solr is 16GB.
>
> Following is "ps aux | grep solr” :
>
> /usr/java/jdk1.7.0_67-cloudera/bin/java
>
> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
> -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
> -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
> -Dcatalina.base=/var/lib/solr/tomcat-deployment
> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
> org.apache.catalina.startup.Bootstrap start
>
>
> solr version is solr4.4.0-cdh5.3.0
> jdk version is 1.7.0_67
>
> Soft commit time is 1.5s. And we have real time indexing/partialupdating
> rate about 100 docs per second.
>
> When fresh started, Solr will use about 500M memory(the memory show in
> solr ui panel).
> After several days running, Solr will meet with long time gc problems, and
> no response to user query.
>
> During solr running, the memory used by solr is keep increasing until some
> large value, and decrease to
> a low level(because of gc), and keep increasing until a larger value
> again, then decrease to a low level again … and keep
> increasing to an more larger value … until solr has no response and i
> restart it.
>
>
> I don’t know how to solve this problem. Can you give me some advices?
>
> Thanks.
>
>
>
>
>
>
>
>
>
>
>
>

Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by zhenglingyun <ko...@163.com>.
Yes, I do have some custom “Tokenizer"s and “SearchComponent"s.

Here is the screenshot:


The number of opened searchers keeps changing. This time it’s 10.

You mean if my customer SearchComponent open a searcher, it will exceed
the limit set by maxWarmingSearchers? I’ll check that, thanks!

I have to do a short time commit. Our application needs a near real time searching
service. But I’m not sure whether Solr can support NRT search in other ways. Can
you give me some advices?

The value of maxWarmingSearchers is copied from some example configs I think,
I’ll try to set it back to 2.

What can we benefit from set maxWarmingSearchers to a larger value? I don't find
the answer on google and apache-solr-ref-guide.




> 在 2015年12月22日,00:34,Erick Erickson <er...@gmail.com> 写道:
> 
> Do you have any custom components? Indeed, you shouldn't have
> that many searchers open. But could we see a screenshot? That's
> the best way to insure that we're talking about the same thing.
> 
> Your autocommit settings are really hurting you. Your commit interval
> should be as long as you can tolerate. At that kind of commit frequency,
> your caches are of very limited usefulness anyway, so you can pretty
> much shut them off. Every 1.5 seconds, they're invalidated totally.
> 
> Upping maxWarmingSearchers is almost always a mistake. That's
> a safety valve that's there in order to prevent runaway resource
> consumption and almost always means the system is mis-configured.
> I'd put it back to 2 and tune the rest of the system to avoid it rather
> than bumping it up.
> 
> Best,
> Erick
> 
> On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <ko...@163.com> wrote:
>> Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI: collection -> Plugins/Stats -> CORE
>> 
>> I think it’s abnormal!
>> 
>> softcommit is set to 1.5s, but warmupTime needs about 3s
>> Does it lead to so many Searchers?
>> 
>> maxWarmingSearchers is set to 4 in my solrconfig.xml,
>> doesn’t it will prevent Solr from creating more than 4 Searchers?
>> 
>> 
>> 
>>> 在 2015年12月21日,14:43,zhenglingyun <ko...@163.com> 写道:
>>> 
>>> Thanks Erick for pointing out the memory change in a sawtooth pattern.
>>> The problem troubles me is that the bottom point of the sawtooth keeps increasing.
>>> And when the used capacity of old generation exceeds the threshold set by CMS’s
>>> CMSInitiatingOccupancyFraction, gc keeps running and uses a lot of CPU cycle
>>> but the used old generation memory does not decrease.
>>> 
>>> After I take Rahul’s advice, I decrease the Xms and Xmx from 16G to 8G, and
>>> adjust the parameters of JVM from
>>>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>>   -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
>>>   -XX:+CMSParallelRemarkEnabled
>>> to
>>>   -XX:NewRatio=3
>>>   -XX:SurvivorRatio=4
>>>   -XX:TargetSurvivorRatio=90
>>>   -XX:MaxTenuringThreshold=8
>>>   -XX:+UseConcMarkSweepGC
>>>   -XX:+UseParNewGC
>>>   -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
>>>   -XX:+CMSScavengeBeforeRemark
>>>   -XX:PretenureSizeThreshold=64m
>>>   -XX:+UseCMSInitiatingOccupancyOnly
>>>   -XX:CMSInitiatingOccupancyFraction=50
>>>   -XX:CMSMaxAbortablePrecleanTime=6000
>>>   -XX:+CMSParallelRemarkEnabled
>>>   -XX:+ParallelRefProcEnabled
>>>   -XX:-CMSConcurrentMTEnabled
>>> which is taken from bin/solr.in.sh
>>> I hope this can reduce gc pause time and full gc times.
>>> And maybe the memory increasing problem will disappear if I’m lucky.
>>> 
>>> After several day's running, the memory on one of my two servers increased to 90% again…
>>> (When solr is started, the memory used by solr is less than 1G.)
>>> 
>>> Following is the output of stat -gccause -h5 <pid> 1000:
>>> 
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>>> 9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
>>> 0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774 Allocation Failure   No GC
>>> 7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>>> 7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>>> 0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
>>> 8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>>> 8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
>>> 0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966 Allocation Failure   No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995 Allocation Failure   No GC
>>> 0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>>> 0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
>>> 7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>>> 7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>>> 0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
>>> 8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205 Allocation Failure   No GC
>>> 0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242 Allocation Failure   No GC
>>> 9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
>>> 0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>>> 0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
>>> 7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>>> 7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475 Allocation Failure   No GC
>>> 9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>>> 9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
>>> 0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>>> 0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
>>> S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
>>> 8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>>> 8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
>>> 0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654 Allocation Failure   No GC
>>> 
>>> Full gc seems can’t free any garbage any more (Or the garbage produced is as fast as gc freed?)
>>> On the other hand, another replication of the collection on another server(the collection has two replications)
>>> uses 40% of old generation memory, and doesn’t trigger so many full gc.
>>> 
>>> 
>>> Following is the output of eclipse MAT leak suspects:
>>> 
>>> Problem Suspect 1
>>> 
>>> 4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 3,743,067,520 (64.12%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"
>>> 
>>> Keywords
>>> java.lang.Object[]
>>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>>> org.apache.lucene.index.SegmentCoreReaders
>>> 
>>> Details »
>>> Problem Suspect 2
>>> 
>>> 2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 970,614,912 (16.63%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"
>>> 
>>> Keywords
>>> java.lang.Object[]
>>> org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
>>> org.apache.lucene.index.StandardDirectoryReader
>>> 
>>> Details »
>>> 
>>> 
>>> 
>>> Class structure in above “Details":
>>> 
>>> java.lang.Thread @XXX
>>>   <Java Local> java.util.ArrayList @XXXX
>>>       elementData java.lang.Object[3141] @XXXX
>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>           org.apache.lucene.search.FieldCache$CacheEntry @XXXX
>>>           …
>>> a lot of org.apache.lucene.search.FieldCache$CacheEntry (1205 in Suspect 1, 2785 in Suspect 2)
>>> 
>>> Does these lots of org.apache.lucene.search.FieldCache$CacheEntry normal?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>>> 
>>>> 在 2015年12月16日,00:44,Erick Erickson <er...@gmail.com> 写道:
>>>> 
>>>> Rahul's comments were spot on. You can gain more confidence that this
>>>> is normal if if you try attaching a memory reporting program (jconsole
>>>> is one) you'll see the memory grow for quite a while, then garbage
>>>> collection kicks in and you'll see it drop in a sawtooth pattern.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <ko...@163.com> wrote:
>>>>> Thank you very much.


Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by Erick Erickson <er...@gmail.com>.
Do you have any custom components? Indeed, you shouldn't have
that many searchers open. But could we see a screenshot? That's
the best way to ensure that we're talking about the same thing.

Your autocommit settings are really hurting you. Your commit interval
should be as long as you can tolerate. At that kind of commit frequency,
your caches are of very limited usefulness anyway, so you can pretty
much shut them off. Every 1.5 seconds, they're invalidated totally.

Upping maxWarmingSearchers is almost always a mistake. That's
a safety valve that's there in order to prevent runaway resource
consumption and almost always means the system is mis-configured.
I'd put it back to 2 and tune the rest of the system to avoid it rather
than bumping it up.
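
To make that concrete, here is a sketch of the relevant solrconfig.xml settings (the
intervals are only illustrative; use the longest commit intervals your application can
tolerate):

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60s ... -->
      <openSearcher>false</openSearcher>  <!-- ... without opening a new searcher -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>60000</maxTime>            <!-- new searcher (visibility) at most once a minute -->
    </autoSoftCommit>

    <maxWarmingSearchers>2</maxWarmingSearchers>

With commits that frequent, the caches buy very little, so setting their autowarmCount
to 0 (or shrinking them) in the same file is reasonable.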

Best,
Erick

On Sun, Dec 20, 2015 at 11:43 PM, zhenglingyun <ko...@163.com> wrote:
> Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI: collection -> Plugins/Stats -> CORE
>
> I think it’s abnormal!
>
> softcommit is set to 1.5s, but warmupTime is about 3s.
> Could that be what leads to so many searchers?
>
> maxWarmingSearchers is set to 4 in my solrconfig.xml;
> shouldn't that prevent Solr from creating more than 4 searchers?

Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by zhenglingyun <ko...@163.com>.
Just now, I see about 40 "Searchers@XXXX main" displayed in Solr Web UI: collection -> Plugins/Stats -> CORE

I think it’s abnormal!

softcommit is set to 1.5s, but warmupTime is about 3s.
Could that be what leads to so many searchers?

maxWarmingSearchers is set to 4 in my solrconfig.xml;
shouldn't that prevent Solr from creating more than 4 searchers?
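
For reference, the open searchers can also be listed outside the UI (URL and collection
name are illustrative, assuming the default /admin/mbeans handler is registered):

    curl 'http://localhost:8983/solr/collection1/admin/mbeans?cat=CORE&stats=true&wt=json'

Each "Searcher@..." entry under CORE is an open searcher; normally only the registered
one plus at most maxWarmingSearchers warming ones should appear.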






Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by zhenglingyun <ko...@163.com>.
Thanks Erick for pointing out that the memory changes in a sawtooth pattern.
The problem that troubles me is that the bottom of the sawtooth keeps increasing.
And when the used capacity of the old generation exceeds the threshold set by CMS's
CMSInitiatingOccupancyFraction, GC keeps running and uses a lot of CPU cycles,
but the used old-generation memory does not decrease.

Following Rahul's advice, I decreased Xms and Xmx from 16G to 8G, and
changed the JVM parameters from
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
    -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
    -XX:+CMSParallelRemarkEnabled
to
    -XX:NewRatio=3
    -XX:SurvivorRatio=4
    -XX:TargetSurvivorRatio=90
    -XX:MaxTenuringThreshold=8
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
    -XX:+CMSScavengeBeforeRemark
    -XX:PretenureSizeThreshold=64m
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:CMSInitiatingOccupancyFraction=50
    -XX:CMSMaxAbortablePrecleanTime=6000
    -XX:+CMSParallelRemarkEnabled
    -XX:+ParallelRefProcEnabled
    -XX:-CMSConcurrentMTEnabled
which is taken from bin/solr.in.sh.
I hope this can reduce GC pause times and the number of full GCs.
And maybe the memory-increase problem will disappear if I'm lucky.
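
For reference, one way to confirm that the running JVM actually picked up the new flags
(both tools ship with JDK 7; <pid> is the Solr process id):

    jinfo -flags <pid>
    jcmd <pid> VM.flags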

After several days' running, the memory on one of my two servers increased to 90% again…
(When Solr is freshly started, it uses less than 1G.)

Following is the output of jstat -gccause -h5 <pid> 1000:

  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  9.56   0.00   8.65  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
  9.56   0.00  51.10  91.31  65.89  69379 3076.096 16563 1579.639 4655.735 Allocation Failure   No GC
  0.00   9.23  10.23  91.35  65.89  69380 3076.135 16563 1579.639 4655.774 Allocation Failure   No GC
  7.90   0.00   9.74  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
  7.90   0.00  67.45  91.39  65.89  69381 3076.165 16564 1579.683 4655.848 CMS Final Remark     No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00   7.48  16.18  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
  0.00   7.48  73.77  91.41  65.89  69382 3076.200 16565 1579.707 4655.908 CMS Initial Mark     No GC
  8.61   0.00  29.86  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
  8.61   0.00  90.16  91.45  65.89  69383 3076.228 16565 1579.707 4655.936 Allocation Failure   No GC
  0.00   7.46  47.89  91.46  65.89  69384 3076.258 16565 1579.707 4655.966 Allocation Failure   No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  8.67   0.00  11.98  91.49  65.89  69385 3076.287 16565 1579.707 4655.995 Allocation Failure   No GC
  0.00  11.76   9.24  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
  0.00  11.76  64.53  91.54  65.89  69386 3076.321 16566 1579.759 4656.081 CMS Final Remark     No GC
  7.25   0.00  20.39  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
  7.25   0.00  81.56  91.57  65.89  69387 3076.358 16567 1579.786 4656.144 CMS Initial Mark     No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00   8.05  34.42  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
  0.00   8.05  84.17  91.60  65.89  69388 3076.391 16567 1579.786 4656.177 Allocation Failure   No GC
  8.54   0.00  55.14  91.62  65.89  69389 3076.420 16567 1579.786 4656.205 Allocation Failure   No GC
  0.00   7.74  12.42  91.66  65.89  69390 3076.456 16567 1579.786 4656.242 Allocation Failure   No GC
  9.60   0.00  11.00  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  9.60   0.00  69.24  91.70  65.89  69391 3076.492 16568 1579.841 4656.333 CMS Final Remark     No GC
  0.00   8.70  18.21  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
  0.00   8.70  61.92  91.74  65.89  69392 3076.529 16569 1579.870 4656.400 CMS Initial Mark     No GC
  7.36   0.00   3.49  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
  7.36   0.00  42.03  91.77  65.89  69393 3076.570 16569 1579.870 4656.440 Allocation Failure   No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  0.00   9.77   0.00  91.80  65.89  69394 3076.604 16569 1579.870 4656.475 Allocation Failure   No GC
  9.08   0.00   9.92  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
  9.08   0.00  58.90  91.82  65.89  69395 3076.632 16570 1579.913 4656.545 CMS Final Remark     No GC
  0.00   8.44  16.20  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
  0.00   8.44  71.95  91.86  65.89  69396 3076.664 16571 1579.930 4656.594 CMS Initial Mark     No GC
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC
  8.11   0.00  30.59  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
  8.11   0.00  93.41  91.90  65.89  69397 3076.694 16571 1579.930 4656.624 Allocation Failure   No GC
  0.00   9.77  57.34  91.96  65.89  69398 3076.724 16571 1579.930 4656.654 Allocation Failure   No GC

Full GC doesn't seem to be able to free any garbage any more (or is garbage being produced as fast as GC frees it?).
On the other hand, the other replica of the collection, on the other server (the collection has two replicas),
uses only 40% of its old-generation memory and doesn't trigger so many full GCs.
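
As a lighter-weight cross-check than a full heap dump, a class histogram of live objects
shows which classes dominate the old generation (note that -histo:live itself forces a
full GC):

    jmap -histo:live <pid> | head -n 30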


Following is the output of the Eclipse MAT leak suspects report:

  Problem Suspect 1

4,741 instances of "org.apache.lucene.index.SegmentCoreReaders", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 3,743,067,520 (64.12%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"

Keywords
java.lang.Object[]
org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
org.apache.lucene.index.SegmentCoreReaders

Details »
  Problem Suspect 2

2,815 instances of "org.apache.lucene.index.StandardDirectoryReader", loaded by "org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978" occupy 970,614,912 (16.63%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"

Keywords
java.lang.Object[]
org.apache.catalina.loader.WebappClassLoader @ 0x67d8ed978
org.apache.lucene.index.StandardDirectoryReader

Details »



Class structure in the above "Details":

java.lang.Thread @XXX
    <Java Local> java.util.ArrayList @XXXX
        elementData java.lang.Object[3141] @XXXX
            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
            org.apache.lucene.search.FieldCache$CacheEntry @XXXX
            …
a lot of org.apache.lucene.search.FieldCache$CacheEntry (1205 in Suspect 1, 2785 in Suspect 2)

Is it normal to have this many org.apache.lucene.search.FieldCache$CacheEntry instances?
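
For reference, the live FieldCache statistics can also be read from the running node
(URL and collection name are illustrative, assuming the default /admin/mbeans handler),
to see whether the entry count keeps growing between commits:

    curl 'http://localhost:8983/solr/collection1/admin/mbeans?cat=CACHE&stats=true&wt=json'

and then look at the fieldCache entry and insanity counts in the response.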

Thanks.







Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by Erick Erickson <er...@gmail.com>.
Rahul's comments were spot on. You can gain more confidence that this
is normal if you attach a memory-reporting program (jconsole
is one): you'll see the memory grow for quite a while, then garbage
collection kicks in and you'll see it drop in a sawtooth pattern.
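
If jconsole isn't convenient on a headless server, jstat paints the same picture from
the shell, for example sampling every five seconds:

    jstat -gcutil <pid> 5000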

Best,
Erick

On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun <ko...@163.com> wrote:
> Thank you very much.
> I will try reducing the heap memory and check whether the memory still keeps increasing.

Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by zhenglingyun <ko...@163.com>.
Thank you very much.
I will try reducing the heap memory and check whether the memory still keeps increasing.

> 在 2015年12月15日,19:37,Rahul Ramesh <rr...@gmail.com> 写道:
> 
> You should actually decrease the Solr heap size. Let me explain a bit.
> 
> Solr itself requires very little heap memory for its operation; the bulk of the
> machine's memory is better left to the OS so the index data can stay cached in main
> memory. This is because Solr uses mmap to access the index files.
> Please check the link
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
> understanding how Solr operates on files.
> 
> Solr has the typical garbage-collection problem once you set the heap size to a
> large value: it will have indeterminate pauses due to GC. The amount of
> heap memory required is difficult to predict. However, the way we tuned this
> parameter was to set it to a low value and increase it by 1GB whenever
> an OOM is thrown.
> 
> Please check the problem of having large Java Heap
> 
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
> 
> 
> Just for your reference, in our production setup we have around 60GB of data
> per node spread across 25 collections. We have configured 8GB as heap and
> leave the rest of the memory to the OS to manage. We do around 1000
> (searches + inserts)/second on that data.
> 
> I hope this helps.
> 
> Regards,
> Rahul
> 
> 
> 
> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <ko...@163.com> wrote:
> 
>> Hi, list
>> 
>> I’m new to solr. Recently I encounter a “memory leak” problem with
>> solrcloud.
>> 
>> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
>> have
>> one collection with about 400k docs. The index size of the collection is
>> about
>> 500MB. Memory for solr is 16GB.
>> 
>> Following is "ps aux | grep solr” :
>> 
>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -Xloggc:/var/log/solr/gc.log
>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>> -Dsolr.authentication.simple.anonymous.allowed=true
>> -Dsolr.security.proxyuser.hue.hosts=*
>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -Xloggc:/var/log/solr/gc.log
>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>> -Dsolr.authentication.simple.anonymous.allowed=true
>> -Dsolr.security.proxyuser.hue.hosts=*
>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
>> org.apache.catalina.startup.Bootstrap start
>> 
>> 
>> The Solr version is 4.4.0-cdh5.3.0 and the JDK version is 1.7.0_67.
>>
>> The soft commit time is 1.5 s, and we index/partially update documents in
>> real time at a rate of about 100 docs per second.
>>
>> When freshly started, Solr uses about 500 MB of memory (the value shown in
>> the Solr UI panel).
>> After several days of running, Solr hits very long GC pauses and stops
>> responding to user queries.
>>
>> While Solr is running, its memory usage keeps climbing to some large
>> value, drops back to a low level (because of GC), climbs to an even larger
>> value, drops again … and so on, with each peak higher than the last …
>> until Solr stops responding and I have to restart it.
>>
>>
>> I don’t know how to solve this problem. Can you give me some advice?
>>
>> Thanks.
>> 
>> 
>> 
>> 



Re: solrcloud used a lot of memory and memory keep increasing during long time run

Posted by Rahul Ramesh <rr...@gmail.com>.
You should actually decrease the Solr heap size. Let me explain a bit.

Solr needs relatively little heap memory for its own operation; the bulk of
the RAM is better left outside the heap for holding index data, because Solr
memory-maps (mmap) the index files.
Please see
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
an explanation of how Solr accesses index files.
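
On a plain local-index install (not the HDFS-backed setup in this thread,
where the index lives in HDFS), a rough way to see the mmap behaviour is to
list the memory mappings of the Solr JVM; the Lucene segment files show up
as mapped regions. This is only a rough sketch, and <solr_pid> is a
placeholder:

    # list memory mappings of the Solr process (Linux); mmap'd index files
    # appear among the mapped regions
    pmap -x <solr_pid>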

Solr typically runs into garbage-collection trouble once the heap is set to a
large value: it suffers unpredictable pauses due to GC. The amount of heap
actually required is hard to predict; the way we tuned this parameter was to
start with a low value and increase it by 1 GB whenever an OOM was thrown.
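
While tuning the heap that way it helps to watch the collector. The startup
flags earlier in this thread already write a GC log to /var/log/solr/gc.log;
one further option (a hedged suggestion, not something raised in this thread)
is the JDK's jstat tool, which samples heap occupancy and GC time live, with
<solr_pid> standing in for the Solr process id:

    # print generation occupancy (%) and cumulative GC time every 5 seconds
    jstat -gcutil <solr_pid> 5000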

Please see this page on the problems of a large Java heap:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Just for reference, in our production setup we have about 60 GB of data per
node spread across 25 collections. We configured an 8 GB heap and leave the
rest of the memory for the OS to manage. We handle around 1,000
(search + insert) operations per second on that data.
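
Assuming a Linux host, a quick sanity check that the memory left outside the
heap is actually going to the OS page cache (a rough sketch, not part of the
original advice):

    # the "cached" column shows how much RAM the OS is using to cache files,
    # including the index
    free -m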

I hope this helps.

Regards,
Rahul



On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun <ko...@163.com> wrote:

> Hi, list
>
> I’m new to Solr. Recently I encountered a “memory leak” problem with
> SolrCloud.
>
> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
> have
> one collection with about 400k docs. The index size of the collection is
> about
> 500MB. Memory for solr is 16GB.
>
> Following is "ps aux | grep solr” :
>
> /usr/java/jdk1.7.0_67-cloudera/bin/java
> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=10000 -Dsolr.solr.home=/var/lib/solr
> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
> -Dcatalina.base=/var/lib/solr/tomcat-deployment
> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
> org.apache.catalina.startup.Bootstrap start
>
>
> The Solr version is 4.4.0-cdh5.3.0 and the JDK version is 1.7.0_67.
>
> The soft commit time is 1.5 s, and we index/partially update documents in
> real time at a rate of about 100 docs per second.
>
> When freshly started, Solr uses about 500 MB of memory (the value shown in
> the Solr UI panel).
> After several days of running, Solr hits very long GC pauses and stops
> responding to user queries.
>
> While Solr is running, its memory usage keeps climbing to some large
> value, drops back to a low level (because of GC), climbs to an even larger
> value, drops again … and so on, with each peak higher than the last …
> until Solr stops responding and I have to restart it.
>
>
> I don’t know how to solve this problem. Can you give me some advice?
>
> Thanks.
>
>
>
>