Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2016/12/06 14:08:59 UTC

[jira] [Commented] (SOLR-9828) Very long young generation stop the world GC pause

    [ https://issues.apache.org/jira/browse/SOLR-9828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725578#comment-15725578 ] 

Shawn Heisey commented on SOLR-9828:
------------------------------------

What exactly do you want the Solr project to do about this?  Long garbage collections are a fact of life in a Java program with a large heap.  32GB is a large heap.  

You are not running with the GC settings that version 5.3.2 shipped with.  If you change the GC tuning, you're on your own.  Any problems that result are not an indication of a bug in Solr, they are at most a bug in the implementation of Java that you're running.  This problem should have been mentioned on the mailing list or IRC channel, not in this issue tracker.  I'm going to close this issue.  A bug in Solr is VERY unlikely, and this issue tracker is not the correct place for support.

Part of the GC tuning that you've done is to assign 34 threads to the garbage collector -- 16 for the concurrent collector and 18 for the parallel collector.  How many CPU cores do you have in the machine (not counting hyperthreading)?  Typical CPU counts for a modern server are somewhere between 4 and 16.  It's possible that all your CPU cores are involved in the garbage collection, so CPU usage of 100 percent would not be surprising.  Even with a CPU count of 32, the thread counts you've configured might cause this.
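As a very rough sketch (assuming a 16-core machine purely for illustration -- substitute your actual core count), you could either remove the explicit thread settings and let the JVM size the GC worker pools from the cores it detects, or cap them to something the hardware can sustain:

    # sketch only -- assumes 16 physical cores; adjust to your hardware
    # stop-the-world (parallel) GC worker threads, usually at most the physical core count
    -XX:ParallelGCThreads=16
    # concurrent marking threads, typically about a quarter of ParallelGCThreads
    -XX:ConcGCThreads=4

Simply omitting both flags also works; the JVM will then choose defaults based on the number of cores it sees.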

I have created a wiki page with some GC tuning parameters for Solr, both G1 and CMS:

https://wiki.apache.org/solr/ShawnHeisey

One thing you can do here that might help is to decrease your heap to 31GB.  The effective amount of usable heap memory would probably go UP, because below 32GB Java can use compressed (32-bit) object pointers.  With a 32GB heap, Java must use full 64-bit pointers, so each object reference takes twice as much memory.
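You can check which mode a given heap size actually gets (a quick generic check, not Solr-specific):

    # with a 31GB heap, compressed oops should show as true
    java -Xms31g -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
    # at 32GB, the JVM falls back to uncompressed 64-bit oops (false)
    java -Xms32g -Xmx32g -XX:+PrintFlagsFinal -version | grep UseCompressedOops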

I'm not sure your heap needs to be that high at all.  The GC log above shows that the long collection reduced the heap from 4.5GB to 3.4GB.  If these numbers are typical after Solr has been running for a while, a heap size of 8GB, maybe less, might be enough.  You won't know for sure without long-term experimentation.

The following page discusses a way to determine how much heap is needed using jconsole graphing:

https://wiki.apache.org/solr/SolrPerformanceProblems
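If jconsole isn't convenient, a rough command-line alternative (a sketch, not something that page prescribes) is to sample heap occupancy with jstat while Solr is handling real traffic and watch where old-generation usage settles after collections:

    # sample GC/heap utilization every 10000 ms; find the Solr pid with jps
    jstat -gcutil <solr-pid> 10000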

> Very long young generation stop the world GC pause 
> ---------------------------------------------------
>
>                 Key: SOLR-9828
>                 URL: https://issues.apache.org/jira/browse/SOLR-9828
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 5.3.2
>         Environment: Linux Redhat 64bit
>            Reporter: Forest Soup
>
> We are using Oracle JDK 8u92, 64-bit.
> The JVM memory-related options are:
> -Xms32768m 
> -Xmx32768m 
> -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/mnt/solrdata1/log 
> -XX:+UseG1GC 
> -XX:+PerfDisableSharedMem 
> -XX:+ParallelRefProcEnabled 
> -XX:G1HeapRegionSize=8m 
> -XX:MaxGCPauseMillis=100 
> -XX:InitiatingHeapOccupancyPercent=35 
> -XX:+AggressiveOpts 
> -XX:+AlwaysPreTouch 
> -XX:ConcGCThreads=16 
> -XX:ParallelGCThreads=18 
> -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/mnt/solrdata1/log 
> -verbose:gc 
> -XX:+PrintHeapAtGC 
> -XX:+PrintGCDetails 
> -XX:+PrintGCDateStamps 
> -XX:+PrintGCTimeStamps 
> -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime 
> -Xloggc:/mnt/solrdata1/log/solr_gc.log
> It usually works fine, but recently we hit very long stop-the-world young generation GC pauses. Some snippets of the GC log are below:
> 2016-11-22T20:43:16.436+0000: 2942054.483: Total time for which application threads were stopped: 0.0005510 seconds, Stopping threads took: 0.0000894 seconds
> 2016-11-22T20:43:16.463+0000: 2942054.509: Total time for which application threads were stopped: 0.0029195 seconds, Stopping threads took: 0.0000804 seconds
> {Heap before GC invocations=2246 (full 0):
>  garbage-first heap   total 26673152K, used 4683965K [0x00007f0c10000000, 0x00007f0c108065c0, 0x00007f1410000000)
>   region size 8192K, 162 young (1327104K), 17 survivors (139264K)
>  Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K
> 2016-11-22T20:43:16.555+0000: 2942054.602: [GC pause (G1 Evacuation Pause) (young)
> Desired survivor size 88080384 bytes, new threshold 15 (max 15)
> - age   1:   28176280 bytes,   28176280 total
> - age   2:    5632480 bytes,   33808760 total
> - age   3:    9719072 bytes,   43527832 total
> - age   4:    6219408 bytes,   49747240 total
> - age   5:    4465544 bytes,   54212784 total
> - age   6:    3417168 bytes,   57629952 total
> - age   7:    5343072 bytes,   62973024 total
> - age   8:    2784808 bytes,   65757832 total
> - age   9:    6538056 bytes,   72295888 total
> - age  10:    6368016 bytes,   78663904 total
> - age  11:     695216 bytes,   79359120 total
> , 97.2044320 secs]
>    [Parallel Time: 19.8 ms, GC Workers: 18]
>       [GC Worker Start (ms): Min: 2942054602.1, Avg: 2942054604.6, Max: 2942054612.7, Diff: 10.6]
>       [Ext Root Scanning (ms): Min: 0.0, Avg: 2.4, Max: 6.7, Diff: 6.7, Sum: 43.5]
>       [Update RS (ms): Min: 0.0, Avg: 3.0, Max: 15.9, Diff: 15.9, Sum: 54.0]
>          [Processed Buffers: Min: 0, Avg: 10.7, Max: 39, Diff: 39, Sum: 192]
>       [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.6]
>       [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
>       [Object Copy (ms): Min: 0.1, Avg: 9.2, Max: 13.4, Diff: 13.3, Sum: 165.9]
>       [Termination (ms): Min: 0.0, Avg: 2.5, Max: 2.7, Diff: 2.7, Sum: 44.1]
>          [Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 27]
>       [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.6]
>       [GC Worker Total (ms): Min: 9.0, Avg: 17.1, Max: 19.7, Diff: 10.6, Sum: 308.7]
>       [GC Worker End (ms): Min: 2942054621.8, Avg: 2942054621.8, Max: 2942054621.8, Diff: 0.0]
>    [Code Root Fixup: 0.1 ms]
>    [Code Root Purge: 0.0 ms]
>    [Clear CT: 0.2 ms]
>    [Other: 97184.3 ms]
>       [Choose CSet: 0.0 ms]
>       [Ref Proc: 8.5 ms]
>       [Ref Enq: 0.2 ms]
>       [Redirty Cards: 0.2 ms]
>       [Humongous Register: 0.1 ms]
>       [Humongous Reclaim: 0.1 ms]
>       [Free CSet: 0.4 ms]
>    [Eden: 1160.0M(1160.0M)->0.0B(1200.0M) Survivors: 136.0M->168.0M Heap: 4574.2M(25.4G)->3450.8M(26.8G)]
> Heap after GC invocations=2247 (full 0):
>  garbage-first heap   total 28049408K, used 3533601K [0x00007f0c10000000, 0x00007f0c10806b00, 0x00007f1410000000)
>   region size 8192K, 21 young (172032K), 21 survivors (172032K)
>  Metaspace       used 56487K, capacity 57092K, committed 58368K, reserved 59392K
> }
>  [Times: user=0.00 sys=94.28, real=97.19 secs] 
> 2016-11-22T20:44:53.760+0000: 2942151.806: Total time for which application threads were stopped: 97.2053747 seconds, Stopping threads took: 0.0001373 seconds
> 2016-11-22T20:44:53.762+0000: 2942151.809: Total time for which application threads were stopped: 0.0008138 seconds, Stopping threads took: 0.0001258 seconds
> And CPU usage reached nearly 100% during the GC.
> The request load was normal at that time, according to the stats from the Solr update/select/delete handlers and the Jetty request log.



