You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by yasoobhaider <ya...@gmail.com> on 2018/10/03 14:01:28 UTC

CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Hi

I'm working with a Solr cluster with master-slave architecture.

Master and slave config:
ram: 120GB
cores: 16

At any point there are between 10-20 slaves in the cluster, each serving ~2k
requests per minute. Each slave houses two collections of approx 10G
(~2.5mil docs) and 2G(10mil docs) when optimized.

I am working with Solr 6.2.1

Solr configuration:

-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:MaxTenuringThreshold=8
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=15
-XX:TargetSurvivorRatio=90
-Xmn10G
-Xms80G
-Xmx80G

Some of these configurations have been reached by multiple trial and errors
over time, including the huge heap size.

This cluster usually runs without any error.

In the usual scenario, old gen gc is triggered according to the
configuration at 50% old gen occupancy, and the collector clears out the
memory over the next minute or so. This happens every 10-15 minutes.

However, I have noticed that sometimes the GC pattern of the slaves
completely changes and old gen gc is not able to clear the memory.

After observing the gc logs closely for multiple old gen gc collections, I
noticed that the old gen gc is triggered at 50% occupancy, but if there is a
GC Allocation Failure before the collection completes (after CMS Initial
Remark but before CMS reset), the old gen collection is not able to clear
much memory. And as soon as this collection completes, another old gen gc is
triggered.

And in worst case scenarios, this cycle of old gen gc triggering, GC
allocation failure keeps happening, and the old gen memory keeps increasing,
leading to a single threaded STW GC, which is not able to do much, and I
have to restart the solr server.

The last time this happened after the following sequence of events:

1. We optimized the bigger collection bringing it to its optimized size of
~10G.
2. For an unrelated reason, we had stopped indexing to the master. We
usually index at a low-ish throughput of ~1mil docs/day. This is relevant as
when we are indexing, the size of the collection increases, and this effects
the heap size used by collection.
3. The slaves started behaving erratically, with old gc collection not being
able to free up the required memory and finally being stuck in a STW GC.

As unlikely as this sounds, this is the only thing that changed on the
cluster. There was no change in query throughput or type of queries.

I restarted the slaves multiple times but the gc behaved in the same way for
over three days. Then when we fixed the indexing and made it live, the
slaves resumed their original gc pattern and are running without any issues
for over 24 hours now.

I would really be grateful for any advice on the following:

1. What could be the reason behind CMS not being able to free up the memory?
What are some experiments I can run to solve this problem?
2. Can stopping/starting indexing be a reason for such drastic changes to GC
pattern?
3. I have read at multiple places on this mailing list that the heap size
should be much lower (2x-3x the size of collection), but the last time I
tried CMS was not able to run smoothly and GC STW would occur which was only
solved by a restart. My reasoning for this is that the type of queries and
the throughput are also a factor in deciding the heap size, so it may be
that our queries are creating too many objects maybe. Is my reasoning
correct or should I try with a lower heap size (if it helps achieve a stable
gc pattern)?

(4. Silly question, but what is the right way to ask question on the mailing
list? via mail or via the nabble website? I sent this question earlier as a
mail, but it was not showing up on the nabble website so I am posting it
from the website now)

-----------------------------------------------------------------
-----------------------------------------------------------------

Logs which show this:


Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  437184344 bytes,  437184344 total
- age   2:  194385736 bytes,  631570080 total
: 9868992K->616768K(9868992K), 1.7115191 secs]
48349347K->40160469K(83269312K), 1.7116410 secs] [Times: user=6.25 sys=0.00,
real=1.71 secs]
Heap after GC invocations=921 (full 170):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 39543701K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:15:29.578+0000: 21112.757: Total time for which application
threads were stopped: 1.7120277 seconds, Stopping threads took: 0.0001172
seconds
2018-10-03T12:15:29.579+0000: 21112.758: Total time for which application
threads were stopped: 0.0003476 seconds, Stopping threads took: 0.0001611
seconds
2018-10-03T12:15:29.579+0000: 21112.758: Total time for which application
threads were stopped: 0.0003482 seconds, Stopping threads took: 0.0002375
seconds
2018-10-03T12:15:29.580+0000: 21112.759: Total time for which application
threads were stopped: 0.0003628 seconds, Stopping threads took: 0.0002455
seconds
2018-10-03T12:15:30.580+0000: 21113.759: Total time for which application
threads were stopped: 0.0003182 seconds, Stopping threads took: 0.0001611
seconds
2018-10-03T12:15:31.633+0000: 21114.812: [CMS-concurrent-sweep: 7.579/9.305
secs] [Times: user=29.07 sys=0.11, real=9.30 secs]
2018-10-03T12:15:31.633+0000: 21114.812: [CMS-concurrent-reset-start]
2018-10-03T12:15:31.782+0000: 21114.961: [CMS-concurrent-reset: 0.149/0.149
secs] [Times: user=0.31 sys=0.00, real=0.16 secs]





2018-10-03T12:15:32.335+0000: 21115.514: Total time for which application
threads were stopped: 0.0003525 seconds, Stopping threads took: 0.0001639
seconds
2018-10-03T12:15:33.335+0000: 21116.514: Total time for which application
threads were stopped: 0.0002410 seconds, Stopping threads took: 0.0000933
seconds


2018-10-03T12:15:33.782+0000: 21116.961: [GC (CMS Initial Mark) [1
CMS-initial-mark: 39261799K(73400320K)] 42714046K(83269312K), 0.2772283
secs] [Times: user=1.10 sys=0.00, real=0.28 secs]









2018-10-03T12:15:34.059+0000: 21117.238: Total time for which application
threads were stopped: 0.2775628 seconds, Stopping threads took: 0.0001125
seconds
2018-10-03T12:15:34.059+0000: 21117.238: [CMS-concurrent-mark-start]
2018-10-03T12:15:35.060+0000: 21118.239: Total time for which application
threads were stopped: 0.0003905 seconds, Stopping threads took: 0.0002020
seconds
2018-10-03T12:15:36.060+0000: 21119.239: Total time for which application
threads were stopped: 0.0003198 seconds, Stopping threads took: 0.0001554
seconds
2018-10-03T12:15:37.336+0000: 21120.514: Total time for which application
threads were stopped: 0.0003034 seconds, Stopping threads took: 0.0001426
seconds
2018-10-03T12:15:38.336+0000: 21121.515: Total time for which application
threads were stopped: 0.0003033 seconds, Stopping threads took: 0.0001344
seconds
2018-10-03T12:15:39.337+0000: 21122.515: Total time for which application
threads were stopped: 0.0003374 seconds, Stopping threads took: 0.0001597
seconds
2018-10-03T12:15:40.337+0000: 21123.516: Total time for which application
threads were stopped: 0.0003181 seconds, Stopping threads took: 0.0001417
seconds

{Heap before GC invocations=921 (full 171):
 par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 39261799K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K


2018-10-03T12:15:40.402+0000: 21123.581: [GC (Allocation Failure)
2018-10-03T12:15:40.402+0000: 21123.581: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  420761352 bytes,  420761352 total
- age   2:  203405104 bytes,  624166456 total
: 9868992K->616768K(9868992K), 1.1580928 secs]
49130791K->40661998K(83269312K), 1.1581942 secs] [Times: user=4.31 sys=0.02,
real=1.16 secs]

Heap after GC invocations=922 (full 171):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 40045230K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:15:41.560+0000: 21124.739: Total time for which application
threads were stopped: 1.1585459 seconds, Stopping threads took: 0.0000965
seconds
2018-10-03T12:15:41.727+0000: 21124.906: [CMS-concurrent-mark: 6.502/7.668
secs] [Times: user=44.13 sys=0.23, real=7.67 secs]










2018-10-03T12:15:41.727+0000: 21124.906: [CMS-concurrent-preclean-start]
2018-10-03T12:15:42.336+0000: 21125.515: Total time for which application
threads were stopped: 0.0002580 seconds, Stopping threads took: 0.0001056
seconds
2018-10-03T12:15:43.154+0000: 21126.332: [CMS-concurrent-preclean:
1.350/1.427 secs] [Times: user=3.80 sys=0.10, real=1.42 secs]
2018-10-03T12:15:43.154+0000: 21126.333:
[CMS-concurrent-abortable-preclean-start]
{Heap before GC invocations=922 (full 171):
 par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 40045230K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K



2018-10-03T12:15:44.501+0000: 21127.680: [GC (Allocation Failure)
2018-10-03T12:15:44.501+0000: 21127.680: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  368166576 bytes,  368166576 total
- age   2:  254060192 bytes,  622226768 total
: 9868992K->616768K(9868992K), 0.9552847 secs]
49914222K->41208420K(83269312K), 0.9553941 secs] [Times: user=3.50 sys=0.02,
real=0.96 secs]
Heap after GC invocations=923 (full 171):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 40591652K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:15:45.456+0000: 21128.635: Total time for which application
threads were stopped: 0.9557089 seconds, Stopping threads took: 0.0000723
seconds
2018-10-03T12:15:46.457+0000: 21129.636: Total time for which application
threads were stopped: 0.0002304 seconds, Stopping threads took: 0.0000779
seconds
2018-10-03T12:15:47.336+0000: 21130.515: Total time for which application
threads were stopped: 0.0002717 seconds, Stopping threads took: 0.0001186
seconds
2018-10-03T12:15:47.441+0000: 21130.620: [CMS-concurrent-abortable-preclean:
3.273/4.287 secs] [Times: user=12.59 sys=0.09, real=4.29 secs]
2018-10-03T12:15:47.441+0000: 21130.620: [GC (CMS Final Remark) [YG
occupancy: 7398403 K (9868992 K)]{Heap before GC invocations=923 (full 171):
 par new generation   total 9868992K, used 7398403K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,  73% used [0x00007f4f84000000, 0x00007f5121eb0d00,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 40591652K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:15:47.441+0000: 21130.620: [GC (CMS Final Remark)
2018-10-03T12:15:47.441+0000: 21130.620: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  308078752 bytes,  308078752 total
- age   2:  316638256 bytes,  624717008 total
: 7398403K->616768K(9868992K), 0.6155285 secs]
47990055K->41552127K(83269312K), 0.6156160 secs] [Times: user=2.29 sys=0.00,
real=0.62 secs]
Heap after GC invocations=924 (full 171):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 40935359K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:15:48.057+0000: 21131.236: [Rescan (parallel) , 0.1869583
secs]2018-10-03T12:15:48.244+0000: 21131.423: [weak refs processing,
1.7414971 secs]2018-10-03T12:15:49.985+0000: 21133.164: [class unloading,
0.0592428 secs]2018-10-03T12:15:50.045+0000: 21133.224: [scrub symbol table,
0.0056035 secs]2018-10-03T12:15:50.050+0000: 21133.229: [scrub string table,
0.0014962 secs][1 CMS-remark: 40935359K(73400320K)] 41552127K(83269312K),
2.6119932 secs] [Times: user=6.41 sys=0.00, real=2.61 secs]
2018-10-03T12:15:50.053+0000: 21133.232: Total time for which application
threads were stopped: 2.6123611 seconds, Stopping threads took: 0.0001326
seconds
2018-10-03T12:15:50.053+0000: 21133.232: [CMS-concurrent-sweep-start]
2018-10-03T12:15:50.059+0000: 21133.238: Total time for which application
threads were stopped: 0.0006659 seconds, Stopping threads took: 0.0002255
seconds
2018-10-03T12:15:50.103+0000: 21133.281: Total time for which application
threads were stopped: 0.0039148 seconds, Stopping threads took: 0.0034179
seconds
2018-10-03T12:15:50.271+0000: 21133.450: Total time for which application
threads were stopped: 0.0005564 seconds, Stopping threads took: 0.0002647
seconds
2018-10-03T12:15:50.315+0000: 21133.494: Total time for which application
threads were stopped: 0.0003337 seconds, Stopping threads took: 0.0001589
seconds
2018-10-03T12:15:50.379+0000: 21133.558: Total time for which application
threads were stopped: 0.0003200 seconds, Stopping threads took: 0.0001383
seconds
2018-10-03T12:15:50.551+0000: 21133.730: Total time for which application
threads were stopped: 0.0003616 seconds, Stopping threads took: 0.0001738
seconds
2018-10-03T12:15:51.551+0000: 21134.730: Total time for which application
threads were stopped: 0.0002993 seconds, Stopping threads took: 0.0001143
seconds
2018-10-03T12:15:52.312+0000: 21135.491: Total time for which application
threads were stopped: 0.0003575 seconds, Stopping threads took: 0.0001460
seconds
2018-10-03T12:15:52.330+0000: 21135.509: Total time for which application
threads were stopped: 0.0003366 seconds, Stopping threads took: 0.0001516
seconds
2018-10-03T12:15:52.337+0000: 21135.516: Total time for which application
threads were stopped: 0.0002891 seconds, Stopping threads took: 0.0001312
seconds
2018-10-03T12:15:52.503+0000: 21135.682: Total time for which application
threads were stopped: 0.0002795 seconds, Stopping threads took: 0.0001047
seconds
{Heap before GC invocations=924 (full 171):
 par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 40681560K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:15:52.792+0000: 21135.971: [GC (Allocation Failure)
2018-10-03T12:15:52.792+0000: 21135.971: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  459462088 bytes,  459462088 total
- age   2:  154305656 bytes,  613767744 total
: 9868992K->616768K(9868992K), 0.8807607 secs]
50550552K->41930672K(83269312K), 0.8808648 secs] [Times: user=3.21 sys=0.02,
real=0.88 secs]
Heap after GC invocations=925 (full 171):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 41313904K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:15:53.673+0000: 21136.852: Total time for which application
threads were stopped: 0.8812102 seconds, Stopping threads took: 0.0000861
seconds
2018-10-03T12:15:53.697+0000: 21136.876: Total time for which application
threads were stopped: 0.0022652 seconds, Stopping threads took: 0.0019829
seconds
2018-10-03T12:15:53.703+0000: 21136.882: Total time for which application
threads were stopped: 0.0006961 seconds, Stopping threads took: 0.0004381
seconds
2018-10-03T12:15:53.864+0000: 21137.043: Total time for which application
threads were stopped: 0.0003650 seconds, Stopping threads took: 0.0001561
seconds
2018-10-03T12:15:53.864+0000: 21137.043: Total time for which application
threads were stopped: 0.0001950 seconds, Stopping threads took: 0.0001019
seconds
2018-10-03T12:15:53.865+0000: 21137.043: Total time for which application
threads were stopped: 0.0001891 seconds, Stopping threads took: 0.0000972
seconds
2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
threads were stopped: 0.0001881 seconds, Stopping threads took: 0.0000985
seconds
2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
threads were stopped: 0.0001839 seconds, Stopping threads took: 0.0000948
seconds
2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
threads were stopped: 0.0001835 seconds, Stopping threads took: 0.0000947
seconds
2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
threads were stopped: 0.0001854 seconds, Stopping threads took: 0.0000962
seconds
2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
threads were stopped: 0.0002046 seconds, Stopping threads took: 0.0001138
seconds
2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
threads were stopped: 0.0001842 seconds, Stopping threads took: 0.0000950
seconds
2018-10-03T12:15:53.867+0000: 21137.046: Total time for which application
threads were stopped: 0.0005123 seconds, Stopping threads took: 0.0000891
seconds
2018-10-03T12:15:54.868+0000: 21138.046: Total time for which application
threads were stopped: 0.0002820 seconds, Stopping threads took: 0.0001244
seconds
2018-10-03T12:15:55.868+0000: 21139.047: Total time for which application
threads were stopped: 0.0003152 seconds, Stopping threads took: 0.0001386
seconds
2018-10-03T12:15:56.106+0000: 21139.284: [CMS-concurrent-sweep: 5.141/6.052
secs] [Times: user=21.05 sys=0.08, real=6.06 secs]
2018-10-03T12:15:56.106+0000: 21139.285: [CMS-concurrent-reset-start]
2018-10-03T12:15:56.252+0000: 21139.431: [CMS-concurrent-reset: 0.147/0.147
secs] [Times: user=0.31 sys=0.00, real=0.14 secs]
2018-10-03T12:15:56.868+0000: 21140.047: Total time for which application
threads were stopped: 0.0003063 seconds, Stopping threads took: 0.0001315
seconds
2018-10-03T12:15:57.337+0000: 21140.516: Total time for which application
threads were stopped: 0.0002707 seconds, Stopping threads took: 0.0001180
seconds



2018-10-03T12:15:58.253+0000: 21141.432: [GC (CMS Initial Mark) [1
CMS-initial-mark: 41172739K(73400320K)] 46149710K(83269312K), 0.5153635
secs] [Times: user=1.98 sys=0.00, real=0.52 secs]
2018-10-03T12:15:58.768+0000: 21141.947: Total time for which application
threads were stopped: 0.5158067 seconds, Stopping threads took: 0.0001510
seconds
2018-10-03T12:15:58.769+0000: 21141.947: [CMS-concurrent-mark-start]
2018-10-03T12:15:59.769+0000: 21142.948: Total time for which application
threads were stopped: 0.0002839 seconds, Stopping threads took: 0.0001286
seconds
2018-10-03T12:16:00.770+0000: 21143.948: Total time for which application
threads were stopped: 0.0003470 seconds, Stopping threads took: 0.0001918
seconds
2018-10-03T12:16:00.775+0000: 21143.954: Total time for which application
threads were stopped: 0.0002620 seconds, Stopping threads took: 0.0001340
seconds
2018-10-03T12:16:00.825+0000: 21144.004: Total time for which application
threads were stopped: 0.0003357 seconds, Stopping threads took: 0.0001365
seconds
2018-10-03T12:16:01.825+0000: 21145.004: Total time for which application
threads were stopped: 0.0003481 seconds, Stopping threads took: 0.0001582
seconds
2018-10-03T12:16:02.338+0000: 21145.517: Total time for which application
threads were stopped: 0.0002951 seconds, Stopping threads took: 0.0001352
seconds
2018-10-03T12:16:03.339+0000: 21146.517: Total time for which application
threads were stopped: 0.0002830 seconds, Stopping threads took: 0.0001329
seconds
2018-10-03T12:16:05.414+0000: 21148.593: [CMS-concurrent-mark: 6.638/6.645
secs] [Times: user=38.68 sys=0.14, real=6.65 secs]
2018-10-03T12:16:05.414+0000: 21148.593: [CMS-concurrent-preclean-start]
2018-10-03T12:16:05.524+0000: 21148.703: [CMS-concurrent-preclean:
0.109/0.110 secs] [Times: user=0.28 sys=0.00, real=0.11 secs]
2018-10-03T12:16:05.524+0000: 21148.703:
[CMS-concurrent-abortable-preclean-start]
2018-10-03T12:16:07.339+0000: 21150.517: Total time for which application
threads were stopped: 0.0002768 seconds, Stopping threads took: 0.0001258
seconds
{Heap before GC invocations=925 (full 172):
 par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 41172739K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:16:07.565+0000: 21150.744: [GC (Allocation Failure)
2018-10-03T12:16:07.566+0000: 21150.744: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  509166416 bytes,  509166416 total
- age   2:  122054840 bytes,  631221256 total
: 9868992K->616768K(9868992K), 1.4807701 secs]
51041731K->42866755K(83269312K), 1.4808761 secs] [Times: user=5.36 sys=0.00,
real=1.48 secs]
Heap after GC invocations=926 (full 172):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 42249987K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:16:09.046+0000: 21152.225: Total time for which application
threads were stopped: 1.4812608 seconds, Stopping threads took: 0.0000981
seconds
2018-10-03T12:16:10.047+0000: 21153.226: Total time for which application
threads were stopped: 0.0002932 seconds, Stopping threads took: 0.0001333
seconds
2018-10-03T12:16:11.048+0000: 21154.226: Total time for which application
threads were stopped: 0.0003958 seconds, Stopping threads took: 0.0002094
seconds
2018-10-03T12:16:12.048+0000: 21155.227: Total time for which application
threads were stopped: 0.0002591 seconds, Stopping threads took: 0.0001086
seconds
2018-10-03T12:16:12.339+0000: 21155.518: Total time for which application
threads were stopped: 0.0004337 seconds, Stopping threads took: 0.0002367
seconds
 CMS: abort preclean due to time 2018-10-03T12:16:13.839+0000: 21157.017:
[CMS-concurrent-abortable-preclean: 6.785/8.315 secs] [Times: user=26.73
sys=0.11, real=8.31 secs]
2018-10-03T12:16:13.839+0000: 21157.018: [GC (CMS Final Remark) [YG
occupancy: 6310225 K (9868992 K)]{Heap before GC invocations=926 (full 172):
 par new generation   total 9868992K, used 6310225K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,  61% used [0x00007f4f84000000, 0x00007f50df804780,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 42249987K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:16:13.839+0000: 21157.018: [GC (CMS Final Remark)
2018-10-03T12:16:13.839+0000: 21157.018: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  524643480 bytes,  524643480 total
- age   2:  100201016 bytes,  624844496 total
: 6310225K->616768K(9868992K), 2.2304522 secs]
48560213K->44433360K(83269312K), 2.2305414 secs] [Times: user=8.15 sys=0.00,
real=2.23 secs]
Heap after GC invocations=927 (full 172):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 43816592K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:16:16.070+0000: 21159.249: [Rescan (parallel) , 0.7394523
secs]2018-10-03T12:16:16.809+0000: 21159.988: [weak refs processing,
0.0014298 secs]2018-10-03T12:16:16.811+0000: 21159.989: [class unloading,
0.0592993 secs]2018-10-03T12:16:16.870+0000: 21160.049: [scrub symbol table,
0.0056100 secs]2018-10-03T12:16:16.876+0000: 21160.054: [scrub string table,
0.0014915 secs][1 CMS-remark: 43816592K(73400320K)] 44433360K(83269312K),
3.0393836 secs] [Times: user=11.18 sys=0.00, real=3.04 secs]
2018-10-03T12:16:16.878+0000: 21160.057: Total time for which application
threads were stopped: 3.0397460 seconds, Stopping threads took: 0.0001232
seconds
2018-10-03T12:16:16.879+0000: 21160.057: [CMS-concurrent-sweep-start]
2018-10-03T12:16:16.886+0000: 21160.065: Total time for which application
threads were stopped: 0.0005649 seconds, Stopping threads took: 0.0002303
seconds
2018-10-03T12:16:16.927+0000: 21160.106: Total time for which application
threads were stopped: 0.0010459 seconds, Stopping threads took: 0.0006273
seconds
2018-10-03T12:16:16.927+0000: 21160.106: Total time for which application
threads were stopped: 0.0003996 seconds, Stopping threads took: 0.0002180
seconds
2018-10-03T12:16:17.340+0000: 21160.519: Total time for which application
threads were stopped: 0.0004957 seconds, Stopping threads took: 0.0002318
seconds
2018-10-03T12:16:18.341+0000: 21161.519: Total time for which application
threads were stopped: 0.0003805 seconds, Stopping threads took: 0.0001847
seconds
2018-10-03T12:16:19.341+0000: 21162.520: Total time for which application
threads were stopped: 0.0003364 seconds, Stopping threads took: 0.0001609
seconds
2018-10-03T12:16:20.341+0000: 21163.520: Total time for which application
threads were stopped: 0.0003111 seconds, Stopping threads took: 0.0001119
seconds
2018-10-03T12:16:21.342+0000: 21164.521: Total time for which application
threads were stopped: 0.0003138 seconds, Stopping threads took: 0.0001345
seconds
2018-10-03T12:16:21.890+0000: 21165.069: Total time for which application
threads were stopped: 0.0003635 seconds, Stopping threads took: 0.0001298
seconds
2018-10-03T12:16:21.892+0000: 21165.071: Total time for which application
threads were stopped: 0.0002075 seconds, Stopping threads took: 0.0001069
seconds
2018-10-03T12:16:21.892+0000: 21165.071: Total time for which application
threads were stopped: 0.0001653 seconds, Stopping threads took: 0.0000824
seconds
2018-10-03T12:16:21.954+0000: 21165.133: Total time for which application
threads were stopped: 0.0003568 seconds, Stopping threads took: 0.0001555
seconds
2018-10-03T12:16:22.340+0000: 21165.519: Total time for which application
threads were stopped: 0.0003015 seconds, Stopping threads took: 0.0001233
seconds
2018-10-03T12:16:23.190+0000: 21166.369: Total time for which application
threads were stopped: 0.0003195 seconds, Stopping threads took: 0.0001227
seconds
2018-10-03T12:16:24.270+0000: 21167.449: Total time for which application
threads were stopped: 0.0797588 seconds, Stopping threads took: 0.0795611
seconds
2018-10-03T12:16:25.185+0000: 21168.364: [CMS-concurrent-sweep: 8.210/8.306
secs] [Times: user=27.51 sys=0.07, real=8.31 secs]
2018-10-03T12:16:25.185+0000: 21168.364: [CMS-concurrent-reset-start]
2018-10-03T12:16:25.270+0000: 21168.449: Total time for which application
threads were stopped: 0.0004177 seconds, Stopping threads took: 0.0002056
seconds
2018-10-03T12:16:25.333+0000: 21168.511: [CMS-concurrent-reset: 0.147/0.148
secs] [Times: user=0.54 sys=0.00, real=0.14 secs]
2018-10-03T12:16:26.271+0000: 21169.450: Total time for which application
threads were stopped: 0.0002714 seconds, Stopping threads took: 0.0001245
seconds
2018-10-03T12:16:27.271+0000: 21170.450: Total time for which application
threads were stopped: 0.0002319 seconds, Stopping threads took: 0.0000859
seconds


2018-10-03T12:16:27.332+0000: 21170.511: [GC (CMS Initial Mark) [1
CMS-initial-mark: 42239401K(73400320K)] 50046127K(83269312K), 1.2121645
secs] [Times: user=4.80 sys=0.00, real=1.22 secs]



2018-10-03T12:16:28.545+0000: 21171.724: Total time for which application
threads were stopped: 1.2124408 seconds, Stopping threads took: 0.0000623
seconds
2018-10-03T12:16:28.545+0000: 21171.724: [CMS-concurrent-mark-start]
2018-10-03T12:16:28.545+0000: 21171.724: Total time for which application
threads were stopped: 0.0004375 seconds, Stopping threads took: 0.0002651
seconds
2018-10-03T12:16:29.546+0000: 21172.725: Total time for which application
threads were stopped: 0.0002859 seconds, Stopping threads took: 0.0001463
seconds
2018-10-03T12:16:30.546+0000: 21173.725: Total time for which application
threads were stopped: 0.0003327 seconds, Stopping threads took: 0.0001699
seconds
2018-10-03T12:16:31.547+0000: 21174.726: Total time for which application
threads were stopped: 0.0002795 seconds, Stopping threads took: 0.0001344
seconds
2018-10-03T12:16:33.546+0000: 21176.725: Total time for which application
threads were stopped: 0.0003180 seconds, Stopping threads took: 0.0001169
seconds
{Heap before GC invocations=927 (full 173):
 par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 42239401K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:16:33.953+0000: 21177.131: [GC (Allocation Failure)
2018-10-03T12:16:33.953+0000: 21177.132: [ParNew
Desired survivor size 568413384 bytes, new threshold 2 (max 8)
- age   1:  557021864 bytes,  557021864 total
- age   2:   68995936 bytes,  626017800 total
: 9868992K->616768K(9868992K), 1.6252719 secs]
52108393K->43891937K(83269312K), 1.6253812 secs] [Times: user=5.94 sys=0.01,
real=1.63 secs]




Heap after GC invocations=928 (full 173):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 43275169K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:16:35.578+0000: 21178.757: Total time for which application
threads were stopped: 1.6258123 seconds, Stopping threads took: 0.0001480
seconds
2018-10-03T12:16:35.595+0000: 21178.773: Total time for which application
threads were stopped: 0.0006569 seconds, Stopping threads took: 0.0003852
seconds
2018-10-03T12:16:35.595+0000: 21178.774: Total time for which application
threads were stopped: 0.0003074 seconds, Stopping threads took: 0.0001710
seconds
2018-10-03T12:16:35.595+0000: 21178.774: Total time for which application
threads were stopped: 0.0002113 seconds, Stopping threads took: 0.0000924
seconds
2018-10-03T12:16:35.596+0000: 21178.775: Total time for which application
threads were stopped: 0.0003727 seconds, Stopping threads took: 0.0002441
seconds
2018-10-03T12:16:36.596+0000: 21179.775: Total time for which application
threads were stopped: 0.0003703 seconds, Stopping threads took: 0.0001878
seconds
2018-10-03T12:16:37.597+0000: 21180.776: Total time for which application
threads were stopped: 0.0002539 seconds, Stopping threads took: 0.0001050
seconds
2018-10-03T12:16:38.546+0000: 21181.725: Total time for which application
threads were stopped: 0.0002715 seconds, Stopping threads took: 0.0001215
seconds
2018-10-03T12:16:39.557+0000: 21182.736: Total time for which application
threads were stopped: 0.0003852 seconds, Stopping threads took: 0.0001566
seconds
2018-10-03T12:16:40.375+0000: 21183.554: [CMS-concurrent-mark: 10.186/11.830
secs] [Times: user=65.67 sys=0.34, real=11.83 secs]




2018-10-03T12:16:40.375+0000: 21183.554: [CMS-concurrent-preclean-start]
2018-10-03T12:16:40.558+0000: 21183.737: Total time for which application
threads were stopped: 0.0007029 seconds, Stopping threads took: 0.0005097
seconds
2018-10-03T12:16:41.830+0000: 21185.009: [CMS-concurrent-preclean:
1.384/1.455 secs] [Times: user=4.64 sys=0.06, real=1.45 secs]








2018-10-03T12:16:41.830+0000: 21185.009:
[CMS-concurrent-abortable-preclean-start]
2018-10-03T12:16:42.910+0000: 21186.089: Total time for which application
threads were stopped: 0.0003692 seconds, Stopping threads took: 0.0001969
seconds
2018-10-03T12:16:42.914+0000: 21186.093: Total time for which application
threads were stopped: 0.0012019 seconds, Stopping threads took: 0.0011341
seconds
2018-10-03T12:16:43.287+0000: 21186.466: Total time for which application
threads were stopped: 0.0004056 seconds, Stopping threads took: 0.0001726
seconds
2018-10-03T12:16:43.547+0000: 21186.726: Total time for which application
threads were stopped: 0.0003341 seconds, Stopping threads took: 0.0001467
seconds
2018-10-03T12:16:44.547+0000: 21187.726: Total time for which application
threads were stopped: 0.0003079 seconds, Stopping threads took: 0.0001480
seconds
2018-10-03T12:16:48.547+0000: 21191.726: Total time for which application
threads were stopped: 0.0001833 seconds, Stopping threads took: 0.0000628
seconds
 CMS: abort preclean due to time 2018-10-03T12:16:49.482+0000: 21192.661:
[CMS-concurrent-abortable-preclean: 7.648/7.652 secs] [Times: user=14.30
sys=0.12, real=7.66 secs]
2018-10-03T12:16:49.482+0000: 21192.661: [GC (CMS Final Remark) [YG
occupancy: 7690942 K (9868992 K)]{Heap before GC invocations=928 (full 173):
 par new generation   total 9868992K, used 7690942K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,  76% used [0x00007f4f84000000, 0x00007f5133c5fbe0,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 43275169K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:16:49.482+0000: 21192.661: [GC (CMS Final Remark)
2018-10-03T12:16:49.482+0000: 21192.661: [ParNew
Desired survivor size 568413384 bytes, new threshold 1 (max 8)
- age   1:  601979288 bytes,  601979288 total
- age   2:   29589400 bytes,  631568688 total
: 7690942K->616768K(9868992K), 1.0751368 secs]
50966112K->45897734K(83269312K), 1.0752293 secs] [Times: user=3.94 sys=0.00,
real=1.07 secs]
Heap after GC invocations=929 (full 173):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 45280966K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:16:50.558+0000: 21193.737: [Rescan (parallel) , 0.3095918
secs]2018-10-03T12:16:50.867+0000: 21194.046: [weak refs processing,
0.0013762 secs]2018-10-03T12:16:50.869+0000: 21194.048: [class unloading,
0.0569672 secs]2018-10-03T12:16:50.926+0000: 21194.105: [scrub symbol table,
0.0053711 secs]2018-10-03T12:16:50.931+0000: 21194.110: [scrub string table,
0.0014654 secs][1 CMS-remark: 45280966K(73400320K)] 45897734K(83269312K),
1.4515832 secs] [Times: user=5.25 sys=0.00, real=1.45 secs]






2018-10-03T12:16:50.934+0000: 21194.113: Total time for which application
threads were stopped: 1.4518312 seconds, Stopping threads took: 0.0000537
seconds
2018-10-03T12:16:50.934+0000: 21194.113: [CMS-concurrent-sweep-start]
2018-10-03T12:16:50.940+0000: 21194.119: Total time for which application
threads were stopped: 0.0006360 seconds, Stopping threads took: 0.0002782
seconds
2018-10-03T12:16:51.169+0000: 21194.348: Total time for which application
threads were stopped: 0.0003905 seconds, Stopping threads took: 0.0001195
seconds
2018-10-03T12:16:52.169+0000: 21195.348: Total time for which application
threads were stopped: 0.0002521 seconds, Stopping threads took: 0.0000770
seconds
2018-10-03T12:16:53.170+0000: 21196.349: Total time for which application
threads were stopped: 0.0003716 seconds, Stopping threads took: 0.0001484
seconds
2018-10-03T12:16:53.548+0000: 21196.727: Total time for which application
threads were stopped: 0.0003341 seconds, Stopping threads took: 0.0001590
seconds
2018-10-03T12:16:54.548+0000: 21197.727: Total time for which application
threads were stopped: 0.0002481 seconds, Stopping threads took: 0.0000942
seconds
2018-10-03T12:16:55.549+0000: 21198.728: Total time for which application
threads were stopped: 0.0003797 seconds, Stopping threads took: 0.0001918
seconds
2018-10-03T12:16:56.188+0000: 21199.367: [CMS-concurrent-sweep: 5.251/5.254
secs] [Times: user=10.42 sys=0.03, real=5.25 secs]
2018-10-03T12:16:56.189+0000: 21199.367: [CMS-concurrent-reset-start]
2018-10-03T12:16:56.339+0000: 21199.518: [CMS-concurrent-reset: 0.151/0.151
secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
2018-10-03T12:16:56.549+0000: 21199.728: Total time for which application
threads were stopped: 0.0003961 seconds, Stopping threads took: 0.0002311
seconds
2018-10-03T12:16:57.550+0000: 21200.729: Total time for which application
threads were stopped: 0.0002442 seconds, Stopping threads took: 0.0001090
seconds


2018-10-03T12:16:58.339+0000: 21201.518: [GC (CMS Initial Mark) [1
CMS-initial-mark: 45178633K(73400320K)] 49907123K(83269312K), 0.4192155
secs] [Times: user=1.68 sys=0.00, real=0.42 secs]
2018-10-03T12:16:58.759+0000: 21201.938: Total time for which application
threads were stopped: 0.4195337 seconds, Stopping threads took: 0.0000902
seconds
2018-10-03T12:16:58.759+0000: 21201.938: [CMS-concurrent-mark-start]
2018-10-03T12:16:58.759+0000: 21201.938: Total time for which application
threads were stopped: 0.0002544 seconds, Stopping threads took: 0.0001327
seconds
2018-10-03T12:17:00.760+0000: 21203.939: Total time for which application
threads were stopped: 0.0003008 seconds, Stopping threads took: 0.0001486
seconds
2018-10-03T12:17:02.760+0000: 21205.939: Total time for which application
threads were stopped: 0.0002923 seconds, Stopping threads took: 0.0001359
seconds
2018-10-03T12:17:03.760+0000: 21206.939: Total time for which application
threads were stopped: 0.0003256 seconds, Stopping threads took: 0.0001512
seconds
2018-10-03T12:17:04.760+0000: 21207.939: Total time for which application
threads were stopped: 0.0003389 seconds, Stopping threads took: 0.0001654
seconds
2018-10-03T12:17:05.761+0000: 21208.940: Total time for which application
threads were stopped: 0.0002250 seconds, Stopping threads took: 0.0000897
seconds
2018-10-03T12:17:06.761+0000: 21209.940: Total time for which application
threads were stopped: 0.0001925 seconds, Stopping threads took: 0.0000749
seconds
2018-10-03T12:17:07.082+0000: 21210.261: [CMS-concurrent-mark: 8.316/8.323
secs] [Times: user=39.05 sys=0.28, real=8.33 secs]
2018-10-03T12:17:07.082+0000: 21210.261: [CMS-concurrent-preclean-start]
2018-10-03T12:17:07.192+0000: 21210.371: [CMS-concurrent-preclean:
0.109/0.110 secs] [Times: user=0.11 sys=0.00, real=0.10 secs]
2018-10-03T12:17:07.192+0000: 21210.371:
[CMS-concurrent-abortable-preclean-start]
2018-10-03T12:17:08.760+0000: 21211.939: Total time for which application
threads were stopped: 0.0001917 seconds, Stopping threads took: 0.0000699
seconds
2018-10-03T12:17:10.761+0000: 21213.940: Total time for which application
threads were stopped: 0.0002303 seconds, Stopping threads took: 0.0000800
seconds
2018-10-03T12:17:12.761+0000: 21215.940: Total time for which application
threads were stopped: 0.0002098 seconds, Stopping threads took: 0.0000633
seconds
 CMS: abort preclean due to time 2018-10-03T12:17:13.355+0000: 21216.534:
[CMS-concurrent-abortable-preclean: 6.162/6.163 secs] [Times: user=9.43
sys=0.03, real=6.17 secs]
2018-10-03T12:17:13.355+0000: 21216.534: [GC (CMS Final Remark) [YG
occupancy: 9513570 K (9868992 K)]{Heap before GC invocations=929 (full 174):
 par new generation   total 9868992K, used 9513570K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,  96% used [0x00007f4f84000000, 0x00007f51a3048ab0,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
0x00007f5204000000)
  to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
0x00007f51de5b0000)
 concurrent mark-sweep generation total 73400320K, used 45178633K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
2018-10-03T12:17:13.356+0000: 21216.534: [GC (CMS Final Remark)
2018-10-03T12:17:13.356+0000: 21216.534: [ParNew
Desired survivor size 568413384 bytes, new threshold 1 (max 8)
- age   1:  620466400 bytes,  620466400 total
: 9513570K->616768K(9868992K), 0.8406100 secs]
54692204K->48320246K(83269312K), 0.8407079 secs] [Times: user=2.98 sys=0.00,
real=0.84 secs]
Heap after GC invocations=930 (full 174):
 par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
0x00007f5204000000, 0x00007f5204000000)
  eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
0x00007f51b8b60000)
  from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
0x00007f51de5b0000)
  to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
0x00007f5204000000)
 concurrent mark-sweep generation total 73400320K, used 47703478K
[0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
 Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
61440K
}
2018-10-03T12:17:14.196+0000: 21217.375: [Rescan (parallel) , 0.2207621
secs]2018-10-03T12:17:14.417+0000: 21217.596: [weak refs processing,
0.0016539 secs]2018-10-03T12:17:14.419+0000: 21217.598: [class unloading,
0.0577526 secs]2018-10-03T12:17:14.476+0000: 21217.655: [scrub symbol table,
0.0055694 secs]2018-10-03T12:17:14.482+0000: 21217.661: [scrub string table,
0.0014483 secs][1 CMS-remark: 47703478K(73400320K)] 48320246K(83269312K),
1.1294195 secs] [Times: user=3.93 sys=0.00, real=1.13 secs]
2018-10-03T12:17:14.485+0000: 21217.664: Total time for which application
threads were stopped: 1.1297533 seconds, Stopping threads took: 0.0001321
seconds
2018-10-03T12:17:14.485+0000: 21217.664: Total time for which application
threads were stopped: 0.0002640 seconds, Stopping threads took: 0.0001215
seconds
2018-10-03T12:17:14.485+0000: 21217.664: [CMS-concurrent-sweep-start]
2018-10-03T12:17:14.502+0000: 21217.681: Total time for which application
threads were stopped: 0.0004896 seconds, Stopping threads took: 0.0001425
seconds
2018-10-03T12:17:14.562+0000: 21217.741: Total time for which application
threads were stopped: 0.0006066 seconds, Stopping threads took: 0.0002838
seconds
2018-10-03T12:17:15.562+0000: 21218.741: Total time for which application
threads were stopped: 0.0002378 seconds, Stopping threads took: 0.0000804
seconds
2018-10-03T12:17:16.562+0000: 21219.741: Total time for which application
threads were stopped: 0.0002154 seconds, Stopping threads took: 0.0000845
seconds
2018-10-03T12:17:17.563+0000: 21220.742: Total time for which application
threads were stopped: 0.0001754 seconds, Stopping threads took: 0.0000556
seconds
2018-10-03T12:17:18.563+0000: 21221.742: Total time for which application
threads were stopped: 0.0001794 seconds, Stopping threads took: 0.0000606
seconds
2018-10-03T12:17:19.486+0000: 21222.665: Total time for which application
threads were stopped: 0.0002057 seconds, Stopping threads took: 0.0000719
seconds
2018-10-03T12:17:20.486+0000: 21223.665: Total time for which application
threads were stopped: 0.0003069 seconds, Stopping threads took: 0.0001119
seconds
2018-10-03T12:17:21.487+0000: 21224.666: Total time for which application
threads were stopped: 0.0003546 seconds, Stopping threads took: 0.0001126
seconds
2018-10-03T12:17:22.487+0000: 21225.666: Total time for which application
threads were stopped: 0.0003192 seconds, Stopping threads took: 0.0001288
seconds
2018-10-03T12:17:23.488+0000: 21226.666: Total time for which application
threads were stopped: 0.0002118 seconds, Stopping threads took: 0.0000659
seconds
2018-10-03T12:17:24.486+0000: 21227.665: Total time for which application
threads were stopped: 0.0001808 seconds, Stopping threads took: 0.0000631
seconds
2018-10-03T12:17:25.486+0000: 21228.665: Total time for which application
threads were stopped: 0.0001869 seconds, Stopping threads took: 0.0000732
seconds
2018-10-03T12:17:26.487+0000: 21229.666: Total time for which application
threads were stopped: 0.0002461 seconds, Stopping threads took: 0.0001013
seconds
2018-10-03T12:17:27.332+0000: 21230.511: Total time for which application
threads were stopped: 0.0002317 seconds, Stopping threads took: 0.0000648
seconds
2018-10-03T12:17:28.224+0000: 21231.403: Total time for which application
threads were stopped: 0.0001957 seconds, Stopping threads took: 0.0000599
seconds
2018-10-03T12:17:29.225+0000: 21232.403: Total time for which application
threads were stopped: 0.0002778 seconds, Stopping threads took: 0.0001248
seconds
2018-10-03T12:17:29.487+0000: 21232.665: Total time for which application
threads were stopped: 0.0001862 seconds, Stopping threads took: 0.0000633
seconds
2018-10-03T12:17:32.673+0000: 21235.852: [CMS-concurrent-sweep:
18.182/18.188 secs] [Times: user=30.61 sys=0.06, real=18.18 secs]
2018-10-03T12:17:32.673+0000: 21235.852: [CMS-concurrent-reset-start]
2018-10-03T12:17:32.824+0000: 21236.003: [CMS-concurrent-reset: 0.151/0.151
secs] [Times: user=0.24 sys=0.00, real=0.16 secs]

-----------------------------------------------------------------
-----------------------------------------------------------------



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Ere Maijala <er...@helsinki.fi>.

Hi,

In addition to what others wrote already, there are a couple of things 
that might trigger sudden memory allocation surge that you can't really 
account for:

1. Deep paging, especially in a sharded index. Don't allow it and you'll 
be much happier.

2. Faceting without docValues especially in a large index.

These would be my top two things to check before anything else. I've 
gone from 48 GB heap and GC having massive trouble keeping up to 8 GB 
heap and no trouble at all just by getting rid of deep paging and using 
docValues with all faceted fields.

--Ere

yasoobhaider kirjoitti 3.10.2018 klo 17.01:
> Hi
> 
> I'm working with a Solr cluster with master-slave architecture.
> 
> Master and slave config:
> ram: 120GB
> cores: 16
> 
> At any point there are between 10-20 slaves in the cluster, each serving ~2k
> requests per minute. Each slave houses two collections of approx 10G
> (~2.5mil docs) and 2G(10mil docs) when optimized.
> 
> I am working with Solr 6.2.1
> 
> Solr configuration:
> 
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:-OmitStackTraceInFastThrow
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:MaxTenuringThreshold=8
> -XX:ParallelGCThreads=4
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=15
> -XX:TargetSurvivorRatio=90
> -Xmn10G
> -Xms80G
> -Xmx80G
> 
> Some of these configurations have been reached by multiple trial and errors
> over time, including the huge heap size.
> 
> This cluster usually runs without any error.
> 
> In the usual scenario, old gen gc is triggered according to the
> configuration at 50% old gen occupancy, and the collector clears out the
> memory over the next minute or so. This happens every 10-15 minutes.
> 
> However, I have noticed that sometimes the GC pattern of the slaves
> completely changes and old gen gc is not able to clear the memory.
> 
> After observing the gc logs closely for multiple old gen gc collections, I
> noticed that the old gen gc is triggered at 50% occupancy, but if there is a
> GC Allocation Failure before the collection completes (after CMS Initial
> Remark but before CMS reset), the old gen collection is not able to clear
> much memory. And as soon as this collection completes, another old gen gc is
> triggered.
> 
> And in worst case scenarios, this cycle of old gen gc triggering, GC
> allocation failure keeps happening, and the old gen memory keeps increasing,
> leading to a single threaded STW GC, which is not able to do much, and I
> have to restart the solr server.
> 
> The last time this happened after the following sequence of events:
> 
> 1. We optimized the bigger collection bringing it to its optimized size of
> ~10G.
> 2. For an unrelated reason, we had stopped indexing to the master. We
> usually index at a low-ish throughput of ~1mil docs/day. This is relevant as
> when we are indexing, the size of the collection increases, and this effects
> the heap size used by collection.
> 3. The slaves started behaving erratically, with old gc collection not being
> able to free up the required memory and finally being stuck in a STW GC.
> 
> As unlikely as this sounds, this is the only thing that changed on the
> cluster. There was no change in query throughput or type of queries.
> 
> I restarted the slaves multiple times but the gc behaved in the same way for
> over three days. Then when we fixed the indexing and made it live, the
> slaves resumed their original gc pattern and are running without any issues
> for over 24 hours now.
> 
> I would really be grateful for any advice on the following:
> 
> 1. What could be the reason behind CMS not being able to free up the memory?
> What are some experiments I can run to solve this problem?
> 2. Can stopping/starting indexing be a reason for such drastic changes to GC
> pattern?
> 3. I have read at multiple places on this mailing list that the heap size
> should be much lower (2x-3x the size of collection), but the last time I
> tried CMS was not able to run smoothly and GC STW would occur which was only
> solved by a restart. My reasoning for this is that the type of queries and
> the throughput are also a factor in deciding the heap size, so it may be
> that our queries are creating too many objects maybe. Is my reasoning
> correct or should I try with a lower heap size (if it helps achieve a stable
> gc pattern)?
> 
> (4. Silly question, but what is the right way to ask question on the mailing
> list? via mail or via the nabble website? I sent this question earlier as a
> mail, but it was not showing up on the nabble website so I am posting it
> from the website now)
> 
> -----------------------------------------------------------------
> -----------------------------------------------------------------
> 
> Logs which show this:
> 
> 
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  437184344 bytes,  437184344 total
> - age   2:  194385736 bytes,  631570080 total
> : 9868992K->616768K(9868992K), 1.7115191 secs]
> 48349347K->40160469K(83269312K), 1.7116410 secs] [Times: user=6.25 sys=0.00,
> real=1.71 secs]
> Heap after GC invocations=921 (full 170):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 39543701K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:15:29.578+0000: 21112.757: Total time for which application
> threads were stopped: 1.7120277 seconds, Stopping threads took: 0.0001172
> seconds
> 2018-10-03T12:15:29.579+0000: 21112.758: Total time for which application
> threads were stopped: 0.0003476 seconds, Stopping threads took: 0.0001611
> seconds
> 2018-10-03T12:15:29.579+0000: 21112.758: Total time for which application
> threads were stopped: 0.0003482 seconds, Stopping threads took: 0.0002375
> seconds
> 2018-10-03T12:15:29.580+0000: 21112.759: Total time for which application
> threads were stopped: 0.0003628 seconds, Stopping threads took: 0.0002455
> seconds
> 2018-10-03T12:15:30.580+0000: 21113.759: Total time for which application
> threads were stopped: 0.0003182 seconds, Stopping threads took: 0.0001611
> seconds
> 2018-10-03T12:15:31.633+0000: 21114.812: [CMS-concurrent-sweep: 7.579/9.305
> secs] [Times: user=29.07 sys=0.11, real=9.30 secs]
> 2018-10-03T12:15:31.633+0000: 21114.812: [CMS-concurrent-reset-start]
> 2018-10-03T12:15:31.782+0000: 21114.961: [CMS-concurrent-reset: 0.149/0.149
> secs] [Times: user=0.31 sys=0.00, real=0.16 secs]
> 
> 
> 
> 
> 
> 2018-10-03T12:15:32.335+0000: 21115.514: Total time for which application
> threads were stopped: 0.0003525 seconds, Stopping threads took: 0.0001639
> seconds
> 2018-10-03T12:15:33.335+0000: 21116.514: Total time for which application
> threads were stopped: 0.0002410 seconds, Stopping threads took: 0.0000933
> seconds
> 
> 
> 2018-10-03T12:15:33.782+0000: 21116.961: [GC (CMS Initial Mark) [1
> CMS-initial-mark: 39261799K(73400320K)] 42714046K(83269312K), 0.2772283
> secs] [Times: user=1.10 sys=0.00, real=0.28 secs]
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 2018-10-03T12:15:34.059+0000: 21117.238: Total time for which application
> threads were stopped: 0.2775628 seconds, Stopping threads took: 0.0001125
> seconds
> 2018-10-03T12:15:34.059+0000: 21117.238: [CMS-concurrent-mark-start]
> 2018-10-03T12:15:35.060+0000: 21118.239: Total time for which application
> threads were stopped: 0.0003905 seconds, Stopping threads took: 0.0002020
> seconds
> 2018-10-03T12:15:36.060+0000: 21119.239: Total time for which application
> threads were stopped: 0.0003198 seconds, Stopping threads took: 0.0001554
> seconds
> 2018-10-03T12:15:37.336+0000: 21120.514: Total time for which application
> threads were stopped: 0.0003034 seconds, Stopping threads took: 0.0001426
> seconds
> 2018-10-03T12:15:38.336+0000: 21121.515: Total time for which application
> threads were stopped: 0.0003033 seconds, Stopping threads took: 0.0001344
> seconds
> 2018-10-03T12:15:39.337+0000: 21122.515: Total time for which application
> threads were stopped: 0.0003374 seconds, Stopping threads took: 0.0001597
> seconds
> 2018-10-03T12:15:40.337+0000: 21123.516: Total time for which application
> threads were stopped: 0.0003181 seconds, Stopping threads took: 0.0001417
> seconds
> 
> {Heap before GC invocations=921 (full 171):
>   par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 39261799K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 
> 
> 2018-10-03T12:15:40.402+0000: 21123.581: [GC (Allocation Failure)
> 2018-10-03T12:15:40.402+0000: 21123.581: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  420761352 bytes,  420761352 total
> - age   2:  203405104 bytes,  624166456 total
> : 9868992K->616768K(9868992K), 1.1580928 secs]
> 49130791K->40661998K(83269312K), 1.1581942 secs] [Times: user=4.31 sys=0.02,
> real=1.16 secs]
> 
> Heap after GC invocations=922 (full 171):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 40045230K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:15:41.560+0000: 21124.739: Total time for which application
> threads were stopped: 1.1585459 seconds, Stopping threads took: 0.0000965
> seconds
> 2018-10-03T12:15:41.727+0000: 21124.906: [CMS-concurrent-mark: 6.502/7.668
> secs] [Times: user=44.13 sys=0.23, real=7.67 secs]
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 2018-10-03T12:15:41.727+0000: 21124.906: [CMS-concurrent-preclean-start]
> 2018-10-03T12:15:42.336+0000: 21125.515: Total time for which application
> threads were stopped: 0.0002580 seconds, Stopping threads took: 0.0001056
> seconds
> 2018-10-03T12:15:43.154+0000: 21126.332: [CMS-concurrent-preclean:
> 1.350/1.427 secs] [Times: user=3.80 sys=0.10, real=1.42 secs]
> 2018-10-03T12:15:43.154+0000: 21126.333:
> [CMS-concurrent-abortable-preclean-start]
> {Heap before GC invocations=922 (full 171):
>   par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 40045230K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 
> 
> 
> 2018-10-03T12:15:44.501+0000: 21127.680: [GC (Allocation Failure)
> 2018-10-03T12:15:44.501+0000: 21127.680: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  368166576 bytes,  368166576 total
> - age   2:  254060192 bytes,  622226768 total
> : 9868992K->616768K(9868992K), 0.9552847 secs]
> 49914222K->41208420K(83269312K), 0.9553941 secs] [Times: user=3.50 sys=0.02,
> real=0.96 secs]
> Heap after GC invocations=923 (full 171):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 40591652K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:15:45.456+0000: 21128.635: Total time for which application
> threads were stopped: 0.9557089 seconds, Stopping threads took: 0.0000723
> seconds
> 2018-10-03T12:15:46.457+0000: 21129.636: Total time for which application
> threads were stopped: 0.0002304 seconds, Stopping threads took: 0.0000779
> seconds
> 2018-10-03T12:15:47.336+0000: 21130.515: Total time for which application
> threads were stopped: 0.0002717 seconds, Stopping threads took: 0.0001186
> seconds
> 2018-10-03T12:15:47.441+0000: 21130.620: [CMS-concurrent-abortable-preclean:
> 3.273/4.287 secs] [Times: user=12.59 sys=0.09, real=4.29 secs]
> 2018-10-03T12:15:47.441+0000: 21130.620: [GC (CMS Final Remark) [YG
> occupancy: 7398403 K (9868992 K)]{Heap before GC invocations=923 (full 171):
>   par new generation   total 9868992K, used 7398403K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,  73% used [0x00007f4f84000000, 0x00007f5121eb0d00,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 40591652K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:15:47.441+0000: 21130.620: [GC (CMS Final Remark)
> 2018-10-03T12:15:47.441+0000: 21130.620: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  308078752 bytes,  308078752 total
> - age   2:  316638256 bytes,  624717008 total
> : 7398403K->616768K(9868992K), 0.6155285 secs]
> 47990055K->41552127K(83269312K), 0.6156160 secs] [Times: user=2.29 sys=0.00,
> real=0.62 secs]
> Heap after GC invocations=924 (full 171):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 40935359K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:15:48.057+0000: 21131.236: [Rescan (parallel) , 0.1869583
> secs]2018-10-03T12:15:48.244+0000: 21131.423: [weak refs processing,
> 1.7414971 secs]2018-10-03T12:15:49.985+0000: 21133.164: [class unloading,
> 0.0592428 secs]2018-10-03T12:15:50.045+0000: 21133.224: [scrub symbol table,
> 0.0056035 secs]2018-10-03T12:15:50.050+0000: 21133.229: [scrub string table,
> 0.0014962 secs][1 CMS-remark: 40935359K(73400320K)] 41552127K(83269312K),
> 2.6119932 secs] [Times: user=6.41 sys=0.00, real=2.61 secs]
> 2018-10-03T12:15:50.053+0000: 21133.232: Total time for which application
> threads were stopped: 2.6123611 seconds, Stopping threads took: 0.0001326
> seconds
> 2018-10-03T12:15:50.053+0000: 21133.232: [CMS-concurrent-sweep-start]
> 2018-10-03T12:15:50.059+0000: 21133.238: Total time for which application
> threads were stopped: 0.0006659 seconds, Stopping threads took: 0.0002255
> seconds
> 2018-10-03T12:15:50.103+0000: 21133.281: Total time for which application
> threads were stopped: 0.0039148 seconds, Stopping threads took: 0.0034179
> seconds
> 2018-10-03T12:15:50.271+0000: 21133.450: Total time for which application
> threads were stopped: 0.0005564 seconds, Stopping threads took: 0.0002647
> seconds
> 2018-10-03T12:15:50.315+0000: 21133.494: Total time for which application
> threads were stopped: 0.0003337 seconds, Stopping threads took: 0.0001589
> seconds
> 2018-10-03T12:15:50.379+0000: 21133.558: Total time for which application
> threads were stopped: 0.0003200 seconds, Stopping threads took: 0.0001383
> seconds
> 2018-10-03T12:15:50.551+0000: 21133.730: Total time for which application
> threads were stopped: 0.0003616 seconds, Stopping threads took: 0.0001738
> seconds
> 2018-10-03T12:15:51.551+0000: 21134.730: Total time for which application
> threads were stopped: 0.0002993 seconds, Stopping threads took: 0.0001143
> seconds
> 2018-10-03T12:15:52.312+0000: 21135.491: Total time for which application
> threads were stopped: 0.0003575 seconds, Stopping threads took: 0.0001460
> seconds
> 2018-10-03T12:15:52.330+0000: 21135.509: Total time for which application
> threads were stopped: 0.0003366 seconds, Stopping threads took: 0.0001516
> seconds
> 2018-10-03T12:15:52.337+0000: 21135.516: Total time for which application
> threads were stopped: 0.0002891 seconds, Stopping threads took: 0.0001312
> seconds
> 2018-10-03T12:15:52.503+0000: 21135.682: Total time for which application
> threads were stopped: 0.0002795 seconds, Stopping threads took: 0.0001047
> seconds
> {Heap before GC invocations=924 (full 171):
>   par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 40681560K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:15:52.792+0000: 21135.971: [GC (Allocation Failure)
> 2018-10-03T12:15:52.792+0000: 21135.971: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  459462088 bytes,  459462088 total
> - age   2:  154305656 bytes,  613767744 total
> : 9868992K->616768K(9868992K), 0.8807607 secs]
> 50550552K->41930672K(83269312K), 0.8808648 secs] [Times: user=3.21 sys=0.02,
> real=0.88 secs]
> Heap after GC invocations=925 (full 171):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 41313904K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:15:53.673+0000: 21136.852: Total time for which application
> threads were stopped: 0.8812102 seconds, Stopping threads took: 0.0000861
> seconds
> 2018-10-03T12:15:53.697+0000: 21136.876: Total time for which application
> threads were stopped: 0.0022652 seconds, Stopping threads took: 0.0019829
> seconds
> 2018-10-03T12:15:53.703+0000: 21136.882: Total time for which application
> threads were stopped: 0.0006961 seconds, Stopping threads took: 0.0004381
> seconds
> 2018-10-03T12:15:53.864+0000: 21137.043: Total time for which application
> threads were stopped: 0.0003650 seconds, Stopping threads took: 0.0001561
> seconds
> 2018-10-03T12:15:53.864+0000: 21137.043: Total time for which application
> threads were stopped: 0.0001950 seconds, Stopping threads took: 0.0001019
> seconds
> 2018-10-03T12:15:53.865+0000: 21137.043: Total time for which application
> threads were stopped: 0.0001891 seconds, Stopping threads took: 0.0000972
> seconds
> 2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
> threads were stopped: 0.0001881 seconds, Stopping threads took: 0.0000985
> seconds
> 2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
> threads were stopped: 0.0001839 seconds, Stopping threads took: 0.0000948
> seconds
> 2018-10-03T12:15:53.865+0000: 21137.044: Total time for which application
> threads were stopped: 0.0001835 seconds, Stopping threads took: 0.0000947
> seconds
> 2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
> threads were stopped: 0.0001854 seconds, Stopping threads took: 0.0000962
> seconds
> 2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
> threads were stopped: 0.0002046 seconds, Stopping threads took: 0.0001138
> seconds
> 2018-10-03T12:15:53.866+0000: 21137.045: Total time for which application
> threads were stopped: 0.0001842 seconds, Stopping threads took: 0.0000950
> seconds
> 2018-10-03T12:15:53.867+0000: 21137.046: Total time for which application
> threads were stopped: 0.0005123 seconds, Stopping threads took: 0.0000891
> seconds
> 2018-10-03T12:15:54.868+0000: 21138.046: Total time for which application
> threads were stopped: 0.0002820 seconds, Stopping threads took: 0.0001244
> seconds
> 2018-10-03T12:15:55.868+0000: 21139.047: Total time for which application
> threads were stopped: 0.0003152 seconds, Stopping threads took: 0.0001386
> seconds
> 2018-10-03T12:15:56.106+0000: 21139.284: [CMS-concurrent-sweep: 5.141/6.052
> secs] [Times: user=21.05 sys=0.08, real=6.06 secs]
> 2018-10-03T12:15:56.106+0000: 21139.285: [CMS-concurrent-reset-start]
> 2018-10-03T12:15:56.252+0000: 21139.431: [CMS-concurrent-reset: 0.147/0.147
> secs] [Times: user=0.31 sys=0.00, real=0.14 secs]
> 2018-10-03T12:15:56.868+0000: 21140.047: Total time for which application
> threads were stopped: 0.0003063 seconds, Stopping threads took: 0.0001315
> seconds
> 2018-10-03T12:15:57.337+0000: 21140.516: Total time for which application
> threads were stopped: 0.0002707 seconds, Stopping threads took: 0.0001180
> seconds
> 
> 
> 
> 2018-10-03T12:15:58.253+0000: 21141.432: [GC (CMS Initial Mark) [1
> CMS-initial-mark: 41172739K(73400320K)] 46149710K(83269312K), 0.5153635
> secs] [Times: user=1.98 sys=0.00, real=0.52 secs]
> 2018-10-03T12:15:58.768+0000: 21141.947: Total time for which application
> threads were stopped: 0.5158067 seconds, Stopping threads took: 0.0001510
> seconds
> 2018-10-03T12:15:58.769+0000: 21141.947: [CMS-concurrent-mark-start]
> 2018-10-03T12:15:59.769+0000: 21142.948: Total time for which application
> threads were stopped: 0.0002839 seconds, Stopping threads took: 0.0001286
> seconds
> 2018-10-03T12:16:00.770+0000: 21143.948: Total time for which application
> threads were stopped: 0.0003470 seconds, Stopping threads took: 0.0001918
> seconds
> 2018-10-03T12:16:00.775+0000: 21143.954: Total time for which application
> threads were stopped: 0.0002620 seconds, Stopping threads took: 0.0001340
> seconds
> 2018-10-03T12:16:00.825+0000: 21144.004: Total time for which application
> threads were stopped: 0.0003357 seconds, Stopping threads took: 0.0001365
> seconds
> 2018-10-03T12:16:01.825+0000: 21145.004: Total time for which application
> threads were stopped: 0.0003481 seconds, Stopping threads took: 0.0001582
> seconds
> 2018-10-03T12:16:02.338+0000: 21145.517: Total time for which application
> threads were stopped: 0.0002951 seconds, Stopping threads took: 0.0001352
> seconds
> 2018-10-03T12:16:03.339+0000: 21146.517: Total time for which application
> threads were stopped: 0.0002830 seconds, Stopping threads took: 0.0001329
> seconds
> 2018-10-03T12:16:05.414+0000: 21148.593: [CMS-concurrent-mark: 6.638/6.645
> secs] [Times: user=38.68 sys=0.14, real=6.65 secs]
> 2018-10-03T12:16:05.414+0000: 21148.593: [CMS-concurrent-preclean-start]
> 2018-10-03T12:16:05.524+0000: 21148.703: [CMS-concurrent-preclean:
> 0.109/0.110 secs] [Times: user=0.28 sys=0.00, real=0.11 secs]
> 2018-10-03T12:16:05.524+0000: 21148.703:
> [CMS-concurrent-abortable-preclean-start]
> 2018-10-03T12:16:07.339+0000: 21150.517: Total time for which application
> threads were stopped: 0.0002768 seconds, Stopping threads took: 0.0001258
> seconds
> {Heap before GC invocations=925 (full 172):
>   par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 41172739K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:16:07.565+0000: 21150.744: [GC (Allocation Failure)
> 2018-10-03T12:16:07.566+0000: 21150.744: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  509166416 bytes,  509166416 total
> - age   2:  122054840 bytes,  631221256 total
> : 9868992K->616768K(9868992K), 1.4807701 secs]
> 51041731K->42866755K(83269312K), 1.4808761 secs] [Times: user=5.36 sys=0.00,
> real=1.48 secs]
> Heap after GC invocations=926 (full 172):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 42249987K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:16:09.046+0000: 21152.225: Total time for which application
> threads were stopped: 1.4812608 seconds, Stopping threads took: 0.0000981
> seconds
> 2018-10-03T12:16:10.047+0000: 21153.226: Total time for which application
> threads were stopped: 0.0002932 seconds, Stopping threads took: 0.0001333
> seconds
> 2018-10-03T12:16:11.048+0000: 21154.226: Total time for which application
> threads were stopped: 0.0003958 seconds, Stopping threads took: 0.0002094
> seconds
> 2018-10-03T12:16:12.048+0000: 21155.227: Total time for which application
> threads were stopped: 0.0002591 seconds, Stopping threads took: 0.0001086
> seconds
> 2018-10-03T12:16:12.339+0000: 21155.518: Total time for which application
> threads were stopped: 0.0004337 seconds, Stopping threads took: 0.0002367
> seconds
>   CMS: abort preclean due to time 2018-10-03T12:16:13.839+0000: 21157.017:
> [CMS-concurrent-abortable-preclean: 6.785/8.315 secs] [Times: user=26.73
> sys=0.11, real=8.31 secs]
> 2018-10-03T12:16:13.839+0000: 21157.018: [GC (CMS Final Remark) [YG
> occupancy: 6310225 K (9868992 K)]{Heap before GC invocations=926 (full 172):
>   par new generation   total 9868992K, used 6310225K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,  61% used [0x00007f4f84000000, 0x00007f50df804780,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 42249987K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:16:13.839+0000: 21157.018: [GC (CMS Final Remark)
> 2018-10-03T12:16:13.839+0000: 21157.018: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  524643480 bytes,  524643480 total
> - age   2:  100201016 bytes,  624844496 total
> : 6310225K->616768K(9868992K), 2.2304522 secs]
> 48560213K->44433360K(83269312K), 2.2305414 secs] [Times: user=8.15 sys=0.00,
> real=2.23 secs]
> Heap after GC invocations=927 (full 172):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 43816592K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:16:16.070+0000: 21159.249: [Rescan (parallel) , 0.7394523
> secs]2018-10-03T12:16:16.809+0000: 21159.988: [weak refs processing,
> 0.0014298 secs]2018-10-03T12:16:16.811+0000: 21159.989: [class unloading,
> 0.0592993 secs]2018-10-03T12:16:16.870+0000: 21160.049: [scrub symbol table,
> 0.0056100 secs]2018-10-03T12:16:16.876+0000: 21160.054: [scrub string table,
> 0.0014915 secs][1 CMS-remark: 43816592K(73400320K)] 44433360K(83269312K),
> 3.0393836 secs] [Times: user=11.18 sys=0.00, real=3.04 secs]
> 2018-10-03T12:16:16.878+0000: 21160.057: Total time for which application
> threads were stopped: 3.0397460 seconds, Stopping threads took: 0.0001232
> seconds
> 2018-10-03T12:16:16.879+0000: 21160.057: [CMS-concurrent-sweep-start]
> 2018-10-03T12:16:16.886+0000: 21160.065: Total time for which application
> threads were stopped: 0.0005649 seconds, Stopping threads took: 0.0002303
> seconds
> 2018-10-03T12:16:16.927+0000: 21160.106: Total time for which application
> threads were stopped: 0.0010459 seconds, Stopping threads took: 0.0006273
> seconds
> 2018-10-03T12:16:16.927+0000: 21160.106: Total time for which application
> threads were stopped: 0.0003996 seconds, Stopping threads took: 0.0002180
> seconds
> 2018-10-03T12:16:17.340+0000: 21160.519: Total time for which application
> threads were stopped: 0.0004957 seconds, Stopping threads took: 0.0002318
> seconds
> 2018-10-03T12:16:18.341+0000: 21161.519: Total time for which application
> threads were stopped: 0.0003805 seconds, Stopping threads took: 0.0001847
> seconds
> 2018-10-03T12:16:19.341+0000: 21162.520: Total time for which application
> threads were stopped: 0.0003364 seconds, Stopping threads took: 0.0001609
> seconds
> 2018-10-03T12:16:20.341+0000: 21163.520: Total time for which application
> threads were stopped: 0.0003111 seconds, Stopping threads took: 0.0001119
> seconds
> 2018-10-03T12:16:21.342+0000: 21164.521: Total time for which application
> threads were stopped: 0.0003138 seconds, Stopping threads took: 0.0001345
> seconds
> 2018-10-03T12:16:21.890+0000: 21165.069: Total time for which application
> threads were stopped: 0.0003635 seconds, Stopping threads took: 0.0001298
> seconds
> 2018-10-03T12:16:21.892+0000: 21165.071: Total time for which application
> threads were stopped: 0.0002075 seconds, Stopping threads took: 0.0001069
> seconds
> 2018-10-03T12:16:21.892+0000: 21165.071: Total time for which application
> threads were stopped: 0.0001653 seconds, Stopping threads took: 0.0000824
> seconds
> 2018-10-03T12:16:21.954+0000: 21165.133: Total time for which application
> threads were stopped: 0.0003568 seconds, Stopping threads took: 0.0001555
> seconds
> 2018-10-03T12:16:22.340+0000: 21165.519: Total time for which application
> threads were stopped: 0.0003015 seconds, Stopping threads took: 0.0001233
> seconds
> 2018-10-03T12:16:23.190+0000: 21166.369: Total time for which application
> threads were stopped: 0.0003195 seconds, Stopping threads took: 0.0001227
> seconds
> 2018-10-03T12:16:24.270+0000: 21167.449: Total time for which application
> threads were stopped: 0.0797588 seconds, Stopping threads took: 0.0795611
> seconds
> 2018-10-03T12:16:25.185+0000: 21168.364: [CMS-concurrent-sweep: 8.210/8.306
> secs] [Times: user=27.51 sys=0.07, real=8.31 secs]
> 2018-10-03T12:16:25.185+0000: 21168.364: [CMS-concurrent-reset-start]
> 2018-10-03T12:16:25.270+0000: 21168.449: Total time for which application
> threads were stopped: 0.0004177 seconds, Stopping threads took: 0.0002056
> seconds
> 2018-10-03T12:16:25.333+0000: 21168.511: [CMS-concurrent-reset: 0.147/0.148
> secs] [Times: user=0.54 sys=0.00, real=0.14 secs]
> 2018-10-03T12:16:26.271+0000: 21169.450: Total time for which application
> threads were stopped: 0.0002714 seconds, Stopping threads took: 0.0001245
> seconds
> 2018-10-03T12:16:27.271+0000: 21170.450: Total time for which application
> threads were stopped: 0.0002319 seconds, Stopping threads took: 0.0000859
> seconds
> 
> 
> 2018-10-03T12:16:27.332+0000: 21170.511: [GC (CMS Initial Mark) [1
> CMS-initial-mark: 42239401K(73400320K)] 50046127K(83269312K), 1.2121645
> secs] [Times: user=4.80 sys=0.00, real=1.22 secs]
> 
> 
> 
> 2018-10-03T12:16:28.545+0000: 21171.724: Total time for which application
> threads were stopped: 1.2124408 seconds, Stopping threads took: 0.0000623
> seconds
> 2018-10-03T12:16:28.545+0000: 21171.724: [CMS-concurrent-mark-start]
> 2018-10-03T12:16:28.545+0000: 21171.724: Total time for which application
> threads were stopped: 0.0004375 seconds, Stopping threads took: 0.0002651
> seconds
> 2018-10-03T12:16:29.546+0000: 21172.725: Total time for which application
> threads were stopped: 0.0002859 seconds, Stopping threads took: 0.0001463
> seconds
> 2018-10-03T12:16:30.546+0000: 21173.725: Total time for which application
> threads were stopped: 0.0003327 seconds, Stopping threads took: 0.0001699
> seconds
> 2018-10-03T12:16:31.547+0000: 21174.726: Total time for which application
> threads were stopped: 0.0002795 seconds, Stopping threads took: 0.0001344
> seconds
> 2018-10-03T12:16:33.546+0000: 21176.725: Total time for which application
> threads were stopped: 0.0003180 seconds, Stopping threads took: 0.0001169
> seconds
> {Heap before GC invocations=927 (full 173):
>   par new generation   total 9868992K, used 9868992K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K, 100% used [0x00007f4f84000000, 0x00007f51b8b60000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 42239401K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:16:33.953+0000: 21177.131: [GC (Allocation Failure)
> 2018-10-03T12:16:33.953+0000: 21177.132: [ParNew
> Desired survivor size 568413384 bytes, new threshold 2 (max 8)
> - age   1:  557021864 bytes,  557021864 total
> - age   2:   68995936 bytes,  626017800 total
> : 9868992K->616768K(9868992K), 1.6252719 secs]
> 52108393K->43891937K(83269312K), 1.6253812 secs] [Times: user=5.94 sys=0.01,
> real=1.63 secs]
> 
> 
> 
> 
> Heap after GC invocations=928 (full 173):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 43275169K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:16:35.578+0000: 21178.757: Total time for which application
> threads were stopped: 1.6258123 seconds, Stopping threads took: 0.0001480
> seconds
> 2018-10-03T12:16:35.595+0000: 21178.773: Total time for which application
> threads were stopped: 0.0006569 seconds, Stopping threads took: 0.0003852
> seconds
> 2018-10-03T12:16:35.595+0000: 21178.774: Total time for which application
> threads were stopped: 0.0003074 seconds, Stopping threads took: 0.0001710
> seconds
> 2018-10-03T12:16:35.595+0000: 21178.774: Total time for which application
> threads were stopped: 0.0002113 seconds, Stopping threads took: 0.0000924
> seconds
> 2018-10-03T12:16:35.596+0000: 21178.775: Total time for which application
> threads were stopped: 0.0003727 seconds, Stopping threads took: 0.0002441
> seconds
> 2018-10-03T12:16:36.596+0000: 21179.775: Total time for which application
> threads were stopped: 0.0003703 seconds, Stopping threads took: 0.0001878
> seconds
> 2018-10-03T12:16:37.597+0000: 21180.776: Total time for which application
> threads were stopped: 0.0002539 seconds, Stopping threads took: 0.0001050
> seconds
> 2018-10-03T12:16:38.546+0000: 21181.725: Total time for which application
> threads were stopped: 0.0002715 seconds, Stopping threads took: 0.0001215
> seconds
> 2018-10-03T12:16:39.557+0000: 21182.736: Total time for which application
> threads were stopped: 0.0003852 seconds, Stopping threads took: 0.0001566
> seconds
> 2018-10-03T12:16:40.375+0000: 21183.554: [CMS-concurrent-mark: 10.186/11.830
> secs] [Times: user=65.67 sys=0.34, real=11.83 secs]
> 
> 
> 
> 
> 2018-10-03T12:16:40.375+0000: 21183.554: [CMS-concurrent-preclean-start]
> 2018-10-03T12:16:40.558+0000: 21183.737: Total time for which application
> threads were stopped: 0.0007029 seconds, Stopping threads took: 0.0005097
> seconds
> 2018-10-03T12:16:41.830+0000: 21185.009: [CMS-concurrent-preclean:
> 1.384/1.455 secs] [Times: user=4.64 sys=0.06, real=1.45 secs]
> 
> 
> 
> 
> 
> 
> 
> 
> 2018-10-03T12:16:41.830+0000: 21185.009:
> [CMS-concurrent-abortable-preclean-start]
> 2018-10-03T12:16:42.910+0000: 21186.089: Total time for which application
> threads were stopped: 0.0003692 seconds, Stopping threads took: 0.0001969
> seconds
> 2018-10-03T12:16:42.914+0000: 21186.093: Total time for which application
> threads were stopped: 0.0012019 seconds, Stopping threads took: 0.0011341
> seconds
> 2018-10-03T12:16:43.287+0000: 21186.466: Total time for which application
> threads were stopped: 0.0004056 seconds, Stopping threads took: 0.0001726
> seconds
> 2018-10-03T12:16:43.547+0000: 21186.726: Total time for which application
> threads were stopped: 0.0003341 seconds, Stopping threads took: 0.0001467
> seconds
> 2018-10-03T12:16:44.547+0000: 21187.726: Total time for which application
> threads were stopped: 0.0003079 seconds, Stopping threads took: 0.0001480
> seconds
> 2018-10-03T12:16:48.547+0000: 21191.726: Total time for which application
> threads were stopped: 0.0001833 seconds, Stopping threads took: 0.0000628
> seconds
>   CMS: abort preclean due to time 2018-10-03T12:16:49.482+0000: 21192.661:
> [CMS-concurrent-abortable-preclean: 7.648/7.652 secs] [Times: user=14.30
> sys=0.12, real=7.66 secs]
> 2018-10-03T12:16:49.482+0000: 21192.661: [GC (CMS Final Remark) [YG
> occupancy: 7690942 K (9868992 K)]{Heap before GC invocations=928 (full 173):
>   par new generation   total 9868992K, used 7690942K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,  76% used [0x00007f4f84000000, 0x00007f5133c5fbe0,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 43275169K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:16:49.482+0000: 21192.661: [GC (CMS Final Remark)
> 2018-10-03T12:16:49.482+0000: 21192.661: [ParNew
> Desired survivor size 568413384 bytes, new threshold 1 (max 8)
> - age   1:  601979288 bytes,  601979288 total
> - age   2:   29589400 bytes,  631568688 total
> : 7690942K->616768K(9868992K), 1.0751368 secs]
> 50966112K->45897734K(83269312K), 1.0752293 secs] [Times: user=3.94 sys=0.00,
> real=1.07 secs]
> Heap after GC invocations=929 (full 173):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 45280966K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:16:50.558+0000: 21193.737: [Rescan (parallel) , 0.3095918
> secs]2018-10-03T12:16:50.867+0000: 21194.046: [weak refs processing,
> 0.0013762 secs]2018-10-03T12:16:50.869+0000: 21194.048: [class unloading,
> 0.0569672 secs]2018-10-03T12:16:50.926+0000: 21194.105: [scrub symbol table,
> 0.0053711 secs]2018-10-03T12:16:50.931+0000: 21194.110: [scrub string table,
> 0.0014654 secs][1 CMS-remark: 45280966K(73400320K)] 45897734K(83269312K),
> 1.4515832 secs] [Times: user=5.25 sys=0.00, real=1.45 secs]
> 
> 
> 
> 
> 
> 
> 2018-10-03T12:16:50.934+0000: 21194.113: Total time for which application
> threads were stopped: 1.4518312 seconds, Stopping threads took: 0.0000537
> seconds
> 2018-10-03T12:16:50.934+0000: 21194.113: [CMS-concurrent-sweep-start]
> 2018-10-03T12:16:50.940+0000: 21194.119: Total time for which application
> threads were stopped: 0.0006360 seconds, Stopping threads took: 0.0002782
> seconds
> 2018-10-03T12:16:51.169+0000: 21194.348: Total time for which application
> threads were stopped: 0.0003905 seconds, Stopping threads took: 0.0001195
> seconds
> 2018-10-03T12:16:52.169+0000: 21195.348: Total time for which application
> threads were stopped: 0.0002521 seconds, Stopping threads took: 0.0000770
> seconds
> 2018-10-03T12:16:53.170+0000: 21196.349: Total time for which application
> threads were stopped: 0.0003716 seconds, Stopping threads took: 0.0001484
> seconds
> 2018-10-03T12:16:53.548+0000: 21196.727: Total time for which application
> threads were stopped: 0.0003341 seconds, Stopping threads took: 0.0001590
> seconds
> 2018-10-03T12:16:54.548+0000: 21197.727: Total time for which application
> threads were stopped: 0.0002481 seconds, Stopping threads took: 0.0000942
> seconds
> 2018-10-03T12:16:55.549+0000: 21198.728: Total time for which application
> threads were stopped: 0.0003797 seconds, Stopping threads took: 0.0001918
> seconds
> 2018-10-03T12:16:56.188+0000: 21199.367: [CMS-concurrent-sweep: 5.251/5.254
> secs] [Times: user=10.42 sys=0.03, real=5.25 secs]
> 2018-10-03T12:16:56.189+0000: 21199.367: [CMS-concurrent-reset-start]
> 2018-10-03T12:16:56.339+0000: 21199.518: [CMS-concurrent-reset: 0.151/0.151
> secs] [Times: user=0.18 sys=0.00, real=0.15 secs]
> 2018-10-03T12:16:56.549+0000: 21199.728: Total time for which application
> threads were stopped: 0.0003961 seconds, Stopping threads took: 0.0002311
> seconds
> 2018-10-03T12:16:57.550+0000: 21200.729: Total time for which application
> threads were stopped: 0.0002442 seconds, Stopping threads took: 0.0001090
> seconds
> 
> 
> 2018-10-03T12:16:58.339+0000: 21201.518: [GC (CMS Initial Mark) [1
> CMS-initial-mark: 45178633K(73400320K)] 49907123K(83269312K), 0.4192155
> secs] [Times: user=1.68 sys=0.00, real=0.42 secs]
> 2018-10-03T12:16:58.759+0000: 21201.938: Total time for which application
> threads were stopped: 0.4195337 seconds, Stopping threads took: 0.0000902
> seconds
> 2018-10-03T12:16:58.759+0000: 21201.938: [CMS-concurrent-mark-start]
> 2018-10-03T12:16:58.759+0000: 21201.938: Total time for which application
> threads were stopped: 0.0002544 seconds, Stopping threads took: 0.0001327
> seconds
> 2018-10-03T12:17:00.760+0000: 21203.939: Total time for which application
> threads were stopped: 0.0003008 seconds, Stopping threads took: 0.0001486
> seconds
> 2018-10-03T12:17:02.760+0000: 21205.939: Total time for which application
> threads were stopped: 0.0002923 seconds, Stopping threads took: 0.0001359
> seconds
> 2018-10-03T12:17:03.760+0000: 21206.939: Total time for which application
> threads were stopped: 0.0003256 seconds, Stopping threads took: 0.0001512
> seconds
> 2018-10-03T12:17:04.760+0000: 21207.939: Total time for which application
> threads were stopped: 0.0003389 seconds, Stopping threads took: 0.0001654
> seconds
> 2018-10-03T12:17:05.761+0000: 21208.940: Total time for which application
> threads were stopped: 0.0002250 seconds, Stopping threads took: 0.0000897
> seconds
> 2018-10-03T12:17:06.761+0000: 21209.940: Total time for which application
> threads were stopped: 0.0001925 seconds, Stopping threads took: 0.0000749
> seconds
> 2018-10-03T12:17:07.082+0000: 21210.261: [CMS-concurrent-mark: 8.316/8.323
> secs] [Times: user=39.05 sys=0.28, real=8.33 secs]
> 2018-10-03T12:17:07.082+0000: 21210.261: [CMS-concurrent-preclean-start]
> 2018-10-03T12:17:07.192+0000: 21210.371: [CMS-concurrent-preclean:
> 0.109/0.110 secs] [Times: user=0.11 sys=0.00, real=0.10 secs]
> 2018-10-03T12:17:07.192+0000: 21210.371:
> [CMS-concurrent-abortable-preclean-start]
> 2018-10-03T12:17:08.760+0000: 21211.939: Total time for which application
> threads were stopped: 0.0001917 seconds, Stopping threads took: 0.0000699
> seconds
> 2018-10-03T12:17:10.761+0000: 21213.940: Total time for which application
> threads were stopped: 0.0002303 seconds, Stopping threads took: 0.0000800
> seconds
> 2018-10-03T12:17:12.761+0000: 21215.940: Total time for which application
> threads were stopped: 0.0002098 seconds, Stopping threads took: 0.0000633
> seconds
>   CMS: abort preclean due to time 2018-10-03T12:17:13.355+0000: 21216.534:
> [CMS-concurrent-abortable-preclean: 6.162/6.163 secs] [Times: user=9.43
> sys=0.03, real=6.17 secs]
> 2018-10-03T12:17:13.355+0000: 21216.534: [GC (CMS Final Remark) [YG
> occupancy: 9513570 K (9868992 K)]{Heap before GC invocations=929 (full 174):
>   par new generation   total 9868992K, used 9513570K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,  96% used [0x00007f4f84000000, 0x00007f51a3048ab0,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51de5b0000, 0x00007f5204000000,
> 0x00007f5204000000)
>    to   space 616768K,   0% used [0x00007f51b8b60000, 0x00007f51b8b60000,
> 0x00007f51de5b0000)
>   concurrent mark-sweep generation total 73400320K, used 45178633K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> 2018-10-03T12:17:13.356+0000: 21216.534: [GC (CMS Final Remark)
> 2018-10-03T12:17:13.356+0000: 21216.534: [ParNew
> Desired survivor size 568413384 bytes, new threshold 1 (max 8)
> - age   1:  620466400 bytes,  620466400 total
> : 9513570K->616768K(9868992K), 0.8406100 secs]
> 54692204K->48320246K(83269312K), 0.8407079 secs] [Times: user=2.98 sys=0.00,
> real=0.84 secs]
> Heap after GC invocations=930 (full 174):
>   par new generation   total 9868992K, used 616768K [0x00007f4f84000000,
> 0x00007f5204000000, 0x00007f5204000000)
>    eden space 9252224K,   0% used [0x00007f4f84000000, 0x00007f4f84000000,
> 0x00007f51b8b60000)
>    from space 616768K, 100% used [0x00007f51b8b60000, 0x00007f51de5b0000,
> 0x00007f51de5b0000)
>    to   space 616768K,   0% used [0x00007f51de5b0000, 0x00007f51de5b0000,
> 0x00007f5204000000)
>   concurrent mark-sweep generation total 73400320K, used 47703478K
> [0x00007f5204000000, 0x00007f6384000000, 0x00007f6384000000)
>   Metaspace       used 58556K, capacity 59420K, committed 59812K, reserved
> 61440K
> }
> 2018-10-03T12:17:14.196+0000: 21217.375: [Rescan (parallel) , 0.2207621
> secs]2018-10-03T12:17:14.417+0000: 21217.596: [weak refs processing,
> 0.0016539 secs]2018-10-03T12:17:14.419+0000: 21217.598: [class unloading,
> 0.0577526 secs]2018-10-03T12:17:14.476+0000: 21217.655: [scrub symbol table,
> 0.0055694 secs]2018-10-03T12:17:14.482+0000: 21217.661: [scrub string table,
> 0.0014483 secs][1 CMS-remark: 47703478K(73400320K)] 48320246K(83269312K),
> 1.1294195 secs] [Times: user=3.93 sys=0.00, real=1.13 secs]
> 2018-10-03T12:17:14.485+0000: 21217.664: Total time for which application
> threads were stopped: 1.1297533 seconds, Stopping threads took: 0.0001321
> seconds
> 2018-10-03T12:17:14.485+0000: 21217.664: Total time for which application
> threads were stopped: 0.0002640 seconds, Stopping threads took: 0.0001215
> seconds
> 2018-10-03T12:17:14.485+0000: 21217.664: [CMS-concurrent-sweep-start]
> 2018-10-03T12:17:14.502+0000: 21217.681: Total time for which application
> threads were stopped: 0.0004896 seconds, Stopping threads took: 0.0001425
> seconds
> 2018-10-03T12:17:14.562+0000: 21217.741: Total time for which application
> threads were stopped: 0.0006066 seconds, Stopping threads took: 0.0002838
> seconds
> 2018-10-03T12:17:15.562+0000: 21218.741: Total time for which application
> threads were stopped: 0.0002378 seconds, Stopping threads took: 0.0000804
> seconds
> 2018-10-03T12:17:16.562+0000: 21219.741: Total time for which application
> threads were stopped: 0.0002154 seconds, Stopping threads took: 0.0000845
> seconds
> 2018-10-03T12:17:17.563+0000: 21220.742: Total time for which application
> threads were stopped: 0.0001754 seconds, Stopping threads took: 0.0000556
> seconds
> 2018-10-03T12:17:18.563+0000: 21221.742: Total time for which application
> threads were stopped: 0.0001794 seconds, Stopping threads took: 0.0000606
> seconds
> 2018-10-03T12:17:19.486+0000: 21222.665: Total time for which application
> threads were stopped: 0.0002057 seconds, Stopping threads took: 0.0000719
> seconds
> 2018-10-03T12:17:20.486+0000: 21223.665: Total time for which application
> threads were stopped: 0.0003069 seconds, Stopping threads took: 0.0001119
> seconds
> 2018-10-03T12:17:21.487+0000: 21224.666: Total time for which application
> threads were stopped: 0.0003546 seconds, Stopping threads took: 0.0001126
> seconds
> 2018-10-03T12:17:22.487+0000: 21225.666: Total time for which application
> threads were stopped: 0.0003192 seconds, Stopping threads took: 0.0001288
> seconds
> 2018-10-03T12:17:23.488+0000: 21226.666: Total time for which application
> threads were stopped: 0.0002118 seconds, Stopping threads took: 0.0000659
> seconds
> 2018-10-03T12:17:24.486+0000: 21227.665: Total time for which application
> threads were stopped: 0.0001808 seconds, Stopping threads took: 0.0000631
> seconds
> 2018-10-03T12:17:25.486+0000: 21228.665: Total time for which application
> threads were stopped: 0.0001869 seconds, Stopping threads took: 0.0000732
> seconds
> 2018-10-03T12:17:26.487+0000: 21229.666: Total time for which application
> threads were stopped: 0.0002461 seconds, Stopping threads took: 0.0001013
> seconds
> 2018-10-03T12:17:27.332+0000: 21230.511: Total time for which application
> threads were stopped: 0.0002317 seconds, Stopping threads took: 0.0000648
> seconds
> 2018-10-03T12:17:28.224+0000: 21231.403: Total time for which application
> threads were stopped: 0.0001957 seconds, Stopping threads took: 0.0000599
> seconds
> 2018-10-03T12:17:29.225+0000: 21232.403: Total time for which application
> threads were stopped: 0.0002778 seconds, Stopping threads took: 0.0001248
> seconds
> 2018-10-03T12:17:29.487+0000: 21232.665: Total time for which application
> threads were stopped: 0.0001862 seconds, Stopping threads took: 0.0000633
> seconds
> 2018-10-03T12:17:32.673+0000: 21235.852: [CMS-concurrent-sweep:
> 18.182/18.188 secs] [Times: user=30.61 sys=0.06, real=18.18 secs]
> 2018-10-03T12:17:32.673+0000: 21235.852: [CMS-concurrent-reset-start]
> 2018-10-03T12:17:32.824+0000: 21236.003: [CMS-concurrent-reset: 0.151/0.151
> secs] [Times: user=0.24 sys=0.00, real=0.16 secs]
> 
> -----------------------------------------------------------------
> -----------------------------------------------------------------
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Erick Erickson <er...@gmail.com>.

I think you're on the wrong track here. Shawn is absolutely
right his statements about when the field cache is invalidated.

That said, the cure to your problem has nothing to do with
opening new searchers. Solr puts values in caches as it needs
them _and keeps them there_. In the sort/group/facet/function
case, it needs the values for field X so it uninverts the field and
puts it in the cache where it stays. The assumption is that if it's
used once, it'll be used again. Since the uninversion process is
expensive, the idea of building it over and over is not something
that's performant.

This problem will go away if you define these fields with
docValues="true" (and completely re-index into a new collection).
The index will get bigger true. But it's a case of
pay-me-now-or-pay-me-later. Essentially at index time, the
uninverted structure is built and serialized to disk. Thereafter,
it's in MMapDirectoyr (OS memory) space. Here's the go-to
blog about that:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

So the growth in the index size is roughly the amount of heap that
will _not_ be used for uninverting fields. By removing the structures
from the heap, the memory allocated to the JVM can get much smaller,
reducing GC pressure and the like.

There are no advantages and many disadvantagess to _not_ using docValues
for fields used for sorting/grouping/faceting.

I'm overstating the case a tiny bit, but for all practical purposes I
stand behind
it ;).

Best,
Erick
On Sun, Oct 14, 2018 at 11:46 AM Shawn Heisey <ap...@elyograg.org> wrote:
>
> On 10/14/2018 6:32 AM, yasoobhaider wrote:
> > Memory Analyzer output:
> >
> > One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
> > "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
> > 61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
> > "java.util.HashMap$Node[]" loaded by "<system class loader>".
> <snip>
> > But I also noticed that the fieldcache entries on solr UI have the same
> > entries for all collections on that solr instance.
> >
> > Ques 1. Is the field cache reset on commit? If so, is it reset when any of
> > the collections are committed? Or it is not reset at all and I'm missing
> > something here?
>
> ALL caches are invalidated when a new searcher is opened. Some of the
> caches support autowarming.  The field cache isn't one of them.  The
> documentCache also cannot be warmed ... but warming queryResultCache
> will populate documentCache.
>
> > Ques 2. Is there a way to reset/delete this cache every x minutes (the
> > current autocommit duration) irrespective of whether documents were added or
> > not?
>
> Not that I know of.  Opening a new searcher is generally required to
> clear out Solr's caches.  Opening a new searcher requires a change to
> the index and a commit.  One thing you could do is have your indexing
> software insert/update a dummy document (with a special value in the
> uniqueKey field and all non-required fields missing) on a regular basis.
>
> > Other than this, I think the reason for huge heap usage (as others have
> > pointed out) is that we are not using docValues for any of the fields, and
> > we use a large number of fields in sorting functions (between 15-20 over all
> > queries combined). As the next step on this front, I will add new fields
> > with docvalues true and reindex the entire collection. Hopefully that will
> > help.
>
> Yes, if you facet, do group queries, or sort on fields that do not have
> docValues, then Solr must build an uninverted index to do those things,
> and I think it uses the field cache for that. The docValues structure is
> the same info as an uninverted index, so Solr can just read it directly,
> rather than generating it and using heap memory.
>
> Adding docValues will make your index bigger.  For fields with high
> cardinality, the increase could be large.
>
> > We use quite a few dynamic fields in sorting. There is no mention of using
> > docvalues with dynamic fields in the official documentation
> > (https://lucene.apache.org/solr/guide/6_6/docvalues.html).
> >
> > Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
> > in particular that I should look out for, like the cardinality of the field
> > (ie number of different x's in example_dynamic_field_x)?
>
> The ONLY functional difference between a dynamic field and a "standard"
> field is that dynamic fields are not explicitly named in the schema, but
> use wildcard naming.  This difference is only at the Solr level.  At the
> Lucene level, there is NO difference at all.  A dynamic field can have
> any of the same attributes as other fields, including docValues.
>
> > Shawn, I've uploaded my configuration files for the two collections here:
> > https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)
> >
> > c1 collection is ~10GB when optimized, and has 2.5 million documents.
> > ca collection is ~2GB when optimized, and has 9.5 million documents.
> >
> > Please let me know if you think there is something amiss in the
> > configuration that I should fix.
>
> I think your autowarm counts, particularly on the filterCache, are
> probably too large.  But if commits that open a new searcher are
> happening quickly, you probably won't need to fiddle with that.  You can
> check the admin UI "plugins/stats" area to see how long it takes to open
> the last searcher, and the caches have information about how long it
> took to warm.
>
> You have increased ramBufferSizeMB. The default value is 100, and for
> most indexes, increasing it just consumes memory without making things
> work any faster.  Increasing it a little bit might make a difference on
> your c1 collection, since its documents are a bit larger than what I'd
> call typical.
>
> Here's what I would recommend you use for autoCommit (removing maxDocs,
> lowering maxTime, and setting openSearcher to false):
>
>      <autoCommit>
>        <maxTime>60000</maxTime>
>        <openSearcher>false</openSearcher>
>      </autoCommit>
>
> Opening a new searcher on autoCommit isn't necessary if you configure
> autoSoftCommit, and disabling it will make those commits faster.  You
> *do* want autoCommit configured with a fairly short interval, so don't
> remove it.  Not configuring maxDocs makes the operation more predictable.
>
> For autoSoftCommit, 30 seconds is pretty aggressive, but as long as
> commits are happening very quickly, shouldn't be a problem.  If commits
> are taking more than a few seconds, I would increase the interval.  The
> autoSoftCommit interval in the config one of your collections is set to
> an hour ... if you're not overriding that with the
> solr.autoSoftCommit.maxTime property, you could decrease that one.
>
> Thanks,
> Shawn
>

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/14/2018 6:32 AM, yasoobhaider wrote:
> Memory Analyzer output:
>
> One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
> 61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
> "java.util.HashMap$Node[]" loaded by "<system class loader>".
<snip>
> But I also noticed that the fieldcache entries on solr UI have the same
> entries for all collections on that solr instance.
>
> Ques 1. Is the field cache reset on commit? If so, is it reset when any of
> the collections are committed? Or it is not reset at all and I'm missing
> something here?

ALL caches are invalidated when a new searcher is opened. Some of the 
caches support autowarming.  The field cache isn't one of them.  The 
documentCache also cannot be warmed ... but warming queryResultCache 
will populate documentCache.

> Ques 2. Is there a way to reset/delete this cache every x minutes (the
> current autocommit duration) irrespective of whether documents were added or
> not?

Not that I know of.  Opening a new searcher is generally required to 
clear out Solr's caches.  Opening a new searcher requires a change to 
the index and a commit.  One thing you could do is have your indexing 
software insert/update a dummy document (with a special value in the 
uniqueKey field and all non-required fields missing) on a regular basis.

> Other than this, I think the reason for huge heap usage (as others have
> pointed out) is that we are not using docValues for any of the fields, and
> we use a large number of fields in sorting functions (between 15-20 over all
> queries combined). As the next step on this front, I will add new fields
> with docvalues true and reindex the entire collection. Hopefully that will
> help.

Yes, if you facet, do group queries, or sort on fields that do not have 
docValues, then Solr must build an uninverted index to do those things, 
and I think it uses the field cache for that. The docValues structure is 
the same info as an uninverted index, so Solr can just read it directly, 
rather than generating it and using heap memory.

Adding docValues will make your index bigger.  For fields with high 
cardinality, the increase could be large.

> We use quite a few dynamic fields in sorting. There is no mention of using
> docvalues with dynamic fields in the official documentation
> (https://lucene.apache.org/solr/guide/6_6/docvalues.html).
>
> Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
> in particular that I should look out for, like the cardinality of the field
> (ie number of different x's in example_dynamic_field_x)?

The ONLY functional difference between a dynamic field and a "standard" 
field is that dynamic fields are not explicitly named in the schema, but 
use wildcard naming.  This difference is only at the Solr level.  At the 
Lucene level, there is NO difference at all.  A dynamic field can have 
any of the same attributes as other fields, including docValues.

> Shawn, I've uploaded my configuration files for the two collections here:
> https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)
>
> c1 collection is ~10GB when optimized, and has 2.5 million documents.
> ca collection is ~2GB when optimized, and has 9.5 million documents.
>
> Please let me know if you think there is something amiss in the
> configuration that I should fix.

I think your autowarm counts, particularly on the filterCache, are 
probably too large.  But if commits that open a new searcher are 
happening quickly, you probably won't need to fiddle with that.  You can 
check the admin UI "plugins/stats" area to see how long it takes to open 
the last searcher, and the caches have information about how long it 
took to warm.

You have increased ramBufferSizeMB. The default value is 100, and for 
most indexes, increasing it just consumes memory without making things 
work any faster.  Increasing it a little bit might make a difference on 
your c1 collection, since its documents are a bit larger than what I'd 
call typical.

Here's what I would recommend you use for autoCommit (removing maxDocs, 
lowering maxTime, and setting openSearcher to false):

     <autoCommit>
       <maxTime>60000</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

Opening a new searcher on autoCommit isn't necessary if you configure 
autoSoftCommit, and disabling it will make those commits faster.  You 
*do* want autoCommit configured with a fairly short interval, so don't 
remove it.  Not configuring maxDocs makes the operation more predictable.

For autoSoftCommit, 30 seconds is pretty aggressive, but as long as 
commits are happening very quickly, shouldn't be a problem.  If commits 
are taking more than a few seconds, I would increase the interval.  The 
autoSoftCommit interval in the config one of your collections is set to 
an hour ... if you're not overriding that with the 
solr.autoSoftCommit.maxTime property, you could decrease that one.

Thanks,
Shawn

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by yasoobhaider <ya...@gmail.com>.

After none of the JVM configuration options helped witH GC, as Erick
suggested I took a heap dump of one of the misbehaving slaves and analysis
shows that fieldcache is using a large amount of the the total heap.

Memory Analyzer output:

One instance of "org.apache.solr.uninverting.FieldCacheImpl" loaded by
"org.eclipse.jetty.webapp.WebAppClassLoader @ 0x7f60f7b38658" occupies
61,234,712,560 (91.86%) bytes. The memory is accumulated in one instance of
"java.util.HashMap$Node[]" loaded by "<system class loader>".

Hypotheses:

Without regular indexing, commits are not happening, so the searcher is not
being re opened, and field cache is not being reset. Since there is only one
instance of this field cache, it is a live object and not being cleaned up
in GC.

But I also noticed that the fieldcache entries on solr UI have the same
entries for all collections on that solr instance.

Ques 1. Is the field cache reset on commit? If so, is it reset when any of
the collections are committed? Or it is not reset at all and I'm missing
something here?
Ques 2. Is there a way to reset/delete this cache every x minutes (the
current autocommit duration) irrespective of whether documents were added or
not?

Other than this, I think the reason for huge heap usage (as others have
pointed out) is that we are not using docValues for any of the fields, and
we use a large number of fields in sorting functions (between 15-20 over all
queries combined). As the next step on this front, I will add new fields
with docvalues true and reindex the entire collection. Hopefully that will
help.

We use quite a few dynamic fields in sorting. There is no mention of using
docvalues with dynamic fields in the official documentation
(https://lucene.apache.org/solr/guide/6_6/docvalues.html). 

Ques 3. Do docvalues work with dynamic fields or not? If they do, anything
in particular that I should look out for, like the cardinality of the field
(ie number of different x's in example_dynamic_field_x)?

Shawn, I've uploaded my configuration files for the two collections here:
https://ufile.io/u6oe0 (tar -zxvf c1a_confs.tar.gz to decompress)

c1 collection is ~10GB when optimized, and has 2.5 million documents.
ca collection is ~2GB when optimized, and has 9.5 million documents.

Please let me know if you think there is something amiss in the
configuration that I should fix.

Thanks
Yasoob



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/11/2018 4:51 AM, yasoobhaider wrote:
> Hi Shawn, thanks for the inputs.
>
> I have uploaded the gc logs of one of the slaves here:
> https://ufile.io/ecvag (should work till 18th Oct '18)
>
> I uploaded the logs to gceasy as well and it says that the problem is
> consecutive full GCs. According to the solution they have mentioned,
> increasing the heap size is a solution. But I am already running on a pretty
> big heap, so don't think increasing the heap size is going to be a long term
> solution.

Surprisingly, the GC performance in that logfile is actually pretty 
good.  I was more interested in how much heap was actually being used 
than the performance.

The "Heap after GC" button on the gceasy report page (which controls 
which graph is shown) shows that you really are using most of that 
80GB.  If the info you shared about your index sizes is accurate, the 
only way I can imagine this much heap being necessary is configuration.

It sounds like each system should have two configurations -- 
solrconfig.xml and the schema are the primary things in those configs.  
Can you share the unique configuration directories for each of your 
indexes, which I think means there will be two of them? For each of the 
configurations, indicate the number of documents and size on disk.  
You'll need to use a file sharing site.  It would be best to archive 
each directory into its own zipfile or .tar.gz file.

If your systems are running in cloud mode, the active configuration will 
be stored in zookeeper.

Thanks,
Shawn

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Erick Erickson <er...@gmail.com>.

I have to echo what others have said. An 80G heap is waaaaaaay out the norm,
especially when you consider the size of your indexes and the number of docs.

Understanding why you think you need that much heap should be your top
priority. As has already been suggested, insuring docValues are set for all
fields that are used for sorting, faceting and grouping is a must. Deep paging
can hurt too.

In addition I'd check the cache settings, do you have a huge
filterCache? What about
the other caches? One common mistake is to have very high cache
settings, in your setup
I'd stick with 512 to start.

Without _data_ it's hard to say, so unless some of those settings
don't help the next
thing I'd do is a heap dump or put a profiler on the JVM and see where
the heap is actually
allocated.

It's quite possible that you arrived at 80G with some mistaken
assumptions and once
those are cleared up you can reduce your heap a lot. You say "through
a lot of trial and error",
what exactly happens when you use, say, a 32G heap? OOMs? Slowdowns?

This is also starving your OS cache where most of the Lucene index
data is stored, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Best,
Erick

On Thu, Oct 11, 2018 at 4:42 AM yasoobhaider <ya...@gmail.com> wrote:
>
> Hi Shawn, thanks for the inputs.
>
> I have uploaded the gc logs of one of the slaves here:
> https://ufile.io/ecvag (should work till 18th Oct '18)
>
> I uploaded the logs to gceasy as well and it says that the problem is
> consecutive full GCs. According to the solution they have mentioned,
> increasing the heap size is a solution. But I am already running on a pretty
> big heap, so don't think increasing the heap size is going to be a long term
> solution.
>
> From what I understood from a bit more looking around, this is Concurrent
> Mode Failure for CMS. I found an old blog mentioning the use of
> XX:CMSFullGCsBeforeCompaction=1 to make sure that compaction is done prior
> to next collection trigger. So if it is a fragmentation problem, this will
> solve it I hope.
>
> I will also try out using docValues as suggested by Ere on a couple of
> fields on which we make a lot of faceting queries to reduce memory usage on
> the slaves.
>
> Please share any ideas that you may have from the gc logs analysis
>
> Thanks
> Yasoob
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by yasoobhaider <ya...@gmail.com>.

Hi Shawn, thanks for the inputs.

I have uploaded the gc logs of one of the slaves here:
https://ufile.io/ecvag (should work till 18th Oct '18)

I uploaded the logs to gceasy as well and it says that the problem is
consecutive full GCs. According to the solution they have mentioned,
increasing the heap size is a solution. But I am already running on a pretty
big heap, so don't think increasing the heap size is going to be a long term
solution.

From what I understood from a bit more looking around, this is Concurrent
Mode Failure for CMS. I found an old blog mentioning the use of
XX:CMSFullGCsBeforeCompaction=1 to make sure that compaction is done prior
to next collection trigger. So if it is a fragmentation problem, this will
solve it I hope.

I will also try out using docValues as suggested by Ere on a couple of
fields on which we make a lot of faceting queries to reduce memory usage on
the slaves.

Please share any ideas that you may have from the gc logs analysis

Thanks
Yasoob



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Walter Underwood <wu...@wunderwood.org>.

We run a big cluster with 8 GB heap on the JVMs. When we used CMS, I gave 2 GB to
the new generation. Solr queries make a ton of short-lived allocations. You want all of that
to come from the new gen. I don’t fool around with ratios. I just set the numbers.

We used these:

-d64
-server
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+ExplicitGCInvokesConcurrent
-Xms8g
-Xmx8g
-XX:NewSize=2g
-XX:MaxPermSize=256m

Now we run G1.

This is a cluster with 25 million documents, 8 shards, 48 nodes, each node has 36 CPUs.
Queries average 25 terms, which uses a lot of CPU.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 3, 2018, at 6:56 PM, Jeff Courtade <co...@gmail.com> wrote:
> 
> We use 4.3.0 I found that we went into gc hell as you describe with small
> newgen. We use  CMS gc as well
> 
> Using newration=2 got us out of that 3 wasn't enough...heap of 32 gig
> only....
> I have not gone over 32 gig as testing showed diminishing returns over 32
> gig. I only was brave enough to go to 40 though.
> 
> On Wed, Oct 3, 2018, 5:34 PM Shawn Heisey <ap...@elyograg.org> wrote:
> 
>> On 10/3/2018 8:01 AM, yasoobhaider wrote:
>>> Master and slave config:
>>> ram: 120GB
>>> cores: 16
>>> 
>>> At any point there are between 10-20 slaves in the cluster, each serving
>> ~2k
>>> requests per minute. Each slave houses two collections of approx 10G
>>> (~2.5mil docs) and 2G(10mil docs) when optimized.
>>> 
>>> I am working with Solr 6.2.1
>>> 
>>> Solr configuration:
>> <snip>
>>> -Xmn10G
>>> -Xms80G
>>> -Xmx80G
>> 
>> I cannot imagine that an 80GB heap is needed when there are only 12.5
>> million documents and 12GB of index data.  I've handled MUCH larger
>> indexes with only 8GB of heap.  Even with your very high query rate, if
>> you really do need 80GB of heap, there's something unusual going on.
>> 
>>> I would really be grateful for any advice on the following:
>>> 
>>> 1. What could be the reason behind CMS not being able to free up the
>> memory?
>>> What are some experiments I can run to solve this problem?
>> 
>> Maybe there's no garbage in the heap to free up?  If the GC never
>> finishes, that sounds like a possible problem with either Java or the
>> operating system, maybe even some kind of hardware issue.
>> 
>>> 2. Can stopping/starting indexing be a reason for such drastic changes
>> to GC
>>> pattern?
>> 
>> Indexing generally requires more heap than just handling queries.
>> 
>>> 3. I have read at multiple places on this mailing list that the heap size
>>> should be much lower (2x-3x the size of collection), but the last time I
>>> tried CMS was not able to run smoothly and GC STW would occur which was
>> only
>>> solved by a restart. My reasoning for this is that the type of queries
>> and
>>> the throughput are also a factor in deciding the heap size, so it may be
>>> that our queries are creating too many objects maybe. Is my reasoning
>>> correct or should I try with a lower heap size (if it helps achieve a
>> stable
>>> gc pattern)?
>> 
>> Do you have a GC log covering a good long runtime, where the problems
>> happened during the time the log covers?  Can you share it?  Attachments
>> rarely make it to the list, you'll need to find a file sharing site.
>> The small excerpt from the GC log that you included in your message
>> isn't enough to make any kind of determination.  Full disclosure:  I'm
>> going to send your log to http://gceasy.io for analysis.  You can do
>> this yourself, their analysis is really good.
>> 
>> There is no generic advice possible regarding how large a heap you
>> need.  It will depend on many factors.
>> 
>>> (4. Silly question, but what is the right way to ask question on the
>> mailing
>>> list? via mail or via the nabble website? I sent this question earlier
>> as a
>>> mail, but it was not showing up on the nabble website so I am posting it
>>> from the website now)
>> 
>> Nabble mirrors the mailing list in forum format.  It's generally better
>> to use the mailing list directly.  The project has absolutely no
>> influence over the Nabble website, and things do not always work
>> correctly when Nabble is involved.  The IRC channel is another good way
>> to get support.  If there is somebody paying attention when you ask your
>> question, a far more interactive chat can be obtained.
>> 
>> Thanks,
>> Shawn
>> 
>>

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Jeff Courtade <co...@gmail.com>.

We use 4.3.0 I found that we went into gc hell as you describe with small
newgen. We use  CMS gc as well

Using newration=2 got us out of that 3 wasn't enough...heap of 32 gig
only....
 I have not gone over 32 gig as testing showed diminishing returns over 32
gig. I only was brave enough to go to 40 though.

On Wed, Oct 3, 2018, 5:34 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/3/2018 8:01 AM, yasoobhaider wrote:
> > Master and slave config:
> > ram: 120GB
> > cores: 16
> >
> > At any point there are between 10-20 slaves in the cluster, each serving
> ~2k
> > requests per minute. Each slave houses two collections of approx 10G
> > (~2.5mil docs) and 2G(10mil docs) when optimized.
> >
> > I am working with Solr 6.2.1
> >
> > Solr configuration:
> <snip>
> > -Xmn10G
> > -Xms80G
> > -Xmx80G
>
> I cannot imagine that an 80GB heap is needed when there are only 12.5
> million documents and 12GB of index data.  I've handled MUCH larger
> indexes with only 8GB of heap.  Even with your very high query rate, if
> you really do need 80GB of heap, there's something unusual going on.
>
> > I would really be grateful for any advice on the following:
> >
> > 1. What could be the reason behind CMS not being able to free up the
> memory?
> > What are some experiments I can run to solve this problem?
>
> Maybe there's no garbage in the heap to free up?  If the GC never
> finishes, that sounds like a possible problem with either Java or the
> operating system, maybe even some kind of hardware issue.
>
> > 2. Can stopping/starting indexing be a reason for such drastic changes
> to GC
> > pattern?
>
> Indexing generally requires more heap than just handling queries.
>
> > 3. I have read at multiple places on this mailing list that the heap size
> > should be much lower (2x-3x the size of collection), but the last time I
> > tried CMS was not able to run smoothly and GC STW would occur which was
> only
> > solved by a restart. My reasoning for this is that the type of queries
> and
> > the throughput are also a factor in deciding the heap size, so it may be
> > that our queries are creating too many objects maybe. Is my reasoning
> > correct or should I try with a lower heap size (if it helps achieve a
> stable
> > gc pattern)?
>
> Do you have a GC log covering a good long runtime, where the problems
> happened during the time the log covers?  Can you share it?  Attachments
> rarely make it to the list, you'll need to find a file sharing site.
> The small excerpt from the GC log that you included in your message
> isn't enough to make any kind of determination.  Full disclosure:  I'm
> going to send your log to http://gceasy.io for analysis.  You can do
> this yourself, their analysis is really good.
>
> There is no generic advice possible regarding how large a heap you
> need.  It will depend on many factors.
>
> > (4. Silly question, but what is the right way to ask question on the
> mailing
> > list? via mail or via the nabble website? I sent this question earlier
> as a
> > mail, but it was not showing up on the nabble website so I am posting it
> > from the website now)
>
> Nabble mirrors the mailing list in forum format.  It's generally better
> to use the mailing list directly.  The project has absolutely no
> influence over the Nabble website, and things do not always work
> correctly when Nabble is involved.  The IRC channel is another good way
> to get support.  If there is somebody paying attention when you ask your
> question, a far more interactive chat can be obtained.
>
> Thanks,
> Shawn
>
>

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/3/2018 8:01 AM, yasoobhaider wrote:
> Master and slave config:
> ram: 120GB
> cores: 16
>
> At any point there are between 10-20 slaves in the cluster, each serving ~2k
> requests per minute. Each slave houses two collections of approx 10G
> (~2.5mil docs) and 2G(10mil docs) when optimized.
>
> I am working with Solr 6.2.1
>
> Solr configuration:
<snip>
> -Xmn10G
> -Xms80G
> -Xmx80G

I cannot imagine that an 80GB heap is needed when there are only 12.5 
million documents and 12GB of index data.  I've handled MUCH larger 
indexes with only 8GB of heap.  Even with your very high query rate, if 
you really do need 80GB of heap, there's something unusual going on.

> I would really be grateful for any advice on the following:
>
> 1. What could be the reason behind CMS not being able to free up the memory?
> What are some experiments I can run to solve this problem?

Maybe there's no garbage in the heap to free up?  If the GC never 
finishes, that sounds like a possible problem with either Java or the 
operating system, maybe even some kind of hardware issue.

> 2. Can stopping/starting indexing be a reason for such drastic changes to GC
> pattern?

Indexing generally requires more heap than just handling queries.

> 3. I have read at multiple places on this mailing list that the heap size
> should be much lower (2x-3x the size of collection), but the last time I
> tried CMS was not able to run smoothly and GC STW would occur which was only
> solved by a restart. My reasoning for this is that the type of queries and
> the throughput are also a factor in deciding the heap size, so it may be
> that our queries are creating too many objects maybe. Is my reasoning
> correct or should I try with a lower heap size (if it helps achieve a stable
> gc pattern)?

Do you have a GC log covering a good long runtime, where the problems 
happened during the time the log covers?  Can you share it?  Attachments 
rarely make it to the list, you'll need to find a file sharing site.  
The small excerpt from the GC log that you included in your message 
isn't enough to make any kind of determination.  Full disclosure:  I'm 
going to send your log to http://gceasy.io for analysis.  You can do 
this yourself, their analysis is really good.

There is no generic advice possible regarding how large a heap you 
need.  It will depend on many factors.

> (4. Silly question, but what is the right way to ask question on the mailing
> list? via mail or via the nabble website? I sent this question earlier as a
> mail, but it was not showing up on the nabble website so I am posting it
> from the website now)

Nabble mirrors the mailing list in forum format.  It's generally better 
to use the mailing list directly.  The project has absolutely no 
influence over the Nabble website, and things do not always work 
correctly when Nabble is involved.  The IRC channel is another good way 
to get support.  If there is somebody paying attention when you ask your 
question, a far more interactive chat can be obtained.

Thanks,
Shawn

Re: CMS GC - Old Generation collection never finishes (due to GC Allocation Failure?)

Posted by Dominique Bejean <do...@eolya.fr>.

Hi,

1/
As previously said by other persons, my first action would be to understand
why you need so much heap ?

The first step is to maximize your heap size to 31Gb (or obviously less if
possible).
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/

Can you provide some typical sorl requests covering most of your use cases
? Take them in solr logs in order to provide also hits count and qtime.

   - take care to rows and fl parameters
   - if you are using facets, use JSON API facets


Did you optimise your schema ?

   - remove unnecessary fields from you indices
   - optimize indexed, stored and docValues attributes (do not index or
   store unnecessary)


Did you increase to much solr caches ?

I didn't see the java version you are using.


2/
with huge heap, I would try the G1 GC.


3/
I would stop optimize indexes


4/
It looks like you have enough RAM to for your heap and the system cache (80
Gb + 20 Gb < 120 Gb), but did you disable swap on your server
(vm.swappiness = 1) ?


5/
How often are you updating your indexes on master (continuously, once an
hour, ... once a day) ?


Regards

Dominique



Le mer. 3 oct. 2018 à 23:11, yasoobhaider <ya...@gmail.com> a
écrit :

> Hi
>
> I'm working with a Solr cluster with master-slave architecture.
>
> Master and slave config:
> ram: 120GB
> cores: 16
>
> At any point there are between 10-20 slaves in the cluster, each serving
> ~2k
> requests per minute. Each slave houses two collections of approx 10G
> (~2.5mil docs) and 2G(10mil docs) when optimized.
>
> I am working with Solr 6.2.1
>
> Solr configuration:
>
> -XX:+CMSParallelRemarkEnabled
> -XX:+CMSScavengeBeforeRemark
> -XX:+ParallelRefProcEnabled
> -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC
> -XX:-OmitStackTraceInFastThrow
> -XX:CMSInitiatingOccupancyFraction=50
> -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:ConcGCThreads=4
> -XX:MaxTenuringThreshold=8
> -XX:ParallelGCThreads=4
> -XX:PretenureSizeThreshold=64m
> -XX:SurvivorRatio=15
> -XX:TargetSurvivorRatio=90
> -Xmn10G
> -Xms80G
> -Xmx80G
>
> Some of these configurations have been reached by multiple trial and errors
> over time, including the huge heap size.
>
> This cluster usually runs without any error.
>
> In the usual scenario, old gen gc is triggered according to the
> configuration at 50% old gen occupancy, and the collector clears out the
> memory over the next minute or so. This happens every 10-15 minutes.
>
> However, I have noticed that sometimes the GC pattern of the slaves
> completely changes and old gen gc is not able to clear the memory.
>
> After observing the gc logs closely for multiple old gen gc collections, I
> noticed that the old gen gc is triggered at 50% occupancy, but if there is
> a
> GC Allocation Failure before the collection completes (after CMS Initial
> Remark but before CMS reset), the old gen collection is not able to clear
> much memory. And as soon as this collection completes, another old gen gc
> is
> triggered.
>
> And in worst case scenarios, this cycle of old gen gc triggering, GC
> allocation failure keeps happening, and the old gen memory keeps
> increasing,
> leading to a single threaded STW GC, which is not able to do much, and I
> have to restart the solr server.
>
> The last time this happened after the following sequence of events:
>
> 1. We optimized the bigger collection bringing it to its optimized size of
> ~10G.
> 2. For an unrelated reason, we had stopped indexing to the master. We
> usually index at a low-ish throughput of ~1mil docs/day. This is relevant
> as
> when we are indexing, the size of the collection increases, and this
> effects
> the heap size used by collection.
> 3. The slaves started behaving erratically, with old gc collection not
> being
> able to free up the required memory and finally being stuck in a STW GC.
>
> As unlikely as this sounds, this is the only thing that changed on the
> cluster. There was no change in query throughput or type of queries.
>
> I restarted the slaves multiple times but the gc behaved in the same way
> for
> over three days. Then when we fixed the indexing and made it live, the
> slaves resumed their original gc pattern and are running without any issues
> for over 24 hours now.
>
> I would really be grateful for any advice on the following:
>
> 1. What could be the reason behind CMS not being able to free up the
> memory?
> What are some experiments I can run to solve this problem?
> 2. Can stopping/starting indexing be a reason for such drastic changes to
> GC
> pattern?
> 3. I have read at multiple places on this mailing list that the heap size
> should be much lower (2x-3x the size of collection), but the last time I
> tried CMS was not able to run smoothly and GC STW would occur which was
> only
> solved by a restart. My reasoning for this is that the type of queries and
> the throughput are also a factor in deciding the heap size, so it may be
> that our queries are creating too many objects maybe. Is my reasoning
> correct or should I try with a lower heap size (if it helps achieve a
> stable
> gc pattern)?
>
> (4. Silly question, but what is the right way to ask question on the
> mailing
> list? via mail or via the nabble website? I sent this question earlier as a
> mail, but it was not showing up on the nabble website so I am posting it
> from the website now)
>
>
>