Posted to commits@cassandra.apache.org by "Mina Naguib (JIRA)" <ji...@apache.org> on 2017/03/22 14:06:41 UTC
[jira] [Created] (CASSANDRA-13365) Nodes entering GC loop, do not recover
Mina Naguib created CASSANDRA-13365:
---------------------------------------
Summary: Nodes entering GC loop, do not recover
Key: CASSANDRA-13365
URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
Project: Cassandra
Issue Type: Bug
Environment: 34-node cluster over 4 DCs
Linux CentOS 7.2 x86
Mix of 64GB/128GB RAM / node
Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
High read volume, low write volume, occasional sstable bulk loading
Reporter: Mina Naguib
Over the last week we've been observing two related problems affecting our Cassandra cluster.
Problem 1: One to a few nodes per DC enter a GC loop and do not recover
Heap usage stats show a sudden jump of 1-3GB. Some nodes recover, but others don't and log the following:
{noformat}
2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) 13G->13G(14G), 27.1372267 secs]
2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) 13G->13G(14G), 27.0703975 secs]
2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) 13G->13G(14G), 27.3187768 secs]
2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) 13G->13G(14G), 26.9493405 secs]
2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) 13G->13G(14G), 27.5222085 secs]
2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) 13G->13G(14G), 27.1769581 secs]
2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) 13G->13G(14G), 27.4639031 secs]
2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) 13G->13G(14G), 30.3480523 secs]
2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) 13G->13G(14G), 27.8220513 secs]
2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) 13G->13G(14G), 27.4691271 secs]
2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.0275733 secs]
2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) 13G->13G(14G), 27.1902627 secs]
2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.8872850 secs]
2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) 13G->13G(14G), 27.1609706 secs]
2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) 13G->13G(14G), 27.3635138 secs]
2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) 13G->13G(14G), 27.4143510 secs]
2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) 13G->13G(14G), 27.8595986 secs]
2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) 13G->13G(14G), 27.5242543 secs]
2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) 13G->13G(14G), 30.8930130 secs]
2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) 13G->13G(14G), 27.6523013 secs]
2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) 13G->13G(14G), 27.3030198 secs]
2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) 13G->13G(14G), 27.1594154 secs]
2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) 13G->13G(14G), 27.3766949 secs]
2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) 13G->13G(14G), 27.5756770 secs]
2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure) 13G->13G(14G), 27.5541813 secs]
2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure) 13G->13G(14G), 33.7510103 secs]
2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure) 13G->13G(14G), 27.4856611 secs]
2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure) 13G->13G(14G), 28.1269335 secs]
2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure) 13G->13G(14G), 27.7830448 secs]
2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure) 13G->13G(14G), 27.3527326 secs]
2017-03-21T11:39:08.610-0400: 55065.173: [Full GC (Allocation Failure) 13G->13G(14G), 27.5828941 secs]
2017-03-21T11:39:36.833-0400: 55093.396: [Full GC (Allocation Failure) 13G->13G(14G), 27.9303030 secs]
2017-03-21T11:40:05.265-0400: 55121.828: [Full GC (Allocation Failure) 13G->13G(14G), 36.9902867 secs]
2017-03-21T11:40:42.400-0400: 55158.963: [Full GC (Allocation Failure) 13G->13G(14G), 27.6835744 secs]
2017-03-21T11:41:10.529-0400: 55187.091: [Full GC (Allocation Failure) 13G->13G(14G), 27.1899555 secs]
2017-03-21T11:41:38.018-0400: 55214.581: [Full GC (Allocation Failure) 13G->13G(14G), 27.7309706 secs]
2017-03-21T11:42:06.062-0400: 55242.624: [Full GC (Allocation Failure) 13G->13G(14G), 29.2103234 secs]
2017-03-21T11:42:35.579-0400: 55272.141: [Full GC (Allocation Failure) 13G->13G(14G), 27.9902741 secs]
2017-03-21T11:43:03.955-0400: 55300.518: [Full GC (Allocation Failure) 13G->13G(14G), 28.0764545 secs]
2017-03-21T11:43:32.186-0400: 55328.748: [Full GC (Allocation Failure) 13G->13G(14G), 28.3913296 secs]
2017-03-21T11:44:00.810-0400: 55357.373: [Full GC (Allocation Failure) 13G->13G(14G), 27.0945131 secs]
2017-03-21T11:44:28.343-0400: 55384.905: [Full GC (Allocation Failure) 13G->13G(14G), 27.3357763 secs]
2017-03-21T11:44:55.909-0400: 55412.471: [Full GC (Allocation Failure) 13G->13G(14G), 27.9216626 secs]
2017-03-21T11:45:24.070-0400: 55440.632: [Full GC (Allocation Failure) 13G->13G(14G), 29.8189953 secs]
2017-03-21T11:45:54.092-0400: 55470.654: [Full GC (Allocation Failure) 13G->13G(14G), 27.9992085 secs]
2017-03-21T11:46:22.219-0400: 55498.782: [Full GC (Allocation Failure) 13G->13G(14G), 27.4622100 secs]
2017-03-21T11:46:49.983-0400: 55526.545: [Full GC (Allocation Failure) 13G->13G(14G), 27.9159143 secs]
2017-03-21T11:47:18.029-0400: 55554.592: [Full GC (Allocation Failure) 13G->13G(14G), 28.5207919 secs]
2017-03-21T11:47:46.938-0400: 55583.500: [Full GC (Allocation Failure) 13G->13G(14G), 27.0355291 secs]
2017-03-21T11:48:14.235-0400: 55610.798: [Full GC (Allocation Failure) 13G->13G(14G), 35.6284865 secs]
2017-03-21T11:48:50.013-0400: 55646.575: [Full GC (Allocation Failure) 13G->13G(14G), 27.3299061 secs]
2017-03-21T11:49:17.592-0400: 55674.154: [Full GC (Allocation Failure) 13G->13G(14G), 27.2544348 secs]
{noformat}
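For reference, a minimal sketch (class name and output format are illustrative, not from this report) of how the post-collection heap occupancy could be logged from inside the JVM via the standard GarbageCollectorMXBean notification API, to catch the sudden 1-3GB jump before a node enters the loop above:
{noformat}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;
import com.sun.management.GarbageCollectionNotificationInfo;

public class GcWatcher {
    // Subscribes to GC-completed notifications and prints the collector name,
    // cause, pause duration and total heap used after each collection.
    public static void install() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            NotificationEmitter emitter = (NotificationEmitter) gc;
            emitter.addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType()))
                    return;
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                long usedAfter = info.getGcInfo().getMemoryUsageAfterGc().values().stream()
                        .mapToLong(MemoryUsage::getUsed).sum();
                System.out.printf("%s (%s): %d ms, used after GC = %d MB%n",
                        info.getGcName(), info.getGcCause(),
                        info.getGcInfo().getDuration(), usedAfter >> 20);
            }, null, null);
        }
    }
}
{noformat}
Hooked in at startup, this would flag the point where "used after GC" stops dropping, which is exactly the pattern in the log excerpt above.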
When a node enters the above state, it becomes useless within 1-5 minutes:
* Client connections die
* New connections time out
* CPU usage drops: all threads are idle except for one thread near 100% CPU ("VM Thread" os_prio=0 tid=0x00007ffaf8209bf0 nid=0x2d53 runnable), performing the above GC in a loop
* JMX becomes unqueryable
* Regular kill signal is ignored (needs kill -9)
I've managed to extract a full heap dump of a node while in this state, but it's 20GB and I have not been able to find a tool that can analyze it.
I've extracted a histogram of heap objects. From a healthy node:
{noformat}
num #instances #bytes class name
----------------------------------------------
1: 12688007 1116544616 io.netty.buffer.PooledUnsafeDirectByteBuf
2: 24938940 798046080 io.netty.util.Recycler$DefaultHandle
3: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
4: 9582138 459942624 java.nio.HeapByteBuffer
5: 16974 393432416 [I
6: 2014719 311356992 [B
7: 4982167 181324848 [Ljava.lang.Object;
8: 8531 118782704 [Lio.netty.util.Recycler$DefaultHandle;
9: 2827662 113106480 org.apache.cassandra.db.rows.BufferCell
10: 2737816 87610112 org.apache.cassandra.db.rows.BTreeRow
11: 2740110 70228240 [Ljava.nio.ByteBuffer;
12: 2738081 65713944 org.apache.cassandra.db.BufferClustering
13: 1763956 56446592 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
14: 1763736 56439552 com.googlecode.concurrentlinkedhashmap.ConcurrentHashMapV8$Node
15: 2205825 52939800 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
16: 1617026 51744832 java.util.concurrent.ConcurrentHashMap$Node
17: 1769934 42478416 org.apache.cassandra.cache.KeyCacheKey
{noformat}
From an unhealthy node:
{noformat}
num #instances #bytes class name
----------------------------------------------
1: 103726026 4978849248 java.nio.HeapByteBuffer
2: 95646048 2832233872 [B
3: 28076465 1207916952 [Ljava.lang.Object;
4: 12803448 1126703424 io.netty.buffer.PooledUnsafeDirectByteBuf
5: 25054634 801748288 io.netty.util.Recycler$DefaultHandle
6: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
7: 24725380 593409120 java.util.ArrayList
8: 5808290 185865280 java.util.RegularEnumSet
9: 2870889 160769784 org.apache.cassandra.transport.messages.ResultMessage$Rows
10: 2937336 140992128 io.netty.buffer.SlicedAbstractByteBuf
11: 8854 118773984 [Lio.netty.util.Recycler$DefaultHandle;
12: 2830805 113232200 org.apache.cassandra.db.rows.BufferCell
13: 2937336 93994752 org.apache.cassandra.transport.Frame$Header
14: 2870928 91869696 org.apache.cassandra.cql3.ResultSet$ResultMetadata
15: 2728627 87316064 org.apache.cassandra.db.rows.BTreeRow
16: 2947033 70728792 java.util.concurrent.ConcurrentLinkedQueue$Node
17: 2937336 70496064 org.apache.cassandra.transport.Frame
{noformat}
The increase is clear in these top 3 object types:
{noformat}
1: 103726026 4978849248 java.nio.HeapByteBuffer
2: 95646048 2832233872 [B
3: 28076465 1207916952 [Ljava.lang.Object;
{noformat}
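To see which classes account for the growth, the two histograms can be diffed directly. A minimal sketch (the file names healthy.txt and unhealthy.txt are placeholders) that parses jmap -histo output and prints per-class byte growth, largest first:
{noformat}
import java.nio.file.*;
import java.util.*;

public class HistoDiff {
    // Parses "jmap -histo" output: data lines look like
    //   "   1:  103726026  4978849248  java.nio.HeapByteBuffer"
    static Map<String, Long> parse(Path p) throws Exception {
        Map<String, Long> bytesByClass = new HashMap<>();
        for (String line : Files.readAllLines(p)) {
            String[] f = line.trim().split("\\s+");
            if (f.length == 4 && f[0].endsWith(":"))
                bytesByClass.merge(f[3], Long.parseLong(f[2]), Long::sum);
        }
        return bytesByClass;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> before = parse(Paths.get("healthy.txt"));
        Map<String, Long> after  = parse(Paths.get("unhealthy.txt"));
        List<String> classes = new ArrayList<>(after.keySet());
        // Sort by growth (bytes in unhealthy minus bytes in healthy), descending.
        classes.sort((a, b) -> Long.compare(
                after.get(b) - before.getOrDefault(b, 0L),
                after.get(a) - before.getOrDefault(a, 0L)));
        for (String c : classes.subList(0, Math.min(20, classes.size())))
            System.out.printf("%,15d  %s%n", after.get(c) - before.getOrDefault(c, 0L), c);
    }
}
{noformat}
Run against the two histograms above, the biggest deltas are the HeapByteBuffer/byte[]/Object[] trio together with the transport-layer classes (ResultMessage$Rows, Frame, Frame$Header), i.e. the response objects pile up alongside the raw buffers.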
Other related observations:
Client socket states: essentially the JVM has stopped doing work, and the receive queue piles up kernel-side without being consumed app-side. Worth noting that the peers (clients) have given up and closed the connections on their end, but they still appear ESTABLISHED on Cassandra's side since it hasn't drained the read buffer to EOF/error or completed the closure (a minimal reproduction sketch of the Recv-Q buildup follows the socket listing below):
{noformat}
ESTAB 558612 0 172.30.72.133:9042 172.30.72.145:11418
ESTAB 564172 0 172.30.72.133:9042 172.30.72.145:12420
ESTAB 545284 0 172.30.72.133:9042 172.30.72.145:18212
ESTAB 565468 0 172.30.72.133:9042 172.30.72.145:18346
ESTAB 525884 0 172.30.72.133:9042 172.30.72.145:18352
ESTAB 532324 0 172.30.72.133:9042 172.30.72.145:28896
ESTAB 517592 0 172.30.72.133:9042 172.30.72.145:34504
ESTAB 553392 0 172.30.72.133:9042 172.30.72.145:36926
ESTAB 581952 0 172.30.72.133:9042 172.30.72.145:40684
ESTAB 543668 0 172.30.72.133:9042 172.30.72.145:42282
ESTAB 511860 0 172.30.72.133:9042 172.30.72.145:47350
ESTAB 590380 0 172.30.72.133:9042 172.30.72.145:47566
ESTAB 583120 0 172.30.72.133:9042 172.30.72.145:50386
ESTAB 536772 0 172.30.72.133:9042 172.30.72.145:52656
ESTAB 31 0 172.30.72.133:9042 172.30.72.145:52946
ESTAB 532140 0 172.30.72.133:9042 172.30.72.145:53618
ESTAB 14397 0 172.30.72.133:9042 172.30.72.150:10016
ESTAB 6444 0 172.30.72.133:9042 172.30.72.150:12176
ESTAB 7568 0 172.30.72.133:9042 172.30.72.150:12934
ESTAB 139479 0 172.30.72.133:9042 172.30.72.150:13186
ESTAB 6896 0 172.30.72.133:9042 172.30.72.150:17316
ESTAB 7808 0 172.30.72.133:9042 172.30.72.150:17448
ESTAB 8712 0 172.30.72.133:9042 172.30.72.150:17748
ESTAB 6944 0 172.30.72.133:9042 172.30.72.150:18472
ESTAB 136700 0 172.30.72.133:9042 172.30.72.150:19860
ESTAB 16260 0 172.30.72.133:9042 172.30.72.150:20696
ESTAB 8943 0 172.30.72.133:9042 172.30.72.150:21208
ESTAB 132877 0 172.30.72.133:9042 172.30.72.150:21420
ESTAB 6936 0 172.30.72.133:9042 172.30.72.150:25440
ESTAB 92 0 172.30.72.133:9042 172.30.72.150:25778
ESTAB 137312 0 172.30.72.133:9042 172.30.72.150:28258
ESTAB 40 0 172.30.72.133:9042 172.30.72.150:28348
ESTAB 6940 0 172.30.72.133:9042 172.30.72.150:29718
ESTAB 6416 0 172.30.72.133:9042 172.30.72.150:29858
ESTAB 6580 0 172.30.72.133:9042 172.30.72.150:30518
ESTAB 4202 0 172.30.72.133:9042 172.30.72.150:30948
ESTAB 6060 0 172.30.72.133:9042 172.30.72.150:35276
ESTAB 0 0 172.30.72.133:9042 172.30.72.150:40114
ESTAB 134303 0 172.30.72.133:9042 172.30.72.150:42212
ESTAB 396 0 172.30.72.133:9042 172.30.72.150:44848
ESTAB 136450 0 172.30.72.133:9042 172.30.72.150:47742
ESTAB 8318 0 172.30.72.133:9042 172.30.72.150:51054
ESTAB 11976 0 172.30.72.133:9042 172.30.72.150:54560
ESTAB 136193 0 172.30.72.133:9042 172.30.72.150:56094
ESTAB 138737 0 172.30.72.133:9042 172.30.72.150:60382
ESTAB 7996 0 172.30.72.133:9042 172.30.72.150:60932
ESTAB 137235 0 172.30.72.133:9042 172.30.72.150:61568
ESTAB 8184 0 172.30.72.133:9042 172.30.72.150:63054
ESTAB 985276 0 172.30.72.133:9042 172.30.72.175:23494
ESTAB 601528 0 172.30.72.133:9042 172.30.72.175:24140
ESTAB 557844 0 172.30.72.133:9042 172.30.72.175:42248
ESTAB 469344 0 172.30.72.133:9042 172.30.72.175:59704
ESTAB 507160 0 172.30.72.133:9042 172.30.72.185:11958
ESTAB 449196 0 172.30.72.133:9042 172.30.72.185:15458
ESTAB 487796 0 172.30.72.133:9042 172.30.72.185:16458
ESTAB 441132 0 172.30.72.133:9042 172.30.72.185:23194
ESTAB 486664 0 172.30.72.133:9042 172.30.72.185:25276
ESTAB 554780 0 172.30.72.133:9042 172.30.72.185:29102
ESTAB 496248 0 172.30.72.133:9042 172.30.72.185:29428
ESTAB 420048 0 172.30.72.133:9042 172.30.72.185:39650
ESTAB 498704 0 172.30.72.133:9042 172.30.72.185:46424
ESTAB 506488 0 172.30.72.133:9042 172.30.72.185:49722
ESTAB 509416 0 172.30.72.133:9042 172.30.72.185:55838
ESTAB 511956 0 172.30.72.133:9042 172.30.72.185:57670
ESTAB 496720 0 172.30.72.133:9042 172.30.72.185:59884
{noformat}
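The kernel-side Recv-Q buildup itself is easy to reproduce in isolation with a server that accepts connections but never reads from them; a minimal sketch (the port 19042 and class name are arbitrary):
{noformat}
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class StalledServer {
    public static void main(String[] args) throws Exception {
        List<Socket> held = new ArrayList<>();              // keep sockets open, never read them
        try (ServerSocket server = new ServerSocket(19042)) {
            while (true) {
                Socket s = server.accept();
                held.add(s);
                // Nothing ever reads from s, so anything the client writes accumulates
                // in the kernel receive buffer and shows up as a growing Recv-Q in `ss -tn`,
                // just like the Cassandra listener above once the JVM stops servicing reads.
                System.out.println("accepted " + s.getRemoteSocketAddress()
                        + " (" + held.size() + " held)");
            }
        }
    }
}
{noformat}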
perf top during this state:
{noformat}
19.73% libjvm.so [.] InstanceKlass::oop_follow_contents
14.05% libjvm.so [.] ObjArrayKlass::oop_follow_contents
7.83% libjvm.so [.] MarkSweep::follow_stack
6.38% libjvm.so [.] HeapRegion::block_size
5.41% libjvm.so [.] OtherRegionsTable::add_reference
5.12% libjvm.so [.] UpdateRSOopClosure::do_oop
4.77% libjvm.so [.] HeapRegion::block_is_obj
{noformat}
Frequency: affects nodes in our 2 busier DCs (the other 2 DCs serve less volume), roughly 2-10 times per day. It's not always the same nodes, though lately 2 of them hiccup more frequently than the others.
Tried:
* Upgrading cassandra from 3.9 to 3.10
* Increasing heap size from 8GB to 14GB
* Upgrading JVM 1.8 from u112 to u121
* Switching from CMS to G1GC
The last change, switching from CMS to G1GC, reduced the frequency of the problem but has not eliminated it.
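For reference, the heap size and G1 switch above correspond to conf/jvm.options entries roughly like these (values shown are illustrative; the exact pause-time target may differ):
{noformat}
-Xms14G
-Xmx14G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
{noformat}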
Problem 2: Read volume in the affected DC is depressed
With our RF set to 2 per DC, if 2 or more nodes in a DC enter the GC loop described in problem 1, all nodes in that DC accumulate very high active/pending/blocked Task-Transport queues.
Total read volume for the DC drops to 10%-30% of normal.
Tried:
* Increasing -Dcassandra.max_queued_native_transport_requests to 4096
* Reducing read_request_timeout_in_ms to 400
* Reducing write_request_timeout_in_ms to 800
These improved things slightly, but not by much.
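For completeness, the settings above live in cassandra.yaml and the JVM options (excerpt shown for illustration; the 3.x defaults are 5000 ms and 2000 ms respectively):
{noformat}
# cassandra.yaml
read_request_timeout_in_ms: 400
write_request_timeout_in_ms: 800

# JVM options (e.g. conf/jvm.options)
-Dcassandra.max_queued_native_transport_requests=4096
{noformat}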
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)