Posted to commits@cassandra.apache.org by "C. Scott Andreas (JIRA)" <ji...@apache.org> on 2018/11/19 06:15:00 UTC
[jira] [Updated] (CASSANDRA-13365) Nodes entering GC loop, does not recover
[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-13365:
-----------------------------------------
Component/s: Core
> Nodes entering GC loop, does not recover
> ----------------------------------------
>
> Key: CASSANDRA-13365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: 34-node cluster over 4 DCs
> Linux CentOS 7.2 x86
> Mix of 64GB/128GB RAM / node
> Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
> High read volume, low write volume, occasional sstable bulk loading
> Reporter: Mina Naguib
> Priority: Major
>
> Over the last week we've been observing two related problems affecting our Cassandra cluster.
> Problem 1: One to a few nodes per DC enter a GC loop and do not recover
> Checking the heap usage stats shows a sudden jump of 1-3GB. Some nodes recover, but some don't and log this:
> {noformat}
> 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
> 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
> 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
> 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) 13G->13G(14G), 27.1372267 secs]
> 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) 13G->13G(14G), 27.0703975 secs]
> 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) 13G->13G(14G), 27.3187768 secs]
> 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) 13G->13G(14G), 26.9493405 secs]
> 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) 13G->13G(14G), 27.5222085 secs]
> 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) 13G->13G(14G), 27.1769581 secs]
> 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) 13G->13G(14G), 27.4639031 secs]
> 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) 13G->13G(14G), 30.3480523 secs]
> 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) 13G->13G(14G), 27.8220513 secs]
> 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) 13G->13G(14G), 27.4691271 secs]
> 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.0275733 secs]
> 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) 13G->13G(14G), 27.1902627 secs]
> 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.8872850 secs]
> 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) 13G->13G(14G), 27.1609706 secs]
> 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) 13G->13G(14G), 27.3635138 secs]
> 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) 13G->13G(14G), 27.4143510 secs]
> 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) 13G->13G(14G), 27.8595986 secs]
> 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) 13G->13G(14G), 27.5242543 secs]
> 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) 13G->13G(14G), 30.8930130 secs]
> 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) 13G->13G(14G), 27.6523013 secs]
> 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) 13G->13G(14G), 27.3030198 secs]
> 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) 13G->13G(14G), 27.1594154 secs]
> 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) 13G->13G(14G), 27.3766949 secs]
> 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) 13G->13G(14G), 27.5756770 secs]
> 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure) 13G->13G(14G), 27.5541813 secs]
> 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure) 13G->13G(14G), 33.7510103 secs]
> 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure) 13G->13G(14G), 27.4856611 secs]
> 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure) 13G->13G(14G), 28.1269335 secs]
> 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure) 13G->13G(14G), 27.7830448 secs]
> 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure) 13G->13G(14G), 27.3527326 secs]
> 2017-03-21T11:39:08.610-0400: 55065.173: [Full GC (Allocation Failure) 13G->13G(14G), 27.5828941 secs]
> 2017-03-21T11:39:36.833-0400: 55093.396: [Full GC (Allocation Failure) 13G->13G(14G), 27.9303030 secs]
> 2017-03-21T11:40:05.265-0400: 55121.828: [Full GC (Allocation Failure) 13G->13G(14G), 36.9902867 secs]
> 2017-03-21T11:40:42.400-0400: 55158.963: [Full GC (Allocation Failure) 13G->13G(14G), 27.6835744 secs]
> 2017-03-21T11:41:10.529-0400: 55187.091: [Full GC (Allocation Failure) 13G->13G(14G), 27.1899555 secs]
> 2017-03-21T11:41:38.018-0400: 55214.581: [Full GC (Allocation Failure) 13G->13G(14G), 27.7309706 secs]
> 2017-03-21T11:42:06.062-0400: 55242.624: [Full GC (Allocation Failure) 13G->13G(14G), 29.2103234 secs]
> 2017-03-21T11:42:35.579-0400: 55272.141: [Full GC (Allocation Failure) 13G->13G(14G), 27.9902741 secs]
> 2017-03-21T11:43:03.955-0400: 55300.518: [Full GC (Allocation Failure) 13G->13G(14G), 28.0764545 secs]
> 2017-03-21T11:43:32.186-0400: 55328.748: [Full GC (Allocation Failure) 13G->13G(14G), 28.3913296 secs]
> 2017-03-21T11:44:00.810-0400: 55357.373: [Full GC (Allocation Failure) 13G->13G(14G), 27.0945131 secs]
> 2017-03-21T11:44:28.343-0400: 55384.905: [Full GC (Allocation Failure) 13G->13G(14G), 27.3357763 secs]
> 2017-03-21T11:44:55.909-0400: 55412.471: [Full GC (Allocation Failure) 13G->13G(14G), 27.9216626 secs]
> 2017-03-21T11:45:24.070-0400: 55440.632: [Full GC (Allocation Failure) 13G->13G(14G), 29.8189953 secs]
> 2017-03-21T11:45:54.092-0400: 55470.654: [Full GC (Allocation Failure) 13G->13G(14G), 27.9992085 secs]
> 2017-03-21T11:46:22.219-0400: 55498.782: [Full GC (Allocation Failure) 13G->13G(14G), 27.4622100 secs]
> 2017-03-21T11:46:49.983-0400: 55526.545: [Full GC (Allocation Failure) 13G->13G(14G), 27.9159143 secs]
> 2017-03-21T11:47:18.029-0400: 55554.592: [Full GC (Allocation Failure) 13G->13G(14G), 28.5207919 secs]
> 2017-03-21T11:47:46.938-0400: 55583.500: [Full GC (Allocation Failure) 13G->13G(14G), 27.0355291 secs]
> 2017-03-21T11:48:14.235-0400: 55610.798: [Full GC (Allocation Failure) 13G->13G(14G), 35.6284865 secs]
> 2017-03-21T11:48:50.013-0400: 55646.575: [Full GC (Allocation Failure) 13G->13G(14G), 27.3299061 secs]
> 2017-03-21T11:49:17.592-0400: 55674.154: [Full GC (Allocation Failure) 13G->13G(14G), 27.2544348 secs]
> {noformat}
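The pause pattern above can be quantified with a short script. This is a sketch, assuming the JDK 8 `-XX:+PrintGCDateStamps` log format shown above (timestamp: uptime: [Full GC ..., N secs]); the sample lines are trimmed from the log:

```python
import re

# Trimmed sample of the Full GC lines shown above.
LOG = """\
2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
"""

# Capture the JVM uptime and the pause duration from each Full GC line.
pattern = re.compile(r": (\d+\.\d+): \[Full GC .*?, (\d+\.\d+) secs\]")

events = [(float(up), float(secs)) for up, secs in pattern.findall(LOG)]
paused = sum(secs for _, secs in events)
elapsed = events[-1][0] + events[-1][1] - events[0][0]
print(f"paused {paused:.1f}s of {elapsed:.1f}s elapsed ({100 * paused / elapsed:.0f}%)")
```

Even on this three-line sample the node is paused for roughly 80% of wall time, which matches the "useless over 1-5 minutes" behavior described below.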
> When a node enters the above state, it becomes useless over 1-5 minutes:
> * Client connections die
> * New connections time out
> * CPU usage drops - all threads do nothing except for 1 thread near 100% CPU ("VM Thread" os_prio=0 tid=0x00007ffaf8209bf0 nid=0x2d53 runnable), doing the above GC in a loop
> * JMX becomes unqueryable
> * A regular kill signal is ignored (needs kill -9)
> I've managed to extract a full heap dump of a node while in this state, but it's 20GB and I have not been able to find a tool that can analyze it.
> I've extracted a histogram of heap objects. From a healthy node:
> {noformat}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 12688007 1116544616 io.netty.buffer.PooledUnsafeDirectByteBuf
> 2: 24938940 798046080 io.netty.util.Recycler$DefaultHandle
> 3: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
> 4: 9582138 459942624 java.nio.HeapByteBuffer
> 5: 16974 393432416 [I
> 6: 2014719 311356992 [B
> 7: 4982167 181324848 [Ljava.lang.Object;
> 8: 8531 118782704 [Lio.netty.util.Recycler$DefaultHandle;
> 9: 2827662 113106480 org.apache.cassandra.db.rows.BufferCell
> 10: 2737816 87610112 org.apache.cassandra.db.rows.BTreeRow
> 11: 2740110 70228240 [Ljava.nio.ByteBuffer;
> 12: 2738081 65713944 org.apache.cassandra.db.BufferClustering
> 13: 1763956 56446592 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> 14: 1763736 56439552 com.googlecode.concurrentlinkedhashmap.ConcurrentHashMapV8$Node
> 15: 2205825 52939800 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
> 16: 1617026 51744832 java.util.concurrent.ConcurrentHashMap$Node
> 17: 1769934 42478416 org.apache.cassandra.cache.KeyCacheKey
> {noformat}
> From an unhealthy node:
> {noformat}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 103726026 4978849248 java.nio.HeapByteBuffer
> 2: 95646048 2832233872 [B
> 3: 28076465 1207916952 [Ljava.lang.Object;
> 4: 12803448 1126703424 io.netty.buffer.PooledUnsafeDirectByteBuf
> 5: 25054634 801748288 io.netty.util.Recycler$DefaultHandle
> 6: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
> 7: 24725380 593409120 java.util.ArrayList
> 8: 5808290 185865280 java.util.RegularEnumSet
> 9: 2870889 160769784 org.apache.cassandra.transport.messages.ResultMessage$Rows
> 10: 2937336 140992128 io.netty.buffer.SlicedAbstractByteBuf
> 11: 8854 118773984 [Lio.netty.util.Recycler$DefaultHandle;
> 12: 2830805 113232200 org.apache.cassandra.db.rows.BufferCell
> 13: 2937336 93994752 org.apache.cassandra.transport.Frame$Header
> 14: 2870928 91869696 org.apache.cassandra.cql3.ResultSet$ResultMetadata
> 15: 2728627 87316064 org.apache.cassandra.db.rows.BTreeRow
> 16: 2947033 70728792 java.util.concurrent.ConcurrentLinkedQueue$Node
> 17: 2937336 70496064 org.apache.cassandra.transport.Frame
> {noformat}
> The increase is clear in these top 3 object types:
> {noformat}
> 1: 103726026 4978849248 java.nio.HeapByteBuffer
> 2: 95646048 2832233872 [B
> 3: 28076465 1207916952 [Ljava.lang.Object;
> {noformat}
> Other related observations:
> Client socket states: Essentially the JVM has stopped working; the recv-queue is piling up kernel-side but not being consumed app-side. Worth noting that the peers (clients) have given up and closed the connections on their end, but they still appear ESTABlished on cassandra's side since it hasn't drained the read buffer to EOF/error and completed the closure:
> {noformat}
> ESTAB 558612 0 172.30.72.133:9042 172.30.72.145:11418
> ESTAB 564172 0 172.30.72.133:9042 172.30.72.145:12420
> ESTAB 545284 0 172.30.72.133:9042 172.30.72.145:18212
> ESTAB 565468 0 172.30.72.133:9042 172.30.72.145:18346
> ESTAB 525884 0 172.30.72.133:9042 172.30.72.145:18352
> ESTAB 532324 0 172.30.72.133:9042 172.30.72.145:28896
> ESTAB 517592 0 172.30.72.133:9042 172.30.72.145:34504
> ESTAB 553392 0 172.30.72.133:9042 172.30.72.145:36926
> ESTAB 581952 0 172.30.72.133:9042 172.30.72.145:40684
> ESTAB 543668 0 172.30.72.133:9042 172.30.72.145:42282
> ESTAB 511860 0 172.30.72.133:9042 172.30.72.145:47350
> ESTAB 590380 0 172.30.72.133:9042 172.30.72.145:47566
> ESTAB 583120 0 172.30.72.133:9042 172.30.72.145:50386
> ESTAB 536772 0 172.30.72.133:9042 172.30.72.145:52656
> ESTAB 31 0 172.30.72.133:9042 172.30.72.145:52946
> ESTAB 532140 0 172.30.72.133:9042 172.30.72.145:53618
> ESTAB 14397 0 172.30.72.133:9042 172.30.72.150:10016
> ESTAB 6444 0 172.30.72.133:9042 172.30.72.150:12176
> ESTAB 7568 0 172.30.72.133:9042 172.30.72.150:12934
> ESTAB 139479 0 172.30.72.133:9042 172.30.72.150:13186
> ESTAB 6896 0 172.30.72.133:9042 172.30.72.150:17316
> ESTAB 7808 0 172.30.72.133:9042 172.30.72.150:17448
> ESTAB 8712 0 172.30.72.133:9042 172.30.72.150:17748
> ESTAB 6944 0 172.30.72.133:9042 172.30.72.150:18472
> ESTAB 136700 0 172.30.72.133:9042 172.30.72.150:19860
> ESTAB 16260 0 172.30.72.133:9042 172.30.72.150:20696
> ESTAB 8943 0 172.30.72.133:9042 172.30.72.150:21208
> ESTAB 132877 0 172.30.72.133:9042 172.30.72.150:21420
> ESTAB 6936 0 172.30.72.133:9042 172.30.72.150:25440
> ESTAB 92 0 172.30.72.133:9042 172.30.72.150:25778
> ESTAB 137312 0 172.30.72.133:9042 172.30.72.150:28258
> ESTAB 40 0 172.30.72.133:9042 172.30.72.150:28348
> ESTAB 6940 0 172.30.72.133:9042 172.30.72.150:29718
> ESTAB 6416 0 172.30.72.133:9042 172.30.72.150:29858
> ESTAB 6580 0 172.30.72.133:9042 172.30.72.150:30518
> ESTAB 4202 0 172.30.72.133:9042 172.30.72.150:30948
> ESTAB 6060 0 172.30.72.133:9042 172.30.72.150:35276
> ESTAB 0 0 172.30.72.133:9042 172.30.72.150:40114
> ESTAB 134303 0 172.30.72.133:9042 172.30.72.150:42212
> ESTAB 396 0 172.30.72.133:9042 172.30.72.150:44848
> ESTAB 136450 0 172.30.72.133:9042 172.30.72.150:47742
> ESTAB 8318 0 172.30.72.133:9042 172.30.72.150:51054
> ESTAB 11976 0 172.30.72.133:9042 172.30.72.150:54560
> ESTAB 136193 0 172.30.72.133:9042 172.30.72.150:56094
> ESTAB 138737 0 172.30.72.133:9042 172.30.72.150:60382
> ESTAB 7996 0 172.30.72.133:9042 172.30.72.150:60932
> ESTAB 137235 0 172.30.72.133:9042 172.30.72.150:61568
> ESTAB 8184 0 172.30.72.133:9042 172.30.72.150:63054
> ESTAB 985276 0 172.30.72.133:9042 172.30.72.175:23494
> ESTAB 601528 0 172.30.72.133:9042 172.30.72.175:24140
> ESTAB 557844 0 172.30.72.133:9042 172.30.72.175:42248
> ESTAB 469344 0 172.30.72.133:9042 172.30.72.175:59704
> ESTAB 507160 0 172.30.72.133:9042 172.30.72.185:11958
> ESTAB 449196 0 172.30.72.133:9042 172.30.72.185:15458
> ESTAB 487796 0 172.30.72.133:9042 172.30.72.185:16458
> ESTAB 441132 0 172.30.72.133:9042 172.30.72.185:23194
> ESTAB 486664 0 172.30.72.133:9042 172.30.72.185:25276
> ESTAB 554780 0 172.30.72.133:9042 172.30.72.185:29102
> ESTAB 496248 0 172.30.72.133:9042 172.30.72.185:29428
> ESTAB 420048 0 172.30.72.133:9042 172.30.72.185:39650
> ESTAB 498704 0 172.30.72.133:9042 172.30.72.185:46424
> ESTAB 506488 0 172.30.72.133:9042 172.30.72.185:49722
> ESTAB 509416 0 172.30.72.133:9042 172.30.72.185:55838
> ESTAB 511956 0 172.30.72.133:9042 172.30.72.185:57670
> ESTAB 496720 0 172.30.72.133:9042 172.30.72.185:59884
> {noformat}
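The kernel-side backlog above can be summarized per client with a short script. A sketch, assuming the `ss -tn`-style column layout shown above (State, Recv-Q, Send-Q, Local:Port, Peer:Port); the sample lines are trimmed from the output:

```python
# Trimmed sample of the socket lines shown above.
SS = """\
ESTAB 558612 0 172.30.72.133:9042 172.30.72.145:11418
ESTAB 564172 0 172.30.72.133:9042 172.30.72.145:12420
ESTAB 14397 0 172.30.72.133:9042 172.30.72.150:10016
"""

# Sum the unread Recv-Q bytes per peer IP.
backlog = {}
for line in SS.splitlines():
    state, recvq, _sendq, _local, peer = line.split()
    if state == "ESTAB":
        ip = peer.rsplit(":", 1)[0]
        backlog[ip] = backlog.get(ip, 0) + int(recvq)

for ip, total in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{ip:>16} {total:>10} bytes queued")
```

A large, growing per-peer Recv-Q with the process still alive is exactly the "kernel buffers full, application not reading" signature described above.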
> perf top during this state:
> {noformat}
> 19.73% libjvm.so [.] InstanceKlass::oop_follow_contents
> 14.05% libjvm.so [.] ObjArrayKlass::oop_follow_contents
> 7.83% libjvm.so [.] MarkSweep::follow_stack
> 6.38% libjvm.so [.] HeapRegion::block_size
> 5.41% libjvm.so [.] OtherRegionsTable::add_reference
> 5.12% libjvm.so [.] UpdateRSOopClosure::do_oop
> 4.77% libjvm.so [.] HeapRegion::block_is_obj
> {noformat}
> Frequency: Affecting nodes in our 2 busier DCs (the other 2 DCs serve less volume), ~2-10 times per day. Not always the same nodes (though lately 2 hiccup more frequently than others).
> Tried:
> * Upgrading cassandra from 3.9 to 3.10
> * Increasing heap size from 8GB to 14GB
> * Upgrading JVM 1.8 from u112 to u121
> * Switching from CMS to G1GC
> The last change, switching from CMS to G1GC, reduced the frequency of the problem but has not eliminated it.
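For reference, a CMS-to-G1 switch with the 14GB heap described above would look roughly like the following in conf/jvm.options; this is an illustrative sketch, not the reporter's exact configuration, and the pause target shown is an assumption:

```
## conf/jvm.options -- illustrative G1 settings, not taken from this ticket
-Xms14G
-Xmx14G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```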
> Problem 2: Read volume in DC depressed
> With our RF set to 2 per DC, if 2 or more nodes in a DC enter the GC loop described in problem 1, all nodes in the same DC accumulate very high active/pending/blocked transport task queues.
> Total read volume for the DC drops to 10%-30% of normal.
> Tried:
> * Increasing -Dcassandra.max_queued_native_transport_requests to 4096
> * Reducing read_request_timeout_in_ms to 400
> * Reducing write_request_timeout_in_ms to 800
> These improved things slightly, but not by much.
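Concretely, the three mitigations above translate to a startup flag plus two cassandra.yaml settings; this is a sketch of the changes as described, not a recommended configuration:

```
# JVM system property (e.g. appended to JVM_OPTS at startup)
-Dcassandra.max_queued_native_transport_requests=4096

# cassandra.yaml
read_request_timeout_in_ms: 400
write_request_timeout_in_ms: 800
```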
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)