Posted to commits@cassandra.apache.org by "C. Scott Andreas (JIRA)" <ji...@apache.org> on 2018/11/19 06:15:00 UTC
[jira] [Updated] (CASSANDRA-13365) Nodes entering GC loop, does not recover
[ https://issues.apache.org/jira/browse/CASSANDRA-13365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
C. Scott Andreas updated CASSANDRA-13365:
-----------------------------------------
Component/s: Core
> Nodes entering GC loop, does not recover
> ----------------------------------------
>
> Key: CASSANDRA-13365
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13365
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: 34-node cluster over 4 DCs
> Linux CentOS 7.2 x86
> Mix of 64GB/128GB RAM / node
> Mix of 32/40 hardware threads / node, Xeon ~2.4Ghz
> High read volume, low write volume, occasional sstable bulk loading
> Reporter: Mina Naguib
> Priority: Major
>
> Over the last week we've been observing two related problems affecting our Cassandra cluster.
> Problem 1: One to a few nodes per DC enter a GC loop and do not recover
> Checking the heap usage stats shows a sudden jump of 1-3GB. Some nodes recover, but some don't and log this:
> {noformat}
> 2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
> 2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
> 2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
> 2017-03-21T11:24:50.528-0400: 54207.090: [Full GC (Allocation Failure) 13G->13G(14G), 27.1372267 secs]
> 2017-03-21T11:25:19.190-0400: 54235.752: [Full GC (Allocation Failure) 13G->13G(14G), 27.0703975 secs]
> 2017-03-21T11:25:46.711-0400: 54263.273: [Full GC (Allocation Failure) 13G->13G(14G), 27.3187768 secs]
> 2017-03-21T11:26:15.419-0400: 54291.981: [Full GC (Allocation Failure) 13G->13G(14G), 26.9493405 secs]
> 2017-03-21T11:26:43.399-0400: 54319.961: [Full GC (Allocation Failure) 13G->13G(14G), 27.5222085 secs]
> 2017-03-21T11:27:11.383-0400: 54347.945: [Full GC (Allocation Failure) 13G->13G(14G), 27.1769581 secs]
> 2017-03-21T11:27:40.174-0400: 54376.737: [Full GC (Allocation Failure) 13G->13G(14G), 27.4639031 secs]
> 2017-03-21T11:28:08.946-0400: 54405.508: [Full GC (Allocation Failure) 13G->13G(14G), 30.3480523 secs]
> 2017-03-21T11:28:40.117-0400: 54436.680: [Full GC (Allocation Failure) 13G->13G(14G), 27.8220513 secs]
> 2017-03-21T11:29:08.459-0400: 54465.022: [Full GC (Allocation Failure) 13G->13G(14G), 27.4691271 secs]
> 2017-03-21T11:29:37.114-0400: 54493.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.0275733 secs]
> 2017-03-21T11:30:04.635-0400: 54521.198: [Full GC (Allocation Failure) 13G->13G(14G), 27.1902627 secs]
> 2017-03-21T11:30:32.114-0400: 54548.676: [Full GC (Allocation Failure) 13G->13G(14G), 27.8872850 secs]
> 2017-03-21T11:31:01.430-0400: 54577.993: [Full GC (Allocation Failure) 13G->13G(14G), 27.1609706 secs]
> 2017-03-21T11:31:29.024-0400: 54605.587: [Full GC (Allocation Failure) 13G->13G(14G), 27.3635138 secs]
> 2017-03-21T11:31:57.303-0400: 54633.865: [Full GC (Allocation Failure) 13G->13G(14G), 27.4143510 secs]
> 2017-03-21T11:32:25.110-0400: 54661.672: [Full GC (Allocation Failure) 13G->13G(14G), 27.8595986 secs]
> 2017-03-21T11:32:53.922-0400: 54690.485: [Full GC (Allocation Failure) 13G->13G(14G), 27.5242543 secs]
> 2017-03-21T11:33:21.867-0400: 54718.429: [Full GC (Allocation Failure) 13G->13G(14G), 30.8930130 secs]
> 2017-03-21T11:33:53.712-0400: 54750.275: [Full GC (Allocation Failure) 13G->13G(14G), 27.6523013 secs]
> 2017-03-21T11:34:21.760-0400: 54778.322: [Full GC (Allocation Failure) 13G->13G(14G), 27.3030198 secs]
> 2017-03-21T11:34:50.073-0400: 54806.635: [Full GC (Allocation Failure) 13G->13G(14G), 27.1594154 secs]
> 2017-03-21T11:35:17.743-0400: 54834.306: [Full GC (Allocation Failure) 13G->13G(14G), 27.3766949 secs]
> 2017-03-21T11:35:45.797-0400: 54862.360: [Full GC (Allocation Failure) 13G->13G(14G), 27.5756770 secs]
> 2017-03-21T11:36:13.816-0400: 54890.378: [Full GC (Allocation Failure) 13G->13G(14G), 27.5541813 secs]
> 2017-03-21T11:36:41.926-0400: 54918.488: [Full GC (Allocation Failure) 13G->13G(14G), 33.7510103 secs]
> 2017-03-21T11:37:16.132-0400: 54952.695: [Full GC (Allocation Failure) 13G->13G(14G), 27.4856611 secs]
> 2017-03-21T11:37:44.454-0400: 54981.017: [Full GC (Allocation Failure) 13G->13G(14G), 28.1269335 secs]
> 2017-03-21T11:38:12.774-0400: 55009.337: [Full GC (Allocation Failure) 13G->13G(14G), 27.7830448 secs]
> 2017-03-21T11:38:40.840-0400: 55037.402: [Full GC (Allocation Failure) 13G->13G(14G), 27.3527326 secs]
> 2017-03-21T11:39:08.610-0400: 55065.173: [Full GC (Allocation Failure) 13G->13G(14G), 27.5828941 secs]
> 2017-03-21T11:39:36.833-0400: 55093.396: [Full GC (Allocation Failure) 13G->13G(14G), 27.9303030 secs]
> 2017-03-21T11:40:05.265-0400: 55121.828: [Full GC (Allocation Failure) 13G->13G(14G), 36.9902867 secs]
> 2017-03-21T11:40:42.400-0400: 55158.963: [Full GC (Allocation Failure) 13G->13G(14G), 27.6835744 secs]
> 2017-03-21T11:41:10.529-0400: 55187.091: [Full GC (Allocation Failure) 13G->13G(14G), 27.1899555 secs]
> 2017-03-21T11:41:38.018-0400: 55214.581: [Full GC (Allocation Failure) 13G->13G(14G), 27.7309706 secs]
> 2017-03-21T11:42:06.062-0400: 55242.624: [Full GC (Allocation Failure) 13G->13G(14G), 29.2103234 secs]
> 2017-03-21T11:42:35.579-0400: 55272.141: [Full GC (Allocation Failure) 13G->13G(14G), 27.9902741 secs]
> 2017-03-21T11:43:03.955-0400: 55300.518: [Full GC (Allocation Failure) 13G->13G(14G), 28.0764545 secs]
> 2017-03-21T11:43:32.186-0400: 55328.748: [Full GC (Allocation Failure) 13G->13G(14G), 28.3913296 secs]
> 2017-03-21T11:44:00.810-0400: 55357.373: [Full GC (Allocation Failure) 13G->13G(14G), 27.0945131 secs]
> 2017-03-21T11:44:28.343-0400: 55384.905: [Full GC (Allocation Failure) 13G->13G(14G), 27.3357763 secs]
> 2017-03-21T11:44:55.909-0400: 55412.471: [Full GC (Allocation Failure) 13G->13G(14G), 27.9216626 secs]
> 2017-03-21T11:45:24.070-0400: 55440.632: [Full GC (Allocation Failure) 13G->13G(14G), 29.8189953 secs]
> 2017-03-21T11:45:54.092-0400: 55470.654: [Full GC (Allocation Failure) 13G->13G(14G), 27.9992085 secs]
> 2017-03-21T11:46:22.219-0400: 55498.782: [Full GC (Allocation Failure) 13G->13G(14G), 27.4622100 secs]
> 2017-03-21T11:46:49.983-0400: 55526.545: [Full GC (Allocation Failure) 13G->13G(14G), 27.9159143 secs]
> 2017-03-21T11:47:18.029-0400: 55554.592: [Full GC (Allocation Failure) 13G->13G(14G), 28.5207919 secs]
> 2017-03-21T11:47:46.938-0400: 55583.500: [Full GC (Allocation Failure) 13G->13G(14G), 27.0355291 secs]
> 2017-03-21T11:48:14.235-0400: 55610.798: [Full GC (Allocation Failure) 13G->13G(14G), 35.6284865 secs]
> 2017-03-21T11:48:50.013-0400: 55646.575: [Full GC (Allocation Failure) 13G->13G(14G), 27.3299061 secs]
> 2017-03-21T11:49:17.592-0400: 55674.154: [Full GC (Allocation Failure) 13G->13G(14G), 27.2544348 secs]
> {noformat}
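The pause pattern above can be quantified with a short script. This is a sketch, assuming the JDK 8 `-XX:+PrintGCDateStamps` log format shown above (timestamp: uptime: [Full GC ..., N secs]); the sample lines are trimmed from the log:

```python
import re

# Trimmed sample of the Full GC lines shown above.
LOG = """\
2017-03-21T11:23:02.957-0400: 54099.519: [Full GC (Allocation Failure) 13G->11G(14G), 29.4127307 secs]
2017-03-21T11:23:45.270-0400: 54141.833: [Full GC (Allocation Failure) 13G->12G(14G), 28.1561881 secs]
2017-03-21T11:24:20.307-0400: 54176.869: [Full GC (Allocation Failure) 13G->13G(14G), 27.7019501 secs]
"""

# Capture the JVM uptime and the pause duration from each Full GC line.
pattern = re.compile(r": (\d+\.\d+): \[Full GC .*?, (\d+\.\d+) secs\]")

events = [(float(up), float(secs)) for up, secs in pattern.findall(LOG)]
paused = sum(secs for _, secs in events)
elapsed = events[-1][0] + events[-1][1] - events[0][0]
print(f"paused {paused:.1f}s of {elapsed:.1f}s elapsed ({100 * paused / elapsed:.0f}%)")
```

Even on this three-line sample the node is paused for roughly 80% of wall time, which matches the "useless over 1-5 minutes" behavior described below.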
> When a node enters the above state, it becomes useless over 1-5 minutes:
> * Client connections die
> * New connections time out
> * CPU usage drops - all threads do nothing except for 1 thread near 100% CPU ("VM Thread" os_prio=0 tid=0x00007ffaf8209bf0 nid=0x2d53 runnable), doing the above GC in a loop
> * JMX becomes unqueryable
> * A regular kill signal is ignored (needs kill -9)
> I've managed to extract a full heap dump of a node while in this state, but it's 20GB and I have not been able to find a tool that can analyze it.
> I've extracted a histogram of heap objects. From a healthy node:
> {noformat}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 12688007 1116544616 io.netty.buffer.PooledUnsafeDirectByteBuf
> 2: 24938940 798046080 io.netty.util.Recycler$DefaultHandle
> 3: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
> 4: 9582138 459942624 java.nio.HeapByteBuffer
> 5: 16974 393432416 [I
> 6: 2014719 311356992 [B
> 7: 4982167 181324848 [Ljava.lang.Object;
> 8: 8531 118782704 [Lio.netty.util.Recycler$DefaultHandle;
> 9: 2827662 113106480 org.apache.cassandra.db.rows.BufferCell
> 10: 2737816 87610112 org.apache.cassandra.db.rows.BTreeRow
> 11: 2740110 70228240 [Ljava.nio.ByteBuffer;
> 12: 2738081 65713944 org.apache.cassandra.db.BufferClustering
> 13: 1763956 56446592 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
> 14: 1763736 56439552 com.googlecode.concurrentlinkedhashmap.ConcurrentHashMapV8$Node
> 15: 2205825 52939800 com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
> 16: 1617026 51744832 java.util.concurrent.ConcurrentHashMap$Node
> 17: 1769934 42478416 org.apache.cassandra.cache.KeyCacheKey
> {noformat}
> From an unhealthy node:
> {noformat}
> num #instances #bytes class name
> ----------------------------------------------
> 1: 103726026 4978849248 java.nio.HeapByteBuffer
> 2: 95646048 2832233872 [B
> 3: 28076465 1207916952 [Ljava.lang.Object;
> 4: 12803448 1126703424 io.netty.buffer.PooledUnsafeDirectByteBuf
> 5: 25054634 801748288 io.netty.util.Recycler$DefaultHandle
> 6: 11457548 733283072 io.netty.channel.ChannelOutboundBuffer$Entry
> 7: 24725380 593409120 java.util.ArrayList
> 8: 5808290 185865280 java.util.RegularEnumSet
> 9: 2870889 160769784 org.apache.cassandra.transport.messages.ResultMessage$Rows
> 10: 2937336 140992128 io.netty.buffer.SlicedAbstractByteBuf
> 11: 8854 118773984 [Lio.netty.util.Recycler$DefaultHandle;
> 12: 2830805 113232200 org.apache.cassandra.db.rows.BufferCell
> 13: 2937336 93994752 org.apache.cassandra.transport.Frame$Header
> 14: 2870928 91869696 org.apache.cassandra.cql3.ResultSet$ResultMetadata
> 15: 2728627 87316064 org.apache.cassandra.db.rows.BTreeRow
> 16: 2947033 70728792 java.util.concurrent.ConcurrentLinkedQueue$Node
> 17: 2937336 70496064 org.apache.cassandra.transport.Frame
> {noformat}
> The increase is clear in these top 3 object types:
> {noformat}
> 1: 103726026 4978849248 java.nio.HeapByteBuffer
> 2: 95646048 2832233872 [B
> 3: 28076465 1207916952 [Ljava.lang.Object;
> {noformat}
> Other related observations:
> Client socket states: Essentially the JVM has stopped working; the recv-queue is piling up kernel-side but not being consumed app-side. Worth noting that the peers (clients) have given up and closed the connections on their end, but they still appear ESTABlished on cassandra's side since it hasn't drained the read buffer to EOF/error and completed the closure:
> {noformat}
> ESTAB 558612 0 172.30.72.133:9042 172.30.72.145:11418
> ESTAB 564172 0 172.30.72.133:9042 172.30.72.145:12420
> ESTAB 545284 0 172.30.72.133:9042 172.30.72.145:18212
> ESTAB 565468 0 172.30.72.133:9042 172.30.72.145:18346
> ESTAB 525884 0 172.30.72.133:9042 172.30.72.145:18352
> ESTAB 532324 0 172.30.72.133:9042 172.30.72.145:28896
> ESTAB 517592 0 172.30.72.133:9042 172.30.72.145:34504
> ESTAB 553392 0 172.30.72.133:9042 172.30.72.145:36926
> ESTAB 581952 0 172.30.72.133:9042 172.30.72.145:40684
> ESTAB 543668 0 172.30.72.133:9042 172.30.72.145:42282
> ESTAB 511860 0 172.30.72.133:9042 172.30.72.145:47350
> ESTAB 590380 0 172.30.72.133:9042 172.30.72.145:47566
> ESTAB 583120 0 172.30.72.133:9042 172.30.72.145:50386
> ESTAB 536772 0 172.30.72.133:9042 172.30.72.145:52656
> ESTAB 31 0 172.30.72.133:9042 172.30.72.145:52946
> ESTAB 532140 0 172.30.72.133:9042 172.30.72.145:53618
> ESTAB 14397 0 172.30.72.133:9042 172.30.72.150:10016
> ESTAB 6444 0 172.30.72.133:9042 172.30.72.150:12176
> ESTAB 7568 0 172.30.72.133:9042 172.30.72.150:12934
> ESTAB 139479 0 172.30.72.133:9042 172.30.72.150:13186
> ESTAB 6896 0 172.30.72.133:9042 172.30.72.150:17316
> ESTAB 7808 0 172.30.72.133:9042 172.30.72.150:17448
> ESTAB 8712 0 172.30.72.133:9042 172.30.72.150:17748
> ESTAB 6944 0 172.30.72.133:9042 172.30.72.150:18472
> ESTAB 136700 0 172.30.72.133:9042 172.30.72.150:19860
> ESTAB 16260 0 172.30.72.133:9042 172.30.72.150:20696
> ESTAB 8943 0 172.30.72.133:9042 172.30.72.150:21208
> ESTAB 132877 0 172.30.72.133:9042 172.30.72.150:21420
> ESTAB 6936 0 172.30.72.133:9042 172.30.72.150:25440
> ESTAB 92 0 172.30.72.133:9042 172.30.72.150:25778
> ESTAB 137312 0 172.30.72.133:9042 172.30.72.150:28258
> ESTAB 40 0 172.30.72.133:9042 172.30.72.150:28348
> ESTAB 6940 0 172.30.72.133:9042 172.30.72.150:29718
> ESTAB 6416 0 172.30.72.133:9042 172.30.72.150:29858
> ESTAB 6580 0 172.30.72.133:9042 172.30.72.150:30518
> ESTAB 4202 0 172.30.72.133:9042 172.30.72.150:30948
> ESTAB 6060 0 172.30.72.133:9042 172.30.72.150:35276
> ESTAB 0 0 172.30.72.133:9042 172.30.72.150:40114
> ESTAB 134303 0 172.30.72.133:9042 172.30.72.150:42212
> ESTAB 396 0 172.30.72.133:9042 172.30.72.150:44848
> ESTAB 136450 0 172.30.72.133:9042 172.30.72.150:47742
> ESTAB 8318 0 172.30.72.133:9042 172.30.72.150:51054
> ESTAB 11976 0 172.30.72.133:9042 172.30.72.150:54560
> ESTAB 136193 0 172.30.72.133:9042 172.30.72.150:56094
> ESTAB 138737 0 172.30.72.133:9042 172.30.72.150:60382
> ESTAB 7996 0 172.30.72.133:9042 172.30.72.150:60932
> ESTAB 137235 0 172.30.72.133:9042 172.30.72.150:61568
> ESTAB 8184 0 172.30.72.133:9042 172.30.72.150:63054
> ESTAB 985276 0 172.30.72.133:9042 172.30.72.175:23494
> ESTAB 601528 0 172.30.72.133:9042 172.30.72.175:24140
> ESTAB 557844 0 172.30.72.133:9042 172.30.72.175:42248
> ESTAB 469344 0 172.30.72.133:9042 172.30.72.175:59704
> ESTAB 507160 0 172.30.72.133:9042 172.30.72.185:11958
> ESTAB 449196 0 172.30.72.133:9042 172.30.72.185:15458
> ESTAB 487796 0 172.30.72.133:9042 172.30.72.185:16458
> ESTAB 441132 0 172.30.72.133:9042 172.30.72.185:23194
> ESTAB 486664 0 172.30.72.133:9042 172.30.72.185:25276
> ESTAB 554780 0 172.30.72.133:9042 172.30.72.185:29102
> ESTAB 496248 0 172.30.72.133:9042 172.30.72.185:29428
> ESTAB 420048 0 172.30.72.133:9042 172.30.72.185:39650
> ESTAB 498704 0 172.30.72.133:9042 172.30.72.185:46424
> ESTAB 506488 0 172.30.72.133:9042 172.30.72.185:49722
> ESTAB 509416 0 172.30.72.133:9042 172.30.72.185:55838
> ESTAB 511956 0 172.30.72.133:9042 172.30.72.185:57670
> ESTAB 496720 0 172.30.72.133:9042 172.30.72.185:59884
> {noformat}
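The kernel-side backlog above can be summarized per client with a short script. A sketch, assuming the `ss -tn`-style column layout shown above (State, Recv-Q, Send-Q, Local:Port, Peer:Port); the sample lines are trimmed from the output:

```python
# Trimmed sample of the socket lines shown above.
SS = """\
ESTAB 558612 0 172.30.72.133:9042 172.30.72.145:11418
ESTAB 564172 0 172.30.72.133:9042 172.30.72.145:12420
ESTAB 14397 0 172.30.72.133:9042 172.30.72.150:10016
"""

# Sum the unread Recv-Q bytes per peer IP.
backlog = {}
for line in SS.splitlines():
    state, recvq, _sendq, _local, peer = line.split()
    if state == "ESTAB":
        ip = peer.rsplit(":", 1)[0]
        backlog[ip] = backlog.get(ip, 0) + int(recvq)

for ip, total in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{ip:>16} {total:>10} bytes queued")
```

A large, growing per-peer Recv-Q with the process still alive is exactly the "kernel buffers full, application not reading" signature described above.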
> perf top during this state:
> {noformat}
> 19.73% libjvm.so [.] InstanceKlass::oop_follow_contents
> 14.05% libjvm.so [.] ObjArrayKlass::oop_follow_contents
> 7.83% libjvm.so [.] MarkSweep::follow_stack
> 6.38% libjvm.so [.] HeapRegion::block_size
> 5.41% libjvm.so [.] OtherRegionsTable::add_reference
> 5.12% libjvm.so [.] UpdateRSOopClosure::do_oop
> 4.77% libjvm.so [.] HeapRegion::block_is_obj
> {noformat}
> Frequency: Affecting nodes in our 2 busier DCs (the other 2 DCs serve less volume), ~2-10 times per day. Not always the same nodes (though lately 2 hiccup more frequently than others).
> Tried:
> * Upgrading cassandra from 3.9 to 3.10
> * Increasing heap size from 8GB to 14GB
> * Upgrading JVM 1.8 from u112 to u121
> * Switching from CMS to G1GC
> The last change, switching from CMS to G1GC, reduced the frequency of the problem but has not eliminated it.
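For reference, a CMS-to-G1 switch with the 14GB heap described above would look roughly like the following in conf/jvm.options; this is an illustrative sketch, not the reporter's exact configuration, and the pause target shown is an assumption:

```
## conf/jvm.options -- illustrative G1 settings, not taken from this ticket
-Xms14G
-Xmx14G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
```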
> Problem 2: Read volume in DC depressed
> With our RF set to 2 per DC, if 2 or more nodes in a DC enter the GC loop described in problem 1, all nodes in the same DC accumulate very high active/pending/blocked transport task queues.
> Total read volume for the DC drops to 10%-30% of normal.
> Tried:
> * Increasing -Dcassandra.max_queued_native_transport_requests to 4096
> * Reducing read_request_timeout_in_ms to 400
> * Reducing write_request_timeout_in_ms to 800
> These improved things slightly, but not by much.
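Concretely, the three mitigations above translate to a startup flag plus two cassandra.yaml settings; this is a sketch of the changes as described, not a recommended configuration:

```
# JVM system property (e.g. appended to JVM_OPTS at startup)
-Dcassandra.max_queued_native_transport_requests=4096

# cassandra.yaml
read_request_timeout_in_ms: 400
write_request_timeout_in_ms: 800
```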
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)