You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Martin Mačura <m....@gmail.com> on 2018/03/23 12:18:31 UTC

Nodes unresponsive after upgrade 3.9 -> 3.11.2

Hi all,

We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
it to 3.11.2.

Each node has 32 GB RAM, 8 GB Cassandra heap size.

After the upgrade, clients started reporting connection issues:

cassandra | [ERROR] Closing established connection pool to host
<redacted> because of the following error: Read error 'connection
reset by peer' (src/pool.cpp:384)
cassandra | [ERROR] Unable to establish a control connection to host
<redacted> because of the following error: Error: 'Request timed out'
(0x0100000E) (src/control_connection.cpp:263)


Cassandra logs are full of garbage collection warnings:

WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
1774446288; Par Survivor Space: 858980344 -> 0
INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
Pool Name                    Active   Pending      Completed   Blocked
 All Time Blocked
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
MutationStage                    10         9        2526002         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ViewMutationStage                 0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ReadStage                         2         2         943544         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
RequestResponseStage              0         0        1666876         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
ReadRepairStage                   0         0          10362         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CounterMutationStage              0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MiscStage                         0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CompactionExecutor                0         0           3076         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MemtableReclaimMemory             0         0             44         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
PendingRangeCalculator            0         0              4         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
GossipStage                       0         0          14287         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
SecondaryIndexManagement          0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
HintsDispatcher                   0         0              1         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_1         0         0             37
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_2         0         0             37
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
MigrationStage                    0         0              2         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
MemtablePostFlush                 0         0             72         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_0         0         0             44
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
ValidationExecutor                0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
Sampler                           0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
MemtableFlushWriter               0         0             44         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_5         0         0             37
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
InternalResponseStage             0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_3         0         0             37
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_4         0         0             37
 0                 0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
AntiEntropyStage                  0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
CacheCleanupExecutor              0         0              0         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
Native-Transport-Requests        67         0        3627740         0
           625945
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
UserDefinedFunctions              2        36      295998373         0
                0
INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
CompactionManager                 0         0
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
MessagingService                n/a       1/1
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
Cache Type                     Size                 Capacity
    KeysToSave
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
KeyCache                    9066584                104857600
           all
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
RowCache                          0                        0
           all
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
Table                       Memtable ops,data
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.parent_repair_history                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.repair_history                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.view_build_status                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system.compaction_history             18,5064
INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system.schema_aggregates                  0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_triggers                    0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.size_estimates            27018,940475
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.paxos                              0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.views_builds_in_progress                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.batches                            0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_keyspaces                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.sstable_activity        118152,2241249
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.batchlog                           0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_columns                     0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.hints                              0,0
INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.IndexInfo                          0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_columnfamilies                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_functions                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.built_views                        0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.peer_events                        0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.range_xfers                        0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.peers                              0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.transferred_ranges                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_usertypes                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.local                              0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.available_ranges                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.prepared_statements            67,13574
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.columns                     0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.types                       0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.indexes                     0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.keyspaces                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.dropped_columns                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.aggregates                  0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.triggers                    0,0
INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.tables                      0,0
INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
system_schema.views                       0,0
INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
system_schema.functions                   0,0
INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.event_postmortem          2855,140051
INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.group_meta                        0,0
INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.group_postmortem          5715,381267
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_source           49336,24272521
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_alert                1592,78328
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_group            59896,29310038
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.group_alert               3185,212161
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_child             17319,9226867
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_output         123016,242316519
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event                 137939,66815911
INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_meta                       1,57
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.postmortem               4929,4353986
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.audit_type                   8003,2624883
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.audit                        8005,2934624
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.roles                         0,0
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.role_members                  0,0
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.resource_role_permissons_index                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.role_permissions                 0,0
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_traces.sessions                    0,0
INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_traces.events                      0,0

Also, there are many warnings about UDFs executing too slow:

WARN  [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
UDFunction.java:436 - User defined function asm_log.ttime_state :
(frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
bigint>> ran longer than 500ms

And sometimes the node crashes with the following:
ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
JVMStabilityInspector.java:162 - JVM state determined to be unstable.
Exiting forcefully due to:
java.util.concurrent.TimeoutException: User defined function
<redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM

There is no substantial increase in load average.

We increased heap size to 12 GB, that did not help.  Is there anything
else we can try, besides downgrading?



Thanks,

Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2

Posted by Nitan Kainth <ni...@gmail.com>.
Martin,

Would you pls share settings you had before and what did you change? We have similar issue.



> On Mar 23, 2018, at 8:47 AM, Martin Mačura <m....@gmail.com> wrote:
> 
> Nevermind, we resolved the issue....  JVM heap settings were misconfigured
> 
> Martin
> 
>> On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m....@gmail.com> wrote:
>> Hi all,
>> 
>> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
>> it to 3.11.2.
>> 
>> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>> 
>> After the upgrade, clients started reporting connection issues:
>> 
>> cassandra | [ERROR] Closing established connection pool to host
>> <redacted> because of the following error: Read error 'connection
>> reset by peer' (src/pool.cpp:384)
>> cassandra | [ERROR] Unable to establish a control connection to host
>> <redacted> because of the following error: Error: 'Request timed out'
>> (0x0100000E) (src/control_connection.cpp:263)
>> 
>> 
>> Cassandra logs are full of garbage collection warnings:
>> 
>> WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
>> ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
>> 1774446288; Par Survivor Space: 858980344 -> 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
>> Pool Name                    Active   Pending      Completed   Blocked
>> All Time Blocked
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> MutationStage                    10         9        2526002         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ViewMutationStage                 0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ReadStage                         2         2         943544         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> RequestResponseStage              0         0        1666876         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> ReadRepairStage                   0         0          10362         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CounterMutationStage              0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MiscStage                         0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CompactionExecutor                0         0           3076         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MemtableReclaimMemory             0         0             44         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> PendingRangeCalculator            0         0              4         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> GossipStage                       0         0          14287         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> SecondaryIndexManagement          0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> HintsDispatcher                   0         0              1         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_1         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_2         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> MigrationStage                    0         0              2         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> MemtablePostFlush                 0         0             72         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_0         0         0             44
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> ValidationExecutor                0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> Sampler                           0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> MemtableFlushWriter               0         0             44         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_5         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> InternalResponseStage             0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_3         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_4         0         0             37
>> 0                 0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> AntiEntropyStage                  0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> CacheCleanupExecutor              0         0              0         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> Native-Transport-Requests        67         0        3627740         0
>>           625945
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> UserDefinedFunctions              2        36      295998373         0
>>                0
>> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
>> CompactionManager                 0         0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
>> MessagingService                n/a       1/1
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
>> Cache Type                     Size                 Capacity
>>    KeysToSave
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
>> KeyCache                    9066584                104857600
>>           all
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
>> RowCache                          0                        0
>>           all
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
>> Table                       Memtable ops,data
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.parent_repair_history                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.repair_history                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.view_build_status                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.compaction_history             18,5064
>> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.schema_aggregates                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_triggers                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.size_estimates            27018,940475
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.paxos                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.views_builds_in_progress                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batches                            0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_keyspaces                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.sstable_activity        118152,2241249
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batchlog                           0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_columns                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.hints                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.IndexInfo                          0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_columnfamilies                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_functions                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.built_views                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peer_events                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.range_xfers                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peers                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.transferred_ranges                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_usertypes                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.local                              0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.available_ranges                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.prepared_statements            67,13574
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.columns                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.types                       0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.indexes                     0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.keyspaces                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.dropped_columns                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.aggregates                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.triggers                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.tables                      0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.views                       0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.functions                   0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.event_postmortem          2855,140051
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_meta                        0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_postmortem          5715,381267
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_source           49336,24272521
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_alert                1592,78328
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_group            59896,29310038
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.group_alert               3185,212161
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_child             17319,9226867
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_output         123016,242316519
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event                 137939,66815911
>> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_meta                       1,57
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.postmortem               4929,4353986
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit_type                   8003,2624883
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit                        8005,2934624
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.roles                         0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_members                  0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.resource_role_permissons_index                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_permissions                 0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.sessions                    0,0
>> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.events                      0,0
>> 
>> Also, there are many warnings about UDFs executing too slow:
>> 
>> WARN  [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
>> UDFunction.java:436 - User defined function asm_log.ttime_state :
>> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
>> bigint>> ran longer than 500ms
>> 
>> And sometimes the node crashes with the following:
>> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
>> JVMStabilityInspector.java:162 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.util.concurrent.TimeoutException: User defined function
>> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
>> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
>> 
>> There is no substantial increase in load average.
>> 
>> We increased heap size to 12 GB, that did not help.  Is there anything
>> else we can try, besides downgrading?
>> 
>> 
>> 
>> Thanks,
>> 
>> Martin
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2

Posted by Martin Mačura <m....@gmail.com>.
Nevermind, we resolved the issue....  JVM heap settings were misconfigured

Martin

On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m....@gmail.com> wrote:
> Hi all,
>
> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
> it to 3.11.2.
>
> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>
> After the upgrade, clients started reporting connection issues:
>
> cassandra | [ERROR] Closing established connection pool to host
> <redacted> because of the following error: Read error 'connection
> reset by peer' (src/pool.cpp:384)
> cassandra | [ERROR] Unable to establish a control connection to host
> <redacted> because of the following error: Error: 'Request timed out'
> (0x0100000E) (src/control_connection.cpp:263)
>
>
> Cassandra logs are full of garbage collection warnings:
>
> WARN  [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 7858ms.  Par Eden Space: 6871908352 ->
> 1774446288; Par Survivor Space: 858980344 -> 0
> INFO  [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
> Pool Name                    Active   Pending      Completed   Blocked
>  All Time Blocked
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> MutationStage                    10         9        2526002         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ViewMutationStage                 0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ReadStage                         2         2         943544         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> RequestResponseStage              0         0        1666876         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> ReadRepairStage                   0         0          10362         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CounterMutationStage              0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MiscStage                         0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CompactionExecutor                0         0           3076         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MemtableReclaimMemory             0         0             44         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> PendingRangeCalculator            0         0              4         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> GossipStage                       0         0          14287         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> SecondaryIndexManagement          0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> HintsDispatcher                   0         0              1         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_1         0         0             37
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_2         0         0             37
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> MigrationStage                    0         0              2         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> MemtablePostFlush                 0         0             72         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_0         0         0             44
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> ValidationExecutor                0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> Sampler                           0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> MemtableFlushWriter               0         0             44         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_5         0         0             37
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> InternalResponseStage             0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_3         0         0             37
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_4         0         0             37
>  0                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> AntiEntropyStage                  0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> CacheCleanupExecutor              0         0              0         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> Native-Transport-Requests        67         0        3627740         0
>            625945
> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> UserDefinedFunctions              2        36      295998373         0
>                 0
> INFO  [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
> CompactionManager                 0         0
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
> MessagingService                n/a       1/1
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
> Cache Type                     Size                 Capacity
>     KeysToSave
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
> KeyCache                    9066584                104857600
>            all
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
> RowCache                          0                        0
>            all
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
> Table                       Memtable ops,data
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.parent_repair_history                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.repair_history                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.view_build_status                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system.compaction_history             18,5064
> INFO  [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system.schema_aggregates                  0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_triggers                    0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.size_estimates            27018,940475
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.paxos                              0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.views_builds_in_progress                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.batches                            0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_keyspaces                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.sstable_activity        118152,2241249
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.batchlog                           0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_columns                     0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.hints                              0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.IndexInfo                          0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_columnfamilies                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_functions                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.built_views                        0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.peer_events                        0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.range_xfers                        0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.peers                              0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.transferred_ranges                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_usertypes                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.local                              0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.available_ranges                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.prepared_statements            67,13574
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.columns                     0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.types                       0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.indexes                     0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.keyspaces                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.dropped_columns                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.aggregates                  0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.triggers                    0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.tables                      0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> system_schema.views                       0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> system_schema.functions                   0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.event_postmortem          2855,140051
> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.group_meta                        0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.group_postmortem          5715,381267
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_source           49336,24272521
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_alert                1592,78328
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_group            59896,29310038
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.group_alert               3185,212161
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_child             17319,9226867
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_output         123016,242316519
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event                 137939,66815911
> INFO  [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_meta                       1,57
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.postmortem               4929,4353986
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.audit_type                   8003,2624883
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.audit                        8005,2934624
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.roles                         0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.role_members                  0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.resource_role_permissons_index                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.role_permissions                 0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_traces.sessions                    0,0
> INFO  [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_traces.events                      0,0
>
> Also, there are many warnings about UDFs executing too slow:
>
> WARN  [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
> UDFunction.java:436 - User defined function asm_log.ttime_state :
> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
> bigint>> ran longer than 500ms
>
> And sometimes the node crashes with the following:
> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
> JVMStabilityInspector.java:162 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.util.concurrent.TimeoutException: User defined function
> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
>
> There is no substantial increase in load average.
>
> We increased heap size to 12 GB, that did not help.  Is there anything
> else we can try, besides downgrading?
>
>
>
> Thanks,
>
> Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org