You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Martin Mačura <m....@gmail.com> on 2018/03/23 12:18:31 UTC
Nodes unresponsive after upgrade 3.9 -> 3.11.2
Hi all,
We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
it to 3.11.2.
Each node has 32 GB RAM, 8 GB Cassandra heap size.
After the upgrade, clients started reporting connection issues:
cassandra | [ERROR] Closing established connection pool to host
<redacted> because of the following error: Read error 'connection
reset by peer' (src/pool.cpp:384)
cassandra | [ERROR] Unable to establish a control connection to host
<redacted> because of the following error: Error: 'Request timed out'
(0x0100000E) (src/control_connection.cpp:263)
Cassandra logs are full of garbage collection warnings:
WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 ->
1774446288; Par Survivor Space: 858980344 -> 0
INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
Pool Name Active Pending Completed Blocked
All Time Blocked
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
MutationStage 10 9 2526002 0
0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ViewMutationStage 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
ReadStage 2 2 943544 0
0
INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
RequestResponseStage 0 0 1666876 0
0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
ReadRepairStage 0 0 10362 0
0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CounterMutationStage 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MiscStage 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
CompactionExecutor 0 0 3076 0
0
INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
MemtableReclaimMemory 0 0 44 0
0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
PendingRangeCalculator 0 0 4 0
0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
GossipStage 0 0 14287 0
0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
SecondaryIndexManagement 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
HintsDispatcher 0 0 1 0
0
INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_1 0 0 37
0 0
INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_2 0 0 37
0 0
INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
MigrationStage 0 0 2 0
0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
MemtablePostFlush 0 0 72 0
0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_0 0 0 44
0 0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
ValidationExecutor 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
Sampler 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
MemtableFlushWriter 0 0 44 0
0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_5 0 0 37
0 0
INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
InternalResponseStage 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_3 0 0 37
0 0
INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
PerDiskMemtableFlushWriter_4 0 0 37
0 0
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
AntiEntropyStage 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
CacheCleanupExecutor 0 0 0 0
0
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
Native-Transport-Requests 67 0 3627740 0
625945
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
UserDefinedFunctions 2 36 295998373 0
0
INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
CompactionManager 0 0
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
MessagingService n/a 1/1
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
Cache Type Size Capacity
KeysToSave
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
KeyCache 9066584 104857600
all
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
RowCache 0 0
all
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
Table Memtable ops,data
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.parent_repair_history 0,0
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.repair_history 0,0
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system_distributed.view_build_status 0,0
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system.compaction_history 18,5064
INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
system.schema_aggregates 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_triggers 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.size_estimates 27018,940475
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.paxos 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.views_builds_in_progress 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.batches 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_keyspaces 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.sstable_activity 118152,2241249
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.batchlog 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.schema_columns 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.hints 0,0
INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
system.IndexInfo 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_columnfamilies 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_functions 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.built_views 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.peer_events 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.range_xfers 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.peers 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.transferred_ranges 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.schema_usertypes 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.local 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.available_ranges 0,0
INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
system.prepared_statements 67,13574
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.columns 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.types 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.indexes 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.keyspaces 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.dropped_columns 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.aggregates 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.triggers 0,0
INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
system_schema.tables 0,0
INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
system_schema.views 0,0
INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
system_schema.functions 0,0
INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.event_postmortem 2855,140051
INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.group_meta 0,0
INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
<redacted>.group_postmortem 5715,381267
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_source 49336,24272521
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_alert 1592,78328
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_group 59896,29310038
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.group_alert 3185,212161
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_child 17319,9226867
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_output 123016,242316519
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event 137939,66815911
INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
<redacted>.event_meta 1,57
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.postmortem 4929,4353986
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.audit_type 8003,2624883
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
<redacted>.audit 8005,2934624
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.roles 0,0
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.role_members 0,0
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.resource_role_permissons_index 0,0
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_auth.role_permissions 0,0
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_traces.sessions 0,0
INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
system_traces.events 0,0
Also, there are many warnings about UDFs executing too slow:
WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
UDFunction.java:436 - User defined function asm_log.ttime_state :
(frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
bigint>> ran longer than 500ms
And sometimes the node crashes with the following:
ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
JVMStabilityInspector.java:162 - JVM state determined to be unstable.
Exiting forcefully due to:
java.util.concurrent.TimeoutException: User defined function
<redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
There is no substantial increase in load average.
We increased heap size to 12 GB, that did not help. Is there anything
else we can try, besides downgrading?
Thanks,
Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org
Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2
Posted by Nitan Kainth <ni...@gmail.com>.
Martin,
Would you pls share settings you had before and what did you change? We have similar issue.
> On Mar 23, 2018, at 8:47 AM, Martin Mačura <m....@gmail.com> wrote:
>
> Nevermind, we resolved the issue.... JVM heap settings were misconfigured
>
> Martin
>
>> On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m....@gmail.com> wrote:
>> Hi all,
>>
>> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
>> it to 3.11.2.
>>
>> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>>
>> After the upgrade, clients started reporting connection issues:
>>
>> cassandra | [ERROR] Closing established connection pool to host
>> <redacted> because of the following error: Read error 'connection
>> reset by peer' (src/pool.cpp:384)
>> cassandra | [ERROR] Unable to establish a control connection to host
>> <redacted> because of the following error: Error: 'Request timed out'
>> (0x0100000E) (src/control_connection.cpp:263)
>>
>>
>> Cassandra logs are full of garbage collection warnings:
>>
>> WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
>> ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 ->
>> 1774446288; Par Survivor Space: 858980344 -> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
>> Pool Name Active Pending Completed Blocked
>> All Time Blocked
>> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> MutationStage 10 9 2526002 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ViewMutationStage 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> ReadStage 2 2 943544 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
>> RequestResponseStage 0 0 1666876 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> ReadRepairStage 0 0 10362 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CounterMutationStage 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MiscStage 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> CompactionExecutor 0 0 3076 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
>> MemtableReclaimMemory 0 0 44 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> PendingRangeCalculator 0 0 4 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> GossipStage 0 0 14287 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> SecondaryIndexManagement 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
>> HintsDispatcher 0 0 1 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_1 0 0 37
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_2 0 0 37
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
>> MigrationStage 0 0 2 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> MemtablePostFlush 0 0 72 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_0 0 0 44
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> ValidationExecutor 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
>> Sampler 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> MemtableFlushWriter 0 0 44 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_5 0 0 37
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
>> InternalResponseStage 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_3 0 0 37
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
>> PerDiskMemtableFlushWriter_4 0 0 37
>> 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> AntiEntropyStage 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> CacheCleanupExecutor 0 0 0 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> Native-Transport-Requests 67 0 3627740 0
>> 625945
>> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
>> UserDefinedFunctions 2 36 295998373 0
>> 0
>> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
>> CompactionManager 0 0
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
>> MessagingService n/a 1/1
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
>> Cache Type Size Capacity
>> KeysToSave
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
>> KeyCache 9066584 104857600
>> all
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
>> RowCache 0 0
>> all
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
>> Table Memtable ops,data
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.parent_repair_history 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.repair_history 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system_distributed.view_build_status 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.compaction_history 18,5064
>> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
>> system.schema_aggregates 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_triggers 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.size_estimates 27018,940475
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.paxos 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.views_builds_in_progress 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batches 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_keyspaces 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.sstable_activity 118152,2241249
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.batchlog 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.schema_columns 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.hints 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
>> system.IndexInfo 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_columnfamilies 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_functions 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.built_views 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peer_events 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.range_xfers 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.peers 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.transferred_ranges 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.schema_usertypes 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.local 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.available_ranges 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
>> system.prepared_statements 67,13574
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.columns 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.types 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.indexes 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.keyspaces 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.dropped_columns 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.aggregates 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.triggers 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
>> system_schema.tables 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.views 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> system_schema.functions 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.event_postmortem 2855,140051
>> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_meta 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
>> <redacted>.group_postmortem 5715,381267
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_source 49336,24272521
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_alert 1592,78328
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_group 59896,29310038
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.group_alert 3185,212161
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_child 17319,9226867
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_output 123016,242316519
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event 137939,66815911
>> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
>> <redacted>.event_meta 1,57
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.postmortem 4929,4353986
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit_type 8003,2624883
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> <redacted>.audit 8005,2934624
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.roles 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_members 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.resource_role_permissons_index 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_auth.role_permissions 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.sessions 0,0
>> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
>> system_traces.events 0,0
>>
>> Also, there are many warnings about UDFs executing too slow:
>>
>> WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
>> UDFunction.java:436 - User defined function asm_log.ttime_state :
>> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
>> bigint>> ran longer than 500ms
>>
>> And sometimes the node crashes with the following:
>> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
>> JVMStabilityInspector.java:162 - JVM state determined to be unstable.
>> Exiting forcefully due to:
>> java.util.concurrent.TimeoutException: User defined function
>> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
>> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
>>
>> There is no substantial increase in load average.
>>
>> We increased heap size to 12 GB, that did not help. Is there anything
>> else we can try, besides downgrading?
>>
>>
>>
>> Thanks,
>>
>> Martin
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org
Re: Nodes unresponsive after upgrade 3.9 -> 3.11.2
Posted by Martin Mačura <m....@gmail.com>.
Nevermind, we resolved the issue.... JVM heap settings were misconfigured
Martin
On Fri, Mar 23, 2018 at 1:18 PM, Martin Mačura <m....@gmail.com> wrote:
> Hi all,
>
> We have a cluster of 3 nodes with RF 3 that ran fine until we upgraded
> it to 3.11.2.
>
> Each node has 32 GB RAM, 8 GB Cassandra heap size.
>
> After the upgrade, clients started reporting connection issues:
>
> cassandra | [ERROR] Closing established connection pool to host
> <redacted> because of the following error: Read error 'connection
> reset by peer' (src/pool.cpp:384)
> cassandra | [ERROR] Unable to establish a control connection to host
> <redacted> because of the following error: Error: 'Request timed out'
> (0x0100000E) (src/control_connection.cpp:263)
>
>
> Cassandra logs are full of garbage collection warnings:
>
> WARN [Service Thread] 2018-03-23 05:04:17,780 GCInspector.java:282 -
> ConcurrentMarkSweep GC in 7858ms. Par Eden Space: 6871908352 ->
> 1774446288; Par Survivor Space: 858980344 -> 0
> INFO [Service Thread] 2018-03-23 05:04:17,780 StatusLogger.java:47 -
> Pool Name Active Pending Completed Blocked
> All Time Blocked
> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> MutationStage 10 9 2526002 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ViewMutationStage 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> ReadStage 2 2 943544 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,784 StatusLogger.java:51 -
> RequestResponseStage 0 0 1666876 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> ReadRepairStage 0 0 10362 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CounterMutationStage 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MiscStage 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> CompactionExecutor 0 0 3076 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,785 StatusLogger.java:51 -
> MemtableReclaimMemory 0 0 44 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> PendingRangeCalculator 0 0 4 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> GossipStage 0 0 14287 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> SecondaryIndexManagement 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,786 StatusLogger.java:51 -
> HintsDispatcher 0 0 1 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,804 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_1 0 0 37
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_2 0 0 37
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,805 StatusLogger.java:51 -
> MigrationStage 0 0 2 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> MemtablePostFlush 0 0 72 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_0 0 0 44
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> ValidationExecutor 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,806 StatusLogger.java:51 -
> Sampler 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> MemtableFlushWriter 0 0 44 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_5 0 0 37
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,807 StatusLogger.java:51 -
> InternalResponseStage 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_3 0 0 37
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,819 StatusLogger.java:51 -
> PerDiskMemtableFlushWriter_4 0 0 37
> 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> AntiEntropyStage 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> CacheCleanupExecutor 0 0 0 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> Native-Transport-Requests 67 0 3627740 0
> 625945
> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:51 -
> UserDefinedFunctions 2 36 295998373 0
> 0
> INFO [Service Thread] 2018-03-23 05:04:17,820 StatusLogger.java:61 -
> CompactionManager 0 0
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:73 -
> MessagingService n/a 1/1
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:83 -
> Cache Type Size Capacity
> KeysToSave
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:85 -
> KeyCache 9066584 104857600
> all
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:91 -
> RowCache 0 0
> all
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:98 -
> Table Memtable ops,data
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.parent_repair_history 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.repair_history 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system_distributed.view_build_status 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system.compaction_history 18,5064
> INFO [Service Thread] 2018-03-23 05:04:17,821 StatusLogger.java:101 -
> system.schema_aggregates 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_triggers 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.size_estimates 27018,940475
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.paxos 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.views_builds_in_progress 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.batches 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_keyspaces 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.sstable_activity 118152,2241249
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.batchlog 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.schema_columns 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.hints 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,822 StatusLogger.java:101 -
> system.IndexInfo 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_columnfamilies 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_functions 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.built_views 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.peer_events 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.range_xfers 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.peers 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.transferred_ranges 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.schema_usertypes 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.local 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.available_ranges 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,823 StatusLogger.java:101 -
> system.prepared_statements 67,13574
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.columns 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.types 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.indexes 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.keyspaces 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.dropped_columns 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.aggregates 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.triggers 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,824 StatusLogger.java:101 -
> system_schema.tables 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> system_schema.views 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> system_schema.functions 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.event_postmortem 2855,140051
> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.group_meta 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,825 StatusLogger.java:101 -
> <redacted>.group_postmortem 5715,381267
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_source 49336,24272521
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_alert 1592,78328
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_group 59896,29310038
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.group_alert 3185,212161
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_child 17319,9226867
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_output 123016,242316519
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event 137939,66815911
> INFO [Service Thread] 2018-03-23 05:04:17,826 StatusLogger.java:101 -
> <redacted>.event_meta 1,57
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.postmortem 4929,4353986
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.audit_type 8003,2624883
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> <redacted>.audit 8005,2934624
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.roles 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.role_members 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.resource_role_permissons_index 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_auth.role_permissions 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_traces.sessions 0,0
> INFO [Service Thread] 2018-03-23 05:04:17,827 StatusLogger.java:101 -
> system_traces.events 0,0
>
> Also, there are many warnings about UDFs executing too slow:
>
> WARN [Native-Transport-Requests-80] 2018-03-23 05:08:41,327
> UDFunction.java:436 - User defined function asm_log.ttime_state :
> (frozen<tuple<bigint, bigint>>, int, int, int) -> frozen<tuple<bigint,
> bigint>> ran longer than 500ms
>
> And sometimes the node crashes with the following:
> ERROR [NonPeriodicTasks:1] 2018-03-23 05:04:18,027
> JVMStabilityInspector.java:162 - JVM state determined to be unstable.
> Exiting forcefully due to:
> java.util.concurrent.TimeoutException: User defined function
> <redacted>.checks_state : (frozen<tuple<bigint>>, int, int) ->
> frozen<tuple<bigint>> ran longer than 1500ms - will stop Cassandra VM
>
> There is no substantial increase in load average.
>
> We increased heap size to 12 GB, that did not help. Is there anything
> else we can try, besides downgrading?
>
>
>
> Thanks,
>
> Martin
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org