You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Matt Byrd (Jira)" <ji...@apache.org> on 2019/12/05 18:58:00 UTC

[jira] [Assigned] (CASSANDRA-11748) Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process

     [ https://issues.apache.org/jira/browse/CASSANDRA-11748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Byrd reassigned CASSANDRA-11748:
-------------------------------------

    Assignee:     (was: Nirmal Singh KPS)

> Schema version mismatch may leads to Casandra OOM at bootstrap during a rolling upgrade process
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11748
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>         Environment: Rolling upgrade process from 1.2.19 to 2.0.17. 
> CentOS 6.6
> Occurred in different C* node of different scale of deployment (2G ~ 5G)
>            Reporter: Michael Fong
>            Priority: Urgent
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We have observed multiple times when a multi-node C* (v2.0.17) cluster ran into OOM in bootstrap during a rolling upgrade process from 1.2.19 to 2.0.17. 
> Here is the simple guideline of our rolling upgrade process
> 1. Update schema on a node, and wait until all nodes to be in schema version agreemnt - via nodetool describeclulster
> 2. Restart a Cassandra node
> 3. After restart, there is a chance that the the restarted node has different schema version.
> 4. All nodes in cluster start to rapidly exchange schema information, and any of node could run into OOM. 
> The following is the system.log that occur in one of our 2-node cluster test bed
> ----------------------------------
> Before rebooting node 2:
> Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java (line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
> After rebooting node 2, 
> Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b
> The node2  keeps submitting the migration task over 100+ times to the other node.
> INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node /192.168.88.33 has restarted, now UP
> INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) Updating topology for /192.168.88.33
> ...
> DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) Submitting migration task for /192.168.88.33
> ... ( over 100+ times)
> ----------------------------------
> On the otherhand, Node 1 keeps updating its gossip information, followed by receiving and submitting migrationTask afterwards: 
> INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) InetAddress /192.168.88.34 is now UP
> ...
> DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 MigrationRequestVerbHandler.java (line 41) Received migration request from /192.168.88.34.
> …… ( over 100+ times)
> DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 127) submitting migration task for /192.168.88.34
> .....  (over 50+ times)
> On the side note, we have over 200+ column families defined in Cassandra database, which may related to this amount of rpc traffic.
> P.S.2 The over requested schema migration task will eventually have InternalResponseStage performing schema merge operation. Since this operation requires a compaction for each merge and is much slower to consume. Thus, the back-pressure of incoming schema migration content objects consumes all of the heap space and ultimately ends up OOM!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org