Posted to commits@cassandra.apache.org by "mck (JIRA)" <ji...@apache.org> on 2017/04/12 22:09:41 UTC

[jira] [Commented] (CASSANDRA-13441) Schema version uses built-in digest which includes timestamps, causing migration storms

    [ https://issues.apache.org/jira/browse/CASSANDRA-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966726#comment-15966726 ] 

mck commented on CASSANDRA-13441:
---------------------------------

I suspect we'll see a number of people doing 2.1.x and 2.2.x upgrades to 3.11.x (especially the bigger clusters after a few patch releases on 3.11), long before we see many upgrading to 4.0.x.

Why not slate this for 3.11.x?

> Schema version uses built-in digest which includes timestamps, causing migration storms
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13441
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13441
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Schema
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 4.x
>
>
> In versions < 3.0, schema was essentially deterministic - a given schema always hashed to the same version, so during a rolling upgrade (say 2.0 -> 2.1), the first node to upgrade to 2.1 would add the new tables, setting the new 2.1 version ID, and subsequently upgraded hosts would settle on that version.
> In 3.0, we delegate the digest calculation to the post-8099 data structures, which are the same digest calculators used in the read path for digest match/mismatch - which means the digest includes timestamps (and TTLs).
> Since schema will never use TTL, we don't care about TTL fields. Similarly, when a 3.0 node upgrades and writes its own new-in-3.0 system tables, it'll write the same tables that exist in the schema with brand new timestamps. As written, this will cause all nodes in the cluster to change schema (to the version with the newest timestamp), and then change a second time as the non-system schema is propagated to the newly upgraded nodes.
> On a sufficiently large cluster with a non-trivial schema, this could cause (literally) millions of migration tasks to needlessly bounce across the cluster.
> Up for discussion: if we fix this in 3.0 (say 3.0.X where X >= 14), then any 3.0 node below this will always mismatch, and cause the ping-ponging described in CASSANDRA-11050. However, if we don't fix it, we create a situation that's potentially an outage on rolling upgrade. I'm leaning towards a strong warning in NEWS about the right way to upgrade, and fixing it in 4.x, but wouldn't mind hearing opinions from [~slebresne] and [~iamaleksey] and [~amorton] since you three already talked about this on CASSANDRA-11050.
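
To make the problem in the description concrete, here is a minimal Python sketch of why a digest that covers write timestamps produces a different schema version on every node, while a digest over names and values alone does not. This is not Cassandra's code: the function `schema_digest`, the tuple layout, and the sample rows are all hypothetical, and MD5 plus a UUID wrapper are used only as a stand-in for whatever the real digest path does.

```python
import hashlib
import uuid


def schema_digest(cells, include_timestamps):
    """Hash simplified schema cells into a version UUID.

    Each cell is a hypothetical (table, column, value, write_timestamp) tuple;
    real schema rows carry far more structure than this.
    """
    md5 = hashlib.md5()
    for table, column, value, timestamp in sorted(cells):
        md5.update("\x00".join((table, column, value)).encode("utf-8"))
        if include_timestamps:
            # Mixing in the write time makes the digest node-specific.
            md5.update(str(timestamp).encode("utf-8"))
    return uuid.UUID(bytes=md5.digest())


# Two nodes write identical system-table definitions at different times.
node_a = [("system_example.t1", "comment", "demo", 1000),
          ("system_example.t1", "gc_grace_seconds", "0", 1000)]
node_b = [("system_example.t1", "comment", "demo", 2000),
          ("system_example.t1", "gc_grace_seconds", "0", 2000)]

with_ts_differs = schema_digest(node_a, True) != schema_digest(node_b, True)
without_ts_matches = schema_digest(node_a, False) == schema_digest(node_b, False)
```

With timestamps included the two nodes disagree on the version even though their schema content is identical, which is the condition that would trigger schema pulls; with timestamps excluded they agree.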


