Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2015/11/12 17:41:11 UTC

[jira] [Updated] (CASSANDRA-9539) Race condition in schema propagation with dependence for cluster stability

     [ https://issues.apache.org/jira/browse/CASSANDRA-9539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-9539:
-----------------------------------------
    Issue Type: Improvement  (was: Bug)

> Race condition in schema propagation with dependence for cluster stability
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9539
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9539
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>             Fix For: 3.0.x
>
>
> Follow-up from CASSANDRA-8099. Split out into its own ticket for discussion following a brief exchange over GitHub.
> My initial comment in {{SerializationHeader}}:
> {quote}
> // TODO 8099: this looks like a potential race condition with schema changes to me: within a given node we
> // can accept writes to a column not present in the metadata, or receive stream data without them.
> // This shouldn't cause deserialization to fail
> {quote}
> And [~slebresne]'s response:
> {quote}
> I've also somewhat edited the comment in {{SerializationHeader}}. It's true that we're theoretically racy, but it's not a new thing to 8099 nor isolated to this specific part of the code. In fact, I suspect we're not terribly likely to get a problem at this particular point of the code because while nodes are not prevented from taking writes for columns they don't know about yet, we'll complain before it reaches the memtable (in the CQL layer if that's the coordinator, in message deserialization otherwise). And while we could get it through streams, given how schema propagation works and where streaming is used, it feels quite unlikely that streaming would reach a node before a schema change.
> Anyway, don't mean by that that we shouldn't improve all of this, just adding my bit of context.
> {quote}
> My concern is that we expose ourselves to nodes failing to start up if there is a bug or problem with schema propagation, or if the race condition manages to present purely through timing, say due to flapping network problems (either is possible, but the former is more likely). Right now we would continue to function in this scenario, but after 8099 the node will fail on opening its sstables. I think this is something we should fix, preferably before, or early on in, the release. We know our schema propagation code is not brilliant, and tightly coupling the stability of the cluster to it concerns me.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)