Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/05/03 15:58:03 UTC

[jira] [Assigned] (CASSANDRA-833) fix consistencylevel during bootstrap

     [ https://issues.apache.org/jira/browse/CASSANDRA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-833:
----------------------------------------

    Assignee: Sylvain Lebresne

Consider a write at CL=1 with RF=3 to replicas A, B, C. We begin bootstrapping node D, and write a row K to the range being moved from C to D.

If the cluster is heavily loaded, it's possible that we write one copy to C, all the other writes get dropped, and once bootstrap completes we lose the row, because C no longer owns that range. Or if we write one copy to D and then cancel bootstrap, we again lose the row.

As the issue description says, we want to satisfy CL for both the pre- and post-bootstrap nodes (in case bootstrap aborts). This requires treating the old/new range owner as a unit: both D *and* C need to accept the write for it to count towards CL. So rather than considering {A, B, C, D} we should consider {A, B, (C, D)}.
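
To make the grouping concrete, here is a minimal sketch in Python (not Cassandra code). The names `effective_acks` and `write_succeeds` are hypothetical, and `block_for` stands for the number of acknowledgements the consistency level requires. A unit counts towards CL only when every node in it has acknowledged the write.

```python
def effective_acks(acked, units):
    """Count the replica units in which every member acknowledged the write."""
    return sum(1 for unit in units if all(node in acked for node in unit))


def write_succeeds(acked, units, block_for):
    """A write meets CL when enough whole units have acknowledged it."""
    return effective_acks(acked, units) >= block_for


# RF=3 replicas A, B, C; D is bootstrapping and taking over a range from C,
# so C and D are treated as a single unit: {A, B, (C, D)}.
units = [("A",), ("B",), ("C", "D")]

print(write_succeeds({"C"}, units, 1))       # False: D has no copy
print(write_succeeds({"D"}, units, 1))       # False: C has no copy
print(write_succeeds({"C", "D"}, units, 1))  # True: the whole unit acked
print(write_succeeds({"A"}, units, 1))       # True: A is a replica either way
```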

This is a lot of complexity to introduce. A simplification that preserves correctness is to continue treating nodes independently but require *one more node* than normal CL. So CL=1 would actually require 2 nodes; CL=Q would require 3 (for RF=3), and so forth. (Note that Q(3) + 1 is the same as Q(4), both being 3, which is what the existing code computes; that is one reason I chose a CL=1 example to start with, since those are *not* the same even for the simple case of RF=3.)
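
The arithmetic can be checked with a short Python sketch. The function names are hypothetical, and Q(n) is assumed to be the usual majority, n // 2 + 1. "Counting D as an extra replica" is my reading of what the existing code computes, i.e. CL evaluated against RF + 1 nodes.

```python
def quorum(n):
    """Majority of n replicas (assumed definition of Q(n))."""
    return n // 2 + 1


def block_for(cl, rf):
    """Acknowledgements normally required for a consistency level."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return quorum(rf)
    raise ValueError(f"unsupported consistency level: {cl}")


def block_for_during_bootstrap(cl, rf):
    """Proposed simplification: require one more node than normal CL."""
    return block_for(cl, rf) + 1


rf = 3
for cl in ("ONE", "QUORUM"):
    proposed = block_for_during_bootstrap(cl, rf)
    extra_replica = block_for(cl, rf + 1)  # CL computed over RF + 1 nodes
    print(cl, "proposed:", proposed, "RF+1 calculation:", extra_replica)
```

For QUORUM both columns are 3, but for ONE the proposal requires 2 acknowledgements while the RF + 1 calculation still requires only 1.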

This would mean we may fail a few writes unnecessarily (a write to A or B is actually sufficient to satisfy CL=1, but this scheme would time that out) but never allow a write to succeed that would leave CL unsatisfied post-bootstrap (or if bootstrap is cancelled).
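
As a sanity check on that claim, the sketch below (Python, hypothetical names throughout) enumerates every possible set of acknowledging nodes for the RF=3 example and compares the "one more node" rule against the exact requirement that CL hold for both the pre- and post-bootstrap replica sets.

```python
from itertools import combinations

PRE = {"A", "B", "C"}   # replicas if bootstrap is cancelled
POST = {"A", "B", "D"}  # replicas once D has taken over the range from C
NODES = sorted(PRE | POST)


def truly_safe(acked, block_for):
    """CL is satisfied for both the pre- and post-bootstrap replica sets."""
    return len(acked & PRE) >= block_for and len(acked & POST) >= block_for


def conservative_accepts(acked, block_for):
    """Simplified rule: any block_for + 1 nodes, treated independently."""
    return len(acked) >= block_for + 1


def classify(block_for):
    """Return (unsafe accepts, needless failures) over all ack subsets."""
    unsafe_accepts, needless_failures = [], []
    for k in range(len(NODES) + 1):
        for combo in combinations(NODES, k):
            acked = set(combo)
            safe = truly_safe(acked, block_for)
            accepted = conservative_accepts(acked, block_for)
            if accepted and not safe:
                unsafe_accepts.append(combo)
            elif safe and not accepted:
                needless_failures.append(combo)
    return unsafe_accepts, needless_failures


print(classify(1))  # CL=1
print(classify(2))  # CL=Q for RF=3
```

In this example the first list is empty for both levels, so the rule never accepts an unsafe write. The second list holds the writes it fails unnecessarily: an acknowledgement from A alone or B alone at CL=1, and from A and B together at CL=Q.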

> fix consistencylevel during bootstrap
> -------------------------------------
>
>                 Key: CASSANDRA-833
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-833
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Jonathan Ellis
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.1
>
>
> As originally designed, bootstrap nodes should *always* get *all* writes under any consistencylevel, so when bootstrap finishes the operator can run cleanup on the old nodes without fear of losing data.
> But if a bootstrap operation fails or is aborted, that means all writes will fail until the ex-bootstrapping node is decommissioned. So starting in CASSANDRA-722, we just ignore dead nodes in consistencylevel calculations.
> But this breaks the original design. CASSANDRA-822 adds a partial fix for this (just adding bootstrap targets into the RF targets and hinting normally), but this is still broken under certain conditions. The real fix is to consider consistencylevel for two sets of nodes:
>   1. the RF targets as currently existing (no pending ranges)
>   2. the RF targets as they will exist after all movement ops are done
> If we satisfy CL for both sets then we will always be in good shape.
> I'm not sure if we can easily calculate 2. from the current TokenMetadata, though.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira