Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2014/02/04 17:42:15 UTC

[jira] [Comment Edited] (CASSANDRA-6648) Race condition during node bootstrapping

    [ https://issues.apache.org/jira/browse/CASSANDRA-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890834#comment-13890834 ] 

Brandon Williams edited comment on CASSANDRA-6648 at 2/4/14 4:42 PM:
---------------------------------------------------------------------

v2 builds on Sergio's patch, but changes the gossiper's own fat client check (not isFatClient) to ignore epstate.isAlive and just rely on the timestamp of the last update and membership.
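
To illustrate the idea only (this is not the code in 6648-v2.txt), here is a minimal, self-contained sketch of a check that ignores liveness and looks only at ring membership and the age of the last update. The class, the method name shouldEvict, the EndpointState stand-in and the timeout parameter are all hypothetical and are not the Gossiper's real API.

```java
import java.util.Set;

// Hypothetical sketch; none of these names come from Cassandra itself.
final class FatClientCheckSketch {

    // Stand-in for the per-endpoint gossip state.
    public static final class EndpointState {
        public final boolean alive;            // deliberately NOT consulted by the check
        public final long lastUpdateMillis;    // timestamp of the last gossip update

        public EndpointState(boolean alive, long lastUpdateMillis) {
            this.alive = alive;
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    // An endpoint is treated as a stale fat client when it is not a ring
    // member and has not been updated within timeoutMillis.
    public static boolean shouldEvict(String endpoint,
                                      EndpointState state,
                                      Set<String> ringMembers,
                                      long nowMillis,
                                      long timeoutMillis) {
        boolean member = ringMembers.contains(endpoint);
        boolean stale = nowMillis - state.lastUpdateMillis > timeoutMillis;
        return !member && stale;
    }
}
```

In this sketch a joining node that is still being gossiped about keeps a recent update timestamp, so it is not evicted regardless of what the alive flag says at that moment.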


was (Author: brandon.williams):
v2 builds on Sergio's patch, but changes the gossiper's fat client check to ignore epstate.isAlive and just rely on the timestamp of the last update and membership.

> Race condition during node bootstrapping
> ----------------------------------------
>
>                 Key: CASSANDRA-6648
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6648
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sergio Bossa
>            Assignee: Sergio Bossa
>            Priority: Critical
>         Attachments: 6648-v2.txt, CASSANDRA-6648.patch
>
>
> When bootstrapping a new node, data is "missing" as if the new node didn't actually bootstrap, which I tracked down to the following scenario:
> 1) New node joins token ring and waits for schema to be settled before actually bootstrapping.
> 2) The schema check somehow passes and it starts bootstrapping.
> 3) Bootstrapping doesn't find the ks/cf that it should have received from the other node.
> 4) Queries at this point cause NPEs until they later "recover", but data is missing.
> The problem seems to be caused by a race condition between the migration manager and the bootstrapper, with the former running after the latter.
> I think this is supposed to protect against such scenarios:
> {noformat}
>             while (!MigrationManager.isReadyForBootstrap())
>             {
>                 setMode(Mode.JOINING, "waiting for schema information to complete", true);
>                 Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
>             }
> {noformat}
> But the MigrationManager.isReadyForBootstrap() implementation is quite fragile and doesn't take into account "slow" schema propagation.
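
For illustration of why a single-snapshot readiness check can be fragile, the loop quoted above can be contrasted with a variant that only proceeds once the check has passed several polls in a row. This is a hypothetical sketch, not what either attached patch does; SchemaWaitSketch, waitUntilStable and its parameters are invented names, and the readiness check is passed in as a plain BooleanSupplier rather than calling MigrationManager.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not part of Cassandra or of the attached patches.
final class SchemaWaitSketch {

    // Polls `ready` up to maxPolls times and returns true once it has
    // reported true for requiredConsecutive polls in a row. A single
    // `false` resets the streak, so a check that passes only momentarily
    // (e.g. before slow schema changes arrive) does not end the wait.
    public static boolean waitUntilStable(BooleanSupplier ready,
                                          int requiredConsecutive,
                                          int maxPolls,
                                          Runnable pause) {
        int streak = 0;
        for (int i = 0; i < maxPolls; i++) {
            streak = ready.getAsBoolean() ? streak + 1 : 0;
            if (streak >= requiredConsecutive) {
                return true;
            }
            pause.run(); // caller decides how to sleep between polls
        }
        return false; // gave up; the caller must decide what to do next
    }
}
```

The pause is injected so the caller chooses the sleep strategy; the one-second sleep in the quoted loop would be one option.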



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)