Posted to commits@cassandra.apache.org by "T Jake Luciani (JIRA)" <ji...@apache.org> on 2015/01/03 00:14:35 UTC

[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

    [ https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263304#comment-14263304 ] 

T Jake Luciani commented on CASSANDRA-8494:
-------------------------------------------

Rather than adding rich state management to bootstrap, why don't we make joining nodes part of the ring right away and have them proxy requests for not-yet-streamed ranges to a known replica until all the data has been streamed?  If the node dies, nothing bad happens.  We already send extra writes to joining nodes, so we would only need to add the ability for a joining node to track which data has been streamed so far.
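
To make the proposal concrete, here is a minimal sketch of the tracking-and-proxying idea. It is written in Python for brevity and is not Cassandra code: `JoiningNode`, `mark_streamed`, and `route_read` are hypothetical names, tokens are simplified to plain integers, and ranges are assumed to be half-open `(start, end]` intervals that never wrap around the ring.

```python
from dataclasses import dataclass, field


@dataclass
class JoiningNode:
    """Hypothetical model of a joining node that tracks streamed ranges."""

    name: str
    # Token ranges (start, end] this node will own once bootstrap completes.
    pending: set = field(default_factory=set)
    # Subset of `pending` whose data has already been streamed to this node.
    streamed: set = field(default_factory=set)

    def mark_streamed(self, token_range):
        """Record that streaming for one owned range has finished."""
        if token_range not in self.pending:
            raise ValueError(f"{token_range} is not owned by {self.name}")
        self.streamed.add(token_range)

    def route_read(self, token, replica_for):
        """Return the name of the node that should serve a read for `token`.

        `replica_for` maps each range to a known existing replica (assumed
        to be supplied by the caller). Streamed ranges are served locally;
        all others are proxied to that replica.
        """
        for token_range in self.pending:
            start, end = token_range
            if start < token <= end:
                if token_range in self.streamed:
                    return self.name
                return replica_for[token_range]
        raise KeyError(f"token {token} is outside the ranges of {self.name}")


node = JoiningNode("new-node", pending={(0, 100), (100, 200)})
replicas = {(0, 100): "replica-a", (100, 200): "replica-b"}
node.mark_streamed((0, 100))
local = node.route_read(50, replicas)     # range already streamed: served locally
proxied = node.route_read(150, replicas)  # range not streamed yet: proxied
```

In this sketch the only state the joining node carries is the set of streamed ranges, so if it dies mid-bootstrap the existing replicas still hold all the data.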

> incremental bootstrap
> ---------------------
>
>                 Key: CASSANDRA-8494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jon Haddad
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: density
>             Fix For: 3.0
>
>
> Current bootstrapping involves (to my knowledge) picking tokens and streaming data before the node is available for requests.  This can be problematic with "fat nodes", since it may require 20TB of data to be streamed over before the machine can be useful.  This can result in a massive window of time before the machine can do anything useful.
> As a potential approach to mitigate the huge window of time before a node is available, I suggest modifying the bootstrap process to only acquire a single initial token before being marked UP.  This would likely be a configuration parameter "incremental_bootstrap" or something similar.
> After the node is bootstrapped with this one token, it could go into UP state, and could then acquire additional tokens (one or a handful at a time), which would be streamed over while the node is active and serving requests.  The benefit here is that with the default 256 tokens a node could become an active part of the cluster with less than 1% of its final data streamed over.
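
The "less than 1%" figure in the quoted description can be checked with a rough calculation. The sketch below assumes data is spread evenly across tokens, which real token ranges only approximate; `fraction_streamed` is a hypothetical helper, and the 20 TB figure is the "fat node" example from the description.

```python
def fraction_streamed(tokens_acquired, num_tokens=256):
    """Fraction of a node's final data held after acquiring some tokens.

    Assumes every token covers an equal share of the data (a simplification).
    """
    if not 0 <= tokens_acquired <= num_tokens:
        raise ValueError("tokens_acquired must be between 0 and num_tokens")
    return tokens_acquired / num_tokens


first_token_share = fraction_streamed(1)
print(f"{first_token_share:.2%}")  # 0.39%

# For a 20 TB node, the data behind the first token, in decimal gigabytes.
first_token_gb = 20_000 * first_token_share
print(first_token_gb)  # 78.125
```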



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)