Posted to commits@cassandra.apache.org by "Sandeep Tata (JIRA)" <ji...@apache.org> on 2009/05/22 03:33:45 UTC

[jira] Created: (CASSANDRA-195) Improve bootstrap algorithm

Improve bootstrap algorithm
---------------------------

                 Key: CASSANDRA-195
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
             Project: Cassandra
          Issue Type: Improvement
    Affects Versions: trunk
         Environment: all
            Reporter: Sandeep Tata
             Fix For: trunk


When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) that previously owned this range (the load-balancing code, when working properly, can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v3-delta1.patch

This applies after 195-v3.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742883#action_12742883 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

Thanks!

> did this so that the op doesn't have to worry about re-starting correctly in bootstrap mode if the node died during bootstrap and got restarted.

I'm really -1 on trying to be clever and second-guessing the op.  It just leads to confusion, e.g. when we had the CF definitions stored locally as well as in the xml -- it seemed like adding new CFs to the xml should Just Work but it didn't because Cassandra was outsmarting you.

+ if (StorageService.instance().isBootstrapMode())
+ {
+ logger_.error("Cannot bootstrap another node: I'm in bootstrap mode myself!");
+ return;
+ } 

still needs to be replaced w/ assert.

+    public static String rename(String tmpFilename)

why move this to SST and make it public?  SSTW is the only user, it should stay there private.

+    public static synchronized SSTableReader renameAndOpen(String dataFileName) throws IOException

I don't think this needs to be synchronized -- calling it on different args doesn't need it, and calling it twice on the same args is erroneous.

+        boolean bootstrap = false;
+        if (bs != null && bs.contains("true"))
+            bootstrap = true;

better:
boolean bootstrap = bs != null && bs.contains("true");

     public StorageService()

should be removed

in the endpoint-finding code:
+                       if(endPoint.equals(StorageService.getLocalStorageEndPoint()) && !isBootstrapMode)

the extra check should be unnecessary, since we shouldn't be looking up endpoints at all in bootstrap mode, right?

-                       if ( StorageService.instance().isInSameDataCenter(endpoints[j]) && FailureDetector.instance()
+                       if ( StorageService.instance().isInSameDataCenter(endpoints[j]) && FailureDetector.instance()
+                               && !tokenMetadata_.isBootstrapping(endpoints[j]))

I don't think this is quite right.  Introducing the new node into the right place on the ring, but then trying to special-case it to not get used, seems problematic.  (What if you only have a replication factor of one?  Then the data will just disappear until bootstrap is complete.)

Can we instead not introduce the bootstrapping node into the ring until it is done?  And have the nodes providing data to it echo writes in the bootstrap range over to it?
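
A rough sketch of the echo-writes idea, assuming a single source node and a token range that does not wrap around the ring. Every class and method name here (EchoWriteSketch, SourceNode, startBootstrap, finishBootstrap) is hypothetical and does not correspond to code in the patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these classes exist in Cassandra.
class EchoWriteSketch {
    /** Token range (left, right]; ring wrap-around is ignored for brevity. */
    static final class Range {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        boolean contains(long token) { return token > left && token <= right; }
    }

    /** A node that simply records the tokens it was asked to write. */
    static class Node {
        final String name;
        final List<Long> applied = new ArrayList<>();
        Node(String name) { this.name = name; }
        void write(long token) { applied.add(token); }
    }

    /** The current owner: applies every write and echoes those in the pending range. */
    static final class SourceNode extends Node {
        private Node bootstrapTarget; // null while nobody is bootstrapping from this node
        private Range pendingRange;

        SourceNode(String name) { super(name); }

        void startBootstrap(Node target, Range range) {
            bootstrapTarget = target;
            pendingRange = range;
        }

        void finishBootstrap() {
            bootstrapTarget = null;
            pendingRange = null;
        }

        @Override
        void write(long token) {
            super.write(token); // the source stays authoritative until bootstrap completes
            if (bootstrapTarget != null && pendingRange.contains(token))
                bootstrapTarget.write(token); // echo so the new node does not miss it
        }
    }

    public static void main(String[] args) {
        SourceNode source = new SourceNode("source");
        Node joining = new Node("joining");
        source.startBootstrap(joining, new Range(100, 200));
        source.write(150); // inside the pending range: echoed
        source.write(50);  // outside the pending range: not echoed
        System.out.println("source=" + source.applied + " joining=" + joining.applied);
    }
}
```

Because the joining node is never in the ring here, a replication factor of one still reads from the source node until the handover is finished.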



[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738669#action_12738669 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

Some implementation-level details:
 
 - add a "bootstrap mode" to Cassandra startup.  When started in bootstrap mode, the node would wait a minute or two to get the node/token map, then tell the node whose range it is moving into to send over the data.  When that is done it will start answering requests.  We don't want the node to behave like a normal node at all until then, so we should take the bootstrap command out of nodeprobe.  If it can't complete bootstrap, it should abort.  (Bootstrap by definition requires operator intervention so this is fair.)
 - the source node (call it D) should continue receiving writes for the range in question during this process, and forward them to the bootstrapping node (call it Z)
 - if anticompaction splits an existing SSTable in two (removing the old "big" one) and leaves both halves live during this process, we avoid the extra scan of the old SSTable that Cleanup needs later under the old model of copying the to-move data out to a special directory.
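
The startup sequence in the first bullet can be read as a small state machine. The sketch below is only an illustration under that reading; the state names and methods are made up for this example and are not part of any patch on this issue:

```java
// Hypothetical illustration of the proposed bootstrap-mode lifecycle.
class BootstrapLifecycleSketch {
    enum State { WAITING_FOR_RING, STREAMING, NORMAL, ABORTED }

    private State state = State.WAITING_FOR_RING;

    State state() { return state; }

    /** The node/token map has been learned; ask the source node for data. */
    void ringLearned() { transition(State.WAITING_FOR_RING, State.STREAMING); }

    /** The source node has finished sending the range. */
    void streamingDone() { transition(State.STREAMING, State.NORMAL); }

    /** Any failure before NORMAL aborts; restarting is left to the operator. */
    void fail() {
        if (state != State.NORMAL)
            state = State.ABORTED;
    }

    /** The node must not behave like a normal node until bootstrap has completed. */
    void checkCanServe() {
        if (state != State.NORMAL)
            throw new IllegalStateException("not serving requests in state " + state);
    }

    private void transition(State from, State to) {
        if (state != from)
            throw new IllegalStateException("expected " + from + " but was " + state);
        state = to;
    }

    public static void main(String[] args) {
        BootstrapLifecycleSketch node = new BootstrapLifecycleSketch();
        node.ringLearned();
        node.streamingDone();
        node.checkCanServe(); // does not throw once the node is NORMAL
        System.out.println("final state: " + node.state());
    }
}
```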


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738683#action_12738683 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

What if a client ends up submitting a write/read to a node in "bootstrap mode" ? (Could happen in a few scenarios)

We have three options:
1. Throw an UnavailableException (easy :) )
2. Forward the request to one of the older nodes that has the data
3. Prevent such a scenario in the first place by not gossiping the token of the new node until it is out of the bootstrap mode (complicated, probably unnecessary)

I'm leaning towards Option #2 being the best compromise: R/W still available, but at an extra hop until bootstrap completes.
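
To make the difference between options 1 and 2 concrete, here is a small sketch. The names are hypothetical, the two Function arguments stand in for a local read and a read sent to an older node, and IllegalStateException stands in for UnavailableException. Option 3 is not modeled, since it prevents the situation rather than handling it:

```java
import java.util.function.Function;

// Hypothetical illustration of options 1 (reject) and 2 (forward).
class BootstrapRequestPolicySketch {
    enum Policy { REJECT, FORWARD }

    static String handleRead(String key,
                             boolean bootstrapping,
                             Policy policy,
                             Function<String, String> localRead,
                             Function<String, String> oldOwnerRead) {
        if (!bootstrapping)
            return localRead.apply(key); // normal node: answer from local data
        switch (policy) {
            case FORWARD:
                return oldOwnerRead.apply(key); // one extra hop, request still served
            case REJECT:
            default:
                throw new IllegalStateException("unavailable: node is bootstrapping");
        }
    }

    public static void main(String[] args) {
        Function<String, String> local = k -> "local:" + k;
        Function<String, String> old = k -> "old:" + k;
        System.out.println(handleRead("k1", true, Policy.FORWARD, local, old));
    }
}
```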



[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v2.patch

Rebased + cleaned up formatting/whitespace diffs in StorageService to make reviewing easier.

There's some big block movement in Bootstrapper and LeaveJoinProtocolHelper -- but that's mostly refactoring changes and should be much easier to read. (There's even a unit test now testing this code :) )



[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742587#action_12742587 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------


>> This should throw some kind of error -- either the other nodes are buggy, sending reads to a bootstrapping node, or your ops team has screwed up and is allowing client-level traffic to the node. neither should fail silently.

Good call. I'll throw a runtime exception here.

>> Similarly, we should add a verifyNotBootstrapping (better name?) call to CassandraServer instead of adding && !bootstrapping to all the StorageProxy calls. (There may be a place we can do this centrally by overriding the right method from the thrift-generated Cassandra interface.)

There's only one place in StorageProxy.readProtocol that we're testing for bootstrap mode.  I don't see why/how to move this to CassandraServer.


> + cf.addColumn(new Column(BOOTSTRAP, BasicUtilities.shortToByteArray(isBootstrap?(short)1:(short)0)));
> + return new StorageMetadata(token, generation,isBootstrap);

> not to be anal, but watch the spacing on the ?: operator and method arguments (this comment applies to other patch lines too)

Okay -- I'll do the right thing and move this into boolean methods in BasicUtilities.

>> I don't think storing the mode in ST and trying to second-guess the op is a good idea. If the op says -b, then we bootstrap. Otherwise we don't.

Yeah, I thought about this a bit ... I did this so that the op doesn't have to worry about re-starting correctly in bootstrap mode if the node died during bootstrap and got restarted. When it comes back up, you want it to resume bootstrap.
It makes it trickier to abort the bootstrap and startup in an unsafe mode. If that is really what the op wants to do, he should delete the entry in the ST from an admin app. I figured the normal thing would be to resume bootstrap and wanted to make that simpler/easier. I'd lean towards keeping this behaviour.

> + protected boolean localOnly;

>I think this means "send messages out to other nodes to bootstrap me" if true, otherwise, bootstrap some other node. Is that right? It seems like those two operations should be in different classes, not a single class doing two different things based on an if statement.

Yes -- that was the plan, but I ended up not using it. I'll clean this part out -- we're really using the same code to do either thing.

> + public static synchronized SSTableReader renameAndOpen(String dataFileName) throws IOException

> fits better in SSTW -- Reader shouldn't be renaming things just as a conceptual purity thing. :)

:-) Okay. Done.



[jira] Issue Comment Edited: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744619#action_12744619 ] 

Sandeep Tata edited comment on CASSANDRA-195 at 8/18/09 11:03 AM:
------------------------------------------------------------------

>>I don't see where this fixes the serve-reads-from-the-old-nodes-until-bootstrap-complete problem, can you give a high-level summary of how that works? 

The tokenMetadata_.update call now requires a bootstrap flag. If this flag is true, the new node is added to a separate set bootstrapNodes and does not affect calculations for sending reads and *writes*. 

The node only surfaces in the ring once bootstrap is completed. The new node will *not* have the writes that arrived during bootstrap -- a consequence of the changes that accommodate the replication=1 case. We will need writes to see a different ring (different result for getStorageEndPoints) than the reads -- that's not in this patch.

>>maybe I don't understand how this works -- to me that looks like if we send any update about a node, everyone will clear the bootstrap flag on it, unless we explicitly set bootstrap to true. shouldn't we only update bootstrap status when it's explicitly set to true or false?

Hmm, I thought the gossiper always sent the full endpoint state, so it should always include bootstrap status. If it isn't sent, it means the node is not bootstrapping. See Gossiper.GossipDigestAckVerbHandler

Even otherwise, if oldToken == newToken, the only action taken is deliverHints. The bootstrap status is not cleared.
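
As a simplified picture of the bookkeeping described above (not the actual TokenMetadata class): tokens are plain longs, only a single replica per key is considered, and all names other than bootstrapNodes are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration; the real TokenMetadata differs in types and detail.
class TokenMetadataSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();  // token -> endpoint
    private final Set<String> bootstrapNodes = new HashSet<>();  // known, but not in the ring

    void update(long token, String endpoint, boolean bootstrap) {
        if (bootstrap) {
            bootstrapNodes.add(endpoint);    // tracked separately, invisible to lookups
        } else {
            bootstrapNodes.remove(endpoint); // bootstrap finished (or never started)
            ring.put(token, endpoint);       // the node now surfaces in the ring
        }
    }

    boolean isBootstrapping(String endpoint) {
        return bootstrapNodes.contains(endpoint);
    }

    /** Owner of a key: first endpoint at or after its token, wrapping around. */
    String endpointFor(long keyToken) {
        if (ring.isEmpty())
            throw new IllegalStateException("empty ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenMetadataSketch tm = new TokenMetadataSketch();
        tm.update(100L, "A", false);
        tm.update(200L, "B", false);
        tm.update(150L, "Z", true);  // Z is bootstrapping
        System.out.println("owner while Z bootstraps: " + tm.endpointFor(120L));
        tm.update(150L, "Z", false); // Z has finished
        System.out.println("owner after Z joins: " + tm.endpointFor(120L));
    }
}
```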

      was (Author: sandeep_tata):
    >>I don't see where this fixes the serve-reads-from-the-old-nodes-until-bootstrap-complete problem, can you give a high-level summary of how that works? 

The tokenMetadata_.update call now requires a bootstrap flag. If this flag is true, the new node is added to a separate set bootstrapNodes and does not affect calculations for sending reads and *writes*. 

The node only surfaces in the ring once bootstrap is completed. The new node will *not* have the writes that arrived during bootstrap -- a consequence of the changes that accommodate the replication=1 case. We will need writes to see a different ring (different result for getStorageEndPoints) than the reads -- that's not in this patch.

>>maybe I don't understand how this works -- to me that looks like if we send any update about a node, everyone will clear the bootstrap flag on it, unless we explicitly set bootstrap to true. shouldn't we only update bootstrap status when it's explicitly set to true or false?

Hmm, I thought the gossiper always sent the full endpoint state, so it should always include bootstrap status. If it isn't sent, it means the node is not bootstrapping. See:

Gossiper.makeRandomGossipDigest:
      EndPointState epState = endPointStateMap_.get(localEndPoint_);

Even otherwise, if oldToken == newToken, the only action taken is deliverHints. The bootstrap status is not cleared.
  
> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744626#action_12744626 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

> If this flag is true, the new node is added to a separate set bootstrapNodes

Oh, I see.  That makes sense.  So the node is present in the Gossiper liveEndpoints, so it gets the state of the cluster sent to it, but it's not in the tokenmetadata endpoint or token maps, so it doesn't become part of the read/write path.

I don't see any actual uses of bootstrapNodes, though -- did you have something in mind?

> We will need writes to see a different ring (different result for getStorageEndPoints) than the reads

I really think that just forwarding the writes from the old node is going to be simpler.

> I thought the gossiper always sent the full endpoint state

in that case wouldn't something like this be clearer?

        ApplicationState bState = epState.getApplicationState(StorageService.BOOTSTRAP_MODE); 
        assert bState != null;
        boolean bootstrapState = Boolean.parseBoolean(bState.getState());
        if (logger_.isDebugEnabled()) 
            logger_.debug(ep.getHost() + " has bootstrap state: " + bootstrapState); 



[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744646#action_12744646 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

>> I don't see any actual uses of bootstrapNodes, though -- did you have something in mind? 

Yes. To do the forwarding of the writes.

The writes don't need to physically see a different ring, just that the bootstrap node be included in the list of nodes an endpoint needs to write to.

>>   ApplicationState bState = epState.getApplicationState(StorageService.BOOTSTRAP_MODE); 
>>   assert bState != null; 

That wouldn't work for the case when the cluster was started with no bootstrapping nodes. 
The gossiper always sends the full endpoint state, but the state may not contain an entry for BOOTSTRAP_MODE which should be interpreted as normal mode.
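
In other words, a missing entry has to be read as "not bootstrapping" rather than asserted against. A minimal sketch of that interpretation, using a plain Map in place of the gossiper's endpoint state and a placeholder key string (the real value of StorageService.BOOTSTRAP_MODE is not shown in this thread):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a Map stands in for the endpoint state.
class BootstrapStateSketch {
    static final String BOOTSTRAP_MODE = "BOOTSTRAP_MODE"; // placeholder key, an assumption

    /** A missing entry means the node is in normal mode. */
    static boolean isBootstrapping(Map<String, String> endpointState) {
        String value = endpointState.get(BOOTSTRAP_MODE);
        // Boolean.parseBoolean(null) is already false; the null check just makes
        // the "absent means normal" rule explicit.
        return value != null && Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> neverBootstrapped = new HashMap<>();
        Map<String, String> bootstrapping = new HashMap<>();
        bootstrapping.put(BOOTSTRAP_MODE, "true");
        System.out.println(isBootstrapping(neverBootstrapped) + " " + isBootstrapping(bootstrapping));
    }
}
```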


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v3.patch

Regenerated incorporating feedback from jbellis.

Also changed the bin/cassandra script to pass -Dbootstrap=true when -b is specified
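
For reference, one way the JVM side could read such a property is sketched below. This is not what the patch does (the reviewed code checks bs.contains("true")); it uses the standard Boolean.getBoolean call, which is true only when the system property is set to "true" (ignoring case). The class name is hypothetical:

```java
// Hypothetical illustration of reading -Dbootstrap=true inside the JVM.
class BootstrapFlagSketch {
    static boolean bootstrapRequested() {
        // Boolean.getBoolean looks up the named system property and
        // returns false when it is missing or not equal to "true".
        return Boolean.getBoolean("bootstrap");
    }

    public static void main(String[] args) {
        System.out.println("bootstrap requested: " + bootstrapRequested());
    }
}
```

Running it as "java -Dbootstrap=true BootstrapFlagSketch" reports true; without the flag it reports false.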


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741662#action_12741662 ] 

Chris Goffinet commented on CASSANDRA-195:
------------------------------------------

Can we add support for turning bootstrap mode on through MBean as well? For a large cluster that is managed through systems like Puppet, it doesn't exactly make things easier when some things should be run using -b and some things not for adding nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739577#action_12739577 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

>> one more thing to fix: bootstrap should use getTempSSTablePath and rename when write is complete, instead of getNextFileName which is crash-unsafe

Yes, I've got that on my local branch now :)
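
The general pattern behind that review comment (write under a temporary name, rename only once the write is complete) can be sketched with java.nio.file. This is a generic illustration, not the getTempSSTablePath code; the ".tmp" suffix and the class and method names are assumptions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of write-to-temp-then-rename.
class TempThenRenameSketch {
    /** Writes contents next to finalPath under a temporary name, then renames it. */
    static Path writeThenRename(Path finalPath, byte[] contents) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
        Files.write(tmp, contents); // a crash here leaves only the .tmp file behind
        // After the rename, readers see either no file or the complete file.
        return Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sstable-sketch");
        Path written = writeThenRename(dir.resolve("data.db"), new byte[] {1, 2, 3});
        System.out.println("wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

A fully durable version would also force the data to disk before renaming; that step is left out here.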


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744595#action_12744595 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

I don't see where this fixes the serve-reads-from-the-old-nodes-until-bootstrap-complete problem, can you give a high-level summary of how that works?

this code looks suspicious:

        ApplicationState bState = epState.getApplicationState(StorageService.BOOTSTRAP_MODE);
        boolean bootstrapState = false;
        if (bState != null)
        {
            bootstrapState = Boolean.parseBoolean(bState.getState());
            if (logger_.isDebugEnabled())
                logger_.debug(ep.getHost() + " has bootstrap state: " + bootstrapState);
        }

maybe I don't understand how this works -- to me that looks like if we send any update about a node, everyone will clear the bootstrap flag on it, unless we explicitly set bootstrap to true.  shouldn't we only update bootstrap status when it's explicitly set to true or false?

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment:     (was: 195-v3-delta1.patch)

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-195:
-------------------------------------

    Comment: was deleted

(was: Can we add support for turning bootstrap mode on through MBean as well? For a large cluster that is managed through systems like Puppet, it doesn't exactly make things easier when some things should be run using -b and some things not for adding nodes.)

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Michael Greene (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Greene updated CASSANDRA-195:
-------------------------------------

    Fix Version/s:     (was: 0.5)
                   0.4

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.4
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4-delta1.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742427#action_12742427 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

are you still planning to split the patch into reformatting/whitespace and content changes?

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744680#action_12744680 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

Here's the scenario for why forwarding logic will need to know the set of bootstrapNodes:

We have node A, B, C and the new node enters between B and C to get: A, B, N, C.
Say we use B (among other nodes) to bootstrap the data on N.

We still want to be able to forward writes that arrive at P, Q, R, etc intended for A B C or B C D to reach A B N C. Which means every node needs to know the nodes that are being bootstrapped and their locations on the ring (not just the nodes that are bootstrapping the new node which know what ranges they are sending N). That's why we're storing the bootstrapNodes in tokenMetadata.

When insert or insertBlocking ask for R storage endpoints, we want to pick R + bootstrapping nodes to receive the update.
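
The "R + bootstrapping nodes" selection can be sketched roughly as below. This is only an illustration under stated assumptions: WriteEndpointsSketch, writeEndpoints and the plain String endpoints are hypothetical names, and the caller is assumed to have already worked out which bootstrapping nodes will own the key's range. It is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: endpoints are plain strings, not Cassandra EndPoint objects.
final class WriteEndpointsSketch
{
    // Returns the R natural endpoints followed by any bootstrapping nodes
    // that should also receive the write, without duplicates.
    static List<String> writeEndpoints(List<String> naturalEndpoints, Set<String> bootstrapNodesForRange)
    {
        // LinkedHashSet keeps the natural endpoints first and drops repeats.
        Set<String> targets = new LinkedHashSet<>(naturalEndpoints);
        targets.addAll(bootstrapNodesForRange);
        return new ArrayList<>(targets);
    }
}
```

With natural endpoints B and C and a bootstrapping node N, the write would go to B, C and N, while reads would still use only B and C.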

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v4.patch

Rebased (and removed code that stored bootstrap status in System Table)

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744709#action_12744709 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

> The gossiper always sends the full endpoint state, but the state may not contain an entry for BOOTSTRAP_MODE which should be interpreted as normal mode.

okay, so this should really be a flag, instead of a boolean.

        boolean bootstrapState = epState.getApplicationState(StorageService.BOOTSTRAP_MODE) != null;
        if (logger_.isDebugEnabled())
            logger_.debug(ep.getHost() + " has bootstrap state: " + bootstrapState);

with the code in start

            Gossiper.instance().addApplicationState(StorageService.BOOTSTRAP_MODE, new ApplicationState(""));
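
A minimal standalone sketch of the presence-only semantics, assuming the endpoint state can be modelled as a simple map. BootstrapFlagSketch, the map, and the key string are hypothetical stand-ins for illustration, not the Gossiper API:

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for the gossiped endpoint state.
final class BootstrapFlagSketch
{
    // Assumed key name, for illustration only.
    static final String BOOTSTRAP_MODE = "BOOTSTRAP_MODE";

    // The node is bootstrapping if and only if the entry exists;
    // the stored value is deliberately ignored.
    static boolean isBootstrapping(Map<String, String> applicationStates)
    {
        return applicationStates.containsKey(BOOTSTRAP_MODE);
    }
}
```

Treating the entry as a flag means an update that simply omits BOOTSTRAP_MODE reads as normal mode, and there is no value to parse or misparse.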



> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Reopened: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata reopened CASSANDRA-195:
------------------------------------


> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742594#action_12742594 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

Verified with urandom that the bin/cassandra script doesn't pass a -Dvar option given on the command line (e.g. "bin/cassandra -Dfoo") through to the JVM.
I'll open a ticket for this.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742556#action_12742556 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

Thanks a lot for the patch, and the cleanup -- that makes it a lot easier to follow!

overall this is going in the right direction.  some comments:

command line stuff:

the Java Way to do this would be to define -Dbootstrap.  I'm pretty sure bin/cassandra correctly propagates args to the jvm like that.  (if it does not, we should open a ticket for that.)  then rather than passing a bootstrap variable around, just have the right place in the code query the environment.
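
As a rough sketch of "query the environment", assuming the property is literally named "bootstrap" (BootstrapPropertySketch and isBootstrapRequested are hypothetical names):

```java
// Hypothetical sketch of reading a -Dbootstrap flag where it is needed,
// instead of threading a boolean through constructors and method calls.
final class BootstrapPropertySketch
{
    // Assumed property name, taken from the -Dbootstrap suggestion.
    static final String BOOTSTRAP_PROPERTY = "bootstrap";

    static boolean isBootstrapRequested()
    {
        // "-Dbootstrap" with no value sets the property to the empty string,
        // so test for presence rather than parsing a boolean.
        return System.getProperty(BOOTSTRAP_PROPERTY) != null;
    }
}
```

Note that Boolean.getBoolean("bootstrap") would return false for a bare -Dbootstrap, since the property value is then an empty string rather than "true".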

+            if (StorageService.instance().isBootstrapMode())
+            {
+               /* Don't service reads! */
+               return;
+            }

This should throw some kind of error -- either the other nodes are buggy, sending reads to a bootstrapping node, or your ops team has screwed up and is allowing client-level traffic to the node.  neither should fail silently.

Similarly, we should add a verifyNotBootstrapping (better name?) call to CassandraServer instead of adding && !bootstrapping to all the StorageProxy calls.  (There may be a place we can do this centrally by overriding the right method from the thrift-generated Cassandra interface.)
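
Something along these lines, as a sketch only. BootstrapGuardSketch and its get method are hypothetical stand-ins for CassandraServer and its thrift methods, and the exception type is an assumption:

```java
// Hypothetical sketch: one central guard that fails loudly, called at the top
// of every client-facing method instead of repeating "&& !bootstrapping".
final class BootstrapGuardSketch
{
    private volatile boolean bootstrapMode;

    BootstrapGuardSketch(boolean bootstrapMode)
    {
        this.bootstrapMode = bootstrapMode;
    }

    // Bootstrap mode only ever changes from true to false.
    void finishBootstrap()
    {
        bootstrapMode = false;
    }

    void verifyNotBootstrapping()
    {
        if (bootstrapMode)
            throw new IllegalStateException("Node is bootstrapping and cannot serve client requests");
    }

    // Stand-in for a client-facing read; the returned value is a placeholder.
    String get(String key)
    {
        verifyNotBootstrapping();
        return "value-for-" + key;
    }
}
```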

+            cf.addColumn(new Column(BOOTSTRAP, BasicUtilities.shortToByteArray(isBootstrap?(short)1:(short)0)));
+            return new StorageMetadata(token, generation,isBootstrap);

not to be anal, but watch the spacing on the ?: operator and method arguments (this comment applies to other patch lines too)

+        /* Stored value overrides passed in value */
+        IColumn bootstrapColumn = cf.getColumn(BOOTSTRAP);
+        boolean readBootstrap = (BasicUtilities.byteArrayToShort(bootstrapColumn.value())==1)?true:false;
+        if (!isBootstrap && readBootstrap)
+        {
+            logger_.warn("Probably failed a previous bootstrap! Starting in bootstrap mode.");
+        }

I don't think storing the mode in ST and trying to second-guess the op is a good idea.  If the op says -b, then we bootstrap.  Otherwise we don't.

+    protected boolean localOnly;

I think this means "send messages out to other nodes to bootstrap me" if true, otherwise, bootstrap some other node.  Is that right?  It seems like those two operations should be in different classes, not a single class doing two different things based on an if statement.

+        if (StorageService.instance().isBootstrapMode())
+        {
+            logger_.error("Cannot bootstrap another node: I'm in bootstrap mode myself!");
+            return;
+        }

If node Z asks node A to bootstrap it, it should be Z's responsibility to make sure that A is not in bootstrap mode.  (note that bootstrap mode only changes from True to False, never the other way.)  so I would replace this with an assert !StorageService.instance().isBootstrapMode()

+    public static synchronized SSTableReader renameAndOpen(String dataFileName) throws IOException

fits better in SSTW -- Reader shouldn't be renaming things just as a conceptual purity thing. :)


> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Resolved: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-195.
--------------------------------------

    Resolution: Fixed

committed v4 + delta1

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4-delta1.patch, 195-v4.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744299#action_12744299 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

can you rebase to trunk, please?  -v3 no longer applies.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739494#action_12739494 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

one more thing to fix: bootstrap should use getTempSSTablePath and rename when write is complete, instead of getNextFileName which is crash-unsafe
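
The general pattern, sketched with java.nio rather than the actual SSTable code (TempThenRename, writeAtomically and the ".tmp" suffix are hypothetical and only illustrate the idea):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not Cassandra's SSTable code: it only illustrates
// "write under a temporary name, rename when the write is complete".
final class TempThenRename
{
    static Path writeAtomically(Path finalPath, byte[] contents) throws IOException
    {
        // Keep the temporary file in the same directory so the rename
        // stays on a single filesystem.
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
        Files.write(tmp, contents);
        // A crash before this point leaves only the .tmp file, which
        // startup code can recognise and discard.
        return Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Readers that only ever look for the final name never see a half-written file, which is the crash-safety property being asked for.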

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743010#action_12743010 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

Thanks. I fixed the minor stuff. 
This leaves 2 things:

1.
>> I'm really -1 on trying to be clever and second-guessing the op. It just leads to confusion, e.g. when we had the CF definitions stored locally as well as in the xml -- it seemed like adding new CFs to the xml should Just Work but it didn't because Cassandra was outsmarting you. 

Okay. I'll change this back so it does exactly what the op wants. But I'll still write a warning to the log if the restart is in normal mode and the node remembers that it didn't finish bootstrap.

2.
>> I don't think this is quite right. Introducing the new node into the right place on the ring, but then trying to special case it to not get used, seems problematic. (What if you only have a replication factor of one? Then the data will just disappear until bootstrap is complete.)

Can we instead not introduce the bootstrapping node into the ring until it is done? And have the nodes providing data to it echo writes in the bootstrap range over?

I didn't think about the replication factor=1 case. The fix is a little more involved than just maintaining tokenmetadata correctly based on whether we see a "bootstrap" flag in gossip. I'll make these changes for v4.



> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v1.patch

1. Allows you to start cassandra in bootstrap mode "bin/cassandra -b"
2. If reads arrive at this node during bootstrap from a client, they are served using remoteRead
3. If reads arrive at this node during bootstrap from another node, they are dropped
4. Until bootstrap is complete, node tells other nodes that it is in "bootstrap mode" -- this info is used to *not* send reads to this node.
5. The bootstrap code is not tested for the case where multiple new nodes (>1) are added between 2 existing nodes


> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>
> When you add a node to an existing cluster and the map gets updated, the new node may respond to read requests by saying it doesn't have any of the data until it gets the data from the node(s) the previously owned this range (the load-balancing code, when working properly can take care of this). While this behaviour is compatible with eventual consistency, it would be much friendlier for the new node not to "surface" in the EndPoint maps for reads until it has transferred the data over from the old nodes.


[jira] Issue Comment Edited: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743040#action_12743040 ] 

Sandeep Tata edited comment on CASSANDRA-195 at 8/13/09 6:03 PM:
-----------------------------------------------------------------

This applies after 195-v3.
I just realized there's a minor bug with delta1 if requests from clients come directly to a bootstrapping node (weakReadRemote might serve up the local node as a candidate). I'll fix this and upload a new version.

      was (Author: sandeep_tata):
    This applies after 195-v3.

  
> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3.patch
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744619#action_12744619 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

>>I don't see where this fixes the serve-reads-from-the-old-nodes-until-bootstrap-complete problem, can you give a high-level summary of how that works? 

The tokenMetadata_.update call now requires a bootstrap flag. If this flag is true, the new node is added to a separate set, bootstrapNodes, and does not affect calculations for sending reads and *writes*. 

The node only surfaces in the ring once bootstrap is completed. The new node will *not* have the writes that arrived during bootstrap -- a consequence of the changes that accommodate the replication=1 case. We will need writes to see a different ring (different result for getStorageEndPoints) than the reads -- that's not in this patch.
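
A minimal sketch of the idea, not the code in the patch: the class name TokenMetadataSketch, the use of long tokens and String endpoints, and the getPrimaryEndPoint helper are all assumptions made for illustration. It only shows how a bootstrap flag on update can keep a node out of endpoint calculations until bootstrap completes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for the token metadata described above.
// Tokens are plain longs and endpoints are plain strings for brevity.
class TokenMetadataSketch {
    // Nodes that have surfaced in the ring and take part in endpoint calculations.
    private final TreeMap<Long, String> tokenToEndPoint = new TreeMap<>();
    // Nodes that are still bootstrapping; kept apart so they stay invisible.
    private final Map<String, Long> bootstrapNodes = new HashMap<>();

    synchronized void update(long token, String endPoint, boolean bootstrap) {
        if (bootstrap) {
            // Remember the node, but do not let it affect the ring.
            bootstrapNodes.put(endPoint, token);
            return;
        }
        // Bootstrap finished (or was never needed): surface the node.
        bootstrapNodes.remove(endPoint);
        tokenToEndPoint.values().remove(endPoint); // drop a stale token, if any
        tokenToEndPoint.put(token, endPoint);
    }

    synchronized boolean isBootstrapping(String endPoint) {
        return bootstrapNodes.containsKey(endPoint);
    }

    // First surfaced node at or after the key's token, wrapping around the ring.
    synchronized String getPrimaryEndPoint(long keyToken) {
        if (tokenToEndPoint.isEmpty()) {
            throw new IllegalStateException("ring is empty");
        }
        Map.Entry<Long, String> entry = tokenToEndPoint.ceilingEntry(keyToken);
        return entry != null ? entry.getValue() : tokenToEndPoint.firstEntry().getValue();
    }
}
```

With nodes A (token 100) and B (token 200) in the ring, calling update(150, "Z", true) leaves a key with token 120 mapped to B; only a later update(150, "Z", false) moves that key to Z.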

>>maybe I don't understand how this works -- to me that looks like if we send any update about a node, everyone will clear the bootstrap flag on it, unless we explicitly set bootstrap to true. shouldn't we only update bootstrap status when it's explicitly set to true or false?

Hmm, I thought the gossiper always sent the full endpoint state, so it should always include bootstrap status. If it isn't sent, it means the node is not bootstrapping. See:

Gossiper.makeRandomGossipDigest:
      EndPointState epState = endPointStateMap_.get(localEndPoint_);

Even otherwise, if oldToken == newToken, the only action taken is deliverHints. The bootstrap status is not cleared.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4.patch
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712116#action_12712116 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

How does the new node (node Z) know that there are no nodes that were down at the time Z was brought up, but which need to send it data?

(Is this addressed at all in the existing bootstrap code?  I think what will happen is that a node D that is down at the time of bootstrap will never send the data over.)

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: trunk
>         Environment: all
>            Reporter: Sandeep Tata
>             Fix For: trunk
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741933#action_12741933 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

I suppose we could add support for bootstrap through an MBean, but eventually, I'm guessing "bootstrap mode" will be the default way for a node to join the cluster. For nodes that have data and are coming back, we'll need a recovery version of bootstrap that perhaps uses the Merkle trees.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745433#action_12745433 ] 

Hudson commented on CASSANDRA-195:
----------------------------------

Integrated in Cassandra #173 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/173/])
    Add "bootstrap mode" to node startup.  This causes the node to tell the
nodes that have data it needs to send it the data, and not otherwise
participate in reads or writes until the bootstrap is complete.
patch by Sandeep Tata; reviewed by jbellis for 


> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.4
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4-delta1.patch, 195-v4.patch
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742090#action_12742090 ] 

Sandeep Tata commented on CASSANDRA-195:
----------------------------------------

Oops -- closed this by mistake instead of 257. This issue is very much alive.

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch
>
>

[jira] Updated: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata updated CASSANDRA-195:
-----------------------------------

    Attachment: 195-v4-delta1.patch

>>okay, so this should really be a flag, instead of a boolean. 

We can do this either way. Moving to a check like:
  boolean bootstrapState = epState.getApplicationState(StorageService.BOOTSTRAP_MODE) != null; 

This will need a compensating action in removeBootstrapSource() to delete the application state.
It has the advantage of slightly reducing the size of gossip messages when bootstrap is complete/absent. 

v4-delta1 does this.
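
A rough illustration of the presence-as-flag approach, not the contents of v4-delta1: EndPointStateSketch, its methods, the "BOOTSTRAP-MODE" key value, and gossipedEntryCount are hypothetical names chosen for this sketch. The point is that the state entry must be deleted when bootstrap ends, because its mere presence is what marks the node as bootstrapping.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an endpoint's gossiped application state.
class EndPointStateSketch {
    // Assumed key name; the real constant lives in StorageService.
    static final String BOOTSTRAP_MODE = "BOOTSTRAP-MODE";

    private final Map<String, String> applicationStates = new HashMap<>();

    void addApplicationState(String key, String value) {
        applicationStates.put(key, value);
    }

    String getApplicationState(String key) {
        return applicationStates.get(key);
    }

    // The compensating action: once bootstrap is done the entry is deleted,
    // so it is no longer carried in gossip messages.
    void removeApplicationState(String key) {
        applicationStates.remove(key);
    }

    // Presence of the entry is the flag; its value is irrelevant.
    boolean isBootstrapping() {
        return getApplicationState(BOOTSTRAP_MODE) != null;
    }

    // Number of entries that would have to be gossiped for this endpoint.
    int gossipedEntryCount() {
        return applicationStates.size();
    }
}
```

Compared with gossiping an explicit true/false value forever, a node that has finished bootstrapping (or never bootstrapped) carries no entry at all.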

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>         Attachments: 195-v1.patch, 195-v2.patch, 195-v3-delta1.patch, 195-v3.patch, 195-v4-delta1.patch, 195-v4.patch
>
>

[jira] Commented: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738983#action_12738983 ] 

Jonathan Ellis commented on CASSANDRA-195:
------------------------------------------

When started in bootstrap mode, Z should advertise its status with a flag that says "I'm here so that data can be sent to me, but I'm not accepting reads until I have that data."  Then we can reject reads w/ UnavailableException since anyone sending a read anyway is buggy.

(Writes we should accept since they will be forwarded by the old node.)
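
As a sketch of that proposal only: BootstrappingNodeSketch and its methods are hypothetical, and the UnavailableException below is a local unchecked stand-in rather than Cassandra's own exception class. It shows a node that accepts writes while bootstrapping but refuses reads until bootstrap has finished.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Local stand-in for the exception named in the comment above (assumption:
// made unchecked here only to keep the sketch short).
class UnavailableException extends RuntimeException {
    UnavailableException(String message) {
        super(message);
    }
}

// Hypothetical node that is visible for data transfer but not yet for reads.
class BootstrappingNodeSketch {
    private volatile boolean bootstrapping = true;
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // Writes are accepted during bootstrap, e.g. when forwarded by the old node.
    void write(String key, String value) {
        store.put(key, value);
    }

    // Reads are rejected until the node has received its data.
    String read(String key) {
        if (bootstrapping) {
            throw new UnavailableException("node is bootstrapping; reads not accepted yet");
        }
        return store.get(key);
    }

    void finishBootstrap() {
        bootstrapping = false;
    }
}
```

A caller that still sends a read during bootstrap gets the exception immediately instead of an empty result that looks like missing data.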

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>             Fix For: 0.5
>
>

[jira] Assigned: (CASSANDRA-195) Improve bootstrap algorithm

Posted by "Sandeep Tata (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Tata reassigned CASSANDRA-195:
--------------------------------------

    Assignee: Sandeep Tata

> Improve bootstrap algorithm
> ---------------------------
>
>                 Key: CASSANDRA-195
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-195
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: all
>            Reporter: Sandeep Tata
>            Assignee: Sandeep Tata
>             Fix For: 0.5
>
>