Posted to commits@cassandra.apache.org by "Peter Schuller (JIRA)" <ji...@apache.org> on 2011/04/07 19:30:05 UTC

[jira] [Created] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

auto bootstrap happened on already bootstrapped nodes
-----------------------------------------------------

Key: CASSANDRA-2435
URL: https://issues.apache.org/jira/browse/CASSANDRA-2435
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.2
Reporter: Peter Schuller
Priority: Minor

I believe the following was observed on 0.7.2. I meant to dig deeper, but never had the time, and now I want to at least file this even if I don't have extremely helpful information.

Some background: we consciously decided that the default configuration on our nodes would have auto_bootstrap set to true. The reasoning was that if someone accidentally started a new node, we would rather have it join with data than join *without* data and cause bogus read results in the cluster.

We implemented this policy by having the Puppet-managed config set auto_bootstrap to true.

On one of our clusters with five nodes, we did some moves. All looked well; the moves completed. For unrelated reasons, we wanted to restart the nodes after they had been moved. When we did, three of the five, specifically the three that were *NOT* seed nodes, initiated a bootstrap procedure! Before the moves, the cluster had been running for at least several days.

The logs showed automatic token selection, and the nodes joined the ring under new, automatically selected tokens.

Presumably this violated consistency, but at the time there was no live traffic to the cluster, so we did not confirm it (we only put traffic on the cluster after repair and cleanup).

I did look a little bit at the code in light of this but didn't see anything obvious, so I don't really know what the likely culprit is.

A potential complication was that seed nodes were moved without using the correct procedure of de-seeding them first. This was clearly wrong, but it is not obvious to me that it would cause other nodes to incorrectly bootstrap since a node should *never* bootstrap more than once if the local system tables say it's been bootstrapped.
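
The invariant described above can be written down as a small predicate. This is only a sketch in Python, not Cassandra's code: the function name `should_bootstrap` and its three parameters are hypothetical, and the rule that seed nodes skip bootstrap is an assumption inferred from the observation that only the non-seed nodes re-bootstrapped.

```python
def should_bootstrap(auto_bootstrap: bool, is_seed: bool, already_bootstrapped: bool) -> bool:
    """Return True if a starting node should bootstrap and join as a new node.

    Hypothetical model of the startup decision, not Cassandra's actual code.
    """
    if not auto_bootstrap:
        return False  # bootstrap is disabled in the config
    if is_seed:
        return False  # assumption: seed nodes never auto bootstrap
    # The local system table is the last safeguard: a node that has recorded
    # a completed bootstrap must not bootstrap again.
    return not already_bootstrapped


if __name__ == "__main__":
    # A long-running, non-seed node restarts with auto_bootstrap enabled.
    print(should_bootstrap(auto_bootstrap=True, is_seed=False, already_bootstrapped=True))
    # The same node if the locally recorded flag has been lost or cleared.
    print(should_bootstrap(auto_bootstrap=True, is_seed=False, already_bootstrapped=False))
```

Under these assumptions, a restart with auto_bootstrap enabled is harmless as long as the locally recorded flag is intact; the second call models the behaviour seen in this report.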

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018072#comment-13018072 ] 

Peter Schuller commented on CASSANDRA-2435:
-------------------------------------------

FWIW, looks good to me (but I only did visual inspection and some code jumping in the 0.7 branch; haven't tested it).


[jira] [Updated] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2435:
--------------------------------------

    Priority: Critical  (was: Major)

[jira] [Updated] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2435:
--------------------------------------

    Attachment: 2435.txt

patch to fix.

[jira] [Updated] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2435:
--------------------------------------

          Component/s: Core
             Priority: Major  (was: Minor)
    Affects Version/s: 0.7.0  (was: 0.7.2)
        Fix Version/s: 0.7.5
             Assignee: Jonathan Ellis

Recall that move (until 0.8) consists of two steps:

- unbootstrap
- bootstrap to new location

unbootstrap calls StorageService.leaveRing (same as decommission), which marks the node as not bootstrapped with setBootstrapped(false).

In one of the refactorings during 0.7 development we removed the call to setBootstrapped(true) from finishBootstrapping, so the flag is never set back once the move completes. On the next restart the node will therefore autobootstrap again if that is enabled in the config file.
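
The sequence above can be modelled in a few lines. This is a toy Python model, not the Cassandra source: the `Node` class and the `reset_flag_on_finish` switch are hypothetical, and the method names only mirror the ones mentioned above (leaveRing, finishBootstrapping, setBootstrapped). The restart check assumes that a non-seed node bootstraps whenever auto_bootstrap is on and the persisted flag is false.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a node's persisted bootstrap state (hypothetical)."""
    auto_bootstrap: bool = True
    is_seed: bool = False
    bootstrapped: bool = True            # models the flag in the local system table
    reset_flag_on_finish: bool = False   # False models the 0.7 regression

    def leave_ring(self) -> None:
        # unbootstrap/decommission path: mark the node as not bootstrapped
        self.bootstrapped = False

    def finish_bootstrapping(self) -> None:
        # The flag is only set back if this call was kept in place.
        if self.reset_flag_on_finish:
            self.bootstrapped = True

    def move(self) -> None:
        # Pre-0.8 move: unbootstrap, then bootstrap to the new location.
        self.leave_ring()
        self.finish_bootstrapping()

    def will_bootstrap_on_restart(self) -> bool:
        # Assumed startup rule: non-seed, auto_bootstrap on, flag not set.
        return self.auto_bootstrap and not self.is_seed and not self.bootstrapped


if __name__ == "__main__":
    broken, fixed = Node(), Node(reset_flag_on_finish=True)
    broken.move()
    fixed.move()
    print("without the call:", broken.will_bootstrap_on_restart())
    print("with the call:   ", fixed.will_bootstrap_on_restart())
```

In this model a seed node never re-bootstraps after a move, which matches the report that only the three non-seed nodes were affected.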

[jira] [Commented] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018184#comment-13018184 ] 

Hudson commented on CASSANDRA-2435:
-----------------------------------

Integrated in Cassandra-0.7 #430 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/430/])
    re-set bootstrapped flag after move finishes
patch by jbellis; reviewed by Peter Schuller and Nick Bailey for CASSANDRA-2435


[jira] [Commented] (CASSANDRA-2435) auto bootstrap happened on already bootstrapped nodes

Posted by "Nick Bailey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018135#comment-13018135 ] 

Nick Bailey commented on CASSANDRA-2435:
----------------------------------------

+1
