Posted to user@cassandra.apache.org by "Sholes, Joshua" <Jo...@cable.comcast.com> on 2014/02/02 19:48:45 UTC

One of my nodes is in the wrong datacenter - help!

All,

A node in my 8-node production 1.2.8 cluster had a serious problem and needed to be removed and rebuilt. However, after running nodetool removenode and then bootstrapping a new node on the same IP address, the new node somehow ended up with a different datacenter name: the rest of the nodes are in dc $NAME, and the new one is in dc $NAME6934724 (that is, a string of seemingly random numbers appended to the correct name). How can I force it to change its DC name back to what it should be?

I’m working with 500+GB per node here so bootstrapping it again is not a huge issue, but I’d prefer to avoid it anyway.  I am NOT able to change the node’s IP address at this time so I’m stuck with bootstrapping a new node in the same place, which my gut feeling tells me might be part of the problem.

Any insight anyone can give me is highly appreciated.

Thanks,
--
Josh Sholes
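
A quick way to see which nodes report the unexpected datacenter is to group the rows of `nodetool status` by their `Datacenter:` header. The sketch below assumes the usual text layout of that command's output; the sample text, the addresses, the name DC1, and both function names are made up for illustration and are not taken from the cluster in question.

```python
def nodes_by_datacenter(status_output):
    """Group node addresses by the 'Datacenter:' header they appear under."""
    groups = {}
    current = None
    for raw in status_output.splitlines():
        line = raw.strip()
        if line.startswith("Datacenter:"):
            current = line.split(":", 1)[1].strip()
            groups.setdefault(current, [])
            continue
        parts = line.split()
        # Node rows start with a two-letter code: Up/Down, then
        # Normal/Leaving/Joining/Moving, followed by the address.
        if (current is not None and len(parts) > 1 and len(parts[0]) == 2
                and parts[0][0] in "UD" and parts[0][1] in "NLJM"):
            groups[current].append(parts[1])
    return groups


def unexpected_datacenters(groups, expected):
    """Return only the datacenters whose name differs from the expected one."""
    return {dc: nodes for dc, nodes in groups.items() if dc != expected}


# Hypothetical output, shortened; host IDs replaced by a placeholder.
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID    Rack
UN  10.0.0.1  512.1 GB  256     12.5%  (host-id)  RAC1
UN  10.0.0.2  498.7 GB  256     12.5%  (host-id)  RAC1
Datacenter: DC16934724
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load      Tokens  Owns   Host ID    Rack
UN  10.0.0.8  1.2 GB    256     12.5%  (host-id)  RAC1
"""

groups = nodes_by_datacenter(SAMPLE)
print(unexpected_datacenters(groups, expected="DC1"))
```

Running the same check against the output from several nodes would show whether they all agree on the odd name or only some of them do.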

Re: One of my nodes is in the wrong datacenter - help!

Posted by Edward Capriolo <ed...@gmail.com>.

Maybe that node was just trying to tell you that it really wanted to work
in a different data center :)


On Mon, Feb 10, 2014 at 10:08 AM, Sholes, Joshua <
Joshua_Sholes@cable.comcast.com> wrote:

>  In case anyone was following this issue, it ended up being something
> that looked an awful lot like CASSANDRA-6053 -- when the node was removed,
> it didn't successfully remove from the peers table from all nodes, and thus
> several of them were doing their best to try to contact it despite it being
> down.
>  --
> Josh Sholes

Re: One of my nodes is in the wrong datacenter - help!

Posted by "Sholes, Joshua" <Jo...@cable.comcast.com>.

In case anyone was following this issue, it ended up being something that looked an awful lot like CASSANDRA-6053: when the node was removed, its entry wasn’t successfully removed from the peers table on all nodes, and thus several of them were doing their best to contact it despite it being down.
--
Josh Sholes
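
A minimal sketch of how such leftover entries could be found, assuming the `peer` column of `system.peers` has already been collected from each live node (for example with `SELECT peer FROM system.peers;` in cqlsh on each one). The function name and the sample addresses are hypothetical; only the comparison logic is shown.

```python
def find_stale_peers(peers_by_node, live_nodes):
    """Map each node to the peers it still lists that are not live members.

    peers_by_node: dict of node address -> addresses read from its system.peers
    live_nodes:    set of addresses that should currently be in the cluster
    """
    stale = {}
    for node, peers in peers_by_node.items():
        leftovers = sorted(set(peers) - set(live_nodes))
        if leftovers:
            stale[node] = leftovers
    return stale


# Hypothetical three-node view in which 10.0.0.8 was removed but two nodes
# still carry a row for it.
live = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
peers_by_node = {
    "10.0.0.1": ["10.0.0.2", "10.0.0.3"],
    "10.0.0.2": ["10.0.0.1", "10.0.0.3", "10.0.0.8"],
    "10.0.0.3": ["10.0.0.1", "10.0.0.2", "10.0.0.8"],
}

for node, leftovers in sorted(find_stale_peers(peers_by_node, live).items()):
    print(node, "still lists", ", ".join(leftovers))
```

What to do with a stale row is a separate decision; the thread does not say how the entries were cleaned up.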

Re: One of my nodes is in the wrong datacenter - help!

Posted by "Sholes, Joshua" <Jo...@cable.comcast.com>.

Thanks for the advice. I did use “removenode”, as I was aware of the replace_token problems.
I haven’t run into the issue in CASSANDRA-6615 yet, and I don’t believe I’m at risk for it.

I’m actually running into a different problem. Having run removenode on the node with the incorrect datacenter name, I am still getting “one or more nodes were unavailable” messages when doing queries with consistency=all. I’m doing a full repair pass on the column family in question just to be safe (which is taking forever!) before I do anything else. So to reiterate: my cluster now shows 7 nodes up when looking with gossipinfo or status, but will still not do consistency=all queries. Are there any best practices for finding other issues with the cluster, or should I anticipate that the repair pass will fix the problem?
--
Josh Sholes
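
One plausible reading of this symptom, sketched under assumptions: a coordinator that still counts a removed node as a replica will refuse consistency=all requests even though every node shown by status is up, because ALL needs every replica it knows about to be alive. The code below is a simplified model, not Cassandra's implementation; the function names, addresses, and replica sets are hypothetical.

```python
def required_replicas(consistency, replication_factor):
    """Number of live replicas needed before a request is accepted."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency)


def is_available(consistency, replicas, live_nodes):
    """True if enough replicas for a key are alive at this consistency level."""
    alive = [r for r in replicas if r in live_nodes]
    return len(alive) >= required_replicas(consistency, len(replicas))


# Replica set for one key as seen by a coordinator that still believes the
# removed node (10.0.0.8) owns part of the ring.
stale_view = ["10.0.0.1", "10.0.0.2", "10.0.0.8"]
live = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}

print(is_available("QUORUM", stale_view, live))  # True
print(is_available("ALL", stale_view, live))     # False
```

In this model a repair would not change the outcome, since the missing replica is a node the coordinator wrongly expects rather than data that is out of sync.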

Re: One of my nodes is in the wrong datacenter - help!

Posted by Robert Coli <rc...@eventbrite.com>.

On Sun, Feb 2, 2014 at 10:48 AM, Sholes, Joshua <
Joshua_Sholes@cable.comcast.com> wrote:

>  I had a node in my 8-node production 1.2.8 cluster have a serious
> problem and need to be removed and rebuilt.   However, after doing nodetool
> removenode and then bootstrapping a new node on the same IP address, the
> new node somehow ended up with a different datacenter name (the rest of the
> nodes are in dc $NAME, and the new one is in dc $NAME6934724 -- as in, a
> string of seemingly random numbers appended to the correct name).   How can
> I force it to change DC names back to what it should be?
>

You could change the entry in the system.local columnfamily on the affected
node...

cqlsh> UPDATE system.local SET data_center = '$NAME' WHERE key = 'local';

... but that is Not Supported and may have side effects of which I am not
aware.

> I'm working with 500+GB per node here so bootstrapping it again is not a
> huge issue, but I'd prefer to avoid it anyway.  I am NOT able to change the
> node's IP address at this time so I'm stuck with bootstrapping a new node
> in the same place, which my gut feeling tells me might be part of the
> problem.
>

Note that replace_node/replace_token are broken in 1.2.8. Did you attempt
to use either of these? I presume not, because you said you did removenode...

 If I were you, I would probably removenode and re-bootstrap, as the safest
alternative.

As an aside, while trying to deal with this issue you should be aware of
this ticket, so you do not do the sequence of actions it describes.

https://issues.apache.org/jira/browse/CASSANDRA-6615

=Rob