Posted to user@cassandra.apache.org by buddhasystem <po...@bnl.gov> on 2011/01/26 16:00:11 UTC

Node going down when streaming data, what next?

I was moving a node and at some point it started streaming data to 2 other
nodes. Later, that node keeled over and let's assume I can't fix it for the
next 3 days and just want to move tokens on the remaining three to even out
and see if I can live with it.

But I can't do that! The node that was on the receiving end of the stream
refuses to move, because it's still "receiving".

What do I do?

Maxim
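
For the "move tokens on the remaining three to even out" step above, a minimal sketch of how evenly spaced tokens might be computed. It assumes the cluster uses RandomPartitioner, whose token space runs from 0 to 2**127; the thread does not say which partitioner is in use. The helper name balanced_tokens is hypothetical and not part of Cassandra.

```python
def balanced_tokens(node_count):
    """Return evenly spaced tokens for node_count nodes.

    Assumption: RandomPartitioner, with a token space of 0 .. 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    token_space = 2 ** 127
    # Integer division keeps every token an exact integer.
    return [i * token_space // node_count for i in range(node_count)]


# Print one token per remaining node (three nodes, as in the question).
for token in balanced_tokens(3):
    print(token)
```

Each value would then be given to "nodetool move" on the corresponding live node, one node at a time.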

-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5962944.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.

Re: Node going down when streaming data, what next?

Posted by aaron morton <aa...@thelastpickle.com>.

Just to check what's going on...

You have a dead node 'A', it's turned off and still showing up in the nodetool ring results for other nodes but the token field is empty. And you've tried running nodetool removetoken on any other node in the cluster. Is that correct?

Can you include the output of the nodetool ring command and the version you are using?

Thanks
Aaron

On 29 Jan 2011, at 14:43, buddhasystem wrote:

> 
> It does remove tokens, and the "ring" shows that the problematic node owns 0
> tokens, which is OK. However, it's still there, listed.
> 
> It's not a bug but kind of like a feature -- you can bring that node back
> two days later and "move" tokens in the same or a different way.
> 
> What I wish is that the API allowed nodetool to issue a
> command:
> 
> nodetool --host foobar removeempty
> 
> Which would then really scratch the node with zero tokens from the ring, no
> questions asked. Even if the flaky node physically disappeared.

Re: Node going down when streaming data, what next?

Posted by buddhasystem <po...@bnl.gov>.

It does remove tokens, and the "ring" shows that the problematic node owns 0
tokens, which is OK. However, it's still there, listed.

It's not a bug but kind of like a feature -- you can bring that node back
two days later and "move" tokens in the same or a different way.

What I wish is that the API allowed nodetool to issue a
command:

nodetool --host foobar removeempty

Which would then really scratch the node with zero tokens from the ring, no
questions asked. Even if the flaky node physically disappeared.


Re: Node going down when streaming data, what next?

Posted by Robert Coli <rc...@digg.com>.

On Fri, Jan 28, 2011 at 1:51 PM, buddhasystem <po...@bnl.gov> wrote:
>
> I can "remove token" to any other
> node, but -- the dead machine is going to hang around in my "ring" reports
> like a zombie.

If you "remove token" on the other nodes and the dead machine "hangs
around", that sounds like a bug? I haven't necessarily been following
this particular thread, but "remove token" is supposed to.. remove..
tokens.. ?

=Rob

Re: Node going down when streaming data, what next?

Posted by buddhasystem <po...@bnl.gov>.

Sorry Aaron but this doesn't help. As I said, machine is dead, kaput,
finished. So I can't do "decommission". I can "remove token" to any other
node, but -- the dead machine is going to hang around in my "ring" reports
like a zombie.


Re: Node going down when streaming data, what next?

Posted by aaron morton <aa...@thelastpickle.com>.

nodetool removetoken or nodetool decommission
http://wiki.apache.org/cassandra/Operations#Removing_nodes_entirely

Hope that helps
Aaron
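
A rough sketch of how the two options above differ in where they are run, under these assumptions: decommission is issued against the node that is leaving (so that node must still be up), while removetoken is issued against any live node and is given the dead node's token. The helper names, host names, and token value are hypothetical. The functions only build the command line and do not execute anything.

```python
def decommission_command(leaving_host):
    """Command for a node that is still up and should leave the ring itself."""
    return ["nodetool", "--host", leaving_host, "decommission"]


def removetoken_command(live_host, dead_node_token):
    """Command run against a live node to drop a dead node's token."""
    return ["nodetool", "--host", live_host, "removetoken", str(dead_node_token)]


# Hypothetical host names and token, for illustration only.
print(" ".join(decommission_command("leaving-node")))
print(" ".join(removetoken_command("live-node-1", 12345)))
```

For a machine that is already dead, only the removetoken form applies, since decommission needs the departing node to be running.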

On 28 Jan 2011, at 11:30, buddhasystem wrote:

> 
> OK, after running "repair" and waiting overnight the rebalancing worked and
> now 3 nodes share the load as I expected. However, one node that is broken
> is still listed in the ring. I have no intention of reviving it. What's the
> optimal way to get rid of it as far as the ring configuration is concerned
> (it's still listed as "down" but I would like to really scratch it)?
> 
> Thanks,
> 
> Maxim

Re: Node going down when streaming data, what next?

Posted by buddhasystem <po...@bnl.gov>.

OK, after running "repair" and waiting overnight the rebalancing worked and
now 3 nodes share the load as I expected. However, one node that is broken
is still listed in the ring. I have no intention of reviving it. What's the
optimal way to get rid of it as far as the ring configuration is concerned
(it's still listed as "down" but I would like to really scratch it)?

Thanks,

Maxim


Re: Node going down when streaming data, what next?

Posted by buddhasystem <po...@bnl.gov>.

Hello,

from what I know, you don't really have to restart "simultaneously",
although of course you don't want to wait.

I finally decided to use the "removetoken" command to actually scratch out the
sickly node from the cluster. I'll bootstrap it later when it's fixed.



Re: Node going down when streaming data, what next?

Posted by Dan Hendry <da...@gmail.com>.

When this has happened to me, restarting the node you are trying to
move works. I can't remember the exact conditions but I have also had
to restart all nodes in the cluster simultaneously once or twice as
well.

I would love to know if there is a better way of doing it.

On Wednesday, January 26, 2011, buddhasystem <po...@bnl.gov> wrote:
>
> Bump. I still don't know what the best thing to do is, plz help.

Re: Node going down when streaming data, what next?

Posted by buddhasystem <po...@bnl.gov>.

Bump. I still don't know what the best thing to do is, plz help.