Posted to user@cassandra.apache.org by Tom van den Berge <to...@drillster.com> on 2014/09/10 21:37:51 UTC

Node being rebuilt receives read requests

I have a datacenter with a single node, and I want to start using vnodes. I
have followed the instructions (
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html),
and set up a new node in a new datacenter (auto_bootstrap=false, seed=node
in old dc, num_tokens=256, initial_token not set, topology file
updated).

After starting the new node, I used "nodetool rebuild -- old-dc" to start
filling up the new node.

The idea was to switch the client app to the new node when completed, and
to decommission the old (non-vnodes) node.

While the new node was being filled, my client application (still connected
to the old node, and no auto-discovery of nodes enabled) started showing
errors about rows that could not be found. The only explanation I can think
of is that the old node reroutes some queries to the new (incomplete) node.
Why would the old node send requests to the new node?
The old node contains 100% of all data, since it is a single-node
datacenter with replication factor 1, so I would say there is no need to
forward the request to another node. And, even more important, the new node
is in the middle of a 'rebuild' process, and therefore does not have all
data.
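
To make the suspicion above concrete, here is a toy model in Python. It is
not Cassandra code, only a rough sketch of token-ring routing under stated
assumptions: a coordinator picks the replica for a row from token ownership,
not from which node actually holds the data; the old node has a single token
(0 is a made-up value); and the keyspace places replicas without regard to
datacenter. The post does not state the keyspace's replication settings, so
that last point is a guess. The names read_targets, token_for, old-node and
new-node are hypothetical.

```python
import bisect
import hashlib
import random
from collections import Counter

RING_MIN, RING_MAX = -2**63, 2**63 - 1  # signed 64-bit token range


def token_for(key):
    """Toy hash from a row key to a token (not Cassandra's partitioner)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def read_targets(nodes, keys, replica_dcs=None):
    """Count which node a coordinator would read each key from (RF=1).

    nodes: {name: (datacenter, [tokens])}
    replica_dcs: if given, only nodes in these datacenters may hold replicas.
    """
    ring = sorted(
        (token, name)
        for name, (dc, tokens) in nodes.items()
        if replica_dcs is None or dc in replica_dcs
        for token in tokens
    )
    ring_tokens = [token for token, _ in ring]
    counts = Counter()
    for key in keys:
        # The owner is the first token at or after the key's token, wrapping.
        i = bisect.bisect_left(ring_tokens, token_for(key))
        counts[ring[i % len(ring)][1]] += 1
    return counts


rng = random.Random(42)  # fixed seed so the run is repeatable
old_only = {"old-node": ("old-dc", [0])}  # assumed single token
both = dict(old_only)
both["new-node"] = (
    "new-dc",
    [rng.randint(RING_MIN, RING_MAX) for _ in range(256)],  # num_tokens=256
)
keys = ["row-%d" % i for i in range(10000)]

before = read_targets(old_only, keys)
after = read_targets(both, keys)
restricted = read_targets(both, keys, replica_dcs={"old-dc"})

print("before new node joins:   ", dict(before))
print("after new node joins:    ", dict(after))
print("replicas only in old-dc: ", dict(restricted))
```

In this model, as soon as the new node's 256 tokens are part of the ring, it
owns most token ranges, so most reads are routed to it whether or not it has
finished rebuilding. If replica placement is limited to old-dc, every read
stays on the old node. Whether this matches the real cluster depends on the
keyspace's actual replication strategy.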

I noticed that after starting the new node, and before issuing the
'nodetool rebuild' command, 'nodetool info' shows that the new (empty) node
has status Normal. I expected that the status would be 'Joining', since
it's not ready yet.

To me, it seems that the cluster does not know the difference between a
node that's being rebuilt, and a node that is ready, and therefore nodes
that are being rebuilt also receive requests from other nodes. If this is
correct, how should one set up a new datacenter, without affecting the
clients that are connected to the old one?

I learned that some time ago, consistency level LOCAL_ONE was introduced to
prevent cross-datacenter requests. I changed my client to use this
(instead of ONE), but it did not make a difference; I still saw many failed
queries in my client. I can't understand why.

Any help is greatly appreciated.

Thanks,
Tom