Posted to user@cassandra.apache.org by Thunder Stumpges <th...@gmail.com> on 2014/02/06 22:19:58 UTC

Auto-Bootstrap not Auto-Bootstrapping?

Hi all,

We recently needed/wanted to reconfigure the disks on our 3-node Cassandra
2.0.4 cluster and rebuild one of the servers at the same time. When we added
the newly rebuilt server back into the cluster, it immediately started serving
read requests with no data! Then, because its latency was so "good", the vast
majority of requests were sent to that server. We are using 3 nodes with
RF=3. Why wouldn't the node stream in the data it needs before serving?
My impression was that the auto_bootstrap setting was true by default (we
have not set it anywhere) and that a new node entering the cluster would
stream in data for its tokens (virtual nodes) prior to serving requests.

Does this have to do with re-using the same hostname/IP as the old server,
which also happens to be in the seed list on our clients and in cassandra.yaml?
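If we read the cassandra.yaml documentation correctly, auto_bootstrap is
described as making new (non-seed) nodes migrate the right data to
themselves. A simplified model of that rule is sketched below. It is not
Cassandra's actual code; the function name and the addresses are hypothetical.

```python
def should_bootstrap(auto_bootstrap: bool, own_address: str, seeds: list[str]) -> bool:
    """Hypothetical, simplified model: a node streams in data for its ranges
    before serving only if auto_bootstrap is on AND it is not a seed."""
    return auto_bootstrap and own_address not in seeds


# Hypothetical addresses: the rebuilt node kept its old IP, which is still a seed.
seeds = ["10.0.0.1", "10.0.0.3"]
print(should_bootstrap(True, "10.0.0.3", seeds))  # False: seed node, no bootstrap
print(should_bootstrap(True, "10.0.0.2", seeds))  # True: non-seed node bootstraps
```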

Our admin performed the following steps during this process:

- Stop one of the 3 servers. It then appeared as DOWN to the rest of the
  cluster.
- Rebuild the system and reconfigure the disks (hostname and IP are the same
  as those of the server that was taken down)
  - NOTE: there was NO data left from before on this machine; it is a new
    bare-metal install
- nodetool removenode <old_host_id> (from one of the remaining nodes)
  - wait for completion (~15 min)
- Start Cassandra on the new node and wait for it to come up
- nodetool repair (on the new node)

As soon as it came up, it was as if we'd lost 1/3 of our data, because so
many read requests were hitting this new, empty node. Data does appear to be
streaming into the new node, but it is still serving many empty responses.

Another curious thing: after this caught us out the first time, I set all of
our reads to QUORUM ahead of time, hoping that if it happened again, quorum
reads would prevent inconsistent results. This does not appear to have
helped.
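The expectation behind that change can be written out. With RF=3, QUORUM
needs floor(3/2) + 1 = 2 replicas to respond, and the responses are merged by
write timestamp (last write wins), so a row that is missing on one replica
should still be returned as long as the other responding replica has it. The
toy model below assumes hypothetical replica names and values and ignores
deletes, tombstones and read repair.

```python
from itertools import combinations
from typing import Optional, Tuple

# (value, write_timestamp); None means the replica has no data for the row.
Cell = Optional[Tuple[str, int]]


def quorum(rf: int) -> int:
    # QUORUM for a single data center: floor(RF / 2) + 1
    return rf // 2 + 1


def reconcile(responses: list[Cell]) -> Optional[str]:
    # Last-write-wins merge: a missing cell does not hide a cell another replica returned.
    present = [cell for cell in responses if cell is not None]
    if not present:
        return None
    return max(present, key=lambda cell: cell[1])[0]


# Hypothetical state: replica "C" is the freshly rebuilt, still-empty node.
replicas = {"A": ("row-1", 100), "B": ("row-1", 100), "C": None}

q = quorum(3)
results = [reconcile([replicas[r] for r in group]) for group in combinations(replicas, q)]
print(q, results)  # 2 ['row-1', 'row-1', 'row-1']
```

In this model every two-replica quorum returns the row, while a read at ONE
answered by replica C alone would return nothing.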

Any insight as to what the heck went wrong here would be greatly
appreciated.

Thanks,
Thunder