
Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/01/04 02:29:15 UTC

[Cassandra Wiki] Update of "Operations" by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by JonathanEllis.
The comment on this change is: explain repair/bootstrap options more clearly.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=17&rev2=18

--------------------------------------------------

1. Anti-Entropy: when `nodeprobe repair` is run, Cassandra performs a major compaction, computes a Merkle Tree of the data on that node, and compares it with the versions on other replicas to catch any out-of-sync data that hasn't been read recently. This is intended to be run infrequently (e.g., weekly) since major compaction is relatively expensive.
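
The Merkle Tree comparison described above can be illustrated with a small, self-contained sketch. This is not Cassandra's implementation: the bucketing of keys by MD5, the use of SHA-256, and the names `bucket_of`, `leaf_hashes`, `build_tree` and `differing_ranges` are all assumptions made for illustration. The point is only that two replicas can find which ranges disagree by comparing hashes from the root down, without exchanging the data itself.

```python
import hashlib


def bucket_of(key, num_ranges):
    # Hypothetical partitioning: assign a key to a range by a stable hash.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_ranges


def leaf_hashes(rows, num_ranges):
    """Hash the rows (dict of str -> str) that fall into each range."""
    buckets = [hashlib.sha256() for _ in range(num_ranges)]
    for key in sorted(rows):  # sorted so both replicas hash in the same order
        data = key.encode() + b"\x00" + rows[key].encode() + b"\x00"
        buckets[bucket_of(key, num_ranges)].update(data)
    return [b.digest() for b in buckets]


def build_tree(leaves):
    """Return the tree as a list of levels: leaves first, root last.

    The number of leaves is assumed to be a power of two.
    """
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = [
            hashlib.sha256(prev[i] + prev[i + 1]).digest()
            for i in range(0, len(prev), 2)
        ]
        levels.append(parents)
    return levels


def differing_ranges(tree_a, tree_b):
    """Descend only into subtrees whose hashes differ; return leaf indexes."""
    stack = [(len(tree_a) - 1, 0)]  # start at the root
    mismatched = []
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree, nothing to repair here
        if level == 0:
            mismatched.append(idx)
        else:
            stack.append((level - 1, 2 * idx))
            stack.append((level - 1, 2 * idx + 1))
    return sorted(mismatched)


if __name__ == "__main__":
    replica_a = {"k%d" % i: "v%d" % i for i in range(100)}
    replica_b = dict(replica_a)
    replica_b["k42"] = "stale"  # one out-of-sync row
    tree_a = build_tree(leaf_hashes(replica_a, 8))
    tree_b = build_tree(leaf_hashes(replica_b, 8))
    print("ranges needing repair:", differing_ranges(tree_a, tree_b))
```

Only the ranges reported as mismatched would need to be streamed between replicas; matching subtrees are skipped after a single hash comparison.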

=== Handling failure ===
- If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to deal with any inconsistent data. If a node goes down entirely, you should be aware of the following as well:
+ If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to deal with any inconsistent data. If a node goes down entirely, then you have two options:

- 1. Remove the old node from the ring first, or bring up a replacement node with the same IP and Token as the old; otherwise, the old node will stay part of the ring in a "down" state, which will degrade your replication factor for the affected Range
+ 1. Bring up a replacement node with the same IP and Token as the old one, and run `nodeprobe repair`. Until the repair process is complete, clients reading only from this node may get no data back. Using a higher !ConsistencyLevel on reads will avoid this.
* If you don't know the Token of the old node, you can retrieve it from any of the other nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
* You can also run `nodeprobe ring` to look up a node's token (unless there was some kind of outage, and the other nodes came back up but the down one did not).
- 1. Removing the old node, then bootstrapping the new one, may be more performant than using Anti-Entropy. Testing needed.
- * Even brute-force rsyncing of data from the relevant replicas and running cleanup on the replacement node may be more performant
+ 1. Remove the old token ring entry with `nodeprobe removetoken`
+ * Optionally, bootstrap a new node at either the old node's location (using the InitialToken configuration directive) or at an automatically determined one. Since a bootstrapping node does not advertise itself as available for reads until it has all the data for its ranges transferred, this avoids the problem of clients reading at !ConsistencyLevel.ONE seeing empty replies. This may also be more performant than the `nodeprobe repair` approach; testing needed.
+
+ Do not leave the old node permanently in the token ring as "Down": while it is in this state, the cluster assumes it may eventually come back up with its old data, and will not re-replicate the data it was responsible for elsewhere.
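
To see why reads at !ConsistencyLevel.ONE can come back empty from an unrepaired replacement node while a higher level does not, consider the toy model below. It is a sketch under stated assumptions, not Cassandra's read path: replicas are plain dicts, the helper names `Cell`, `read` and `quorum` are hypothetical, and the empty replacement node is assumed to be the first replica contacted (the worst case).

```python
from collections import namedtuple

# A stored value plus the timestamp used to pick the newest version.
Cell = namedtuple("Cell", ["value", "timestamp"])


def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read(replicas, key, required):
    """Query the first `required` replicas and return the newest value found.

    Returns None if none of the queried replicas has the key.
    """
    answers = [replica.get(key) for replica in replicas[:required]]
    found = [cell for cell in answers if cell is not None]
    if not found:
        return None
    return max(found, key=lambda cell: cell.timestamp).value


if __name__ == "__main__":
    healthy = {"user1": Cell("alice", 10)}
    replacement = {}  # same IP and Token as the old node, not yet repaired
    replicas = [replacement, dict(healthy), dict(healthy)]

    print("ONE:   ", read(replicas, "user1", 1))
    print("QUORUM:", read(replicas, "user1", quorum(3)))
```

In this model the read at ONE consults only the empty replacement node, whereas the quorum read also reaches a replica that still holds the data.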

== Backing up data ==
Cassandra can snapshot data while online using `nodeprobe snapshot`. You can then back up those snapshots using any desired system, although leaving them where they are is probably the option that makes the most sense on large clusters.