Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/01/04 02:56:37 UTC

[Cassandra Wiki] Update of "Operations" by Chris Goffinet

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by Chris Goffinet.
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=18&rev2=19

--------------------------------------------------

  === Handling failure ===
  If a node goes down and comes back up, the ordinary repair mechanisms will be adequate to deal with any inconsistent data.  If a node is lost permanently, you have two options:
  
+  1. (Recommended approach) Run `nodeprobe removetoken` on all live nodes, supplying the token of the dead node. You can obtain the token by running `nodeprobe ring` on any live node (unless there was some kind of outage after which the others came up but the down node did not); see the sketch after this list.
+   
-  1. Bring up a replacement node with the same IP and Token as the old, and run `nodeprobe repair`.  Until the repair process is complete, clients reading only from this node may get no data back.  Using a higher !ConsistencyLevel on reads will avoid this.
-   * If you don't know the Token of the old node, you can retrieve it from any of the other nodes' `system` keyspace, !ColumnFamily `LocationInfo`, key `L`.
-   * You can also run `nodeprobe ring` to look up a node's token (unless there was some kind of outage, and the others came up but not the down one).
-  1. Remove the old token ring entry with `nodeprobe removetoken`
-   * optionally, bootstrap a new node at either the old node's location (using the InitialToken configuration directive) or at an automatically determined one.  Since a bootstrapping node does not advertise itself as available for reads until it has all the data for its ranges transferred, this avoids the problem of clients reading at !ConsistencyLevel.ONE seeing empty replies.  This may also be more performant than using the `nodeprobe repair` approach; testing needed.
  
- Do not leave the old node permanently in the token ring as "Down"; when it is in this state the cluster thinks it may eventually come back up with its old data, and will not re-replicate the data it was responsible for elsewhere.
+  Next, bring up the replacement node with a new IP address and with !AutoBootstrap set to true in storage-conf.xml (see the config comment in the sketch below). This places the replacement node in the cluster, finds the appropriate position for it automatically, and then begins the bootstrap process. While bootstrapping is in progress, the node will not serve reads.
+ 
+  1. (Advanced approach) Bring up a replacement node with the same IP and token as the old one, and run `nodeprobe repair`. Until the repair process is complete, clients reading only from this node may get no data back; using a higher !ConsistencyLevel on reads avoids this. As above, you can obtain the old token by running `nodeprobe ring` on any live node (unless there was some kind of outage after which the others came up but the down node did not). A sketch of this approach also follows the list.
+ 
+ You run `nodeprobe removetoken` on all live nodes so that Hinted Handoff stops collecting writes on behalf of the failed node.
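+ 
+ A minimal sketch of the recommended approach, assuming nodeprobe's `-host` flag and that `removetoken` takes the dead node's token as its argument; the host addresses and the token below are made-up example values:
+ 
+ {{{
+ # 1. Find the dead node's token by inspecting the ring from any live node.
+ nodeprobe -host 10.0.0.1 ring
+ 
+ # 2. Remove the dead node's token; run this against every live node
+ #    (addresses and token are example values).
+ for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
+     nodeprobe -host $host removetoken 85070591730234615865843651857942052864
+ done
+ 
+ # 3. On the replacement node (new IP), set AutoBootstrap to true in
+ #    storage-conf.xml before starting Cassandra:
+ #      <AutoBootstrap>true</AutoBootstrap>
+ #    The node will then join the ring and bootstrap automatically.
+ }}}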
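+ 
+ A sketch of the advanced approach, under the same assumption about the `-host` flag; the InitialToken directive is the one mentioned above for storage-conf.xml, and the address and token are again example values:
+ 
+ {{{
+ # On the replacement node (reusing the old IP), pin the old token in
+ # storage-conf.xml before starting Cassandra:
+ #   <InitialToken>85070591730234615865843651857942052864</InitialToken>
+ 
+ # Once the node is up, repair it from the surviving replicas.
+ nodeprobe -host 10.0.0.4 repair
+ }}}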
  
  == Backing up data ==
  Cassandra can snapshot data while online using `nodeprobe snapshot`.  You can then back up those snapshots using any desired system, although leaving them where they are is probably the option that makes the most sense on large clusters.
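+ 
+ A minimal sketch of taking a snapshot and copying it off-node; the data directory path and snapshot layout are assumptions based on the default storage-conf.xml, and the backup destination is a made-up example:
+ 
+ {{{
+ # Take a snapshot on this node; snapshots are hard links to the current
+ # SSTables, so this is fast and cheap on disk.
+ nodeprobe -host 10.0.0.1 snapshot
+ 
+ # Copy the snapshot directories to a backup host (assumed default data
+ # directory; adjust to your DataFileDirectory setting).
+ rsync -a /var/lib/cassandra/data/*/snapshots/ backuphost:/backups/cassandra/10.0.0.1/
+ }}}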