Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/28 07:51:38 UTC

[Cassandra Wiki] Update of "Operations" by TimSmith

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "Operations" page has been changed by TimSmith:
http://wiki.apache.org/cassandra/Operations?action=diff&rev1=94&rev2=95

Comment:
Fix typos

  See PerformanceTuning
  
  == Schema management ==
- Server clocks should be synchronized with something like ntp.  Otherwise, schema changes may be rejected as being obsolete.
+ Server clocks should be synchronized with something like NTP.  Otherwise, schema changes may be rejected as being obsolete.
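  A quick way to sanity-check clock synchronization (a sketch only, assuming ntpd and the standard ntp tools are installed on the node) is to ask the local daemon for its peer status:
  {{{
  # show NTP peers and offsets; a '*' in the first column marks the peer the clock is synced to
  $ ntpq -p
  }}}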
  
  See LiveSchemaUpdates [refers to functionality in 0.7]
  
@@ -23, +23 @@

  === Token selection ===
  Using a strong hash function means !RandomPartitioner keys will, on average, be evenly spread across the Token space, but you can still have imbalances if your Tokens do not divide up the range evenly, so you should specify !InitialToken to your first nodes as `i * (2**127 / N)` for i = 0 .. N-1. In Cassandra 0.7, you should specify `initial_token` in `cassandra.yaml`.
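  For example, one way to compute these values for a 4-node cluster (a sketch only; any tool that does exact integer arithmetic works, the Python one-liner here is just an illustration):
  {{{
  # print the initial tokens for N=4 nodes using i * (2**127 / N)
  $ python -c "for i in range(4): print i * (2**127 / 4)"
  0
  42535295865117307932921825928971026432
  85070591730234615865843651857942052864
  127605887595351923798765477786913079296
  }}}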
  
- With !NetworkTopologyStrategy, you should calculate the tokens the nodes in each DC independantly. Tokens still neded to be unique, so you can add 1 to the tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node cluster in 2 datacenters, you would have
+ With !NetworkTopologyStrategy, you should calculate the tokens for the nodes in each DC independently. Tokens still need to be unique, so you can add 1 to the tokens in the 2nd DC, add 2 in the 3rd, and so on.  Thus, for a 4-node cluster in 2 datacenters, you would have
  {{{
  DC1
  node 1 = 0
@@ -91, +91 @@

  Important things to note:
  
   1. You should wait long enough for all the nodes in your cluster to become aware of the bootstrapping node via gossip before starting another bootstrap.  The new node will log "Bootstrapping" when this is safe, 2 minutes after starting.  (90s to make sure it has accurate load information, and 30s waiting for other nodes to start sending it inserts happening in its to-be-assumed part of the token ring.)
-  1. Relating to point 1, one can only boostrap N nodes at a time with automatic token picking, where N is the size of the existing cluster. If you need to more than double the size of your cluster, you have to wait for the first N nodes to finish until your cluster is size 2N before bootstrapping more nodes. So if your current cluster is 5 nodes and you want add 7 nodes, bootstrap 5 and let those finish before boostrapping the last two.
+  1. Relating to point 1, one can only bootstrap N nodes at a time with automatic token picking, where N is the size of the existing cluster. If you need to more than double the size of your cluster, you have to wait for the first N nodes to finish, bringing your cluster to size 2N, before bootstrapping more nodes. So if your current cluster is 5 nodes and you want to add 7 nodes, bootstrap 5 and let those finish before bootstrapping the last two.
  1. As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their Token Range to a newly added node.  Run `nodetool cleanup` on the source node(s) (neighboring nodes that shared the same subrange) when you are satisfied the new node is up and working; see the sketch after this list. If you do not do this, the old data will still be counted against the load on that node, and future bootstrap attempts at choosing a location will be thrown off.
  1. When bootstrapping a new node, existing nodes have to divide the key space before beginning replication.  This can take a while, so be patient.
   1. During bootstrap, a node will drop the Thrift port and will not be accessible from `nodetool`.
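  As a sketch of the cleanup step mentioned above (the host addresses are hypothetical, and the exact `nodetool` flags may vary by version, so check `nodetool` help on yours):
  {{{
  # confirm the new node shows up in the ring with its expected token (any live node can answer this)
  $ nodetool -h 10.0.0.1 ring

  # then, on each source node that gave up part of its range, remove the data it no longer owns
  $ nodetool -h 10.0.0.2 cleanup
  }}}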
@@ -148, +148 @@

  Consider how to schedule your repairs. A repair causes additional disk and CPU activity on the nodes participating in the repair, and it will typically be a good idea to spread repairs out over time so as to minimize the chances of repairs running concurrently on many nodes.
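  For example (a sketch only; cron availability, paths, and the chosen times are assumptions), repairs could be staggered by giving each node its own night:
  {{{
  # crontab entry on node 1: run repair every Sunday at 02:00
  0 2 * * 0  nodetool -h localhost repair

  # crontab entry on node 2: run repair every Tuesday at 02:00
  0 2 * * 2  nodetool -h localhost repair
  }}}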
  
  ==== Dealing with the consequences of nodetool repair not running within GCGraceSeconds ====
- If `nodetool repair` has not been run often enough to the pointthat GCGraceSeconds has passed, you risk forgotten deletes (see DistributedDeletes). In addition to data popping up that has been deleted, you may see inconsistencies in data return from different nodes that will not self-heal by read-repair or further `nodetool repair`. Some further details on this latter effect is documented in [[https://issues.apache.org/jira/browse/CASSANDRA-1316|CASSANDRA-1316]].
+ If `nodetool repair` has not been run often enough to the point that GCGraceSeconds has passed, you risk forgotten deletes (see DistributedDeletes). In addition to data popping up that has been deleted, you may see inconsistencies in data returned from different nodes that will not self-heal by read-repair or further `nodetool repair`. Some further details on this latter effect are documented in [[https://issues.apache.org/jira/browse/CASSANDRA-1316|CASSANDRA-1316]].
  
  There are at least three ways to deal with this scenario.