Posted to commits@cassandra.apache.org by jb...@apache.org on 2011/09/20 13:59:45 UTC

svn commit: r1173099 - /cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java

Author: jbellis
Date: Tue Sep 20 11:59:45 2011
New Revision: 1173099

URL: http://svn.apache.org/viewvc?rev=1173099&view=rev
Log:
update comments

Modified:
    cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java

Modified: cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java?rev=1173099&r1=1173098&r2=1173099&view=diff
==============================================================================
--- cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java (original)
+++ cassandra/branches/cassandra-1.0.0/src/java/org/apache/cassandra/locator/TokenMetadata.java Tue Sep 20 11:59:45 2011
@@ -43,22 +43,34 @@ public class TokenMetadata
     /* Maintains token to endpoint map of every node in the cluster. */
     private BiMap<Token, InetAddress> tokenToEndpointMap;
 
-    // Suppose that there is a ring of nodes A, C and E, with replication factor 3.
+    // Prior to CASSANDRA-603, we just had <tt>Map<Range, InetAddress> pendingRanges</tt>,
+    // which was added to when a node began bootstrap and removed from when it finished.
+    //
+    // This is inadequate when multiple changes are allowed simultaneously.  For example,
+    // suppose that there is a ring of nodes A, C and E, with replication factor 3.
     // Node D bootstraps between C and E, so its pending ranges will be E-A, A-C and C-D.
-    // Now suppose node B bootstraps between A and C at the same time. Its pending ranges would be C-E, E-A and A-B.
-    // Now both nodes have pending range E-A in their list, which will cause pending range collision
-    // even though we're only talking about replica range, not even primary range. The same thing happens
-    // for any nodes that boot simultaneously between same two nodes. For this we cannot simply make pending ranges a <tt>Multimap</tt>,
-    // since that would make us unable to notice the real problem of two nodes trying to boot using the same token.
-    // In order to do this properly, we need to know what tokens are booting at any time.
+    // Now suppose node B bootstraps between A and C at the same time. Its pending ranges
+    // would be C-E, E-A and A-B. Now both nodes need to be assigned pending range E-A,
+    // which we would be unable to represent with the old Map.  The same thing happens
+    // even more obviously for any nodes that boot simultaneously between the same two nodes.
+    //
+    // So, we made two changes:
+    //
+    // First, we changed pendingRanges to a <tt>Multimap<Range, InetAddress></tt> (now
+    // <tt>Map<String, Multimap<Range, InetAddress>></tt>, because replication strategy
+    // and options are per-KeySpace).
+    //
+    // Second, we added the bootstrapTokens and leavingEndpoints collections, so we can
+    // rebuild pendingRanges from the complete information of what is going on, when
+    // additional changes are made mid-operation.
+    //
+    // Finally, note that recording the tokens of joining nodes in bootstrapTokens also
+    // means we can detect and reject the addition of multiple nodes at the same token
+    // before one becomes part of the ring.
     private BiMap<Token, InetAddress> bootstrapTokens = HashBiMap.create();
-
-    // we will need to know at all times what nodes are leaving and calculate ranges accordingly.
-    // An anonymous pending ranges list is not enough, as that does not tell which node is leaving
-    // and/or if the ranges are there because of bootstrap or leave operation.
-    // (See CASSANDRA-603 for more detail + examples).
+    // (don't need to record Token here since it's still part of tokenToEndpointMap until it's done leaving)
     private Set<InetAddress> leavingEndpoints = new HashSet<InetAddress>();
-
+    // this is a cache of the calculation from {tokenToEndpointMap, bootstrapTokens, leavingEndpoints}
     private ConcurrentMap<String, Multimap<Range, InetAddress>> pendingRanges = new ConcurrentHashMap<String, Multimap<Range, InetAddress>>();
 
     // nodes which are migrating to the new tokens in the ring
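
The A/C/E example in the updated comment can be sketched in a few lines of standalone Java. This is only an illustration, not the TokenMetadata code: the class and method names are hypothetical, ranges, tokens and endpoints are plain strings rather than Range, Token and InetAddress, and ordinary java.util maps stand in for Guava's Multimap and BiMap. It shows the single-valued map dropping one of the two nodes that share pending range E-A, a multi-valued map keeping both, and a token-keyed map refusing a second joining node at an already claimed token.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only: strings replace Cassandra's Range, Token and
// InetAddress, and java.util maps replace Guava's Multimap and BiMap.
class PendingRangesSketch {

    // Old shape: one endpoint per range, so a second put for the same
    // range silently overwrites the first.
    static Map<String, String> singleValued(String[][] assignments) {
        Map<String, String> pending = new HashMap<>();
        for (String[] a : assignments) {
            pending.put(a[0], a[1]);
        }
        return pending;
    }

    // Multimap-like shape: each range keeps every endpoint pending for it.
    static Map<String, Set<String>> multiValued(String[][] assignments) {
        Map<String, Set<String>> pending = new HashMap<>();
        for (String[] a : assignments) {
            pending.computeIfAbsent(a[0], k -> new LinkedHashSet<>()).add(a[1]);
        }
        return pending;
    }

    // Records a joining node's token; returns false when a different
    // endpoint has already claimed that token.
    static boolean addBootstrapToken(Map<String, String> bootstrapTokens,
                                     String token, String endpoint) {
        String existing = bootstrapTokens.putIfAbsent(token, endpoint);
        return existing == null || existing.equals(endpoint);
    }

    public static void main(String[] args) {
        // {range, endpoint} pairs from the comment: D joins between C and E,
        // B joins between A and C at the same time.
        String[][] assignments = {
            {"E-A", "D"}, {"A-C", "D"}, {"C-D", "D"},
            {"C-E", "B"}, {"E-A", "B"}, {"A-B", "B"},
        };
        System.out.println(singleValued(assignments).get("E-A")); // prints B
        System.out.println(multiValued(assignments).get("E-A"));  // prints [D, B]
    }
}
```

With the single-valued map, node D's claim on E-A is lost as soon as B is recorded, which is the representation problem the comment describes. The token check is a simplification of what a map keyed by token makes possible; how TokenMetadata itself reacts to a duplicate token is not shown in this diff.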