You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2010/02/11 15:50:58 UTC

[Cassandra Wiki] Trivial Update of "AntiEntropy" by JonathanEllis

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification.

The "AntiEntropy" page has been changed by JonathanEllis.
http://wiki.apache.org/cassandra/AntiEntropy?action=diff&rev1=3&rev2=4

--------------------------------------------------

+ == Implementation ==
+
Cassandra's implementation is modeled on Dynamo's, with [[ArchitectureAntiEntropy|modifications to support the richer data model]]. Quoting from [[http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html|Amazon's Dynamo]] section 4.7,

- . To detect the inconsistencies between replicas faster and to minimize the amount of transferred data, Dynamo uses Merkle trees. A Merkle tree is a hash tree where leaves are hashes of the values of individual keys. Parent nodes higher in the tree are hashes of their respective children. The principal advantage of Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire tree or the entire data set. Moreover, Merkle trees help in reducing the amount of data that needs to be transferred while checking for inconsistencies among replicas. For instance, if the hash values of the root of two trees are equal, then the values of the leaf nodes in the tree are equal and the nodes require no synchronization. If not, it implies that the values of some replicas are different. In such cases, the nodes may exchange the hash values of children and the process continues until it reaches the leaves of the trees, at which point the hosts can identify the keys that are “out of sync”. Merkle trees minimize the amount of data that needs to be transferred for synchronization and reduce the number of disk reads performed during the anti-entropy process.
+ . To detect the inconsistencies between replicas faster and to minimize the amount of transferred data, Dynamo uses Merkle trees [13]. A Merkle tree is a hash tree where leaves are hashes of the values of individual keys. Parent nodes higher in the tree are hashes of their respective children. The principal advantage of Merkle tree is that each branch of the tree can be checked independently without requiring nodes to download the entire [...] data set.

- . Dynamo uses Merkle trees for anti-entropy as follows: Each node maintains a separate Merkle tree for each key range (the set of keys covered by a virtual node) it hosts. This allows nodes to compare whether the keys within a key range are up-to-date. In this scheme, two nodes exchange the root of the Merkle tree corresponding to the key ranges that they host in common. Subsequently, using the tree traversal scheme described above the nodes determine if they have any differences and perform the appropriate synchronization action.
+ The key difference in Cassandra's implementation of anti-entropy is that the Merkle trees are built per column family, and they are not maintained for longer than it takes to send them to neighboring nodes. Instead, the trees are generated as snapshots of the dataset during major compactions: this means that excess data might be sent across the network, but it saves local disk IO, and is preferable for very large datasets.

+ == Operation ==
+ Anti entropy repairs can be triggered manually for a single node via 'nodetool repair', although they will occasionally trigger automatically.
+