Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/15 21:27:13 UTC
[Hadoop Wiki] Update of "ZooKeeper/MountRemoteZookeeper" by ebortnik
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "ZooKeeper/MountRemoteZookeeper" page has been changed by ebortnik:
http://wiki.apache.org/hadoop/ZooKeeper/MountRemoteZookeeper?action=diff&rev1=10&rev2=11
The performance of read and write accesses to the home partition remains unaffected in most cases (see the details below), whereas accesses
to remote partitions might suffer from higher latency and lower throughput.
- Unlike the design proposed in http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper that addresses the same problem, we do not view the solution as
- partitioning a single namespace. Instead, our design provides a way for incorporating a part of one ZK namespace in
+ Our design provides a way for incorporating a part of one ZK namespace in
another ZK namespace, similarly to ''mount'' in Linux. Each ZK cluster has its own leader, and intra-cluster requests are handled
exactly as they are in ZK today. Inter-cluster requests are coordinated by the local leader. We strive to minimize the inter-cluster communication
required to achieve the consistency guarantees.
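
To make the ''mount'' analogy concrete, here is a minimal sketch of how a local leader might decide whether a request path belongs to the home partition or to a mounted remote one. It is an illustration only, not part of the proposal or of the ZooKeeper API: the `MountTable` class, its `mount` and `resolve` methods, the `HOME` label, and the cluster names are all hypothetical, and the longest-prefix rule is an assumption borrowed from how filesystem mounts are usually resolved.

```python
from typing import Dict, Tuple

HOME = "home"  # hypothetical label for the local (home) cluster


def _norm(path: str) -> str:
    """Drop trailing slashes; the root stays '/'."""
    return path.rstrip("/") or "/"


class MountTable:
    """Hypothetical map from local mount points to remote ZK subtrees."""

    def __init__(self) -> None:
        # local mount point -> (remote cluster name, root path in that cluster)
        self._mounts: Dict[str, Tuple[str, str]] = {}

    def mount(self, local_path: str, cluster: str, remote_root: str) -> None:
        local = _norm(local_path)
        if local == "/":
            # Assumption: the home root itself is never replaced by a mount.
            raise ValueError("cannot mount over the home root")
        self._mounts[local] = (cluster, _norm(remote_root))

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (cluster, path inside that cluster) for a client path."""
        path = _norm(path)
        best = None
        for mount_point in self._mounts:
            # Match whole path components only, so /apps/eu does not
            # capture /apps/eu2. The longest matching mount point wins.
            if path == mount_point or path.startswith(mount_point + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            return HOME, path  # intra-cluster request, handled locally
        cluster, remote_root = self._mounts[best]
        suffix = path[len(best):]  # "" or "/rest/of/path"
        if remote_root == "/":
            return cluster, suffix or "/"
        return cluster, remote_root + suffix


table = MountTable()
table.mount("/apps/eu", "zk-eu", "/services")
print(table.resolve("/apps/eu/db/lock"))  # ('zk-eu', '/services/db/lock')
print(table.resolve("/local/cfg"))        # ('home', '/local/cfg')
```

In this sketch only requests that resolve to a cluster other than `HOME` would need inter-cluster coordination; everything else follows the unchanged intra-cluster path.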
=== Background ===
+
+ Namespace partitioning for throughput increase has been partially addressed in ZK before.
+ The observer architecture allows read-only access to part of the namespace. The synchronization
+ with the partition's quorum happens in the background. Since the remote partition is read-only,
+ updates to it can be scheduled at arbitrary times, and consistency is preserved. The downside
+ is restricted access semantics. Our solution can reuse the synchronization building block implemented
+ for observers.
+
+ The design in http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper discusses how to detect
+ the need for partitioning automatically. It does not deal with the question of reconciling multiple partitions
+ (containers). Hence, this discussion is largely orthogonal to our proposal.
Several Distributed Shared Memory (DSM) systems proposed in the 1990s implemented sequential consistency, but hit
performance bottlenecks around false sharing. ZK is a different case, because the namespace is structured as a filesystem,