Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/15 21:27:13 UTC

[Hadoop Wiki] Update of "ZooKeeper/MountRemoteZookeeper" by ebortnik

The "ZooKeeper/MountRemoteZookeeper" page has been changed by ebortnik:
http://wiki.apache.org/hadoop/ZooKeeper/MountRemoteZookeeper?action=diff&rev1=10&rev2=11

  The performance of read and write accesses to the home partition remains unaffected in most cases (see the details below), whereas accesses
  to remote partitions might suffer from higher latency and lower throughput.
  
- Unlike the design proposed in http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper that addresses the same problem, we do not view the solution as
- partitioning a single namespace. Instead, our design provides a way for incorporating a part of one ZK namespace in 
+ Our design provides a way for incorporating a part of one ZK namespace in 
  another ZK namespace, similar to ''mount'' in Linux. Each ZK cluster has its own leader, and intra-cluster requests are handled
  exactly as they are in ZK today. Inter-cluster requests are coordinated by the local leader. We strive to minimize the inter-cluster communication
  required to achieve the consistency guarantees.
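
The mount analogy above can be illustrated with a minimal sketch of how a local leader might decide whether a path is served by the home cluster or by a remote one. This is not taken from the proposal: the `MountTable` and `Route` names, the cluster identifiers, and the longest-prefix rule are all assumptions made for illustration, and the sketch covers only path resolution, not the inter-cluster coordination protocol.

```python
from typing import Dict, NamedTuple, Optional


class Route(NamedTuple):
    """Where a request should be served (hypothetical type)."""
    cluster: str  # identifier of the ZK cluster that owns the path
    path: str     # the path as seen inside that cluster's namespace


class MountTable:
    """Hypothetical mount table consulted by the local leader."""

    def __init__(self, home_cluster: str) -> None:
        self.home_cluster = home_cluster
        self._mounts: Dict[str, Route] = {}

    def mount(self, local_prefix: str, remote_cluster: str, remote_prefix: str) -> None:
        # Normalize trailing slashes so prefix matching is unambiguous.
        key = local_prefix.rstrip("/")
        self._mounts[key] = Route(remote_cluster, remote_prefix.rstrip("/"))

    def resolve(self, path: str) -> Route:
        # Pick the longest mounted prefix that covers the path (assumed rule).
        best: Optional[str] = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            # No mount point covers the path: it belongs to the home partition.
            return Route(self.home_cluster, path)
        target = self._mounts[best]
        # Rewrite the local prefix into the remote namespace.
        return Route(target.cluster, (target.path + path[len(best):]) or "/")
```

For example, after `table.mount("/apps/remote", "clusterB", "/shared")`, a request for `/apps/remote/cfg` resolves to `/shared/cfg` on `clusterB`, while `/apps/local/x` stays on the home cluster and follows the unmodified intra-cluster path.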
  
  === Background ===
+ 
+ Namespace partitioning to increase throughput has been partially addressed in ZK before.
+ The observer architecture allows read-only access to part of the namespace. The synchronization
+ with the partition's quorum happens in the background. Since the remote partition is read-only,
+ updates to it can be scheduled at arbitrary times, and consistency is preserved. The downside
+ is restricted access semantics. Our solution can reuse the synchronization building block implemented
+ for observers. 
+ 
+ The design in http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper discusses how to detect
+ the need for partitioning automatically. It does not deal with the question of reconciling multiple partitions
+ (containers). Hence, this discussion is largely orthogonal to our proposal. 
  
  Several Distributed Shared Memory (DSM) systems proposed in the 1990s implemented sequential consistency, but hit
  performance bottlenecks around false sharing. ZK is a different case, because the namespace is structured as a filesystem,