Posted to dev@zookeeper.apache.org by "Hari A V (JIRA)" <ji...@apache.org> on 2011/06/13 12:49:51 UTC

[jira] [Commented] (ZOOKEEPER-646) Namespace partitioning in ZK

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048510#comment-13048510 ] 

Hari A V commented on ZOOKEEPER-646:
------------------------------------

Hi Kay, 

I am looking forward to doing a prototype on this. I would be very interested to know the practical use cases for a partitioned ZooKeeper that you have in mind. As I understand it, the high-level problem it tries to solve is write throughput scalability, i.e. when we add more ZooKeeper nodes, we should be able to get more "write throughput". 

From "https://cwiki.apache.org/ZOOKEEPER/partitionedzookeeper.html":
 "By having distinct ensembles handling different portions of the state, we end up relaxing the ordering guarantees" 
How different is that from directly running separate ensembles? One could just as well run different ZooKeeper clusters to achieve this, right? Does the solution also address running multiple namespaces in, say, an existing 3-node ZooKeeper cluster?

I can think of something like this - 
Currently, write operations from all clients are processed sequentially by the leader. The suggestion is to provide a way to do parallel writes for unrelated data in the same ensemble. For example, in a cluster setup the same ZK ensemble may be used by HBase for its metadata and by other components for cluster configuration management. We don't need to queue these operations and perform them sequentially; they can go in parallel. But all HBase operations may still need to stay sequential to keep the order of operations.
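
To make the ordering I have in mind concrete, here is a rough client-side sketch (the PartitionedWriter class and its createAsync method are only hypothetical, for illustration; the real change would of course live inside the leader's request pipeline). Writes that share a root znode go through one single-threaded queue, so their relative order is preserved, while writes to different roots can proceed in parallel:

  // Hypothetical sketch, not an existing ZooKeeper API: per-root write queues.
  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class PartitionedWriter {
      private final ZooKeeper zk;
      // one single-threaded executor per root znode keeps per-partition ordering
      private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

      public PartitionedWriter(ZooKeeper zk) {
          this.zk = zk;
      }

      private static String rootOf(String path) {
          int slash = path.indexOf('/', 1);
          return slash < 0 ? path : path.substring(0, slash);  // "/hbase/x" -> "/hbase"
      }

      public void createAsync(String path, byte[] data) {
          ExecutorService queue = queues.computeIfAbsent(
                  rootOf(path), r -> Executors.newSingleThreadExecutor());
          queue.submit(() -> {
              try {
                  // writes within one root stay ordered; different roots run in parallel
                  zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
              } catch (Exception e) {
                  e.printStackTrace();
              }
          });
      }
  }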

Here (http://ria101.wordpress.com/2010/05/12/locking-and-transactions-over-cassandra-using-cages/) I found another idea: hash-based partitioning for ZooKeeper.
"The solution we suggest is simply to run more than one ZooKeeper cluster for the purposes of locking and transactions, and simply to hash locks and transactions onto particular clusters". 
There they address locks. I am thinking of performing the hash on the "root znodes" themselves (or introducing a partition name) and performing operations in parallel in the ZK server (in most scenarios, znodes like "/conf" and "/leaders" are unrelated). It is more like running multiple partitions in the same ensemble, effectively making writes parallel on the leader of an ensemble. I still need to think more about the transaction log and snapshotting aspects and how they would be affected.
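
Roughly, the routing could look like the following hypothetical HashPartitionRouter (just a sketch under my own assumptions: one ZooKeeper handle per ensemble or logical partition; none of this is an existing API):

  // Hypothetical sketch: route each operation by hashing the root of its path.
  import java.util.List;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class HashPartitionRouter {
      private final List<ZooKeeper> partitions;  // one handle per ensemble/partition

      public HashPartitionRouter(List<ZooKeeper> partitions) {
          this.partitions = partitions;
      }

      private ZooKeeper pick(String path) {
          int slash = path.indexOf('/', 1);
          String root = slash < 0 ? path : path.substring(0, slash);  // e.g. "/conf"
          int index = Math.floorMod(root.hashCode(), partitions.size());
          return partitions.get(index);
      }

      public String create(String path, byte[] data) throws Exception {
          // ordering is only guaranteed among paths whose roots hash to the same partition
          return pick(path).create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      }
  }

With something like this, ordering is only guaranteed among paths whose roots hash to the same partition, which is exactly the relaxed guarantee the wiki page mentions.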

I would be glad to hear from you guys.  

- Hari

> Namespace partitioning in ZK 
> -----------------------------
>
>                 Key: ZOOKEEPER-646
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-646
>             Project: ZooKeeper
>          Issue Type: New Feature
>            Reporter: Kay Kay
>
> Tracking JIRA for namespace partitioning in ZK 
> From the mailing list (- courtesy: Mahadev / Flavio ) , discussion during Jan 2010 - 
> "Hi, Mahadev said it all, we have been thinking about it for a while, but
> >> haven't had time to work on it. I also don't think we have a jira open for
> >> it; at least I couldn't find one. But, we did put together some comments:
> >>
> >>    http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper
> >>
> >> One of the main issues we have observed there is that partitioning will
> >> force us to change our consistency guarantees, which is far from ideal.
> >> However, some users seem to be ok with it, but I'm not sure we have
> >> agreement.
> >>
> >> In any case, please feel free to contribute or simply express your
> >> interests so that we can take them into account.
> >>
> >> Thanks,
> >> -Flavio
> >>
> >>
> >> On Jan 15, 2010, at 12:49 AM, Mahadev Konar wrote:
> >>
> > >>> Hi kay,
> > >>>  the namespace partitioning in zookeeper has been on a back burner for a
> > >>> long time. There isnt any jira open on it. There had been some
> > >>> discussions
> > >>> on this but no real work. Flavio/Ben have had this on there minds for a
> > >>> while but no real work/proposal is out yet.
> > >>>
> > >>> May I know is this something you are looking for in production?
> > >>>
> > >>> Thanks
> > >>> mahadev
> "

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira