Posted to user@hbase.apache.org by Tao Xie <xi...@gmail.com> on 2010/09/14 05:09:08 UTC

how about zookeeper overhead?

I see the following recommendation in
http://hbase.apache.org/docs/r0.20.6/api/overview-summary.html#requirements

"It is recommended to run a ZooKeeper quorum of 3, 5 or 7 machines, and give
each ZooKeeper server around 1GB of RAM, and if possible, its own dedicated
disk. For very heavily loaded clusters, run ZooKeeper servers on separate
machines from the Region Servers (DataNodes and TaskTrackers)."

Now my configuration is
1 master + NN
1 client (doing heavy put & get)
6 RS+DN+ZK.

If I start only one zk on the master node, I see throughput for put
operations increase. I want to know the correct way to configure zk
and, if I have only one zk, what the impact on put and get performance
is. Can zk become a bottleneck? I heard someone say that read
performance will be negatively affected. I haven't tested it yet.

Thanks.

RE: how about zookeeper overhead?

Posted by "Buttler, David" <bu...@llnl.gov>.
I think the standard advice is to use only one zk node for clusters of size < 10, and to collocate it with the namenode.  So, I would suggest changing your config to:
1 master + NN + ZK
1 client (doing heavy put & get)
6 RS+DN.

The reason you want to have an odd number of zk nodes is that zookeeper uses a quorum protocol that requires a majority of the configured nodes to be operational (i.e. more than half: floor(n/2) + 1).  So if you have 6 zk nodes you will need to have 4 operational at any one time or the cluster goes down.  If you have 5 zk nodes, you will only need to have 3 available -- giving you an opportunity to perform maintenance on one and still be resilient to a single failure.  In general, the more zk nodes you have the faster reads will be (as they can be distributed among any of the nodes), and the slower writes will be (as a majority of the nodes must acknowledge each write before it completes).
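To make the majority arithmetic concrete, here is a minimal sketch in Python. The function names quorum_size and tolerated_failures are made up for illustration and are not part of any zookeeper or hbase API; the only assumption is the strict-majority rule described above.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print the trade-off for ensemble sizes 1 through 7.
    for n in range(1, 8):
        print(f"{n} zk nodes: need {quorum_size(n)} up, "
              f"can lose {tolerated_failures(n)}")
```

Under this rule, 5 and 6 nodes both tolerate the loss of 2 servers, so the sixth node adds write cost without adding fault tolerance. A single node tolerates no failures at all.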

That being said, your cluster is so small that you don't have to worry so much about fault tolerance.  A single zk node will be much better.

Dave

