Posted to user@zookeeper.apache.org by Chad Harrington <ch...@datascaler.com> on 2009/03/04 02:30:43 UTC

How large an ensemble can one build with Zookeeper?

Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
Are there limitations that make the system unusable at large numbers of
servers?

Thanks,

-- 
Chad Harrington
CEO
DataScaler, Inc.
charrington@datascaler.com
201A Ravendale Dr.
Mountain View, CA  94043
Phone: 650-515-3437
Fax: 650-887-1544

Re: How large an ensemble can one build with Zookeeper?

Posted by David Pollak <fe...@gmail.com>.
JD,
When I last looked at HBase (about a year ago), the performance was lacking.
 Have there been material improvements in HBase's performance in the last
year?

Thanks,

David

PS -- If this is not the correct list for such questions, I pre-apologize.
 Just whack me with a 2x4 and I'll take the discussion off the ZooKeeper
list.

On Wed, Mar 4, 2009 at 6:02 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> David,
>
> This is exactly what we are doing in the HBase project (www.hbase.org).
> Zookeeper is currently being integrated for our next major version and some
> parts are already in place.
>
> Regards,
>
> J-D



-- 
Lift, the simply functional web framework http://liftweb.net
Beginning Scala http://www.apress.com/book/view/1430219890
Follow me: http://twitter.com/dpp
Git some: http://github.com/dpp

Re: How large an ensemble can one build with Zookeeper?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
David,

This is exactly what we are doing in the HBase project (www.hbase.org).
Zookeeper is currently being integrated for our next major version and some
parts are already in place.

Regards,

J-D

On Wed, Mar 4, 2009 at 9:00 AM, David Pollak
<fe...@gmail.com>wrote:

> I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-) )
> as the coordination mechanism for Bigtable.  Do you have any insight into
> Chubby's performance characteristics... and whether it would be possible to
> build a Bigtable clone that had the scalability characteristics of Bigtable
> with ZooKeeper as the underlying coordinator?

Re: How large an ensemble can one build with Zookeeper?

Posted by Ted Dunning <te...@gmail.com>.
Chubby and ZooKeeper take very different approaches to similar
purposes.  Chubby is a locking service, while ZooKeeper is all about
avoiding locks.  ZooKeeper is better described as a coordination service.

Regarding performance, I am pretty sure that ZooKeeper could keep up with
some pretty enormous clusters quite easily.  I would expect that the
performance of the underlying file system is more likely to be the critical
performance issue.

On Wed, Mar 4, 2009 at 6:00 AM, David Pollak <fe...@gmail.com> wrote:

> I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-) )
> as the coordination mechanism for Bigtable.  Do you have any insight into
> Chubby's performance characteristics... and whether it would be possible to
> build a Bigtable clone that had the scalability characteristics of Bigtable
> with ZooKeeper as the underlying coordinator?

Re: How large an ensemble can one build with Zookeeper?

Posted by David Pollak <fe...@gmail.com>.
On Tue, Mar 3, 2009 at 9:33 PM, Ted Dunning <te...@gmail.com> wrote:

> ZooKeeper is not really what you would call a scalable system, because all
> transactions that are updates go through the leader for serialization.
> ZooKeeper is, instead, a high-throughput HA system. That said, the
> throughput of a modest ZooKeeper cluster is fairly prodigious, so for the
> normal application of coordinating a large cluster, these limits are beyond
> what just about anyone needs.
>
> For other uses, though, 50K updates per second wouldn't cut it.


I understand that Google uses Chubby (a ZooKeeper clone... or vice versa :-) )
as the coordination mechanism for Bigtable.  Do you have any insight into
Chubby's performance characteristics... and whether it would be possible to
build a Bigtable clone that had the scalability characteristics of Bigtable
with ZooKeeper as the underlying coordinator?



Re: How large an ensemble can one build with Zookeeper?

Posted by Ted Dunning <te...@gmail.com>.
ZooKeeper is not really what you would call a scalable system, because
all transactions that are updates go through the leader for
serialization. ZooKeeper is, instead, a high-throughput HA system.
That said, the throughput of a modest ZooKeeper cluster is fairly
prodigious, so for the normal application of coordinating a large
cluster, these limits are beyond what just about anyone needs.

For other uses, though, 50K updates per second wouldn't cut it.



On Mar 3, 2009, at 17:30, Chad Harrington <ch...@datascaler.com> wrote:

> Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
> an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
> Are there limitations that make the system unusable at large numbers of
> servers?

Re: How large an ensemble can one build with Zookeeper?

Posted by Benjamin Reed <br...@yahoo-inc.com>.
I realize this discussion is over, but I did want to make one quick
clarification. When we talk about ensembles, we are talking about the
servers that make up the ZooKeeper service. We refer to the servers that
use the ZooKeeper service as clients. We have systems here that use
ensembles of five servers to provide ZooKeeper service to thousands of
client servers without problem.

ben

Chad Harrington wrote:
> Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
> an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
> Are there limitations that make the system unusable at large numbers of
> servers?
>


Re: How large an ensemble can one build with Zookeeper?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.
Hi Chad,
The maximum number of ZooKeeper servers we have tested with is 13. Even
with 13, performance starts to degrade very quickly (compared to ensembles
of 5 and 7). I am not sure we have current numbers (we have since made 3x or
so performance improvements), but the old numbers are in zookeeper.pdf at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations

The slide is at the end.

You can see that performance drops with 13 servers. We usually suggest 5
or 7 servers for ZooKeeper. We can get around 20K-30K writes per second and
more than 50K reads per second from an ensemble of 5 servers (as of now, with
the performance enhancements). With 5 servers you can tolerate the failure
of 2 nodes.
Please take a look at the ZooKeeper presentations at
http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperPresentations
to find out more about ZooKeeper.
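
The "5 servers tolerate 2 failures" figure follows from majority-quorum
arithmetic. The sketch below is only an illustration, assuming ZooKeeper's
default majority quorum; the function names are hypothetical and are not part
of any ZooKeeper API. It also hints at why larger ensembles write more slowly:
the number of servers that must acknowledge each update grows with the
ensemble.

```python
# Illustrative only: majority-quorum arithmetic for an ensemble of N servers.
# Assumes the default majority quorum; these helper names are hypothetical.

def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 5, 6, 7, 13):
        print(f"{n:>2} servers: quorum of {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that an even-sized ensemble tolerates no more failures than the next
smaller odd size (6 servers tolerate 2, the same as 5), which is one reason
odd sizes such as 5 or 7 are the usual suggestion.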

What is the rationale behind having such a large number of ZooKeeper servers?

Thanks
mahadev


On 3/3/09 5:30 PM, "Chad Harrington" <ch...@datascaler.com> wrote:

> Clearly Zookeeper can handle ensembles of a dozen or so servers.  How large
> an ensemble can one build with Zookeeper?  100 servers?  10,000 servers?
> Are there limitations that make the system unusable at large numbers of
> servers?