Posted to user@zookeeper.apache.org by Michael Morello <mi...@gmail.com> on 2012/07/09 14:01:39 UTC

Zookeeper and multiple data centers

Hi all,

I work on a project and I would be happy to have your thoughts about our
requirements and how Zookeeper meets them.

The facts:
* We need to share configuration items between 10 data centers.
Configuration must be synchronized between data centers (actually we can
tolerate a few seconds of inconsistency)
* Configuration items will be serialized in JSON and together they can fit
into 256MB of heap
* The R/W ratio is 90% read and 10% write, and the number of clients should
be low (50 to 100 in each data center)
* A client running in a DC can freely communicate with a host in another DC
* Latency between data centers is 20 to 60 ms
* Only one host (machine) per data center can be dedicated to a ZooKeeper
process: machines are big IBM AIX boxes and only one is dedicated to this
project in each DC
* The project must survive a data center crash

Since the configuration items are small, they must be kept synchronized, and
we need a fail-over mechanism, ZooKeeper appears to be a good candidate, but
I'm not sure how to deploy it, mainly because we can run at most one
ZooKeeper process in each data center.
My idea is to deploy one ZooKeeper server in 5 of the 10 DCs. This way there
are 5 servers spread across the country and we can lose 2 of those DCs (the
remaining 3 still form a majority). Of course all the clients in all the data
centers must know where the 5 ZooKeeper servers are.
Do you see any downside to doing this?
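
To make the idea more concrete, here is a rough sketch of the zoo.cfg I have
in mind; the hostnames are placeholders and the timing values are only a
first guess to leave some headroom for the 20 to 60 ms inter-DC latency:

    # 5-server ensemble, one server per data center (placeholder hostnames).
    tickTime=2000
    # initLimit/syncLimit are counted in ticks; raised above the defaults
    # (10 and 5) to tolerate the WAN latency between data centers.
    initLimit=20
    syncLimit=10
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk-dc1.example.com:2888:3888
    server.2=zk-dc2.example.com:2888:3888
    server.3=zk-dc3.example.com:2888:3888
    server.4=zk-dc4.example.com:2888:3888
    server.5=zk-dc5.example.com:2888:3888

The same file would go on all 5 machines, each one differing only in the myid
file stored in dataDir.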

I know that ZooKeeper has been designed to run on a LAN and on "commodity
hardware", but given the R/W ratio and the latency, do you think it is a good
idea to deploy it this way?
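
On the client side I picture something like the snippet below; the hostnames
and the /config/service-a path are placeholders, and the session timeout is a
guess sized for the WAN latency. As far as I understand, reads are served by
whichever server the client is connected to, while writes have to go through
the leader:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ConfigReader {
        public static void main(String[] args) throws Exception {
            // List every server of the ensemble so a client can fail over to
            // a server in another DC if its local one dies.
            String connect = "zk-dc1.example.com:2181,zk-dc2.example.com:2181,"
                    + "zk-dc3.example.com:2181,zk-dc4.example.com:2181,"
                    + "zk-dc5.example.com:2181";
            // Generous session timeout because of the 20-60 ms inter-DC latency.
            ZooKeeper zk = new ZooKeeper(connect, 30000, new Watcher() {
                public void process(WatchedEvent event) {
                    // Connection state changes and znode watches arrive here.
                    System.out.println("Event: " + event);
                }
            });

            // Read one configuration item and leave a watch so the client is
            // notified when it changes.
            Stat stat = new Stat();
            byte[] json = zk.getData("/config/service-a", true, stat);
            System.out.println("version " + stat.getVersion() + ": "
                    + new String(json, "UTF-8"));
            zk.close();
        }
    }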

Thanks for your comments

Best regards,
Michael

Re: Zookeeper and multiple data centers

Posted by Michael Morello <mi...@gmail.com>.
Hi Jean-Pierre,

Thank you for your answer; here are some additional details:
The configuration will be distributed over a multitude of znodes; when I talk
about 256MB it is because we plan to have hundreds of them. We will keep the
JSON data of each znode as small as possible.
Regarding connectivity, we use a private network and, according to the SLA, a
network availability of 99.99% is expected between data centers. Knowing
this, do you still think that we will run
into SessionExpired and ConnectionLoss issues?
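
To give an idea of how we plan to split the data, each configuration item
would live in its own znode, along the lines of the sketch below (the path
layout and the helper are only illustrative):

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWriter {
        // One JSON configuration item per znode, so every payload stays far
        // below ZooKeeper's default ~1 MB znode limit (jute.maxbuffer).
        static void putItem(ZooKeeper zk, String name, String json)
                throws KeeperException, InterruptedException {
            byte[] data = json.getBytes(StandardCharsets.UTF_8);
            String path = "/config/" + name;  // illustrative layout
            try {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                        CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                // Version -1 means "any version": overwrite unconditionally.
                zk.setData(path, data, -1);
            }
        }
    }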

Best regards,
Michael

2012/7/9 Jean-Pierre Koenig <je...@memonews.com>

> [...]
> But you should beware of large payloads. ZK is not designed to handle
> huge amounts of data [....] I highly recommend no more
> than 1024 KB of payload per znode. The other point you should consider here
> is (network) latency. I guess your ZK clients (your 50) will see a lot of
> SessionExpired or ConnectionLoss exceptions, depending on the
> connectivity of your DCs among one another.

Re: Zookeeper and multiple data centers

Posted by Jean-Pierre Koenig <je...@memonews.com>.
Hi Michael,

As you said, ZK does not require high-end server hardware. Neither the
number of clients nor the size of your ZK quorum is a problem.

But you should beware of large payloads. ZK is not designed to handle
huge amounts of data, and 256MB is much more than huge. Since a write
request must be acknowledged by a majority of the quorum, your data is
shipped through the entire ZK cluster. I highly recommend no more
than 1024 KB of payload per znode. The other point you should consider here
is (network) latency. I guess your ZK clients (your 50) will see a lot of
SessionExpired or ConnectionLoss exceptions, depending on the
connectivity of your DCs among one another.
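
Your client code has to be prepared for both: ConnectionLoss is transient and
the operation can simply be retried, while SessionExpired means the handle is
dead and must be recreated. A rough sketch of the kind of handling I mean
(class and names are only illustrative):

    import java.io.IOException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class SessionAwareClient implements Watcher {
        private final String connectString;
        private final int sessionTimeoutMs;
        private volatile ZooKeeper zk;

        SessionAwareClient(String connectString, int sessionTimeoutMs)
                throws IOException {
            this.connectString = connectString;
            this.sessionTimeoutMs = sessionTimeoutMs;
            this.zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
        }

        @Override
        public void process(WatchedEvent event) {
            switch (event.getState()) {
                case Disconnected:
                    // Transient: the client library keeps trying the other
                    // servers in the connect string; pending calls may fail
                    // with ConnectionLoss and can simply be retried.
                    break;
                case Expired:
                    // The session is gone for good (e.g. a long WAN outage):
                    // a brand new ZooKeeper handle must be created and every
                    // watch re-registered by the application.
                    try {
                        zk = new ZooKeeper(connectString, sessionTimeoutMs, this);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    break;
                default:
                    break;
            }
        }
    }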

Regards, JP


-- 
Jean-Pierre Koenig
Head of Technology

MeMo News AG
Sonnenstr. 4
CH-8280 Kreuzlingen

Tel: +41 71 508 24 86
Fax: +41 71 671 20 26
E-Mail: jean-pierre.koenig@memonews.com

http://www.memonews.com
http://twitter.com/MeMoNewsAG
http://facebook.com/MeMoNewsAG
http://xing.com/companies/MeMoNewsAG