Posted to dev@zookeeper.apache.org by Dhruba Borthakur <dh...@gmail.com> on 2011/07/01 23:49:50 UTC

Re: Discussion on supporting a large number of clients for a zk ensemble

Hi Ben/Camille: can you comment on Vishal's logs/config? The "local session"
idea seems promising to me.

Vishal: it would be nice if you create a JIRA with your proposal and we can
continue discussion in the JIRA?

thanks a bunch,
dhruba

On Mon, May 30, 2011 at 11:15 AM, Vishal Kathuria <vi...@fb.com> wrote:

> Thanks for looking at this Camille and Benjamin,
>
> setup:
> There are 5 machines, 2 hosting clients and 3 hosting servers.
> There is one client process on each of the client machines.
> The client process has 20 threads, each thread with 500 sessions.
> So I have a total of 20K clients, which isn't that high really.
>
> Hardware
> Two proc Intel® Xeon® Processor L5420  (total 8 cores)
> 8G RAM
>
>
> The workload is fairly simple:
> All sessions do is keep a watch on a node. Once the watch fires, the client
> reads the contents of the node and puts the watch again.
> There is one thread that is periodically updating the node being watched
> (once every 30s - so very infrequent)
>
> When the system starts off, things are fine, then a few timers start
> missing and eventually there are lots of expired connections.
>
> The logs are really long, but pretty much repetitive, so I am attaching the
> tail of the logs.
> The client timeout is 300s
>
> JVM Parameters
> -XX:+UseConcMarkSweepGC  -XX:+PrintGCDetails -XX:MaxGCPauseMillis=50
> -Dzookeeper.globalOutstandingLimit=30000 -Xms6000m -Xmx6000m -Xdebug
> -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8180
> I have GC logging turned on. I am not seeing long GC pauses, so I don't
> think that's it.
>
> Next steps I am trying
> 1. Look at the CPU utilization on the server machines
> 2. If the CPU is pegged at 100%, add some additional tracing in the server
> to validate my hypothesis that the session tracker is getting overwhelmed
>
> If you folks have any other suggestions, that would greatly help. I started
> working with zookeeper a couple of weeks ago so it is very likely I might be
> missing something obvious.
>
>
> Thanks!
> Vishal
>
> -----Original Message-----
> From: Benjamin Reed [mailto:breed@apache.org]
> Sent: Sunday, May 29, 2011 8:42 PM
> To: dev@zookeeper.apache.org
> Subject: Re: Discussion on supporting a large number of clients for a zk
> ensemble
>
> i second camille's suggestion. i also know there are other people looking
> into using zookeeper with a large number of clients, so it would be good to
> figure out what are the limits and then how to cross them. i like your
> proposed solutions, but i would rather start down that road after we have
> resolved the issues that we can for the normal clients.
>
> ben
>
> On Fri, May 27, 2011 at 4:23 PM, Fournier, Camille F. [Tech] <
> Camille.Fournier@gs.com> wrote:
> > I would recommend that you spend some time making sure that your guess
> > about the cause is correct before trying to design solutions to the problem.
> > Can you provide us some hard numbers, logs, and configuration information?
> > It's always possible that some aspect of your configuration that you hadn't
> > considered important is in fact the trigger here.
> >
> > Thanks,
> > Camille
> >
> > -----Original Message-----
> > From: Vishal Kathuria [mailto:vishal.kathuria@fb.com]
> > Sent: Friday, May 27, 2011 6:32 PM
> > To: dev@zookeeper.apache.org
> > Subject: Discussion on supporting a large number of clients for a zk
> > ensemble
> >
> > Hi Folks,
> > I wanted to start a discussion on how we can support a large number of
> > clients in zookeeper.  I am at facebook and we are using zookeeper for
> > quite a few projects. There are a couple of projects where we are
> > designing for a large number of clients. The projects are
> >
> >
> > 1.       Building a directory service for holding configuration
> > information (lookup table for which node to go to for a given key).
> >
> > 2.       For HDFS clients, where clients look up the current namenode
> > in zookeeper.
> >
> > This information changes infrequently and is small, so update rate or
> > size of data is not an issue.
> >
> > The key challenge is to support that large a number of clients (30K to
> > start with, but eventually could be 100K).  A big chunk of the clients can
> > try to connect/disconnect at the same time, so a herd effect can happen.
> >
> > I was trying out a 3 node ensemble. I noticed that with about 20K
> > clients, there were quite a few session expirations and disconnects.
> > I looked through the code briefly and since all the pings are eventually
> > handled by the leader, my guess is that the leader thread is not keeping up.
> > I haven't yet done the instrumentation/tracing to validate this.
> >
> > I have been thinking about how to improve this and thought of the
> > following solution. I am trying to hit 2 goals with this.
> >
> > 1.       Make it possible to have a very large number of clients (each
> > client has a watch) without losing connections too often.
> >
> > 2.       Improve how quickly a large number of clients can connect.
> >
> > Solution
> >
> > 1.       The idea is to introduce a new type of session - "local"
> > session. A "local" session doesn't have the full functionality of a normal
> > session.
> >
> > 2.       Local sessions cannot create ephemeral nodes.
> >
> > 3.       Once a local session is lost, you cannot re-establish it using
> > the session-id/password. The session and its watches are gone for good.
> >
> > 4.       When a local session connects, the session info is only
> > maintained on the zookeeper server that it is connected to. The leader is
> > not aware of the creation of such a session and there is no state written to
> > disk.
> >
> > 5.       The pings and expiration are handled by the server that the
> > session is connected to.
> >
> > With the above changes, it should be easy to scale ZK by adding more
> > learners, which manage the "local" sessions independently. Also, the rate at
> > which you can establish "local" sessions would be significantly higher than
> > for normal sessions.
> >
> > Would like to stir up a discussion on whether this is the best way to
> > achieve these goals or if I am missing simpler ways of accomplishing this.
> >
> > Thanks!
> > Vishal
> >
> > .
> >
> >
>



-- 
Connect to me at http://www.facebook.com/dhruba
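
To make the five points of the quoted "local session" proposal concrete, here is a minimal in-memory sketch in Java. It is only an illustration under stated assumptions, not ZooKeeper code and not the prototype mentioned in this thread: the class LocalSessionTable and all of its methods are hypothetical names, time is passed in explicitly as milliseconds, and watches are not modelled at all.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical sketch of a server-local session table. None of these names
 * are ZooKeeper APIs; they only mirror the points of the proposal.
 */
final class LocalSessionTable {
    private final long timeoutMs;
    // Session id -> time of last ping. Held only in this server's memory.
    private final Map<Long, Long> lastSeenMs = new HashMap<>();
    private long nextId = 1;

    LocalSessionTable(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Point 4: creating a session touches neither the leader nor the disk. */
    synchronized long connect(long nowMs) {
        long id = nextId++;
        lastSeenMs.put(id, nowMs);
        return id;
    }

    /**
     * Point 5: pings are handled by the server the session is connected to.
     * Point 3: an expired session is unknown here, so it cannot be revived.
     */
    synchronized boolean ping(long id, long nowMs) {
        if (!lastSeenMs.containsKey(id)) {
            return false;
        }
        lastSeenMs.put(id, nowMs);
        return true;
    }

    /** Point 5: expiration is decided locally. Returns how many sessions expired. */
    synchronized int expire(long nowMs) {
        int expired = 0;
        Iterator<Map.Entry<Long, Long>> it = lastSeenMs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > timeoutMs) {
                it.remove();
                expired++;
            }
        }
        return expired;
    }

    /** Point 2: a local session may create nodes, but never ephemeral ones. */
    static boolean allowsCreate(boolean ephemeral) {
        return !ephemeral;
    }

    public static void main(String[] args) {
        LocalSessionTable table = new LocalSessionTable(300_000); // 300s, as in the test setup
        long a = table.connect(0);
        long b = table.connect(0);
        table.ping(a, 200_000);              // a pings at t=200s, b stays silent
        int expired = table.expire(350_000); // at t=350s only b has been idle for more than 300s
        System.out.println("expired=" + expired
                + " aAlive=" + table.ping(a, 350_000)
                + " bAlive=" + table.ping(b, 350_000));
        // prints: expired=1 aAlive=true bAlive=false
    }
}
```

Because the table lives entirely on one server, adding more learners adds more independent tables, which is the scaling argument made in the proposal. The cost is the one the proposal already names: a session that is dropped is gone, along with its watches.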

RE: Discussion on supporting a large number of clients for a zk ensemble

Posted by Vishal Kathuria <vi...@fb.com>.
Thanks for the suggestion Dhruba.
I will open a Jira and continue the discussion there. I also got a chance to discuss some of the ideas at the zookeeper community meet yesterday.

I have prototyped some of my ideas and I should soon be able to share the performance scenarios and measurements too.

Thanks!
Vishal
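
For context on the numbers in the quoted test setup, the following back-of-envelope calculation may help. It is a sketch under assumptions, not a measurement: the class PingLoadEstimate and its methods are made up for illustration, and the figure of three pings per session timeout is an assumption about idle client behaviour that should be checked against the client version in use. The machine, thread, session, timeout and update-interval values come from the setup described in the thread.

```java
/** Back-of-envelope load estimate for the test setup described in the thread. */
final class PingLoadEstimate {

    /** Total sessions = machines x processes x threads x sessions per thread. */
    static long totalSessions(int clientMachines, int processesPerMachine,
                              int threadsPerProcess, int sessionsPerThread) {
        return (long) clientMachines * processesPerMachine
                * threadsPerProcess * sessionsPerThread;
    }

    /**
     * ASSUMPTION: every idle session sends pingsPerTimeout pings per session
     * timeout. The real interval depends on the client implementation and on
     * the timeout the server actually negotiates.
     */
    static double pingsPerSecond(long sessions, long sessionTimeoutSec, int pingsPerTimeout) {
        return sessions * pingsPerTimeout / (double) sessionTimeoutSec;
    }

    /** Each update fires one watch per session, followed by one read that re-arms it. */
    static double readsPerSecond(long sessions, long updateIntervalSec) {
        return sessions / (double) updateIntervalSec;
    }

    public static void main(String[] args) {
        long sessions = totalSessions(2, 1, 20, 500);   // 2 machines, 1 process, 20 threads, 500 sessions
        System.out.println("sessions = " + sessions);    // prints: sessions = 20000
        System.out.println("pings/s  = " + pingsPerSecond(sessions, 300, 3)); // prints: pings/s  = 200.0
        System.out.printf("reads/s  = %.1f (average, arrives in bursts)%n",
                readsPerSecond(sessions, 30));
    }
}
```

Under these assumptions the steady-state ping rate is modest, which is one reason to confirm the bottleneck with tracing before redesigning sessions. If the server negotiates a shorter timeout than the requested 300s, the ping rate rises in proportion, and the watch-triggered reads arrive as a burst every 30s rather than as a smooth stream.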
