Posted to user@zookeeper.apache.org by Narasimha Tadepalli <Na...@pervasive.com> on 2012/05/15 19:29:01 UTC

Distribution Problems With Multiple Zookeeper Clients

Dear All

We have a situation where messages are not distributed equally when we have multiple clients listening to one ZooKeeper cluster. Say we have 30 client instances listening to one cluster, and 1000 messages are submitted to the cluster in 30 minutes; I would expect each client to receive roughly 33 messages. But out of the 30, only 10 client instances take the maximum load, and the rest get a very low volume of messages. Is this something that can be configured in ZooKeeper settings, or do we need to implement a custom solution at our end to distribute the load equally? Before I reinvent the wheel, I am looking for suggestions in case any of you have faced a similar situation.

Thanks
Narasimha


RE: Distribution Problems With Multiple Zookeeper Clients

Posted by Narasimha Tadepalli <Na...@pervasive.com>.
Hi Camille

Your assumption is totally right. When I verified again, the clients are getting more events based on the order in which they registered with the server; the clients that registered later get fewer notifications. As you suggested, I will check where I can control that behavior, either on the server side or the client side.

Thanks
Narasimha


Re: Distribution Problems With Multiple Zookeeper Clients

Posted by Camille Fournier <ca...@apache.org>.
Again, if you want your clients to perform equal work, you need to balance
when they will take jobs with how many jobs they are currently processing.
If instance1 is doing 100 jobs and it shouldn't be, then there must be a
case when instance1 is running one job and getting the lock to run another,
but instance20 (say) is not running anything. If you want to balance
better, you need to change the way you race to grab the lock to do a job.

I never suggested you were locking a job that wasn't ready to process. But
your clients are locking a job when they are already busy, and this means
that the early clients are doing more work than you want them to. Here's a
pseudoalgorithm that would fix this:
client
  when(watch notification)
    if(my # jobs in flight == 0)
      try to grab lock immediately
    else
      wait(# jobs in flight * 100ms * random)
      try to grab lock

Now, if client 1 gets a watch notification but it already has a job in
flight, it's going to sleep a bit before it tries to grab the lock. This
will give the later clients a chance to get the lock first.
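
In Java, that pseudoalgorithm might look roughly like the sketch below.
This is only an illustration: tryGrabLock(), processJob() and releaseLock()
are placeholders for whatever lock recipe and job-handling code you already
have, and the 100ms unit is the same arbitrary constant as above.

import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

class BackoffWorker implements Watcher {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger jobsInFlight = new AtomicInteger(0);
    private final Random random = new Random();

    @Override
    public void process(WatchedEvent event) {
        // (Watch re-registration omitted for brevity.)
        int inFlight = jobsInFlight.get();
        if (inFlight == 0) {
            scheduler.execute(this::raceForLock);        // idle: race immediately
        } else {
            long delayMs = (long) (inFlight * 100 * random.nextDouble());
            scheduler.schedule(this::raceForLock, delayMs, TimeUnit.MILLISECONDS);
        }
    }

    private void raceForLock() {
        if (tryGrabLock()) {                             // placeholder: your lock recipe
            jobsInFlight.incrementAndGet();
            try {
                processJob();                            // placeholder: your job handling
            } finally {
                jobsInFlight.decrementAndGet();
                releaseLock();                           // placeholder
            }
        }
    }

    private boolean tryGrabLock() { return false; }      // placeholder
    private void processJob() {}                         // placeholder
    private void releaseLock() {}                        // placeholder
}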

A better way to do this is to have a bounded queue of threads to process
locking and work, but I can't write you a pseudoalgorithm for that and I
suspect it would be a bit beyond what you really need.

C


RE: Distribution Problems With Multiple Zookeeper Clients

Posted by Narasimha Tadepalli <Na...@pervasive.com>.
Actually, we lock a job before accepting new jobs; none of the workers will lock a job that is not yet ready to process. Let me ask you this in relation to your second response, where you expressed some good assumptions.

The stats below give you a rough estimate of what exactly is going on.

ZooKeeper Client                Total number of jobs processed in the last two hours

Client Instance1 ------------------>         100
Client Instance2 ------------------>         90
Client Instance3 ------------------>         80
Client Instance4 ------------------>         70
Client Instance5 ------------------>         60
Client Instance6 ------------------>         50
Client Instance7 ------------------>         40
Client Instance8 ------------------>         30


All these instances were started 24 hours ago in different time slots, but the data I presented here covers the last two hours. Your assumption was that Client Instance1 registered with the server first, and that is why it always wins the race to receive the event notification first; that turned out to be right after verifying the facts. But my problem is how to force each of these clients to perform equally, or approximately equally. I.e., all worker instances should be able to process about 65 jobs in two hours (all 8 workers processed 520 jobs, which divided by 8 is 65). As I mentioned, it doesn't have to be exactly 65, but it shouldn't be 30 or 100 either. I hope you can understand my situation clearly now. BTW, in reality we launch between 50 and 100 workers in a day.

Thanks
Narasimha






Re: Distribution Problems With Multiple Zookeeper Clients

Posted by Camille Fournier <ca...@apache.org>.
If your code is doing the following:
client gets watch notification
client immediately tries to grab lock
client then puts job in queue to process

That's not going to work.

You need to do
client gets watch notification
client puts lock grab in queue with work that is being processed
when queue has bandwidth, try to grab lock and process job

The grabbing of the lock to do work and the queue of threads available to
do work need to be coupled, otherwise you are grabbing work you don't have
capacity to do.
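
A rough Java sketch of that coupling is below; the pool and queue sizes
are arbitrary, and tryGrabLock()/processJob()/releaseLock() again stand
in for your existing lock recipe and job code.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedLockWorker {
    // At most 4 jobs run at once and at most 4 lock attempts can wait;
    // anything beyond that is discarded, leaving that work for less-busy clients.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(4),
            new ThreadPoolExecutor.DiscardPolicy());

    void onWatchNotification() {
        pool.execute(() -> {
            if (tryGrabLock()) {          // placeholder: your lock recipe
                try {
                    processJob();         // placeholder: your job handling
                } finally {
                    releaseLock();        // placeholder
                }
            }
        });
    }

    private boolean tryGrabLock() { return false; }   // placeholder
    private void processJob() {}                      // placeholder
    private void releaseLock() {}                     // placeholder
}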

You can also hack this by
client gets watch notification
client does a random sleep or a sleep based on amount of work currently on
this machine, then tries to grab lock
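
For what it's worth, the "grab lock" step itself can be as simple as a
non-blocking ephemeral-znode create. The sketch below uses a made-up
/locks/<jobId> path; a real deployment would more likely use a full lock
recipe (e.g. Curator's).

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class JobLock {
    // Non-blocking "try to grab the lock": only one session can create the
    // ephemeral node, and it disappears automatically if that session dies.
    static boolean tryGrabLock(ZooKeeper zk, String jobId)
            throws KeeperException, InterruptedException {
        try {
            zk.create("/locks/" + jobId, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                                  // we own this job
        } catch (KeeperException.NodeExistsException e) {
            return false;                                 // someone else grabbed it first
        }
    }
}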

C


RE: Distribution Problems With Multiple Zookeeper Clients

Posted by Narasimha Tadepalli <Na...@pervasive.com>.
No, actually the server keeps accumulating a lot of jobs in the queue that are not picked up by any of these idle worker instances. Those jobs wait until the other workers finish the jobs they are currently processing. Where exactly are you suggesting I put the sleeps to prevent the watchers from receiving further events? As long as the ZooKeeper session is active, I haven't found any way to stop these watchers from receiving events. Please advise me if there is a way to control this.

Thanks
Narasimha



Re: Understanding Load on Zookeeper Box

Posted by Matthew Ward <ma...@pixelpipe.com>.
Sorry, dyslexic moment: that should read ZooKeeper 3.4.3.



Re: Understanding Load on Zookeeper Box

Posted by Patrick Hunt <ph...@apache.org>.
On Thu, May 24, 2012 at 3:42 PM, Matthew Ward <ma...@pixelpipe.com> wrote:
> I have a couple theories and questions I was hoping to clear up (all java based 3.3.4):
> 1) I have been trying to troubleshoot the reason for high system wait time on one of our zookeeper instances. The theory I have is that setting watches increases the system wait load. Does this theory sound accurate?

The two most common causes of high latency are GC/swapping and high
disk utilization on the transaction log (WAL). Check for that first.

Have you seen this page?
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Troubleshooting

Given you mention AWS in q2, that might also be related. Remember that
you're not accessing the disk(s) directly, so disk issues are even more
likely; the main issue being that we need to fsync the txnlog before
responding to a proposal. (I often use strace on the fsync/fdatasync
calls to track/graph this.)

> 2) Question 2 is a follow up to the first... whenever I do a watch and wait for the event, I have an 'insurance policy' (since AWS is fun...) of setting a mutex with a timeout, before retrying the operation and potentially setting another watch. How does zookeeper handle duplicate watches? Am I exacerbating the system wait load issue by setting duplicate watches? If there a way I should cancel the watch?

A particular session can establish only a single watch on a particular
path. Multiple watches have no negative effect (other than a
round-trip read to the server of course).
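
Concretely, the wait-with-timeout pattern from q2 is safe to write along
the lines of the sketch below (the path and timeout are placeholders);
re-setting the watch after a timeout only costs that extra read.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class WatchAndWait {
    // Blocks until something happens on `path`, re-arming the watch each time
    // the wait times out; the session holds at most one server-side watch per
    // path, so re-setting it is cheap.
    static void waitForEvent(ZooKeeper zk, String path, long timeoutMs)
            throws KeeperException, InterruptedException {
        while (true) {
            final CountDownLatch fired = new CountDownLatch(1);
            zk.exists(path, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    fired.countDown();
                }
            });
            if (fired.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return;              // got a notification
            }
            // Timed out: loop, re-check state and set the watch again
            // (the "insurance policy").
        }
    }
}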

Patrick

Understanding Load on Zookeeper Box

Posted by Matthew Ward <ma...@pixelpipe.com>.
I have a couple theories and questions I was hoping to clear up (all java based 3.3.4):
1) I have been trying to troubleshoot the reason for high system wait time on one of our zookeeper instances. The theory I have is that setting watches increases the system wait load. Does this theory sound accurate?
2) Question 2 is a follow-up to the first... whenever I do a watch and wait for the event, I have an 'insurance policy' (since AWS is fun...) of setting a mutex with a timeout, before retrying the operation and potentially setting another watch. How does ZooKeeper handle duplicate watches? Am I exacerbating the system wait load issue by setting duplicate watches? Is there a way I should cancel the watch?

Thanks For the Insight,
Matt


Re: Distribution Problems With Multiple Zookeeper Clients

Posted by Camille Fournier <ca...@apache.org>.
You can put random sleeps in after you get a notification before you try to
grab the lock, or sleeps based on the active job count, to favor workers
with no or few jobs in flight. It seems to me that if you have limited the
jobs able to be processed on a worker by limiting your thread pool
appropriately, and if you still aren't hitting all 30 servers, maybe you
don't need 30 servers to be doing these jobs? Is that possible?

C



RE: Distribution Problems With Multiple Zookeeper Clients

Posted by Narasimha Tadepalli <Na...@pervasive.com>.
Hi Camille

I tried to control the job load at the ZooKeeper clients by minimizing the number of jobs to process, but had no luck forcing the other idle workers to pick up the events. I am wondering if there is any way I can force the watcher to stop receiving events, or force the ZooKeeper connection to time out without calling the .close() method so that it retries connecting to the server, which would move the rest of the client instances to the top of the priority for receiving events. Appreciate your help again.

Thanks
Narasimha




Re: Distribution Problems With Multiple Zookeeper Clients

Posted by Camille Fournier <ca...@apache.org>.
The below is written assuming that all clients are seeing all events, but
then they race to get a lock of some sort to do the work, and the same 10
are always getting the lock to do the work. If in fact not all of your
clients are even getting all the events, that's another problem.


So here's what I think happens, although other devs who know this code
better may prove me wrong. When a client connects to a server and creates
a watch for a particular path, that member of the ZK quorum adds the watch
for that path to a WatchManager. The WatchManager internally keeps a
HashSet that contains the watches for that path. When an event happens on
that path, the server iterates through the watchers on that path and sends
them the watch notification.
It's quite possible that if your events are infrequent and/or your client
servers aren't that loaded, the first few clients that registered that
watch on each quorum member are likely to receive and process the watch
first, because their notifications were sent first, and they will also
always reset the watch for that path first if your code is written to
reset the watch immediately upon receiving the notification.
They always win the race, and thus always do all the work.
In general, the indication is that you have more clients than you need to
do the work you want to do. If in fact you don't, perhaps the right thing
to do is to investigate how you are handing off work and responding to
watch notifications within your client. I.e., if a client that is already
doing some work gets a watch notification, it may not want to race for the
lock. You may want to schedule trying to get the lock and then process
work in a limited thread pool, so that you know there is a limit of N
tasks that can be in flight on each client, and thus bound the max load on
each server.
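
For concreteness, the "reset the watch immediately" pattern being described
typically looks something like this minimal sketch; /data/345 is the path
from this thread, and processChildren() is a placeholder for whatever
lock-grabbing and work your clients actually do.

import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

class ChildWatcher implements Watcher {
    private static final String PATH = "/data/345";
    private final ZooKeeper zk;

    ChildWatcher(ZooKeeper zk) { this.zk = zk; }

    void start() throws KeeperException, InterruptedException {
        zk.getChildren(PATH, this);                       // initial watch registration
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            try {
                // Re-reading with this watcher immediately re-arms the watch, which
                // is what lets the fastest clients keep winning the race described above.
                List<String> children = zk.getChildren(PATH, this);
                processChildren(children);                // placeholder: lock grab + work
            } catch (KeeperException | InterruptedException e) {
                // log / handle in real code
            }
        }
    }

    private void processChildren(List<String> children) {}   // placeholder
}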

Does this make sense?

C


RE: Distribution Problems With Multiple Zookeeper Clients

Posted by Narasimha Tadepalli <Na...@pervasive.com>.
Hi Camille

Sorry for the confusion. Yes, it is watches. We have multiple clients configured to watch for changes on the server side. For example, we have a data directory of /data/345/text. All 30 clients keep watching for changes under the /data/345 directory; if there is any change, the clients need to process and read the child nodes. In this situation, not all clients get an equal number of events. I am looking for a way to distribute the load equally across all client instances. I hope I have provided enough clarification now; if not, let me know.

Thanks
Narasimha



Re: Distribution Problems With Multiple Zookeeper Clients

Posted by Camille Fournier <ca...@apache.org>.
I'm not sure what you mean by messages. Are you talking about watches? Can
you describe your clients in more detail?

Thanks,
Camille
