
Posted to user@zookeeper.apache.org by Irfan Hamid <ih...@salesforce.com> on 2016/06/22 20:18:46 UTC

Simulate expired connection for testing

Hi,

I'm testing some client code against a ZK cluster, and since it's local
testing I can use a single ZK instance as well if need be. I'm trying to
simulate an expired connection so I can validate my reconnect logic. I've
tried the following:

1. Single ZK server: connect from the client, then kill -9 zkpid (this results
in a disconnect). Even if I wait a long time before restarting the ZK server,
the client simply reconnects.
2. Multi-server quorum: connect from the client, then kill -9 zkpid (the
server the client is connected to). The client is disconnected and then
connects to a different quorum server; once there is no majority left
active (1 out of 3), it can't reconnect.

Is there an easy way for me to simulate an end-to-end scenario outside of
unit-tests that lets me see the behavior of my reconnect logic?

Thanks,
Irfan.
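For testing reconnect logic, the key distinction is that a dropped connection (Disconnected) and an expired session (Expired) are different events. After Disconnected, the client library keeps retrying with the same session. The Expired notification arrives only once the client reaches the ensemble again after the session has been expired there. Killing servers, as in the experiments above, produces Disconnected events. Whether an Expired event ever follows depends on the ensemble expiring the session and the client reconnecting.

Below is a minimal sketch of reconnect logic that keeps these two cases apart. It rests on some assumptions: the KeeperState enum here only mirrors the names of ZooKeeper's Watcher.Event.KeeperState values, and SessionFactory and ReconnectHandler are hypothetical names. SessionFactory stands in for creating a new ZooKeeper handle, which lets fake events drive the logic in a test.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in that mirrors the names of ZooKeeper's
// Watcher.Event.KeeperState values; this is NOT the real enum.
enum KeeperState { SyncConnected, Disconnected, Expired }

// Hypothetical seam: real code would create a new ZooKeeper handle
// (new ZooKeeper(connectString, sessionTimeout, watcher)) and return its session id.
interface SessionFactory {
    long newSession();
}

final class ReconnectHandler {
    private final SessionFactory factory;
    private final List<String> log = new ArrayList<>();
    private long sessionId;
    private boolean connected;

    ReconnectHandler(SessionFactory factory) {
        this.factory = factory;
        this.sessionId = factory.newSession();
    }

    // Called for every connection-state change the client reports.
    void onEvent(KeeperState state) {
        switch (state) {
            case SyncConnected:
                connected = true;
                log.add("connected 0x" + Long.toHexString(sessionId));
                break;
            case Disconnected:
                // The session may still be alive: the client library retries
                // other servers with the same session id, so keep the handle.
                connected = false;
                log.add("disconnected, waiting");
                break;
            case Expired:
                // The old handle is unusable and its ephemeral nodes and watches
                // are gone, so open a fresh session. A real application would
                // re-create its ephemerals and watches at this point.
                connected = false;
                sessionId = factory.newSession();
                log.add("expired, new session 0x" + Long.toHexString(sessionId));
                break;
        }
    }

    long sessionId() { return sessionId; }
    boolean isConnected() { return connected; }
    List<String> log() { return log; }
}
```

If you feed this handler only Disconnected events, it never reaches the new-session branch. To exercise that branch, a test needs either a real expiration or an injected one.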

Re: election algorithm for the observer

Posted by Ryan Zhang <ya...@hotmail.com>.

Thanks, Flavio. I’ve created a JIRA ticket: https://issues.apache.org/jira/browse/ZOOKEEPER-2461

> On Jun 28, 2016, at 2:26 AM, Flavio Junqueira <fp...@apache.org> wrote:
> 
> Maybe it is better if we walk through an example in which you think we can have a better observer-specific implementation. Feel free to start a jira so that we can discuss it.
> 
> -Flavio

Re: election algorithm for the observer

Posted by Flavio Junqueira <fp...@apache.org>.

Maybe it is better if we walk through an example in which you think we can have a better observer-specific implementation. Feel free to start a jira so that we can discuss it.

-Flavio

> On 27 Jun 2016, at 23:02, Ryan Zhang <ya...@hotmail.com> wrote:
> 
> Hi, our (Twitter) ZooKeeper cluster (based on 3.4.x) recently suffered prolonged leader-election downtime because the leader machine was accidentally wiped. After looking at the logs, we noticed that the observer kept trying to connect to the wrong leader for a long time. The specific bug of taking too long to connect to the leader has been fixed in trunk. However, I wonder why the observer should accept a leader sid when the notifications it gets all come from quorum machines in the “LOOKING” state. The election algorithm (lookForLeader) seems to be the same for observers and participants. Is that on purpose? Would it be good to have observer-specific logic that only acts on notifications from “LEADING” and “FOLLOWING” machines? Thanks.

election algorithm for the observer

Posted by Ryan Zhang <ya...@hotmail.com>.

Hi, our (Twitter) ZooKeeper cluster (based on 3.4.x) recently suffered prolonged leader-election downtime because the leader machine was accidentally wiped. After looking at the logs, we noticed that the observer kept trying to connect to the wrong leader for a long time. The specific bug of taking too long to connect to the leader has been fixed in trunk. However, I wonder why the observer should accept a leader sid when the notifications it gets all come from quorum machines in the “LOOKING” state. The election algorithm (lookForLeader) seems to be the same for observers and participants. Is that on purpose? Would it be good to have observer-specific logic that only acts on notifications from “LEADING” and “FOLLOWING” machines? Thanks.
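Ryan proposes that an observer act only on votes from peers that are already LEADING or FOLLOWING, and ignore votes from LOOKING peers. A hypothetical sketch of that rule follows. It is not ZooKeeper's FastLeaderElection code: PeerState, Notification and ObserverLeaderFilter are illustrative stand-ins. The sketch accepts a leader only when a quorum of settled peers agrees on it, which is one possible acceptance rule and not something the thread settled on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins; ZooKeeper's real election types look different.
enum PeerState { LOOKING, FOLLOWING, LEADING, OBSERVING }

final class Notification {
    final long senderSid;
    final long proposedLeader;
    final PeerState senderState;

    Notification(long senderSid, long proposedLeader, PeerState senderState) {
        this.senderSid = senderSid;
        this.proposedLeader = proposedLeader;
        this.senderState = senderState;
    }
}

final class ObserverLeaderFilter {
    // Returns the leader an observer should try to join, or empty to keep waiting.
    // LOOKING (and OBSERVING) senders are ignored; only the latest vote of each
    // settled peer counts, and a leader is accepted once quorumSize peers agree.
    static Optional<Long> chooseLeader(List<Notification> notifications, int quorumSize) {
        Map<Long, Long> latestVote = new HashMap<>(); // sender sid -> proposed leader
        for (Notification n : notifications) {
            if (n.senderState == PeerState.LEADING || n.senderState == PeerState.FOLLOWING) {
                latestVote.put(n.senderSid, n.proposedLeader);
            }
        }
        Map<Long, Integer> support = new HashMap<>(); // leader -> number of settled votes
        for (long leader : latestVote.values()) {
            if (support.merge(leader, 1, Integer::sum) >= quorumSize) {
                return Optional.of(leader);
            }
        }
        return Optional.empty();
    }
}
```

In the incident described, every notification came from a LOOKING peer. Under this rule the observer would keep waiting instead of chasing a stale leader. The cost is that an observer cannot pick a leader until the voting members have actually settled.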

Re: Simulate expired connection for testing

Posted by Irfan Hamid <ih...@salesforce.com>.

Thanks Jordan and Guy. I'll look into both those options.

Regards,
Irfan.

On Thu, Jun 23, 2016 at 8:40 AM, Guy Laden <gu...@gmail.com> wrote:

> Hi Irfan,
> Does the ZooKeeper cluster that you want to run your clients against enable
> you to connect to the servers via JMX?
> If yes then you could retrieve the list of clients connected to the server
> (via JMX) and then invoke terminateConnection or terminateSession on the
> connection of interest to you.
>
> https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/server/ConnectionMXBean.java
> So, not exactly session expiration but might be of use.
> Regards,
> Guy

Re: Simulate expired connection for testing

Posted by Guy Laden <gu...@gmail.com>.

Hi Irfan,
Does the ZooKeeper cluster that you want to run your clients against enable
you to connect to the servers via JMX?
If yes then you could retrieve the list of clients connected to the server
(via JMX) and then invoke terminateConnection or terminateSession on the
connection of interest to you.
https://github.com/apache/zookeeper/blob/branch-3.4/src/java/main/org/apache/zookeeper/server/ConnectionMXBean.java
So, not exactly session expiration but might be of use.
Regards,
Guy
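Guy's JMX route can be sketched with only the JDK's JMX remote API. The sketch makes several assumptions. First, the server JVM was started with remote JMX enabled, for example -Dcom.sun.management.jmxremote.port=9999 with authentication and SSL turned off for a local test. Second, ZooKeeper's connection beans live in the org.apache.ZooKeeperService domain, and their key properties contain a "Connections" element followed by the client IP and then "0x" plus the session id in hex. Check the real names in jconsole before relying on this. The class and method names are hypothetical. The operation names terminateConnection and terminateSession are the ones Guy cites from ConnectionMXBean. terminateConnection exercises the disconnect/reconnect path. terminateSession is the closer of the two to an expiration, though, as Guy notes, it is not exactly the same.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class ZkConnectionKiller {
    // Standard RMI connector URL for a JVM started with
    // -Dcom.sun.management.jmxremote.port=<port>.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // ASSUMED naming: ...,<key>=Connections,<key>=<client ip>,<key>=0x<session id>.
    // Works on the ordered key-property string, so it does not depend on how
    // deep the "Connections" element sits (standalone vs. quorum servers).
    static boolean isConnectionKeyList(String keys, String sessionHex) {
        boolean isConnection = keys.contains("=Connections,");
        return isConnection && (sessionHex == null || keys.endsWith("=" + sessionHex));
    }

    static List<ObjectName> findConnections(MBeanServerConnection conn, String sessionHex) throws Exception {
        List<ObjectName> matches = new ArrayList<>();
        for (ObjectName name : conn.queryNames(new ObjectName("org.apache.ZooKeeperService:*"), null)) {
            if (isConnectionKeyList(name.getKeyPropertyListString(), sessionHex)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // operation: "terminateConnection" or "terminateSession" (no arguments).
    // sessionHex (e.g. a hypothetical "0x1a2b") narrows to one client; null hits all.
    static int terminate(String host, int port, String sessionHex, String operation) throws Exception {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            List<ObjectName> targets = findConnections(conn, sessionHex);
            for (ObjectName name : targets) {
                conn.invoke(name, operation, new Object[0], new String[0]);
            }
            return targets.size();
        }
    }
}
```

A typical call is ZkConnectionKiller.terminate("localhost", 9999, null, "terminateSession"). Because this runs outside the client process, it also covers the end-to-end case where the test cannot reach the client's ZooKeeper object.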

On Thu, Jun 23, 2016 at 5:02 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> This is the old way to do it:
>
>
> https://github.com/apache/curator/blob/master/curator-test/src/main/java/org/apache/curator/test/KillSession.java
>
> -Jordan

Re: Simulate expired connection for testing

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

This is the old way to do it:

https://github.com/apache/curator/blob/master/curator-test/src/main/java/org/apache/curator/test/KillSession.java

-Jordan
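On 3.4.x, the ZooKeeper FAQ describes a trick for forcing an expiration in tests, and it is presumably what the linked helper automates. First, take the first handle's session id and password (getSessionId() and getSessionPasswd()). Then open a second ZooKeeper handle by passing them to the constructor that accepts them. Finally, close the second handle. Both handles share one session, so the close ends it and the original handle receives Expired. The technique still needs the client object's id and password.

The toy model below shows only this mechanism. ToySessionTable, ToyHandle and KillSessionSketch are hypothetical stand-ins, not ZooKeeper or Curator classes. Also, the real session password is a byte[], not a long.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy stand-in for the ensemble's session table; NOT ZooKeeper code.
final class ToySessionTable {
    private final Map<Long, Long> passwords = new HashMap<>(); // session id -> password
    private final Random random = new Random();
    private long nextId = 1;

    // Server side of "connect without a session": allocate an id and a secret.
    long[] create() {
        long id = nextId++;
        long password = random.nextLong();
        passwords.put(id, password);
        return new long[] {id, password};
    }

    // Server side of "connect with an existing id and password".
    boolean isValid(long id, long password) {
        Long expected = passwords.get(id);
        return expected != null && expected == password;
    }

    void close(long id) { passwords.remove(id); }
}

final class ToyHandle {
    final ToySessionTable server;
    final long sessionId;
    final long sessionPasswd;

    // Like new ZooKeeper(connectString, timeout, watcher): a fresh session.
    ToyHandle(ToySessionTable server) {
        long[] s = server.create();
        this.server = server;
        this.sessionId = s[0];
        this.sessionPasswd = s[1];
    }

    // Like the constructor that also takes sessionId and sessionPasswd.
    ToyHandle(ToySessionTable server, long sessionId, long sessionPasswd) {
        if (!server.isValid(sessionId, sessionPasswd)) {
            throw new IllegalStateException("session expired");
        }
        this.server = server;
        this.sessionId = sessionId;
        this.sessionPasswd = sessionPasswd;
    }

    // Closing a handle ends the session it is attached to.
    void close() { server.close(sessionId); }

    // What this handle observes the next time it talks to the server.
    String state() {
        return server.isValid(sessionId, sessionPasswd) ? "SyncConnected" : "Expired";
    }
}

final class KillSessionSketch {
    // Attach a second handle to the victim's session, then close it:
    // the close ends the shared session, so the victim sees Expired.
    static void kill(ToyHandle victim) {
        ToyHandle twin = new ToyHandle(victim.server, victim.sessionId, victim.sessionPasswd);
        twin.close();
    }
}
```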

> On Jun 23, 2016, at 8:46 AM, Irfan Hamid <ih...@salesforce.com> wrote:
> 
> Thanks Jordan. I'm currently using 3.4.6, so I was hoping for a solution
> there. Secondly, the solution you've described seems suited only to
> unit testing, since it requires access to the ZooKeeper client object. Or am
> I missing something, and is there a way to inject a session expiration into a
> session other than the one the client object owns?
> 
> Thanks,
> Irfan.

Re: Simulate expired connection for testing

Posted by Irfan Hamid <ih...@salesforce.com>.

Thanks Jordan. I'm currently using 3.4.6, so I was hoping for a solution
there. Secondly, the solution you've described seems suited only to
unit testing, since it requires access to the ZooKeeper client object. Or am
I missing something, and is there a way to inject a session expiration into a
session other than the one the client object owns?

Thanks,
Irfan.


On Wed, Jun 22, 2016 at 1:21 PM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> In ZK 3.5.x there is a method for this:
>
>         client.getTestable().injectSessionExpiration();
>
> -JZ

Re: Simulate expired connection for testing

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

In ZK 3.5.x there is a method for this:

	client.getTestable().injectSessionExpiration();

-JZ

> On Jun 22, 2016, at 3:18 PM, Irfan Hamid <ih...@salesforce.com> wrote:
> 
> Hi,
> 
> I'm testing some client code against a ZK cluster, and since it's local
> testing I can use a single ZK instance as well if need be. I'm trying to
> simulate an expired connection so I can validate my reconnect logic. I've
> tried the following:
> 
> 1. Single ZK server: connect from the client, then kill -9 zkpid (this results
> in a disconnect). Even if I wait a long time before restarting the ZK server,
> the client simply reconnects.
> 2. Multi-server quorum: connect from the client, then kill -9 zkpid (the
> server the client is connected to). The client is disconnected and then
> connects to a different quorum server; once there is no majority left
> active (1 out of 3), it can't reconnect.
> 
> Is there an easy way for me to simulate an end-to-end scenario outside of
> unit-tests that lets me see the behavior of my reconnect logic?
> 
> Thanks,
> Irfan.