Posted to user@zookeeper.apache.org by Avinash Lakshman <av...@gmail.com> on 2010/04/19 07:00:09 UTC

Recovery protocol

Hi All

Let's say I have a ZooKeeper cluster with nodes A, B, C, D and E, and
A is the leader. Suppose that after a few writes have taken place, node B
crashes. When it comes back up, what is the recovery protocol? Does it
rejoin the cluster immediately and start serving writes and reads while it
is still catching up on the writes it missed while it was down?

Cheers
Avinash

Re: Recovery issue - how to debug?

Posted by Vishal K <vi...@gmail.com>.

Hi Hao,

How are you determining whether a ZK server has received the writes or not?

Regards,
-Vishal

On Mon, Apr 19, 2010 at 1:54 AM, Dr Hao He <he...@softtouchit.com> wrote:

> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>
> I stopped C and did some writes on E1.  Both A and B received the writes.
>  I then started C and after a short while, C also received the writes.
>
> All seem to go well so I replicated the setup to another cluster E2 with
> exactly 3 nodes: A2, B2, and C2.
>
> I stopped C2 and did some writes on E2.  A2 received the writes.  I then
> started C2.  However, no matter how long I wait, C2 never received the
> writes.
>
> I then did more writes on E2.  Then C2 can receive all the writes including
> the old writes when it was down.
>
> How do I find out what was wrong withe E2 setup?
>
> I am running 3.2.2 on all nodes.
>
> Regards,
>
> Dr Hao He
>
> XPE - the truly SOA platform
>
> he@softtouchit.com
> http://softtouchit.com
>
>
>

Re: Recovery issue - how to debug?

Posted by Dr Hao He <he...@softtouchit.com>.

Hi all,

Thanks, folks.  It turned out that ZooKeeper did send the messages to all nodes.  The issue was not caused by ZooKeeper.

Regards,

Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com



On 20/04/2010, at 7:15 AM, Ted Dunning wrote:

> Can you attach the screen shot to the JIRA issue?  The mailing list strips
> these things.
> 
> On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford
> <tr...@gmail.com> wrote:
> 
>> Filed:
>> 
>>   https://issues.apache.org/jira/browse/ZOOKEEPER-744
>> 
>> Attached is a screenshot of some JMX output in Ganglia - its currently
>> implemented using a -javaagent tool I happened to find. Having a
>> simple non-java way to fetch monitoring stats and publish to an
>> external monitoring system would be awesome, and probably reusable by
>> others.
>>

Re: Recovery issue - how to debug?

Posted by Travis Crawford <tr...@gmail.com>.

On Mon, Apr 19, 2010 at 2:15 PM, Ted Dunning <te...@gmail.com> wrote:
> Can you attach the screen shot to the JIRA issue?  The mailing list strips
> these things.

Oops. Updated the JIRA:

https://issues.apache.org/jira/browse/ZOOKEEPER-744

--travis


>
> On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford
> <tr...@gmail.com> wrote:
>
>> Filed:
>>
>>    https://issues.apache.org/jira/browse/ZOOKEEPER-744
>>
>> Attached is a screenshot of some JMX output in Ganglia - its currently
>> implemented using a -javaagent tool I happened to find. Having a
>> simple non-java way to fetch monitoring stats and publish to an
>> external monitoring system would be awesome, and probably reusable by
>> others.
>>
>

Re: Recovery issue - how to debug?

Posted by Ted Dunning <te...@gmail.com>.

Can you attach the screen shot to the JIRA issue?  The mailing list strips
these things.

On Mon, Apr 19, 2010 at 1:18 PM, Travis Crawford
<tr...@gmail.com> wrote:

> Filed:
>
>    https://issues.apache.org/jira/browse/ZOOKEEPER-744
>
> Attached is a screenshot of some JMX output in Ganglia - its currently
> implemented using a -javaagent tool I happened to find. Having a
> simple non-java way to fetch monitoring stats and publish to an
> external monitoring system would be awesome, and probably reusable by
> others.
>

Re: Recovery issue - how to debug?

Posted by Travis Crawford <tr...@gmail.com>.

On Mon, Apr 19, 2010 at 12:10 PM, Patrick Hunt <ph...@apache.org> wrote:
>
> On 04/19/2010 11:55 AM, Travis Crawford wrote:
>>
>> To double-check, is the best way to tell a ZK instance is up-to-date
>> by looking at its ``LastZxid`` value? For example:
>>
>> $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081
>>
>> org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTree
>> LastZxid
>> 04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad
>>
>> I believe the ``LastZxid`` for each ZK instance needs to be compared
>> to the leader to see how far behind it is.
>
> Well the server will only be "active" once it joins the quorum (usually as a
> follower) so if it's having trouble joining that data might not be
> available. But yes, once the server is active then you could examine the
> lastzxid to determine if/howmuch it's lagging the leader (quorum).
>
>>
>>
>> It would be a lot easier from the operations perspective if the leader
>> explicitly published some health stats:
>>
>> (a) Count of instances in the ensemble.
>> (b) Count of up-to-date instances in the ensemble.
>>
>> This would greatly simplify monitoring & alerting - when an instance
>> falls behind one could configure their monitoring system to let
>> someone know and take a look at the logs.
>
> That's a great idea. Please enter a JIRA for this - a new 4 letter word and
> JMX support. It would also be a great starter project for someone interested
> in becoming more familiar with the server code.

Filed:

    https://issues.apache.org/jira/browse/ZOOKEEPER-744

Attached is a screenshot of some JMX output in Ganglia - it's currently
implemented using a -javaagent tool I happened to find. Having a
simple non-Java way to fetch monitoring stats and publish them to an
external monitoring system would be awesome, and probably reusable by
others.

--travis


>
> Patrick
>
>
>>
>> --travis
>>
>>
>>
>>
>> On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt<ph...@apache.org>  wrote:
>>>
>>> Usually the server logs will shed light on such issues. If we had access
>>> to
>>> them it might be easier to speculate.
>>>
>>> Patrick
>>>
>>> On 04/19/2010 09:22 AM, Mahadev Konar wrote:
>>>>
>>>> Hi Hao,
>>>>   As Vishal already asked, how are you determining if the writes are
>>>> being
>>>> received?
>>>>  Also, what was the status of C2 when you checked for these writes? Do
>>>> you
>>>> have the output of echo "stat" | nc localhost port?
>>>>
>>>> How long did you wait when you say that C2 did not received the writes?
>>>> What
>>>> was the status of C2 (again echo "stat" | nc localhost port) when you
>>>> saw
>>>> the C2 had received the writes?
>>>>
>>>> Thanks
>>>> mahadev
>>>>
>>>>
>>>> On 4/18/10 10:54 PM, "Dr Hao He"<he...@softtouchit.com>    wrote:
>>>>
>>>>> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>>>>>
>>>>> I stopped C and did some writes on E1.  Both A and B received the
>>>>> writes.
>>>>>  I
>>>>> then started C and after a short while, C also received the writes.
>>>>>
>>>>> All seem to go well so I replicated the setup to another cluster E2
>>>>> with
>>>>> exactly 3 nodes: A2, B2, and C2.
>>>>>
>>>>> I stopped C2 and did some writes on E2.  A2 received the writes.  I
>>>>> then
>>>>> started C2.  However, no matter how long I wait, C2 never received the
>>>>> writes.
>>>>>
>>>>> I then did more writes on E2.  Then C2 can receive all the writes
>>>>> including
>>>>> the old writes when it was down.
>>>>>
>>>>> How do I find out what was wrong withe E2 setup?
>>>>>
>>>>> I am running 3.2.2 on all nodes.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Dr Hao He
>>>>>
>>>>> XPE - the truly SOA platform
>>>>>
>>>>> he@softtouchit.com
>>>>> http://softtouchit.com
>>>>>
>>>>>
>>>>
>>>
>

Re: Recovery issue - how to debug?

Posted by Patrick Hunt <ph...@apache.org>.

On 04/19/2010 11:55 AM, Travis Crawford wrote:
> To double-check, is the best way to tell a ZK instance is up-to-date
> by looking at its ``LastZxid`` value? For example:
>
> $ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081
> org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTree
> LastZxid
> 04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad
>
> I believe the ``LastZxid`` for each ZK instance needs to be compared
> to the leader to see how far behind it is.

Well, the server will only be "active" once it joins the quorum (usually
as a follower), so if it's having trouble joining, that data might not be
available. But yes, once the server is active you could examine the
LastZxid to determine whether, and by how much, it's lagging the leader
(quorum).

>
>
> It would be a lot easier from the operations perspective if the leader
> explicitly published some health stats:
>
> (a) Count of instances in the ensemble.
> (b) Count of up-to-date instances in the ensemble.
>
> This would greatly simplify monitoring & alerting - when an instance
> falls behind one could configure their monitoring system to let
> someone know and take a look at the logs.

That's a great idea. Please enter a JIRA for this - a new four-letter word
command and JMX support. It would also be a great starter project for
someone interested in becoming more familiar with the server code.

Patrick


>
> --travis
>
>
>
>
> On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt<ph...@apache.org>  wrote:
>> Usually the server logs will shed light on such issues. If we had access to
>> them it might be easier to speculate.
>>
>> Patrick
>>
>> On 04/19/2010 09:22 AM, Mahadev Konar wrote:
>>>
>>> Hi Hao,
>>>    As Vishal already asked, how are you determining if the writes are being
>>> received?
>>>   Also, what was the status of C2 when you checked for these writes? Do you
>>> have the output of echo "stat" | nc localhost port?
>>>
>>> How long did you wait when you say that C2 did not received the writes?
>>> What
>>> was the status of C2 (again echo "stat" | nc localhost port) when you saw
>>> the C2 had received the writes?
>>>
>>> Thanks
>>> mahadev
>>>
>>>
>>> On 4/18/10 10:54 PM, "Dr Hao He"<he...@softtouchit.com>    wrote:
>>>
>>>> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>>>>
>>>> I stopped C and did some writes on E1.  Both A and B received the writes.
>>>>   I
>>>> then started C and after a short while, C also received the writes.
>>>>
>>>> All seem to go well so I replicated the setup to another cluster E2 with
>>>> exactly 3 nodes: A2, B2, and C2.
>>>>
>>>> I stopped C2 and did some writes on E2.  A2 received the writes.  I then
>>>> started C2.  However, no matter how long I wait, C2 never received the
>>>> writes.
>>>>
>>>> I then did more writes on E2.  Then C2 can receive all the writes
>>>> including
>>>> the old writes when it was down.
>>>>
>>>> How do I find out what was wrong withe E2 setup?
>>>>
>>>> I am running 3.2.2 on all nodes.
>>>>
>>>> Regards,
>>>>
>>>> Dr Hao He
>>>>
>>>> XPE - the truly SOA platform
>>>>
>>>> he@softtouchit.com
>>>> http://softtouchit.com
>>>>
>>>>
>>>
>>

Re: Recovery issue - how to debug?

Posted by Ted Dunning <te...@gmail.com>.

Travis, have you seen the ruok command?

It should be pretty easy to add other stats to that.

On Mon, Apr 19, 2010 at 11:55 AM, Travis Crawford
<tr...@gmail.com> wrote:

> It would be a lot easier from the operations perspective if the leader
> explicitly published some health stats:
>
> (a) Count of instances in the ensemble.
> (b) Count of up-to-date instances in the ensemble.
>

Re: Recovery issue - how to debug?

Posted by Travis Crawford <tr...@gmail.com>.

To double-check, is the best way to tell whether a ZK instance is up to
date to look at its ``LastZxid`` value? For example:

$ java -jar /home/travis/cmdline-jmxclient-0.10.5.jar - localhost:8081 \
    org.apache.ZooKeeperService:name0=ReplicatedServer_id1,name1=replica.1,name2=Follower,name3=InMemoryDataTree \
    LastZxid
04/19/2010 18:42:45 +0000 org.archive.jmx.Client LastZxid: 0xf000420ad

I believe the ``LastZxid`` for each ZK instance needs to be compared
to the leader to see how far behind it is.
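
A minimal sketch of that comparison, in Python. It assumes the usual
zxid layout (a 64-bit number whose high 32 bits are the leader epoch and
whose low 32 bits are a counter within that epoch). The function names are
illustrative only; they are not part of ZooKeeper or of the JMX client.

```python
def parse_zxid(text):
    """Parse a zxid printed as hex text, e.g. '0xf000420ad'."""
    return int(text, 16)


def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter).

    Assumption: high 32 bits = epoch, low 32 bits = counter.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


def lag(leader_zxid, follower_zxid):
    """Return how many transactions the follower is behind the leader.

    Returns None when the two zxids are from different epochs, because
    the counters are then not directly comparable.
    """
    leader_epoch, leader_counter = split_zxid(leader_zxid)
    follower_epoch, follower_counter = split_zxid(follower_zxid)
    if leader_epoch != follower_epoch:
        return None
    return leader_counter - follower_counter
```

For the value shown above, split_zxid(parse_zxid("0xf000420ad")) yields
epoch 15 and counter 270509. A follower that reports a different epoch
from the leader's is better treated as "not yet synced" than given a
numeric lag.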


It would be a lot easier from the operations perspective if the leader
explicitly published some health stats:

(a) Count of instances in the ensemble.
(b) Count of up-to-date instances in the ensemble.

This would greatly simplify monitoring & alerting - when an instance
falls behind one could configure their monitoring system to let
someone know and take a look at the logs.
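
Until something like this exists in the server, the two counts could be
approximated on the monitoring side. The sketch below is only an
illustration in Python: it assumes the zxid of every instance has already
been collected (for example over JMX) into a dict, and the names
ensemble_health and max_lag are hypothetical.

```python
def ensemble_health(zxids, leader, max_lag=0):
    """Summarise an ensemble from per-server zxids.

    zxids   -- dict mapping server name to its last zxid (int)
    leader  -- name of the server currently acting as leader
    max_lag -- how far behind the leader a server may be and still
               count as up to date (hypothetical tuning knob)
    """
    leader_zxid = zxids[leader]
    # zxids are ordered numerically, so a plain difference is enough here.
    current = {name for name, zxid in zxids.items()
               if leader_zxid - zxid <= max_lag}
    return {
        "instances": len(zxids),            # (a) count of instances
        "up_to_date": len(current),         # (b) count of up-to-date instances
        "lagging": sorted(set(zxids) - current),
    }
```

An alert could then fire whenever "up_to_date" is smaller than
"instances", with "lagging" naming the servers whose logs to check.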

--travis




On Mon, Apr 19, 2010 at 10:14 AM, Patrick Hunt <ph...@apache.org> wrote:
> Usually the server logs will shed light on such issues. If we had access to
> them it might be easier to speculate.
>
> Patrick
>
> On 04/19/2010 09:22 AM, Mahadev Konar wrote:
>>
>> Hi Hao,
>>   As Vishal already asked, how are you determining if the writes are being
>> received?
>>  Also, what was the status of C2 when you checked for these writes? Do you
>> have the output of echo "stat" | nc localhost port?
>>
>> How long did you wait when you say that C2 did not received the writes?
>> What
>> was the status of C2 (again echo "stat" | nc localhost port) when you saw
>> the C2 had received the writes?
>>
>> Thanks
>> mahadev
>>
>>
>> On 4/18/10 10:54 PM, "Dr Hao He"<he...@softtouchit.com>  wrote:
>>
>>> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>>>
>>> I stopped C and did some writes on E1.  Both A and B received the writes.
>>>  I
>>> then started C and after a short while, C also received the writes.
>>>
>>> All seem to go well so I replicated the setup to another cluster E2 with
>>> exactly 3 nodes: A2, B2, and C2.
>>>
>>> I stopped C2 and did some writes on E2.  A2 received the writes.  I then
>>> started C2.  However, no matter how long I wait, C2 never received the
>>> writes.
>>>
>>> I then did more writes on E2.  Then C2 can receive all the writes
>>> including
>>> the old writes when it was down.
>>>
>>> How do I find out what was wrong withe E2 setup?
>>>
>>> I am running 3.2.2 on all nodes.
>>>
>>> Regards,
>>>
>>> Dr Hao He
>>>
>>> XPE - the truly SOA platform
>>>
>>> he@softtouchit.com
>>> http://softtouchit.com
>>>
>>>
>>
>

Re: Recovery issue - how to debug?

Posted by Patrick Hunt <ph...@apache.org>.

Usually the server logs will shed light on such issues. If we had access 
to them it might be easier to speculate.

Patrick

On 04/19/2010 09:22 AM, Mahadev Konar wrote:
> Hi Hao,
>    As Vishal already asked, how are you determining if the writes are being
> received?
>   Also, what was the status of C2 when you checked for these writes? Do you
> have the output of echo "stat" | nc localhost port?
>
> How long did you wait when you say that C2 did not received the writes? What
> was the status of C2 (again echo "stat" | nc localhost port) when you saw
> the C2 had received the writes?
>
> Thanks
> mahadev
>
>
> On 4/18/10 10:54 PM, "Dr Hao He"<he...@softtouchit.com>  wrote:
>
>> I have zookeeper cluster E1 with 3 nodes A,B, and C.
>>
>> I stopped C and did some writes on E1.  Both A and B received the writes.  I
>> then started C and after a short while, C also received the writes.
>>
>> All seem to go well so I replicated the setup to another cluster E2 with
>> exactly 3 nodes: A2, B2, and C2.
>>
>> I stopped C2 and did some writes on E2.  A2 received the writes.  I then
>> started C2.  However, no matter how long I wait, C2 never received the writes.
>>
>> I then did more writes on E2.  Then C2 can receive all the writes including
>> the old writes when it was down.
>>
>> How do I find out what was wrong withe E2 setup?
>>
>> I am running 3.2.2 on all nodes.
>>
>> Regards,
>>
>> Dr Hao He
>>
>> XPE - the truly SOA platform
>>
>> he@softtouchit.com
>> http://softtouchit.com
>>
>>
>

Re: Recovery issue - how to debug?

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

Hi Hao,
  As Vishal already asked, how are you determining if the writes are being
received? 
 Also, what was the status of C2 when you checked for these writes? Do you
have the output of echo "stat" | nc localhost port?

How long did you wait when you say that C2 did not receive the writes? What
was the status of C2 (again, echo "stat" | nc localhost port) when you saw
that C2 had received the writes?
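
For scripted checks, the same "stat" query can be sent from Python instead
of nc. This is a rough sketch, not an official client: the host, port and
function names are assumptions, and it only relies on the "Mode:" and
"Zxid:" lines that the stat output normally contains. Newer ZooKeeper
releases may require the command to be enabled in the server configuration.

```python
import socket


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four-letter command such as 'stat' and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(text):
    """Extract the 'Mode' and 'Zxid' fields from stat output, if present."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Mode", "Zxid"):
            info[key.strip()] = value.strip()
    return info


def server_status(host="localhost", port=2181):
    """Query one server; port 2181 is only the conventional default."""
    return parse_stat(four_letter_word(host, port, "stat"))
```

Running server_status() against A2, B2 and C2 and comparing the Mode and
Zxid values gives the information asked for above. An empty result means
the reply contained neither field, which usually deserves a look at the
server log.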

Thanks
mahadev

On 4/18/10 10:54 PM, "Dr Hao He" <he...@softtouchit.com> wrote:

> I have zookeeper cluster E1 with 3 nodes A,B, and C.
> 
> I stopped C and did some writes on E1.  Both A and B received the writes.  I
> then started C and after a short while, C also received the writes.
> 
> All seem to go well so I replicated the setup to another cluster E2 with
> exactly 3 nodes: A2, B2, and C2.
> 
> I stopped C2 and did some writes on E2.  A2 received the writes.  I then
> started C2.  However, no matter how long I wait, C2 never received the writes.
> 
> I then did more writes on E2.  Then C2 can receive all the writes including
> the old writes when it was down.
> 
> How do I find out what was wrong withe E2 setup?
> 
> I am running 3.2.2 on all nodes.
> 
> Regards,
> 
> Dr Hao He
> 
> XPE - the truly SOA platform
> 
> he@softtouchit.com
> http://softtouchit.com
> 
>

Recovery issue - how to debug?

Posted by Dr Hao He <he...@softtouchit.com>.

I have a ZooKeeper cluster E1 with 3 nodes: A, B, and C.

I stopped C and did some writes on E1.  Both A and B received the writes.  I then started C and, after a short while, C also received the writes.

All seemed to go well, so I replicated the setup to another cluster E2 with exactly 3 nodes: A2, B2, and C2.

I stopped C2 and did some writes on E2.  A2 received the writes.  I then started C2.  However, no matter how long I waited, C2 never received the writes.

I then did more writes on E2.  C2 then received all the writes, including the old ones made while it was down.

How do I find out what was wrong with the E2 setup?

I am running 3.2.2 on all nodes.

Regards,

Dr Hao He

XPE - the truly SOA platform

he@softtouchit.com
http://softtouchit.com

Re: Recovery protocol

Posted by Ted Dunning <te...@gmail.com>.

No.  It catches up first.  Then it joins in normal operations.

On Sun, Apr 18, 2010 at 10:00 PM, Avinash Lakshman
<avinash.lakshman@gmail.com> wrote:

> Hi All
>
> Let's say I have a Zookeeper cluster with nodes A, B, C, D and E.  Let's
> assume A is the leader. Now let us assume after a few writes have taken
> place the node B crashes. When it comes back up what is the recovery
> protocol? Does it join the cluster immediately and start taking writes and
> reads while it is catching up on writes that it lost when it was down?
>
> Cheers
> Avinash
>