Posted to user@zookeeper.apache.org by Deepak Jagtap <de...@maxta.com> on 2014/02/26 04:04:59 UTC

New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Hi,

I am replacing one of the ZooKeeper servers in a 3-node quorum.
Initially all ZooKeeper servers were running version 3.5.0.1515976.
I successfully replaced Node3 with the newer version 3.5.0.1551730.
When I try to replace Node2 with the same ZooKeeper version, the
ZooKeeper server on Node2 does not start: it stays stuck in a
leader election loop, printing the following messages:

2014-02-26 02:45:23,709 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
2014-02-26 02:45:23,710 [myid:3] - INFO  [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
2014-02-26 02:45:23,712 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
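
The "Have smaller server identifier" line is logged at INFO level and appears to describe a connection tie-break rather than a failure on its own: when a server opens an election connection to a peer with a larger id, it drops that socket and waits for the larger-id peer to connect back. Below is a minimal sketch of that rule, assuming the tuple in the log is (remote id, local id), as the [myid:3] prefix suggests. The class and method names are hypothetical; this is not the QuorumCnxManager source.

```java
// Illustrative sketch only; names are hypothetical, not ZooKeeper's API.
final class ConnectionTieBreak {

    // Assumed rule: the initiating server keeps the connection only when its
    // own id is larger than the remote id. Otherwise it closes the socket and
    // relies on the remote (larger-id) server to open the connection instead.
    static boolean initiatorKeepsConnection(long localSid, long remoteSid) {
        return localSid > remoteSid;
    }

    // Builds a message in the same shape as the log line in this thread,
    // with the tuple ordered as (remote id, local id).
    static String describe(long localSid, long remoteSid) {
        if (initiatorKeepsConnection(localSid, remoteSid)) {
            return "Keeping the connection: (" + remoteSid + ", " + localSid + ")";
        }
        return "Have smaller server identifier, so dropping the connection: ("
                + remoteSid + ", " + localSid + ")";
    }

    public static void main(String[] args) {
        // Server 3 dialing server 5: server 3 has the smaller id and drops.
        System.out.println(describe(3, 5));
        // Server 5 dialing server 3: server 5 keeps the connection.
        System.out.println(describe(5, 3));
    }
}
```

Under this reading, the message is expected whenever a lower-id server dials a higher-id one, so it does not by itself explain why the node stays in LOOKING.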


The network connections and configuration of the node being upgraded are fine.
The other 2 nodes in the quorum are fine and serving requests.

Any idea what might be causing this?

Thanks & Regards,
Deepak
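
The "Notification time out: 60000" value in the log above is consistent with a doubling back-off that is capped at 60000 ms; other logs in this thread show the same counter stepping through 400, 800, 1600, 3200, 6400 and later 51200. A minimal sketch of such a schedule follows. The starting value of 200 ms is an assumption inferred from the first logged value of 400, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not ZooKeeper's API.
final class NotificationBackoff {
    static final int INITIAL_WAIT_MS = 200;    // assumed starting value
    static final int MAX_INTERVAL_MS = 60000;  // cap observed in the log

    // Doubles the current timeout, never exceeding the cap.
    static int next(int currentMs) {
        return Math.min(currentMs * 2, MAX_INTERVAL_MS);
    }

    // Returns the first `steps` timeout values that would be logged.
    static List<Integer> schedule(int steps) {
        List<Integer> out = new ArrayList<>();
        int timeout = INITIAL_WAIT_MS;
        for (int i = 0; i < steps; i++) {
            timeout = next(timeout);
            out.add(timeout);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the first ten timeouts of the assumed schedule.
        System.out.println(schedule(10));
    }
}
```

A node that keeps logging the capped value has therefore been waiting for election notifications for several minutes without reaching a decision.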

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Michi Mutsuzaki <mi...@cs.stanford.edu>.
Hi Deepak,

I can't say for sure. If you have the log file from the test run in
which both StandAloneDisabledTest and QuorumTest failed, somebody might be
able to spot why QuorumTest is failing. Could you upload the log file
to ZOOKEEPER-1870?

https://issues.apache.org/jira/browse/ZOOKEEPER-1870

Thanks!
--Michi


On Thu, Mar 13, 2014 at 4:20 PM, Deepak Jagtap <de...@maxta.com> wrote:
> Hello Michi,
>
> I observed the following while testing the patch for 1805 against trunk
> revision 1574686.
> I ran "ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar"
> against trunk revision 1574686.
> The build failed because StandAloneDisabledTest failed.
>
> After applying 1805 against 1574686, the build failed with the following
> test failures:
> 1. StandAloneDisabledTest
> 2. QuorumTest
>
> When I run only QuorumTest against this (1574686 + 1805 patch), it succeeds
> (using "ant -Dtestcase=QuorumTest test").
>
> Please advise whether I should assume the build is successful apart from
> StandAloneDisabledTest.
>
> Thanks & Regards,
> Deepak
>
>
> On Mon, Mar 10, 2014 at 6:11 PM, Deepak Jagtap <de...@maxta.com>
> wrote:
>>
>> Thanks Michi!
>>
>>
>> On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki <mi...@cs.stanford.edu>
>> wrote:
>>>
>>> StandaloneDisabledTest.startSingleServerTest seems to be failing from
>>> the same issue. We should fix this soon.
>>>
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>>>
>>> On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <de...@maxta.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > Another query regarding 1805.
>>> > I observe that the ZooKeeper rolling upgrade always succeeds when I
>>> > apply the 1805 patch.
>>> > When I apply both the 1810 and 1805 patches, the rolling upgrade fails
>>> > due to the issue mentioned earlier.
>>> >
>>> > Please advise whether it is fine to use only patch 1805 for the trunk.
>>> >
>>> > Thanks & Regards,
>>> > Deepak
>>> >
>>> >
>>> > On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap
>>> > <de...@maxta.com>wrote:
>>> >
>>> >> Hi German,
>>> >>
>>> >> I have applied patches 1810 and 1805 against trunk revision 1574686
>>> >> (the most recent revision against which the 1810 patch built
>>> >> successfully).
>>> >> However, I observe the following error in the ZooKeeper log on the new
>>> >> node joining the quorum:
>>> >>
>>> >> 2014-03-10 21:11:25,126 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
>>> >> 2014-03-10 21:11:25,127 [myid:1] - INFO  [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
>>> >> 2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
>>> >> java.lang.OutOfMemoryError: Java heap space
>>> >>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
>>> >>    at java.lang.Thread.run(Unknown Source)
>>> >>
>>> >> Followed by these messages, printed repeatedly:
>>> >> 2014-03-10 21:11:25,328 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
>>> >> 2014-03-10 21:11:25,729 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
>>> >> 2014-03-10 21:11:26,530 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
>>> >> 2014-03-10 21:11:28,131 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
>>> >> 2014-03-10 21:11:31,332 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
>>> >>
>>> >> Thanks & Regards,
>>> >> Deepak
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap
>>> >> <de...@maxta.com>wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> I have applied only the 1805 patch, not 1810.
>>> >>> The upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
>>> >>> It was failing very consistently in our environment, and after the
>>> >>> 1805 patch it went smoothly.
>>> >>>
>>> >>> Regards,
>>> >>> Deepak
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <
>>> >>> german.blanco.blanco@gmail.com> wrote:
>>> >>>
>>> >>>> Hello,
>>> >>>>
>>> >>>> do you mean the ZOOKEEPER-1810 patch?
>>> >>>> That one alone doesn't solve the problem. On the other hand, the
>>> >>>> problem doesn't always happen, so after a rolling start it might get
>>> >>>> solved.
>>> >>>> We need 1818 as well, but it is easier to go step by step and get
>>> >>>> 1810 into trunk first.
>>> >>>> I hope that as soon as 3.4.6 is out this might get some attention.
>>> >>>>
>>> >>>> Regards,
>>> >>>>
>>> >>>> German.
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap
>>> >>>> <deepak.jagtap@maxta.com
>>> >>>> >wrote:
>>> >>>>
>>> >>>> > Hi,
>>> >>>> >
>>> >>>> > Please ignore the previous comment; I used the wrong jar file and
>>> >>>> > hence the rolling upgrade failed.
>>> >>>> > After applying patch for bug  on zookeeper-3.5.0.1562289
>>> >>>> > revision, rolling upgrade went fine.
>>> >>>> >
>>> >>>> > I have patched our in-house ZooKeeper version, but it would be
>>> >>>> > convenient if we applied the patch on trunk and used the latest
>>> >>>> > trunk.
>>> >>>> > Please advise if I can apply the patch on the trunk and test it for
>>> >>>> > you.
>>> >>>> >
>>> >>>> > Thanks & Regards,
>>> >>>> > Deepak
>>> >>>> >
>>> >>>> >
>>> >>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <
>>> >>>> deepak.jagtap@maxta.com
>>> >>>> > >wrote:
>>> >>>> >
>>> >>>> > > Hi German,
>>> >>>> > >
>>> >>>> > > I tried applying the patch for 1805, but the problem still persists.
>>> >>>> > > The following notification messages are logged repeatedly by the
>>> >>>> > > node that fails to join the quorum:
>>> >>>> > >
>>> >>>> > >
>>> >>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
>>> >>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>> >>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>> >>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>> >>>> > >
>>> >>>> > >
>>> >>>> > >
>>> >>>> > > The patch for 1732 is already included in trunk.
>>> >>>> > >
>>> >>>> > >
>>> >>>> > > Thanks & Regards,
>>> >>>> > > Deepak
>>> >>>> > >
>>> >>>> > >
>>> >>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <
>>> >>>> deepak.jagtap@maxta.com
>>> >>>> > >wrote:
>>> >>>> > >
>>> >>>> > >> Hi Flavio, German,
>>> >>>> > >>
>>> >>>> > >> Since this fix is critical for the ZooKeeper rolling upgrade, is it
>>> >>>> > >> OK if I apply this patch to the 3.5.0 trunk?
>>> >>>> > >> Is it straightforward to apply this patch to trunk?
>>> >>>> > >>
>>> >>>> > >> Thanks & Regards,
>>> >>>> > >> Deepak
>>> >>>> > >>
>>> >>>> > >>
>>> >>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <
>>> >>>> > deepak.jagtap@maxta.com>wrote:
>>> >>>> > >>
>>> >>>> > >>> Thanks German!
>>> >>>> > >>> Just wondering, is there any chance that this patch may be applied
>>> >>>> > >>> to trunk in the near future?
>>> >>>> > >>> If it's fine with you guys, I would be more than happy to apply the
>>> >>>> > >>> fixes (from 3.4.5) to trunk and test them.
>>> >>>> > >>>
>>> >>>> > >>> Thanks & Regards,
>>> >>>> > >>> Deepak
>>> >>>> > >>>
>>> >>>> > >>>
>>> >>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <
>>> >>>> > >>> german.blanco.blanco@gmail.com> wrote:
>>> >>>> > >>>
>>> >>>> > >>>> Hello Deepak,
>>> >>>> > >>>>
>>> >>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some
>>> >>>> > >>>> cases in which an ensemble can be formed so that it doesn't allow
>>> >>>> > >>>> any other ZooKeeper server to join.
>>> >>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in
>>> >>>> > >>>> trunk yet.
>>> >>>> > >>>> Check whether the Notifications sent around contain different
>>> >>>> > >>>> values for the vote among the members of the ensemble.
>>> >>>> > >>>> If you force a new election (e.g. by killing the leader) I guess
>>> >>>> > >>>> everything should work normally, but don't take my word for it.
>>> >>>> > >>>> Flavio should know more about this.
>>> >>>> > >>>>
>>> >>>> > >>>> Cheers,
>>> >>>> > >>>>
>>> >>>> > >>>> German.
>>> >>>> > >>>>
>>> >>>> > >>>>
>>> >>>> > >>>>
>>> >>>> > >>>
>>> >>>> > >>>
>>> >>>> > >>
>>> >>>> > >
>>> >>>> >
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>
>>
>
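
German's suggestion in the quoted thread above, to check whether the Notifications carry different vote values across ensemble members, can be tried mechanically on the logged lines. The sketch below is a hypothetical helper, not part of ZooKeeper: it assumes each log entry has first been joined onto a single line, parses the leader, zxid, state, sid and peer epoch fields, and compares two votes. The two sample lines are taken from the 2014-03-04 log quoted above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading "Notification:" log lines; not ZooKeeper code.
final class NotificationVote {
    // Matches the field layout seen in the logs in this thread.
    private static final Pattern LINE = Pattern.compile(
            "Notification:\\s*(\\d+)\\s*\\(n\\.leader\\),"
            + "\\s*0x([0-9a-fA-F]+)\\s*\\(n\\.zxid\\),"
            + "\\s*0x[0-9a-fA-F]+\\s*\\(n\\.round\\),"
            + "\\s*(\\w+)\\s*\\(n\\.state\\),"
            + "\\s*(\\d+)\\s*\\(n\\.sid\\),"
            + "\\s*0x([0-9a-fA-F]+)\\s*\\(n\\.peerEPoch\\)");

    final long leader;
    final long zxid;
    final String state;
    final long sid;
    final long peerEpoch;

    NotificationVote(long leader, long zxid, String state, long sid, long peerEpoch) {
        this.leader = leader;
        this.zxid = zxid;
        this.state = state;
        this.sid = sid;
        this.peerEpoch = peerEpoch;
    }

    // Returns the parsed vote, or empty if the line is not a Notification entry.
    static Optional<NotificationVote> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new NotificationVote(
                Long.parseLong(m.group(1)),
                Long.parseUnsignedLong(m.group(2), 16),
                m.group(3),
                Long.parseLong(m.group(4)),
                Long.parseUnsignedLong(m.group(5), 16)));
    }

    // Two notifications agree when leader, zxid and peer epoch all match.
    boolean sameVoteAs(NotificationVote other) {
        return leader == other.leader && zxid == other.zxid && peerEpoch == other.peerEpoch;
    }

    public static void main(String[] args) {
        String fromFollower = "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), "
                + "FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)";
        String fromLeader = "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), "
                + "LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)";
        NotificationVote a = parse(fromFollower).get();
        NotificationVote b = parse(fromLeader).get();
        System.out.println("sid " + a.sid + " and sid " + b.sid + " agree: " + a.sameVoteAs(b));
    }
}
```

On those two sample lines the leader and zxid match but the peer epoch differs (0x1 from sid 1, 0x2 from sid 3), which is the kind of mismatch German suggests looking for; whether that mismatch is the actual cause here is not established by this sketch.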

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hello Michi,

I observed the following while testing the patch for 1805 against trunk
revision 1574686.
I ran "ant -Djavac.args="-Xlint -Xmaxwarns 1000" clean test tar"
against trunk revision 1574686.
The build failed because StandAloneDisabledTest failed.

After applying 1805 against 1574686, the build failed with the following
test failures:
1. StandAloneDisabledTest
2. QuorumTest

When I run only QuorumTest against this (1574686 + 1805 patch), it succeeds
(using "ant -Dtestcase=QuorumTest test").

Please advise whether I should assume the build is successful apart from
StandAloneDisabledTest.

Thanks & Regards,
Deepak



Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Thanks Michi!


On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki <mi...@cs.stanford.edu>wrote:

> StandaloneDisabledTest.startSingleServerTest seems to be failing from
> the same issue. We should fix this soon.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
> >>
>

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Michi Mutsuzaki <mi...@cs.stanford.edu>.
StandaloneDisabledTest.startSingleServerTest appears to be failing because of
the same issue. We should fix this soon.

https://issues.apache.org/jira/browse/ZOOKEEPER-1870

On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <de...@maxta.com> wrote:
> Hello,
>
> I have another question regarding 1805.
> I observe that the ZooKeeper rolling upgrade always succeeds when I apply
> only the 1805 patch.
> When I apply both the 1810 and 1805 patches, the rolling upgrade fails
> because of the issue mentioned earlier.
>
> Please advise: is it fine to use only the 1805 patch on trunk?
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hello German,

With reference to the issue discussed earlier in this thread, I have confirmed that the patch for bug 1805 has still not been
applied to the top of trunk (zookeeper-3.5.0). Is there a specific reason why that patch cannot be applied to trunk?
Would it be OK if I cherry-picked that fix from 3.4.6 and applied it to trunk?
Please advise.

Thanks,
Deepak

________________________________
From: Deepak Jagtap
Sent: Friday, April 1, 2016 12:35 PM
To: Deepak Jagtap
Subject: Fw: New zookeeper server fails to join quorum with msg "Have smaller server identifie"




Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hello,

I have another question regarding 1805.
I observe that the ZooKeeper rolling upgrade always succeeds when I apply
only the 1805 patch.
When I apply both the 1810 and 1805 patches, the rolling upgrade fails
because of the issue mentioned earlier.

Please advise: is it fine to use only the 1805 patch on trunk?

Thanks & Regards,
Deepak


On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap <de...@maxta.com>wrote:

> Hi German,
>
> I have applied patches 1810 and 1805 against trunk revision 1574686 (a recent
> revision against which the 1810 patch built successfully).
> However, I am seeing the following error in the ZooKeeper log on the new node
> that is joining the quorum:
>
> 2014-03-10 21:11:25,126 [myid:1] - INFO
>  [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server
> identifier, so dropping the connection: (3, 1)
> 2014-03-10 21:11:25,127 [myid:1] - INFO  [/169.254.44.1:3888
> :QuorumCnxManager$Listener@540] - Received connection request /
> 169.254.44.3:51507
> 2014-03-10 21:11:25,193 [myid:1] - ERROR
> [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread
> Thread[WorkerReceiver[myid=1],5,main] died
> java.lang.OutOfMemoryError: Java heap space
>    at
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
>    at java.lang.Thread.run(Unknown Source)
>
> Followed by these messages getting printed repeatedly:
> 2014-03-10 21:11:25,328 [myid:1] - INFO
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
> Notification time out: 400
> 2014-03-10 21:11:25,729 [myid:1] - INFO
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
> Notification time out: 800
> 2014-03-10 21:11:26,530 [myid:1] - INFO
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
> Notification time out: 1600
> 2014-03-10 21:11:28,131 [myid:1] - INFO
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
> Notification time out: 3200
> 2014-03-10 21:11:31,332 [myid:1] - INFO
>  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] -
> Notification time out: 6400
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hi German,

I have applied patches 1810 and 1805 against trunk revision 1574686 (a recent
revision against which the 1810 patch built successfully).
However, I am seeing the following error in the ZooKeeper log on the new node
that is joining the quorum:

2014-03-10 21:11:25,126 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
2014-03-10 21:11:25,127 [myid:1] - INFO  [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
java.lang.OutOfMemoryError: Java heap space
   at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
   at java.lang.Thread.run(Unknown Source)
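
The "Have smaller server identifier" line in this log is printed at INFO level and, taken alone, does not appear to be the fault: election connections seem to be de-duplicated by server id, so that of any two peers only the one with the larger id keeps the connection it opened. Judging by the logged pairs, (3, 1) on myid 1 here and (5, 3) on myid 3 in the first post, the order looks like (remote id, local id). The sketch below only illustrates that tie-break rule; the class, enum, and method names are hypothetical and are not ZooKeeper's actual API.

```java
// Illustrative sketch only: names here are hypothetical, not ZooKeeper classes.
final class ConnectionTieBreak {

    enum Action { KEEP_OUTGOING, DROP_AND_WAIT }

    // Decide what the local server does with a connection it has just opened.
    // Assumption: the peer with the larger id is the one that keeps its outgoing link.
    public static Action onOutgoingConnection(long localId, long remoteId) {
        return remoteId > localId ? Action.DROP_AND_WAIT : Action.KEEP_OUTGOING;
    }

    // Build a message shaped like the one seen in the thread's logs.
    public static String describe(long localId, long remoteId) {
        if (onOutgoingConnection(localId, remoteId) == Action.DROP_AND_WAIT) {
            return "Have smaller server identifier, so dropping the connection: ("
                    + remoteId + ", " + localId + ")";
        }
        return "Keeping outgoing connection to " + remoteId;
    }

    public static void main(String[] args) {
        // Server 1 dials server 3: server 1 has the smaller id, so it drops its connection.
        System.out.println(describe(1, 3));
        // Server 3 dials server 1: server 3 has the larger id, so it keeps the connection.
        System.out.println(describe(3, 1));
    }
}
```

Under this reading, the line that matters in the log is the OutOfMemoryError that kills the WorkerReceiver thread, not the dropped connection.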

Followed by these messages getting printed repeatedly:
2014-03-10 21:11:25,328 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
2014-03-10 21:11:25,729 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
2014-03-10 21:11:26,530 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
2014-03-10 21:11:28,131 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
2014-03-10 21:11:31,332 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
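
The "Notification time out" values above double on each round (400, 800, 1600, ...), and the earlier logs in this thread show 51200 and then 60000, which suggests a doubling wait with a 60000 ms ceiling while the node stays in LOOKING. A minimal sketch of that pattern follows. The 200 ms starting value and the 60000 ms cap are assumptions inferred from the logged numbers, and the class and method names are hypothetical rather than taken from the ZooKeeper source.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: constants are inferred from the log lines in this thread.
final class NotificationBackoff {

    static final int INITIAL_TIMEOUT_MS = 200;   // assumed starting wait
    static final int MAX_TIMEOUT_MS = 60000;     // assumed ceiling (largest value logged)

    // Double the current wait, but never exceed the ceiling.
    public static int next(int currentMs) {
        return Math.min(currentMs * 2, MAX_TIMEOUT_MS);
    }

    // Produce the sequence of waits a node would log over the given number of empty rounds.
    public static List<Integer> schedule(int rounds) {
        List<Integer> waits = new ArrayList<>();
        int wait = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < rounds; i++) {
            wait = next(wait);
            waits.add(wait);
        }
        return waits;
    }

    public static void main(String[] args) {
        for (int wait : schedule(10)) {
            System.out.println("Notification time out: " + wait);
        }
    }
}
```

A node that logs the ceiling value repeatedly, as in the first post, has therefore been waiting without an accepted election result for several minutes.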

Thanks & Regards,
Deepak





On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap <de...@maxta.com>wrote:

> Hi,
>
> I have applied only 1805 patch, not 1810.
> And upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
> It was failing very consistently in our environment, and after 1805 patch
> it went smoothly.
>
> Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hi,

I have applied only the 1805 patch, not 1810.
The upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
It was failing very consistently in our environment, and after applying the
1805 patch it went smoothly.

Regards,
Deepak





On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Hello,
>
> Do you mean the ZOOKEEPER-1810 patch?
> That one alone doesn't solve the problem. On the other hand, the problem
> doesn't always happen, so after a rolling start it might get solved.
> We need 1818 as well, but it is easier to go step by step and get 1810 in
> trunk first.
> I hope that as soon as 3.4.6 is out this might get some attention.
>
> Regards,
>
> German.

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by German Blanco <ge...@gmail.com>.
Hello,

Do you mean the ZOOKEEPER-1810 patch?
That one alone doesn't solve the problem. On the other hand, the problem
doesn't always happen, so after a rolling start it might get solved.
We need 1818 as well, but it is easier to go step by step and get 1810 in
trunk first.
I hope that as soon as 3.4.6 is out this might get some attention.

Regards,

German.


On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <de...@maxta.com> wrote:

> Hi,
>
> Please ignore the previous comment; I used the wrong jar file, and hence the
> rolling upgrade failed.
> After applying the patch for bug  on the zookeeper-3.5.0.1562289
> revision, the rolling upgrade went fine.
>
> I have patched our in-house ZooKeeper version, but it would be more
> convenient if the patch were applied to trunk so that we could use the
> latest trunk.
> Please advise if I can apply the patch on trunk and test it for you.
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hi,

Please ignore the previous comment; I used the wrong jar file, and hence the
rolling upgrade failed.
After applying the patch for bug  on the zookeeper-3.5.0.1562289
revision, the rolling upgrade went fine.

I have patched our in-house ZooKeeper version, but it would be more convenient
if the patch were applied to trunk so that we could use the latest trunk.
Please advise if I can apply the patch on trunk and test it for you.

Thanks & Regards,
Deepak


On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <de...@maxta.com> wrote:

> Hi German,
>
> I tried applying the patch for 1805, but the problem still persists.
> The following notification messages are logged repeatedly by the node
> that fails to join the quorum:
>
>
> 2014-03-04 20:00:54,398 [myid:2] - INFO
>  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
> Notification time out: 51200
> 2014-03-04 20:00:54,400 [myid:2] - INFO
>  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
> (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0
> (n.peerEPoch), LOOKING (my state)1 (n.config version)
> 2014-03-04 20:00:54,401 [myid:2] - INFO
>  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
> (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1
> (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> 2014-03-04 20:00:54,403 [myid:2] - INFO
>  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
> (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING
> (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config
> version)
>
>
>
> The patch for 1732 is already included in trunk.
>
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hi German,

I tried applying the patch for 1805, but the problem still persists.
The following notification messages are logged repeatedly by the node that
fails to join the quorum:


2014-03-04 20:00:54,398 [myid:2] - INFO
 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
Notification time out: 51200
2014-03-04 20:00:54,400 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
(n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0
(n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,401 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
(n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1
(n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,403 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
(n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING
(n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config
version)
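
For anyone reading these notifications, here is a rough Python sketch that
pulls the main fields out of one such entry and splits the zxid into its
epoch (high 32 bits) and counter (low 32 bits). parse_notification and the
regular expression are illustrative helpers written against the log format
shown above; they are not part of ZooKeeper.

```python
import re

# Matches the start of a FastLeaderElection notification as logged above.
NOTIFICATION = re.compile(
    r"Notification: (?P<leader>\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-fA-F]+) \(n\.round\), "
    r"(?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\)"
)


def parse_notification(text):
    """Return the vote fields of one log entry, or None if it does not match."""
    flat = " ".join(text.split())  # undo the line wrapping added by mail clients
    match = NOTIFICATION.search(flat)
    if match is None:
        return None
    zxid = int(match["zxid"], 16)
    return {
        "leader": int(match["leader"]),
        "sid": int(match["sid"]),
        "state": match["state"],
        "zxid_epoch": zxid >> 32,          # high 32 bits
        "zxid_counter": zxid & 0xFFFFFFFF,  # low 32 bits
    }


line = """[WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
(n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1
(n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)"""
parsed = parse_notification(line)
print(parsed)
```

For the entry from server 1 above, this gives leader 3, zxid epoch 1 and
counter 16004 (0x3e84).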



The patch for 1732 is already included in trunk.


Thanks & Regards,
Deepak


On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <de...@maxta.com> wrote:

> Hi Flavio, German,
>
> Since this fix is critical for ZooKeeper rolling upgrades, is it OK if I
> apply this patch to the 3.5.0 trunk?
> Is it straightforward to apply this patch to trunk?
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Hi Flavio, German,

Since this fix is critical for ZooKeeper rolling upgrades, is it OK if I
apply this patch to the 3.5.0 trunk?
Is it straightforward to apply this patch to trunk?

Thanks & Regards,
Deepak


On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <de...@maxta.com> wrote:

> Thanks German!
> Just wondering: is there any chance that this patch will be applied to
> trunk in the near future?
> If it's fine with you guys, I would be more than happy to apply the fixes
> (from 3.4.5) to trunk and test them.
>
> Thanks & Regards,
> Deepak

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by Deepak Jagtap <de...@maxta.com>.
Thanks German!
Just wondering: is there any chance that this patch will be applied to trunk
in the near future?
If it's fine with you guys, I would be more than happy to apply the fixes
(from 3.4.5) to trunk and test them.

Thanks & Regards,
Deepak


On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <
german.blanco.blanco@gmail.com> wrote:

> Hello Deepak,
>
> Due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
> which an ensemble can be formed so that it doesn't allow any other
> ZooKeeper server to join.
> This has been fixed in branch 3.4, but it hasn't been fixed in trunk yet.
> Check whether the Notifications sent around contain different values for
> the vote among the members of the ensemble.
> If you force a new election (e.g. by killing the leader) I guess everything
> should work normally, but don't take my word for it.
> Flavio should know more about this.
>
> Cheers,
>
> German.

Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

Posted by German Blanco <ge...@gmail.com>.
Hello Deepak,

Due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
which an ensemble can be formed so that it doesn't allow any other
ZooKeeper server to join.
This has been fixed in branch 3.4, but it hasn't been fixed in trunk yet.
Check whether the Notifications sent around contain different values for the
vote among the members of the ensemble.
If you force a new election (e.g. by killing the leader) I guess everything
should work normally, but don't take my word for it.
Flavio should know more about this.
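
One way to do the vote check suggested above is sketched below in Python.
It groups the servers that are not LOOKING by the vote they report and
shows whether more than one distinct vote is in circulation. Treating
(leader, zxid, peer epoch) as "the vote" is an assumption made for this
sketch, distinct_votes is a hypothetical helper, and the sample values are
copied from the notifications posted elsewhere in this thread.

```python
from collections import defaultdict


def distinct_votes(notifications):
    """Map each reported vote to the set of server ids reporting it.

    notifications: iterable of dicts with the keys
    sid, state, leader, zxid and peer_epoch.
    Servers still in LOOKING state are skipped.
    """
    groups = defaultdict(set)
    for n in notifications:
        if n["state"] == "LOOKING":
            continue
        # Assumed definition of a vote for this sketch.
        vote = (n["leader"], n["zxid"], n["peer_epoch"])
        groups[vote].add(n["sid"])
    return dict(groups)


# Values taken from the 2014-03-04 notifications in this thread.
sample = [
    {"sid": 2, "state": "LOOKING", "leader": 2, "zxid": 0x0, "peer_epoch": 0x0},
    {"sid": 1, "state": "FOLLOWING", "leader": 3, "zxid": 0x100003E84, "peer_epoch": 0x1},
    {"sid": 3, "state": "LEADING", "leader": 3, "zxid": 0x100003E84, "peer_epoch": 0x2},
]
votes = distinct_votes(sample)
for vote, sids in votes.items():
    print(vote, sorted(sids))
print("members disagree:", len(votes) > 1)
```

With the sample values, servers 1 and 3 agree on the leader and zxid but
report different peer epochs, so the sketch counts two distinct votes.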

Cheers,

German.


On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <de...@maxta.com> wrote:

> Hi,
>
> I am replacing one of the ZooKeeper servers in a 3-node quorum.
> Initially all ZooKeeper servers were running version 3.5.0.1515976.
> I successfully replaced Node3 with the newer version 3.5.0.1551730.
> When I try to replace Node2 with the same ZooKeeper version, I cannot
> start the ZooKeeper server on Node2, as it is continuously stuck in a
> leader election loop printing the following messages:
>
> 2014-02-26 02:45:23,709 [myid:3] - INFO
>  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] -
> Notification time out: 60000
> 2014-02-26 02:45:23,710 [myid:3] - INFO
>  [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server
> identifier, so dropping the connection: (5, 3)
> 2014-02-26 02:45:23,712 [myid:3] - INFO
>  [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3
> (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0
> (n.peerEPoch), LOOKING (my state)1 (n.config version)
>
>
> Network connections and configuration of the node being upgraded are fine.
> The other 2 nodes in the quorum are fine and serving requests.
>
> Any idea what might be causing this?
>
> Thanks & Regards,
> Deepak
>