Posted to user@zookeeper.apache.org by Debraj Manna <su...@gmail.com> on 2019/08/20 09:44:33 UTC

The current epoch, 7, is older than the last zxid, 8589935882

Hi

I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes, after
a machine reboot, ZooKeeper does not start and I see the errors below in
the logs.

I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
someone let me know whether this is fixed in 3.4.13, as I can see the issue
is still open? Also, can someone suggest the recommended way to recover
the set-up?

2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
    ... 4 more
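
For readers comparing the two numbers in the exception: a zxid is a 64-bit value with the epoch in its high 32 bits and a per-epoch counter in its low 32 bits. The sketch below decodes the zxid from the log above; it is only an illustration, and the helper name `split_zxid` is made up here, not part of ZooKeeper.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a zxid into (epoch, counter).

    The epoch is the high 32 bits and the counter is the low 32 bits.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


if __name__ == "__main__":
    last_zxid = 34359738370   # zxid taken from the exception message
    current_epoch = 7         # epoch reported in the same message

    epoch, counter = split_zxid(last_zxid)
    print(f"zxid epoch={epoch} counter={counter}")   # zxid epoch=8 counter=2
    if current_epoch < epoch:
        print("currentEpoch is behind the epoch recorded in the last zxid")
```

Decoded this way, the last zxid belongs to epoch 8 while the node reports a current epoch of 7, which is the mismatch the exception complains about.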

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
Is there any issue with zookeeper 3.4.13?

On Thu, Aug 29, 2019 at 10:13 AM Andor Molnar <an...@apache.org> wrote:

> Thanks for the info, I’m still looking.
> So, this is an Ubuntu packaged version of ZooKeeper.
>
> Andor
>
>
>
> > On 2019. Aug 27., at 14:13, Debraj Manna <su...@gmail.com> wrote:
> >
> > No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2
> >
> > I started zookeeper after adding set -x to /usr/bin/zookeeper-server, and I
> > can see that zookeeper is being started with 3.4.13, as shown below. The
> > complete logs are in the gist below:
> >
> > https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9
> >
> > nohup java -Dzookeeper.datadir.autocreate=false
> > -Dzookeeper.log.dir=/var/log/zookeeper
> > -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
> >
> '/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
> > -Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
> > -Dcom.sun.management.jmxremote.local.only=false
> > org.apache.zookeeper.server.quorum.QuorumPeerMain
> > /etc/zookeeper/conf/zoo.cfg
> > + sleep 1
> > + echo STARTED
> > STARTED
> >
> > The content of zookeeper.log after the start is in the gist below:
> >
> > https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6
> >
> > Let me know if you need any more logs.
> >
> > On Mon, Aug 26, 2019 at 9:21 PM Andor Molnar <an...@apache.org> wrote:
> >
> >> I confirmed that the fix is included in 3.4.13. That’s why I asked if you
> >> can see the ‘updatingEpoch’ file in the data folder.
> >>
> >> I don’t think the issue is unrelated to ZooKeeper, but I want to make sure
> >> that you’re running the right version by checking the beginning of the ZK logs.
> >>
> >> Andor
> >>
> >>
> >>
> >>> On 2019. Aug 26., at 13:43, Debraj Manna <su...@gmail.com> wrote:
> >>>
> >>> Below is the content of currentEpoch.tmp
> >>>
> >>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>> 8
> >>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>> 7
> >>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>> 8
> >>>
> >>> The startup logs have rolled over, as the issue has been there for some
> >>> time. Will the current log, with the node in this state, help? Btw, why do
> >>> you think this issue may not be related to zookeeper?
> >>>
> >>>
> >>>
> >>> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <an...@apache.org> wrote:
> >>>
> >>>> Hi Debraj,
> >>>>
> >>>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> >>>> Can you see the ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
> >>>> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
> >>>>
> >>>> Would you please share full startup logs of the failing node?
> >>>>
> >>>> Regards,
> >>>> Andor
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On 2019. Aug 23., at 18:53, Debraj Manna <su...@gmail.com> wrote:
> >>>>>
> >>>>> Can someone answer my query below?
> >>>>>
> >>>>> I am getting confused after going through ZOOKEEPER-1653
> >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
> >>>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> >>>>> also. Can someone let me know if the issue is present in 3.4.13 also?
> >>>>>
> >>>>>
> >>>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <subharaj.manna@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> With the other two zookeeper servers running, I stopped zookeeper on
> >>>>>> the broken node, deleted all the contents of /var/lib/zookeeper/version-2,
> >>>>>> and started zookeeper again on the node. It is running fine now and got
> >>>>>> all the data from the other servers.
> >>>>>>
> >>>>>> I am getting confused after going through ZOOKEEPER-1653
> >>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> >>>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
> >>>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
> >>>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <subharaj.manna@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for replying.
> >>>>>>>
> >>>>>>> What is the recommended way to remove a node, delete all data from it,
> >>>>>>> and make it start fresh?
> >>>>>>>
> >>>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eolivelli@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>> Sorry for the late reply.
> >>>>>>>> If you have 3 servers, you can nuke the broken one and make it start from
> >>>>>>>> scratch; it will join the cluster and then recover data from the other
> >>>>>>>> servers.
> >>>>>>>>
> >>>>>>>> Try it in a staging env, not in production
> >>>>>>>>
> >>>>>>>> Enrico
> >>>>>>>>
> >>>>>>>> On Tue 20 Aug 2019, 20:30 Debraj Manna <su...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> The same has been asked on Stack Overflow
> >>>>>>>>> <https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
> >>>>>>>>> also. But no response there either.
> >>>>>>>>>
> >>>>>>>>> Does anyone have any thoughts on this one?
> >>>>>>>>>
> >>>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <subharaj.manna@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Posted wrong Jira link. I meant
> >>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354 . Can someone let me
> >>>>>>>>>> know what is the recommended way to recover the node?
> >>>>>>>>>>
> >>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> >>>>>>>>>> 8
> >>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> >>>>>>>>>> 7
> >>>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> >>>>>>>>>> 8
> >>>>>>>>>>
>
>

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Andor Molnar <an...@apache.org>.
Thanks for the info, I’m still looking.
So, this is an Ubuntu packaged version of ZooKeeper.

Andor





Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
No, I don't see the updatingEpoch file in /var/lib/zookeeper/version-2

I started zookeeper after adding set -x to /usr/bin/zookeeper-server, and I
can see that zookeeper is being started with 3.4.13, as shown below. The
complete logs are in the gist below:

https://gist.github.com/debraj-manna/509ec3d497016c4a249ee2b8dace05d9

nohup java -Dzookeeper.datadir.autocreate=false
-Dzookeeper.log.dir=/var/log/zookeeper
-Dzookeeper.root.logger=INFO,ROLLINGFILE -cp
'/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/usr/lib/zookeeper/bin/../lib/netty-3.10.5.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/lib/zookeeper/bin/../lib/jline-2.11.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.13.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/*:/usr/lib/zookeeper/lib/*'
-Dzookeeper.log.threshold=INFO -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
org.apache.zookeeper.server.quorum.QuorumPeerMain
/etc/zookeeper/conf/zoo.cfg
+ sleep 1
+ echo STARTED
STARTED

The content of zookeeper.log after the start is in the gist below:

https://gist.github.com/debraj-manna/9800c5bef32837c62bdfb324c0589ad6

Let me know if you need any more logs.
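
A quick way to repeat the check for the epoch files discussed in this thread is sketched below. It only reads the four file names mentioned here (acceptedEpoch, currentEpoch, currentEpoch.tmp, updatingEpoch) and reports which ones exist; the function name `read_epoch_files` is illustrative, and the default directory is simply the path used on this node, so adjust it for other installs.

```python
import sys
from pathlib import Path
from typing import Dict, Optional

# File names taken from this thread; nothing else in the directory is touched.
EPOCH_FILES = ("acceptedEpoch", "currentEpoch", "currentEpoch.tmp", "updatingEpoch")


def read_epoch_files(data_dir: str) -> Dict[str, Optional[str]]:
    """Map each epoch file name to its stripped contents, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in EPOCH_FILES:
        path = Path(data_dir) / name
        result[name] = path.read_text().strip() if path.is_file() else None
    return result


if __name__ == "__main__":
    # Default path is the one used in this thread (an assumption elsewhere).
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/zookeeper/version-2"
    for name, value in read_epoch_files(data_dir).items():
        print(f"{name}: {value if value is not None else '(missing)'}")
```

The script is read-only, so it should be safe to run on the failing node (with sufficient permissions to read the data directory) before trying any recovery.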


Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Andor Molnar <an...@apache.org>.
I confirmed that the fix is included in 3.4.13. That’s why I asked if you can see the ‘updatingEpoch’ file in the data folder.

I don’t think the issue is unrelated to ZooKeeper, but I want to make sure that you’re running the right version by checking the beginning of the ZK logs.

Andor
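
To make the role of the ‘updatingEpoch’ file concrete, here is a simplified model of the startup check. It is a sketch under stated assumptions, not ZooKeeper's code: it assumes the fix treats an existing updatingEpoch file as a marker that the server stopped between taking a snapshot and updating currentEpoch, in which case the epoch from the last zxid is adopted instead of failing. The function name `check_epochs` is hypothetical.

```python
def check_epochs(last_zxid: int, current_epoch: int, updating_epoch_exists: bool) -> int:
    """Return the epoch a node would continue with, or raise on a mismatch.

    Simplified model of the check discussed in this thread (an assumption,
    not the actual ZooKeeper implementation).
    """
    zxid_epoch = last_zxid >> 32  # epoch is the high 32 bits of the zxid
    if zxid_epoch > current_epoch and updating_epoch_exists:
        # Marker present: assume an interrupted epoch update and adopt
        # the epoch recorded in the last zxid.
        return zxid_epoch
    if zxid_epoch > current_epoch:
        raise IOError(
            f"The current epoch, {current_epoch}, is older than "
            f"the last zxid, {last_zxid}"
        )
    return current_epoch


if __name__ == "__main__":
    # Values reported in this thread: currentEpoch = 7, no updatingEpoch file.
    try:
        check_epochs(34359738370, 7, updating_epoch_exists=False)
    except IOError as err:
        print(f"startup would fail: {err}")
```

Under this model, the state reported in the thread (currentEpoch 7, last zxid in epoch 8, no updatingEpoch file) falls into the failing branch, which would be consistent with the error appearing even though the fix is present.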



> On 2019. Aug 26., at 13:43, Debraj Manna <su...@gmail.com> wrote:
> 
> Below is the content of currentEpoch.tmp
> 
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> 8
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> 7
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> 8
> 
> The startup logs have rolled over, as the issue has been there for some
> time. Will the current log, with the node in this state, help? Btw, why do you
> think this issue may not be related to zookeeper?
> 
> 
> 
> On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <an...@apache.org> wrote:
> 
>> Hi Debraj,
>> 
>> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
>> Can you see the ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
>> Also, what is ‘currentEpoch.tmp’? I’m not sure if it relates to ZooKeeper.
>> 
>> Would you please share full startup logs of the failing node?
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>> 
>>> On 2019. Aug 23., at 18:53, Debraj Manna <su...@gmail.com> wrote:
>>> 
>>> Can someone answer my query below?
>>> 
>>> I am getting confused after going through ZOOKEEPER-1653
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
>>> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
>>> also. Can someone let me know if the issue is present in 3.4.13 also?
>>> 
>>> 
>>> On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <su...@gmail.com>
>>> wrote:
>>> 
>>>> With the other two zookeeper servers running I stopped the zookeeper in
>>>> the broken node and the deleted all the contents inside
>> /var/lib/zookeeper/version-2
>>>> and started the zookeeper back on the node. It is running fine now and
>> got
>>>> all the data from the other servers.
>>>> 
>>>> I am getting confused after going through ZOOKEEPER-1653
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and
>> ZOOKEEPER-2354
>>>> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
>>>> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
>>>> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13
>> also?
>>>> 
>>>> 
>>>> 
>>>> On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <su...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Thanks for replying.
>>>>> 
>>>>> What is the recommended way to remove a node and delete all data from
>> it
>>>>> and make it start fresh?
>>>>> 
>>>>> On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eo...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> Sorry for so late reply.
>>>>>> If you have 3 servers you can nuke the broken one and make it start
>> from
>>>>>> scratch, it will join the cluster and then recover data from the other
>>>>>> servers
>>>>>> 
>>>>>> Try it in a staging env, not in production
>>>>>> 
>>>>>> Enrico
>>>>>> 
>>>>>> Il mar 20 ago 2019, 20:30 Debraj Manna <su...@gmail.com> ha
>>>>>> scritto:
>>>>>> 
>>>>>>> The same has been asked in stackoverflow
>>>>>>> <
>>>>>>> 
>>>>>> 
>> https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid
>>>>>>>> 
>>>>>>> also. But no response there also.
>>>>>>> 
>>>>>>> Anyone any thoughts on this one?
>>>>>>> 
>>>>>>> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <
>> subharaj.manna@gmail.com
>>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Posted wrong Jira link. I meant
>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2354.  Can someone
>>>>>> let
>>>>>>> me
>>>>>>>> know what is the recommended way to recover the node?
>>>>>>>> 
>>>>>>>> support@platform2:/var/lib/zookeeper/version-2$ sudo cat
>>>>>> acceptedEpoch
>>>>>>>> 8support@platform2:/var/lib/zookeeper/version-2$ sudo cat
>>>>>> currentEpoch
>>>>>>>> 7support@platform2:/var/lib/zookeeper/version-2$ sudo cat
>>>>>>> currentEpoch.tmp
>>>>>>>> 8support@platform2
>>>>>>>> 
>>>>>>>> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <
>>>>>> subharaj.manna@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi
>>>>>>>>> 
>>>>>>>>> I am using a zookeeper ensemble of 3 nodes running 3.4.13.
>> Sometimes
>>>>>>>>> after reboot of machine zookeeper is not starting and I am seeing
>>>>>> the
>>>>>>> below
>>>>>>>>> errors in logs.
>>>>>>>>> 
>>>>>>>>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 .
>>>>>> Can
>>>>>>>>> someone let me if this is fixed in 3.4.13 or not as I can see the
>>>>>> issue
>>>>>>>>> still open? Also can somone suggest what is the recommended way to
>>>>>>> recover
>>>>>>>>> the set-up ?
>>>>>>>>> 
>>>>>>>>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] -
>>>>>> Unable
>>>>>>>>> to load database on disk
>>>>>>>>> java.io.IOException: The current epoch, 7, is older than the last
>>>>>> zxid,
>>>>>>>>> 34359738370
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>> at
>>>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92]
>> -
>>>>>>>>> Unexpected exception, exiting abnormally
>>>>>>>>> java.lang.RuntimeException: Unable to run quorum server
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>>>>>>>>> at
>>>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>>>>>>>>> Caused by: java.io.IOException: The current epoch, 7, is older than
>>>>>> the
>>>>>>>>> last zxid, 34359738370
>>>>>>>>> at
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>> org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>>>>>>>>> ... 4 more----
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>> 
>> 


Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
Below are the contents of the epoch files, including currentEpoch.tmp:

support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8

The ZooKeeper startup logs have rolled over because the issue has been
present for some time. Would the current log, with the node in this state,
help? By the way, why do you think this issue may not be related to ZooKeeper?
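
For anyone reading along, the numbers can be cross-checked by hand. A zxid is a 64-bit value whose upper 32 bits hold the epoch and whose lower 32 bits hold a counter. The Python sketch below is only an illustration (the helper names are made up here and are not ZooKeeper APIs); it decodes the zxid from the log and applies the comparison that the error text suggests. The zxid decodes to epoch 8, which matches acceptedEpoch and currentEpoch.tmp, while currentEpoch still holds 7.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter): the high and low 32 bits of a 64-bit zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def epoch_is_stale(current_epoch: int, last_zxid: int) -> bool:
    """Hypothetical helper: True when the zxid's epoch is newer than current_epoch."""
    zxid_epoch, _ = split_zxid(last_zxid)
    return zxid_epoch > current_epoch


if __name__ == "__main__":
    # zxid taken from the stack trace in this thread
    print(split_zxid(34359738370))         # (8, 2)
    # currentEpoch on disk is 7, so the startup check would fail
    print(epoch_is_stale(7, 34359738370))  # True
```

The same arithmetic applied to the zxid in the subject line, 8589935882, gives epoch 2 and counter 1290.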



On Mon, Aug 26, 2019 at 4:56 PM Andor Molnar <an...@apache.org> wrote:

> Hi Debraj,
>
> The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
> Can you see ‘updatingEpoch’ file in /var/lib/zookeeper/version-2 ?
> Also what is ‘currentEpoch.tmp’ ? I’m not sure if it relates to ZooKeeper.
>
> Would you please share full startup logs of the failing node?
>
> Regards,
> Andor

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Andor Molnar <an...@apache.org>.
Hi Debraj,

The fix should be in all 3.4 versions from 3.4.6 onward, including 3.4.13.
Can you see an ‘updatingEpoch’ file in /var/lib/zookeeper/version-2?
Also, what is ‘currentEpoch.tmp’? I’m not sure whether it is related to ZooKeeper.
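
One way to collect the requested information in a single pass is sketched below in Python. It is only an illustration: the function name and the list of file names are assumptions based on the files mentioned in this thread, and the script must run as a user that can read the data directory (the `cat` commands elsewhere in the thread needed sudo).

```python
from pathlib import Path
from typing import Dict, Optional

# File names mentioned in this thread; not an exhaustive list.
EPOCH_FILES = ("acceptedEpoch", "currentEpoch", "currentEpoch.tmp", "updatingEpoch")


def read_epoch_files(version_dir: Path) -> Dict[str, Optional[str]]:
    """Map each file name to its stripped contents, or None if the file is absent."""
    state: Dict[str, Optional[str]] = {}
    for name in EPOCH_FILES:
        path = version_dir / name
        state[name] = path.read_text().strip() if path.is_file() else None
    return state


if __name__ == "__main__":
    # Path taken from the thread; adjust to the dataDir of the node being checked.
    target = Path("/var/lib/zookeeper/version-2")
    for name, value in read_epoch_files(target).items():
        print(f"{name}: {'<missing>' if value is None else repr(value)}")
```

A file that exists but is empty is reported as an empty string rather than as missing, so the presence of ‘updatingEpoch’ can be told apart from its absence.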

Would you please share the full startup logs of the failing node?

Regards,
Andor




> On 2019. Aug 23., at 18:53, Debraj Manna <su...@gmail.com> wrote:
> 
> Can someone answer by below query?
> 
> I am getting confused after going through ZOOKEEPER-1653
> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say it
> is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in 3.4.13
> also. Can someone let me know if the issue is present in 3.4.13 also?


Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
Can someone answer my query below?

I am confused after going through ZOOKEEPER-1653
<https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
<https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say the
problem is fixed in 3.4.6 but still exists in 3.5.x, yet I am seeing it in
3.4.13 as well. Can someone let me know whether the issue is also present in 3.4.13?


On Wed 21 Aug, 2019, 12:35 PM Debraj Manna, <su...@gmail.com>
wrote:

> With the other two zookeeper servers running I stopped the zookeeper in
> the broken node and the deleted all the contents inside /var/lib/zookeeper/version-2
> and started the zookeeper back on the node. It is running fine now and got
> all the data from the other servers.
>
> I am getting confused after going through ZOOKEEPER-1653
> <https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
> <https://issues.apache.org/jira/browse/ZOOKEEPER-2354> . The issues say
> it is fixed in 3.4.6 but exists in 3.5.x. But I am seeing the issue in
> 3.4.13 also. Can someone let me know if the issue is present in 3.4.13 also?

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
With the other two ZooKeeper servers running, I stopped ZooKeeper on the
broken node, deleted all the contents of /var/lib/zookeeper/version-2,
and started ZooKeeper again on that node. It is running fine now and has
received all the data from the other servers.
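
The file-handling part of that procedure could be sketched as follows in Python. This is a cautious variant, not exactly what was done above: it moves the contents of version-2 into a backup directory instead of deleting them, so the old state can still be inspected. The function name and the backup location are hypothetical, and the sketch assumes ZooKeeper on the broken node is already stopped and the other two servers still form a quorum. Stopping and starting the service is left out because it depends on how ZooKeeper is installed.

```python
import shutil
from pathlib import Path
from typing import List


def set_aside_data(version_dir: Path, backup_dir: Path) -> List[str]:
    """Move every entry from version_dir into a new backup_dir; return the moved names."""
    # Refuse to reuse an existing backup directory so nothing is overwritten.
    backup_dir.mkdir(parents=True, exist_ok=False)
    moved: List[str] = []
    # sorted() materializes the listing before anything is moved.
    for entry in sorted(version_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    return moved
```

Only the contents of version-2 are touched; in a default layout the myid file sits in the parent data directory and stays in place, so the node keeps its identity when it rejoins.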

I am confused after going through ZOOKEEPER-1653
<https://issues.apache.org/jira/browse/ZOOKEEPER-1653> and ZOOKEEPER-2354
<https://issues.apache.org/jira/browse/ZOOKEEPER-2354>. The issues say the
problem is fixed in 3.4.6 but still exists in 3.5.x, yet I am seeing it in
3.4.13 as well. Can someone let me know whether the issue is also present in 3.4.13?



On Wed, Aug 21, 2019 at 8:54 AM Debraj Manna <su...@gmail.com>
wrote:

> Thanks for replying.
>
> What is the recommended way to remove a node and delete all data from it
> and make it start fresh?

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
Thanks for replying.

What is the recommended way to remove a node and delete all data from it
and make it start fresh?

On Wed 21 Aug, 2019, 12:58 AM Enrico Olivelli, <eo...@gmail.com> wrote:

> Hello,
> Sorry for so late reply.
> If you have 3 servers you can nuke the broken one and make it start from
> scratch, it will join the cluster and then recover data from the other
> servers
>
> Try it in a staging env, not in production
>
> Enrico

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Enrico Olivelli <eo...@gmail.com>.
Hello,
Sorry for the late reply.
If you have 3 servers, you can nuke the broken one and make it start from
scratch; it will join the cluster and then recover its data from the other
servers.

Try it in a staging environment first, not in production.
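
The reason this is safe with 3 servers comes down to majority arithmetic, sketched below in Python. The function names are made up for illustration, and the sketch assumes the default majority quorum: an ensemble stays available as long as more than half of its servers are up.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerable_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, quorum_size(n), tolerable_failures(n))
```

With 3 servers the quorum is 2, so one node can be wiped and resynced while the other two keep serving. If a second node fails during that window, the ensemble loses its quorum, which is one more reason to rehearse the procedure outside production.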

Enrico

Il mar 20 ago 2019, 20:30 Debraj Manna <su...@gmail.com> ha
scritto:

> The same has been asked in stackoverflow
> <
> https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid
> >
> also. But no response there also.
>
> Anyone any thoughts on this one?
>
> On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <su...@gmail.com>
> wrote:
>
> > I posted the wrong Jira link. I meant
> > https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
> > know what the recommended way to recover the node is?
> >
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> > 8
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> > 7
> > support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> > 8
> >
> > On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <su...@gmail.com>
> > wrote:
> >
> >> Hi
> >>
> >> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
> >> after a reboot of the machine, ZooKeeper does not start and I see the
> >> errors below in the logs.
> >>
> >> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
> >> someone let me know whether this is fixed in 3.4.13, as I can see the
> >> issue is still open? Also, can someone suggest the recommended way to
> >> recover the setup?
> >>
> >> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> >> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> >> java.lang.RuntimeException: Unable to run quorum server
> >> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> >> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> >> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> >> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> >> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> >> ... 4 more----
> >>
> >>
> >>
>

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
The same question has been asked on Stack Overflow
<https://stackoverflow.com/questions/57574298/zookeeper-error-the-current-epoch-is-older-than-the-last-zxid>
as well, but there has been no response there either.

Does anyone have any thoughts on this one?

On Tue, Aug 20, 2019 at 4:43 PM Debraj Manna <su...@gmail.com>
wrote:

> I posted the wrong Jira link. I meant
> https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
> know what the recommended way to recover the node is?
>
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
> 8
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
> 7
> support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
> 8
>
> On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <su...@gmail.com>
> wrote:
>
>> Hi
>>
>> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes,
>> after a reboot of the machine, ZooKeeper does not start and I see the
>> errors below in the logs.
>>
>> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
>> someone let me know whether this is fixed in 3.4.13, as I can see the
>> issue is still open? Also, can someone suggest the recommended way to
>> recover the setup?
>>
>> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
>> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
>> java.lang.RuntimeException: Unable to run quorum server
>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
>> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
>> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
>> ... 4 more----
>>
>>
>>

Re: The current epoch, 7, is older than the last zxid, 8589935882

Posted by Debraj Manna <su...@gmail.com>.
I posted the wrong Jira link. I meant
https://issues.apache.org/jira/browse/ZOOKEEPER-2354. Can someone let me
know what the recommended way to recover the node is?

support@platform2:/var/lib/zookeeper/version-2$ sudo cat acceptedEpoch
8
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch
7
support@platform2:/var/lib/zookeeper/version-2$ sudo cat currentEpoch.tmp
8
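
For context on the numbers above: a ZooKeeper zxid is a 64-bit value whose
high 32 bits are the epoch and whose low 32 bits are a counter within that
epoch. A minimal Python sketch of that split follows; split_zxid is a
hypothetical helper name, not a ZooKeeper API. It is applied to the zxid
from the log and to the one in the subject line.

```python
def split_zxid(zxid):
    """Return (epoch, counter): the high and low 32 bits of a zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


# zxid from the exception in the log
print(split_zxid(34359738370))  # (8, 2)
# zxid from the subject line of this thread
print(split_zxid(8589935882))   # (2, 1290)
```

Read this way, the last zxid in the log belongs to epoch 8, which matches
the 8 in acceptedEpoch and currentEpoch.tmp, while currentEpoch still holds
7. That mismatch is what the error message reports.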

On Tue, Aug 20, 2019 at 3:14 PM Debraj Manna <su...@gmail.com>
wrote:

> Hi
>
> I am using a ZooKeeper ensemble of 3 nodes running 3.4.13. Sometimes, after
> a reboot of the machine, ZooKeeper does not start and I see the errors
> below in the logs.
>
> I have seen https://issues.apache.org/jira/browse/ZOOKEEPER-1653 . Can
> someone let me know whether this is fixed in 3.4.13, as I can see the issue
> is still open? Also, can someone suggest the recommended way to recover
> the setup?
>
> 2019-08-19 04:18:36,906 [myid:2] - ERROR [main:QuorumPeer@692] - Unable to load database on disk
> java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> 2019-08-19 04:18:36,908 [myid:2] - ERROR [main:QuorumPeerMain@92] - Unexpected exception, exiting abnormally
> java.lang.RuntimeException: Unable to run quorum server
> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:693)
> at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:635)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:170)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
> at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
> Caused by: java.io.IOException: The current epoch, 7, is older than the last zxid, 34359738370
> at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:674)
> ... 4 more----
>
>
>