Posted to user@zookeeper.apache.org by Koen De Groote <ko...@limecraft.com> on 2019/08/13 13:42:54 UTC

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

I would also like to know if this is possible.

From going over the GitHub page, it seems there is a JMX method to force
the creation of a snapshot. Yet the Docker image is configured such that
a port will never be assigned to the JMX process.

Is there any way to bypass this?

On Tue, Jul 30, 2019 at 8:51 AM Jörn Franke <jo...@gmail.com> wrote:

> Thanks. Is it possible to force ZooKeeper to create a snapshot? I will
> check; I think the snapshot count is set to 1 in the cfg.
>
> > On 30.07.2019 at 08:06, Enrico Olivelli <eo...@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 23:59, Jörn Franke <jornfranke@gmail.com> wrote:
> >
> >> ok, then let me verify tomorrow if a snapshot file is indeed there. If it
> >> is missing then I wonder why it was missing. There was no crash or whatever
> >> and 3.4.14 works without issue, but of course it could have loaded them
> >> from the log files. However, then I wonder why it does not create one.
> >>
> >
> >
> >
> > I remember now that some other user, I think Sijie, reported a similar
> > problem some months ago: it is not possible to upgrade from 3.4 to 3.5
> > if no snapshot is present.
> > IIRC the fix was to force the creation of at least one snapshot file and
> > then upgrade.
> >
> > Enrico
> >
> >
> >>
> >> On Mon, Jul 29, 2019 at 11:45 PM Michael Han <ha...@apache.org> wrote:
> >>
> >>>>> I just wonder why it does not find a valid snapshot.
> >>>
> >>> If there are local snapshot files and the files are valid, then it's a bug
> >>> that the server fails to load them.
> >>>
> >>>>> Is it because the format changed in 3.5.5 compared to 3.4.14?
> >>>
> >>> Not that I am aware of. There are some format changes (added compression
> >>> support) in the master branch, but that's not shipped with 3.5.5.
> >>>
> >>>
> >>>
> >>> On Mon, Jul 29, 2019 at 2:31 PM Jörn Franke <jo...@gmail.com> wrote:
> >>>
> >>>> ok, then it affects basically all standalone nodes? This is fine, even
> >>>> though it means some extra work (for uncritical lab environments).
> >>>> I am not sure it is ZOOKEEPER-2325 (but I don't know the full history
> >>>> behind it). The logs are fine (it works in 3.4.14 without issues, even after
> >>>> downgrading back). There is no issue with disk space and there are no
> >>>> 0-byte files. I just wonder why it does not find a valid snapshot. Is it
> >>>> because the format changed in 3.5.5 compared to 3.4.14?
> >>>>
> >>>> On Mon, Jul 29, 2019 at 11:25 PM Michael Han <ha...@apache.org> wrote:
> >>>>
> >>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> >>>>>
> >>>>> This is expected behavior introduced in ZOOKEEPER-2325. We don't want to
> >>>>> end up with potentially inconsistent state across the ensemble when
> >>>>> recovering from an empty snapshot.
> >>>>>
> >>>>> To continue the upgrade, just delete all txn log files and let the node sync
> >>>>> the snapshot from the quorum.
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 29, 2019 at 1:38 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>>>>
> >>>>>> On Mon, Jul 29, 2019, 22:32 Jörn Franke <jo...@gmail.com> wrote:
> >>>>>>
> >>>>>>> It also seems that 3.5.5 does not attempt to read all of the logfiles (I
> >>>>>>> still have to confirm), but the two it reads exist, it has access and they
> >>>>>>> are much larger than 0 bytes.
> >>>>>>>
> >>>>>>
> >>>>>> We should have the stack trace of the EOFException.
> >>>>>>
> >>>>>> Does anyone on this list have a better idea?
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>>
> >>>>>>> On Mon, Jul 29, 2019 at 10:13 PM Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> (of course I do not run them at the same time)
> >>>>>>>>
> >>>>>>>> On Mon, Jul 29, 2019 at 10:10 PM Jörn Franke <jornfranke@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Thank you for the quick reply. They read from the same disk paths and
> >>>>>>>>> have the same access rights (in fact the RHEL service executes them as the
> >>>>>>>>> same specific user).
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 29, 2019 at 10:09 PM Enrico Olivelli <eolivelli@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Mon, Jul 29, 2019, 21:50 Jörn Franke <jo...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I tried to migrate a lab environment from ZooKeeper 3.4.14 (used for Solr)
> >>>>>>>>>>> to 3.5.5 and encountered an issue. It is ZooKeeper in standalone mode
> >>>>>>>>>>> (other environments have a proper ensemble). I increased jute.maxbuffer
> >>>>>>>>>>> beyond the default (but not excessively) - this was working perfectly fine
> >>>>>>>>>>> in 3.4.14.
> >>>>>>>>>>>
> >>>>>>>>>>> Basically I reuse the same config files for the migration, except that I
> >>>>>>>>>>> whitelist some commands (later I am also interested in adding SSL).
> >>>>>>>>>>>
> >>>>>>>>>>> I get the following error message when starting ZooKeeper with 3.5.5
> >>>>>>>>>>> (basically, I just changed the symbolic link from zookeeper to point to
> >>>>>>>>>>> the 3.5.5 directory instead of the 3.4.14 directory):
> >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,217 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,222 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b34
> >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@655] - Created new input stream /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,223 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@658] - Created new input archive /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - DEBUG [main:FileTxnLog$FileTxnIterator@696] - EOF exception java.io.EOFException: Failed to read /zookeeper/version-2/log.b72
> >>>>>>>>>>> 2019-07-29 15:16:25,224 [myid:] - ERROR [main:ZooKeeperServerMain@83] - Unexpected exception, exiting abnormally
> >>>>>>>>>>> java.io.IOException: No snapshot found, but there are log entries. Something is broken!
> >>>>>>>>>>>        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:211)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:240)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:290)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:450)
> >>>>>>>>>>>        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:764)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ServerCnxnFactory.startup(ServerCnxnFactory.java:98)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:144)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:106)
> >>>>>>>>>>>        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:64)
> >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:128)
> >>>>>>>>>>>        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:82)
> >>>>>>>>>>>
> >>>>>>>>>>> Strangely enough, if I switch back to 3.4.14 the issue is resolved and
> >>>>>>>>>>> ZooKeeper works normally. However, I would like to leverage the new version
> >>>>>>>>>>> 3.5.5.
> >>>>>>>>>>>
> >>>>>>>>>>> There are no 0-byte files. Plenty of disk space is available.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Can you compare these logs with the logs of 3.4.x? Are they reading from the
> >>>>>>>>>> same disk paths?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid it; I can
> >>>>>>>>>>> reconstruct it, but still)? I will also try in the other environments and
> >>>>>>>>>>> also with an environment with an ensemble, but I would like to know beforehand
> >>>>>>>>>>> what the issue could be.
> >>>>>>>>>>>
> >>>>>>>>>>> Not sure if it is relevant, but:
> >>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for clients and quorum.
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
> >>>>>>>>>>
> >>>>>>>>>> Enrico
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

Posted by Enrico Olivelli <eo...@gmail.com>.
On Tue, Aug 13, 2019, 15:43 Koen De Groote <ko...@limecraft.com> wrote:

> I would also like to know if this is possible.
>
> From going over the GitHub page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the Docker image is configured such that
> a port will never be assigned to the JMX process.
>

Can't you modify your Docker image in order to expose the JMX API? I am not
a Docker expert, but it should be possible.

Enrico


> Is there any way to bypass this?
>

Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

Posted by Andor Molnar <an...@apache.org>.
After some digging, it turned out that this is an outstanding issue in the 3.4 -> 3.5 upgrade. I’ve found the following e-mail thread about it:
https://markmail.org/thread/rbhzbro6nszypwwp

…and an open Jira:
https://issues.apache.org/jira/browse/ZOOKEEPER-3056

Unfortunately, a patch is still not available, but essentially the solution is to force ZooKeeper to create a snapshot file somehow. Sorry, the Admin interface is not available in 3.4; it was my mistake to recommend it.

In the last Jira comment there’s a workaround:
To perform an upgrade (3.4 -> 3.5):
	• download the "snapshot.0" file attached
	• copy it to the versioned directory (e.g. "version-2") within your data directory (parameter "dataDir" in your config - this is the directory containing the "myid" file for a peer)
	• restart the peer
	• upgrade the peer (this can be combined with the above step if you like)
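
The copy step above could be scripted roughly as follows. This is only a minimal sketch in Python, not something taken from the Jira issue: the function names has_snapshot and install_snapshot are made up for illustration, it assumes the peer is stopped while the file is copied, and it assumes snapshots live in the "version-2" directory directly under dataDir. It does not create a snapshot itself; the source file must be the "snapshot.0" downloaded from the Jira.

```python
import shutil
from pathlib import Path


def has_snapshot(version_dir: Path) -> bool:
    """Return True if version_dir holds at least one non-empty snapshot.* file."""
    return any(
        p.is_file() and p.stat().st_size > 0
        for p in version_dir.glob("snapshot.*")
    )


def install_snapshot(data_dir: Path, snapshot_file: Path,
                     version: str = "version-2") -> bool:
    """Copy snapshot_file to data_dir/version/snapshot.0 if no snapshot exists.

    Returns True when the file was copied, and False when a snapshot was
    already present (in which case nothing is touched).
    """
    version_dir = data_dir / version
    if not version_dir.is_dir():
        raise FileNotFoundError(f"{version_dir} does not exist; check dataDir")
    if has_snapshot(version_dir):
        return False
    # Hypothetical workflow: snapshot_file is the file downloaded from the Jira.
    shutil.copy2(snapshot_file, version_dir / "snapshot.0")
    return True
```

With the data directory from the log output earlier in this thread, a call would look like install_snapshot(Path("/zookeeper"), Path("snapshot.0")). Existing snapshots are never overwritten, so running it against a healthy data directory should be a no-op.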

Would you please give it a try?

Andor




> On 2019. Aug 14., at 10:44, Andor Molnar <an...@apache.org> wrote:
> 
> Hi Jorn,
> 
> Thanks for reaching out to us, this is a very important exercise to make sure the upgrade path works as expected.
> 
> - Please do an `ls -al` in your data dir to make sure you have valid snapshot files.
> - It would also be useful to expose the Admin port (8080/tcp by default) and check the output of `lastSnapshotCommand`.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
>> On 2019. Aug 14., at 7:13, Jörn Franke <jo...@gmail.com> wrote:
>> 
>> For me the issue occurred only in standalone mode. With the ensemble I simply cleared the data directory and it received the ZooKeeper data from the quorum.
>> 
>>> On 13.08.2019 at 15:42, Koen De Groote <ko...@limecraft.com> wrote:
>>> 
>>> I would also like to know if this is possible.
>>> 
>>> From going over the GitHub page, it seems there is a JMX method to force
>>> the creation of a snapshot. Yet the Docker image is configured such that
>>> a port will never be assigned to the JMX process.
>>> 
>>> Is there any way to bypass this?
>>> 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same disk paths?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Any idea beyond erasing the data dir (I would try to avoid
>>>>>>> it,
>>>>>>>> I
>>>>>>>>>> can
>>>>>>>>>>>>>>> reconstruct it, but still)?  I will try also in the other
>>>>>>>>>>> environments
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> also with an environment with an ensemble, but i would like
>>>>>>> to
>>>>>>>>> know
>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>> what the issue could be.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Not sure if it is relevant, but:
>>>>>>>>>>>>>>> Activated Kerberos Authentication and Kerberos SSL for
>>>>>>> clients
>>>>>>>>> and
>>>>>>>>>>>>>> quorum.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Quorum? In standalone mode there is no 'quorum' auth
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
> 


Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

Posted by Andor Molnar <an...@apache.org>.
Hi Jörn,

Thanks for reaching out to us. This is a very important exercise to make sure the upgrade path works as expected.

- Please do an `ls -al` in your data dir to make sure you have valid snapshot files.
- It would also be useful to expose the Admin port (8080/tcp by default) and check the output of `lastSnapshotCommand`.
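
The first check can also be scripted. The following is only a minimal sketch, not part of ZooKeeper: it assumes the data directory is the `/zookeeper/version-2` path seen in the log output quoted in this thread, and it relies on the `snapshot.<zxid>` and `log.<zxid>` file naming used in that directory. The function names `summarize_data_dir` and `has_logs_without_snapshot` are made up for illustration.

```python
import os


def summarize_data_dir(version_dir):
    """Return (snapshots, txn_logs), each a sorted list of (name, size_in_bytes)."""
    snapshots, txn_logs = [], []
    for name in sorted(os.listdir(version_dir)):
        path = os.path.join(version_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-file entries
        entry = (name, os.path.getsize(path))
        if name.startswith("snapshot."):
            snapshots.append(entry)
        elif name.startswith("log."):
            txn_logs.append(entry)
    return snapshots, txn_logs


def has_logs_without_snapshot(version_dir):
    """True when transaction logs exist but no snapshot.* file is present.

    Assumption: this is the situation that the error message
    'No snapshot found, but there are log entries' describes. The check
    only looks at file names; it does not validate file contents.
    """
    snapshots, txn_logs = summarize_data_dir(version_dir)
    return bool(txn_logs) and not snapshots
```

For example, `has_logs_without_snapshot("/zookeeper/version-2")` returning `True` would suggest that the directory holds only `log.*` files, and `summarize_data_dir` shows the file sizes, so 0-byte files stand out.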

Regards,
Andor





> On 2019. Aug 14., at 7:13, Jörn Franke <jo...@gmail.com> wrote:
> 
> For me the issue occurred only in standalone mode. With the ensemble I simply cleared the data directory and it received the zookeeper data from the quorum. 


Re: Issue migrating from Zookeeper 3.4.14 to 3.5.5

Posted by Jörn Franke <jo...@gmail.com>.
For me the issue occurred only in standalone mode. With the ensemble, I simply cleared the data directory and the node received the ZooKeeper data from the quorum.

> On 13.08.2019 at 15:42, Koen De Groote <ko...@limecraft.com> wrote:
> 
> I would also like to know if this is possible.
> 
> From going over the github page, it seems there is a JMX method to force
> the creation of a snapshot. Yet the docker image is configured as such that
> a port will never be assigned to the JMX process.
> 
> Is there any way to bypass this?