Posted to dev@zookeeper.apache.org by Enrico Olivelli <eo...@gmail.com> on 2020/02/10 08:47:40 UTC

Rolling upgrade from 3.5 to 3.6 - expected behaviour

Hi,
Even though we had enough binding +1s on 3.6.0rc2, I wanted to finish my
tests before closing the VOTE on 3.6.0, and I have run into an apparent
blocker.

I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
peers are not able to talk to each other.
I have a cluster of 3, server1, server2 and server3.
When I upgrade server1 to 3.6.0rc2 I see errors like this on the 3.5 nodes:

2020-02-10 09:35:07,745 [myid:3] - INFO [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received connection request 127.0.0.1:62591
2020-02-10 09:35:07,746 [myid:3] - ERROR [localhost/127.0.0.1:3334:QuorumCnxManager@527] - org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException: Got unrecognized protocol version -65535
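
A minimal Python sketch of what the ERROR line above seems to describe: the
receiving peer reads a leading version number from the connection and refuses
anything it does not know. The value -65535 is taken from the log; everything
else (the 8-byte big-endian framing, the old value -65536, and all function
and class names) is an assumption made for illustration, not the actual
QuorumCnxManager code.

```python
import struct

NEW_PROTOCOL_VERSION = -65535  # value reported in the log above
OLD_PROTOCOL_VERSION = -65536  # assumed value spoken by the 3.5 peers


class InitialMessageError(Exception):
    """Raised when the first bytes of a connection carry an unknown version."""


def read_protocol_version(payload: bytes, supported: frozenset) -> int:
    """Parse the leading big-endian 64-bit integer and reject unknown versions."""
    if len(payload) < 8:
        raise InitialMessageError("initial message too short")
    (version,) = struct.unpack(">q", payload[:8])
    if version not in supported:
        raise InitialMessageError(f"Got unrecognized protocol version {version}")
    return version


def demo() -> str:
    """Simulate a 3.6 peer greeting a 3.5 peer that only knows the old version."""
    greeting = struct.pack(">q", NEW_PROTOCOL_VERSION)
    try:
        read_protocol_version(greeting, frozenset({OLD_PROTOCOL_VERSION}))
    except InitialMessageError as exc:
        return str(exc)
    return "accepted"
```

Under these assumptions, demo() returns the same message text that shows up
in the 3.5 node's log.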

Once I upgrade all of the peers the system is up and running,
apparently without any data loss.

During the upgrade, as soon as I upgrade the first node (say, server1),
server1 is no longer able to accept connections from clients (error
"Close of session 0x0 java.io.IOException: ZooKeeperServer not running").
This is expected: as long as it cannot talk to the other peers, it is
effectively partitioned away from the cluster.

My questions are:
1) Is this expected? I can't remember any protocol changes from 3.5 to
3.6, but 3.6 diverged from the 3.5 branch long ago, and I was not in the
community as a dev at the time, so I cannot tell.
2) Is this a viable option for users? That is, to have some temporary
glitch during the upgrade and hope that the upgrade completes without
trouble?

In theory, as long as two servers are running the same major version
(3.5 or 3.6) we have a quorum and the system is able to make progress
and to serve clients.
I feel that this is quite dangerous, but I don't have enough context
to understand how this problem became possible and when we decided to
break compatibility.
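
To make the quorum arithmetic concrete, here is a small Python sketch that
walks through a one-by-one upgrade under the assumption that old and new
peers cannot talk to each other at all. The function name and the shape of
the returned tuples are made up for this illustration.

```python
def quorum_timeline(cluster_size: int) -> list:
    """Walk a one-by-one rolling upgrade where old and new peers cannot talk.

    Returns (old_up, new_up, has_quorum) tuples: the initial state, then for
    every node the state while it is stopped and the state after it has
    rejoined speaking only the new protocol.
    """
    majority = cluster_size // 2 + 1
    old_up, new_up = cluster_size, 0
    states = [(old_up, new_up, max(old_up, new_up) >= majority)]
    for _ in range(cluster_size):
        old_up -= 1  # one old node is stopped for the upgrade
        states.append((old_up, new_up, max(old_up, new_up) >= majority))
        new_up += 1  # the node is back, now on the new protocol
        states.append((old_up, new_up, max(old_up, new_up) >= majority))
    return states


def quorum_gaps(cluster_size: int) -> list:
    """Return only the states in which neither partition has a majority."""
    return [s for s in quorum_timeline(cluster_size) if not s[2]]
```

With 3 servers the only state without a majority is the one with 1 old and
1 new server up, that is, while the second server is being restarted. With
5 servers it is the state with 2 old and 2 new servers up.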

The other option is that I am wrong in my test and I am messing up :-)

The other upgrade path I would like to see working like a charm is the
upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
should encourage users to move to 3.6 and not to 3.5.

Regards
Enrico

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
Actually, we have another option: we can follow the way rolling restart
support was implemented for QuorumSSL.
- we can make 3.6.0 able to read both protocol versions
- we can add a parameter that tells 3.6.0 which protocol version to use
(using the old one breaks / disables the MultiAddress feature, but I think
that is OK during an upgrade)
- then we can do a rolling upgrade with the old protocol version
- then we can change the parameter to use the new protocol version (at this
point all nodes can understand both versions)
- then we can do a rolling restart with the new config
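
The idea in the list above can be sketched in Python as a peer that reads a
set of versions but sends exactly one, chosen by a parameter. This is only a
model under assumptions: the old value -65536, the 8-byte framing, the Peer
class and the rule that both sides must be able to parse each other's
greeting are all hypothetical simplifications, not the real implementation.

```python
import struct
from dataclasses import dataclass

OLD_VERSION = -65536  # assumed value of the pre-3.6 protocol version
NEW_VERSION = -65535  # value seen in the error log earlier in this thread


@dataclass(frozen=True)
class Peer:
    """A server that can read a set of versions and sends exactly one."""

    name: str
    readable: frozenset
    send_version: int  # the hypothetical "which version to use" parameter

    def greeting(self) -> bytes:
        """Encode the version this peer announces when it opens a connection."""
        return struct.pack(">q", self.send_version)

    def accepts(self, payload: bytes) -> bool:
        """True if the version at the start of the payload is readable."""
        (version,) = struct.unpack(">q", payload[:8])
        return version in self.readable


def can_connect(a: Peer, b: Peer) -> bool:
    """Simplified rule: each side must be able to parse the other's greeting."""
    return b.accepts(a.greeting()) and a.accepts(b.greeting())
```

In this model a 3.6 peer configured to send the old version can talk to both
a 3.5 peer and a 3.6 peer that already sends the new version, while a 3.5
peer and a 3.6 peer sending the new version cannot talk to each other.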

I would vote for this solution.

Kind regards,
Mate


On Mon, Feb 10, 2020 at 11:17 AM Szalay-Bekő Máté <
szalay.beko.mate@gmail.com> wrote:

> Hi Enrico!
>
> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
> The protocol version was last changed in ZOOKEEPER-2186, first released
> in 3.4.7 and 3.5.1, to avoid some crashes / fix some bugs. Later I also
> changed the protocol version when the format of the initial message
> changed in ZOOKEEPER-3188. So the quorum protocol is indeed not
> compatible in this case, and this is the 'expected' behavior if you
> upgrade e.g. from 3.4.6 to 3.4.7, from 3.4.6 to 3.5.5, or from 3.5.6 to
> 3.6.0.
>
> We had some discussion in the PR of ZOOKEEPER-3188 back then and came to
> the conclusion that it is not that bad, as there will be no data loss, as
> you wrote. The tricky thing is that during a rolling upgrade we should
> ensure both backward and forward compatibility to make sure that the old
> and the new part of the quorum can still speak to each other. The current
> solution (simply failing if the protocol versions mismatch) is simpler and
> still works just fine: as the servers are restarted one by one, the nodes
> with the old protocol version and the nodes with the new protocol version
> will form two partitions, but at any given time only one partition will
> have the quorum.
>
> Still, thinking it through, as a side effect there will be a short time in
> these cases when neither partition has a quorum (when we have N servers
> with the old protocol version, N servers with the new protocol version,
> and one server just being restarted). I am not sure if we can accept this.
>
> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
> the initial message of the old protocol version with the new code. But I am
> not sure if it would be enough (as the old code will not be able to parse
> the new initial message).
>
> One option can be to also make a patch for 3.5 to have a version which
> supports both protocol versions (let's say 3.5.8). Then we can write in
> the release notes that if you need a rolling upgrade from any version since
> 3.4.7, you have to first upgrade to 3.5.8 before upgrading to 3.6.0.
> We can even do the same thing on the 3.4 branch.
>
> But I am also new to the community... It would be great to hear the
> opinion of more experienced people.
> Whatever the decision will be, I am happy to make the changes.
>
> And sorry for breaking the RC (if we decide that this needs to be
> changed...).  ZOOKEEPER-3188 was a complex patch.
>
> Kind regards,
> Mate
>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
Hi Andor,

this is almost exactly what I proposed. More precisely:

1) First we make multi-address feature disabled by default.

2) If disabled, quorum protocol automatically uses the old protocol version
which lets 3.5 and 3.6 communicate smoothly. The code in 3.6.0 will be able
to understand both the old and the new protocols, but it will send the
messages in the old protocol by default.

3) Once the user has finished the first rolling upgrade, the cluster is
running 3.6 and communicating with the old protocol.

4) If user wants to enable multi-address feature (I think that’s quite rare
anyway), another rolling restart is needed, but this time there will be no
partitions, as all the members of the cluster will understand both
protocols.
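
A rough way to check this theory on paper is a small Python simulation. It
assumes three kinds of node ("3.5", "3.6-old" for 3.6 with multi-address
disabled, "3.6-new" for 3.6 with it enabled), that two nodes can talk only
when each can read what the other sends, and that a quorum needs a majority
of mutually compatible running nodes. All names here are hypothetical.

```python
from itertools import combinations

KINDS = {
    # kind: (protocol versions it can read, protocol version it sends)
    "3.5": ({"old"}, "old"),
    "3.6-old": ({"old", "new"}, "old"),  # 3.6, multi-address disabled
    "3.6-new": ({"old", "new"}, "new"),  # 3.6, multi-address enabled
}


def compatible(a: str, b: str) -> bool:
    """Two nodes can talk when each one can read what the other sends."""
    reads_a, sends_a = KINDS[a]
    reads_b, sends_b = KINDS[b]
    return sends_a in reads_b and sends_b in reads_a


def has_quorum(running: list, cluster_size: int) -> bool:
    """True if some mutually compatible group of running nodes is a majority."""
    majority = cluster_size // 2 + 1
    return any(
        all(compatible(a, b) for a, b in combinations(group, 2))
        for group in combinations(running, majority)
    )


def rolling_restart(cluster: list, target: str) -> list:
    """Restart nodes one by one as `target` (mutates `cluster` in place).

    Returns two booleans per node: quorum while the node is stopped, and
    quorum after it has come back.
    """
    size = len(cluster)
    report = []
    for i in range(size):
        report.append(has_quorum(cluster[:i] + cluster[i + 1:], size))
        cluster[i] = target
        report.append(has_quorum(cluster, size))
    return report
```

In this model, going from "3.5" to "3.6-old" and then from "3.6-old" to
"3.6-new" keeps a quorum at every step of a 3-node cluster, while going
straight from "3.5" to "3.6-new" loses it while the second node is down.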

At least this is the theory :) I still have to verify this with tests.

Kind regards,
Mate


On Tue, Feb 11, 2020, 22:10 Andor Molnar <an...@apache.org> wrote:

> Mate,
>
> Let me reiterate to see if I understand you correctly:
>
> 1) First we make multi-address feature disabled by default.
>
> 2) If disabled, quorum protocol automatically uses the old protocol
> version which lets 3.5 and 3.6 communicate smoothly.
>
> 3) Once the user finished the first rolling restart, the cluster is
> running 3.6 and communicating with old protocol.
>
> 4) If user wants to enable multi-address feature (I think that’s quite
> rare anyway), another rolling restart is needed, but this time partitions
> cannot be avoided.
>
> Is that accurate?
>
> I think it’s not the end of the world.
>
> Andor
>
>
>
>
> > On 2020. Feb 11., at 21:27, Szalay-Bekő Máté <sz...@gmail.com> wrote:
> >
> > I see the main problem here in the fact that we are missing proper
> > versioning in the leader election / quorum protocols. I tried to simply
> > implement backward compatibility in 3.6, but it didn't solve the problem.
> > The new code understands the old protocol, but it cannot decide when to
> > use the new or the old protocol during connection initiation. So the old
> > servers cannot read the new init messages and we still temporarily end up
> > having two partitions during a rolling restart.
> >
> > I already suggested two ways to handle this later, but I think for 3.6.0
> > the simplest solution now is to disable the new MultiAddress feature and
> > stick to the old protocol version by default, plus extend the
> > documentation with a note that enabling the MultiAddress feature is not
> > possible during a rolling upgrade, but needs to be done with a separate
> > rolling restart. With this approach, the rolling restart should "just
> > work" with the 3.4 / 3.5 configs and we don't require any extra step /
> > configuration from the users, unless they want to use the new feature. I
> > plan to submit a PR with these changes to ZOOKEEPER-3720 tomorrow, if
> > there isn't any different opinion.
> >
> > P.S. For 4.0 we might need to put some extra thinking into backward
> > compatibility / versioning for the quorum and client protocols.
> >
> >
> > On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> > wrote:
> >
> >> I hate to say it, but I think 3.6.0 should release as is.  It is
> >> impossible to *reliably* retrofit backwards compatibility /
> >> interoperability onto a release that was engineered from the beginning
> >> without that goal.  Learn the lesson, set goals differently in the
> >> future.
> >>
> >> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> >> szalay.beko.mate@gmail.com>
> >> wrote:
> >>
> >>> FYI: I created these scripts for my local tests:
> >>> https://github.com/symat/zk-rolling-upgrade-test
> >>>
> >>> For the long term I would also add some script that actually monitors
> >>> the state of the quorum and also runs continuous traffic, not just 1-2
> >>> smoke tests after each restart. But I don't know how important this
> >>> would be.
> >>>
> >>> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> >>> wrote:
> >>>
> >>>> On Tue, Feb 11, 2020 at 17:17, Andor Molnar
> >>>> <an...@apache.org> wrote:
> >>>>>
> >>>>> The most obvious one that comes to mind is something I previously
> >>>>> worked on:
> >>>>>
> >>>>> 1) run old version cluster,
> >>>>> 2) connect to each node and run smoke tests,
> >>>>> 3) restart one node with new code,
> >>>>> 4) goto 2) until all nodes are upgraded
> >>>>>
> >>>>> I think this wouldn’t work in a “unit test”, we probably need a
> >>>>> separate Jenkins job and a nice python script to do this.
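
The four steps above could be driven by a loop like the following Python
sketch. It is only an outline: how a node is restarted with new code and
what a smoke test does are left to two callables supplied by the caller, and
both of those (as well as the function name) are hypothetical.

```python
from typing import Callable, Sequence


def rolling_upgrade_check(
    nodes: Sequence[str],
    restart_with_new_code: Callable[[str], None],
    smoke_test: Callable[[str], bool],
) -> list:
    """Upgrade nodes one by one, smoke-testing every node between restarts.

    Assumes the old-version cluster is already running (step 1). Returns one
    (upgraded_so_far, failing_nodes) pair per round of smoke tests.
    """
    results = []
    upgraded = 0
    while True:
        # Step 2: connect to each node and run the smoke test.
        failing = [node for node in nodes if not smoke_test(node)]
        results.append((upgraded, failing))
        if upgraded == len(nodes):
            return results
        # Step 3: restart the next node with the new code.
        restart_with_new_code(nodes[upgraded])
        # Step 4: go back to step 2 until all nodes are upgraded.
        upgraded += 1
```

A Jenkins job could call this with real restart and smoke-test functions and
fail the build whenever a round reports failing nodes.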
> >>>>>
> >>>>> Andor
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
> >>>>>>
> >>>>>> Anyone have ideas how we could add testing for upgrade? Obviously
> >>>>>> something we're missing, esp. given it's important.
> >>>>
> >>>> I will send an email in the next few days with a proposal.
> >>>> BTW, my idea is very similar to Andor's.
> >>>>
> >>>> Once we have an automated environment we can launch it from Jenkins.
> >>>>
> >>>> Enrico
> >>>>
> >>>>
> >>>>>>
> >>>>>> Patrick
> >>>>>>
> >>>>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <eolivelli@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> On Tue, Feb 11, 2020 at 09:12, Szalay-Bekő Máté
> >>>>>>> <sz...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi All,
> >>>>>>>>
> >>>>>>>> about the question from Michael:
> >>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol
> >>>>>>>>> and speak old message format when it's talking to old server?
> >>>>>>>>
> >>>>>>>> In this particular case, it might be enough. The protocol change
> >>>>>>>> happened in the 'initial message' sent by the QuorumCnxManager.
> >>>>>>>> Maybe it is not a problem if the new servers cannot initiate
> >>>>>>>> channels to the old servers; maybe it is enough if these channels
> >>>>>>>> get initiated by the old servers only.
> >>>>>>>> I will test it quickly.
> >>>>>>>>
> >>>>>>>> Although I have no idea if anything else changed in the quorum
> >>>>>>>> protocol between 3.5 and 3.6. In other cases it might not be enough
> >>>>>>>> if the new servers can understand the old messages, as the old
> >>>>>>>> servers can break by not understanding the messages from the new
> >>>>>>>> servers. Also, currently (AFAIK) there is no generic knowledge of
> >>>>>>>> protocol versions in the code: the servers do not store which
> >>>>>>>> protocol versions they can/should use to communicate with which
> >>>>>>>> other servers. Maybe we don't even need this, but I would feel
> >>>>>>>> better if we had more tests around these things.
> >>>>>>>>
> >>>>>>>> My suggestion for the long term:
> >>>>>>>> - let's fix this particular issue now with 3.6.0 quickly (I start
> >>>>>>>> doing this today)
> >>>>>>>> - let's do some automation (backed by Jenkins) that will test a
> >>>>>>>> whole range of different ZooKeeper upgrade paths by making rolling
> >>>>>>>> upgrades during some light traffic. Let's have a somewhat better
> >>>>>>>> definition of what we expect (e.g. the quorum is up, but some
> >>>>>>>> clients can get disconnected? What will happen to the ephemeral
> >>>>>>>> nodes? Do we want to gracefully close or transfer the user sessions
> >>>>>>>> before stopping the old server?) and let's see where this breaks.
> >>>>>>>> Just by checking the code, I don't think the quorum will always be
> >>>>>>>> up (e.g. between older 3.4 versions and 3.5).
> >>>>>>>
> >>>>>>>
> >>>>>>> I am happy to work on this topic
> >>>>>>>
> >>>>>>>> - we need to update the Wiki about the working rolling upgrade
> >>>>>>>> paths and maybe about workarounds if needed
> >>>>>>>> - we might need to do some fixes (adding backward compatible
> >>>>>>>> versions and/or specific parameters that temporarily enforce the
> >>>>>>>> old protocol during the rolling upgrade and that can be changed
> >>>>>>>> later to the new protocol by either dynamic reconfig or rolling
> >>>>>>>> restart)
> >>>>>>>
> >>>>>>> It would be much better for the 3.6 code to have some support for
> >>>>>>> compatibility with 3.5 servers.
> >>>>>>> We can't require old code to be forward compatible, but we can make
> >>>>>>> new code compatible to a certain extent with old code.
> >>>>>>> If we can achieve this compatibility goal without a flag, even
> >>>>>>> better: users won't have to care about this part and can simply
> >>>>>>> trust us.
> >>>>>>>
> >>>>>>> The rollback story is also important, but maybe we are still not
> >>>>>>> ready for it. In case of local changes to the store, it is better
> >>>>>>> to have a clear design and plan and to work on it for a new release
> >>>>>>> (3.7?).
> >>>>>>>
> >>>>>>> Enrico
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Depending on your comments, I am happy to create a few Jira
> >>>>>>>> tickets around these topics.
> >>>>>>>>
> >>>>>>>> Kind regards,
> >>>>>>>> Mate
> >>>>>>>>
> >>>>>>>> ps. Enrico, sorry about your RC... I owe you a beer, let me know
> >>>>>>>> if you are near Budapest ;)
> >>>>>>>>
> >>>>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eolivelli@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Good.
> >>>>>>>>>
> >>>>>>>>> I will cancel the vote for 3.6.0rc2.
> >>>>>>>>>
> >>>>>>>>> I would appreciate it very much if Mate and his colleagues have
> >>>>>>>>> time to work on a fix.
> >>>>>>>>> Otherwise I will have cycles next week.
> >>>>>>>>>
> >>>>>>>>> I would also like to spend my time setting up a few minimal
> >>>>>>>>> integration tests around the upgrade story.
> >>>>>>>>>
> >>>>>>>>> Enrico
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 11, 2020, 07:30 Michael Han <ha...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Kudos Enrico, very thorough work as the final gatekeeper of the
> >>>>>>>>>> release!
> >>>>>>>>>>
> >>>>>>>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >>>>>>>>>>
> >>>>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the
> >>>>>>>>>> rare pieces of software that puts so much emphasis on compatibility
> >>>>>>>>>> that it just works when upgrading / downgrading, which is amazing.
> >>>>>>>>>> One guarantee we have always had is that during a rolling upgrade the
> >>>>>>>>>> quorum will always be available, leading to no service interruption.
> >>>>>>>>>> It would be sad to lose this capability, given this is still a
> >>>>>>>>>> tractable problem.
> >>>>>>>>>>
> >>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol
> >>>>>>>>>> and speak the old message format when it's talking to an old server?
> >>>>>>>>>> Basically, an ugly if-else check against the protocol version should
> >>>>>>>>>> work, and there is no need for multiple passes of the rolling upgrade
> >>>>>>>>>> process.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eolivelli@gmail.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I suggest this plan:
> >>>>>>>>>>> - release 3.6.0 now
> >>>>>>>>>>> - improve the migration story, the flow outlined by Mate is
> >>>>>>>>>>> interesting, but it will take time
> >>>>>>>>>>>
> >>>>>>>>>>> 3.6.0rc2 got enough binding votes, so I am going to finalize
> >>>>>>>>>>> the release this evening (within 8-10 hours) if no one comes
> >>>>>>>>>>> out in the VOTE thread with a -1.
> >>>>>>>>>>>
> >>>>>>>>>>> Enrico
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 10, 2020 at 19:33, Patrick Hunt
> >>>>>>>>>>> <ph...@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <andor@apache.org>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Answers inline.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> In my experience, when you are close to a release it is better
> >>>>>>>>>>>>>> not to make big changes. (I am among the approvers of that patch,
> >>>>>>>>>>>>>> so I am responsible for this change.)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Although this statement is acceptable to me, I don’t feel this
> >>>>>>>>>>>>> patch should have been kept out of 3.6.0. Its submission was
> >>>>>>>>>>>>> preceded by a long argument with the MAPR folks, who originally
> >>>>>>>>>>>>> wanted it merged into the 3.4 branch (considering the pace at which
> >>>>>>>>>>>>> the ZooKeeper community is moving forward), and we reached an
> >>>>>>>>>>>>> agreement to release it with 3.6.0.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> To make a long story short, this patch had been outstanding for ages
> >>>>>>>>>>>>> without much attention from the community, and the contributors made
> >>>>>>>>>>>>> a lot of effort to get it done before the release.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would like to hear from people that have been in the community
> >>>>>>>>>>>>>> for a long time, then I am ready to complete the release process
> >>>>>>>>>>>>>> for 3.6.0rc2.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Me too.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I tend to accept the way rolling restart works now - as you
> >>>>>>>>>>>>> described, Enrico - and given that the situation was pretty much the
> >>>>>>>>>>>>> same between 3.4 and 3.5, I don’t feel we have to make additional
> >>>>>>>>>>>>> changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite cool, I’m
> >>>>>>>>>>>>> also happy to work on getting it in.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Fyi, Release Management page says the following:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> "major.minor release of ZooKeeper must be backwards compatible with
> >>>>>>>>>>>>> the previous minor release, major.(minor-1)"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Our users, direct and indirect, value the ability to migrate to
> >>>>>>>>>>>> newer versions - esp. as we drop support for older ones. Frictions
> >>>>>>>>>>>> such as this can be a reason to go elsewhere. I'm "pro" b/w compat -
> >>>>>>>>>>>> esp. given our published guidelines.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Patrick
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <eolivelli@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you Mate for checking and explaining this story.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188, as:
> >>>>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before starting the
> >>>>>>>>>>>>>> release process
> >>>>>>>>>>>>>> - it is the cause of the failure of the first RC
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In my experience, when you are close to a release it is better
> >>>>>>>>>>>>>> not to make big changes. (I am among the approvers of that patch,
> >>>>>>>>>>>>>> so I am responsible for this change.)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is a pointer to the change for those who want to better
> >>>>>>>>>>>>>> understand the context:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was the same,
> >>>>>>>>>>>>>> and if this statement holds then I feel we can continue
> >>>>>>>>>>>>>> with this release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> >>>>>>>>>>>>>> complex.
> >>>>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and we do
> >>>>>>>>>>>>>> not have tools to certify this compatibility (at least not in the
> >>>>>>>>>>>>>> short term)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would like to hear from people that have been in the community
> >>>>>>>>>>>>>> for a long time, then I am ready to complete the release process
> >>>>>>>>>>>>>> for 3.6.0rc2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I will update the website and the release notes with a specific
> >>>>>>>>>>>>>> warning about the upgrade; we should also update the Wiki.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Enrico

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Andor Molnar <an...@apache.org>.
Mate,

Let me reiterate to see if I understand you correctly:

1) First, we disable the multi-address feature by default.

2) When it is disabled, the quorum protocol automatically uses the old protocol version, which lets 3.5 and 3.6 servers communicate smoothly.

3) Once the user has finished the first rolling restart, the cluster is running 3.6 and communicating with the old protocol.

4) If the user wants to enable the multi-address feature (I think that’s quite rare anyway), another rolling restart is needed, but this time partitions cannot be avoided.

Is that accurate?

I think it’s not the end of the world.
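
To make the four steps above concrete, here is a minimal sketch of the two rolling restarts on a 3-node ensemble. It is a toy model, not ZooKeeper code: the names has_quorum and rolling_restart and the labels "old" / "new" are made up for illustration, and the only assumption is that servers can form a quorum only with running peers that speak the same protocol version.

```python
def has_quorum(protocols, down=None):
    """True if some group of running servers sharing a protocol is a majority."""
    majority = len(protocols) // 2 + 1
    counts = {}
    for server_id, protocol in protocols.items():
        if server_id != down:
            counts[protocol] = counts.get(protocol, 0) + 1
    return max(counts.values(), default=0) >= majority


def rolling_restart(protocols, protocol_after_restart):
    """Restart servers one by one; record quorum while down and after restart."""
    protocols = dict(protocols)  # work on a copy, keep the caller's dict intact
    history = []
    for server_id in sorted(protocols):
        # The server is stopped: only the remaining ones can vote.
        history.append(has_quorum(protocols, down=server_id))
        # The server comes back speaking the protocol chosen by its config.
        protocols[server_id] = protocol_after_restart
        history.append(has_quorum(protocols))
    return history


ensemble = {1: "old", 2: "old", 3: "old"}

# Steps 1-3: upgrade with the feature disabled, everyone keeps the old protocol.
first_pass = rolling_restart(ensemble, "old")

# Step 4: a second rolling restart that switches each server to the new protocol.
second_pass = rolling_restart(ensemble, "new")

print(first_pass)   # quorum is available at every step
print(second_pass)  # one step without quorum, while the second server is down
```

In this model the only step without a quorum in the second pass is the moment when one server is on the new protocol, one is on the old protocol, and the third is being restarted.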

Andor




> On 2020. Feb 11., at 21:27, Szalay-Bekő Máté <sz...@gmail.com> wrote:
> 
> I see the main problem here in the fact that we are missing proper
> versioning in the leader election / quorum protocols. I tried to simply
> implement backward compatibility in 3.6, but it didn't solve the problem.
> The new code understands the old protocol, but it can not decide when to
> use the new or the old protocol during connection initiation. So the old
> servers can not read the new init messages and we still temporarily end up
> having two partitions during rolling restart.
> 
> I already suggested two ways to handle this later, but I think for 3.6.0
> now the simplest solution is to disable the new MultiAddress feature and
> stick to the old protocol version by default. Plus extend the documentation
> with the note, that enabling the MultiAddress feature is not possible
> during a rolling upgrade, but it needs to be done with a separate rolling
> restart. With this approach, the rolling restart should "just work" with
> the 3.4 / 3.5 configs and we don't require any extra step / configuration
> from the users, unless they want to use the new feature. I plan to submit a
> PR with these changes tomorrow to ZOOKEEPER-3720, if there isn't any
> different opinion.
> 
> P.S. For 4.0 we might need to put some extra thinking into backward
> compatibility / versioning for the quorum and client protocols.
> 
> 
> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> wrote:
> 
>> I hate to say it, but I think 3.6.0 should release as is.  It is impossible
>> to *reliably* retrofit backwards compatibility / interoperability onto a
>> release that was engineered from the beginning without that goal.  Learn
>> the lesson, set goals differently in the future.
>> 
>> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
>> szalay.beko.mate@gmail.com>
>> wrote:
>> 
>>> FYI: I created these scripts for my local tests:
>>> https://github.com/symat/zk-rolling-upgrade-test
>>> 
>>> For the long term I would also add some script that actually monitors the
>>> state of the quorum and also runs continuous traffic, not just 1-2
>>> smoketests after each restart. But I don't know how important this would
>>> be.
>>> 
>>> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
>>> wrote:
>>> 
>>>> Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
>>>> <an...@apache.org> ha scritto:
>>>>> 
>>>>> The most obvious one which crosses my mind is that I previously
>> worked
>>>> on:
>>>>> 
>>>>> 1) run old version cluster,
>>>>> 2) connect to each node and run smoke tests,
>>>>> 3) restart one node with new code,
>>>>> 4) goto 2) until all nodes are upgraded
>>>>> 
>>>>> I think this wouldn’t work in a “unit test”, we probably need a
>>> separate
>>>> Jenkins job and a nice python script to do this.
>>>>> 
>>>>> Andor
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
>>>>>> 
>>>>>> Anyone have ideas how we could add testing for upgrade? Obviously
>>>> something
>>>>>> we're missing, esp given it's important.
>>>> 
>>>> I will send an email next days with a proposal.
>>>> btw my idea is very like Andor's one
>>>> 
>>>> Once we have an automatic environment we can launch from Jenkins
>>>> 
>>>> Enrico
>>>> 
>>>> 
>>>>>> 
>>>>>> Patrick
>>>>>> 
>>>>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
>>> eolivelli@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
>>>>>>> <sz...@gmail.com> ha scritto:
>>>>>>>> 
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> about the question from Michael:
>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
>>> protocol
>>>> and
>>>>>>>>> speak old message format when it's talking to old server?
>>>>>>>> 
>>>>>>>> In this particular case, it might be enough. The protocol change
>>>> happened
>>>>>>>> now in the 'initial message' sent by the QuorumCnxManager. Maybe
>> it
>>>> is
>>>>>>> not
>>>>>>>> a problem if the new servers can not initiate channels to the old
>>>>>>> servers,
>>>>>>>> maybe it is enough if these channel gets initiated by the old
>>> servers
>>>>>>> only.
>>>>>>>> I will test it quickly.
>>>>>>>> 
>>>>>>>> Although I have no idea if any other thing changed in the quorum
>>>> protocol
>>>>>>>> between 3.5 and 3.6. In other cases it might not be enough if the
>>> new
>>>>>>>> servers can understand the old messages, as the old servers can
>>>> break by
>>>>>>>> not understanding the messages from the new servers. Also, in the
>>>> code
>>>>>>>> currently (AFAIK) there is no generic knowledge of protocol
>>>> versions, the
>>>>>>>> servers are not storing that which protocol versions they
>>> can/should
>>>> use
>>>>>>> to
>>>>>>>> communicate to which particular other servers. Maybe we don't
>> even
>>>> need
>>>>>>>> this, but I would feel better if we would have more tests around
>>>> these
>>>>>>>> things.
>>>>>>>> 
>>>>>>>> My suggestion for the long term:
>>>>>>>> - let's fix this particular issue now with 3.6.0 quickly (I start
>>>> doing
>>>>>>>> this today)
>>>>>>>> - let's do some automation (backed up with jenkins) that will
>> test
>>> a
>>>>>>> whole
>>>>>>>> combinations of different ZooKeeper upgrade paths by making
>> rolling
>>>>>>>> upgrades during some light traffic. Let's have a bit better
>>>> definition
>>>>>>>> about what we expect (e.g. the quorum is up, but some clients can
>>> get
>>>>>>>> disconnected? What will happen to the ephemeral nodes? Do we want
>>> to
>>>>>>>> gracefully close or transfer the user sessions before stopping
>> the
>>>> old
>>>>>>>> server?) and let's see where this broke. Just by checking the
>>> code, I
>>>>>>> don't
>>>>>>>> think the quorum will always be up (e.g. between older 3.4
>> versions
>>>> and
>>>>>>>> 3.5).
>>>>>>> 
>>>>>>> 
>>>>>>> I am happy to work on this topic
>>>>>>> 
>>>>>>>> - we need to update the Wiki about the working rolling upgrade
>>> paths
>>>> and
>>>>>>>> maybe about workarounds if needed
>>>>>>>> - we might need to do some fixes (adding backward compatible
>>> versions
>>>>>>>> and/or specific parameters that enforce old protocol temporary
>>>> during the
>>>>>>>> rolling upgrade that can be changed later to the new protocol by
>>>> either
>>>>>>>> dynamic reconfig or by rolling restart)
>>>>>>> 
>>>>>>> it would be much better on 3.6 code to have some support for
>>>>>>> compatibility with 3.5 servers
>>>>>>> we can't require old code to be forward compatible but we can make
>>> new
>>>>>>> code be compatible to a certain extent with old code.
>>>>>>> If we can achieve this compatibility goal without a flag is
>> better,
>>>>>>> users won't have to care about this part and they simply "trust"
>> on
>>> us
>>>>>>> 
>>>>>>> The rollback story is also important, but maybe we are still not
>>> ready
>>>>>>> for it, in case of local changes to store,
>>>>>>> it is better to have a clear design and plan and work for a new
>>>> release
>>>>>>> (3.7?)
>>>>>>> 
>>>>>>> Enrico
>>>>>>> 
>>>>>>>> 
>>>>>>>> Depending on your comments, I am happy to create a few Jira
>> tickets
>>>>>>> around
>>>>>>>> these topics.
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> Mate
>>>>>>>> 
>>>>>>>> ps. Enrico, sorry about your RC... I owe you a beer, let me know
>> if
>>>> you
>>>>>>> are
>>>>>>>> near to Budapest ;)
>>>>>>>> 
>>>>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
>>> eolivelli@gmail.com
>>>>> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Good.
>>>>>>>>> 
>>>>>>>>> I will cancel the vote for 3.6.0rc2.
>>>>>>>>> 
>>>>>>>>> I appreciate very much If Mate and his colleagues have time to
>>> work
>>>> on
>>>>>>> a
>>>>>>>>> fix.
>>>>>>>>> Otherwise I will have cycles next week
>>>>>>>>> 
>>>>>>>>> I would also like to spend my time in setting up a few minimal
>>>>>>> integration
>>>>>>>>> tests about the upgrade story
>>>>>>>>> 
>>>>>>>>> Enrico
>>>>>>>>> 
>>>>>>>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
>>> scritto:
>>>>>>>>> 
>>>>>>>>>> Kudos Enrico, very thorough work as the final gate keeper of
>> the
>>>>>>> release!
>>>>>>>>>> 
>>>>>>>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
>>>>>>>>>> 
>>>>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
>>> the
>>>>>>> rare
>>>>>>>>>> piece of software that put so much emphasis on compatibilities
>>> thus
>>>>>>> it
>>>>>>>>> just
>>>>>>>>>> works when upgrade / downgrade, which is amazing. One guarantee
>>> we
>>>>>>> always
>>>>>>>>>> had is during rolling upgrade, the quorum will always be
>>> available,
>>>>>>>>> leading
>>>>>>>>>> to no service interruption. It would be sad we lose such
>>> capability
>>>>>>> given
>>>>>>>>>> this is still a tractable problem.
>>>>>>>>>> 
>>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
>>> protocol
>>>>>>> and
>>>>>>>>>> speak old message format when it's talking to old server?
>>>> Basically,
>>>>>>> an
>>>>>>>>>> ugly if else check against the protocol version should work and
>>>>>>> there is
>>>>>>>>> no
>>>>>>>>>> need to have multiple pass on rolling upgrade process.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
>>>>>>> eolivelli@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I suggest this plan:
>>>>>>>>>>> - release 3.6.0 now
>>>>>>>>>>> - improve the migration story, the flow outlined by Mate is
>>>>>>>>>>> interesting, but it will take time
>>>>>>>>>>> 
>>>>>>>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
>> the
>>>>>>>>>>> release this evening (within 8-10 hours) if no one comes out
>> in
>>>> the
>>>>>>>>>>> VOTE thread with a -1
>>>>>>>>>>> 
>>>>>>>>>>> Enrico
>>>>>>>>>>> 
>>>>>>>>>>> Enrico
>>>>>>>>>>> 
>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
>>>>>>>>>>> <ph...@apache.org> ha scritto:
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
>> andor@apache.org
>>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Answers inline.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In my experience when you are close to a release it is better not to
>>>>>>>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
>>>>>>>>>>>>>> responsible for this change)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Although this statement is acceptable for me, I don’t feel
>>> this
>>>>>>>>> patch
>>>>>>>>>>>>> should not have been merged into 3.6.0. Submission has been
>>>>>>>>> preceded
>>>>>>>>>>> by a
>>>>>>>>>>>>> long argument with MAPR folks who originally wanted to be
>>>>>>> merged
>>>>>>>>> into
>>>>>>>>>>> 3.4
>>>>>>>>>>>>> branch (considering the pace how ZooKeeper community is
>> moving
>>>>>>>>>>> forward) and
>>>>>>>>>>>>> we reached an agreement that release it with 3.6.0.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Make a long story short, this patch has been outstanding for
>>>>>>> ages
>>>>>>>>>>> without
>>>>>>>>>>>>> much attention from the community and contributors made a
>> lot
>>>>>>> of
>>>>>>>>>>> effort to
>>>>>>>>>>>>> get it done before the release.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I would like to hear from people that have been in the
>>>>>>> community
>>>>>>>>> for
>>>>>>>>>>>>>> long time, then I am ready to complete the release process
>>>>>>> for
>>>>>>>>>>>>>> 3.6.0rc2.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Me too.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I tend to accept the way rolling restart works now - as you
>>>>>>>>> described
>>>>>>>>>>>>> Enrico - and given that situation was pretty much the same
>>>>>>> between
>>>>>>>>>> 3.4
>>>>>>>>>>> and
>>>>>>>>>>>>> 3.5, I don’t feel we have to make additional changes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite
>>>>>>> cool,
>>>>>>>>> I’m
>>>>>>>>>>> also
>>>>>>>>>>>>> happy to work on getting it in.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Fyi, Release Management page says the following:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>>>>>>>>>>>>>
>>>>>>>>>>>>> "major.minor release of ZooKeeper must be backwards compatible with the
>>>>>>>>>>>>> previous minor release, major.(minor-1)"
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> Our users, direct and indirect, value the ability to migrate
>> to
>>>>>>> newer
>>>>>>>>>>>> versions - esp as we drop support for older. Frictions such
>> as
>>>>>>> this
>>>>>>>>> can
>>>>>>>>>>> be
>>>>>>>>>>>> a reason to go elsewhere. I'm "pro" b/w compact - esp given
>> our
>>>>>>>>>> published
>>>>>>>>>>>> guidelines.
>>>>>>>>>>>> 
>>>>>>>>>>>> Patrick
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> Andor
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
>>>>>>> eolivelli@gmail.com
>>>>>>>>>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thank you Mate for checking and explaining this story.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188
>>>>>>> as:
>>>>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before
>>>>>>> starting the
>>>>>>>>>>>>>> release process
>>>>>>>>>>>>>> - it is the cause of the failure of the first RC
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In my experience when you are close to a release it is better not to
>>>>>>>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
>>>>>>>>>>>>>> responsible for this change)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is a pointer to the change for anyone who wants to understand
>>>>>>>>>>>>>> the context better:
>>>>>>>>>>>>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was the
>>>>>>> same
>>>>>>>>>> and
>>>>>>>>>>>>>> if this statement holds then I feel we can continue
>>>>>>>>>>>>>> with this release.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
>> too
>>>>>>>>>>> complex.
>>>>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and we
>>>>>>> do
>>>>>>>>> not
>>>>>>>>>>>>>> have tools to certify this compatibility (at least not in
>> the
>>>>>>>>> short
>>>>>>>>>>>>>> term)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I would like to hear from people that have been in the
>>>>>>> community
>>>>>>>>> for
>>>>>>>>>>>>>> long time, then I am ready to complete the release process
>>>>>>> for
>>>>>>>>>>>>>> 3.6.0rc2.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I will update the website and the release notes with a
>>>>>>> specific
>>>>>>>>>>>>>> warning about the upgrade, we should also update the Wiki
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
>>>>>>>>>>>>>> <sz...@gmail.com> ha scritto:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Enrico!
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
>>>>>>>>>>>>> QuorumCnxManager.
>>>>>>>>>>>>>>> The Protocol version  was changed last time in
>>>>>>> ZOOKEEPER-2186
>>>>>>>>>>> released
>>>>>>>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some
>>>>>>> bugs.
>>>>>>>>>>> Later I
>>>>>>>>>>>>>>> also changed the protocol version when the format of the
>>>>>>> initial
>>>>>>>>>>> message
>>>>>>>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum protocol
>>>>>>> is
>>>>>>>>> not
>>>>>>>>>>>>>>> compatible in this case and is the 'expected' behavior if
>>>>>>> you
>>>>>>>>>>> upgrade
>>>>>>>>>>>>> e.g
>>>>>>>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
>> to
>>>>>>>>> 3.6.0.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
>>>>>>> then and
>>>>>>>>>>> got to
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> conclusion that it is not that bad, as there will be no
>> data
>>>>>>>>> loss
>>>>>>>>>>> as you
>>>>>>>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
>>>>>>> should
>>>>>>>>>>> ensure
>>>>>>>>>>>>>>> both backward and forward compatibility to make sure that
>>>>>>> the
>>>>>>>>> old
>>>>>>>>>>> and
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> new part of the quorum can still speak to each other. The
>>>>>>>>> current
>>>>>>>>>>>>> solution
>>>>>>>>>>>>>>> (simply failing if the protocol versions mismatch) is more
>>>>>>>>> simple
>>>>>>>>>>> and
>>>>>>>>>>>>> still
>>>>>>>>>>>>>>> working just fine: as the servers are restarted
>> one-by-one,
>>>>>>> the
>>>>>>>>>>> nodes
>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> the old protocol version and the nodes with the new
>> protocol
>>>>>>>>>> version
>>>>>>>>>>>>> will
>>>>>>>>>>>>>>> form two partitions, but any given time only one partition
>>>>>>> will
>>>>>>>>>>> have the
>>>>>>>>>>>>>>> quorum.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Still, thinking it through, as a side effect in these cases
>>>>>>> there
>>>>>>>>>>> will
>>>>>>>>>>>>> be a
>>>>>>>>>>>>>>> short time when none of the partitions will have quorums
>>>>>>> (when
>>>>>>>>> we
>>>>>>>>>>> have N
>>>>>>>>>>>>>>> servers with the old protocol version, N servers with the
>>>>>>> new
>>>>>>>>>>> protocol
>>>>>>>>>>>>>>> version, and there is one server just being restarted). I
>>>>>>> am not
>>>>>>>>>>> sure
>>>>>>>>>>>>> if we
>>>>>>>>>>>>>>> can accept this.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
>>>>>>> possible
>>>>>>>>> to
>>>>>>>>>>> parse
>>>>>>>>>>>>>>> the initial message of the old protocol version with the
>> new
>>>>>>>>> code.
>>>>>>>>>>> But
>>>>>>>>>>>>> I am
>>>>>>>>>>>>>>> not sure if it would be enough (as the old code will not
>> be
>>>>>>> able
>>>>>>>>>> to
>>>>>>>>>>>>> parse
>>>>>>>>>>>>>>> the new initial message).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> One option can be to make a patch also for 3.5 to have a
>>>>>>> version
>>>>>>>>>>> which
>>>>>>>>>>>>>>> supports both protocol versions. (let's say in 3.5.8) Then
>>>>>>> we
>>>>>>>>> can
>>>>>>>>>>> write
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> the release note, that if you need rolling upgrade from
>> any
>>>>>>>>>> versions
>>>>>>>>>>>>> since
>>>>>>>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
>>>>>>>>> upgrading
>>>>>>>>>> to
>>>>>>>>>>>>> 3.6.0.
>>>>>>>>>>>>>>> We can even make the same thing on the 3.4 branch.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But I am also new to the community... It would be great to
>>>>>>> hear
>>>>>>>>>> the
>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>> of more experienced people.
>>>>>>>>>>>>>>> Whatever the decision will be, I am happy to make the
>>>>>>> changes.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> And sorry for breaking the RC (if we decide that this
>> needs
>>>>>>> to
>>>>>>>>> be
>>>>>>>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>> Mate
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
>>>>>>>>>>> eolivelli@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
>>>>>>> closing the
>>>>>>>>>>> VOTE
>>>>>>>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
>> an
>>>>>>>>>> apparent
>>>>>>>>>>>>>>>> blocker.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
>>>>>>> looks
>>>>>>>>>> like
>>>>>>>>>>>>>>>> peers are not able to talk to each other.
>>>>>>>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
>>>>>>>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
>>>>>>> errors on
>>>>>>>>>> 3.5
>>>>>>>>>>>>> nodes:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
>>>>>>>>>>>>>>>> connection request 127.0.0.1:62591
>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
>>>>>>>>>>>>>>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
>>>>>>>>>>>>>>>> Got unrecognized protocol version -65535
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Once I upgrade all of the peers the system is up and
>>>>>>> running,
>>>>>>>>>>> without
>>>>>>>>>>>>>>>> apparently no data loss.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
>>>>>>> say,
>>>>>>>>>>> server1,
>>>>>>>>>>>>>>>> server1 is not able to accept connections (error "Close
>> of
>>>>>>>>>> session
>>>>>>>>>>> 0x0
>>>>>>>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
>>>>>>>>> clients,
>>>>>>>>>>> this
>>>>>>>>>>>>>>>> is expected, because as far as it cannot talk with the
>>>>>>> other
>>>>>>>>>> peers
>>>>>>>>>>> it
>>>>>>>>>>>>>>>> is practically partitioned away from the cluster.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> My questions are:
>>>>>>>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
>>>>>>> from
>>>>>>>>> 3.5
>>>>>>>>>> to
>>>>>>>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
>> ago,
>>>>>>>>> and I
>>>>>>>>>>> was
>>>>>>>>>>>>>>>> not in the community as dev so I cannot tell
>>>>>>>>>>>>>>>> 2) is this a viable option for users ? to have some
>>>>>>> temporary
>>>>>>>>>>> glitch
>>>>>>>>>>>>>>>> during the upgrade and hope that the upgrade completes
>>>>>>> without
>>>>>>>>>>>>>>>> troubles ?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In theory as long as two servers are running the same
>> major
>>>>>>>>>> version
>>>>>>>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
>>>>>>> make
>>>>>>>>>>> progress
>>>>>>>>>>>>>>>> and to server clients.
>>>>>>>>>>>>>>>> I feel that this is quite dangerous, but I don't have
>>>>>>> enough
>>>>>>>>>>> context
>>>>>>>>>>>>>>>> to understand how this problem is possible and when we
>>>>>>> decided
>>>>>>>>> to
>>>>>>>>>>>>>>>> break compatibility.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The other option is that I am wrong in my test and I am
>>>>>>> messing
>>>>>>>>>> up
>>>>>>>>>>> :-)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The other upgrade path I would like to see working like a
>>>>>>> charm
>>>>>>>>>> is
>>>>>>>>>>> the
>>>>>>>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
>>>>>>> release
>>>>>>>>> 3.6
>>>>>>>>>> we
>>>>>>>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
Michael,
your points are valid.
I would like to see the proposal from Mate.
Up to ZOOKEEPER-3188, no other patch in 3.6 (from my limited point of
view) introduced changes in the quorum peer protocol that make it
incompatible with 3.5.
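
For readers who want to see why a single version bump is enough to split the ensemble, here is a simplified sketch of a version check on the first bytes of a quorum connection. This is not the actual QuorumCnxManager code: parse_initial_message, InitialMessageError and the constant names are hypothetical, the wire layout is reduced to one 8-byte big-endian number, and the old value -65536 is my assumption. Only -65535 is taken from the error log quoted in this thread.

```python
import struct

OLD_PROTOCOL_VERSION = -65536  # assumption: value used by the 3.5 servers
NEW_PROTOCOL_VERSION = -65535  # value reported in the error log in this thread


class InitialMessageError(Exception):
    """Raised when the peer announces a protocol version we do not know."""


def parse_initial_message(data, accepted_versions):
    """Read the leading 8-byte big-endian version and reject unknown ones."""
    (version,) = struct.unpack(">q", data[:8])
    if version not in accepted_versions:
        raise InitialMessageError(
            "Got unrecognized protocol version %d" % version
        )
    return version


# An old server accepts only the old version, so a new peer is rejected.
from_new_peer = struct.pack(">q", NEW_PROTOCOL_VERSION)
try:
    parse_initial_message(from_new_peer, {OLD_PROTOCOL_VERSION})
    rejected = None
except InitialMessageError as error:
    rejected = str(error)

# A new server that knows both versions can still read an old peer's message.
from_old_peer = struct.pack(">q", OLD_PROTOCOL_VERSION)
accepted = parse_initial_message(
    from_old_peer, {OLD_PROTOCOL_VERSION, NEW_PROTOCOL_VERSION}
)
```

The sketch also shows the asymmetry discussed in this thread: teaching the new code to read the old message does not help the old code read the new one.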

Enrico

Il giorno mar 11 feb 2020 alle ore 23:35 Michael K. Edwards
<m....@gmail.com> ha scritto:
>
> I think it would be prudent to emphasize in the release notes that rolling
> upgrades (and mixed ensembles generally) are effectively untested.  That
> this was, in practice, a non-goal of this release cycle.  Because if we can
> get to rc2 without noticing a showstopper, clearly it's not something that
> anyone has gotten around to attempting; and there have to be a hundred
> corner cases beyond the MultiAddress issue.
>
> On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
> szalay.beko.mate@gmail.com> wrote:
>
> > I see the main problem here in the fact that we are missing proper
> > versioning in the leader election / quorum protocols. I tried to simply
> > implement backward compatibility in 3.6, but it didn't solve the problem.
> > The new code understands the old protocol, but it can not decide when to
> > use the new or the old protocol during connection initiation. So the old
> > servers can not read the new init messages and we still temporarily end up
> > having two partitions during rolling restart.
> >
> > I already suggested two ways to handle this later, but I think for 3.6.0
> > now the simplest solution is to disable the new MultiAddress feature and
> > stick to the old protocol version by default. Plus extend the
> > documentation with the note, that enabling the MultiAddress feature is not
> > possible during a rolling upgrade, but it needs to be done with a separate
> > rolling restart. With this approach, the rolling restart should "just work"
> > with the 3.4 / 3.5 configs and we don't require any extra step /
> > configuration from the users, unless they want to use the new feature. I
> > plan to submit a PR with these changes tomorrow to ZOOKEEPER-3720, if there
> > isn't any different opinion.
> >
> > P.S. For 4.0 we might need to put some extra thinking into backward
> > compatibility / versioning for the quorum and client protocols.
> >
> >
> > On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> > wrote:
> >
> >> I hate to say it, but I think 3.6.0 should release as is.  It is
> >> impossible
> >> to *reliably* retrofit backwards compatibility / interoperability onto a
> >> release that was engineered from the beginning without that goal.  Learn
> >> the lesson, set goals differently in the future.
> >>
> >> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> >> szalay.beko.mate@gmail.com>
> >> wrote:
> >>
> >> > FYI: I created these scripts for my local tests:
> >> > https://github.com/symat/zk-rolling-upgrade-test
> >> >
> >> > For the long term I would also add some script that actually monitors
> >> the
> >> > state of the quorum and also runs continuous traffic, not just 1-2
> >> > smoketests after each restart. But I don't know how important this would
> >> > be.
> >> >
> >> > On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> >> > wrote:
> >> >
> >> > > Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
> >> > > <an...@apache.org> ha scritto:
> >> > > >
> >> > > > The most obvious one which crosses my mind is that I previously
> >> worked
> >> > > on:
> >> > > >
> >> > > > 1) run old version cluster,
> >> > > > 2) connect to each node and run smoke tests,
> >> > > > 3) restart one node with new code,
> >> > > > 4) goto 2) until all nodes are upgraded
> >> > > >
> >> > > > I think this wouldn’t work in a “unit test”, we probably need a
> >> > separate
> >> > > Jenkins job and a nice python script to do this.
> >> > > >
> >> > > > Andor
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
> >> wrote:
> >> > > > >
> >> > > > > Anyone have ideas how we could add testing for upgrade? Obviously
> >> > > something
> >> > > > > we're missing, esp given it's important.
> >> > >
> >> > > I will send an email next days with a proposal.
> >> > > btw my idea is very like Andor's one
> >> > >
> >> > > Once we have an automatic environment we can launch from Jenkins
> >> > >
> >> > > Enrico
> >> > >
> >> > >
> >> > > > >
> >> > > > > Patrick
> >> > > > >
> >> > > > > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> >> > eolivelli@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > >> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
> >> > > > >> <sz...@gmail.com> ha scritto:
> >> > > > >>>
> >> > > > >>> Hi All,
> >> > > > >>>
> >> > > > >>> about the question from Michael:
> >> > > > >>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >> > protocol
> >> > > and
> >> > > > >>>> speak old message format when it's talking to old server?
> >> > > > >>>
> >> > > > >>> In this particular case, it might be enough. The protocol change
> >> > > happened
> >> > > > >>> now in the 'initial message' sent by the QuorumCnxManager.
> >> Maybe it
> >> > > is
> >> > > > >> not
> >> > > > >>> a problem if the new servers can not initiate channels to the
> >> old
> >> > > > >> servers,
> >> > > > >>> maybe it is enough if these channel gets initiated by the old
> >> > servers
> >> > > > >> only.
> >> > > > >>> I will test it quickly.
> >> > > > >>>
> >> > > > >>> Although I have no idea if any other thing changed in the quorum
> >> > > protocol
> >> > > > >>> between 3.5 and 3.6. In other cases it might not be enough if
> >> the
> >> > new
> >> > > > >>> servers can understand the old messages, as the old servers can
> >> > > break by
> >> > > > >>> not understanding the messages from the new servers. Also, in
> >> the
> >> > > code
> >> > > > >>> currently (AFAIK) there is no generic knowledge of protocol
> >> > > versions, the
> >> > > > >>> servers are not storing that which protocol versions they
> >> > can/should
> >> > > use
> >> > > > >> to
> >> > > > >>> communicate to which particular other servers. Maybe we don't
> >> even
> >> > > need
> >> > > > >>> this, but I would feel better if we would have more tests around
> >> > > these
> >> > > > >>> things.
> >> > > > >>>
> >> > > > >>> My suggestion for the long term:
> >> > > > >>> - let's fix this particular issue now with 3.6.0 quickly (I
> >> start
> >> > > doing
> >> > > > >>> this today)
> >> > > > >>> - let's do some automation (backed up with jenkins) that will
> >> test
> >> > a
> >> > > > >> whole
> >> > > > >>> combinations of different ZooKeeper upgrade paths by making
> >> rolling
> >> > > > >>> upgrades during some light traffic. Let's have a bit better
> >> > > definition
> >> > > > >>> about what we expect (e.g. the quorum is up, but some clients
> >> can
> >> > get
> >> > > > >>> disconnected? What will happen to the ephemeral nodes? Do we
> >> want
> >> > to
> >> > > > >>> gracefully close or transfer the user sessions before stopping
> >> the
> >> > > old
> >> > > > >>> server?) and let's see where this broke. Just by checking the
> >> > code, I
> >> > > > >> don't
> >> > > > >>> think the quorum will always be up (e.g. between older 3.4
> >> versions
> >> > > and
> >> > > > >>> 3.5).
> >> > > > >>
> >> > > > >>
> >> > > > >> I am happy to work on this topic
> >> > > > >>
> >> > > > >>> - we need to update the Wiki about the working rolling upgrade
> >> > paths
> >> > > and
> >> > > > >>> maybe about workarounds if needed
> >> > > > >>> - we might need to do some fixes (adding backward compatible
> >> > versions
> >> > > > >>> and/or specific parameters that enforce old protocol temporary
> >> > > during the
> >> > > > >>> rolling upgrade that can be changed later to the new protocol by
> >> > > either
> >> > > > >>> dynamic reconfig or by rolling restart)
> >> > > > >>
> >> > > > >> it would be much better on 3.6 code to have some support for
> >> > > > >> compatibility with 3.5 servers
> >> > > > >> we can't require old code to be forward compatible but we can
> >> make
> >> > new
> >> > > > >> code be compatible to a certain extent with old code.
> >> > > > >> If we can achieve this compatibility goal without a flag is
> >> better,
> >> > > > >> users won't have to care about this part and they simply "trust"
> >> on
> >> > us
> >> > > > >>
> >> > > > >> The rollback story is also important, but maybe we are still not
> >> > ready
> >> > > > >> for it, in case of local changes to store,
> >> > > > >> it is better to have a clear design and plan and work for a new
> >> > > release
> >> > > > >> (3.7?)
> >> > > > >>
> >> > > > >> Enrico
> >> > > > >>
> >> > > > >>>
> >> > > > >>> Depending on your comments, I am happy to create a few Jira
> >> tickets
> >> > > > >> around
> >> > > > >>> these topics.
> >> > > > >>>
> >> > > > >>> Kind regards,
> >> > > > >>> Mate
> >> > > > >>>
> >> > > > >>> ps. Enrico, sorry about your RC... I owe you a beer, let me
> >> know if
> >> > > you
> >> > > > >> are
> >> > > > >>> near to Budapest ;)
> >> > > > >>>
> >> > > > >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> >> > eolivelli@gmail.com
> >> > > >
> >> > > > >> wrote:
> >> > > > >>>
> >> > > > >>>> Good.
> >> > > > >>>>
> >> > > > >>>> I will cancel the vote for 3.6.0rc2.
> >> > > > >>>>
> >> > > > >>>> I appreciate very much If Mate and his colleagues have time to
> >> > work
> >> > > on
> >> > > > >> a
> >> > > > >>>> fix.
> >> > > > >>>> Otherwise I will have cycles next week
> >> > > > >>>>
> >> > > > >>>> I would also like to spend my time in setting up a few minimal
> >> > > > >> integration
> >> > > > >>>> tests about the upgrade story
> >> > > > >>>>
> >> > > > >>>> Enrico
> >> > > > >>>>
> >> > > > >>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
> >> > scritto:
> >> > > > >>>>
> >> > > > >>>>> Kudos Enrico, very thorough work as the final gate keeper of
> >> the
> >> > > > >> release!
> >> > > > >>>>>
> >> > > > >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >> > > > >>>>>
> >> > > > >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
> >> > the
> >> > > > >> rare
> >> > > > >>>>> piece of software that put so much emphasis on compatibilities
> >> > thus
> >> > > > >> it
> >> > > > >>>> just
> >> > > > >>>>> works when upgrade / downgrade, which is amazing. One
> >> guarantee
> >> > we
> >> > > > >> always
> >> > > > >>>>> had is during rolling upgrade, the quorum will always be
> >> > available,
> >> > > > >>>> leading
> >> > > > >>>>> to no service interruption. It would be sad we lose such
> >> > capability
> >> > > > >> given
> >> > > > >>>>> this is still a tractable problem.
> >> > > > >>>>>
> >> > > > >>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >> > protocol
> >> > > > >> and
> >> > > > >>>>> speak old message format when it's talking to old server?
> >> > > Basically,
> >> > > > >> an
> >> > > > >>>>> ugly if else check against the protocol version should work
> >> and
> >> > > > >> there is
> >> > > > >>>> no
> >> > > > >>>>> need to have multiple pass on rolling upgrade process.
> >> > > > >>>>>
> >> > > > >>>>>
> >> > > > >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> >> > > > >> eolivelli@gmail.com>
> >> > > > >>>>> wrote:
> >> > > > >>>>>
> >> > > > >>>>>> I suggest this plan:
> >> > > > >>>>>> - release 3.6.0 now
> >> > > > >>>>>> - improve the migration story, the flow outlined by Mate is
> >> > > > >>>>>> interesting, but it will take time
> >> > > > >>>>>>
> >> > > > >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
> >> the
> >> > > > >>>>>> release this evening (within 8-10 hours) if no one comes out
> >> in
> >> > > the
> >> > > > >>>>>> VOTE thread with a -1
> >> > > > >>>>>>
> >> > > > >>>>>> Enrico
> >> > > > >>>>>>
> >> > > > >>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> >> > > > >>>>>> <ph...@apache.org> ha scritto:
> >> > > > >>>>>>>
> >> > > > >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
> >> andor@apache.org
> >> > >
> >> > > > >>>> wrote:
> >> > > > >>>>>>>
> >> > > > >>>>>>>> Hi,
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Answers inline.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> >> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch,
> >> > > > >> so I
> >> > > > >>>> am
> >> > > > >>>>>>>>> responsible for this change)
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Although this statement is acceptable for me, I don’t feel
> >> > this
> >> > > > >>>> patch
> >> > > > >>>>>>>> should not have been merged into 3.6.0. Submission has been
> >> > > > >>>> preceded
> >> > > > >>>>>> by a
> >> > > > >>>>>>>> long argument with MAPR folks who originally wanted to be
> >> > > > >> merged
> >> > > > >>>> into
> >> > > > >>>>>> 3.4
> >> > > > >>>>>>>> branch (considering the pace how ZooKeeper community is
> >> moving
> >> > > > >>>>>> forward) and
> >> > > > >>>>>>>> we reached an agreement that release it with 3.6.0.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Make a long story short, this patch has been outstanding
> >> for
> >> > > > >> ages
> >> > > > >>>>>> without
> >> > > > >>>>>>>> much attention from the community and contributors made a
> >> lot
> >> > > > >> of
> >> > > > >>>>>> effort to
> >> > > > >>>>>>>> get it done before the release.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>> I would like to hear from people that have been in the
> >> > > > >> community
> >> > > > >>>> for
> >> > > > >>>>>>>>> long time, then I am ready to complete the release process
> >> > > > >> for
> >> > > > >>>>>>>>> 3.6.0rc2.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Me too.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> I tend to accept the way rolling restart works now - as you
> >> > > > >>>> described
> >> > > > >>>>>>>> Enrico - and given that situation was pretty much the same
> >> > > > >> between
> >> > > > >>>>> 3.4
> >> > > > >>>>>> and
> >> > > > >>>>>>>> 3.5, I don’t feel we have to make additional changes.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> On the other hand, the fix that Mate suggested sounds quite
> >> > > > >> cool,
> >> > > > >>>> I’m
> >> > > > >>>>>> also
> >> > > > >>>>>>>> happy to work on getting it in.
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> Fyi, Release Management page says the following:
> >> > > > >>>>>>>>
> >> > > > >>>>>>
> >> > > > >>>>
> >> > > > >>
> >> > >
> >> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >> > > > >>>>>>>>
> >> > > > >>>>>>>> "major.minor release of ZooKeeper must be backwards
> >> compatible
> >> > > > >> with
> >> > > > >>>>> the
> >> > > > >>>>>>>> previous minor release, major.(minor-1)"
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>> Our users, direct and indirect, value the ability to
> >> migrate to
> >> > > > >> newer
> >> > > > >>>>>>> versions - esp as we drop support for older. Frictions such
> >> as
> >> > > > >> this
> >> > > > >>>> can
> >> > > > >>>>>> be
> >> > > > >>>>>>> a reason to go elsewhere. I'm "pro" b/w compat - esp given
> >> our
> >> > > > >>>>> published
> >> > > > >>>>>>> guidelines.
> >> > > > >>>>>>>
> >> > > > >>>>>>> Patrick
> >> > > > >>>>>>>
> >> > > > >>>>>>>
> >> > > > >>>>>>>> Andor
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> >> > > > >> eolivelli@gmail.com
> >> > > > >>>>>
> >> > > > >>>>>> wrote:
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> Thank you Mate for checking and explaining this story.
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> I find it very interesting that the cause is
> >> ZOOKEEPER-3188
> >> > > > >> as:
> >> > > > >>>>>>>>> - it is the last "big patch" committed to 3.6 before
> >> > > > >> starting the
> >> > > > >>>>>>>>> release process
> >> > > > >>>>>>>>> - it is the cause of the failure of the first RC
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> >> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch,
> >> > > > >> so I
> >> > > > >>>> am
> >> > > > >>>>>>>>> responsible for this change)
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> This is a pointer to the change for anyone who wants to
> >> > > > >> understand
> >> > > > >>>>>> better
> >> > > > >>>>>>>>> the context
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>
> >> > > > >>>>>
> >> > > > >>>>
> >> > > > >>
> >> > >
> >> >
> >> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
> >> the
> >> > > > >> same
> >> > > > >>>>> and
> >> > > > >>>>>>>>> if this statement holds then I feel we can continue
> >> > > > >>>>>>>>> with this release.
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
> >> too
> >> > > > >>>>>> complex.
> >> > > > >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
> >> we
> >> > > > >> do
> >> > > > >>>> not
> >> > > > >>>>>>>>> have tools to certify this compatibility (at least not in
> >> the
> >> > > > >>>> short
> >> > > > >>>>>>>>> term)
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> I would like to hear from people that have been in the
> >> > > > >> community
> >> > > > >>>> for
> >> > > > >>>>>>>>> long time, then I am ready to complete the release process
> >> > > > >> for
> >> > > > >>>>>>>>> 3.6.0rc2.
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> I will update the website and the release notes with a
> >> > > > >> specific
> >> > > > >>>>>>>>> warning about the upgrade, we should also update the Wiki
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> Enrico
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>>
> >> > > > >>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> >> > > > >>>>>>>>> <sz...@gmail.com> ha scritto:
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> Hi Enrico!
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
> >> > > > >>>>>>>>>> The protocol version was last changed in ZOOKEEPER-2186, released first
> >> > > > >>>>>>>>>> in 3.4.7 and 3.5.1, to avoid some crashes / fix some bugs. Later I also
> >> > > > >>>>>>>>>> changed the protocol version when the format of the initial message
> >> > > > >>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> >> > > > >>>>>>>>>> compatible in this case, and this is the 'expected' behavior if you
> >> > > > >>>>>>>>>> upgrade e.g. from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5, or e.g. from 3.5.6
> >> > > > >>>>>>>>>> to 3.6.0.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back then and got to
> >> > > > >>>>>>>>>> the conclusion that it is not that bad, as there will be no data loss,
> >> > > > >>>>>>>>>> as you wrote. The tricky thing is that during rolling upgrade we should
> >> > > > >>>>>>>>>> ensure both backward and forward compatibility to make sure that the
> >> > > > >>>>>>>>>> old and the new part of the quorum can still speak to each other. The
> >> > > > >>>>>>>>>> current solution (simply failing if the protocol versions mismatch) is
> >> > > > >>>>>>>>>> simpler and still working just fine: as the servers are restarted
> >> > > > >>>>>>>>>> one-by-one, the nodes with the old protocol version and the nodes with
> >> > > > >>>>>>>>>> the new protocol version will form two partitions, but at any given
> >> > > > >>>>>>>>>> time only one partition will have the quorum.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> Still, thinking it through, as a side effect in these cases there will
> >> > > > >>>>>>>>>> be a short time when none of the partitions will have a quorum (when we
> >> > > > >>>>>>>>>> have N servers with the old protocol version, N servers with the new
> >> > > > >>>>>>>>>> protocol version, and there is one server just being restarted). I am
> >> > > > >>>>>>>>>> not sure if we can accept this.
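
The window described above can be checked with simple majority arithmetic. The following is a minimal Python sketch, not ZooKeeper code: it assumes that old and new servers cannot talk to each other at all and that servers are restarted strictly one at a time. The function names are made up for illustration.

```python
from typing import Iterator, Tuple


def has_quorum(group_size: int, ensemble_size: int) -> bool:
    """A group can make progress only with a strict majority of the ensemble."""
    return group_size > ensemble_size // 2


def rolling_upgrade_timeline(ensemble_size: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (old_up, new_up, quorum_available) for each phase of the upgrade.

    Assumption: servers on different protocol versions form two separate
    partitions, so a quorum exists only if one partition alone has a majority.
    """
    for upgraded in range(ensemble_size):
        # Phase 1: one old server is down, being restarted with the new version.
        old_up, new_up = ensemble_size - upgraded - 1, upgraded
        yield old_up, new_up, (
            has_quorum(old_up, ensemble_size) or has_quorum(new_up, ensemble_size)
        )
        # Phase 2: that server is back up, now running the new version.
        new_up = upgraded + 1
        yield old_up, new_up, (
            has_quorum(old_up, ensemble_size) or has_quorum(new_up, ensemble_size)
        )


if __name__ == "__main__":
    for old_up, new_up, ok in rolling_upgrade_timeline(3):
        print(f"old={old_up} new={new_up} quorum={'yes' if ok else 'NO'}")
```

For a 3-node ensemble the only phase without a quorum is the one with one old server up, one new server up and one server restarting, which corresponds to N = 1 in the description above.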
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it possible to
> >> > > > >>>>>>>>>> parse the initial message of the old protocol version with the new
> >> > > > >>>>>>>>>> code. But I am not sure if it would be enough (as the old code will not
> >> > > > >>>>>>>>>> be able to parse the new initial message).
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> One option can be to make a patch also for 3.5 to have a version which
> >> > > > >>>>>>>>>> supports both protocol versions (let's say in 3.5.8). Then we can write
> >> > > > >>>>>>>>>> in the release notes that if you need a rolling upgrade from any
> >> > > > >>>>>>>>>> version since 3.4.7, then you have to first upgrade to 3.5.8 before
> >> > > > >>>>>>>>>> upgrading to 3.6.0. We can even do the same thing on the 3.4 branch.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> But I am also new to the community... It would be great
> >> to
> >> > > > >> hear
> >> > > > >>>>> the
> >> > > > >>>>>>>> opinion
> >> > > > >>>>>>>>>> of more experienced people.
> >> > > > >>>>>>>>>> Whatever the decision will be, I am happy to make the
> >> > > > >> changes.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> And sorry for breaking the RC (if we decide that this
> >> needs
> >> > > > >> to
> >> > > > >>>> be
> >> > > > >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> Kind regards,
> >> > > > >>>>>>>>>> Mate
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> >> > > > >>>>>> eolivelli@gmail.com>
> >> > > > >>>>>>>> wrote:
> >> > > > >>>>>>>>>>
> >> > > > >>>>>>>>>>> Hi,
> >> > > > >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> >> > > > >> closing the
> >> > > > >>>>>> VOTE
> >> > > > >>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
> >> an
> >> > > > >>>>> apparent
> >> > > > >>>>>>>>>>> blocker.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> >> > > > >> looks
> >> > > > >>>>> like
> >> > > > >>>>>>>>>>> peers are not able to talk to each other.
> >> > > > >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> >> > > > >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> >> > > > >> errors on
> >> > > > >>>>> 3.5
> >> > > > >>>>>>>> nodes:
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> >> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918]
> >> -
> >> > > > >>>>>> Received
> >> > > > >>>>>>>>>>> connection request 127.0.0.1:62591
> >> > > > >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> >> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>
> >> > > > >>>>>
> >> > > > >>>>
> >> > > > >>
> >> > >
> >> >
> >> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> >> > > > >>>>>>>>>>> Got unrecognized protocol version -65535
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> Once I upgrade all of the peers the system is up and
> >> > > > >> running,
> >> > > > >>>>> with
> >> > > > >>>>>>>>>>> apparently no data loss.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> >> > > > >> say,
> >> > > > >>>>>> server1,
> >> > > > >>>>>>>>>>> server1 is not able to accept connections (error "Close
> >> of
> >> > > > >>>>> session
> >> > > > >>>>>> 0x0
> >> > > > >>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
> >> > > > >>>> clients,
> >> > > > >>>>>> this
> >> > > > >>>>>>>>>>> is expected, because as far as it cannot talk with the
> >> > > > >> other
> >> > > > >>>>> peers
> >> > > > >>>>>> it
> >> > > > >>>>>>>>>>> is practically partitioned away from the cluster.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> My questions are:
> >> > > > >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> >> > > > >> from
> >> > > > >>>> 3.5
> >> > > > >>>>> to
> >> > > > >>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
> >> ago,
> >> > > > >>>> and I
> >> > > > >>>>>> was
> >> > > > >>>>>>>>>>> not in the community as dev so I cannot tell
> >> > > > >>>>>>>>>>> 2) is this a viable option for users ? to have some
> >> > > > >> temporary
> >> > > > >>>>>> glitch
> >> > > > >>>>>>>>>>> during the upgrade and hope that the upgrade completes
> >> > > > >> without
> >> > > > >>>>>>>>>>> troubles ?
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> In theory as long as two servers are running the same
> >> major
> >> > > > >>>>> version
> >> > > > >>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
> >> > > > >> make
> >> > > > >>>>>> progress
> >> > > > >>>>>>>>>>> and to serve clients.
> >> > > > >>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> >> > > > >> enough
> >> > > > >>>>>> context
> >> > > > >>>>>>>>>>> to understand how this problem is possible and when we
> >> > > > >> decided
> >> > > > >>>> to
> >> > > > >>>>>>>>>>> break compatibility.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> The other option is that I am wrong in my test and I am
> >> > > > >> messing
> >> > > > >>>>> up
> >> > > > >>>>>> :-)
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> The other upgrade path I would like to see working like
> >> a
> >> > > > >> charm
> >> > > > >>>>> is
> >> > > > >>>>>> the
> >> > > > >>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> >> > > > >> release
> >> > > > >>>> 3.6
> >> > > > >>>>> we
> >> > > > >>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>>>> Regards
> >> > > > >>>>>>>>>>> Enrico
> >> > > > >>>>>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>>>
> >> > > > >>>>>>
> >> > > > >>>>>
> >> > > > >>>>
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >>
> >>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
FYI: I just submitted a PR, see https://github.com/apache/zookeeper/pull/1251
Any comments are welcome! :)

Kind regards,
Mate

On Wed, Feb 12, 2020 at 1:16 PM Andor Molnar <an...@apache.org> wrote:

> Hi Michael,
>
> "if we can get to rc2 without noticing a showstopper…”
>
> 200% disagree with this.
>
> The whole point of the release voting system is to identify problems no matter
> how big they are. The message of finding a showstopper for me is that
> people are paying attention and accurately testing the release. This is a very
> good thing and emphasises how much effort the ZooKeeper community is
> putting into every single release. Otherwise we could just set up a Jenkins
> job which creates and publishes a new release every six months and say
> good luck with them.
>
> I admit that currently we don’t have (rolling) upgrade tests, but I feel
> demand from the community to fill this gap.
>
> “rolling upgrades (and mixed ensembles generally) are effectively untested”
>
> Not true. That’s exactly what we are currently doing (manually for now).
>
> "there have to be a hundred corner cases beyond the MultiAddress issue”
>
> Sure thing. True for every new feature in every release. That’s why I’m
> happy to disable it by default. People usually don’t pick up releases ending
> with .0; production upgrades are expected from .1 or .2 or maybe later,
> depending on how much risk one is willing to take.
>
> Andor
>
>
>
> > On 2020. Feb 11., at 23:35, Michael K. Edwards <m....@gmail.com>
> wrote:
> >
> > I think it would be prudent to emphasize in the release notes that rolling
> > upgrades (and mixed ensembles generally) are effectively untested.  That
> > this was, in practice, a non-goal of this release cycle.  Because if we can
> > get to rc2 without noticing a showstopper, clearly it's not something that
> > anyone has gotten around to attempting; and there have to be a hundred
> > corner cases beyond the MultiAddress issue.
> >
> > On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
> > szalay.beko.mate@gmail.com> wrote:
> >
> >> I see the main problem here in the fact that we are missing proper
> >> versioning in the leader election / quorum protocols. I tried to simply
> >> implement backward compatibility in 3.6, but it didn't solve the problem.
> >> The new code understands the old protocol, but it cannot decide when to
> >> use the new or the old protocol during connection initiation. So the old
> >> servers cannot read the new init messages and we still temporarily end up
> >> having two partitions during rolling restart.
> >>
> >> I already suggested two ways to handle this later, but I think for 3.6.0
> >> now the simplest solution is to disable the new MultiAddress feature and
> >> stick to the old protocol version by default, plus extend the
> >> documentation with a note that enabling the MultiAddress feature is not
> >> possible during a rolling upgrade, but needs to be done with a separate
> >> rolling restart. With this approach, the rolling restart should "just work"
> >> with the 3.4 / 3.5 configs and we don't require any extra step /
> >> configuration from the users, unless they want to use the new feature. I
> >> plan to submit a PR with these changes tomorrow to ZOOKEEPER-3720, if there
> >> isn't any different opinion.
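
The approach described above (accept both versions when reading, but only send the new one when a feature flag is on) could look roughly like the following. This is a Python sketch under stated assumptions, not the actual QuorumCnxManager code: the value -65535 is the version rejected in the log quoted in this thread, -65536 is assumed here to be the older version, the initial message is assumed to start with an 8-byte big-endian signed long, and `multi_address_enabled` is a hypothetical flag name.

```python
import struct

# Assumed values, see the lead-in: only -65535 appears in the thread itself.
PROTOCOL_VERSION_OLD = -65536
PROTOCOL_VERSION_NEW = -65535


def outgoing_protocol_version(multi_address_enabled: bool) -> int:
    """Pick the protocol version for a connection that we initiate.

    The initiator cannot know what the peer understands, so the choice is
    driven by local configuration only and defaults to the old version.
    """
    return PROTOCOL_VERSION_NEW if multi_address_enabled else PROTOCOL_VERSION_OLD


def read_protocol_version(initial_message: bytes) -> int:
    """Parse the leading version field and accept either known version."""
    if len(initial_message) < 8:
        raise ValueError("initial message too short")
    (version,) = struct.unpack(">q", initial_message[:8])
    if version not in (PROTOCOL_VERSION_OLD, PROTOCOL_VERSION_NEW):
        raise ValueError(f"Got unrecognized protocol version {version}")
    return version
```

With the flag off on every upgraded server, old servers only ever see the old version, so the ensemble should not split during the rolling upgrade; the flag would then be turned on in a separate rolling restart once all servers run the new code.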
> >>
> >> P.S. For 4.0 we might need to put some extra thinking into backward
> >> compatibility / versioning for the quorum and client protocols.
> >>
> >>
> >> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> >> wrote:
> >>
> >>> I hate to say it, but I think 3.6.0 should release as is.  It is impossible
> >>> to *reliably* retrofit backwards compatibility / interoperability onto a
> >>> release that was engineered from the beginning without that goal.  Learn
> >>> the lesson, set goals differently in the future.
> >>>
> >>> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> >>> szalay.beko.mate@gmail.com>
> >>> wrote:
> >>>
> >>>> FYI: I created these scripts for my local tests:
> >>>> https://github.com/symat/zk-rolling-upgrade-test
> >>>>
> >>>> For the long term I would also add some script that actually monitors the
> >>>> state of the quorum and also runs continuous traffic, not just 1-2
> >>>> smoke tests after each restart. But I don't know how important this would
> >>>> be.
> >>>>
> >>>> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
> >>>>> <an...@apache.org> ha scritto:
> >>>>>>
> >>>>>> The most obvious one which crosses my mind is one that I previously
> >>>>>> worked on:
> >>>>>>
> >>>>>> 1) run old version cluster,
> >>>>>> 2) connect to each node and run smoke tests,
> >>>>>> 3) restart one node with new code,
> >>>>>> 4) goto 2) until all nodes are upgraded
> >>>>>>
> >>>>>> I think this wouldn’t work in a “unit test”, we probably need a separate
> >>>>>> Jenkins job and a nice python script to do this.
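
The four steps above can be sketched as a small Python driver. This is only an outline under assumptions: `smoke_test` and `restart_with_new_version` are hypothetical callables that a real script would implement against an actual ensemble (for example by connecting a client to each node and by restarting a server process with the new binaries).

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    smoke_test: Callable[[str], bool],
    restart_with_new_version: Callable[[str], None],
) -> List[str]:
    """Upgrade nodes one by one, smoke-testing every node between restarts.

    Returns the nodes in the order they were upgraded and raises
    RuntimeError at the first failed smoke test.
    """
    nodes = list(nodes)
    upgraded: List[str] = []

    def check_all(stage: str) -> None:
        # Step 2: connect to each node and run the smoke tests.
        for node in nodes:
            if not smoke_test(node):
                raise RuntimeError(f"smoke test failed on {node} ({stage})")

    check_all("before upgrade")  # steps 1-2: the old-version cluster is healthy
    for node in nodes:
        restart_with_new_version(node)  # step 3: restart one node with new code
        upgraded.append(node)
        check_all(f"after upgrading {node}")  # step 4: go back to step 2
    return upgraded
```

Passing the two operations in as callables keeps the loop itself testable without a running cluster; a Jenkins job would plug in the real implementations.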
> >>>>>>
> >>>>>> Andor
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>> Anyone have ideas how we could add testing for upgrade? Obviously
> >>>>>>> something we're missing, esp given its importance.
> >>>>>
> >>>>> I will send an email in the next few days with a proposal.
> >>>>> btw my idea is very similar to Andor's
> >>>>>
> >>>>> Once we have an automatic environment we can launch from Jenkins
> >>>>>
> >>>>> Enrico
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Patrick
> >>>>>>>
> >>>>>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> >>>> eolivelli@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
> >>>>>>>> <sz...@gmail.com> ha scritto:
> >>>>>>>>>
> >>>>>>>>> Hi All,
> >>>>>>>>>
> >>>>>>>>> about the question from Michael:
> >>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >>>> protocol
> >>>>> and
> >>>>>>>>>> speak old message format when it's talking to old server?
> >>>>>>>>>
> >>>>>>>>> In this particular case, it might be enough. The protocol change
> >>>>> happened
> >>>>>>>>> now in the 'initial message' sent by the QuorumCnxManager.
> >>> Maybe it
> >>>>> is
> >>>>>>>> not
> >>>>>>>>> a problem if the new servers can not initiate channels to the
> >>> old
> >>>>>>>> servers,
> >>>>>>>>> maybe it is enough if these channel gets initiated by the old
> >>>> servers
> >>>>>>>> only.
> >>>>>>>>> I will test it quickly.
> >>>>>>>>>
> >>>>>>>>> Although I have no idea if any other thing changed in the quorum
> >>>>> protocol
> >>>>>>>>> between 3.5 and 3.6. In other cases it might not be enough if
> >>> the
> >>>> new
> >>>>>>>>> servers can understand the old messages, as the old servers can
> >>>>> break by
> >>>>>>>>> not understanding the messages from the new servers. Also, in
> >>> the
> >>>>> code
> >>>>>>>>> currently (AFAIK) there is no generic knowledge of protocol
> >>>>> versions, the
> >>>>>>>>> servers are not storing that which protocol versions they
> >>>> can/should
> >>>>> use
> >>>>>>>> to
> >>>>>>>>> communicate to which particular other servers. Maybe we don't
> >>> even
> >>>>> need
> >>>>>>>>> this, but I would feel better if we would have more tests around
> >>>>> these
> >>>>>>>>> things.
> >>>>>>>>>
> >>>>>>>>> My suggestion for the long term:
> >>>>>>>>> - let's fix this particular issue now with 3.6.0 quickly (I
> >>> start
> >>>>> doing
> >>>>>>>>> this today)
> >>>>>>>>> - let's do some automation (backed up with jenkins) that will
> >>> test
> >>>> a
> >>>>>>>> whole
> >>>>>>>>> combinations of different ZooKeeper upgrade paths by making
> >>> rolling
> >>>>>>>>> upgrades during some light traffic. Let's have a bit better
> >>>>> definition
> >>>>>>>>> about what we expect (e.g. the quorum is up, but some clients
> >>> can
> >>>> get
> >>>>>>>>> disconnected? What will happen to the ephemeral nodes? Do we
> >>> want
> >>>> to
> >>>>>>>>> gracefully close or transfer the user sessions before stopping
> >>> the
> >>>>> old
> >>>>>>>>> server?) and let's see where this broke. Just by checking the
> >>>> code, I
> >>>>>>>> don't
> >>>>>>>>> think the quorum will always be up (e.g. between older 3.4
> >>> versions
> >>>>> and
> >>>>>>>>> 3.5).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I am happy to work on this topic
> >>>>>>>>
> >>>>>>>>> - we need to update the Wiki about the working rolling upgrade
> >>>> paths
> >>>>> and
> >>>>>>>>> maybe about workarounds if needed
> >>>>>>>>> - we might need to do some fixes (adding backward compatible
> >>>> versions
> >>>>>>>>> and/or specific parameters that enforce old protocol temporary
> >>>>> during the
> >>>>>>>>> rolling upgrade that can be changed later to the new protocol by
> >>>>> either
> >>>>>>>>> dynamic reconfig or by rolling restart)
> >>>>>>>>
> >>>>>>>> it would be much better on 3.6 code to have some support for
> >>>>>>>> compatibility with 3.5 servers
> >>>>>>>> we can't require old code to be forward compatible but we can
> >>> make
> >>>> new
> >>>>>>>> code be compatible to a certain extent with old code.
> >>>>>>>> If we can achieve this compatibility goal without a flag is
> >>> better,
> >>>>>>>> users won't have to care about this part and they simply "trust"
> >>> on
> >>>> us
> >>>>>>>>
> >>>>>>>> The rollback story is also important, but maybe we are still not
> >>>> ready
> >>>>>>>> for it, in case of local changes to store,
> >>>>>>>> it is better to have a clear design and plan and work for a new
> >>>>> release
> >>>>>>>> (3.7?)
> >>>>>>>>
> >>>>>>>> Enrico
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Depending on your comments, I am happy to create a few Jira
> >>> tickets
> >>>>>>>> around
> >>>>>>>>> these topics.
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>> Mate
> >>>>>>>>>
> >>>>>>>>> ps. Enrico, sorry about your RC... I owe you a beer, let me
> >>> know if
> >>>>> you
> >>>>>>>> are
> >>>>>>>>> near to Budapest ;)
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> >>>> eolivelli@gmail.com
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Good.
> >>>>>>>>>>
> >>>>>>>>>> I will cancel the vote for 3.6.0rc2.
> >>>>>>>>>>
> >>>>>>>>>> I appreciate very much If Mate and his colleagues have time to
> >>>> work
> >>>>> on
> >>>>>>>> a
> >>>>>>>>>> fix.
> >>>>>>>>>> Otherwise I will have cycles next week
> >>>>>>>>>>
> >>>>>>>>>> I would also like to spend my time in setting up a few minimal
> >>>>>>>> integration
> >>>>>>>>>> tests about the upgrade story
> >>>>>>>>>>
> >>>>>>>>>> Enrico
> >>>>>>>>>>
> >>>>>>>>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
> >>>> scritto:
> >>>>>>>>>>
> >>>>>>>>>>> Kudos Enrico, very thorough work as the final gate keeper of
> >>> the
> >>>>>>>> release!
> >>>>>>>>>>>
> >>>>>>>>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >>>>>>>>>>>
> >>>>>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
> >>>> the
> >>>>>>>> rare
> >>>>>>>>>>> piece of software that put so much emphasis on compatibilities
> >>>> thus
> >>>>>>>> it
> >>>>>>>>>> just
> >>>>>>>>>>> works when upgrade / downgrade, which is amazing. One
> >>> guarantee
> >>>> we
> >>>>>>>> always
> >>>>>>>>>>> had is during rolling upgrade, the quorum will always be
> >>>> available,
> >>>>>>>>>> leading
> >>>>>>>>>>> to no service interruption. It would be sad we lose such
> >>>> capability
> >>>>>>>> given
> >>>>>>>>>>> this is still a tractable problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >>>> protocol
> >>>>>>>> and
> >>>>>>>>>>> speak old message format when it's talking to old server?
> >>>>> Basically,
> >>>>>>>> an
> >>>>>>>>>>> ugly if else check against the protocol version should work
> >>> and
> >>>>>>>> there is
> >>>>>>>>>> no
> >>>>>>>>>>> need to have multiple pass on rolling upgrade process.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> >>>>>>>> eolivelli@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I suggest this plan:
> >>>>>>>>>>>> - release 3.6.0 now
> >>>>>>>>>>>> - improve the migration story, the flow outlined by Mate is
> >>>>>>>>>>>> interesting, but it will take time
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
> >>> the
> >>>>>>>>>>>> release this evening (within 8-10 hours) if no one comes out
> >>> in
> >>>>> the
> >>>>>>>>>>>> VOTE thread with a -1
> >>>>>>>>>>>>
> >>>>>>>>>>>> Enrico
> >>>>>>>>>>>>
> >>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> >>>>>>>>>>>> <ph...@apache.org> ha scritto:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
> >>> andor@apache.org
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Answers inline.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In my experience when you are close to a release it is better
> >>>>>>>>>>>>>>> not to make big changes. (I am among the approvers of that
> >>>>>>>>>>>>>>> patch, so I am responsible for this change)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Although this statement is acceptable for me, I don’t feel
> >>>> this
> >>>>>>>>>> patch
> >>>>>>>>>>>>>> should not have been merged into 3.6.0. Submission has been
> >>>>>>>>>> preceded
> >>>>>>>>>>>> by a
> >>>>>>>>>>>>>> long argument with MAPR folks who originally wanted to be
> >>>>>>>> merged
> >>>>>>>>>> into
> >>>>>>>>>>>> 3.4
> >>>>>>>>>>>>>> branch (considering the pace how ZooKeeper community is
> >>> moving
> >>>>>>>>>>>> forward) and
> >>>>>>>>>>>>>> we reached an agreement to release it with 3.6.0.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> To make a long story short, this patch has been outstanding
> >>> for
> >>>>>>>> ages
> >>>>>>>>>>>> without
> >>>>>>>>>>>>>> much attention from the community and contributors made a
> >>> lot
> >>>>>>>> of
> >>>>>>>>>>>> effort to
> >>>>>>>>>>>>>> get it done before the release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to hear from people that have been in the
> >>>>>>>> community
> >>>>>>>>>> for
> >>>>>>>>>>>>>>> long time, then I am ready to complete the release process
> >>>>>>>> for
> >>>>>>>>>>>>>>> 3.6.0rc2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Me too.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I tend to accept the way rolling restart works now - as you
> >>>>>>>>>> described
> >>>>>>>>>>>>>> Enrico - and given that situation was pretty much the same
> >>>>>>>> between
> >>>>>>>>>>> 3.4
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> 3.5, I don’t feel we have to make additional changes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite
> >>>>>>>> cool,
> >>>>>>>>>> I’m
> >>>>>>>>>>>> also
> >>>>>>>>>>>>>> happy to work on getting it in.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fyi, Release Management page says the following:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> "major.minor release of ZooKeeper must be backwards
> >>> compatible
> >>>>>>>> with
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> previous minor release, major.(minor-1)"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Our users, direct and indirect, value the ability to
> >>> migrate to
> >>>>>>>> newer
> >>>>>>>>>>>>> versions - esp as we drop support for older. Frictions such
> >>> as
> >>>>>>>> this
> >>>>>>>>>> can
> >>>>>>>>>>>> be
> >>>>>>>>>>>>> a reason to go elsewhere. I'm "pro" b/w compat - esp given
> >>> our
> >>>>>>>>>>> published
> >>>>>>>>>>>>> guidelines.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Patrick
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> >>>>>>>> eolivelli@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thank you Mate for checking and explaining this story.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I find it very interesting that the cause is
> >>> ZOOKEEPER-3188
> >>>>>>>> as:
> >>>>>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before
> >>>>>>>> starting the
> >>>>>>>>>>>>>>> release process
> >>>>>>>>>>>>>>> - it is the cause of the failure of the first RC
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In my experience when you are close to a release it is better
> >>>>>>>>>>>>>>> not to make big changes. (I am among the approvers of that
> >>>>>>>>>>>>>>> patch, so I am responsible for this change)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is a pointer to the change to whom who wants to
> >>>>>>>> understand
> >>>>>>>>>>>> better
> >>>>>>>>>>>>>>> the context
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
> >>> the
> >>>>>>>> same
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> if this statement holds then I feel we can continue
> >>>>>>>>>>>>>>> with this release.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
> >>> too
> >>>>>>>>>>>> complex.
> >>>>>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
> >>> we
> >>>>>>>> do
> >>>>>>>>>> not
> >>>>>>>>>>>>>>> have tools to certify this compatibility (at least not in
> >>> the
> >>>>>>>>>> short
> >>>>>>>>>>>>>>> term)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to hear from people that have been in the
> >>>>>>>> community
> >>>>>>>>>> for
> >>>>>>>>>>>>>>> long time, then I am ready to complete the release process
> >>>>>>>> for
> >>>>>>>>>>>>>>> 3.6.0rc2.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will update the website and the release notes with a
> >>>>>>>> specific
> >>>>>>>>>>>>>>> warning about the upgrade, we should also update the Wiki
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Enrico
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> >>>>>>>>>>>>>>> <sz...@gmail.com> ha scritto:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Enrico!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> >>>>>>>>>>>>>> QuorumCnxManager.
> >>>>>>>>>>>>>>>> The Protocol version  was changed last time in
> >>>>>>>> ZOOKEEPER-2186
> >>>>>>>>>>>> released
> >>>>>>>>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix
> >>> some
> >>>>>>>> bugs.
> >>>>>>>>>>>> Later I
> >>>>>>>>>>>>>>>> also changed the protocol version when the format of the
> >>>>>>>> initial
> >>>>>>>>>>>> message
> >>>>>>>>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum
> >>> protocol
> >>>>>>>> is
> >>>>>>>>>> not
> >>>>>>>>>>>>>>>> compatible in this case and is the 'expected' behavior if
> >>>>>>>> you
> >>>>>>>>>>>> upgrade
> >>>>>>>>>>>>>> e.g
> >>>>>>>>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
> >>> to
> >>>>>>>>>> 3.6.0.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
> >>>>>>>> then and
> >>>>>>>>>>>> got to
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> conclusion that it is not that bad, as there will be no
> >>> data
> >>>>>>>>>> loss
> >>>>>>>>>>>> as you
> >>>>>>>>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
> >>>>>>>> should
> >>>>>>>>>>>> ensure
> >>>>>>>>>>>>>>>> both backward and forward compatibility to make sure that
> >>>>>>>> the
> >>>>>>>>>> old
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> new part of the quorum can still speak to each other. The
> >>>>>>>>>> current
> >>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>> (simply failing if the protocol versions mismatch) is
> >>> more
> >>>>>>>>>> simple
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>> working just fine: as the servers are restarted
> >>> one-by-one,
> >>>>>>>> the
> >>>>>>>>>>>> nodes
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>> the old protocol version and the nodes with the new
> >>> protocol
> >>>>>>>>>>> version
> >>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>> form two partitions, but any given time only one
> >>> partition
> >>>>>>>> will
> >>>>>>>>>>>> have the
> >>>>>>>>>>>>>>>> quorum.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Still, thinking it through, as a side effect in these
> >>> cases
> >>>>>>>> there
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>>> short time when none of the partitions will have quorums
> >>>>>>>> (when
> >>>>>>>>>> we
> >>>>>>>>>>>> have N
> >>>>>>>>>>>>>>>> servers with the old protocol version, N servers with the
> >>>>>>>> new
> >>>>>>>>>>>> protocol
> >>>>>>>>>>>>>>>> version, and there is one server just being restarted). I
> >>>>>>>> am not
> >>>>>>>>>>>> sure
> >>>>>>>>>>>>>> if we
> >>>>>>>>>>>>>>>> can accept this.
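
The quorum gap described above can be sketched as a small simulation. This is only an illustration under simplifying assumptions: servers are restarted strictly one at a time, and old and new servers either cannot talk to each other at all or are fully compatible. The function name and the `compatible` parameter are hypothetical, not part of ZooKeeper.

```python
def quorum_timeline(ensemble_size, compatible=False):
    """Walk a rolling upgrade and record whether some partition has a majority.

    Each entry is (phase, old_running, new_running, has_quorum).
    """
    majority = ensemble_size // 2 + 1
    old, new = ensemble_size, 0
    timeline = []

    def has_quorum():
        if compatible:
            # Old and new servers can form one quorum together.
            return old + new >= majority
        # Incompatible protocols: the two groups are separate partitions.
        return old >= majority or new >= majority

    for _ in range(ensemble_size):
        old -= 1  # one old server is stopped for the upgrade
        timeline.append(("restarting", old, new, has_quorum()))
        new += 1  # it comes back speaking only the new protocol
        timeline.append(("running", old, new, has_quorum()))
    return timeline


if __name__ == "__main__":
    for phase, old, new, ok in quorum_timeline(3):
        print(f"{phase:10s} old={old} new={new} quorum={ok}")
```

With an ensemble of 2N+1 servers and incompatible protocols, the only step without a quorum is the one with N old servers, N new servers and one server restarting; with `compatible=True` every step keeps a quorum.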
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
> >>>>>>>> possible
> >>>>>>>>>> to
> >>>>>>>>>>>> parse
> >>>>>>>>>>>>>>>> the initial message of the old protocol version with the
> >>> new
> >>>>>>>>>> code.
> >>>>>>>>>>>> But
> >>>>>>>>>>>>>> I am
> >>>>>>>>>>>>>>>> not sure if it would be enough (as the old code will not
> >>> be
> >>>>>>>> able
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> parse
> >>>>>>>>>>>>>>>> the new initial message).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One option can be to make a patch also for 3.5 to have a
> >>>>>>>> version
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>>>> supports both protocol versions. (let's say in 3.5.8)
> >>> Then
> >>>>>>>> we
> >>>>>>>>>> can
> >>>>>>>>>>>> write
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> the release note, that if you need rolling upgrade from
> >>> any
> >>>>>>>>>>> versions
> >>>>>>>>>>>>>> since
> >>>>>>>>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
> >>>>>>>>>> upgrading
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> 3.6.0.
> >>>>>>>>>>>>>>>> We can even make the same thing on the 3.4 branch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> But I am also new to the community... It would be great
> >>> to
> >>>>>>>> hear
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> opinion
> >>>>>>>>>>>>>>>> of more experienced people.
> >>>>>>>>>>>>>>>> Whatever the decision will be, I am happy to make the
> >>>>>>>> changes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And sorry for breaking the RC (if we decide that this
> >>> needs
> >>>>>>>> to
> >>>>>>>>>> be
> >>>>>>>>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>> Mate
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> >>>>>>>>>>>> eolivelli@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> >>>>>>>> closing the
> >>>>>>>>>>>> VOTE
> >>>>>>>>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
> >>> an
> >>>>>>>>>>> apparent
> >>>>>>>>>>>>>>>>> blocker.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> >>>>>>>> looks
> >>>>>>>>>>> like
> >>>>>>>>>>>>>>>>> peers are not able to talk to each other.
> >>>>>>>>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> >>>>>>>>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> >>>>>>>> errors on
> >>>>>>>>>>> 3.5
> >>>>>>>>>>>>>> nodes:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> >>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
> >>>>>>>>>>>>>>>>> connection request 127.0.0.1:62591
> >>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> >>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> >>>>>>>>>>>>>>>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> >>>>>>>>>>>>>>>>> Got unrecognized protocol version -65535
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Once I upgrade all of the peers the system is up and running,
> >>>>>>>>>>>>>>>>> apparently without data loss.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> >>>>>>>> say,
> >>>>>>>>>>>> server1,
> >>>>>>>>>>>>>>>>> server1 is not able to accept connections (error "Close
> >>> of
> >>>>>>>>>>> session
> >>>>>>>>>>>> 0x0
> >>>>>>>>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
> >>>>>>>>>> clients,
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>> is expected, because as far as it cannot talk with the
> >>>>>>>> other
> >>>>>>>>>>> peers
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> is practically partitioned away from the cluster.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My questions are:
> >>>>>>>>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> >>>>>>>> from
> >>>>>>>>>> 3.5
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
> >>> ago,
> >>>>>>>>>> and I
> >>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> not in the community as dev so I cannot tell
> >>>>>>>>>>>>>>>>> 2) is this a viable option for users ? to have some
> >>>>>>>> temporary
> >>>>>>>>>>>> glitch
> >>>>>>>>>>>>>>>>> during the upgrade and hope that the upgrade completes
> >>>>>>>> without
> >>>>>>>>>>>>>>>>> troubles ?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In theory as long as two servers are running the same major version
> >>>>>>>>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to make progress
> >>>>>>>>>>>>>>>>> and to serve clients.
> >>>>>>>>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> >>>>>>>> enough
> >>>>>>>>>>>> context
> >>>>>>>>>>>>>>>>> to understand how this problem is possible and when we
> >>>>>>>> decided
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>>> break compatibility.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The other option is that I am wrong in my test and I am
> >>>>>>>> messing
> >>>>>>>>>>> up
> >>>>>>>>>>>> :-)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The other upgrade path I would like to see working like
> >>> a
> >>>>>>>> charm
> >>>>>>>>>>> is
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> >>>>>>>> release
> >>>>>>>>>> 3.6
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards
> >>>>>>>>>>>>>>>>> Enrico
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by "Michael K. Edwards" <m....@gmail.com>.
Well, I think it's fair to say that they're effectively untested prior to
rc2.  But it's a reasonable posture to take that the features get baked
first and the field upgrade procedure gets tested late in the release
cycle.  Not what I would have expected personally, though, as a former
developer of field-upgradable consumer electronics.

What we used to call the "first article" (the firmware delivered to the
manufacturing line) was routinely unusable for much of anything beyond the
first-time setup procedure and download of over-the-air updates, and a
firmware upgrade/downgrade cycle was the first step in smoke testing every
subsequent release candidate.  Seems like the concerns (the high cost of
upgrade failure and likelihood of permanently losing customer trust) are
similar.  But when one doesn't face a hard August ship date for the first
article for a Christmas-shopping-season product release, I suppose one can
afford a different order of operations.

I am very grateful for the ZooKeeper software and for the care and
resources that its maintainers and community put into its integrity and
vitality.  I value the release engineering process, and don't take for
granted that any given snapshot off of a release branch is fit for
purpose.  At the same time, I'd feel more confident recommending it for
more use cases within the engineering organizations I support if there were
stronger test scaffolding around version migration and similar production
operations.  That's something I'd like to help with in the future,
resources permitting.

Cheers,
- Michael

On Wed, Feb 12, 2020 at 4:16 AM Andor Molnar <an...@apache.org> wrote:

> Hi Michael,
>
> "if we can get to rc2 without noticing a showstopper…”
>
> 200% disagree with this.
>
> The whole point of the release voting system is to identify problems no matter
> how big they are. The message of finding a showstopper for me is that
> people are paying attention and accurately testing the release. This is a very
> good thing and emphasises how much effort the ZooKeeper community is
> putting into every single release. Otherwise we could just set up a Jenkins
> job which creates and publishes a new release every six months and say
> good luck with them.
>
> I admit that currently we don’t have (rolling) upgrade tests, but I feel
> demand from the community to fill this gap.
>
> “rolling upgrades (and mixed ensembles generally) are effectively untested”
>
> Not true. That’s exactly what we are currently doing (manually for now).
>
> "there have to be a hundred corner cases beyond the MultiAddress issue”
>
> Sure thing. True for every new feature in every release. That’s why I’m
> happy disabling it by default. People usually don’t pick up releases ending
> with .0, production upgrades are expected from .1 or .2 or maybe later
> depending on how much risk one is willing to take.
>
> Andor
>
>
>
> > On 2020. Feb 11., at 23:35, Michael K. Edwards <m....@gmail.com>
> wrote:
> >
> > I think it would be prudent to emphasize in the release notes that
> rolling
> > upgrades (and mixed ensembles generally) are effectively untested.  That
> > this was, in practice, a non-goal of this release cycle.  Because if we
> can
> > get to rc2 without noticing a showstopper, clearly it's not something
> that
> > anyone has gotten around to attempting; and there have to be a hundred
> > corner cases beyond the MultiAddress issue.
> >
> > On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
> > szalay.beko.mate@gmail.com> wrote:
> >
> >> I see the main problem here in the fact that we are missing proper
> >> versioning in the leader election / quorum protocols. I tried to simply
> >> implement backward compatibility in 3.6, but it didn't solve the
> problem.
> >> The new code understands the old protocol, but it can not decide when to
> >> use the new or the old protocol during connection initiation. So the old
> >> servers can not read the new init messages and we still temporarily end
> up
> >> having two partitions during rolling restart.
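
The asymmetry described here (new code can read old messages, old code cannot read new ones) can be illustrated with a toy parser. This is a sketch only: the byte layout below is invented for the example and is not ZooKeeper's actual wire format; `-65535` is the version shown in the error log earlier in the thread, while `-65536` as the old value, the `|` separator, and all function names are assumptions.

```python
import struct

OLD_VERSION = -65536  # assumed value of the pre-3.6 protocol version
NEW_VERSION = -65535  # value reported in the error log in this thread

HEADER = ">qqi"  # illustrative layout: version, server id, payload length
HEADER_SIZE = struct.calcsize(HEADER)


def encode(version, sid, addresses):
    """Build a toy initial message carrying one or more addresses."""
    payload = "|".join(addresses).encode("utf-8")
    return struct.pack(HEADER, version, sid, len(payload)) + payload


def parse_old_only(data):
    """Behaves like an old server: any other version is rejected."""
    version, sid, length = struct.unpack_from(HEADER, data)
    if version != OLD_VERSION:
        raise ValueError(f"Got unrecognized protocol version {version}")
    body = data[HEADER_SIZE:HEADER_SIZE + length].decode("utf-8")
    return sid, [body]


def parse_both(data):
    """Behaves like a new server with an if/else on the protocol version."""
    version, sid, length = struct.unpack_from(HEADER, data)
    body = data[HEADER_SIZE:HEADER_SIZE + length].decode("utf-8")
    if version == OLD_VERSION:
        return sid, [body]
    if version == NEW_VERSION:
        return sid, body.split("|")
    raise ValueError(f"Got unrecognized protocol version {version}")
```

In this sketch `parse_both` accepts messages from either side, but a message built with `NEW_VERSION` still fails in `parse_old_only`, so the new server would have to know in advance which format to send when it initiates a connection.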
> >>
> >> I already suggested two ways to handle this later, but I think for 3.6.0
> >> now the simplest solution is to disable the new MultiAddress feature and
> >> stick to the old protocol version by default. Plus extend the
> >> documentation with the note, that enabling the MultiAddress feature is
> not
> >> possible during a rolling upgrade, but it needs to be done with a
> separate
> >> rolling restart. With this approach, the rolling restart should "just
> work"
> >> with the 3.4 / 3.5 configs and we don't require any extra step /
> >> configuration from the users, unless they want to use the new feature. I
> >> plan to submit a PR with these changes tomorrow to ZOOKEEPER-3720, if
> there
> >> isn't any different opinion.
> >>
> >> P.S. For 4.0 we might need to put some extra thinking into backward
> >> compatibility / versioning for the quorum and client protocols.
> >>
> >>
> >> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> >> wrote:
> >>
> >>> I hate to say it, but I think 3.6.0 should release as is.  It is
> >>> impossible
> >>> to *reliably* retrofit backwards compatibility / interoperability onto
> a
> >>> release that was engineered from the beginning without that goal.
> Learn
> >>> the lesson, set goals differently in the future.
> >>>
> >>> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> >>> szalay.beko.mate@gmail.com>
> >>> wrote:
> >>>
> >>>> FYI: I created these scripts for my local tests:
> >>>> https://github.com/symat/zk-rolling-upgrade-test
> >>>>
> >>>> For the long term I would also add some script that actually monitors
> >>> the
> >>>> state of the quorum and also runs continuous traffic, not just 1-2
> >>>> smoketests after each restart. But I don't know how important this
> would
> >>>> be.
> >>>>
> >>>> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
> >>>>> <an...@apache.org> ha scritto:
> >>>>>>
> >>>>>> The most obvious one which crosses my mind is that I previously
> >>> worked
> >>>>> on:
> >>>>>>
> >>>>>> 1) run old version cluster,
> >>>>>> 2) connect to each node and run smoke tests,
> >>>>>> 3) restart one node with new code,
> >>>>>> 4) goto 2) until all nodes are upgraded
> >>>>>>
> >>>>>> I think this wouldn’t work in a “unit test”, we probably need a
> >>>> separate
> >>>>> Jenkins job and a nice python script to do this.
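
The four steps above could be driven by a loop along these lines. This is a minimal sketch, not an existing tool: `restart_with_new_code` and `smoke_test` are hypothetical callables that a real harness would have to supply (for example by wrapping shell scripts and a ZooKeeper client), and step 1 (a running cluster on the old version) is assumed as a precondition.

```python
def rolling_upgrade(nodes, restart_with_new_code, smoke_test):
    """Upgrade nodes one by one, smoke testing every node between restarts.

    Returns one list of (node, passed) pairs per round: one round before
    each restart, plus a final round after the last node is upgraded.
    """
    report = []
    for target in nodes:
        # Step 2: connect to each node and run the smoke tests.
        report.append([(node, smoke_test(node)) for node in nodes])
        # Step 3: restart one node with the new code.
        restart_with_new_code(target)
        # Step 4: loop back to step 2 until every node is upgraded.
    # Final check once the whole ensemble runs the new version.
    report.append([(node, smoke_test(node)) for node in nodes])
    return report
```

Keeping the restart and smoke-test logic behind callables lets the same loop run against fake nodes in a unit test and against real servers from a Jenkins job.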
> >>>>>>
> >>>>>> Andor
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
> >>> wrote:
> >>>>>>>
> >>>>>>> Anyone have ideas how we could add testing for upgrade? Obviously
> >>>>> something
> >>>>>>> we're missing, esp given it's important.
> >>>>>
> >>>>> I will send an email next days with a proposal.
> >>>>> btw my idea is very like Andor's one
> >>>>>
> >>>>> Once we have an automatic environment we can launch from Jenkins
> >>>>>
> >>>>> Enrico
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Patrick
> >>>>>>>
> >>>>>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> >>>> eolivelli@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
> >>>>>>>> <sz...@gmail.com> ha scritto:
> >>>>>>>>>
> >>>>>>>>> Hi All,
> >>>>>>>>>
> >>>>>>>>> about the question from Michael:
> >>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >>>> protocol
> >>>>> and
> >>>>>>>>>> speak old message format when it's talking to old server?
> >>>>>>>>>
> >>>>>>>>> In this particular case, it might be enough. The protocol change
> >>>>> happened
> >>>>>>>>> now in the 'initial message' sent by the QuorumCnxManager.
> >>> Maybe it
> >>>>> is
> >>>>>>>> not
> >>>>>>>>> a problem if the new servers can not initiate channels to the
> >>> old
> >>>>>>>> servers,
> >>>>>>>>> maybe it is enough if these channel gets initiated by the old
> >>>> servers
> >>>>>>>> only.
> >>>>>>>>> I will test it quickly.
> >>>>>>>>>
> >>>>>>>>> Although I have no idea if any other thing changed in the quorum
> >>>>> protocol
> >>>>>>>>> between 3.5 and 3.6. In other cases it might not be enough if
> >>> the
> >>>> new
> >>>>>>>>> servers can understand the old messages, as the old servers can
> >>>>> break by
> >>>>>>>>> not understanding the messages from the new servers. Also, in
> >>> the
> >>>>> code
> >>>>>>>>> currently (AFAIK) there is no generic knowledge of protocol
> >>>>> versions, the
> >>>>>>>>> servers are not storing that which protocol versions they
> >>>> can/should
> >>>>> use
> >>>>>>>> to
> >>>>>>>>> communicate to which particular other servers. Maybe we don't
> >>> even
> >>>>> need
> >>>>>>>>> this, but I would feel better if we would have more tests around
> >>>>> these
> >>>>>>>>> things.
> >>>>>>>>>
> >>>>>>>>> My suggestion for the long term:
> >>>>>>>>> - let's fix this particular issue now with 3.6.0 quickly (I
> >>> start
> >>>>> doing
> >>>>>>>>> this today)
> >>>>>>>>> - let's do some automation (backed up with jenkins) that will
> >>> test
> >>>> a
> >>>>>>>> whole
> >>>>>>>>> combinations of different ZooKeeper upgrade paths by making
> >>> rolling
> >>>>>>>>> upgrades during some light traffic. Let's have a bit better
> >>>>> definition
> >>>>>>>>> about what we expect (e.g. the quorum is up, but some clients
> >>> can
> >>>> get
> >>>>>>>>> disconnected? What will happen to the ephemeral nodes? Do we
> >>> want
> >>>> to
> >>>>>>>>> gracefully close or transfer the user sessions before stopping
> >>> the
> >>>>> old
> >>>>>>>>> server?) and let's see where this broke. Just by checking the
> >>>> code, I
> >>>>>>>> don't
> >>>>>>>>> think the quorum will always be up (e.g. between older 3.4
> >>> versions
> >>>>> and
> >>>>>>>>> 3.5).
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I am happy to work on this topic
> >>>>>>>>
> >>>>>>>>> - we need to update the Wiki about the working rolling upgrade
> >>>> paths
> >>>>> and
> >>>>>>>>> maybe about workarounds if needed
> >>>>>>>>> - we might need to do some fixes (adding backward compatible
> >>>> versions
> >>>>>>>>> and/or specific parameters that enforce old protocol temporary
> >>>>> during the
> >>>>>>>>> rolling upgrade that can be changed later to the new protocol by
> >>>>> either
> >>>>>>>>> dynamic reconfig or by rolling restart)
> >>>>>>>>
> >>>>>>>> it would be much better on 3.6 code to have some support for
> >>>>>>>> compatibility with 3.5 servers
> >>>>>>>> we can't require old code to be forward compatible but we can
> >>> make
> >>>> new
> >>>>>>>> code be compatible to a certain extent with old code.
> >>>>>>>> If we can achieve this compatibility goal without a flag is
> >>> better,
> >>>>>>>> users won't have to care about this part and they simply "trust"
> >>> on
> >>>> us
> >>>>>>>>
> >>>>>>>> The rollback story is also important, but maybe we are still not
> >>>> ready
> >>>>>>>> for it, in case of local changes to store,
> >>>>>>>> it is better to have a clear design and plan and work for a new
> >>>>> release
> >>>>>>>> (3.7?)
> >>>>>>>>
> >>>>>>>> Enrico
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Depending on your comments, I am happy to create a few Jira
> >>> tickets
> >>>>>>>> around
> >>>>>>>>> these topics.
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>> Mate
> >>>>>>>>>
> >>>>>>>>> ps. Enrico, sorry about your RC... I owe you a beer, let me
> >>> know if
> >>>>> you
> >>>>>>>> are
> >>>>>>>>> near to Budapest ;)
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> >>>> eolivelli@gmail.com
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Good.
> >>>>>>>>>>
> >>>>>>>>>> I will cancel the vote for 3.6.0rc2.
> >>>>>>>>>>
> >>>>>>>>>> I appreciate very much If Mate and his colleagues have time to
> >>>> work
> >>>>> on
> >>>>>>>> a
> >>>>>>>>>> fix.
> >>>>>>>>>> Otherwise I will have cycles next week
> >>>>>>>>>>
> >>>>>>>>>> I would also like to spend my time in setting up a few minimal
> >>>>>>>> integration
> >>>>>>>>>> tests about the upgrade story
> >>>>>>>>>>
> >>>>>>>>>> Enrico
> >>>>>>>>>>
> >>>>>>>>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
> >>>> scritto:
> >>>>>>>>>>
> >>>>>>>>>>> Kudos Enrico, very thorough work as the final gate keeper of
> >>> the
> >>>>>>>> release!
> >>>>>>>>>>>
> >>>>>>>>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >>>>>>>>>>>
> >>>>>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
> >>>> the
> >>>>>>>> rare
> >>>>>>>>>>> piece of software that put so much emphasis on compatibilities
> >>>> thus
> >>>>>>>> it
> >>>>>>>>>> just
> >>>>>>>>>>> works when upgrade / downgrade, which is amazing. One
> >>> guarantee
> >>>> we
> >>>>>>>> always
> >>>>>>>>>>> had is during rolling upgrade, the quorum will always be
> >>>> available,
> >>>>>>>>>> leading
> >>>>>>>>>>> to no service interruption. It would be sad we lose such
> >>>> capability
> >>>>>>>> given
> >>>>>>>>>>> this is still a tractable problem.
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> >>>> protocol
> >>>>>>>> and
> >>>>>>>>>>> speak old message format when it's talking to old server?
> >>>>> Basically,
> >>>>>>>> an
> >>>>>>>>>>> ugly if else check against the protocol version should work
> >>> and
> >>>>>>>> there is
> >>>>>>>>>> no
> >>>>>>>>>>> need to have multiple pass on rolling upgrade process.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> >>>>>>>> eolivelli@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> I suggest this plan:
> >>>>>>>>>>>> - release 3.6.0 now
> >>>>>>>>>>>> - improve the migration story, the flow outlined by Mate is
> >>>>>>>>>>>> interesting, but it will take time
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
> >>> the
> >>>>>>>>>>>> release this evening (within 8-10 hours) if no one comes out
> >>> in
> >>>>> the
> >>>>>>>>>>>> VOTE thread with a -1
> >>>>>>>>>>>>
> >>>>>>>>>>>> Enrico
> >>>>>>>>>>>>
> >>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> >>>>>>>>>>>> <ph...@apache.org> ha scritto:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
> >>> andor@apache.org
> >>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Answers inline.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In my experience when you are close to a release it is better
> >>>>>>>>>>>>>>> not to make big changes. (I am among the approvers of that
> >>>>>>>>>>>>>>> patch, so I am responsible for this change)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Although this statement is acceptable for me, I don’t feel
> >>>> this
> >>>>>>>>>> patch
> >>>>>>>>>>>>>> should not have been merged into 3.6.0. Submission has been
> >>>>>>>>>> preceded
> >>>>>>>>>>>> by a
> >>>>>>>>>>>>>> long argument with MAPR folks who originally wanted to be
> >>>>>>>> merged
> >>>>>>>>>> into
> >>>>>>>>>>>> 3.4
> >>>>>>>>>>>>>> branch (considering the pace how ZooKeeper community is
> >>> moving
> >>>>>>>>>>>> forward) and
> >>>>>>>>>>>>>> we reached an agreement to release it with 3.6.0.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> To make a long story short, this patch has been outstanding
> >>> for
> >>>>>>>> ages
> >>>>>>>>>>>> without
> >>>>>>>>>>>>>> much attention from the community and contributors made a
> >>> lot
> >>>>>>>> of
> >>>>>>>>>>>> effort to
> >>>>>>>>>>>>>> get it done before the release.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to hear from people that have been in the
> >>>>>>>> community
> >>>>>>>>>> for
> >>>>>>>>>>>>>>> long time, then I am ready to complete the release process
> >>>>>>>> for
> >>>>>>>>>>>>>>> 3.6.0rc2.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Me too.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I tend to accept the way rolling restart works now - as you
> >>>>>>>>>> described
> >>>>>>>>>>>>>> Enrico - and given that situation was pretty much the same
> >>>>>>>> between
> >>>>>>>>>>> 3.4
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> 3.5, I don’t feel we have to make additional changes.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite
> >>>>>>>> cool,
> >>>>>>>>>> I’m
> >>>>>>>>>>>> also
> >>>>>>>>>>>>>> happy to work on getting it in.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Fyi, Release Management page says the following:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> "major.minor release of ZooKeeper must be backwards
> >>> compatible
> >>>>>>>> with
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> previous minor release, major.(minor-1)"
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>> Our users, direct and indirect, value the ability to
> >>> migrate to
> >>>>>>>> newer
> >>>>>>>>>>>>> versions - esp as we drop support for older. Frictions such
> >>> as
> >>>>>>>> this
> >>>>>>>>>> can
> >>>>>>>>>>>> be
> >>>>>>>>>>>>> a reason to go elsewhere. I'm "pro" b/w compat - esp given
> >>> our
> >>>>>>>>>>> published
> >>>>>>>>>>>>> guidelines.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Patrick
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Andor
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> >>>>>>>> eolivelli@gmail.com
> >>>>>>>>>>>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thank you Mate for checking and explaining this story.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I find it very interesting that the cause is
> >>> ZOOKEEPER-3188
> >>>>>>>> as:
> >>>>>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before
> >>>>>>>> starting the
> >>>>>>>>>>>>>>> release process
> >>>>>>>>>>>>>>> - it is the cause of the failure of the first RC
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In my experience when you are close to a release it is
> >>>>>>>> better not
> >>>>>>>>>> to
> >>>>>>>>>>>>>>> make big changes. (I am among the approvers of that patch,
> >>>>>>>> so I
> >>>>>>>>>> am
> >>>>>>>>>>>>>>> responsible for this change)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is a pointer to the change to whom who wants to
> >>>>>>>> understand
> >>>>>>>>>>>> better
> >>>>>>>>>>>>>>> the context
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
> >>> the
> >>>>>>>> same
> >>>>>>>>>>> and
> >>>>>>>>>>>>>>> if this statement holds then I feel we can continue
> >>>>>>>>>>>>>>> with this release.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
> >>> too
> >>>>>>>>>>>> complex.
> >>>>>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
> >>> we
> >>>>>>>> do
> >>>>>>>>>> not
> >>>>>>>>>>>>>>> have tools to certify this compatibility (at least not in
> >>> the
> >>>>>>>>>> short
> >>>>>>>>>>>>>>> term)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to hear from people that have been in the
> >>>>>>>> community
> >>>>>>>>>> for
> >>>>>>>>>>>>>>> long time, then I am ready to complete the release process
> >>>>>>>> for
> >>>>>>>>>>>>>>> 3.6.0rc2.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I will update the website and the release notes with a
> >>>>>>>> specific
> >>>>>>>>>>>>>>> warning about the upgrade, we should also update the Wiki
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Enrico
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> >>>>>>>>>>>>>>> <sz...@gmail.com> ha scritto:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Enrico!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> >>>>>>>>>>>>>> QuorumCnxManager.
> >>>>>>>>>>>>>>>> The Protocol version  was changed last time in
> >>>>>>>> ZOOKEEPER-2186
> >>>>>>>>>>>> released
> >>>>>>>>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix
> >>> some
> >>>>>>>> bugs.
> >>>>>>>>>>>> Later I
> >>>>>>>>>>>>>>>> also changed the protocol version when the format of the
> >>>>>>>> initial
> >>>>>>>>>>>> message
> >>>>>>>>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum
> >>> protocol
> >>>>>>>> is
> >>>>>>>>>> not
> >>>>>>>>>>>>>>>> compatible in this case and is the 'expected' behavior if
> >>>>>>>> you
> >>>>>>>>>>>> upgrade
> >>>>>>>>>>>>>> e.g
> >>>>>>>>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
> >>> to
> >>>>>>>>>> 3.6.0.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
> >>>>>>>> then and
> >>>>>>>>>>>> got to
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> conclusion that it is not that bad, as there will be no
> >>> data
> >>>>>>>>>> loss
> >>>>>>>>>>>> as you
> >>>>>>>>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
> >>>>>>>> should
> >>>>>>>>>>>> ensure
> >>>>>>>>>>>>>>>> both backward and forward compatibility to make sure that
> >>>>>>>> the
> >>>>>>>>>> old
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> new part of the quorum can still speak to each other. The
> >>>>>>>>>> current
> >>>>>>>>>>>>>> solution
> >>>>>>>>>>>>>>>> (simply failing if the protocol versions mismatch) is
> >>> more
> >>>>>>>>>> simple
> >>>>>>>>>>>> and
> >>>>>>>>>>>>>> still
> >>>>>>>>>>>>>>>> working just fine: as the servers are restarted
> >>> one-by-one,
> >>>>>>>> the
> >>>>>>>>>>>> nodes
> >>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>> the old protocol version and the nodes with the new
> >>> protocol
> >>>>>>>>>>> version
> >>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>> form two partitions, but any given time only one
> >>> partition
> >>>>>>>> will
> >>>>>>>>>>>> have the
> >>>>>>>>>>>>>>>> quorum.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Still, thinking it through, as a side effect in these
> >>> cases
> >>>>>>>> there
> >>>>>>>>>>>> will
> >>>>>>>>>>>>>> be a
> >>>>>>>>>>>>>>>> short time when none of the partitions will have quorums
> >>>>>>>> (when
> >>>>>>>>>> we
> >>>>>>>>>>>> have N
> >>>>>>>>>>>>>>>> servers with the old protocol version, N servers with the
> >>>>>>>> new
> >>>>>>>>>>>> protocol
> >>>>>>>>>>>>>>>> version, and there is one server just being restarted). I
> >>>>>>>> am not
> >>>>>>>>>>>> sure
> >>>>>>>>>>>>>> if we
> >>>>>>>>>>>>>>>> can accept this.
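
The quorum gap described above can be checked with a small model. This is only a sketch in Python, assuming that old and new servers cannot talk to each other at all and that servers are restarted strictly one at a time. The function names are made up for illustration and nothing here is ZooKeeper code.

```python
def has_quorum(group_size, ensemble_size):
    """A partition has quorum when it holds a strict majority of the ensemble."""
    return group_size > ensemble_size // 2


def rolling_upgrade_timeline(ensemble_size):
    """Return (old_up, new_up, quorum_available) for each phase of the upgrade.

    Assumption: old and new servers form two separate partitions.
    """
    def phase(old_up, new_up):
        quorum = has_quorum(old_up, ensemble_size) or has_quorum(new_up, ensemble_size)
        return (old_up, new_up, quorum)

    old_up, new_up = ensemble_size, 0
    timeline = [phase(old_up, new_up)]
    for _ in range(ensemble_size):
        old_up -= 1  # one old server is stopped for the upgrade
        timeline.append(phase(old_up, new_up))
        new_up += 1  # the same server comes back running the new version
        timeline.append(phase(old_up, new_up))
    return timeline


def quorum_gaps(ensemble_size):
    """List the (old_up, new_up) phases in which neither partition has quorum."""
    return [(old, new) for old, new, ok in rolling_upgrade_timeline(ensemble_size) if not ok]
```

In this model, for an ensemble of 2N+1 servers the only phase without quorum is the one with N old servers up, N new servers up and one server restarting, which is the case described above.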
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
> >>>>>>>> possible
> >>>>>>>>>> to
> >>>>>>>>>>>> parse
> >>>>>>>>>>>>>>>> the initial message of the old protocol version with the
> >>> new
> >>>>>>>>>> code.
> >>>>>>>>>>>> But
> >>>>>>>>>>>>>> I am
> >>>>>>>>>>>>>>>> not sure if it would be enough (as the old code will not
> >>> be
> >>>>>>>> able
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> parse
> >>>>>>>>>>>>>>>> the new initial message).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> One option can be to make a patch also for 3.5 to have a
> >>>>>>>> version
> >>>>>>>>>>>> which
> >>>>>>>>>>>>>>>> supports both protocol versions. (let's say in 3.5.8)
> >>> Then
> >>>>>>>> we
> >>>>>>>>>> can
> >>>>>>>>>>>> write
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>> the release note, that if you need rolling upgrade from
> >>> any
> >>>>>>>>>>> versions
> >>>>>>>>>>>>>> since
> >>>>>>>>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
> >>>>>>>>>> upgrading
> >>>>>>>>>>> to
> >>>>>>>>>>>>>> 3.6.0.
> >>>>>>>>>>>>>>>> We can even make the same thing on the 3.4 branch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> But I am also new to the community... It would be great
> >>> to
> >>>>>>>> hear
> >>>>>>>>>>> the
> >>>>>>>>>>>>>> opinion
> >>>>>>>>>>>>>>>> of more experienced people.
> >>>>>>>>>>>>>>>> Whatever the decision will be, I am happy to make the
> >>>>>>>> changes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And sorry for breaking the RC (if we decide that this
> >>> needs
> >>>>>>>> to
> >>>>>>>>>> be
> >>>>>>>>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>>> Mate
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> >>>>>>>>>>>> eolivelli@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> >>>>>>>> closing the
> >>>>>>>>>>>> VOTE
> >>>>>>>>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
> >>> an
> >>>>>>>>>>> apparent
> >>>>>>>>>>>>>>>>> blocker.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> >>>>>>>> looks
> >>>>>>>>>>> like
> >>>>>>>>>>>>>>>>> peers are not able to talk to each other.
> >>>>>>>>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> >>>>>>>>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> >>>>>>>> errors on
> >>>>>>>>>>> 3.5
> >>>>>>>>>>>>>> nodes:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> >>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918]
> >>> -
> >>>>>>>>>>>> Received
> >>>>>>>>>>>>>>>>> connection request 127.0.0.1:62591
> >>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> >>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>
> >>>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> >>>>>>>>>>>>>>>>> Got unrecognized protocol version -65535
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Once I upgrade all of the peers the system is up and
> >>>>>>>> running,
> >>>>>>>>>>>> without
> >>>>>>>>>>>>>>>>> apparently no data loss.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> >>>>>>>> say,
> >>>>>>>>>>>> server1,
> >>>>>>>>>>>>>>>>> server1 is not able to accept connections (error "Close
> >>> of
> >>>>>>>>>>> session
> >>>>>>>>>>>> 0x0
> >>>>>>>>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
> >>>>>>>>>> clients,
> >>>>>>>>>>>> this
> >>>>>>>>>>>>>>>>> is expected, because as far as it cannot talk with the
> >>>>>>>> other
> >>>>>>>>>>> peers
> >>>>>>>>>>>> it
> >>>>>>>>>>>>>>>>> is practically partitioned away from the cluster.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> My questions are:
> >>>>>>>>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> >>>>>>>> from
> >>>>>>>>>> 3.5
> >>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
> >>> ago,
> >>>>>>>>>> and I
> >>>>>>>>>>>> was
> >>>>>>>>>>>>>>>>> not in the community as dev so I cannot tell
> >>>>>>>>>>>>>>>>> 2) is this a viable option for users ? to have some
> >>>>>>>> temporary
> >>>>>>>>>>>> glitch
> >>>>>>>>>>>>>>>>> during the upgrade and hope that the upgrade completes
> >>>>>>>> without
> >>>>>>>>>>>>>>>>> troubles ?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In theory as long as two servers are running the same
> >>> major
> >>>>>>>>>>> version
> >>>>>>>>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
> >>>>>>>> make
> >>>>>>>>>>>> progress
> >>>>>>>>>>>>>>>>> and to serve clients.
> >>>>>>>>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> >>>>>>>> enough
> >>>>>>>>>>>> context
> >>>>>>>>>>>>>>>>> to understand how this problem is possible and when we
> >>>>>>>> decided
> >>>>>>>>>> to
> >>>>>>>>>>>>>>>>> break compatibility.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The other option is that I am wrong in my test and I am
> >>>>>>>> messing
> >>>>>>>>>>> up
> >>>>>>>>>>>> :-)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> The other upgrade path I would like to see working like
> >>> a
> >>>>>>>> charm
> >>>>>>>>>>> is
> >>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> >>>>>>>> release
> >>>>>>>>>> 3.6
> >>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards
> >>>>>>>>>>>>>>>>> Enrico
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Andor Molnar <an...@apache.org>.
Hi Michael,

"if we can get to rc2 without noticing a showstopper…”

200% disagree with this. 

The whole point of the release voting system is to identify problems, no matter how big they are. To me, finding a showstopper means that people are paying attention and testing the release carefully. This is a very good thing and emphasises how much effort the ZooKeeper community is putting into every single release. Otherwise we could just set up a Jenkins job which creates and publishes a new release every six months and say good luck with it.

I admit that currently we don’t have automated (rolling) upgrade tests, but I sense demand from the community to fill this gap.

“rolling upgrades (and mixed ensembles generally) are effectively untested”

Not true. That’s exactly what we are currently doing (manually for now).

"there have to be a hundred corner cases beyond the MultiAddress issue”

Sure thing. That is true for every new feature in every release, which is why I’m happy to disable it by default. People usually don’t pick up releases ending in .0; production upgrades are expected from .1 or .2, or maybe later, depending on how much risk one is willing to take.

Andor



> On 2020. Feb 11., at 23:35, Michael K. Edwards <m....@gmail.com> wrote:
> 
> I think it would be prudent to emphasize in the release notes that rolling
> upgrades (and mixed ensembles generally) are effectively untested.  That
> this was, in practice, a non-goal of this release cycle.  Because if we can
> get to rc2 without noticing a showstopper, clearly it's not something that
> anyone has gotten around to attempting; and there have to be a hundred
> corner cases beyond the MultiAddress issue.
> 
> On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
> szalay.beko.mate@gmail.com> wrote:
> 
>> I see the main problem here in the fact that we are missing proper
>> versioning in the leader election / quorum protocols. I tried to simply
>> implement backward compatibility in 3.6, but it didn't solve the problem.
>> The new code understands the old protocol, but it can not decide when to
>> use the new or the old protocol during connection initiation. So the old
>> servers can not read the new init messages and we still temporarily end up
>> having two partitions during rolling restart.
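
To illustrate why teaching only the new code about the old protocol is not enough, here is a rough Python sketch. It is not ZooKeeper code: the value -65535 is the version from the error log earlier in this thread, while -65536 as the old value, the header layout and the "|" address separator are assumptions made only for this example.

```python
import struct

OLD_PROTOCOL_VERSION = -65536  # assumed value of the old protocol version
NEW_PROTOCOL_VERSION = -65535  # value reported in the error log in this thread
HEADER = ">qqi"  # hypothetical layout: version, server id, payload length


def encode_initial_message(version, sid, addresses):
    """Build a toy initial message carrying one or more election addresses."""
    payload = "|".join(addresses).encode("utf-8")
    return struct.pack(HEADER, version, sid, len(payload)) + payload


def parse_initial_message(data, accepted_versions):
    """Parse a toy initial message, rejecting versions the receiver does not know."""
    version, sid, length = struct.unpack_from(HEADER, data)
    if version not in accepted_versions:
        raise ValueError(f"Got unrecognized protocol version {version}")
    offset = struct.calcsize(HEADER)
    payload = data[offset:offset + length].decode("utf-8")
    return sid, payload.split("|")


def old_server_parse(data):
    """An old server only knows the old version."""
    return parse_initial_message(data, {OLD_PROTOCOL_VERSION})


def new_server_parse(data):
    """A backward compatible new server accepts both versions."""
    return parse_initial_message(data, {OLD_PROTOCOL_VERSION, NEW_PROTOCOL_VERSION})
```

A new server built this way can accept connections initiated by old servers, but whenever it initiates a connection with the new message, the old server still rejects it. The sender would need to know which version each peer speaks, and that is the missing piece described above.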
>> 
>> I already suggested two ways to handle this later, but I think for 3.6.0
>> now the simplest solution is to disable the new MultiAddress feature and
>> stick to the old protocol version by default. Plus extend the
>> documentation with the note, that enabling the MultiAddress feature is not
>> possible during a rolling upgrade, but it needs to be done with a separate
>> rolling restart. With this approach, the rolling restart should "just work"
>> with the 3.4 / 3.5 configs and we don't require any extra step /
>> configuration from the users, unless they want to use the new feature. I
>> plan to submit a PR with these changes tomorrow to ZOOKEEPER-3720, if there
>> isn't any different opinion.
>> 
>> P.S. For 4.0 we might need to put some extra thinking into backward
>> compatibility / versioning for the quorum and client protocols.
>> 
>> 
>> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
>> wrote:
>> 
>>> I hate to say it, but I think 3.6.0 should release as is.  It is
>>> impossible
>>> to *reliably* retrofit backwards compatibility / interoperability onto a
>>> release that was engineered from the beginning without that goal.  Learn
>>> the lesson, set goals differently in the future.
>>> 
>>> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
>>> szalay.beko.mate@gmail.com>
>>> wrote:
>>> 
>>>> FYI: I created these scripts for my local tests:
>>>> https://github.com/symat/zk-rolling-upgrade-test
>>>> 
>>>> For the long term I would also add some script that actually monitors
>>> the
>>>> state of the quorum and also runs continuous traffic, not just 1-2
>>>> smoketests after each restart. But I don't know how important this would
>>>> be.
>>>> 
>>>> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
>>>>> <an...@apache.org> ha scritto:
>>>>>> 
>>>>>> The most obvious one which crosses my mind is that I previously
>>> worked
>>>>> on:
>>>>>> 
>>>>>> 1) run old version cluster,
>>>>>> 2) connect to each node and run smoke tests,
>>>>>> 3) restart one node with new code,
>>>>>> 4) goto 2) until all nodes are upgraded
>>>>>> 
>>>>>> I think this wouldn’t work in a “unit test”, we probably need a
>>>> separate
>>>>> Jenkins job and a nice python script to do this.
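
The four steps above could be driven by a loop along these lines. This is a minimal Python sketch, assuming the caller supplies `smoke_test` and `upgrade_node` callables (both hypothetical names). The in-memory "cluster" below only stands in for real servers and models old and new versions as unable to talk to each other.

```python
def rolling_upgrade_test(nodes, smoke_test, upgrade_node):
    """Smoke test every node, upgrade one node, and repeat until all are upgraded.

    Returns one (nodes_upgraded_so_far, failed_nodes) tuple per round.
    """
    report = []
    for upgraded in range(len(nodes) + 1):
        failed = [node for node in nodes if not smoke_test(node)]
        report.append((upgraded, failed))
        if upgraded < len(nodes):
            upgrade_node(nodes[upgraded])
    return report


def simulate_incompatible_upgrade(nodes, old="3.5", new="3.6"):
    """Run the loop against an in-memory stand-in for a cluster.

    Assumption: a node passes the smoke test only if the nodes running
    its own version form a strict majority of the ensemble.
    """
    versions = {node: old for node in nodes}

    def smoke_test(node):
        peers = sum(1 for version in versions.values() if version == versions[node])
        return peers > len(nodes) // 2

    def upgrade_node(node):
        versions[node] = new

    return rolling_upgrade_test(nodes, smoke_test, upgrade_node)
```

With three nodes the simulated report shows server1 failing right after its upgrade and server3 failing once it is the last old node. A real job would replace the two callables with code that restarts actual servers and issues client requests.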
>>>>>> 
>>>>>> Andor
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
>>> wrote:
>>>>>>> 
>>>>>>> Anyone have ideas how we could add testing for upgrade? Obviously
>>>>> something
>>>>>>> we're missing, esp given it's important.
>>>>> 
>>>>> I will send an email in the next few days with a proposal.
>>>>> btw my idea is very similar to Andor's
>>>>> 
>>>>> Once we have an automatic environment we can launch from Jenkins
>>>>> 
>>>>> Enrico
>>>>> 
>>>>> 
>>>>>>> 
>>>>>>> Patrick
>>>>>>> 
>>>>>>> On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
>>>> eolivelli@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
>>>>>>>> <sz...@gmail.com> ha scritto:
>>>>>>>>> 
>>>>>>>>> Hi All,
>>>>>>>>> 
>>>>>>>>> about the question from Michael:
>>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
>>>> protocol
>>>>> and
>>>>>>>>>> speak old message format when it's talking to old server?
>>>>>>>>> 
>>>>>>>>> In this particular case, it might be enough. The protocol change
>>>>> happened
>>>>>>>>> now in the 'initial message' sent by the QuorumCnxManager.
>>> Maybe it
>>>>> is
>>>>>>>> not
>>>>>>>>> a problem if the new servers can not initiate channels to the
>>> old
>>>>>>>> servers,
>>>>>>>>> maybe it is enough if these channel gets initiated by the old
>>>> servers
>>>>>>>> only.
>>>>>>>>> I will test it quickly.
>>>>>>>>> 
>>>>>>>>> Although I have no idea if any other thing changed in the quorum
>>>>> protocol
>>>>>>>>> between 3.5 and 3.6. In other cases it might not be enough if
>>> the
>>>> new
>>>>>>>>> servers can understand the old messages, as the old servers can
>>>>> break by
>>>>>>>>> not understanding the messages from the new servers. Also, in
>>> the
>>>>> code
>>>>>>>>> currently (AFAIK) there is no generic knowledge of protocol
>>>>> versions, the
>>>>>>>>> servers are not storing that which protocol versions they
>>>> can/should
>>>>> use
>>>>>>>> to
>>>>>>>>> communicate to which particular other servers. Maybe we don't
>>> even
>>>>> need
>>>>>>>>> this, but I would feel better if we would have more tests around
>>>>> these
>>>>>>>>> things.
>>>>>>>>> 
>>>>>>>>> My suggestion for the long term:
>>>>>>>>> - let's fix this particular issue now with 3.6.0 quickly (I
>>> start
>>>>> doing
>>>>>>>>> this today)
>>>>>>>>> - let's do some automation (backed up with jenkins) that will
>>> test
>>>> a
>>>>>>>> whole
>>>>>>>>> combinations of different ZooKeeper upgrade paths by making
>>> rolling
>>>>>>>>> upgrades during some light traffic. Let's have a bit better
>>>>> definition
>>>>>>>>> about what we expect (e.g. the quorum is up, but some clients
>>> can
>>>> get
>>>>>>>>> disconnected? What will happen to the ephemeral nodes? Do we
>>> want
>>>> to
>>>>>>>>> gracefully close or transfer the user sessions before stopping
>>> the
>>>>> old
>>>>>>>>> server?) and let's see where this broke. Just by checking the
>>>> code, I
>>>>>>>> don't
>>>>>>>>> think the quorum will always be up (e.g. between older 3.4
>>> versions
>>>>> and
>>>>>>>>> 3.5).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I am happy to work on this topic
>>>>>>>> 
>>>>>>>>> - we need to update the Wiki about the working rolling upgrade
>>>> paths
>>>>> and
>>>>>>>>> maybe about workarounds if needed
>>>>>>>>> - we might need to do some fixes (adding backward compatible
>>>> versions
>>>>>>>>> and/or specific parameters that enforce old protocol temporarily
>>>>> during the
>>>>>>>>> rolling upgrade that can be changed later to the new protocol by
>>>>> either
>>>>>>>>> dynamic reconfig or by rolling restart)
>>>>>>>> 
>>>>>>>> it would be much better on 3.6 code to have some support for
>>>>>>>> compatibility with 3.5 servers
>>>>>>>> we can't require old code to be forward compatible but we can
>>> make
>>>> new
>>>>>>>> code be compatible to a certain extent with old code.
>>>>>>>> If we can achieve this compatibility goal without a flag is
>>> better,
>>>>>>>> users won't have to care about this part and they simply "trust"
>>> on
>>>> us
>>>>>>>> 
>>>>>>>> The rollback story is also important, but maybe we are still not
>>>> ready
>>>>>>>> for it, in case of local changes to store,
>>>>>>>> it is better to have a clear design and plan and work for a new
>>>>> release
>>>>>>>> (3.7?)
>>>>>>>> 
>>>>>>>> Enrico
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Depending on your comments, I am happy to create a few Jira
>>> tickets
>>>>>>>> around
>>>>>>>>> these topics.
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> Mate
>>>>>>>>> 
>>>>>>>>> ps. Enrico, sorry about your RC... I owe you a beer, let me
>>> know if
>>>>> you
>>>>>>>> are
>>>>>>>>> near to Budapest ;)
>>>>>>>>> 
>>>>>>>>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
>>>> eolivelli@gmail.com
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Good.
>>>>>>>>>> 
>>>>>>>>>> I will cancel the vote for 3.6.0rc2.
>>>>>>>>>> 
>>>>>>>>>> I would very much appreciate it if Mate and his colleagues have time to
>>>> work
>>>>> on
>>>>>>>> a
>>>>>>>>>> fix.
>>>>>>>>>> Otherwise I will have cycles next week
>>>>>>>>>> 
>>>>>>>>>> I would also like to spend my time in setting up a few minimal
>>>>>>>> integration
>>>>>>>>>> tests about the upgrade story
>>>>>>>>>> 
>>>>>>>>>> Enrico
>>>>>>>>>> 
>>>>>>>>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
>>>> scritto:
>>>>>>>>>> 
>>>>>>>>>>> Kudos Enrico, very thorough work as the final gate keeper of
>>> the
>>>>>>>> release!
>>>>>>>>>>> 
>>>>>>>>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
>>>>>>>>>>> 
>>>>>>>>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
>>>> the
>>>>>>>> rare
>>>>>>>>>>> piece of software that put so much emphasis on compatibilities
>>>> thus
>>>>>>>> it
>>>>>>>>>> just
>>>>>>>>>>> works when upgrade / downgrade, which is amazing. One
>>> guarantee
>>>> we
>>>>>>>> always
>>>>>>>>>>> had is during rolling upgrade, the quorum will always be
>>>> available,
>>>>>>>>>> leading
>>>>>>>>>>> to no service interruption. It would be sad we lose such
>>>> capability
>>>>>>>> given
>>>>>>>>>>> this is still a tractable problem.
>>>>>>>>>>> 
>>>>>>>>>>> Regarding the fix, can we just make 3.6.0 aware of the old
>>>> protocol
>>>>>>>> and
>>>>>>>>>>> speak old message format when it's talking to old server?
>>>>> Basically,
>>>>>>>> an
>>>>>>>>>>> ugly if else check against the protocol version should work
>>> and
>>>>>>>> there is
>>>>>>>>>> no
>>>>>>>>>>> need to have multiple pass on rolling upgrade process.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
>>>>>>>> eolivelli@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I suggest this plan:
>>>>>>>>>>>> - release 3.6.0 now
>>>>>>>>>>>> - improve the migration story, the flow outlined by Mate is
>>>>>>>>>>>> interesting, but it will take time
>>>>>>>>>>>> 
>>>>>>>>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
>>> the
>>>>>>>>>>>> release this evening (within 8-10 hours) if no one comes out
>>> in
>>>>> the
>>>>>>>>>>>> VOTE thread with a -1
>>>>>>>>>>>> 
>>>>>>>>>>>> Enrico
>>>>>>>>>>>> 
>>>>>>>>>>>> Enrico
>>>>>>>>>>>> 
>>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
>>>>>>>>>>>> <ph...@apache.org> ha scritto:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
>>> andor@apache.org
>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Answers inline.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In my experience when you are close to a release it is
>>>>>>>> better not
>>>>>>>>>> to
>>>>>>>>>>>>>>> make big changes. (I am among the approvers of that patch,
>>>>>>>> so I
>>>>>>>>>> am
>>>>>>>>>>>>>>> responsible for this change)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Although this statement is acceptable for me, I don’t feel
>>>> this
>>>>>>>>>> patch
>>>>>>>>>>>>>> should not have been merged into 3.6.0. Submission has been
>>>>>>>>>> preceded
>>>>>>>>>>>> by a
>>>>>>>>>>>>>> long argument with MAPR folks who originally wanted to be
>>>>>>>> merged
>>>>>>>>>> into
>>>>>>>>>>>> 3.4
>>>>>>>>>>>>>> branch (considering the pace how ZooKeeper community is
>>> moving
>>>>>>>>>>>> forward) and
>>>>>>>>>>>>>> we reached an agreement that release it with 3.6.0.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Make a long story short, this patch has been outstanding
>>> for
>>>>>>>> ages
>>>>>>>>>>>> without
>>>>>>>>>>>>>> much attention from the community and contributors made a
>>> lot
>>>>>>>> of
>>>>>>>>>>>> effort to
>>>>>>>>>>>>>> get it done before the release.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I would like to hear from people that have been in the
>>>>>>>> community
>>>>>>>>>> for
>>>>>>>>>>>>>>> long time, then I am ready to complete the release process
>>>>>>>> for
>>>>>>>>>>>>>>> 3.6.0rc2.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Me too.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I tend to accept the way rolling restart works now - as you
>>>>>>>>>> described
>>>>>>>>>>>>>> Enrico - and given that situation was pretty much the same
>>>>>>>> between
>>>>>>>>>>> 3.4
>>>>>>>>>>>> and
>>>>>>>>>>>>>> 3.5, I don’t feel we have to make additional changes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On the other hand, the fix that Mate suggested sounds quite
>>>>>>>> cool,
>>>>>>>>>> I’m
>>>>>>>>>>>> also
>>>>>>>>>>>>>> happy to work on getting it in.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fyi, Release Management page says the following:
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> "major.minor release of ZooKeeper must be backwards
>>> compatible
>>>>>>>> with
>>>>>>>>>>> the
>>>>>>>>>>>>>> previous minor release, major.(minor-1)"
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> Our users, direct and indirect, value the ability to
>>> migrate to
>>>>>>>> newer
>>>>>>>>>>>>> versions - esp as we drop support for older. Frictions such
>>> as
>>>>>>>> this
>>>>>>>>>> can
>>>>>>>>>>>> be
>>>>>>>>>>>>> a reason to go elsewhere. I'm "pro" b/w compat - esp given
>>> our
>>>>>>>>>>> published
>>>>>>>>>>>>> guidelines.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Patrick
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Andor
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
>>>>>>>> eolivelli@gmail.com
>>>>>>>>>>> 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thank you Mate for checking and explaining this story.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I find it very interesting that the cause is
>>> ZOOKEEPER-3188
>>>>>>>> as:
>>>>>>>>>>>>>>> - it is the last "big patch" committed to 3.6 before
>>>>>>>> starting the
>>>>>>>>>>>>>>> release process
>>>>>>>>>>>>>>> - it is the cause of the failure of the first RC
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In my experience when you are close to a release it is
>>>>>>>> better not
>>>>>>>>>> to
>>>>>>>>>>>>>>> make big changes. (I am among the approvers of that patch,
>>>>>>>> so I
>>>>>>>>>> am
>>>>>>>>>>>>>>> responsible for this change)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This is a pointer to the change to whom who wants to
>>>>>>>> understand
>>>>>>>>>>>> better
>>>>>>>>>>>>>>> the context
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
>>> the
>>>>>>>> same
>>>>>>>>>>> and
>>>>>>>>>>>>>>> if this statement holds then I feel we can continue
>>>>>>>>>>>>>>> with this release.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
>>> too
>>>>>>>>>>>> complex.
>>>>>>>>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
>>> we
>>>>>>>> do
>>>>>>>>>> not
>>>>>>>>>>>>>>> have tools to certify this compatibility (at least not in
>>> the
>>>>>>>>>> short
>>>>>>>>>>>>>>> term)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I would like to hear from people that have been in the
>>>>>>>> community
>>>>>>>>>> for
>>>>>>>>>>>>>>> long time, then I am ready to complete the release process
>>>>>>>> for
>>>>>>>>>>>>>>> 3.6.0rc2.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I will update the website and the release notes with a
>>>>>>>> specific
>>>>>>>>>>>>>>> warning about the upgrade, we should also update the Wiki
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
>>>>>>>>>>>>>>> <sz...@gmail.com> ha scritto:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Enrico!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
>>>>>>>>>>>>>> QuorumCnxManager.
>>>>>>>>>>>>>>>> The Protocol version  was changed last time in
>>>>>>>> ZOOKEEPER-2186
>>>>>>>>>>>> released
>>>>>>>>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix
>>> some
>>>>>>>> bugs.
>>>>>>>>>>>> Later I
>>>>>>>>>>>>>>>> also changed the protocol version when the format of the
>>>>>>>> initial
>>>>>>>>>>>> message
>>>>>>>>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum
>>> protocol
>>>>>>>> is
>>>>>>>>>> not
>>>>>>>>>>>>>>>> compatible in this case and is the 'expected' behavior if
>>>>>>>> you
>>>>>>>>>>>> upgrade
>>>>>>>>>>>>>> e.g
>>>>>>>>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
>>> to
>>>>>>>>>> 3.6.0.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
>>>>>>>> then and
>>>>>>>>>>>> got to
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> conclusion that it is not that bad, as there will be no
>>> data
>>>>>>>>>> loss
>>>>>>>>>>>> as you
>>>>>>>>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
>>>>>>>> should
>>>>>>>>>>>> ensure
>>>>>>>>>>>>>>>> both backward and forward compatibility to make sure that
>>>>>>>> the
>>>>>>>>>> old
>>>>>>>>>>>> and
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> new part of the quorum can still speak to each other. The
>>>>>>>>>> current
>>>>>>>>>>>>>> solution
>>>>>>>>>>>>>>>> (simply failing if the protocol versions mismatch) is
>>> more
>>>>>>>>>> simple
>>>>>>>>>>>> and
>>>>>>>>>>>>>> still
>>>>>>>>>>>>>>>> working just fine: as the servers are restarted
>>> one-by-one,
>>>>>>>> the
>>>>>>>>>>>> nodes
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> the old protocol version and the nodes with the new
>>> protocol
>>>>>>>>>>> version
>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>> form two partitions, but any given time only one
>>> partition
>>>>>>>> will
>>>>>>>>>>>> have the
>>>>>>>>>>>>>>>> quorum.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Still, thinking it through, as a side effect in these
>>> cases
>>>>>>>> there
>>>>>>>>>>>> will
>>>>>>>>>>>>>> be a
>>>>>>>>>>>>>>>> short time when none of the partitions will have quorums
>>>>>>>> (when
>>>>>>>>>> we
>>>>>>>>>>>> have N
>>>>>>>>>>>>>>>> servers with the old protocol version, N servers with the
>>>>>>>> new
>>>>>>>>>>>> protocol
>>>>>>>>>>>>>>>> version, and there is one server just being restarted). I
>>>>>>>> am not
>>>>>>>>>>>> sure
>>>>>>>>>>>>>> if we
>>>>>>>>>>>>>>>> can accept this.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
>>>>>>>> possible
>>>>>>>>>> to
>>>>>>>>>>>> parse
>>>>>>>>>>>>>>>> the initial message of the old protocol version with the
>>> new
>>>>>>>>>> code.
>>>>>>>>>>>> But
>>>>>>>>>>>>>> I am
>>>>>>>>>>>>>>>> not sure if it would be enough (as the old code will not
>>> be
>>>>>>>> able
>>>>>>>>>>> to
>>>>>>>>>>>>>> parse
>>>>>>>>>>>>>>>> the new initial message).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> One option can be to make a patch also for 3.5 to have a
>>>>>>>> version
>>>>>>>>>>>> which
>>>>>>>>>>>>>>>> supports both protocol versions. (let's say in 3.5.8)
>>> Then
>>>>>>>> we
>>>>>>>>>> can
>>>>>>>>>>>> write
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the release note, that if you need rolling upgrade from
>>> any
>>>>>>>>>>> versions
>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
>>>>>>>>>> upgrading
>>>>>>>>>>> to
>>>>>>>>>>>>>> 3.6.0.
>>>>>>>>>>>>>>>> We can even make the same thing on the 3.4 branch.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> But I am also new to the community... It would be great
>>> to
>>>>>>>> hear
>>>>>>>>>>> the
>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>> of more experienced people.
>>>>>>>>>>>>>>>> Whatever the decision will be, I am happy to make the
>>>>>>>> changes.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> And sorry for breaking the RC (if we decide that this
>>> needs
>>>>>>>> to
>>>>>>>>>> be
>>>>>>>>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>> Mate
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
>>>>>>>>>>>> eolivelli@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
>>>>>>>> closing the
>>>>>>>>>>>> VOTE
>>>>>>>>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
>>> an
>>>>>>>>>>> apparent
>>>>>>>>>>>>>>>>> blocker.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
>>>>>>>> looks
>>>>>>>>>>> like
>>>>>>>>>>>>>>>>> peers are not able to talk to each other.
>>>>>>>>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
>>>>>>>>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
>>>>>>>> errors on
>>>>>>>>>>> 3.5
>>>>>>>>>>>>>> nodes:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
>>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918]
>>> -
>>>>>>>>>>>> Received
>>>>>>>>>>>>>>>>> connection request 127.0.0.1:62591
>>>>>>>>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
>>>>>>>>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
>>>>>>>>>>>>>>>>> Got unrecognized protocol version -65535
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Once I upgrade all of the peers the system is up and
>>>>>>>> running,
>>>>>>>>>>>> without
>>>>>>>>>>>>>>>>> apparently no data loss.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
>>>>>>>> say,
>>>>>>>>>>>> server1,
>>>>>>>>>>>>>>>>> server1 is not able to accept connections (error "Close
>>> of
>>>>>>>>>>> session
>>>>>>>>>>>> 0x0
>>>>>>>>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
>>>>>>>>>> clients,
>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> is expected, because as far as it cannot talk with the
>>>>>>>> other
>>>>>>>>>>> peers
>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> is practically partitioned away from the cluster.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> My questions are:
>>>>>>>>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
>>>>>>>> from
>>>>>>>>>> 3.5
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
>>> ago,
>>>>>>>>>> and I
>>>>>>>>>>>> was
>>>>>>>>>>>>>>>>> not in the community as dev so I cannot tell
>>>>>>>>>>>>>>>>> 2) is this a viable option for users ? to have some
>>>>>>>> temporary
>>>>>>>>>>>> glitch
>>>>>>>>>>>>>>>>> during the upgrade and hope that the upgrade completes
>>>>>>>> without
>>>>>>>>>>>>>>>>> troubles ?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> In theory as long as two servers are running the same
>>> major
>>>>>>>>>>> version
>>>>>>>>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
>>>>>>>> make
>>>>>>>>>>>> progress
>>>>>>>>>>>>>>>>> and to server clients.
>>>>>>>>>>>>>>>>> I feel that this is quite dangerous, but I don't have
>>>>>>>> enough
>>>>>>>>>>>> context
>>>>>>>>>>>>>>>>> to understand how this problem is possible and when we
>>>>>>>> decided
>>>>>>>>>> to
>>>>>>>>>>>>>>>>> break compatibility.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The other option is that I am wrong in my test and I am
>>>>>>>> messing
>>>>>>>>>>> up
>>>>>>>>>>>> :-)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The other upgrade path I would like to see working like
>>> a
>>>>>>>> charm
>>>>>>>>>>> is
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
>>>>>>>> release
>>>>>>>>>> 3.6
>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> Enrico
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by "Michael K. Edwards" <m....@gmail.com>.
I think it would be prudent to emphasize in the release notes that rolling
upgrades (and mixed ensembles generally) are effectively untested, and that
this was, in practice, a non-goal of this release cycle.  If we can get to
rc2 without noticing a showstopper, clearly nobody has gotten around to
attempting a rolling upgrade; and there have to be a hundred corner cases
beyond the MultiAddress issue.
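
One way to start closing that testing gap is the loop proposed elsewhere in
this thread: run the old cluster, smoke-test every node, restart one node
with the new code, and repeat. Below is a minimal Python sketch of that loop.
The function names, the `versions` table and the majority-based stand-in
smoke test are all hypothetical, invented for illustration. A real harness
would restart actual ZooKeeper processes and run real client checks.

```python
from typing import Callable, List


def rolling_upgrade_test(nodes: List[str],
                         restart_with_new_version: Callable[[str], None],
                         smoke_test: Callable[[str], bool]) -> List[str]:
    """Upgrade nodes one by one, smoke-testing every node around each restart.

    Returns a list of failure descriptions (empty means every check passed).
    """
    failures = []

    def check_all(stage: str) -> None:
        for node in nodes:
            if not smoke_test(node):
                failures.append(f"{stage}: smoke test failed on {node}")

    check_all("before upgrade")               # old cluster must be healthy
    for node in nodes:
        restart_with_new_version(node)        # restart one node with new code
        check_all(f"after upgrading {node}")  # then test every node again
    return failures


# Stand-in cluster for demonstration only (an assumption, not ZooKeeper):
# a node passes the smoke test when a majority of the ensemble runs the
# same version as it does, i.e. versions are treated as incompatible.
versions = {"server1": "old", "server2": "old", "server3": "old"}


def fake_restart(node: str) -> None:
    versions[node] = "new"


def fake_smoke_test(node: str) -> bool:
    same = sum(1 for v in versions.values() if v == versions[node])
    return same > len(versions) // 2


failures = rolling_upgrade_test(list(versions), fake_restart, fake_smoke_test)
for line in failures:
    print(line)
```

With the stand-in cluster, the loop flags the node that is partitioned away
at each step, which is the kind of signal a Jenkins job could act on.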

On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
szalay.beko.mate@gmail.com> wrote:

> I see the main problem here in the fact that we are missing proper
> versioning in the leader election / quorum protocols. I tried to simply
> implement backward compatibility in 3.6, but it didn't solve the problem.
> The new code understands the old protocol, but it cannot decide when to
> use the new or the old protocol during connection initiation. So the old
> servers cannot read the new init messages and we still temporarily end up
> with two partitions during a rolling restart.
>
> I already suggested two ways to handle this later, but I think for 3.6.0
> now the simplest solution is to disable the new MultiAddress feature and
> stick to the old protocol version by default. Plus extend the
> documentation with the note, that enabling the MultiAddress feature is not
> possible during a rolling upgrade, but it needs to be done with a separate
> rolling restart. With this approach, the rolling restart should "just work"
> with the 3.4 / 3.5 configs and we don't require any extra step /
> configuration from the users, unless they want to use the new feature. I
> plan to submit a PR with these changes tomorrow to ZOOKEEPER-3720, if there
> isn't any different opinion.
>
> P.S. For 4.0 we might need to put some extra thinking into backward
> compatibility / versioning for the quorum and client protocols.
>
>
> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> wrote:
>
>> I hate to say it, but I think 3.6.0 should release as is.  It is
>> impossible
>> to *reliably* retrofit backwards compatibility / interoperability onto a
>> release that was engineered from the beginning without that goal.  Learn
>> the lesson, set goals differently in the future.
>>
>> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
>> szalay.beko.mate@gmail.com>
>> wrote:
>>
>> > FYI: I created these scripts for my local tests:
>> > https://github.com/symat/zk-rolling-upgrade-test
>> >
>> > For the long term I would also add some script that actually monitors
>> the
>> > state of the quorum and also runs continuous traffic, not just 1-2
>> > smoketests after each restart. But I don't know how important this would
>> > be.
>> >
>> > On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
>> > wrote:
>> >
>> > > Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
>> > > <an...@apache.org> ha scritto:
>> > > >
>> > > > The most obvious one which crosses my mind is one that I previously
>> > > > worked on:
>> > > >
>> > > > 1) run old version cluster,
>> > > > 2) connect to each node and run smoke tests,
>> > > > 3) restart one node with new code,
>> > > > 4) goto 2) until all nodes are upgraded
>> > > >
>> > > > I think this wouldn’t work in a “unit test”; we probably need a
>> > > > separate Jenkins job and a nice python script to do this.
>> > > >
>> > > > Andor
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
>> wrote:
>> > > > >
>> > > > > Anyone have ideas how we could add testing for upgrade? Obviously
>> > > > > something we're missing, especially given its importance.
>> > >
>> > > I will send an email next days with a proposal.
>> > > btw my idea is very like Andor's one
>> > >
>> > > Once we have an automatic environment we can launch from Jenkins
>> > >
>> > > Enrico
>> > >
>> > >
>> > > > >
>> > > > > Patrick
>> > > > >
>> > > > > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
>> > eolivelli@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
>> > > > >> <sz...@gmail.com> ha scritto:
>> > > > >>>
>> > > > >>> Hi All,
>> > > > >>>
>> > > > >>> about the question from Michael:
>> > > > >>>> Regarding the fix, can we just make 3.6.0 aware of the old
>> > protocol
>> > > and
>> > > > >>>> speak old message format when it's talking to old server?
>> > > > >>>
>> > > > >>> In this particular case, it might be enough. The protocol change
>> > > happened
>> > > > >>> now in the 'initial message' sent by the QuorumCnxManager.
>> Maybe it
>> > > is
>> > > > >> not
>> > > > >>> a problem if the new servers can not initiate channels to the
>> old
>> > > > >> servers,
>> > > > >>> maybe it is enough if these channel gets initiated by the old
>> > servers
>> > > > >> only.
>> > > > >>> I will test it quickly.
>> > > > >>>
>> > > > >>> Although I have no idea if any other thing changed in the quorum
>> > > protocol
>> > > > >>> between 3.5 and 3.6. In other cases it might not be enough if
>> the
>> > new
>> > > > >>> servers can understand the old messages, as the old servers can
>> > > break by
>> > > > >>> not understanding the messages from the new servers. Also, in
>> the
>> > > code
>> > > > >>> currently (AFAIK) there is no generic knowledge of protocol
>> > > versions, the
>> > > > >>> servers are not storing that which protocol versions they
>> > can/should
>> > > use
>> > > > >> to
>> > > > >>> communicate to which particular other servers. Maybe we don't
>> even
>> > > need
>> > > > >>> this, but I would feel better if we would have more tests around
>> > > these
>> > > > >>> things.
>> > > > >>>
>> > > > >>> My suggestion for the long term:
>> > > > >>> - let's fix this particular issue now with 3.6.0 quickly (I
>> start
>> > > doing
>> > > > >>> this today)
>> > > > >>> - let's do some automation (backed up with jenkins) that will
>> test
>> > a
>> > > > >> whole
>> > > > >>> combinations of different ZooKeeper upgrade paths by making
>> rolling
>> > > > >>> upgrades during some light traffic. Let's have a bit better
>> > > definition
>> > > > >>> about what we expect (e.g. the quorum is up, but some clients
>> can
>> > get
>> > > > >>> disconnected? What will happen to the ephemeral nodes? Do we
>> want
>> > to
>> > > > >>> gracefully close or transfer the user sessions before stopping
>> the
>> > > old
>> > > > >>> server?) and let's see where this broke. Just by checking the
>> > code, I
>> > > > >> don't
>> > > > >>> think the quorum will always be up (e.g. between older 3.4
>> versions
>> > > and
>> > > > >>> 3.5).
>> > > > >>
>> > > > >>
>> > > > >> I am happy to work on this topic
>> > > > >>
>> > > > >>> - we need to update the Wiki about the working rolling upgrade
>> > paths
>> > > and
>> > > > >>> maybe about workarounds if needed
>> > > > >>> - we might need to do some fixes (adding backward compatible
>> > > > >>> versions and/or specific parameters that temporarily enforce the
>> > > > >>> old protocol during the rolling upgrade and that can be changed
>> > > > >>> later to the new protocol by either dynamic reconfig or by
>> > > > >>> rolling restart)
>> > > > >>
>> > > > >> It would be much better for the 3.6 code to have some support for
>> > > > >> compatibility with 3.5 servers.
>> > > > >> We can't require old code to be forward compatible, but we can make
>> > > > >> new code compatible to a certain extent with old code.
>> > > > >> If we can achieve this compatibility goal without a flag, even
>> > > > >> better: users won't have to care about this part and can simply
>> > > > >> trust us.
>> > > > >>
>> > > > >> The rollback story is also important, but maybe we are still not
>> > ready
>> > > > >> for it, in case of local changes to store,
>> > > > >> it is better to have a clear design and plan and work for a new
>> > > release
>> > > > >> (3.7?)
>> > > > >>
>> > > > >> Enrico
>> > > > >>
>> > > > >>>
>> > > > >>> Depending on your comments, I am happy to create a few Jira
>> tickets
>> > > > >> around
>> > > > >>> these topics.
>> > > > >>>
>> > > > >>> Kind regards,
>> > > > >>> Mate
>> > > > >>>
>> > > > >>> ps. Enrico, sorry about your RC... I owe you a beer, let me
>> know if
>> > > you
>> > > > >> are
>> > > > >>> near to Budapest ;)
>> > > > >>>
>> > > > >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
>> > eolivelli@gmail.com
>> > > >
>> > > > >> wrote:
>> > > > >>>
>> > > > >>>> Good.
>> > > > >>>>
>> > > > >>>> I will cancel the vote for 3.6.0rc2.
>> > > > >>>>
>> > > > >>>> I would very much appreciate it if Mate and his colleagues have
>> > > > >>>> time to work on a fix.
>> > > > >>>> Otherwise I will have cycles next week
>> > > > >>>>
>> > > > >>>> I would also like to spend my time in setting up a few minimal
>> > > > >> integration
>> > > > >>>> tests about the upgrade story
>> > > > >>>>
>> > > > >>>> Enrico
>> > > > >>>>
>> > > > >>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
>> > scritto:
>> > > > >>>>
>> > > > >>>>> Kudos Enrico, very thorough work as the final gate keeper of
>> the
>> > > > >> release!
>> > > > >>>>>
>> > > > >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
>> > > > >>>>>
>> > > > >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
>> > > > >>>>> the rare pieces of software that put so much emphasis on
>> > > > >>>>> compatibility that it just works when you upgrade / downgrade,
>> > > > >>>>> which is amazing. One guarantee we always had is that during a
>> > > > >>>>> rolling upgrade the quorum will always be available, leading to
>> > > > >>>>> no service interruption. It would be sad to lose such a
>> > > > >>>>> capability given this is still a tractable problem.
>> > > > >>>>>
>> > > > >>>>> Regarding the fix, can we just make 3.6.0 aware of the old
>> > > > >>>>> protocol and speak the old message format when it's talking to
>> > > > >>>>> an old server? Basically, an ugly if/else check against the
>> > > > >>>>> protocol version should work, and there is no need for multiple
>> > > > >>>>> passes in the rolling upgrade process.
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
>> > > > >> eolivelli@gmail.com>
>> > > > >>>>> wrote:
>> > > > >>>>>
>> > > > >>>>>> I suggest this plan:
>> > > > >>>>>> - release 3.6.0 now
>> > > > >>>>>> - improve the migration story, the flow outlined by Mate is
>> > > > >>>>>> interesting, but it will take time
>> > > > >>>>>>
>> > > > >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
>> the
>> > > > >>>>>> release this evening (within 8-10 hours) if no one comes out
>> in
>> > > the
>> > > > >>>>>> VOTE thread with a -1
>> > > > >>>>>>
>> > > > >>>>>> Enrico
>> > > > >>>>>>
>> > > > >>>>>> Enrico
>> > > > >>>>>>
>> > > > >>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
>> > > > >>>>>> <ph...@apache.org> ha scritto:
>> > > > >>>>>>>
>> > > > >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
>> andor@apache.org
>> > >
>> > > > >>>> wrote:
>> > > > >>>>>>>
>> > > > >>>>>>>> Hi,
>> > > > >>>>>>>>
>> > > > >>>>>>>> Answers inline.
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>> In my experience when you are close to a release it is
>> > > > >>>>>>>>> better not to make big changes. (I am among the approvers of
>> > > > >>>>>>>>> that patch, so I am responsible for this change)
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>> Although this statement is acceptable to me, I don’t feel that
>> > > > >>>>>>>> this patch should have been left out of 3.6.0. Its submission
>> > > > >>>>>>>> was preceded by a long argument with the MAPR folks, who
>> > > > >>>>>>>> originally wanted it merged into the 3.4 branch (considering
>> > > > >>>>>>>> the pace at which the ZooKeeper community moves forward), and
>> > > > >>>>>>>> we reached an agreement to release it with 3.6.0.
>> > > > >>>>>>>>
>> > > > >>>>>>>> To make a long story short, this patch had been outstanding for
>> > > > >>>>>>>> ages without much attention from the community, and the
>> > > > >>>>>>>> contributors made a lot of effort to get it done before the
>> > > > >>>>>>>> release.
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>> I would like to hear from people that have been in the
>> > > > >>>>>>>>> community for a long time, then I am ready to complete the
>> > > > >>>>>>>>> release process for 3.6.0rc2.
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>> Me too.
>> > > > >>>>>>>>
>> > > > >>>>>>>> I tend to accept the way rolling restart works now - as you
>> > > > >>>> described
>> > > > >>>>>>>> Enrico - and given that situation was pretty much the same
>> > > > >> between
>> > > > >>>>> 3.4
>> > > > >>>>>> and
>> > > > >>>>>>>> 3.5, I don’t feel we have to make additional changes.
>> > > > >>>>>>>>
>> > > > >>>>>>>> On the other hand, the fix that Mate suggested sounds quite
>> > > > >> cool,
>> > > > >>>> I’m
>> > > > >>>>>> also
>> > > > >>>>>>>> happy to work on getting it in.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Fyi, Release Management page says the following:
>> > > > >>>>>>>>
>> > > > >>>>>>
>> > > > >>>>
>> > > > >>
>> > >
>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>> > > > >>>>>>>>
>> > > > >>>>>>>> "major.minor release of ZooKeeper must be backwards
>> compatible
>> > > > >> with
>> > > > >>>>> the
>> > > > >>>>>>>> previous minor release, major.(minor-1)"
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>> Our users, direct and indirect, value the ability to migrate to
>> > > > >>>>>>> newer versions - esp as we drop support for older ones.
>> > > > >>>>>>> Frictions such as this can be a reason to go elsewhere. I'm
>> > > > >>>>>>> "pro" b/w compat - esp given our published guidelines.
>> > > > >>>>>>>
>> > > > >>>>>>> Patrick
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>>> Andor
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
>> > > > >> eolivelli@gmail.com
>> > > > >>>>>
>> > > > >>>>>> wrote:
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Thank you Mate for checking and explaining this story.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> I find it very interesting that the cause is
>> ZOOKEEPER-3188
>> > > > >> as:
>> > > > >>>>>>>>> - it is the last "big patch" committed to 3.6 before
>> > > > >> starting the
>> > > > >>>>>>>>> release process
>> > > > >>>>>>>>> - it is the cause of the failure of the first RC
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> In my experience when you are close to a release it is
>> > > > >>>>>>>>> better not to make big changes. (I am among the approvers of
>> > > > >>>>>>>>> that patch, so I am responsible for this change)
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> This is a pointer to the change, for anyone who wants to
>> > > > >>>>>>>>> understand the context better
>> > > > >>>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > >
>> >
>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
>> the
>> > > > >> same
>> > > > >>>>> and
>> > > > >>>>>>>>> if this statement holds then I feel we can continue
>> > > > >>>>>>>>> with this release.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
>> too
>> > > > >>>>>> complex.
>> > > > >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
>> we
>> > > > >> do
>> > > > >>>> not
>> > > > >>>>>>>>> have tools to certify this compatibility (at least not in
>> the
>> > > > >>>> short
>> > > > >>>>>>>>> term)
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> I would like to hear from people that have been in the
>> > > > >>>>>>>>> community for a long time, then I am ready to complete the
>> > > > >>>>>>>>> release process for 3.6.0rc2.
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> I will update the website and the release notes with a
>> > > > >> specific
>> > > > >>>>>>>>> warning about the upgrade, we should also update the Wiki
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Enrico
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
>> > > > >>>>>>>>> <sz...@gmail.com> ha scritto:
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Hi Enrico!
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
>> > > > >>>>>>>> QuorumCnxManager.
>> > > > >>>>>>>>>> The Protocol version  was changed last time in
>> > > > >> ZOOKEEPER-2186
>> > > > >>>>>> released
>> > > > >>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix
>> some
>> > > > >> bugs.
>> > > > >>>>>> Later I
>> > > > >>>>>>>>>> also changed the protocol version when the format of the
>> > > > >> initial
>> > > > >>>>>> message
>> > > > >>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum
>> protocol
>> > > > >> is
>> > > > >>>> not
>> > > > >>>>>>>>>> compatible in this case and is the 'expected' behavior if
>> > > > >> you
>> > > > >>>>>> upgrade
>> > > > >>>>>>>> e.g
>> > > > >>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
>> to
>> > > > >>>> 3.6.0.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
>> > > > >> then and
>> > > > >>>>>> got to
>> > > > >>>>>>>> the
>> > > > >>>>>>>>>> conclusion that it is not that bad, as there will be no
>> data
>> > > > >>>> loss
>> > > > >>>>>> as you
>> > > > >>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
>> > > > >> should
>> > > > >>>>>> ensure
>> > > > >>>>>>>>>> both backward and forward compatibility to make sure that
>> > > > >> the
>> > > > >>>> old
>> > > > >>>>>> and
>> > > > >>>>>>>> the
>> > > > >>>>>>>>>> new part of the quorum can still speak to each other. The
>> > > > >>>> current
>> > > > >>>>>>>> solution
>> > > > >>>>>>>>>> (simply failing if the protocol versions mismatch) is
>> more
>> > > > >>>> simple
>> > > > >>>>>> and
>> > > > >>>>>>>> still
>> > > > >>>>>>>>>> working just fine: as the servers are restarted
>> one-by-one,
>> > > > >> the
>> > > > >>>>>> nodes
>> > > > >>>>>>>> with
>> > > > >>>>>>>>>> the old protocol version and the nodes with the new
>> protocol
>> > > > >>>>> version
>> > > > >>>>>>>> will
>> > > > >>>>>>>>>> form two partitions, but any given time only one
>> partition
>> > > > >> will
>> > > > >>>>>> have the
>> > > > >>>>>>>>>> quorum.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Still, thinking it through, as a side effect in these cases
>> > > > >>>>>>>>>> there will be a short time when none of the partitions has a
>> > > > >>>>>>>>>> quorum (when we have N servers with the old protocol version,
>> > > > >>>>>>>>>> N servers with the new protocol version, and there is one
>> > > > >>>>>>>>>> server just being restarted). I am not sure if we can accept
>> > > > >>>>>>>>>> this.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
>> > > > >> possible
>> > > > >>>> to
>> > > > >>>>>> parse
>> > > > >>>>>>>>>> the initial message of the old protocol version with the
>> new
>> > > > >>>> code.
>> > > > >>>>>> But
>> > > > >>>>>>>> I am
>> > > > >>>>>>>>>> not sure if it would be enough (as the old code will not
>> be
>> > > > >> able
>> > > > >>>>> to
>> > > > >>>>>>>> parse
>> > > > >>>>>>>>>> the new initial message).
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> One option can be to make a patch also for 3.5 to have a
>> > > > >> version
>> > > > >>>>>> which
>> > > > >>>>>>>>>> supports both protocol versions. (let's say in 3.5.8)
>> Then
>> > > > >> we
>> > > > >>>> can
>> > > > >>>>>> write
>> > > > >>>>>>>> to
>> > > > >>>>>>>>>> the release note, that if you need rolling upgrade from
>> any
>> > > > >>>>> versions
>> > > > >>>>>>>> since
>> > > > >>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
>> > > > >>>> upgrading
>> > > > >>>>> to
>> > > > >>>>>>>> 3.6.0.
>> > > > >>>>>>>>>> We can even make the same thing on the 3.4 branch.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> But I am also new to the community... It would be great
>> to
>> > > > >> hear
>> > > > >>>>> the
>> > > > >>>>>>>> opinion
>> > > > >>>>>>>>>> of more experienced people.
>> > > > >>>>>>>>>> Whatever the decision will be, I am happy to make the
>> > > > >> changes.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> And sorry for breaking the RC (if we decide that this
>> needs
>> > > > >> to
>> > > > >>>> be
>> > > > >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Kind regards,
>> > > > >>>>>>>>>> Mate
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
>> > > > >>>>>> eolivelli@gmail.com>
>> > > > >>>>>>>> wrote:
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>>> Hi,
>> > > > >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
>> > > > >> closing the
>> > > > >>>>>> VOTE
>> > > > >>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
>> an
>> > > > >>>>> apparent
>> > > > >>>>>>>>>>> blocker.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
>> > > > >> looks
>> > > > >>>>> like
>> > > > >>>>>>>>>>> peers are not able to talk to each other.
>> > > > >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
>> > > > >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
>> > > > >> errors on
>> > > > >>>>> 3.5
>> > > > >>>>>>>> nodes:
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
>> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918]
>> -
>> > > > >>>>>> Received
>> > > > >>>>>>>>>>> connection request 127.0.0.1:62591
>> > > > >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
>> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > >
>> >
>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
>> > > > >>>>>>>>>>> Got unrecognized protocol version -65535
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Once I upgrade all of the peers the system is up and
>> > > > >> running,
>> > > > >>>>>> without
>> > > > >>>>>>>>>>> apparently no data loss.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
>> > > > >> say,
>> > > > >>>>>> server1,
>> > > > >>>>>>>>>>> server1 is not able to accept connections (error "Close
>> of
>> > > > >>>>> session
>> > > > >>>>>> 0x0
>> > > > >>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
>> > > > >>>> clients,
>> > > > >>>>>> this
>> > > > >>>>>>>>>>> is expected, because as far as it cannot talk with the
>> > > > >> other
>> > > > >>>>> peers
>> > > > >>>>>> it
>> > > > >>>>>>>>>>> is practically partitioned away from the cluster.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> My questions are:
>> > > > >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
>> > > > >> from
>> > > > >>>> 3.5
>> > > > >>>>> to
>> > > > >>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
>> ago,
>> > > > >>>> and I
>> > > > >>>>>> was
>> > > > >>>>>>>>>>> not in the community as dev so I cannot tell
>> > > > >>>>>>>>>>> 2) is this a viable option for users ? to have some
>> > > > >> temporary
>> > > > >>>>>> glitch
>> > > > >>>>>>>>>>> during the upgrade and hope that the upgrade completes
>> > > > >> without
>> > > > >>>>>>>>>>> troubles ?
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> In theory as long as two servers are running the same
>> major
>> > > > >>>>> version
>> > > > >>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
>> > > > >> make
>> > > > >>>>>> progress
>> > > > >>>>>>>>>>> and to server clients.
>> > > > >>>>>>>>>>> I feel that this is quite dangerous, but I don't have
>> > > > >> enough
>> > > > >>>>>> context
>> > > > >>>>>>>>>>> to understand how this problem is possible and when we
>> > > > >> decided
>> > > > >>>> to
>> > > > >>>>>>>>>>> break compatibility.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> The other option is that I am wrong in my test and I am
>> > > > >> messing
>> > > > >>>>> up
>> > > > >>>>>> :-)
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> The other upgrade path I would like to see working like
>> a
>> > > > >> charm
>> > > > >>>>> is
>> > > > >>>>>> the
>> > > > >>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
>> > > > >> release
>> > > > >>>> 3.6
>> > > > >>>>> we
>> > > > >>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Regards
>> > > > >>>>>>>>>>> Enrico
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>
>> > > >
>> > >
>> >
>>
>>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Michael Han <ha...@apache.org>.
>> but it didn't solve the problem.

Yes, the constraint is that 3.6.0 has to default to the old protocol version
so that the outgoing message is backward compatible. If we do this, then it's
essentially "the simplest solution" proposed.
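
To illustrate why parsing the old format is not enough on its own, here is a
small Python model, written as a sketch and not taken from the ZooKeeper code
base. It assumes a connection can be set up only if the accepting peer is
able to parse the initiator's initial message. The `Peer` class, the `OLD` /
`NEW` labels and the four example servers are hypothetical stand-ins, and the
model ignores every other detail of how QuorumCnxManager establishes
connections.

```python
from dataclasses import dataclass

OLD, NEW = "old", "new"  # symbolic labels, not the real wire values


@dataclass(frozen=True)
class Peer:
    name: str
    sends: str              # version used for outgoing initial messages
    understands: frozenset  # versions this peer is able to parse


def can_connect(initiator: Peer, acceptor: Peer) -> bool:
    """The acceptor must be able to parse the initiator's initial message."""
    return initiator.sends in acceptor.understands


# Hypothetical servers covering the variants discussed in the thread.
old_server = Peer("3.5.x", sends=OLD, understands=frozenset({OLD}))
rc2_server = Peer("3.6.0rc2", sends=NEW, understands=frozenset({NEW}))
parse_only = Peer("3.6 parsing old", sends=NEW,
                  understands=frozenset({OLD, NEW}))
default_old = Peer("3.6 sending old", sends=OLD,
                   understands=frozenset({OLD, NEW}))

for new_server in (rc2_server, parse_only, default_old):
    outgoing = can_connect(new_server, old_server)
    incoming = can_connect(old_server, new_server)
    print(f"{new_server.name:16} new->old: {outgoing}  old->new: {incoming}")
```

In this model only the variant that also sends the old format by default can
connect in both directions, which is the constraint described above.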

>> disable the new MultiAddress feature and stick to the old protocol
version by default.

+1. I would favor the capability of doing seamless rolling upgrades over a
new feature that's not absolutely required for the operation of ZooKeeper. We
can turn the feature on by default in a future release that has better
built-in b/f compatibility.

>> One option can be to make a patch also for 3.5 to have a version which
supports both protocol versions.

This is another solution. Historically we did something similar, requiring a
certain upgrade path when a new feature changed the wire protocol: for
example, dynamic reconfiguration requires that 3.4.x is first upgraded to
3.4.6 and then to 3.5.0. We can start doing this from the next 3.5 release
so that it's forward compatible.
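
To illustrate the "required upgrade path" idea, here is a minimal Python
sketch that computes the intermediate versions a cluster would have to pass
through. Everything in it is an assumption made for illustration: the
function name plan_upgrade, the "gates" table, and in particular the 3.5.8
gate, which is only the hypothetical version number floated in this thread.
It is not an official ZooKeeper tool or an official list of upgrade paths.

```python
def _parse(version):
    """Turn '3.4.6' into (3, 4, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan_upgrade(current, target, gates):
    """Return the ordered list of versions to roll through.

    gates is a list of (gate_version, required_from) pairs: a cluster must
    reach gate_version before moving to required_from or anything newer.
    """
    cur, tgt = _parse(current), _parse(target)
    hops = []
    for gate, required_from in sorted((_parse(g), _parse(r)) for g, r in gates):
        # The gate only matters if we are still below it and the target
        # lies at or beyond the release that needs it.
        if cur < gate and tgt >= required_from:
            hops.append(gate)
            cur = gate
    hops.append(tgt)
    return [".".join(str(p) for p in hop) for hop in hops]


# Hypothetical gate table: 3.4.6 before 3.5.0 (mentioned above), and an
# assumed dual-protocol 3.5.8 before 3.6.0 (only a proposal in this thread).
EXAMPLE_GATES = [("3.4.6", "3.5.0"), ("3.5.8", "3.6.0")]

if __name__ == "__main__":
    print(plan_upgrade("3.4.5", "3.6.0", EXAMPLE_GATES))
```

Each returned version would be one full rolling upgrade of the whole
ensemble before starting the next one.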

>> I don't think the quorum will always be up (e.g. between older 3.4
versions and 3.5).

AFAIK the only case where this does not hold is upgrading from versions
older than 3.4.6 to 3.5, as commented previously.

On Tue, Feb 11, 2020 at 12:27 PM Szalay-Bekő Máté <
szalay.beko.mate@gmail.com> wrote:

> I see the main problem here in the fact that we are missing proper
> versioning in the leader election / quorum protocols. I tried to simply
> implement backward compatibility in 3.6, but it didn't solve the problem.
> The new code understands the old protocol, but it cannot decide when to
> use the new or the old protocol during connection initiation. So the old
> servers cannot read the new init messages and we still temporarily end up
> having two partitions during rolling restart.
>
> I already suggested two ways to handle this later, but I think for 3.6.0
> now the simplest solution is to disable the new MultiAddress feature and
> stick to the old protocol version by default. Plus extend the documentation
> with the note, that enabling the MultiAddress feature is not possible
> during a rolling upgrade, but it needs to be done with a separate rolling
> restart. With this approach, the rolling restart should "just work" with
> the 3.4 / 3.5 configs and we don't require any extra step / configuration
> from the users, unless they want to use the new feature. I plan to submit a
> PR with these changes tomorrow to ZOOKEEPER-3720, if there isn't any
> different opinion.
>
> P.S. For 4.0 we might need to put some extra thinking into backward
> compatibility / versioning for the quorum and client protocols.
>
>
> On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
> wrote:
>
> > I hate to say it, but I think 3.6.0 should release as is.  It is
> impossible
> > to *reliably* retrofit backwards compatibility / interoperability onto a
> > release that was engineered from the beginning without that goal.  Learn
> > the lesson, set goals differently in the future.
> >
> > On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> > szalay.beko.mate@gmail.com>
> > wrote:
> >
> > > FYI: I created these scripts for my local tests:
> > > https://github.com/symat/zk-rolling-upgrade-test
> > >
> > > For the long term I would also add some script that actually monitors
> the
> > > state of the quorum and also runs continuous traffic, not just 1-2
> > > smoketests after each restart. But I don't know how important this
> would
> > > be.
> > >
> > > On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
> > > > <an...@apache.org> ha scritto:
> > > > >
> > > > > The most obvious one which crosses my mind is that I previously
> > worked
> > > > on:
> > > > >
> > > > > 1) run old version cluster,
> > > > > 2) connect to each node and run smoke tests,
> > > > > 3) restart one node with new code,
> > > > > 4) goto 2) until all nodes are upgraded
> > > > >
> > > > > I think this wouldn’t work in a “unit test”, we probably need a
> > > separate
> > > > Jenkins job and a nice python script to do this.
> > > > >
> > > > > Andor
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org>
> wrote:
> > > > > >
> > > > > > Anyone have ideas how we could add testing for upgrade? Obviously
> > > > something
> > > > > > > we're missing, esp given it's important.
> > > >
> > > > I will send an email next days with a proposal.
> > > > btw my idea is very like Andor's one
> > > >
> > > > Once we have an automatic environment we can launch from Jenkins
> > > >
> > > > Enrico
> > > >
> > > >
> > > > > >
> > > > > > Patrick
> > > > > >
> > > > > > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> > > eolivelli@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
> > > > > >> <sz...@gmail.com> ha scritto:
> > > > > >>>
> > > > > >>> Hi All,
> > > > > >>>
> > > > > >>> about the question from Michael:
> > > > > >>>> Regarding the fix, can we just make 3.6.0 aware of the old
> > > protocol
> > > > and
> > > > > >>>> speak old message format when it's talking to old server?
> > > > > >>>
> > > > > >>> In this particular case, it might be enough. The protocol
> change
> > > > happened
> > > > > >>> now in the 'initial message' sent by the QuorumCnxManager.
> Maybe
> > it
> > > > is
> > > > > >> not
> > > > > >>> a problem if the new servers can not initiate channels to the
> old
> > > > > >> servers,
> > > > > >>> maybe it is enough if these channel gets initiated by the old
> > > servers
> > > > > >> only.
> > > > > >>> I will test it quickly.
> > > > > >>>
> > > > > >>> Although I have no idea if any other thing changed in the
> quorum
> > > > protocol
> > > > > >>> between 3.5 and 3.6. In other cases it might not be enough if
> the
> > > new
> > > > > >>> servers can understand the old messages, as the old servers can
> > > > break by
> > > > > >>> not understanding the messages from the new servers. Also, in
> the
> > > > code
> > > > > >>> currently (AFAIK) there is no generic knowledge of protocol
> > > > versions, the
> > > > > >>> servers are not storing that which protocol versions they
> > > can/should
> > > > use
> > > > > >> to
> > > > > >>> communicate to which particular other servers. Maybe we don't
> > even
> > > > need
> > > > > >>> this, but I would feel better if we would have more tests
> around
> > > > these
> > > > > >>> things.
> > > > > >>>
> > > > > >>> My suggestion for the long term:
> > > > > >>> - let's fix this particular issue now with 3.6.0 quickly (I
> start
> > > > doing
> > > > > >>> this today)
> > > > > >>> - let's do some automation (backed up with jenkins) that will
> > test
> > > a
> > > > > >> whole
> > > > > >>> combinations of different ZooKeeper upgrade paths by making
> > rolling
> > > > > >>> upgrades during some light traffic. Let's have a bit better
> > > > definition
> > > > > >>> about what we expect (e.g. the quorum is up, but some clients
> can
> > > get
> > > > > >>> disconnected? What will happen to the ephemeral nodes? Do we
> want
> > > to
> > > > > >>> gracefully close or transfer the user sessions before stopping
> > the
> > > > old
> > > > > >>> server?) and let's see where this broke. Just by checking the
> > > code, I
> > > > > >> don't
> > > > > >>> think the quorum will always be up (e.g. between older 3.4
> > versions
> > > > and
> > > > > >>> 3.5).
> > > > > >>
> > > > > >>
> > > > > >> I am happy to work on this topic
> > > > > >>
> > > > > >>> - we need to update the Wiki about the working rolling upgrade
> > > paths
> > > > and
> > > > > >>> maybe about workarounds if needed
> > > > > >>> - we might need to do some fixes (adding backward compatible
> > > versions
> > > > > >>> and/or specific parameters that enforce old protocol temporary
> > > > during the
> > > > > >>> rolling upgrade that can be changed later to the new protocol
> by
> > > > either
> > > > > >>> dynamic reconfig or by rolling restart)
> > > > > >>
> > > > > >> it would be much better on 3.6 code to have some support for
> > > > > >> compatibility with 3.5 servers
> > > > > >> we can't require old code to be forward compatible but we can
> make
> > > new
> > > > > > >> code be compatible to a certain extent with old code.
> > > > > >> If we can achieve this compatibility goal without a flag is
> > better,
> > > > > >> users won't have to care about this part and they simply "trust"
> > on
> > > us
> > > > > >>
> > > > > >> The rollback story is also important, but maybe we are still not
> > > ready
> > > > > >> for it, in case of local changes to store,
> > > > > >> it is better to have a clear design and plan and work for a new
> > > > release
> > > > > >> (3.7?)
> > > > > >>
> > > > > >> Enrico
> > > > > >>
> > > > > >>>
> > > > > >>> Depending on your comments, I am happy to create a few Jira
> > tickets
> > > > > >> around
> > > > > >>> these topics.
> > > > > >>>
> > > > > >>> Kind regards,
> > > > > >>> Mate
> > > > > >>>
> > > > > >>> ps. Enrico, sorry about your RC... I owe you a beer, let me
> know
> > if
> > > > you
> > > > > >> are
> > > > > >>> near to Budapest ;)
> > > > > >>>
> > > > > >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> > > eolivelli@gmail.com
> > > > >
> > > > > >> wrote:
> > > > > >>>
> > > > > >>>> Good.
> > > > > >>>>
> > > > > >>>> I will cancel the vote for 3.6.0rc2.
> > > > > >>>>
> > > > > >>>> I appreciate very much If Mate and his colleagues have time to
> > > work
> > > > on
> > > > > >> a
> > > > > >>>> fix.
> > > > > >>>> Otherwise I will have cycles next week
> > > > > >>>>
> > > > > >>>> I would also like to spend my time in setting up a few minimal
> > > > > >> integration
> > > > > >>>> tests about the upgrade story
> > > > > >>>>
> > > > > >>>> Enrico
> > > > > >>>>
> > > > > >>>> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha
> > > scritto:
> > > > > >>>>
> > > > > >>>>> Kudos Enrico, very thorough work as the final gate keeper of
> > the
> > > > > >> release!
> > > > > >>>>>
> > > > > >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> > > > > >>>>>
> > > > > >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one
> of
> > > the
> > > > > >> rare
> > > > > >>>>> piece of software that put so much emphasis on
> compatibilities
> > > thus
> > > > > >> it
> > > > > >>>> just
> > > > > >>>>> works when upgrade / downgrade, which is amazing. One
> guarantee
> > > we
> > > > > >> always
> > > > > >>>>> had is during rolling upgrade, the quorum will always be
> > > available,
> > > > > >>>> leading
> > > > > >>>>> to no service interruption. It would be sad we lose such
> > > capability
> > > > > >> given
> > > > > >>>>> this is still a tractable problem.
> > > > > >>>>>
> > > > > >>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> > > protocol
> > > > > >> and
> > > > > >>>>> speak old message format when it's talking to old server?
> > > > Basically,
> > > > > >> an
> > > > > >>>>> ugly if else check against the protocol version should work
> and
> > > > > >> there is
> > > > > >>>> no
> > > > > >>>>> need to have multiple pass on rolling upgrade process.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> > > > > >> eolivelli@gmail.com>
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> I suggest this plan:
> > > > > >>>>>> - release 3.6.0 now
> > > > > >>>>>> - improve the migration story, the flow outlined by Mate is
> > > > > >>>>>> interesting, but it will take time
> > > > > >>>>>>
> > > > > >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
> > the
> > > > > >>>>>> release this evening (within 8-10 hours) if no one comes out
> > in
> > > > the
> > > > > >>>>>> VOTE thread with a -1
> > > > > >>>>>>
> > > > > >>>>>> Enrico
> > > > > >>>>>>
> > > > > >>>>>> Enrico
> > > > > >>>>>>
> > > > > >>>>>> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> > > > > >>>>>> <ph...@apache.org> ha scritto:
> > > > > >>>>>>>
> > > > > >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
> > andor@apache.org
> > > >
> > > > > >>>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi,
> > > > > >>>>>>>>
> > > > > >>>>>>>> Answers inline.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>> In my experience when you are close to a release it is
> > > > > >> better to
> > > > > >>>> to
> > > > > >>>>>>>>> make big changes. (I am among the approvers of that
> patch,
> > > > > >> so I
> > > > > >>>> am
> > > > > >>>>>>>>> responsible for this change)
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Although this statement is acceptable for me, I don’t feel
> > > this
> > > > > >>>> patch
> > > > > >>>>>>>> should not have been merged into 3.6.0. Submission has
> been
> > > > > >>>> preceded
> > > > > >>>>>> by a
> > > > > >>>>>>>> long argument with MAPR folks who originally wanted to be
> > > > > >> merged
> > > > > >>>> into
> > > > > >>>>>> 3.4
> > > > > >>>>>>>> branch (considering the pace how ZooKeeper community is
> > moving
> > > > > >>>>>> forward) and
> > > > > >>>>>>>> we reached an agreement that release it with 3.6.0.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Make a long story short, this patch has been outstanding
> for
> > > > > >> ages
> > > > > >>>>>> without
> > > > > >>>>>>>> much attention from the community and contributors made a
> > lot
> > > > > >> of
> > > > > >>>>>> effort to
> > > > > >>>>>>>> get it done before the release.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > > >>>>>>>>> I would like to hear from people that have been in the
> > > > > >> community
> > > > > >>>> for
> > > > > >>>>>>>>> long time, then I am ready to complete the release
> process
> > > > > >> for
> > > > > >>>>>>>>> 3.6.0rc2.
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Me too.
> > > > > >>>>>>>>
> > > > > >>>>>>>> I tend to accept the way rolling restart works now - as
> you
> > > > > >>>> described
> > > > > >>>>>>>> Enrico - and given that situation was pretty much the same
> > > > > >> between
> > > > > >>>>> 3.4
> > > > > >>>>>> and
> > > > > >>>>>>>> 3.5, I don’t feel we have to make additional changes.
> > > > > >>>>>>>>
> > > > > >>>>>>>> On the other hand, the fix that Mate suggested sounds
> quite
> > > > > >> cool,
> > > > > >>>> I’m
> > > > > >>>>>> also
> > > > > >>>>>>>> happy to work on getting it in.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Fyi, Release Management page says the following:
> > > > > >>>>>>>>
> > > > > >>>>>>
> > > > > >>>>
> > > > > >>
> > > >
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > > > >>>>>>>>
> > > > > >>>>>>>> "major.minor release of ZooKeeper must be backwards
> > compatible
> > > > > >> with
> > > > > >>>>> the
> > > > > >>>>>>>> previous minor release, major.(minor-1)"
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>> Our users, direct and indirect, value the ability to
> migrate
> > to
> > > > > >> newer
> > > > > >>>>>>> versions - esp as we drop support for older. Frictions such
> > as
> > > > > >> this
> > > > > >>>> can
> > > > > >>>>>> be
> > > > > >>>>>>> a reason to go elsewhere. I'm "pro" b/w compact - esp given
> > our
> > > > > >>>>> published
> > > > > >>>>>>> guidelines.
> > > > > >>>>>>>
> > > > > >>>>>>> Patrick
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>> Andor
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> > > > > >> eolivelli@gmail.com
> > > > > >>>>>
> > > > > >>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thank you Mate for checking and explaining this story.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I find it very interesting that the cause is
> ZOOKEEPER-3188
> > > > > >> as:
> > > > > >>>>>>>>> - it is the last "big patch" committed to 3.6 before
> > > > > >> starting the
> > > > > >>>>>>>>> release process
> > > > > >>>>>>>>> - it is the cause of the failure of the first RC
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> In my experience when you are close to a release it is
> > > > > >> better to
> > > > > >>>> to
> > > > > >>>>>>>>> make big changes. (I am among the approvers of that
> patch,
> > > > > >> so I
> > > > > >>>> am
> > > > > >>>>>>>>> responsible for this change)
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> This is a pointer to the change to whom who wants to
> > > > > >> understand
> > > > > >>>>>> better
> > > > > >>>>>>>>> the context
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>
> > > >
> > >
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was
> the
> > > > > >> same
> > > > > >>>>> and
> > > > > >>>>>>>>> if this statement holds then I feel we can continue
> > > > > >>>>>>>>> with this release.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
> > too
> > > > > >>>>>> complex.
> > > > > >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and
> we
> > > > > >> do
> > > > > >>>> not
> > > > > >>>>>>>>> have tools to certify this compatibility (at least not in
> > the
> > > > > >>>> short
> > > > > >>>>>>>>> term)
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> I would like to hear from people that have been in the
> > > > > >> community
> > > > > >>>> for
> > > > > >>>>>>>>> long time, then I am ready to complete the release
> process
> > > > > >> for
> > > > > >>>>>>>>> 3.6.0rc2.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> I will update the website and the release notes with a
> > > > > >> specific
> > > > > >>>>>>>>> warning about the upgrade, we should also update the Wiki
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Enrico
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > > > >>>>>>>>> <sz...@gmail.com> ha scritto:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Hi Enrico!
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> > > > > >>>>>>>> QuorumCnxManager.
> > > > > >>>>>>>>>> The Protocol version  was changed last time in
> > > > > >> ZOOKEEPER-2186
> > > > > >>>>>> released
> > > > > >>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix
> some
> > > > > >> bugs.
> > > > > >>>>>> Later I
> > > > > >>>>>>>>>> also changed the protocol version when the format of the
> > > > > >> initial
> > > > > >>>>>> message
> > > > > >>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum
> protocol
> > > > > >> is
> > > > > >>>> not
> > > > > >>>>>>>>>> compatible in this case and is the 'expected' behavior
> if
> > > > > >> you
> > > > > >>>>>> upgrade
> > > > > >>>>>>>> e.g
> > > > > >>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
> > to
> > > > > >>>> 3.6.0.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
> > > > > >> then and
> > > > > >>>>>> got to
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>> conclusion that it is not that bad, as there will be no
> > data
> > > > > >>>> loss
> > > > > >>>>>> as you
> > > > > >>>>>>>>>> wrote. The tricky thing is that during rolling upgrade
> we
> > > > > >> should
> > > > > >>>>>> ensure
> > > > > >>>>>>>>>> both backward and forward compatibility to make sure
> that
> > > > > >> the
> > > > > >>>> old
> > > > > >>>>>> and
> > > > > >>>>>>>> the
> > > > > >>>>>>>>>> new part of the quorum can still speak to each other.
> The
> > > > > >>>> current
> > > > > >>>>>>>> solution
> > > > > >>>>>>>>>> (simply failing if the protocol versions mismatch) is
> more
> > > > > >>>> simple
> > > > > >>>>>> and
> > > > > >>>>>>>> still
> > > > > >>>>>>>>>> working just fine: as the servers are restarted
> > one-by-one,
> > > > > >> the
> > > > > >>>>>> nodes
> > > > > >>>>>>>> with
> > > > > >>>>>>>>>> the old protocol version and the nodes with the new
> > protocol
> > > > > >>>>> version
> > > > > >>>>>>>> will
> > > > > >>>>>>>>>> form two partitions, but any given time only one
> partition
> > > > > >> will
> > > > > >>>>>> have the
> > > > > >>>>>>>>>> quorum.
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> Still, thinking it through, as a side effect in these
> cases
> > > > > >> there
> > > > > >>>>>> will
> > > > > >>>>>>>> be a
> > > > > >>>>>>>>>> short time when none of the partitions will have quorums
> > > > > >> (when
> > > > > >>>> we
> > > > > >>>>>> have N
> > > > > >>>>>>>>>> servers with the old protocol version, N servers with
> the
> > > > > >> new
> > > > > >>>>>> protocol
> > > > > >>>>>>>>>> version, and there is one server just being restarted).
> I
> > > > > >> am not
> > > > > >>>>>> sure
> > > > > >>>>>>>> if we
> > > > > >>>>>>>>>> can accept this.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
> > > > > >> possible
> > > > > >>>> to
> > > > > >>>>>> parse
> > > > > >>>>>>>>>> the initial message of the old protocol version with the
> > new
> > > > > >>>> code.
> > > > > >>>>>> But
> > > > > >>>>>>>> I am
> > > > > >>>>>>>>>> not sure if it would be enough (as the old code will not
> > be
> > > > > >> able
> > > > > >>>>> to
> > > > > >>>>>>>> parse
> > > > > >>>>>>>>>> the new initial message).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> One option can be to make a patch also for 3.5 to have a
> > > > > >> version
> > > > > >>>>>> which
> > > > > >>>>>>>>>> supports both protocol versions. (let's say in 3.5.8)
> Then
> > > > > >> we
> > > > > >>>> can
> > > > > >>>>>> write
> > > > > >>>>>>>> to
> > > > > >>>>>>>>>> the release note, that if you need rolling upgrade from
> > any
> > > > > >>>>> versions
> > > > > >>>>>>>> since
> > > > > >>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
> > > > > >>>> upgrading
> > > > > >>>>> to
> > > > > >>>>>>>> 3.6.0.
> > > > > >>>>>>>>>> We can even make the same thing on the 3.4 branch.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> But I am also new to the community... It would be great
> to
> > > > > >> hear
> > > > > >>>>> the
> > > > > >>>>>>>> opinion
> > > > > >>>>>>>>>> of more experienced people.
> > > > > >>>>>>>>>> Whatever the decision will be, I am happy to make the
> > > > > >> changes.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> And sorry for breaking the RC (if we decide that this
> > needs
> > > > > >> to
> > > > > >>>> be
> > > > > >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Kind regards,
> > > > > >>>>>>>>>> Mate
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > > > > >>>>>> eolivelli@gmail.com>
> > > > > >>>>>>>> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Hi,
> > > > > >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> > > > > >> closing the
> > > > > >>>>>> VOTE
> > > > > >>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
> > an
> > > > > >>>>> apparent
> > > > > >>>>>>>>>>> blocker.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> > > > > >> looks
> > > > > >>>>> like
> > > > > >>>>>>>>>>> peers are not able to talk to each other.
> > > > > >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> > > > > >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> > > > > >> errors on
> > > > > >>>>> 3.5
> > > > > >>>>>>>> nodes:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
> > > > > > >>>>>>>>>>> connection request 127.0.0.1:62591
> > > > > >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>
> > > >
> > >
> >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > > >>>>>>>>>>> Got unrecognized protocol version -65535
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Once I upgrade all of the peers the system is up and
> > > > > >> running,
> > > > > >>>>>> without
> > > > > >>>>>>>>>>> apparently no data loss.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> > > > > >> say,
> > > > > >>>>>> server1,
> > > > > >>>>>>>>>>> server1 is not able to accept connections (error "Close
> > of
> > > > > >>>>> session
> > > > > >>>>>> 0x0
> > > > > >>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")
> from
> > > > > >>>> clients,
> > > > > >>>>>> this
> > > > > >>>>>>>>>>> is expected, because as far as it cannot talk with the
> > > > > >> other
> > > > > >>>>> peers
> > > > > >>>>>> it
> > > > > >>>>>>>>>>> is practically partitioned away from the cluster.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> My questions are:
> > > > > >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> > > > > >> from
> > > > > >>>> 3.5
> > > > > >>>>> to
> > > > > >>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
> > ago,
> > > > > >>>> and I
> > > > > >>>>>> was
> > > > > >>>>>>>>>>> not in the community as dev so I cannot tell
> > > > > >>>>>>>>>>> 2) is this a viable option for users ? to have some
> > > > > >> temporary
> > > > > >>>>>> glitch
> > > > > >>>>>>>>>>> during the upgrade and hope that the upgrade completes
> > > > > >> without
> > > > > >>>>>>>>>>> troubles ?
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> In theory as long as two servers are running the same
> > major
> > > > > >>>>> version
> > > > > >>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
> > > > > >> make
> > > > > >>>>>> progress
> > > > > >>>>>>>>>>> and to server clients.
> > > > > >>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> > > > > >> enough
> > > > > >>>>>> context
> > > > > >>>>>>>>>>> to understand how this problem is possible and when we
> > > > > >> decided
> > > > > >>>> to
> > > > > >>>>>>>>>>> break compatibility.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> The other option is that I am wrong in my test and I am
> > > > > >> messing
> > > > > >>>>> up
> > > > > >>>>>> :-)
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> The other upgrade path I would like to see working
> like a
> > > > > >> charm
> > > > > >>>>> is
> > > > > >>>>>> the
> > > > > >>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> > > > > >> release
> > > > > >>>> 3.6
> > > > > >>>>> we
> > > > > >>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Regards
> > > > > >>>>>>>>>>> Enrico
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
I see the main problem here in the fact that we are missing proper
versioning in the leader election / quorum protocols. I tried to simply
implement backward compatibility in 3.6, but it didn't solve the problem.
The new code understands the old protocol, but it cannot decide whether to
use the new or the old protocol during connection initiation. So the old
servers cannot read the new init messages and we still temporarily end up
having two partitions during rolling restart.
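
The effect of the two partitions on quorum availability can be illustrated
with a small simulation. This is only a sketch under simplifying
assumptions: servers are upgraded strictly one by one, a restart is
modelled as a "down" step followed by an "upgraded" step, and old and new
servers either can or cannot talk to each other at all. The function names
are made up for this example and have nothing to do with the ZooKeeper code.

```python
def _has_quorum(old, new, majority, compatible):
    """Check whether some group of running servers reaches a majority."""
    if compatible:
        # Old and new servers form one group.
        return old + new >= majority
    # Old and new servers cannot connect, so each version is a partition.
    return max(old, new) >= majority


def quorum_timeline(n_servers, compatible):
    """Return (event, has_quorum) pairs for a one-by-one rolling upgrade."""
    majority = n_servers // 2 + 1
    old, new = n_servers, 0
    timeline = []
    for i in range(1, n_servers + 1):
        old -= 1  # server i is stopped for the upgrade
        timeline.append(
            (f"server{i} down", _has_quorum(old, new, majority, compatible))
        )
        new += 1  # server i comes back running the new version
        timeline.append(
            (f"server{i} upgraded", _has_quorum(old, new, majority, compatible))
        )
    return timeline


if __name__ == "__main__":
    for event, ok in quorum_timeline(3, compatible=False):
        print(event, "quorum" if ok else "NO QUORUM")
```

With incompatible protocols the only step without quorum is the restart of
the middle server (N old servers, N new servers, one server down). With
compatible protocols every step keeps a majority.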

I already suggested two ways to handle this later, but I think for 3.6.0
the simplest solution now is to disable the new MultiAddress feature and
stick to the old protocol version by default, plus extend the documentation
with a note that enabling the MultiAddress feature is not possible
during a rolling upgrade and needs to be done with a separate rolling
restart. With this approach, the rolling restart should "just work" with
the 3.4 / 3.5 configs and we don't require any extra step / configuration
from the users, unless they want to use the new feature. I plan to submit a
PR with these changes tomorrow to ZOOKEEPER-3720, if there are no
objections.
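
A rough Python sketch of the idea, not the actual QuorumCnxManager code:
the sender uses the old protocol version unless the feature is explicitly
enabled, so an old receiver never sees a version it does not know. The
message layout here (version, server id, payload length, payload) is a
simplification, the flag name multi_address_enabled is hypothetical, and
-65536 as the old version value is an assumption. The value -65535 is the
one rejected by the 3.5 nodes in the log quoted in this thread.

```python
import struct

OLD_VERSION = -65536  # assumed value of the pre-3.6 protocol version
NEW_VERSION = -65535  # value rejected by 3.5 nodes in the quoted log
HEADER = ">qqi"       # simplified header: version, server id, payload length


def build_initial_message(sid, addresses, multi_address_enabled=False):
    """Encode a simplified initial message for a quorum connection."""
    if multi_address_enabled:
        # New format: all election addresses, separated by '|'.
        version, payload = NEW_VERSION, "|".join(addresses)
    else:
        # Default: old format with a single address.
        version, payload = OLD_VERSION, addresses[0]
    data = payload.encode("utf-8")
    return struct.pack(HEADER, version, sid, len(data)) + data


def parse_initial_message(raw, accepted_versions):
    """Decode a message, rejecting protocol versions the receiver lacks."""
    version, sid, length = struct.unpack_from(HEADER, raw, 0)
    if version not in accepted_versions:
        raise ValueError(f"Got unrecognized protocol version {version}")
    start = struct.calcsize(HEADER)
    payload = raw[start:start + length].decode("utf-8")
    return version, sid, payload.split("|")


if __name__ == "__main__":
    old_server = {OLD_VERSION}  # what a 3.5-style receiver would accept
    msg = build_initial_message(1, ["server1:3888"])
    print(parse_initial_message(msg, old_server))
```

In this model a receiver that accepts both versions can read either
message, while a receiver that only knows the old version raises the same
kind of error as in the log as soon as the sender enables the feature.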

P.S. For 4.0 we might need to put some extra thinking into backward
compatibility / versioning for the quorum and client protocols.


On Tue, Feb 11, 2020, 20:44 Michael K. Edwards <m....@gmail.com>
wrote:

> I hate to say it, but I think 3.6.0 should release as is.  It is impossible
> to *reliably* retrofit backwards compatibility / interoperability onto a
> release that was engineered from the beginning without that goal.  Learn
> the lesson, set goals differently in the future.
>
> On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <
> szalay.beko.mate@gmail.com>
> wrote:
>
> > FYI: I created these scripts for my local tests:
> > https://github.com/symat/zk-rolling-upgrade-test
> >
> > For the long term I would also add some script that actually monitors the
> > state of the quorum and also runs continuous traffic, not just 1-2
> > smoketests after each restart. But I don't know how important this would
> > be.
> >
> > On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > Il giorno mar 11 feb 2020 alle ore 17:17 Andor Molnar
> > > <an...@apache.org> ha scritto:
> > > >
> > > > The most obvious one which crosses my mind is that I previously
> worked
> > > on:
> > > >
> > > > 1) run old version cluster,
> > > > 2) connect to each node and run smoke tests,
> > > > 3) restart one node with new code,
> > > > 4) goto 2) until all nodes are upgraded
> > > >
> > > > I think this wouldn’t work in a “unit test”, we probably need a
> > separate
> > > Jenkins job and a nice python script to do this.
> > > >
> > > > Andor
> > > >
> > > >
> > > >
> > > >
> > > > > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
> > > > >
> > > > > Anyone have ideas how we could add testing for upgrade? Obviously
> > > something
> > > > > we're missing, esp given it's important.
> > >
> > > I will send an email next days with a proposal.
> > > btw my idea is very like Andor's one
> > >
> > > Once we have an automatic environment we can launch from Jenkins
> > >
> > > Enrico
> > >
> > >
> > > > >
> > > > > Patrick
> > > > >
> > > > > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> > eolivelli@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Il giorno mar 11 feb 2020 alle ore 09:12 Szalay-Bekő Máté
> > > > >> <sz...@gmail.com> ha scritto:
> > > > >>>
> > > > >>> Hi All,
> > > > >>>
> > > > >>> about the question from Michael:
> > > > >>>> Regarding the fix, can we just make 3.6.0 aware of the old
> > protocol
> > > and
> > > > >>>> speak old message format when it's talking to old server?
> > > > >>>
> > > > >>> In this particular case, it might be enough. The protocol change
> > > happened
> > > > >>> now in the 'initial message' sent by the QuorumCnxManager. Maybe
> it
> > > is
> > > > >> not
> > > > >>> a problem if the new servers can not initiate channels to the old
> > > > >> servers,
> > > > >>> maybe it is enough if these channel gets initiated by the old
> > servers
> > > > >> only.
> > > > >>> I will test it quickly.
> > > > >>>
> > > > >>> Although I have no idea if any other thing changed in the quorum
> > > protocol
> > > > >>> between 3.5 and 3.6. In other cases it might not be enough if the
> > new
> > > > >>> servers can understand the old messages, as the old servers can
> > > break by
> > > > >>> not understanding the messages from the new servers. Also, in the
> > > code
> > > > >>> currently (AFAIK) there is no generic knowledge of protocol
> > > versions, the
> > > > >>> servers are not storing that which protocol versions they
> > can/should
> > > use
> > > > >> to
> > > > >>> communicate to which particular other servers. Maybe we don't
> even
> > > need
> > > > >>> this, but I would feel better if we would have more tests around
> > > these
> > > > >>> things.
> > > > >>>
> > > > >>> My suggestion for the long term:
> > > > >>> - let's fix this particular issue now with 3.6.0 quickly (I start
> > > doing
> > > > >>> this today)
> > > > >>> - let's do some automation (backed up with jenkins) that will
> test
> > a
> > > > >> whole
> > > > >>> combinations of different ZooKeeper upgrade paths by making
> rolling
> > > > >>> upgrades during some light traffic. Let's have a bit better
> > > definition
> > > > >>> about what we expect (e.g. the quorum is up, but some clients can
> > get
> > > > >>> disconnected? What will happen to the ephemeral nodes? Do we want
> > to
> > > > >>> gracefully close or transfer the user sessions before stopping
> the
> > > old
> > > > >>> server?) and let's see where this broke. Just by checking the
> > code, I
> > > > >> don't
> > > > >>> think the quorum will always be up (e.g. between older 3.4
> versions
> > > and
> > > > >>> 3.5).
> > > > >>
> > > > >>
> > > > >> I am happy to work on this topic
> > > > >>
> > > > >>> - we need to update the Wiki about the working rolling upgrade
> > paths
> > > and
> > > > >>> maybe about workarounds if needed
> > > > >>> - we might need to do some fixes (adding backward compatible
> > versions
> > > > >>> and/or specific parameters that enforce old protocol temporary
> > > during the
> > > > >>> rolling upgrade that can be changed later to the new protocol by
> > > either
> > > > >>> dynamic reconfig or by rolling restart)
> > > > >>
> > > > >> it would be much better on 3.6 code to have some support for
> > > > >> compatibility with 3.5 servers
> > > > >> we can't require old code to be forward compatible but we can make
> > new
> > > > >> code be compatible to a certain extend with old code.
> > > > >> If we can achieve this compatibility goal without a flag is
> better,
> > > > >> users won't have to care about this part and they simply "trust"
> on
> > us
> > > > >>
> > > > >> The rollback story is also important, but maybe we are still not
> > ready
> > > > >> for it, in case of local changes to store,
> > > > >> it is better to have a clear design and plan and work for a new
> > > release
> > > > >> (3.7?)
> > > > >>
> > > > >> Enrico
> > > > >>
> > > > >>>
> > > > >>> Depending on your comments, I am happy to create a few Jira
> tickets
> > > > >> around
> > > > >>> these topics.
> > > > >>>
> > > > >>> Kind regards,
> > > > >>> Mate
> > > > >>>
> > > > >>> ps. Enrico, sorry about your RC... I owe you a beer, let me know
> if
> > > you
> > > > >> are
> > > > >>> near to Budapest ;)
> > > > >>>
> > > > >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> > eolivelli@gmail.com
> > > >
> > > > >> wrote:
> > > > >>>
> > > > >>>> Good.
> > > > >>>>
> > > > >>>> I will cancel the vote for 3.6.0rc2.
> > > > >>>>
> > > > >>>> I appreciate very much If Mate and his colleagues have time to
> > work
> > > on
> > > > >> a
> > > > >>>> fix.
> > > > >>>> Otherwise I will have cycles next week
> > > > >>>>
> > > > >>>> I would also like to spend my time in setting up a few minimal
> > > > >> integration
> > > > >>>> tests about the upgrade story
> > > > >>>>
> > > > >>>> Enrico
> > > > >>>>
> > > > >>>> On Tue, Feb 11, 2020, 07:30 Michael Han <ha...@apache.org> wrote:
> > > > >>>>
> > > > >>>>> Kudos Enrico, very thorough work as the final gate keeper of
> the
> > > > >> release!
> > > > >>>>>
> > > > >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> > > > >>>>>
> > > > >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of
> > the
> > > > >> rare
> > > > >>>>> piece of software that put so much emphasis on compatibilities
> > thus
> > > > >> it
> > > > >>>> just
> > > > >>>>> works when upgrade / downgrade, which is amazing. One guarantee
> > we
> > > > >> always
> > > > >>>>> had is during rolling upgrade, the quorum will always be
> > available,
> > > > >>>> leading
> > > > >>>>> to no service interruption. It would be sad we lose such
> > capability
> > > > >> given
> > > > >>>>> this is still a tractable problem.
> > > > >>>>>
> > > > >>>>> Regarding the fix, can we just make 3.6.0 aware of the old
> > protocol
> > > > >> and
> > > > >>>>> speak old message format when it's talking to old server?
> > > Basically,
> > > > >> an
> > > > >>>>> ugly if else check against the protocol version should work and
> > > > >> there is
> > > > >>>> no
> > > > >>>>> need to have multiple pass on rolling upgrade process.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> > > > >> eolivelli@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> I suggest this plan:
> > > > >>>>>> - release 3.6.0 now
> > > > >>>>>> - improve the migration story, the flow outlined by Mate is
> > > > >>>>>> interesting, but it will take time
> > > > >>>>>>
> > > > >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize
> the
> > > > >>>>>> release this evening (within 8-10 hours) if no one comes out
> in
> > > the
> > > > >>>>>> VOTE thread with a -1
> > > > >>>>>>
> > > > >>>>>> Enrico
> > > > >>>>>>
> > > > >>>>>> On Mon, Feb 10, 2020 at 19:33 Patrick Hunt
> > > > >>>>>> <ph...@apache.org> wrote:
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <
> andor@apache.org
> > >
> > > > >>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> Answers inline.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
> > > > >>>>>>>>> responsible for this change)
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Although this statement is acceptable for me, I don’t feel
> > this
> > > > >>>> patch
> > > > >>>>>>>> should not have been merged into 3.6.0. Submission has been
> > > > >>>> preceded
> > > > >>>>>> by a
> > > > >>>>>>>> long argument with MAPR folks who originally wanted to be
> > > > >> merged
> > > > >>>> into
> > > > >>>>>> 3.4
> > > > >>>>>>>> branch (considering the pace how ZooKeeper community is
> moving
> > > > >>>>>> forward) and
> > > > >>>>>>>> we reached an agreement that release it with 3.6.0.
> > > > >>>>>>>>
> > > > >>>>>>>> Make a long story short, this patch has been outstanding for
> > > > >> ages
> > > > >>>>>> without
> > > > >>>>>>>> much attention from the community and contributors made a
> lot
> > > > >> of
> > > > >>>>>> effort to
> > > > >>>>>>>> get it done before the release.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> I would like to hear from people that have been in the community for
> > > > >>>>>>>>> a long time, then I am ready to complete the release process for
> > > > >>>>>>>>> 3.6.0rc2.
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Me too.
> > > > >>>>>>>>
> > > > >>>>>>>> I tend to accept the way rolling restart works now - as you
> > > > >>>> described
> > > > >>>>>>>> Enrico - and given that situation was pretty much the same
> > > > >> between
> > > > >>>>> 3.4
> > > > >>>>>> and
> > > > >>>>>>>> 3.5, I don’t feel we have to make additional changes.
> > > > >>>>>>>>
> > > > >>>>>>>> On the other hand, the fix that Mate suggested sounds quite
> > > > >> cool,
> > > > >>>> I’m
> > > > >>>>>> also
> > > > >>>>>>>> happy to work on getting it in.
> > > > >>>>>>>>
> > > > >>>>>>>> Fyi, Release Management page says the following:
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>
> > > > >>
> > >
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > > >>>>>>>>
> > > > >>>>>>>> "major.minor release of ZooKeeper must be backwards
> compatible
> > > > >> with
> > > > >>>>> the
> > > > >>>>>>>> previous minor release, major.(minor-1)"
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>> Our users, direct and indirect, value the ability to migrate
> to
> > > > >> newer
> > > > >>>>>>> versions - esp as we drop support for older. Frictions such
> as
> > > > >> this
> > > > >>>> can
> > > > >>>>>> be
> > > > >>>>>>> a reason to go elsewhere. I'm "pro" b/w compact - esp given
> our
> > > > >>>>> published
> > > > >>>>>>> guidelines.
> > > > >>>>>>>
> > > > >>>>>>> Patrick
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> Andor
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> > > > >> eolivelli@gmail.com
> > > > >>>>>
> > > > >>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thank you Mate for checking and explaining this story.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188
> > > > >> as:
> > > > >>>>>>>>> - it is the last "big patch" committed to 3.6 before
> > > > >> starting the
> > > > >>>>>>>>> release process
> > > > >>>>>>>>> - it is the cause of the failure of the first RC
> > > > >>>>>>>>>
> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
> > > > >>>>>>>>> responsible for this change)
> > > > >>>>>>>>>
> > > > >>>>>>>>> This is a pointer to the change to whom who wants to
> > > > >> understand
> > > > >>>>>> better
> > > > >>>>>>>>> the context
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > >
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > >>>>>>>>>
> > > > >>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was the
> > > > >> same
> > > > >>>>> and
> > > > >>>>>>>>> if this statement holds then I feel we can continue
> > > > >>>>>>>>> with this release.
> > > > >>>>>>>>>
> > > > >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is
> too
> > > > >>>>>> complex.
> > > > >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and we
> > > > >> do
> > > > >>>> not
> > > > >>>>>>>>> have tools to certify this compatibility (at least not in
> the
> > > > >>>> short
> > > > >>>>>>>>> term)
> > > > >>>>>>>>>
> > > > >>>>>>>>> I would like to hear from people that have been in the community for
> > > > >>>>>>>>> a long time, then I am ready to complete the release process for
> > > > >>>>>>>>> 3.6.0rc2.
> > > > >>>>>>>>>
> > > > >>>>>>>>> I will update the website and the release notes with a
> > > > >> specific
> > > > >>>>>>>>> warning about the upgrade, we should also update the Wiki
> > > > >>>>>>>>>
> > > > >>>>>>>>> Enrico
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, Feb 10, 2020 at 11:17 Szalay-Bekő Máté
> > > > >>>>>>>>> <sz...@gmail.com> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Hi Enrico!
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> > > > >>>>>>>> QuorumCnxManager.
> > > > >>>>>>>>>> The Protocol version  was changed last time in
> > > > >> ZOOKEEPER-2186
> > > > >>>>>> released
> > > > >>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some
> > > > >> bugs.
> > > > >>>>>> Later I
> > > > >>>>>>>>>> also changed the protocol version when the format of the
> > > > >> initial
> > > > >>>>>> message
> > > > >>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum protocol
> > > > >> is
> > > > >>>> not
> > > > >>>>>>>>>> compatible in this case and is the 'expected' behavior if
> > > > >> you
> > > > >>>>>> upgrade
> > > > >>>>>>>> e.g
> > > > >>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6
> to
> > > > >>>> 3.6.0.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
> > > > >> then and
> > > > >>>>>> got to
> > > > >>>>>>>> the
> > > > >>>>>>>>>> conclusion that it is not that bad, as there will be no
> data
> > > > >>>> loss
> > > > >>>>>> as you
> > > > >>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
> > > > >> should
> > > > >>>>>> ensure
> > > > >>>>>>>>>> both backward and forward compatibility to make sure that
> > > > >> the
> > > > >>>> old
> > > > >>>>>> and
> > > > >>>>>>>> the
> > > > >>>>>>>>>> new part of the quorum can still speak to each other. The
> > > > >>>> current
> > > > >>>>>>>> solution
> > > > >>>>>>>>>> (simply failing if the protocol versions mismatch) is more
> > > > >>>> simple
> > > > >>>>>> and
> > > > >>>>>>>> still
> > > > >>>>>>>>>> working just fine: as the servers are restarted
> one-by-one,
> > > > >> the
> > > > >>>>>> nodes
> > > > >>>>>>>> with
> > > > >>>>>>>>>> the old protocol version and the nodes with the new
> protocol
> > > > >>>>> version
> > > > >>>>>>>> will
> > > > >>>>>>>>>> form two partitions, but any given time only one partition
> > > > >> will
> > > > >>>>>> have the
> > > > >>>>>>>>>> quorum.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Still, thinking it trough, as a side effect in these cases
> > > > >> there
> > > > >>>>>> will
> > > > >>>>>>>> be a
> > > > >>>>>>>>>> short time when none of the partitions will have quorums
> > > > >> (when
> > > > >>>> we
> > > > >>>>>> have N
> > > > >>>>>>>>>> servers with the old protocol version, N servers with the
> > > > >> new
> > > > >>>>>> protocol
> > > > >>>>>>>>>> version, and there is one server just being restarted). I
> > > > >> am not
> > > > >>>>>> sure
> > > > >>>>>>>> if we
> > > > >>>>>>>>>> can accept this.
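
To make the window without quorum concrete, here is a minimal simulation. It assumes that old-protocol and new-protocol servers cannot talk to each other at all, and that servers are restarted strictly one at a time. The function names are made up for illustration and are not part of ZooKeeper.

```python
def quorum_timeline(ensemble_size):
    """Simulate restarting servers one by one onto an incompatible protocol.

    Returns a list of (old_up, new_up, has_quorum) tuples, one per step.
    Assumption: old and new servers cannot talk at all, so only a group
    that holds a strict majority on its own can serve requests.
    """
    majority = ensemble_size // 2 + 1
    old_up, new_up = ensemble_size, 0
    steps = [(old_up, new_up, old_up >= majority)]
    for _ in range(ensemble_size):
        old_up -= 1  # one old-protocol server is stopped
        steps.append((old_up, new_up, max(old_up, new_up) >= majority))
        new_up += 1  # it comes back speaking the new protocol
        steps.append((old_up, new_up, max(old_up, new_up) >= majority))
    return steps


def quorumless_steps(ensemble_size):
    """Return only the steps where neither group has a majority."""
    return [step for step in quorum_timeline(ensemble_size) if not step[2]]


if __name__ == "__main__":
    for size in (3, 5):
        print(size, quorumless_steps(size))
```

Under these assumptions an ensemble of 2N+1 servers has exactly one step with no quorum: N old servers up, N new servers up, and one server in the middle of its restart.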
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
> > > > >> possible
> > > > >>>> to
> > > > >>>>>> parse
> > > > >>>>>>>>>> the initial message of the old protocol version with the
> new
> > > > >>>> code.
> > > > >>>>>> But
> > > > >>>>>>>> I am
> > > > >>>>>>>>>> not sure if it would be enough (as the old code will not
> be
> > > > >> able
> > > > >>>>> to
> > > > >>>>>>>> parse
> > > > >>>>>>>>>> the new initial message).
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> One option can be to make a patch also for 3.5 to have a
> > > > >> version
> > > > >>>>>> which
> > > > >>>>>>>>>> supports both protocol versions. (let's say in 3.5.8) Then
> > > > >> we
> > > > >>>> can
> > > > >>>>>> write
> > > > >>>>>>>> to
> > > > >>>>>>>>>> the release note, that if you need rolling upgrade from
> any
> > > > >>>>> versions
> > > > >>>>>>>> since
> > > > >>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
> > > > >>>> upgrading
> > > > >>>>> to
> > > > >>>>>>>> 3.6.0.
> > > > >>>>>>>>>> We can even make the same thing on the 3.4 branch.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> But I am also new to the community... It would be great to
> > > > >> hear
> > > > >>>>> the
> > > > >>>>>>>> opinion
> > > > >>>>>>>>>> of more experienced people.
> > > > >>>>>>>>>> Whatever the decision will be, I am happy to make the
> > > > >> changes.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> And sorry for breaking the RC (if we decide that this
> needs
> > > > >> to
> > > > >>>> be
> > > > >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Kind regards,
> > > > >>>>>>>>>> Mate
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > > > >>>>>> eolivelli@gmail.com>
> > > > >>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi,
> > > > >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> > > > >> closing the
> > > > >>>>>> VOTE
> > > > >>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to
> an
> > > > >>>>> apparent
> > > > >>>>>>>>>>> blocker.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> > > > >> looks
> > > > >>>>> like
> > > > >>>>>>>>>>> peers are not able to talk to each other.
> > > > >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> > > > >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> > > > >> errors on
> > > > >>>>> 3.5
> > > > >>>>>>>> nodes:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918]
> -
> > > > >>>>>> Received
> > > > >>>>>>>>>>> connection request 127.0.0.1:62591
> > > > >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > >
> >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > >>>>>>>>>>> Got unrecognized protocol version -65535
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Once I upgrade all of the peers the system is up and running,
> > > > >>>>>>>>>>> apparently without data loss.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> > > > >> say,
> > > > >>>>>> server1,
> > > > >>>>>>>>>>> server1 is not able to accept connections (error "Close
> of
> > > > >>>>> session
> > > > >>>>>> 0x0
> > > > >>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
> > > > >>>> clients,
> > > > >>>>>> this
> > > > >>>>>>>>>>> is expected, because as far as it cannot talk with the
> > > > >> other
> > > > >>>>> peers
> > > > >>>>>> it
> > > > >>>>>>>>>>> is practically partitioned away from the cluster.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> My questions are:
> > > > >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> > > > >> from
> > > > >>>> 3.5
> > > > >>>>> to
> > > > >>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long
> ago,
> > > > >>>> and I
> > > > >>>>>> was
> > > > >>>>>>>>>>> not in the community as dev so I cannot tell
> > > > >>>>>>>>>>> 2) is this a viable option for users ? to have some
> > > > >> temporary
> > > > >>>>>> glitch
> > > > >>>>>>>>>>> during the upgrade and hope that the upgrade completes
> > > > >> without
> > > > >>>>>>>>>>> troubles ?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> In theory as long as two servers are running the same major version
> > > > >>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to make progress
> > > > >>>>>>>>>>> and to serve clients.
> > > > >>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> > > > >> enough
> > > > >>>>>> context
> > > > >>>>>>>>>>> to understand how this problem is possible and when we
> > > > >> decided
> > > > >>>> to
> > > > >>>>>>>>>>> break compatibility.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> The other option is that I am wrong in my test and I am
> > > > >> messing
> > > > >>>>> up
> > > > >>>>>> :-)
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> The other upgrade path I would like to see working like a
> > > > >> charm
> > > > >>>>> is
> > > > >>>>>> the
> > > > >>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> > > > >> release
> > > > >>>> 3.6
> > > > >>>>> we
> > > > >>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Regards
> > > > >>>>>>>>>>> Enrico
> > > > >>>>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > >
> > >
> >
>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by "Michael K. Edwards" <m....@gmail.com>.
I hate to say it, but I think 3.6.0 should release as is.  It is impossible
to *reliably* retrofit backwards compatibility / interoperability onto a
release that was engineered from the beginning without that goal.  Learn
the lesson, set goals differently in the future.

On Tue, Feb 11, 2020 at 9:41 AM Szalay-Bekő Máté <sz...@gmail.com>
wrote:

> FYI: I created these scripts for my local tests:
> https://github.com/symat/zk-rolling-upgrade-test
>
> For the long term I would also add some script that actually monitors the
> state of the quorum and also runs continuous traffic, not just 1-2
> smoketests after each restart. But I don't know how important this would
> be.
>
> On Tue, Feb 11, 2020 at 5:25 PM Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > On Tue, Feb 11, 2020 at 17:17 Andor Molnar
> > <an...@apache.org> wrote:
> > >
> > > The most obvious one that comes to mind is one I previously worked on:
> > >
> > > 1) run old version cluster,
> > > 2) connect to each node and run smoke tests,
> > > 3) restart one node with new code,
> > > 4) goto 2) until all nodes are upgraded
> > >
> > > I think this wouldn’t work in a “unit test”; we probably need a separate
> > > Jenkins job and a nice Python script to do this.
> > >
> > > Andor
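
The four steps above can be sketched as a small Python driver. This is only an illustration under stated assumptions: `rolling_upgrade`, `restart_with_new_code` and `smoke_test` are hypothetical names, and the caller has to supply the real restart and smoke-test logic for an actual cluster.

```python
def rolling_upgrade(nodes, restart_with_new_code, smoke_test):
    """Upgrade nodes one at a time, smoke-testing every node in between.

    nodes: list of node names (hypothetical identifiers).
    restart_with_new_code(node): caller-supplied, restarts one node.
    smoke_test(node): caller-supplied, returns True if the node answers.
    Returns a list of (upgraded_so_far, failing_nodes), one per round.
    """
    report = []
    upgraded = []
    # Steps 1 and 2: the cluster still runs the old version everywhere.
    report.append((list(upgraded), [n for n in nodes if not smoke_test(n)]))
    for node in nodes:
        restart_with_new_code(node)  # step 3
        upgraded.append(node)
        # Step 4: go back to step 2 and test every node again.
        report.append((list(upgraded), [n for n in nodes if not smoke_test(n)]))
    return report
```

Returning the failing nodes per round, instead of stopping at the first failure, makes it possible to see whether a failure is a temporary glitch during the upgrade or persists once all nodes are upgraded.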
> > >
> > >
> > >
> > >
> > > > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
> > > >
> > > > > Anyone have ideas how we could add testing for upgrade? Obviously something
> > > > > we're missing, esp. given its importance.
> >
> > I will send an email in the next few days with a proposal.
> > By the way, my idea is very similar to Andor's.
> >
> > Once we have an automatic environment we can launch from Jenkins
> >
> > Enrico
> >
> >
> > > >
> > > > Patrick
> > > >
> > > > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <
> eolivelli@gmail.com>
> > > > wrote:
> > > >
> > > > >> On Tue, Feb 11, 2020 at 09:12 Szalay-Bekő Máté
> > > > >> <sz...@gmail.com> wrote:
> > > >>>
> > > >>> Hi All,
> > > >>>
> > > >>> about the question from Michael:
> > > >>>> Regarding the fix, can we just make 3.6.0 aware of the old
> protocol
> > and
> > > >>>> speak old message format when it's talking to old server?
> > > >>>
> > > >>> In this particular case, it might be enough. The protocol change
> > happened
> > > >>> now in the 'initial message' sent by the QuorumCnxManager. Maybe it
> > is
> > > >> not
> > > >>> a problem if the new servers can not initiate channels to the old
> > > >> servers,
> > > >>> maybe it is enough if these channel gets initiated by the old
> servers
> > > >> only.
> > > >>> I will test it quickly.
> > > >>>
> > > >>> Although I have no idea if any other thing changed in the quorum
> > protocol
> > > >>> between 3.5 and 3.6. In other cases it might not be enough if the
> new
> > > >>> servers can understand the old messages, as the old servers can
> > break by
> > > >>> not understanding the messages from the new servers. Also, in the
> > code
> > > >>> currently (AFAIK) there is no generic knowledge of protocol
> > versions, the
> > > >>> servers are not storing that which protocol versions they
> can/should
> > use
> > > >> to
> > > >>> communicate to which particular other servers. Maybe we don't even
> > need
> > > >>> this, but I would feel better if we would have more tests around
> > these
> > > >>> things.
> > > >>>
> > > >>> My suggestion for the long term:
> > > >>> - let's fix this particular issue now with 3.6.0 quickly (I start
> > doing
> > > >>> this today)
> > > >>> - let's do some automation (backed up with jenkins) that will test
> a
> > > >> whole
> > > >>> combinations of different ZooKeeper upgrade paths by making rolling
> > > >>> upgrades during some light traffic. Let's have a bit better
> > definition
> > > >>> about what we expect (e.g. the quorum is up, but some clients can
> get
> > > >>> disconnected? What will happen to the ephemeral nodes? Do we want
> to
> > > >>> gracefully close or transfer the user sessions before stopping the
> > old
> > > >>> server?) and let's see where this broke. Just by checking the
> code, I
> > > >> don't
> > > >>> think the quorum will always be up (e.g. between older 3.4 versions
> > and
> > > >>> 3.5).
> > > >>
> > > >>
> > > >> I am happy to work on this topic
> > > >>
> > > >>> - we need to update the Wiki about the working rolling upgrade
> paths
> > and
> > > >>> maybe about workarounds if needed
> > > >>> - we might need to do some fixes (adding backward compatible
> versions
> > > >>> and/or specific parameters that enforce old protocol temporary
> > during the
> > > >>> rolling upgrade that can be changed later to the new protocol by
> > either
> > > >>> dynamic reconfig or by rolling restart)
> > > >>
> > > >> it would be much better on 3.6 code to have some support for
> > > >> compatibility with 3.5 servers
> > > >> we can't require old code to be forward compatible but we can make
> new
> > > >> code be compatible to a certain extend with old code.
> > > >> If we can achieve this compatibility goal without a flag is better,
> > > >> users won't have to care about this part and they simply "trust" on
> us
> > > >>
> > > >> The rollback story is also important, but maybe we are still not
> ready
> > > >> for it, in case of local changes to store,
> > > >> it is better to have a clear design and plan and work for a new
> > release
> > > >> (3.7?)
> > > >>
> > > >> Enrico
> > > >>
> > > >>>
> > > >>> Depending on your comments, I am happy to create a few Jira tickets
> > > >> around
> > > >>> these topics.
> > > >>>
> > > >>> Kind regards,
> > > >>> Mate
> > > >>>
> > > >>> ps. Enrico, sorry about your RC... I owe you a beer, let me know if
> > you
> > > >> are
> > > >>> near to Budapest ;)
> > > >>>
> > > >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <
> eolivelli@gmail.com
> > >
> > > >> wrote:
> > > >>>
> > > >>>> Good.
> > > >>>>
> > > >>>> I will cancel the vote for 3.6.0rc2.
> > > >>>>
> > > >>>> I appreciate very much If Mate and his colleagues have time to
> work
> > on
> > > >> a
> > > >>>> fix.
> > > >>>> Otherwise I will have cycles next week
> > > >>>>
> > > >>>> I would also like to spend my time in setting up a few minimal
> > > >> integration
> > > >>>> tests about the upgrade story
> > > >>>>
> > > >>>> Enrico
> > > >>>>
> > > > >>>> On Tue, Feb 11, 2020, 07:30 Michael Han <ha...@apache.org> wrote:
> > > >>>>
> > > >>>>> Kudos Enrico, very thorough work as the final gate keeper of the
> > > >> release!
> > > >>>>>
> > > >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> > > >>>>>
> > > > >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> > > > >>>>> pieces of software that puts so much emphasis on compatibility that it just
> > > > >>>>> works when you upgrade / downgrade, which is amazing. One guarantee we always
> > > > >>>>> had is that during a rolling upgrade the quorum will always be available,
> > > > >>>>> leading to no service interruption. It would be sad if we lost such a
> > > > >>>>> capability given this is still a tractable problem.
> > > >>>>>
> > > > >>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > > > >>>>> speak the old message format when it's talking to an old server? Basically, an
> > > > >>>>> ugly if-else check against the protocol version should work, and there is no
> > > > >>>>> need to have multiple passes in the rolling upgrade process.
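
A rough sketch of such an if-else check, on a toy message layout. The layout, the function names and the '|'-separated address list are invented for illustration and are not the real ZooKeeper wire format. The value -65535 is the one reported in the error log in this thread; -65536 for the old version is an assumption.

```python
import struct

OLD_PROTOCOL_VERSION = -65536  # assumption: value used by 3.5 servers
NEW_PROTOCOL_VERSION = -65535  # value reported in the error log in this thread


def parse_initial_message(data):
    """Parse a toy initial message, accepting both protocol versions.

    Hypothetical layout, not the real ZooKeeper wire format:
      8-byte signed version, 8-byte signed sid, 4-byte length, payload.
    The old version carries one address, the new one a '|'-separated list.
    """
    version, sid, length = struct.unpack_from(">qqi", data, 0)
    payload = data[20:20 + length].decode("utf-8")
    if version == OLD_PROTOCOL_VERSION:
        addresses = [payload]
    elif version == NEW_PROTOCOL_VERSION:
        addresses = payload.split("|")
    else:
        raise ValueError(f"Got unrecognized protocol version {version}")
    return version, sid, addresses


def build_initial_message(version, sid, addresses):
    """Build the toy message in the format the given version expects."""
    if version == OLD_PROTOCOL_VERSION:
        payload = addresses[0].encode("utf-8")  # old peers expect one address
    else:
        payload = "|".join(addresses).encode("utf-8")
    return struct.pack(">qqi", version, sid, len(payload)) + payload
```

The sketch only covers the parsing side and the choice of format when sending. How a new server would learn which version a given peer speaks is left open here, as it is in the discussion.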
> > > >>>>>
> > > >>>>>
> > > >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> > > >> eolivelli@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> I suggest this plan:
> > > >>>>>> - release 3.6.0 now
> > > >>>>>> - improve the migration story, the flow outlined by Mate is
> > > >>>>>> interesting, but it will take time
> > > >>>>>>
> > > >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize the
> > > >>>>>> release this evening (within 8-10 hours) if no one comes out in
> > the
> > > >>>>>> VOTE thread with a -1
> > > >>>>>>
> > > >>>>>> Enrico
> > > >>>>>>
> > > >>>>>> Enrico
> > > >>>>>>
> > > > >>>>>> On Mon, Feb 10, 2020 at 19:33 Patrick Hunt
> > > > >>>>>> <ph...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <andor@apache.org
> >
> > > >>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Answers inline.
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
> > > > >>>>>>>>> responsible for this change)
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Although this statement is acceptable for me, I don’t feel
> this
> > > >>>> patch
> > > >>>>>>>> should not have been merged into 3.6.0. Submission has been
> > > >>>> preceded
> > > >>>>>> by a
> > > >>>>>>>> long argument with MAPR folks who originally wanted to be
> > > >> merged
> > > >>>> into
> > > >>>>>> 3.4
> > > >>>>>>>> branch (considering the pace how ZooKeeper community is moving
> > > >>>>>> forward) and
> > > >>>>>>>> we reached an agreement that release it with 3.6.0.
> > > >>>>>>>>
> > > >>>>>>>> Make a long story short, this patch has been outstanding for
> > > >> ages
> > > >>>>>> without
> > > >>>>>>>> much attention from the community and contributors made a lot
> > > >> of
> > > >>>>>> effort to
> > > >>>>>>>> get it done before the release.
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>>> I would like to hear from people that have been in the community for
> > > > >>>>>>>>> a long time, then I am ready to complete the release process for
> > > > >>>>>>>>> 3.6.0rc2.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Me too.
> > > >>>>>>>>
> > > >>>>>>>> I tend to accept the way rolling restart works now - as you
> > > >>>> described
> > > >>>>>>>> Enrico - and given that situation was pretty much the same
> > > >> between
> > > >>>>> 3.4
> > > >>>>>> and
> > > >>>>>>>> 3.5, I don’t feel we have to make additional changes.
> > > >>>>>>>>
> > > >>>>>>>> On the other hand, the fix that Mate suggested sounds quite
> > > >> cool,
> > > >>>> I’m
> > > >>>>>> also
> > > >>>>>>>> happy to work on getting it in.
> > > >>>>>>>>
> > > >>>>>>>> Fyi, Release Management page says the following:
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > >>>>>>>>
> > > >>>>>>>> "major.minor release of ZooKeeper must be backwards compatible
> > > >> with
> > > >>>>> the
> > > >>>>>>>> previous minor release, major.(minor-1)"
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>> Our users, direct and indirect, value the ability to migrate to
> > > >> newer
> > > >>>>>>> versions - esp as we drop support for older. Frictions such as
> > > >> this
> > > >>>> can
> > > >>>>>> be
> > > >>>>>>> a reason to go elsewhere. I'm "pro" b/w compact - esp given our
> > > >>>>> published
> > > >>>>>>> guidelines.
> > > >>>>>>>
> > > >>>>>>> Patrick
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Andor
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <
> > > >> eolivelli@gmail.com
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Thank you Mate for checking and explaining this story.
> > > >>>>>>>>>
> > > >>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188
> > > >> as:
> > > >>>>>>>>> - it is the last "big patch" committed to 3.6 before
> > > >> starting the
> > > >>>>>>>>> release process
> > > >>>>>>>>> - it is the cause of the failure of the first RC
> > > >>>>>>>>>
> > > > >>>>>>>>> In my experience when you are close to a release it is better not to
> > > > >>>>>>>>> make big changes. (I am among the approvers of that patch, so I am
> > > > >>>>>>>>> responsible for this change)
> > > >>>>>>>>>
> > > >>>>>>>>> This is a pointer to the change to whom who wants to
> > > >> understand
> > > >>>>>> better
> > > >>>>>>>>> the context
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > >>>>>>>>>
> > > >>>>>>>>> IIUC even for the upgrade from 3.4 to 3.5 the story was the
> > > >> same
> > > >>>>> and
> > > >>>>>>>>> if this statement holds then I feel we can continue
> > > >>>>>>>>> with this release.
> > > >>>>>>>>>
> > > >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> > > >>>>>> complex.
> > > >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and we
> > > >> do
> > > >>>> not
> > > >>>>>>>>> have tools to certify this compatibility (at least not in the
> > > >>>> short
> > > >>>>>>>>> term)
> > > >>>>>>>>>
> > > > >>>>>>>>> I would like to hear from people that have been in the community for
> > > > >>>>>>>>> a long time, then I am ready to complete the release process for
> > > > >>>>>>>>> 3.6.0rc2.
> > > >>>>>>>>>
> > > >>>>>>>>> I will update the website and the release notes with a
> > > >> specific
> > > >>>>>>>>> warning about the upgrade, we should also update the Wiki
> > > >>>>>>>>>
> > > >>>>>>>>> Enrico
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > >>>>>>>>> <sz...@gmail.com> ha scritto:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hi Enrico!
> > > >>>>>>>>>>
> > > >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> > > >>>>>>>> QuorumCnxManager.
> > > >>>>>>>>>> The Protocol version  was changed last time in
> > > >> ZOOKEEPER-2186
> > > >>>>>> released
> > > >>>>>>>>>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some
> > > >> bugs.
> > > >>>>>> Later I
> > > >>>>>>>>>> also changed the protocol version when the format of the
> > > >> initial
> > > >>>>>> message
> > > >>>>>>>>>> changed in ZOOKEEPER-3188. So actually the quorum protocol
> > > >> is
> > > >>>> not
> > > >>>>>>>>>> compatible in this case and is the 'expected' behavior if
> > > >> you
> > > >>>>>> upgrade
> > > >>>>>>>> e.g
> > > >>>>>>>>>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to
> > > >>>> 3.6.0.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back
> > > >> then and
> > > >>>>>> got to
> > > >>>>>>>> the
> > > >>>>>>>>>> conclusion that it is not that bad, as there will be no data
> > > >>>> loss
> > > >>>>>> as you
> > > >>>>>>>>>> wrote. The tricky thing is that during rolling upgrade we
> > > >> should
> > > >>>>>> ensure
> > > >>>>>>>>>> both backward and forward compatibility to make sure that
> > > >> the
> > > >>>> old
> > > >>>>>> and
> > > >>>>>>>> the
> > > >>>>>>>>>> new part of the quorum can still speak to each other. The
> > > >>>> current
> > > >>>>>>>> solution
> > > >>>>>>>>>> (simply failing if the protocol versions mismatch) is more
> > > >>>> simple
> > > >>>>>> and
> > > >>>>>>>> still
> > > >>>>>>>>>> working just fine: as the servers are restarted one-by-one,
> > > >> the
> > > >>>>>> nodes
> > > >>>>>>>> with
> > > >>>>>>>>>> the old protocol version and the nodes with the new protocol
> > > >>>>> version
> > > >>>>>>>> will
> > > >>>>>>>>>> form two partitions, but any given time only one partition
> > > >> will
> > > >>>>>> have the
> > > >>>>>>>>>> quorum.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Still, thinking it trough, as a side effect in these cases
> > > >> there
> > > >>>>>> will
> > > >>>>>>>> be a
> > > >>>>>>>>>> short time when none of the partitions will have quorums
> > > >> (when
> > > >>>> we
> > > >>>>>> have N
> > > >>>>>>>>>> servers with the old protocol version, N servers with the
> > > >> new
> > > >>>>>> protocol
> > > >>>>>>>>>> version, and there is one server just being restarted). I
> > > >> am not
> > > >>>>>> sure
> > > >>>>>>>> if we
> > > >>>>>>>>>> can accept this.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it
> > > >> possible
> > > >>>> to
> > > >>>>>> parse
> > > >>>>>>>>>> the initial message of the old protocol version with the new
> > > >>>> code.
> > > >>>>>> But
> > > >>>>>>>> I am
> > > >>>>>>>>>> not sure if it would be enough (as the old code will not be
> > > >> able
> > > >>>>> to
> > > >>>>>>>> parse
> > > >>>>>>>>>> the new initial message).
> > > >>>>>>>>>>
> > > >>>>>>>>>> One option can be to make a patch also for 3.5 to have a
> > > >> version
> > > >>>>>> which
> > > >>>>>>>>>> supports both protocol versions. (let's say in 3.5.8) Then
> > > >> we
> > > >>>> can
> > > >>>>>> write
> > > >>>>>>>> to
> > > >>>>>>>>>> the release note, that if you need rolling upgrade from any
> > > >>>>> versions
> > > >>>>>>>> since
> > > >>>>>>>>>> 3.4.7, then you have to first upgrade from 3.5.8 before
> > > >>>> upgrading
> > > >>>>> to
> > > >>>>>>>> 3.6.0.
> > > >>>>>>>>>> We can even make the same thing on the 3.4 branch.
> > > >>>>>>>>>>
> > > >>>>>>>>>> But I am also new to the community... It would be great to
> > > >> hear
> > > >>>>> the
> > > >>>>>>>> opinion
> > > >>>>>>>>>> of more experienced people.
> > > >>>>>>>>>> Whatever the decision will be, I am happy to make the
> > > >> changes.
> > > >>>>>>>>>>
> > > >>>>>>>>>> And sorry for breaking the RC (if we decide that this needs
> > > >> to
> > > >>>> be
> > > >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Kind regards,
> > > >>>>>>>>>> Mate
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > > >>>>>> eolivelli@gmail.com>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi,
> > > >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before
> > > >> closing the
> > > >>>>>> VOTE
> > > >>>>>>>>>>> of 3.6.0 I wanted to finish my tests and I am coming to an
> > > >>>>> apparent
> > > >>>>>>>>>>> blocker.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> > > >> looks
> > > >>>>> like
> > > >>>>>>>>>>> peers are not able to talk to each other.
> > > >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> > > >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> > > >> errors on
> > > >>>>> 3.5
> > > >>>>>>>> nodes:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> > > >>>>>> Received
> > > >>>>>>>>>>> connection request 127.0.0.1:62591
> > > >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > >>>>>>>>>>> Got unrecognized protocol version -65535
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Once I upgrade all of the peers the system is up and
> > > >> running,
> > > >>>>>> without
> > > >>>>>>>>>>> apparently no data loss.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> During the upgrade as soon as I upgrade the first node,
> > > >> say,
> > > >>>>>> server1,
> > > >>>>>>>>>>> server1 is not able to accept connections (error "Close of
> > > >>>>> session
> > > >>>>>> 0x0
> > > >>>>>>>>>>> java.io.IOException: ZooKeeperServer not running")  from
> > > >>>> clients,
> > > >>>>>> this
> > > >>>>>>>>>>> is expected, because as far as it cannot talk with the
> > > >> other
> > > >>>>> peers
> > > >>>>>> it
> > > >>>>>>>>>>> is practically partitioned away from the cluster.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> My questions are:
> > > >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes
> > > >> from
> > > >>>> 3.5
> > > >>>>> to
> > > >>>>>>>>>>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago,
> > > >>>> and I
> > > >>>>>> was
> > > >>>>>>>>>>> not in the community as dev so I cannot tell
> > > >>>>>>>>>>> 2) is this a viable option for users ? to have some
> > > >> temporary
> > > >>>>>> glitch
> > > >>>>>>>>>>> during the upgrade and hope that the upgrade completes
> > > >> without
> > > >>>>>>>>>>> troubles ?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> In theory as long as two servers are running the same major
> > > >>>>> version
> > > >>>>>>>>>>> (3.5 or 3.6) we have a quorum and the system is able to
> > > >> make
> > > >>>>>> progress
> > > >>>>>>>>>>> and to server clients.
> > > >>>>>>>>>>> I feel that this is quite dangerous, but I don't have
> > > >> enough
> > > >>>>>> context
> > > >>>>>>>>>>> to understand how this problem is possible and when we
> > > >> decided
> > > >>>> to
> > > >>>>>>>>>>> break compatibility.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The other option is that I am wrong in my test and I am
> > > >> messing
> > > >>>>> up
> > > >>>>>> :-)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The other upgrade path I would like to see working like a
> > > >> charm
> > > >>>>> is
> > > >>>>>> the
> > > >>>>>>>>>>> upgrade from 3.4 to 3.6, as I see that as soon as we
> > > >> release
> > > >>>> 3.6
> > > >>>>> we
> > > >>>>>>>>>>> should encourage users to move to 3.6 and not to 3.5.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards
> > > >>>>>>>>>>> Enrico
> > > >>>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > >
> >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
FYI: I created these scripts for my local tests:
https://github.com/symat/zk-rolling-upgrade-test

In the long term I would also add a script that actually monitors the
state of the quorum and also runs continuous traffic, not just 1-2
smoketests after each restart. But I don't know how important this would be.
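
A minimal sketch of such a quorum monitor follows. It assumes every server answers the `srvr` four-letter-word command on its client port, which depends on how the deployment whitelists these commands. The function names and the idea of polling once per restart are illustrative assumptions, not part of the scripts linked above.

```python
import socket


def four_letter_word(host, port, cmd="srvr", timeout=2.0):
    """Send a four-letter-word command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (leader, follower, ...) or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def quorum_is_up(modes, ensemble_size):
    """True if exactly one leader plus its followers form a strict majority."""
    majority = ensemble_size // 2 + 1
    leaders = modes.count("leader")
    followers = modes.count("follower")
    return leaders == 1 and leaders + followers >= majority


def poll(servers):
    """Query every (host, port) once; unreachable servers are reported as None."""
    modes = []
    for host, port in servers:
        try:
            modes.append(parse_mode(four_letter_word(host, port)))
        except OSError:
            modes.append(None)
    return modes
```

Calling `quorum_is_up(poll(servers), len(servers))` in a loop while a client keeps writing would show whether the ensemble ever loses its majority during the upgrade, rather than only checking after each restart.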


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
On Tue, Feb 11, 2020 at 17:17, Andor Molnar
<an...@apache.org> wrote:
>
> The most obvious approach that comes to mind is one I previously worked on:
>
> 1) run old version cluster,
> 2) connect to each node and run smoke tests,
> 3) restart one node with new code,
> 4) go to 2) and repeat until all nodes are upgraded
>
> I think this wouldn’t work in a “unit test”; we probably need a separate Jenkins job and a nice Python script to do this.
>
> Andor
>
>
>
>
> > On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
> >
> > Does anyone have ideas on how we could add testing for upgrades? It is
> > obviously something we're missing, especially given how important it is.

I will send an email with a proposal in the next few days.
By the way, my idea is very similar to Andor's.

Once we have an automated environment, we can launch it from Jenkins.
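
The loop Andor describes above could be sketched roughly as follows. This is only an illustration: `restart_with_new_version` and `smoke_test` are hypothetical callables that a real Jenkins job would have to supply (for example by wrapping shell scripts), and nothing here reflects an existing ZooKeeper test harness.

```python
def rolling_upgrade(nodes, restart_with_new_version, smoke_test):
    """Upgrade nodes one at a time, smoke-testing every node in between.

    Returns a list of (step, node, ok) tuples, one per smoke-test run.
    Step 0 is the run against the cluster while all nodes are still old.
    """
    results = []

    def check_all(step):
        # Step 2 of the procedure: connect to each node and run smoke tests.
        for node in nodes:
            results.append((step, node, bool(smoke_test(node))))

    check_all(0)  # steps 1-2: old-version cluster, smoke tests
    for step, node in enumerate(nodes, start=1):
        restart_with_new_version(node)  # step 3: restart one node with new code
        check_all(step)                 # step 4: back to the smoke tests
    return results
```

Any `ok` value of `False` in the returned list points at the exact step of the upgrade where a node stopped serving clients, which is the kind of failure reported at the start of this thread.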

Enrico


> >
> > Patrick
> >
> > On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> >> On Tue, Feb 11, 2020 at 09:12, Szalay-Bekő Máté
> >> <sz...@gmail.com> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> about the question from Michael:
> >>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> >>>> speak old message format when it's talking to old server?
> >>>
> >>> In this particular case, it might be enough. The protocol change happened
> >>> in the 'initial message' sent by the QuorumCnxManager. Maybe it is not
> >>> a problem if the new servers cannot initiate channels to the old servers;
> >>> maybe it is enough if these channels get initiated by the old servers only.
> >>> I will test it quickly.
> >>>
> >>> However, I have no idea whether anything else changed in the quorum protocol
> >>> between 3.5 and 3.6. In other cases it might not be enough that the new
> >>> servers can understand the old messages, as the old servers can break by
> >>> not understanding the messages from the new servers. Also, in the code
> >>> currently (AFAIK) there is no generic knowledge of protocol versions: the
> >>> servers do not store which protocol versions they can/should use to
> >>> communicate with which particular other servers. Maybe we don't even need
> >>> this, but I would feel better if we had more tests around these
> >>> things.
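
To make the point above concrete, here is a small sketch of a receiver that accepts both versions of an initial message instead of rejecting the one it does not speak. The value -65535 is the one rejected in the log earlier in this thread; the value -65536 for the older format, the field layout and the `|` address separator are assumptions made for this illustration and are not taken from the ZooKeeper source.

```python
import struct

# -65535 appears in the error log earlier in the thread.
# -65536 for the older format is an assumption of this sketch.
PROTOCOL_VERSION_OLD = -65536
PROTOCOL_VERSION_NEW = -65535

# Hypothetical header layout: version (int64), server id (int64), payload length (int32).
HEADER = ">qqi"


def encode_initial_message(version, sid, addresses):
    """Build a message in the (assumed) old or new format."""
    if version == PROTOCOL_VERSION_OLD:
        payload = addresses[0].encode("utf-8")          # old format: one address
    else:
        payload = "|".join(addresses).encode("utf-8")   # new format: several
    return struct.pack(HEADER, version, sid, len(payload)) + payload


def parse_initial_message(data):
    """Accept both known versions; fail only on a truly unknown one."""
    version, sid, length = struct.unpack_from(HEADER, data, 0)
    if version not in (PROTOCOL_VERSION_OLD, PROTOCOL_VERSION_NEW):
        raise ValueError(f"Got unrecognized protocol version {version}")
    start = struct.calcsize(HEADER)
    payload = data[start:start + length].decode("utf-8")
    if version == PROTOCOL_VERSION_OLD:
        addresses = [payload]
    else:
        addresses = payload.split("|")
    return version, sid, addresses
```

This only covers the receiving side. As noted above, an old server still cannot parse a new-format message, so a new server would also have to know when to send the old format, or leave connection setup to the old servers.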
> >>>
> >>> My suggestion for the long term:
> >>> - let's fix this particular issue in 3.6.0 quickly (I will start on
> >>> this today)
> >>> - let's build some automation (backed by Jenkins) that tests a whole
> >>> set of combinations of ZooKeeper upgrade paths by performing rolling
> >>> upgrades under light traffic. Let's define more precisely
> >>> what we expect (e.g. the quorum is up, but some clients can get
> >>> disconnected? What will happen to the ephemeral nodes? Do we want to
> >>> gracefully close or transfer the user sessions before stopping the old
> >>> server?) and let's see where this breaks. Just from reading the code, I don't
> >>> think the quorum will always be up (e.g. between older 3.4 versions and
> >>> 3.5).
> >>
> >>
> >> I am happy to work on this topic
> >>
> >>> - we need to update the Wiki with the working rolling upgrade paths and,
> >>> if needed, workarounds
> >>> - we might need to do some fixes (adding backward compatible versions
> >>> and/or specific parameters that temporarily enforce the old protocol during the
> >>> rolling upgrade and that can be changed later to the new protocol by either
> >>> dynamic reconfig or by rolling restart)
> >>
> >> It would be much better for the 3.6 code to have some support for
> >> compatibility with 3.5 servers.
> >> We can't require old code to be forward compatible, but we can make new
> >> code compatible with old code to a certain extent.
> >> It is better if we can achieve this compatibility goal without a flag:
> >> users won't have to care about this part and can simply trust us.
> >>
> >> The rollback story is also important, but maybe we are not ready
> >> for it yet; where there are local changes to the store,
> >> it is better to have a clear design and plan, and to target a new release
> >> (3.7?).
> >>
> >> Enrico
> >>
> >>>
> >>> Depending on your comments, I am happy to create a few Jira tickets
> >>> around these topics.
> >>>
> >>> Kind regards,
> >>> Mate
> >>>
> >>> ps. Enrico, sorry about your RC... I owe you a beer, let me know if you
> >>> are near Budapest ;)
> >>>
> >>> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eo...@gmail.com>
> >>> wrote:
> >>>
> >>>> Good.
> >>>>
> >>>> I will cancel the vote for 3.6.0rc2.
> >>>>
> >>>> I would appreciate it very much if Mate and his colleagues have time to
> >>>> work on a fix.
> >>>> Otherwise I will have cycles next week.
> >>>>
> >>>> I would also like to spend some time setting up a few minimal
> >>>> integration tests around the upgrade story.
> >>>>
> >>>> Enrico
> >>>>
> >>>> On Tue, 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> wrote:
> >>>>
> >>>>> Kudos Enrico, very thorough work as the final gatekeeper of the
> >>>>> release!
> >>>>>
> >>>>> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >>>>>
> >>>>> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> >>>>> pieces of software that puts so much emphasis on compatibility that it
> >>>>> just works on upgrade / downgrade, which is amazing. One guarantee we
> >>>>> always had is that during a rolling upgrade the quorum will always be
> >>>>> available, leading to no service interruption. It would be sad to lose
> >>>>> such a capability given this is still a tractable problem.
> >>>>>
> >>>>> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> >>>>> speak the old message format when it's talking to an old server?
> >>>>> Basically, an ugly if/else check against the protocol version should
> >>>>> work, and there is no need for multiple passes in the rolling upgrade
> >>>>> process.
> >>>>>
> >>>>>
> >>>>> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eolivelli@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> I suggest this plan:
> >>>>>> - release 3.6.0 now
> >>>>>> - improve the migration story, the flow outlined by Mate is
> >>>>>> interesting, but it will take time
> >>>>>>
> >>>>>> 3.6.0rc2 got enough binding votes so I am going to finalize the
> >>>>>> release this evening (within 8-10 hours) if no one comes out in the
> >>>>>> VOTE thread with a -1
> >>>>>>
> >>>>>> Enrico
> >>>>>>
> >>>>>> On Mon, 10 Feb 2020 at 19:33, Patrick Hunt
> >>>>>> <ph...@apache.org> wrote:
> >>>>>>>
> >>>>>>> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Answers inline.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> In my experience when you are close to a release it is better
> >>>>>>>>> not to make big changes. (I am among the approvers of that
> >>>>>>>>> patch, so I am responsible for this change)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Although this statement is acceptable to me, I don’t feel that
> >>>>>>>> merging this patch into 3.6.0 was a mistake. Submission was
> >>>>>>>> preceded by a long argument with the MAPR folks, who originally
> >>>>>>>> wanted it merged into the 3.4 branch (considering the pace at which
> >>>>>>>> the ZooKeeper community is moving forward), and we reached an
> >>>>>>>> agreement to release it with 3.6.0.
> >>>>>>>>
> >>>>>>>> To make a long story short, this patch had been outstanding for ages
> >>>>>>>> without much attention from the community, and contributors made a
> >>>>>>>> lot of effort to get it done before the release.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> I would like to hear from people that have been in the community
> >>>>>>>>> for a long time, then I am ready to complete the release process
> >>>>>>>>> for 3.6.0rc2.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Me too.
> >>>>>>>>
> >>>>>>>> I tend to accept the way rolling restart works now - as you
> >>>>>>>> described, Enrico - and given that the situation was pretty much the
> >>>>>>>> same between 3.4 and 3.5, I don’t feel we have to make additional
> >>>>>>>> changes.
> >>>>>>>>
> >>>>>>>> On the other hand, the fix that Mate suggested sounds quite cool,
> >>>>>>>> and I’m also happy to work on getting it in.
> >>>>>>>>
> >>>>>>>> FYI, the Release Management page says the following:
> >>>>>>>>
> >>>>>>>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >>>>>>>>
> >>>>>>>> "major.minor release of ZooKeeper must be backwards compatible with
> >>>>>>>> the previous minor release, major.(minor-1)"
> >>>>>>>>
> >>>>>>>>
> >>>>>>> Our users, direct and indirect, value the ability to migrate to newer
> >>>>>>> versions - esp as we drop support for older ones. Frictions such as
> >>>>>>> this can be a reason to go elsewhere. I'm "pro" b/w compat - esp given
> >>>>>>> our published guidelines.
> >>>>>>>
> >>>>>>> Patrick
> >>>>>>>
> >>>>>>>
> >>>>>>>> Andor
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On 2020. Feb 10., at 11:32, Enrico Olivelli <eolivelli@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Thank you Mate for checking and explaining this story.
> >>>>>>>>>
> >>>>>>>>> I find it very interesting that the cause is ZOOKEEPER-3188, as:
> >>>>>>>>> - it is the last "big patch" committed to 3.6 before starting the
> >>>>>>>>> release process
> >>>>>>>>> - it is the cause of the failure of the first RC
> >>>>>>>>>
> >>>>>>>>> In my experience when you are close to a release it is better
> >>>>>>>>> not to make big changes. (I am among the approvers of that
> >>>>>>>>> patch, so I am responsible for this change)
> >>>>>>>>>
> >>>>>>>>> Here is a pointer to the change for anyone who wants to better
> >>>>>>>>> understand the context:
> >>>>>>>>>
> >>>>>>>>> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >>>>>>>>>
> >>>>>>>>> IIUC the story was the same even for the upgrade from 3.4 to 3.5,
> >>>>>>>>> and if this statement holds then I feel we can continue
> >>>>>>>>> with this release.
> >>>>>>>>>
> >>>>>>>>> - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> >>>>>>>>> complex.
> >>>>>>>>> - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> >>>>>>>>> have tools to certify this compatibility (at least not in the
> >>>>>>>>> short term)
> >>>>>>>>>
> >>>>>>>>> I would like to hear from people that have been in the community
> >>>>>>>>> for a long time, then I am ready to complete the release process
> >>>>>>>>> for 3.6.0rc2.
> >>>>>>>>>
> >>>>>>>>> I will update the website and the release notes with a specific
> >>>>>>>>> warning about the upgrade; we should also update the Wiki.
> >>>>>>>>>
> >>>>>>>>> Enrico
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, 10 Feb 2020 at 11:17, Szalay-Bekő Máté
> >>>>>>>>> <sz...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Enrico!
> >>>>>>>>>>
> >>>>>>>>>> This is caused by the different PROTOCOL_VERSION in the
> >>>>>>>>>> QuorumCnxManager. The protocol version was last changed in
> >>>>>>>>>> ZOOKEEPER-2186, released first in 3.4.7 and 3.5.1, to avoid some
> >>>>>>>>>> crashes / fix some bugs. Later I also changed the protocol version
> >>>>>>>>>> when the format of the initial message changed in ZOOKEEPER-3188.
> >>>>>>>>>> So the quorum protocol is actually not compatible in this case,
> >>>>>>>>>> and this is the 'expected' behavior if you upgrade e.g. from 3.4.6
> >>>>>>>>>> to 3.4.7, or 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.
> >>>>>>>>>>
> >>>>>>>>>> We had some discussion in the PR of ZOOKEEPER-3188 back then and
> >>>>>>>>>> came to the conclusion that it is not that bad, as there will be
> >>>>>>>>>> no data loss, as you wrote. The tricky thing is that during a
> >>>>>>>>>> rolling upgrade we should ensure both backward and forward
> >>>>>>>>>> compatibility to make sure that the old and the new part of the
> >>>>>>>>>> quorum can still speak to each other. The current solution (simply
> >>>>>>>>>> failing if the protocol versions mismatch) is simpler and still
> >>>>>>>>>> works just fine: as the servers are restarted one by one, the
> >>>>>>>>>> nodes with the old protocol version and the nodes with the new
> >>>>>>>>>> protocol version will form two partitions, but at any given time
> >>>>>>>>>> only one partition will have the quorum.
> >>>>>>>>>>
> >>>>>>>>>> Still, thinking it through, as a side effect in these cases there
> >>>>>>>>>> will be a short time when none of the partitions has a quorum
> >>>>>>>>>> (when we have N servers with the old protocol version, N servers
> >>>>>>>>>> with the new protocol version, and one server just being
> >>>>>>>>>> restarted). I am not sure if we can accept this.
> >>>>>>>>>>
> >>>>>>>>>> For ZOOKEEPER-3188 we can add a small patch to make it possible to
> >>>>>>>>>> parse the initial message of the old protocol version with the new
> >>>>>>>>>> code. But I am not sure if it would be enough (as the old code
> >>>>>>>>>> will not be able to parse the new initial message).
> >>>>>>>>>>
> >>>>>>>>>> One option can be to also make a patch for 3.5 to have a version
> >>>>>>>>>> which supports both protocol versions (let's say 3.5.8). Then we
> >>>>>>>>>> can write in the release notes that if you need a rolling upgrade
> >>>>>>>>>> from any version since 3.4.7, then you have to first upgrade to
> >>>>>>>>>> 3.5.8 before upgrading to 3.6.0.
> >>>>>>>>>> We can even do the same thing on the 3.4 branch.
> >>>>>>>>>>
> >>>>>>>>>> But I am also new to the community... It would be great to hear
> >>>>>>>>>> the opinion of more experienced people.
> >>>>>>>>>> Whatever the decision will be, I am happy to make the changes.
> >>>>>>>>>>
> >>>>>>>>>> And sorry for breaking the RC (if we decide that this needs to be
> >>>>>>>>>> changed...).  ZOOKEEPER-3188 was a complex patch.
> >>>>>>>>>>
> >>>>>>>>>> Kind regards,
> >>>>>>>>>> Mate
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli
> >>>>>>>>>> <eolivelli@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> even if we had enough binding +1 on 3.6.0rc2 before closing the
> >>>>>>>>>>> VOTE of 3.6.0 I wanted to finish my tests and I am coming to an
> >>>>>>>>>>> apparent blocker.
> >>>>>>>>>>>
> >>>>>>>>>>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks
> >>>>>>>>>>> like peers are not able to talk to each other.
> >>>>>>>>>>> I have a cluster of 3, server1, server2 and server3.
> >>>>>>>>>>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on
> >>>>>>>>>>> 3.5 nodes:
> >>>>>>>>>>>
> >>>>>>>>>>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> >>>>>>>>>>> Received connection request 127.0.0.1:62591
> >>>>>>>>>>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> >>>>>>>>>>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> >>>>>>>>>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> >>>>>>>>>>> Got unrecognized protocol version -65535
> >>>>>>>>>>>
> >>>>>>>>>>> Once I upgrade all of the peers the system is up and running,
> >>>>>>>>>>> apparently without data loss.
> >>>>>>>>>>>
> >>>>>>>>>>> During the upgrade, as soon as I upgrade the first node, say
> >>>>>>>>>>> server1, server1 is not able to accept connections (error "Close
> >>>>>>>>>>> of session 0x0 java.io.IOException: ZooKeeperServer not running")
> >>>>>>>>>>> from clients. This is expected, because as long as it cannot talk
> >>>>>>>>>>> with the other peers it is practically partitioned away from the
> >>>>>>>>>>> cluster.
> >>>>>>>>>>>
> >>>>>>>>>>> My questions are:
> >>>>>>>>>>> 1) is this expected ? I can't remember protocol changes from 3.5
> >>>>>>>>>>> to 3.6, but actually 3.6 diverged from 3.5 branch so long ago,
> >>>>>>>>>>> and I was not in the community as dev so I cannot tell
> >>>>>>>>>>> 2) is this a viable option for users ? to have some temporary
> >>>>>>>>>>> glitch during the upgrade and hope that the upgrade completes
> >>>>>>>>>>> without troubles ?
> >>>>>>>>>>>
> >>>>>>>>>>> In theory as long as two servers are running the same major
> >>>>>>>>>>> version (3.5 or 3.6) we have a quorum and the system is able to
> >>>>>>>>>>> make progress and to serve clients.
> >>>>>>>>>>> I feel that this is quite dangerous, but I don't have enough
> >>>>>>>>>>> context to understand how this problem is possible and when we
> >>>>>>>>>>> decided to break compatibility.
> >>>>>>>>>>>
> >>>>>>>>>>> The other option is that I am wrong in my test and I am messing
> >>>>>>>>>>> up :-)
> >>>>>>>>>>>
> >>>>>>>>>>> The other upgrade path I would like to see working like a charm
> >>>>>>>>>>> is the upgrade from 3.4 to 3.6, as I see that as soon as we
> >>>>>>>>>>> release 3.6 we should encourage users to move to 3.6 and not to
> >>>>>>>>>>> 3.5.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards
> >>>>>>>>>>> Enrico
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Andor Molnar <an...@apache.org>.
The most obvious approach that comes to mind is one I have worked on previously:

1) run old version cluster,
2) connect to each node and run smoke tests,
3) restart one node with new code,
4) goto 2) until all nodes are upgraded

I think this wouldn’t work as a “unit test”; we would probably need a separate Jenkins job and a nice Python script to do this.
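
The four steps above could be sketched roughly as follows. This is only a minimal illustration, not an existing ZooKeeper tool: the function name rolling_upgrade and the two hooks (smoke_test, restart_with_new_code) are hypothetical placeholders that a real Jenkins job would have to back with actual client connections and process restarts. The demo at the bottom uses an assumed toy model of the incompatibility discussed in this thread, in which a node only serves clients while the nodes on its own version form a majority.

```python
from typing import Callable, Iterable, List


def rolling_upgrade(
    nodes: Iterable[str],
    smoke_test: Callable[[str], bool],
    restart_with_new_code: Callable[[str], None],
) -> List[str]:
    """Run steps 2-4 against an old-version cluster that is already up (step 1).

    Returns the nodes in the order they were upgraded and raises
    RuntimeError as soon as any node fails its smoke test.
    """
    nodes = list(nodes)
    upgraded: List[str] = []

    def check_all() -> None:
        # Step 2: connect to each node and run the smoke tests.
        for node in nodes:
            if not smoke_test(node):
                raise RuntimeError(
                    f"smoke test failed on {node} after upgrading {upgraded}"
                )

    check_all()  # baseline: the old-version cluster must be healthy
    for node in nodes:
        restart_with_new_code(node)  # step 3: restart one node with new code
        upgraded.append(node)
        check_all()  # step 4: back to step 2 until all nodes are upgraded
    return upgraded


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without a real ensemble.
    versions = {"server1": "3.5", "server2": "3.5", "server3": "3.5"}

    def fake_smoke_test(node: str) -> bool:
        # Assumed toy model: a node is usable only while the nodes running
        # the same version as itself form a strict majority.
        same = sum(1 for v in versions.values() if v == versions[node])
        return same > len(versions) // 2

    def fake_restart(node: str) -> None:
        versions[node] = "3.6"

    try:
        rolling_upgrade(versions, fake_smoke_test, fake_restart)
    except RuntimeError as err:
        print(err)
```

With the toy model, the run stops right after the first restart, because the upgraded server1 is alone on its version; with compatible versions the same loop would simply return the list of upgraded nodes.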

Andor




> On 2020. Feb 11., at 16:38, Patrick Hunt <ph...@apache.org> wrote:
> 
> Anyone have ideas how we could add testing for upgrade? Obviously something
> we're missing, esp given it's import.
> 
> Patrick


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Patrick Hunt <ph...@apache.org>.
Does anyone have ideas on how we could add testing for upgrades? It is obviously
something we're missing, esp given how important it is.

Patrick

On Tue, Feb 11, 2020 at 12:40 AM Enrico Olivelli <eo...@gmail.com>
wrote:

> On Tue, 11 Feb 2020 at 09:12, Szalay-Bekő Máté
> <sz...@gmail.com> wrote:
> >
> > Hi All,
> >
> > about the question from Michael:
> > > Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > > speak old message format when it's talking to old server?
> >
> > In this particular case, it might be enough. The protocol change happened
> > now in the 'initial message' sent by the QuorumCnxManager. Maybe it is
> not
> > a problem if the new servers can not initiate channels to the old
> servers,
> > maybe it is enough if these channel gets initiated by the old servers
> only.
> > I will test it quickly.
> >
> > Although I have no idea if any other thing changed in the quorum protocol
> > between 3.5 and 3.6. In other cases it might not be enough if the new
> > servers can understand the old messages, as the old servers can break by
> > not understanding the messages from the new servers. Also, in the code
> > currently (AFAIK) there is no generic knowledge of protocol versions, the
> > servers are not storing that which protocol versions they can/should use
> to
> > communicate to which particular other servers. Maybe we don't even need
> > this, but I would feel better if we would have more tests around these
> > things.
> >
> > My suggestion for the long term:
> > - let's fix this particular issue now with 3.6.0 quickly (I start doing
> > this today)
> > - let's do some automation (backed up with jenkins) that will test a
> whole
> > combinations of different ZooKeeper upgrade paths by making rolling
> > upgrades during some light traffic. Let's have a bit better definition
> > about what we expect (e.g. the quorum is up, but some clients can get
> > disconnected? What will happen to the ephemeral nodes? Do we want to
> > gracefully close or transfer the user sessions before stopping the old
> > server?) and let's see where this broke. Just by checking the code, I
> don't
> > think the quorum will always be up (e.g. between older 3.4 versions and
> > 3.5).
>
>
> I am happy to work on this topic
>
> > - we need to update the Wiki about the working rolling upgrade paths and
> > maybe about workarounds if needed
> > - we might need to do some fixes (adding backward compatible versions
> > and/or specific parameters that enforce old protocol temporary during the
> > rolling upgrade that can be changed later to the new protocol by either
> > dynamic reconfig or by rolling restart)
>
> it would be much better on 3.6 code to have some support for
> compatibility with 3.5 servers
> we can't require old code to be forward compatible but we can make new
> code be compatible to a certain extend with old code.
> If we can achieve this compatibility goal without a flag is better,
> users won't have to care about this part and they simply "trust" on us
>
> The rollback story is also important, but maybe we are still not ready
> for it, in case of local changes to store,
> it is better to have a clear design and plan and work for a new release
> (3.7?)
>
> Enrico
>
> >
> > Depending on your comments, I am happy to create a few Jira tickets
> around
> > these topics.
> >
> > Kind regards,
> > Mate
> >
> > ps. Enrico, sorry about your RC... I owe you a beer, let me know if you
> are
> > near to Budapest ;)
> >
> > On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eo...@gmail.com>
> wrote:
> >
> > > Good.
> > >
> > > I will cancel the vote for 3.6.0rc2.
> > >
> > > I appreciate very much If Mate and his colleagues have time to work on
> a
> > > fix.
> > > Otherwise I will have cycles next week
> > >
> > > I would also like to spend my time in setting up a few minimal
> integration
> > > tests about the upgrade story
> > >
> > > Enrico
> > >
> > > Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha scritto:
> > >
> > > > Kudos Enrico, very thorough work as the final gate keeper of the
> release!
> > > >
> > > > Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> > > >
> > > > I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the
> rare
> > > > piece of software that put so much emphasis on compatibilities thus
> it
> > > just
> > > > works when upgrade / downgrade, which is amazing. One guarantee we
> always
> > > > had is during rolling upgrade, the quorum will always be available,
> > > leading
> > > > to no service interruption. It would be sad we lose such capability
> given
> > > > this is still a tractable problem.
> > > >
> > > > Regarding the fix, can we just make 3.6.0 aware of the old protocol
> and
> > > > speak old message format when it's talking to old server? Basically,
> an
> > > > ugly if else check against the protocol version should work and
> there is
> > > no
> > > > need to have multiple pass on rolling upgrade process.
> > > >
> > > >
> > > > On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <
> eolivelli@gmail.com>
> > > > wrote:
> > > >
> > > > > I suggest this plan:
> > > > > - release 3.6.0 now
> > > > > - improve the migration story, the flow outlined by Mate is
> > > > > interesting, but it will take time
> > > > >
> > > > > 3.6.0rc2 got enough binding votes so I am going to finalize the
> > > > > release this evening (within 8-10 hours) if no one comes out in the
> > > > > VOTE thread with a -1
> > > > >
> > > > > Enrico
> > > > >
> > > > > Enrico
> > > > >
> > > > > Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> > > > > <ph...@apache.org> ha scritto:
> > > > > >
> > > > > > On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Answers inline.
> > > > > > >
> > > > > > >
> > > > > > > > In my experience when you are close to a release it is
> better to
> > > to
> > > > > > > > make big changes. (I am among the approvers of that patch,
> so I
> > > am
> > > > > > > > responsible for this change)
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Although this statement is acceptable for me, I don’t feel this
> > > patch
> > > > > > > should not have been merged into 3.6.0. Submission has been
> > > preceded
> > > > > by a
> > > > > > > long argument with MAPR folks who originally wanted to be
> merged
> > > into
> > > > > 3.4
> > > > > > > branch (considering the pace how ZooKeeper community is moving
> > > > > forward) and
> > > > > > > we reached an agreement that release it with 3.6.0.
> > > > > > >
> > > > > > > Make a long story short, this patch has been outstanding for
> ages
> > > > > without
> > > > > > > much attention from the community and contributors made a lot
> of
> > > > > effort to
> > > > > > > get it done before the release.
> > > > > > >
> > > > > > >
> > > > > > > > I would like to ear from people that have been in the
> community
> > > for
> > > > > > > > long time, then I am ready to complete the release process
> for
> > > > > > > > 3.6.0rc2.
> > > > > > >
> > > > > > >
> > > > > > > Me too.
> > > > > > >
> > > > > > > I tend to accept the way rolling restart works now - as you
> > > described
> > > > > > > Enrico - and given that situation was pretty much the same
> between
> > > > 3.4
> > > > > and
> > > > > > > 3.5, I don’t feel we have to make additional changes.
> > > > > > >
> > > > > > > On the other hand, the fix that Mate suggested sounds quite
> cool,
> > > I’m
> > > > > also
> > > > > > > happy to work on getting it in.
> > > > > > >
> > > > > > > Fyi, Release Management page says the following:
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > > > > >
> > > > > > > "major.minor release of ZooKeeper must be backwards compatible
> with
> > > > the
> > > > > > > previous minor release, major.(minor-1)"
> > > > > > >
> > > > > > >
> > > > > > Our users, direct and indirect, value the ability to migrate to
> newer
> > > > > > versions - esp as we drop support for older. Frictions such as
> this
> > > can
> > > > > be
> > > > > > a reason to go elsewhere. I'm "pro" b/w compact - esp given our
> > > > published
> > > > > > guidelines.
> > > > > >
> > > > > > Patrick
> > > > > >
> > > > > >
> > > > > > > Andor
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On 2020. Feb 10., at 11:32, Enrico Olivelli <
> eolivelli@gmail.com
> > > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > Thank you Mate for checking and explaining this story.
> > > > > > > >
> > > > > > > > I find it very interesting that the cause is ZOOKEEPER-3188
> as:
> > > > > > > > - it is the last "big patch" committed to 3.6 before
> starting the
> > > > > > > > release process
> > > > > > > > - it is the cause of the failure of the first RC
> > > > > > > >
> > > > > > > > In my experience when you are close to a release it is
> better to
> > > to
> > > > > > > > make big changes. (I am among the approvers of that patch,
> so I
> > > am
> > > > > > > > responsible for this change)
> > > > > > > >
> > > > > > > > This is a pointer to the change to whom who wants to
> understand
> > > > > better
> > > > > > > > the context
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > > > > >
> > > > > > > > IIUC even for the upgrade from 3.4 to 3.5 the story was the
> same
> > > > and
> > > > > > > > if this statement holds then I feel we can continue
> > > > > > > > with this release.
> > > > > > > >
> > > > > > > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> > > > > complex.
> > > > > > > > - Making 3.5 and 3.6 "compatible" can be very tricky and we
> do
> > > not
> > > > > > > > have tools to certify this compatibility (at least not in the
> > > short
> > > > > > > > term)
> > > > > > > >
> > > > > > > > I would like to ear from people that have been in the
> community
> > > for
> > > > > > > > long time, then I am ready to complete the release process
> for
> > > > > > > > 3.6.0rc2.
> > > > > > > >
> > > > > > > > I will update the website and the release notes with a
> specific
> > > > > > > > warning about the upgrade, we should also update the Wiki
> > > > > > > >
> > > > > > > > Enrico
> > > > > > > >
> > > > > > > >
> > > > > > > > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > > > > > > <sz...@gmail.com> ha scritto:
> > > > > > > >>
> > > > > > > >> Hi Enrico!
> > > > > > > >>
> > > > > > > >> This is caused by the different PROTOCOL_VERSION in the
> > > > > > > QuorumCnxManager.
> > > > > > > >> The Protocol version  was changed last time in
> ZOOKEEPER-2186
> > > > > released
> > > > > > > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some
> bugs.
> > > > > Later I
> > > > > > > >> also changed the protocol version when the format of the
> initial
> > > > > message
> > > > > > > >> changed in ZOOKEEPER-3188. So actually the quorum protocol
> is
> > > not
> > > > > > > >> compatible in this case and is the 'expected' behavior if
> you
> > > > > upgrade
> > > > > > > e.g
> > > > > > > >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to
> > > 3.6.0.
> > > > > > > >>
> > > > > > > >> We had some discussion in the PR of ZOOKEEPER-3188 back
> then and
> > > > > got to
> > > > > > > the
> > > > > > > >> conclusion that it is not that bad, as there will be no data
> > > loss
> > > > > as you
> > > > > > > >> wrote. The tricky thing is that during rolling upgrade we
> should
> > > > > ensure
> > > > > > > >> both backward and forward compatibility to make sure that
> the
> > > old
> > > > > and
> > > > > > > the
> > > > > > > >> new part of the quorum can still speak to each other. The
> > > current
> > > > > > > solution
> > > > > > > >> (simply failing if the protocol versions mismatch) is more
> > > simple
> > > > > and
> > > > > > > still
> > > > > > > >> working just fine: as the servers are restarted one-by-one,
> the
> > > > > nodes
> > > > > > > with
> > > > > > > >> the old protocol version and the nodes with the new protocol
> > > > version
> > > > > > > will
> > > > > > > >> form two partitions, but any given time only one partition
> will
> > > > > have the
> > > > > > > >> quorum.
> > > > > > > >>
> > > > > > > >> Still, thinking it trough, as a side effect in these cases
> there
> > > > > will
> > > > > > > be a
> > > > > > > >> short time when none of the partitions will have quorums
> (when
> > > we
> > > > > have N
> > > > > > > >> servers with the old protocol version, N servers with the
> new
> > > > > protocol
> > > > > > > >> version, and there is one server just being restarted). I
> am not
> > > > > sure
> > > > > > > if we
> > > > > > > >> can accept this.
> > > > > > > >>
> > > > > > > >> For ZOOKEEPER-3188 we can add a small patch to make it
> possible
> > > to
> > > > > parse
> > > > > > > >> the initial message of the old protocol version with the new
> > > code.
> > > > > But
> > > > > > > I am
> > > > > > > >> not sure if it would be enough (as the old code will not be
> able
> > > > to
> > > > > > > parse
> > > > > > > >> the new initial message).
> > > > > > > >>
> > > > > > > >> One option can be to make a patch also for 3.5 to have a
> version
> > > > > which
> > > > > > > >> supports both protocol versions. (let's say in 3.5.8) Then
> we
> > > can
> > > > > write
> > > > > > > to
> > > > > > > >> the release note, that if you need rolling upgrade from any
> > > > versions
> > > > > > > since
> > > > > > > >> 3.4.7, then you have to first upgrade from 3.5.8 before
> > > upgrading
> > > > to
> > > > > > > 3.6.0.
> > > > > > > >> We can even make the same thing on the 3.4 branch.
> > > > > > > >>
> > > > > > > >> But I am also new to the community... It would be great to
> hear
> > > > the
> > > > > > > opinion
> > > > > > > >> of more experienced people.
> > > > > > > >> Whatever the decision will be, I am happy to make the
> changes.
> > > > > > > >>
> > > > > > > >> And sorry for breaking the RC (if we decide that this needs
> to
> > > be
> > > > > > > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > > > > >>
> > > > > > > >> Kind regards,
> > > > > > > >> Mate
> > > > > > > >>
> > > > > > > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > > > > eolivelli@gmail.com>
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >>> Hi,
> > > > > > > >>> even if we had enough binding +1 on 3.6.0rc2 before
> closing the
> > > > > VOTE
> > > > > > > >>> of 3.6.0 I wanted to finish my tests and I am coming to an
> > > > apparent
> > > > > > > >>> blocker.
> > > > > > > >>>
> > > > > > > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it
> looks
> > > > like
> > > > > > > >>> peers are not able to talk to each other.
> > > > > > > >>> I have a cluster of 3, server1, server2 and server3.
> > > > > > > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of
> errors on
> > > > 3.5
> > > > > > > nodes:
> > > > > > > >>>
> > > > > > > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> > > > > Received
> > > > > > > >>> connection request 127.0.0.1:62591
> > > > > > > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > > > > >>>
> > > > > > > >>>
> > > > > > >
> > > > >
> > > >
> > >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > > > > >>> Got unrecognized protocol version -65535
> > > > > > > >>>
> > > > > > > >>> Once I upgrade all of the peers the system is up and
> running,
> > > > > without
> > > > > > > >>> apparently no data loss.
> > > > > > > >>>
> > > > > > > >>> During the upgrade as soon as I upgrade the first node,
> say,
> > > > > server1,
> > > > > > > >>> server1 is not able to accept connections (error "Close of
> > > > session
> > > > > 0x0
> > > > > > > >>> java.io.IOException: ZooKeeperServer not running")  from
> > > clients,
> > > > > this
> > > > > > > >>> is expected, because as far as it cannot talk with the
> other
> > > > peers
> > > > > it
> > > > > > > >>> is practically partitioned away from the cluster.
> > > > > > > >>>
> > > > > > > >>> My questions are:
> > > > > > > >>> 1) is this expected ? I can't remember protocol changes
> from
> > > 3.5
> > > > to
> > > > > > > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago,
> > > and I
> > > > > was
> > > > > > > >>> not in the community as dev so I cannot tell
> > > > > > > >>> 2) is this a viable option for users ? to have some
> temporary
> > > > > glitch
> > > > > > > >>> during the upgrade and hope that the upgrade completes
> without
> > > > > > > >>> troubles ?
> > > > > > > >>>
> > > > > > > >>> In theory as long as two servers are running the same major
> > > > version
> > > > > > > >>> (3.5 or 3.6) we have a quorum and the system is able to
> make
> > > > > progress
> > > > > > > >>> and to server clients.
> > > > > > > >>> I feel that this is quite dangerous, but I don't have
> enough
> > > > > context
> > > > > > > >>> to understand how this problem is possible and when we
> decided
> > > to
> > > > > > > >>> break compatibility.
> > > > > > > >>>
> > > > > > > >>> The other option is that I am wrong in my test and I am
> messing
> > > > up
> > > > > :-)
> > > > > > > >>>
> > > > > > > >>> The other upgrade path I would like to see working like a
> charm
> > > > is
> > > > > the
> > > > > > > >>> upgrade from 3.4 to 3.6, as I see that as soon as we
> release
> > > 3.6
> > > > we
> > > > > > > >>> should encourage users to move to 3.6 and not to 3.5.
> > > > > > > >>>
> > > > > > > >>> Regards
> > > > > > > >>> Enrico
> > > > > > > >>>
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
On Tue, Feb 11, 2020 at 09:12, Szalay-Bekő Máté
<sz...@gmail.com> wrote:
>
> Hi All,
>
> about the question from Michael:
> > Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > speak old message format when it's talking to old server?
>
> In this particular case, it might be enough. The protocol change happened
> now in the 'initial message' sent by the QuorumCnxManager. Maybe it is not
> a problem if the new servers can not initiate channels to the old servers,
> maybe it is enough if these channel gets initiated by the old servers only.
> I will test it quickly.
>
> Although I have no idea if any other thing changed in the quorum protocol
> between 3.5 and 3.6. In other cases it might not be enough if the new
> servers can understand the old messages, as the old servers can break by
> not understanding the messages from the new servers. Also, in the code
> currently (AFAIK) there is no generic knowledge of protocol versions, the
> servers are not storing that which protocol versions they can/should use to
> communicate to which particular other servers. Maybe we don't even need
> this, but I would feel better if we would have more tests around these
> things.
>
> My suggestion for the long term:
> - let's fix this particular issue now with 3.6.0 quickly (I start doing
> this today)
> - let's do some automation (backed up with jenkins) that will test a whole
> combinations of different ZooKeeper upgrade paths by making rolling
> upgrades during some light traffic. Let's have a bit better definition
> about what we expect (e.g. the quorum is up, but some clients can get
> disconnected? What will happen to the ephemeral nodes? Do we want to
> gracefully close or transfer the user sessions before stopping the old
> server?) and let's see where this broke. Just by checking the code, I don't
> think the quorum will always be up (e.g. between older 3.4 versions and
> 3.5).


I am happy to work on this topic

> - we need to update the Wiki about the working rolling upgrade paths and
> maybe about workarounds if needed
> - we might need to do some fixes (adding backward compatible versions
> and/or specific parameters that enforce old protocol temporary during the
> rolling upgrade that can be changed later to the new protocol by either
> dynamic reconfig or by rolling restart)

It would be much better for the 3.6 code to have some support for
compatibility with 3.5 servers.
We can't require old code to be forward compatible, but we can make new
code compatible, to a certain extent, with old code.
It is better if we can achieve this compatibility goal without a flag:
users won't have to care about this part and can simply trust us.
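
The idea of accepting the old format without a configuration flag can be
sketched as a receiver that dispatches on the version field of the initial
message. This is an illustration in Python, not the actual QuorumCnxManager
code: the value -65535 is taken from the log quoted in this thread, while the
older value -65536, the field layout (version, server id, payload length,
payload) and the '|'-separated address list are assumptions made for the
example.

```python
import struct

# -65535 appears in the log quoted in this thread. The older value and the
# message layout used below are assumptions for illustration only.
PROTOCOL_V1 = -65536
PROTOCOL_V2 = -65535

HEADER = ">qqi"                       # version (long), sid (long), length (int)
HEADER_SIZE = struct.calcsize(HEADER)


def encode_initial_message(version, sid, addresses):
    """Build a hypothetical initial message in the old or the new format."""
    if version == PROTOCOL_V1:
        payload = addresses[0].encode("utf-8")          # old: a single address
    elif version == PROTOCOL_V2:
        payload = "|".join(addresses).encode("utf-8")   # new: an address list
    else:
        raise ValueError(f"cannot encode protocol version {version}")
    return struct.pack(HEADER, version, sid, len(payload)) + payload


def parse_initial_message(data):
    """Parse either format, choosing the payload parser from the version field."""
    version, sid, length = struct.unpack_from(HEADER, data, 0)
    if version not in (PROTOCOL_V1, PROTOCOL_V2):
        raise ValueError(f"Got unrecognized protocol version {version}")
    payload = data[HEADER_SIZE:HEADER_SIZE + length].decode("utf-8")
    addresses = payload.split("|") if version == PROTOCOL_V2 else [payload]
    return version, sid, addresses
```

Because the receiver picks the parser from the version field, a sender using the
old format needs no special configuration. This covers one direction only: an
old server would still reject a new-format message, which is the concern raised
earlier in the thread.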

The rollback story is also important, but maybe we are not ready
for it yet. In case of local changes to the store,
it is better to have a clear design and plan, and to work on it for a new release (3.7?).

Enrico

>
> Depending on your comments, I am happy to create a few Jira tickets around
> these topics.
>
> Kind regards,
> Mate
>
> ps. Enrico, sorry about your RC... I owe you a beer, let me know if you are
> near to Budapest ;)
>
> On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eo...@gmail.com> wrote:
>
> > Good.
> >
> > I will cancel the vote for 3.6.0rc2.
> >
> > I appreciate very much If Mate and his colleagues have time to work on a
> > fix.
> > Otherwise I will have cycles next week
> >
> > I would also like to spend my time in setting up a few minimal integration
> > tests about the upgrade story
> >
> > Enrico
> >
> > On Tue, Feb 11, 2020, 07:30 Michael Han <ha...@apache.org> wrote:
> >
> > > Kudos Enrico, very thorough work as the final gate keeper of the release!
> > >
> > > Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> > >
> > > I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> > > pieces of software that put so much emphasis on compatibility that it just
> > > works on upgrade / downgrade, which is amazing. One guarantee we always
> > > had is that during a rolling upgrade the quorum will always be available,
> > > leading to no service interruption. It would be sad to lose such a capability
> > > given this is still a tractable problem.
> > >
> > > Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > > speak the old message format when it's talking to an old server? Basically, an
> > > ugly if/else check against the protocol version should work, and there is no
> > > need for multiple passes in the rolling upgrade process.
> > >
> > >
> > > On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > I suggest this plan:
> > > > - release 3.6.0 now
> > > > - improve the migration story, the flow outlined by Mate is
> > > > interesting, but it will take time
> > > >
> > > > 3.6.0rc2 got enough binding votes so I am going to finalize the
> > > > release this evening (within 8-10 hours) if no one comes out in the
> > > > VOTE thread with a -1
> > > >
> > > > Enrico
> > > >
> > > > Enrico
> > > >
> > > > On Mon, Feb 10, 2020 at 19:33, Patrick Hunt
> > > > <ph...@apache.org> wrote:
> > > > >
> > > > > On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Answers inline.
> > > > > >
> > > > > >
> > > > > > > In my experience when you are close to a release it is better not to
> > > > > > > make big changes. (I am among the approvers of that patch, so I am
> > > > > > > responsible for this change)
> > > > > >
> > > > > >
> > > > > >
> > > > > > Although this statement is acceptable for me, I don’t feel this patch
> > > > > > should not have been merged into 3.6.0. Submission has been preceded by a
> > > > > > long argument with MAPR folks who originally wanted it to be merged into the
> > > > > > 3.4 branch (considering the pace how ZooKeeper community is moving forward)
> > > > > > and we reached an agreement to release it with 3.6.0.
> > > > > >
> > > > > > To make a long story short, this patch has been outstanding for ages without
> > > > > > much attention from the community and contributors made a lot of effort to
> > > > > > get it done before the release.
> > > > > >
> > > > > >
> > > > > > > I would like to hear from people that have been in the community for
> > > > > > > long time, then I am ready to complete the release process for
> > > > > > > 3.6.0rc2.
> > > > > >
> > > > > >
> > > > > > Me too.
> > > > > >
> > > > > > I tend to accept the way rolling restart works now - as you described
> > > > > > Enrico - and given that situation was pretty much the same between 3.4 and
> > > > > > 3.5, I don’t feel we have to make additional changes.
> > > > > >
> > > > > > On the other hand, the fix that Mate suggested sounds quite cool, I’m also
> > > > > > happy to work on getting it in.
> > > > > >
> > > > > > Fyi, Release Management page says the following:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > > > >
> > > > > > "major.minor release of ZooKeeper must be backwards compatible with the
> > > > > > previous minor release, major.(minor-1)"
> > > > > >
> > > > > >
> > > > > Our users, direct and indirect, value the ability to migrate to newer
> > > > > versions - esp as we drop support for older. Frictions such as this can be
> > > > > a reason to go elsewhere. I'm "pro" b/w compat - esp given our published
> > > > > guidelines.
> > > > >
> > > > > Patrick
> > > > >
> > > > >
> > > > > > Andor
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On 2020. Feb 10., at 11:32, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > > >
> > > > > > > Thank you Mate for checking and explaining this story.
> > > > > > >
> > > > > > > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > > > > > > - it is the last "big patch" committed to 3.6 before starting the
> > > > > > > release process
> > > > > > > - it is the cause of the failure of the first RC
> > > > > > >
> > > > > > > In my experience when you are close to a release it is better not to
> > > > > > > make big changes. (I am among the approvers of that patch, so I am
> > > > > > > responsible for this change)
> > > > > > >
> > > > > > > This is a pointer to the change for those who want to understand
> > > > > > > the context better
> > > > > > >
> > > > > > > https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > > > >
> > > > > > > IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> > > > > > > if this statement holds then I feel we can continue
> > > > > > > with this release.
> > > > > > >
> > > > > > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
> > > > > > > - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> > > > > > > have tools to certify this compatibility (at least not in the short
> > > > > > > term)
> > > > > > >
> > > > > > > I would like to hear from people that have been in the community for
> > > > > > > long time, then I am ready to complete the release process for
> > > > > > > 3.6.0rc2.
> > > > > > >
> > > > > > > I will update the website and the release notes with a specific
> > > > > > > warning about the upgrade, we should also update the Wiki
> > > > > > >
> > > > > > > Enrico
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 10, 2020 at 11:17, Szalay-Bekő Máté
> > > > > > > <sz...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> Hi Enrico!
> > > > > > >>
> > > > > > > >> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
> > > > > > > >> The protocol version was changed last time in ZOOKEEPER-2186, released
> > > > > > > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs. Later I
> > > > > > > >> also changed the protocol version when the format of the initial message
> > > > > > > >> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> > > > > > > >> compatible in this case and this is the 'expected' behavior if you upgrade
> > > > > > > >> e.g. from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.
> > > > > > > >>
> > > > > > > >> We had some discussion in the PR of ZOOKEEPER-3188 back then and got to the
> > > > > > > >> conclusion that it is not that bad, as there will be no data loss as you
> > > > > > > >> wrote. The tricky thing is that during rolling upgrade we should ensure
> > > > > > > >> both backward and forward compatibility to make sure that the old and the
> > > > > > > >> new part of the quorum can still speak to each other. The current solution
> > > > > > > >> (simply failing if the protocol versions mismatch) is simpler and still
> > > > > > > >> working just fine: as the servers are restarted one-by-one, the nodes with
> > > > > > > >> the old protocol version and the nodes with the new protocol version will
> > > > > > > >> form two partitions, but at any given time only one partition will have the
> > > > > > > >> quorum.
> > > > > > > >>
> > > > > > > >> Still, thinking it through, as a side effect in these cases there will be a
> > > > > > > >> short time when none of the partitions will have quorum (when we have N
> > > > > > > >> servers with the old protocol version, N servers with the new protocol
> > > > > > > >> version, and there is one server just being restarted). I am not sure if we
> > > > > > > >> can accept this.
> > > > > > > >>
> > > > > > > >> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
> > > > > > > >> the initial message of the old protocol version with the new code. But I am
> > > > > > > >> not sure if it would be enough (as the old code will not be able to parse
> > > > > > > >> the new initial message).
> > > > > > > >>
> > > > > > > >> One option can be to make a patch also for 3.5 to have a version which
> > > > > > > >> supports both protocol versions (let's say in 3.5.8). Then we can write in
> > > > > > > >> the release note that if you need a rolling upgrade from any version since
> > > > > > > >> 3.4.7, then you have to first upgrade to 3.5.8 before upgrading to 3.6.0.
> > > > > > > >> We can even do the same thing on the 3.4 branch.
> > > > > > > >>
> > > > > > > >> But I am also new to the community... It would be great to hear the opinion
> > > > > > > >> of more experienced people.
> > > > > > > >> Whatever the decision will be, I am happy to make the changes.
> > > > > > > >>
> > > > > > > >> And sorry for breaking the RC (if we decide that this needs to be
> > > > > > > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > > > >>
> > > > > > >> Kind regards,
> > > > > > >> Mate
> > > > > > >>
> > > > > > > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <eolivelli@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >>> Hi,
> > > > > > > >>> even if we had enough binding +1 on 3.6.0rc2 before closing the VOTE
> > > > > > > >>> of 3.6.0 I wanted to finish my tests and I am coming to an apparent
> > > > > > > >>> blocker.
> > > > > > > >>>
> > > > > > > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
> > > > > > > >>> peers are not able to talk to each other.
> > > > > > > >>> I have a cluster of 3, server1, server2 and server3.
> > > > > > > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on 3.5 nodes:
> > > > > > > >>>
> > > > > > > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
> > > > > > > >>> connection request 127.0.0.1:62591
> > > > > > > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > > > > >>> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > > > > >>> Got unrecognized protocol version -65535
> > > > > > > >>>
> > > > > > > >>> Once I upgrade all of the peers the system is up and running, without
> > > > > > > >>> apparently no data loss.
> > > > > > > >>>
> > > > > > > >>> During the upgrade as soon as I upgrade the first node, say, server1,
> > > > > > > >>> server1 is not able to accept connections (error "Close of session 0x0
> > > > > > > >>> java.io.IOException: ZooKeeperServer not running")  from clients, this
> > > > > > > >>> is expected, because as far as it cannot talk with the other peers it
> > > > > > > >>> is practically partitioned away from the cluster.
> > > > > > > >>>
> > > > > > > >>> My questions are:
> > > > > > > >>> 1) is this expected ? I can't remember protocol changes from 3.5 to
> > > > > > > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago, and I was
> > > > > > > >>> not in the community as dev so I cannot tell
> > > > > > > >>> 2) is this a viable option for users ? to have some temporary glitch
> > > > > > > >>> during the upgrade and hope that the upgrade completes without
> > > > > > > >>> troubles ?
> > > > > > > >>>
> > > > > > > >>> In theory as long as two servers are running the same major version
> > > > > > > >>> (3.5 or 3.6) we have a quorum and the system is able to make progress
> > > > > > > >>> and to server clients.
> > > > > > > >>> I feel that this is quite dangerous, but I don't have enough context
> > > > > > > >>> to understand how this problem is possible and when we decided to
> > > > > > > >>> break compatibility.
> > > > > > > >>>
> > > > > > > >>> The other option is that I am wrong in my test and I am messing up :-)
> > > > > > > >>>
> > > > > > > >>> The other upgrade path I would like to see working like a charm is the
> > > > > > > >>> upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
> > > > > > > >>> should encourage users to move to 3.6 and not to 3.5.
> > > > > > >>>
> > > > > > >>> Regards
> > > > > > >>> Enrico
> > > > > > >>>
> > > > > >
> > > > > >
> > > >
> > >
> >

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
Hi All,

about the question from Michael:
> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> speak old message format when it's talking to old server?

In this particular case, it might be enough. The protocol change happened
in the 'initial message' sent by the QuorumCnxManager. Maybe it is not
a problem if the new servers cannot initiate channels to the old servers;
maybe it is enough if these channels get initiated by the old servers only.
I will test it quickly.
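
A minimal sketch of the receiving side of that idea, not the actual
QuorumCnxManager code: the new server reads the leading version field of the
initial message and accepts either value instead of failing. The value -65535
is the one from Enrico's error log; -65536 for the older protocol, the class
name and the method names are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;

class InitialMessageVersionSketch {
    // Value reported in the "unrecognized protocol version" error in this thread.
    static final long NEW_PROTOCOL_VERSION = -65535L;
    // Assumed value of the pre-3.6 protocol version; verify against QuorumCnxManager.
    static final long OLD_PROTOCOL_VERSION = -65536L;

    /** Builds only the leading 8-byte version field of a handshake (illustrative). */
    static byte[] header(long version) {
        return ByteBuffer.allocate(Long.BYTES).putLong(version).array();
    }

    /** Accepts both versions and reports which message format would follow. */
    static String detectFormat(byte[] message) {
        long version = ByteBuffer.wrap(message).getLong();
        if (version == NEW_PROTOCOL_VERSION) {
            return "new";
        }
        if (version == OLD_PROTOCOL_VERSION) {
            return "old";
        }
        throw new IllegalArgumentException("Got unrecognized protocol version " + version);
    }

    public static void main(String[] args) {
        System.out.println(detectFormat(header(OLD_PROTOCOL_VERSION)));
        System.out.println(detectFormat(header(NEW_PROTOCOL_VERSION)));
    }
}
```

This only covers connections opened by the old servers; it says nothing about
what an old server does when a new server connects to it.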

However, I have no idea whether anything else changed in the quorum protocol
between 3.5 and 3.6. In other cases it might not be enough that the new
servers can understand the old messages, as the old servers can break by
not understanding the messages from the new servers. Also, the code
currently (AFAIK) has no generic knowledge of protocol versions: the
servers do not store which protocol versions they can/should use to
communicate with which particular other servers. Maybe we don't even need
this, but I would feel better if we had more tests around these
things.

My suggestion for the long term:
- let's fix this particular issue now in 3.6.0 quickly (I will start on
this today)
- let's do some automation (backed by Jenkins) that will test a whole
range of different ZooKeeper upgrade paths by performing rolling
upgrades under some light traffic. Let's have a somewhat better definition
of what we expect (e.g. the quorum is up, but can some clients get
disconnected? What will happen to the ephemeral nodes? Do we want to
gracefully close or transfer the user sessions before stopping the old
server?) and let's see where this breaks. Just by checking the code, I don't
think the quorum will always be up (e.g. between older 3.4 versions and
3.5).
- we need to update the Wiki about the working rolling upgrade paths and
maybe about workarounds if needed
- we might need to do some fixes (adding backward compatible versions
and/or specific parameters that enforce the old protocol temporarily during
the rolling upgrade and that can be switched later to the new protocol by
either dynamic reconfig or a rolling restart)
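
To make the "will the quorum always be up?" question concrete, here is a
small counting sketch. It is a simplified model, not a test against real
servers: it assumes servers are restarted strictly one at a time, that the
restarting server is down during its step, and that old and new servers
either can all talk to each other or cannot talk at all. The class and method
names are hypothetical.

```java
class RollingUpgradeQuorumSketch {
    /**
     * Returns the first step (number of already upgraded servers) at which no
     * group of mutually compatible live servers forms a majority, or -1 if a
     * quorum exists at every step.
     */
    static int firstStepWithoutQuorum(int ensembleSize, boolean protocolsCompatible) {
        int majority = ensembleSize / 2 + 1;
        for (int upgraded = 0; upgraded < ensembleSize; upgraded++) {
            // One server is down for its restart; the rest are split by version.
            int newLive = upgraded;
            int oldLive = ensembleSize - upgraded - 1;
            int largestGroup = protocolsCompatible
                    ? oldLive + newLive
                    : Math.max(oldLive, newLive);
            if (largestGroup < majority) {
                return upgraded;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 5, 7}) {
            System.out.println("ensemble=" + size
                    + " incompatible=" + firstStepWithoutQuorum(size, false)
                    + " compatible=" + firstStepWithoutQuorum(size, true));
        }
    }
}
```

Under these assumptions, an ensemble of 2N+1 servers with incompatible
protocols loses quorum at the step where N servers are old, N are new and one
is restarting, while a compatible protocol keeps 2N servers available at
every step.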

Depending on your comments, I am happy to create a few Jira tickets around
these topics.

Kind regards,
Mate

ps. Enrico, sorry about your RC... I owe you a beer, let me know if you are
near Budapest ;)

On Tue, Feb 11, 2020 at 8:43 AM Enrico Olivelli <eo...@gmail.com> wrote:

> Good.
>
> I will cancel the vote for 3.6.0rc2.
>
> I appreciate very much If Mate and his colleagues have time to work on a
> fix.
> Otherwise I will have cycles next week
>
> I would also like to spend my time in setting up a few minimal integration
> tests about the upgrade story
>
> Enrico
>
> Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha scritto:
>
> > Kudos Enrico, very thorough work as the final gate keeper of the release!
> >
> > Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
> >
> > I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> > piece of software that put so much emphasis on compatibilities thus it
> just
> > works when upgrade / downgrade, which is amazing. One guarantee we always
> > had is during rolling upgrade, the quorum will always be available,
> leading
> > to no service interruption. It would be sad we lose such capability given
> > this is still a tractable problem.
> >
> > Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> > speak old message format when it's talking to old server? Basically, an
> > ugly if else check against the protocol version should work and there is
> no
> > need to have multiple pass on rolling upgrade process.
> >
> >
> > On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> >
> > > I suggest this plan:
> > > - release 3.6.0 now
> > > - improve the migration story, the flow outlined by Mate is
> > > interesting, but it will take time
> > >
> > > 3.6.0rc2 got enough binding votes so I am going to finalize the
> > > release this evening (within 8-10 hours) if no one comes out in the
> > > VOTE thread with a -1
> > >
> > > Enrico
> > >
> > > Enrico
> > >
> > > Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> > > <ph...@apache.org> ha scritto:
> > > >
> > > > On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Answers inline.
> > > > >
> > > > >
> > > > > > In my experience when you are close to a release it is better to
> to
> > > > > > make big changes. (I am among the approvers of that patch, so I
> am
> > > > > > responsible for this change)
> > > > >
> > > > >
> > > > >
> > > > > Although this statement is acceptable for me, I don’t feel this
> patch
> > > > > should not have been merged into 3.6.0. Submission has been
> preceded
> > > by a
> > > > > long argument with MAPR folks who originally wanted to be merged
> into
> > > 3.4
> > > > > branch (considering the pace how ZooKeeper community is moving
> > > forward) and
> > > > > we reached an agreement that release it with 3.6.0.
> > > > >
> > > > > Make a long story short, this patch has been outstanding for ages
> > > without
> > > > > much attention from the community and contributors made a lot of
> > > effort to
> > > > > get it done before the release.
> > > > >
> > > > >
> > > > > > I would like to ear from people that have been in the community
> for
> > > > > > long time, then I am ready to complete the release process for
> > > > > > 3.6.0rc2.
> > > > >
> > > > >
> > > > > Me too.
> > > > >
> > > > > I tend to accept the way rolling restart works now - as you
> described
> > > > > Enrico - and given that situation was pretty much the same between
> > 3.4
> > > and
> > > > > 3.5, I don’t feel we have to make additional changes.
> > > > >
> > > > > On the other hand, the fix that Mate suggested sounds quite cool,
> I’m
> > > also
> > > > > happy to work on getting it in.
> > > > >
> > > > > Fyi, Release Management page says the following:
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > > >
> > > > > "major.minor release of ZooKeeper must be backwards compatible with
> > the
> > > > > previous minor release, major.(minor-1)"
> > > > >
> > > > >
> > > > Our users, direct and indirect, value the ability to migrate to newer
> > > > versions - esp as we drop support for older. Frictions such as this
> can
> > > be
> > > > a reason to go elsewhere. I'm "pro" b/w compact - esp given our
> > published
> > > > guidelines.
> > > >
> > > > Patrick
> > > >
> > > >
> > > > > Andor
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > On 2020. Feb 10., at 11:32, Enrico Olivelli <eolivelli@gmail.com
> >
> > > wrote:
> > > > > >
> > > > > > Thank you Mate for checking and explaining this story.
> > > > > >
> > > > > > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > > > > > - it is the last "big patch" committed to 3.6 before starting the
> > > > > > release process
> > > > > > - it is the cause of the failure of the first RC
> > > > > >
> > > > > > In my experience when you are close to a release it is better to
> to
> > > > > > make big changes. (I am among the approvers of that patch, so I
> am
> > > > > > responsible for this change)
> > > > > >
> > > > > > This is a pointer to the change to whom who wants to understand
> > > better
> > > > > > the context
> > > > > >
> > > > >
> > >
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > > >
> > > > > > IIUC even for the upgrade from 3.4 to 3.5 the story was the same
> > and
> > > > > > if this statement holds then I feel we can continue
> > > > > > with this release.
> > > > > >
> > > > > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> > > complex.
> > > > > > - Making 3.5 and 3.6 "compatible" can be very tricky and we do
> not
> > > > > > have tools to certify this compatibility (at least not in the
> short
> > > > > > term)
> > > > > >
> > > > > > I would like to ear from people that have been in the community
> for
> > > > > > long time, then I am ready to complete the release process for
> > > > > > 3.6.0rc2.
> > > > > >
> > > > > > I will update the website and the release notes with a specific
> > > > > > warning about the upgrade, we should also update the Wiki
> > > > > >
> > > > > > Enrico
> > > > > >
> > > > > >
> > > > > > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > > > > <sz...@gmail.com> ha scritto:
> > > > > >>
> > > > > >> Hi Enrico!
> > > > > >>
> > > > > >> This is caused by the different PROTOCOL_VERSION in the
> > > > > QuorumCnxManager.
> > > > > >> The Protocol version  was changed last time in ZOOKEEPER-2186
> > > released
> > > > > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs.
> > > Later I
> > > > > >> also changed the protocol version when the format of the initial
> > > message
> > > > > >> changed in ZOOKEEPER-3188. So actually the quorum protocol is
> not
> > > > > >> compatible in this case and is the 'expected' behavior if you
> > > upgrade
> > > > > e.g
> > > > > >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to
> 3.6.0.
> > > > > >>
> > > > > >> We had some discussion in the PR of ZOOKEEPER-3188 back then and
> > > got to
> > > > > the
> > > > > >> conclusion that it is not that bad, as there will be no data
> loss
> > > as you
> > > > > >> wrote. The tricky thing is that during rolling upgrade we should
> > > ensure
> > > > > >> both backward and forward compatibility to make sure that the
> old
> > > and
> > > > > the
> > > > > >> new part of the quorum can still speak to each other. The
> current
> > > > > solution
> > > > > >> (simply failing if the protocol versions mismatch) is more
> simple
> > > and
> > > > > still
> > > > > >> working just fine: as the servers are restarted one-by-one, the
> > > nodes
> > > > > with
> > > > > >> the old protocol version and the nodes with the new protocol
> > version
> > > > > will
> > > > > >> form two partitions, but any given time only one partition will
> > > have the
> > > > > >> quorum.
> > > > > >>
> > > > > >> Still, thinking it trough, as a side effect in these cases there
> > > will
> > > > > be a
> > > > > >> short time when none of the partitions will have quorums (when
> we
> > > have N
> > > > > >> servers with the old protocol version, N servers with the new
> > > protocol
> > > > > >> version, and there is one server just being restarted). I am not
> > > sure
> > > > > if we
> > > > > >> can accept this.
> > > > > >>
> > > > > >> For ZOOKEEPER-3188 we can add a small patch to make it possible
> to
> > > parse
> > > > > >> the initial message of the old protocol version with the new
> code.
> > > But
> > > > > I am
> > > > > >> not sure if it would be enough (as the old code will not be able
> > to
> > > > > parse
> > > > > >> the new initial message).
> > > > > >>
> > > > > >> One option can be to make a patch also for 3.5 to have a version
> > > which
> > > > > >> supports both protocol versions. (let's say in 3.5.8) Then we
> can
> > > write
> > > > > to
> > > > > >> the release note, that if you need rolling upgrade from any
> > versions
> > > > > since
> > > > > >> 3.4.7, then you have to first upgrade from 3.5.8 before
> upgrading
> > to
> > > > > 3.6.0.
> > > > > >> We can even make the same thing on the 3.4 branch.
> > > > > >>
> > > > > >> But I am also new to the community... It would be great to hear
> > the
> > > > > opinion
> > > > > >> of more experienced people.
> > > > > >> Whatever the decision will be, I am happy to make the changes.
> > > > > >>
> > > > > >> And sorry for breaking the RC (if we decide that this needs to
> be
> > > > > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > > >>
> > > > > >> Kind regards,
> > > > > >> Mate
> > > > > >>
> > > > > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > > eolivelli@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>> even if we had enough binding +1 on 3.6.0rc2 before closing the
> > > VOTE
> > > > > >>> of 3.6.0 I wanted to finish my tests and I am coming to an
> > apparent
> > > > > >>> blocker.
> > > > > >>>
> > > > > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks
> > like
> > > > > >>> peers are not able to talk to each other.
> > > > > >>> I have a cluster of 3, server1, server2 and server3.
> > > > > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on
> > 3.5
> > > > > nodes:
> > > > > >>>
> > > > > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> > > Received
> > > > > >>> connection request 127.0.0.1:62591
> > > > > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > > >>>
> > > > > >>>
> > > > >
> > >
> >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > > >>> Got unrecognized protocol version -65535
> > > > > >>>
> > > > > >>> Once I upgrade all of the peers the system is up and running,
> > > without
> > > > > >>> apparently no data loss.
> > > > > >>>
> > > > > >>> During the upgrade as soon as I upgrade the first node, say,
> > > server1,
> > > > > >>> server1 is not able to accept connections (error "Close of
> > session
> > > 0x0
> > > > > >>> java.io.IOException: ZooKeeperServer not running")  from
> clients,
> > > this
> > > > > >>> is expected, because as far as it cannot talk with the other
> > peers
> > > it
> > > > > >>> is practically partitioned away from the cluster.
> > > > > >>>
> > > > > >>> My questions are:
> > > > > >>> 1) is this expected ? I can't remember protocol changes from
> 3.5
> > to
> > > > > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago,
> and I
> > > was
> > > > > >>> not in the community as dev so I cannot tell
> > > > > >>> 2) is this a viable option for users ? to have some temporary
> > > glitch
> > > > > >>> during the upgrade and hope that the upgrade completes without
> > > > > >>> troubles ?
> > > > > >>>
> > > > > >>> In theory as long as two servers are running the same major
> > version
> > > > > >>> (3.5 or 3.6) we have a quorum and the system is able to make
> > > progress
> > > > > >>> and to server clients.
> > > > > >>> I feel that this is quite dangerous, but I don't have enough
> > > context
> > > > > >>> to understand how this problem is possible and when we decided
> to
> > > > > >>> break compatibility.
> > > > > >>>
> > > > > >>> The other option is that I am wrong in my test and I am messing
> > up
> > > :-)
> > > > > >>>
> > > > > >>> The other upgrade path I would like to see working like a charm
> > is
> > > the
> > > > > >>> upgrade from 3.4 to 3.6, as I see that as soon as we release
> 3.6
> > we
> > > > > >>> should encourage users to move to 3.6 and not to 3.5.
> > > > > >>>
> > > > > >>> Regards
> > > > > >>> Enrico
> > > > > >>>
> > > > >
> > > > >
> > >
> >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
Good.

I will cancel the vote for 3.6.0rc2.

I would very much appreciate it if Mate and his colleagues have time to work
on a fix.
Otherwise I will have cycles next week.

I would also like to spend some time setting up a few minimal integration
tests for the upgrade story.

Enrico

Il Mar 11 Feb 2020, 07:30 Michael Han <ha...@apache.org> ha scritto:

> Kudos Enrico, very thorough work as the final gate keeper of the release!
>
> Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.
>
> I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
> piece of software that put so much emphasis on compatibilities thus it just
> works when upgrade / downgrade, which is amazing. One guarantee we always
> had is during rolling upgrade, the quorum will always be available, leading
> to no service interruption. It would be sad we lose such capability given
> this is still a tractable problem.
>
> Regarding the fix, can we just make 3.6.0 aware of the old protocol and
> speak old message format when it's talking to old server? Basically, an
> ugly if else check against the protocol version should work and there is no
> need to have multiple pass on rolling upgrade process.
>
>
> On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eo...@gmail.com>
> wrote:
>
> > I suggest this plan:
> > - release 3.6.0 now
> > - improve the migration story, the flow outlined by Mate is
> > interesting, but it will take time
> >
> > 3.6.0rc2 got enough binding votes so I am going to finalize the
> > release this evening (within 8-10 hours) if no one comes out in the
> > VOTE thread with a -1
> >
> > Enrico
> >
> > Enrico
> >
> > Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> > <ph...@apache.org> ha scritto:
> > >
> > > On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:
> > >
> > > > Hi,
> > > >
> > > > Answers inline.
> > > >
> > > >
> > > > > In my experience when you are close to a release it is better to to
> > > > > make big changes. (I am among the approvers of that patch, so I am
> > > > > responsible for this change)
> > > >
> > > >
> > > >
> > > > Although this statement is acceptable for me, I don’t feel this patch
> > > > should not have been merged into 3.6.0. Submission has been preceded
> > by a
> > > > long argument with MAPR folks who originally wanted to be merged into
> > 3.4
> > > > branch (considering the pace how ZooKeeper community is moving
> > forward) and
> > > > we reached an agreement that release it with 3.6.0.
> > > >
> > > > Make a long story short, this patch has been outstanding for ages
> > without
> > > > much attention from the community and contributors made a lot of
> > effort to
> > > > get it done before the release.
> > > >
> > > >
> > > > > I would like to ear from people that have been in the community for
> > > > > long time, then I am ready to complete the release process for
> > > > > 3.6.0rc2.
> > > >
> > > >
> > > > Me too.
> > > >
> > > > I tend to accept the way rolling restart works now - as you described
> > > > Enrico - and given that situation was pretty much the same between
> 3.4
> > and
> > > > 3.5, I don’t feel we have to make additional changes.
> > > >
> > > > On the other hand, the fix that Mate suggested sounds quite cool, I’m
> > also
> > > > happy to work on getting it in.
> > > >
> > > > Fyi, Release Management page says the following:
> > > >
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > > >
> > > > "major.minor release of ZooKeeper must be backwards compatible with
> the
> > > > previous minor release, major.(minor-1)"
> > > >
> > > >
> > > Our users, direct and indirect, value the ability to migrate to newer
> > > versions - esp as we drop support for older. Frictions such as this can
> > be
> > > a reason to go elsewhere. I'm "pro" b/w compact - esp given our
> published
> > > guidelines.
> > >
> > > Patrick
> > >
> > >
> > > > Andor
> > > >
> > > >
> > > >
> > > >
> > > > > On 2020. Feb 10., at 11:32, Enrico Olivelli <eo...@gmail.com>
> > wrote:
> > > > >
> > > > > Thank you Mate for checking and explaining this story.
> > > > >
> > > > > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > > > > - it is the last "big patch" committed to 3.6 before starting the
> > > > > release process
> > > > > - it is the cause of the failure of the first RC
> > > > >
> > > > > In my experience when you are close to a release it is better to to
> > > > > make big changes. (I am among the approvers of that patch, so I am
> > > > > responsible for this change)
> > > > >
> > > > > This is a pointer to the change to whom who wants to understand
> > better
> > > > > the context
> > > > >
> > > >
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > > >
> > > > > IIUC even for the upgrade from 3.4 to 3.5 the story was the same
> and
> > > > > if this statement holds then I feel we can continue
> > > > > with this release.
> > > > >
> > > > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> > complex.
> > > > > - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> > > > > have tools to certify this compatibility (at least not in the short
> > > > > term)
> > > > >
> > > > > I would like to ear from people that have been in the community for
> > > > > long time, then I am ready to complete the release process for
> > > > > 3.6.0rc2.
> > > > >
> > > > > I will update the website and the release notes with a specific
> > > > > warning about the upgrade, we should also update the Wiki
> > > > >
> > > > > Enrico
> > > > >
> > > > >
> > > > > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > > > <sz...@gmail.com> ha scritto:
> > > > >>
> > > > >> Hi Enrico!
> > > > >>
> > > > >> This is caused by the different PROTOCOL_VERSION in the
> > > > QuorumCnxManager.
> > > > >> The Protocol version  was changed last time in ZOOKEEPER-2186
> > released
> > > > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs.
> > Later I
> > > > >> also changed the protocol version when the format of the initial
> > message
> > > > >> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> > > > >> compatible in this case and is the 'expected' behavior if you
> > upgrade
> > > > e.g
> > > > >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to 3.6.0.
> > > > >>
> > > > >> We had some discussion in the PR of ZOOKEEPER-3188 back then and
> > got to
> > > > the
> > > > >> conclusion that it is not that bad, as there will be no data loss
> > as you
> > > > >> wrote. The tricky thing is that during rolling upgrade we should
> > ensure
> > > > >> both backward and forward compatibility to make sure that the old
> > and
> > > > the
> > > > >> new part of the quorum can still speak to each other. The current
> > > > solution
> > > > >> (simply failing if the protocol versions mismatch) is more simple
> > and
> > > > still
> > > > >> working just fine: as the servers are restarted one-by-one, the
> > nodes
> > > > with
> > > > >> the old protocol version and the nodes with the new protocol
> version
> > > > will
> > > > >> form two partitions, but any given time only one partition will
> > have the
> > > > >> quorum.
> > > > >>
> > > > >> Still, thinking it trough, as a side effect in these cases there
> > will
> > > > be a
> > > > >> short time when none of the partitions will have quorums (when we
> > have N
> > > > >> servers with the old protocol version, N servers with the new
> > protocol
> > > > >> version, and there is one server just being restarted). I am not
> > sure
> > > > if we
> > > > >> can accept this.
> > > > >>
> > > > >> For ZOOKEEPER-3188 we can add a small patch to make it possible to
> > parse
> > > > >> the initial message of the old protocol version with the new code.
> > But
> > > > I am
> > > > >> not sure if it would be enough (as the old code will not be able
> to
> > > > parse
> > > > >> the new initial message).
> > > > >>
> > > > >> One option can be to make a patch also for 3.5 to have a version
> > which
> > > > >> supports both protocol versions. (let's say in 3.5.8) Then we can
> > write
> > > > to
> > > > >> the release note, that if you need rolling upgrade from any
> versions
> > > > since
> > > > >> 3.4.7, then you have to first upgrade from 3.5.8 before upgrading
> to
> > > > 3.6.0.
> > > > >> We can even make the same thing on the 3.4 branch.
> > > > >>
> > > > >> But I am also new to the community... It would be great to hear
> the
> > > > opinion
> > > > >> of more experienced people.
> > > > >> Whatever the decision will be, I am happy to make the changes.
> > > > >>
> > > > >> And sorry for breaking the RC (if we decide that this needs to be
> > > > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > > >>
> > > > >> Kind regards,
> > > > >> Mate
> > > > >>
> > > > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> > eolivelli@gmail.com>
> > > > wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>> even if we had enough binding +1 on 3.6.0rc2 before closing the
> > VOTE
> > > > >>> of 3.6.0 I wanted to finish my tests and I am coming to an
> apparent
> > > > >>> blocker.
> > > > >>>
> > > > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks
> like
> > > > >>> peers are not able to talk to each other.
> > > > >>> I have a cluster of 3, server1, server2 and server3.
> > > > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on
> 3.5
> > > > nodes:
> > > > >>>
> > > > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> > Received
> > > > >>> connection request 127.0.0.1:62591
> > > > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > > >>>
> > > > >>>
> > > >
> >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > > >>> Got unrecognized protocol version -65535
> > > > >>>
> > > > >>> Once I upgrade all of the peers the system is up and running,
> > without
> > > > >>> apparently no data loss.
> > > > >>>
> > > > >>> During the upgrade as soon as I upgrade the first node, say,
> > server1,
> > > > >>> server1 is not able to accept connections (error "Close of
> session
> > 0x0
> > > > >>> java.io.IOException: ZooKeeperServer not running")  from clients,
> > this
> > > > >>> is expected, because as far as it cannot talk with the other
> peers
> > it
> > > > >>> is practically partitioned away from the cluster.
> > > > >>>
> > > > >>> My questions are:
> > > > >>> 1) is this expected ? I can't remember protocol changes from 3.5
> to
> > > > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago, and I
> > was
> > > > >>> not in the community as dev so I cannot tell
> > > > >>> 2) is this a viable option for users ? to have some temporary
> > glitch
> > > > >>> during the upgrade and hope that the upgrade completes without
> > > > >>> troubles ?
> > > > >>>
> > > > >>> In theory as long as two servers are running the same major
> version
> > > > >>> (3.5 or 3.6) we have a quorum and the system is able to make
> > progress
> > > > >>> and to server clients.
> > > > >>> I feel that this is quite dangerous, but I don't have enough
> > context
> > > > >>> to understand how this problem is possible and when we decided to
> > > > >>> break compatibility.
> > > > >>>
> > > > >>> The other option is that I am wrong in my test and I am messing
> up
> > :-)
> > > > >>>
> > > > >>> The other upgrade path I would like to see working like a charm
> is
> > the
> > > > >>> upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6
> we
> > > > >>> should encourage users to move to 3.6 and not to 3.5.
> > > > >>>
> > > > >>> Regards
> > > > >>> Enrico
> > > > >>>
> > > >
> > > >
> >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Michael Han <ha...@apache.org>.
Kudos Enrico, very thorough work as the final gatekeeper of the release!

Now with this, I'd like to *vote a -1* on the 3.6.0 RC2.

I'd recommend we fix this issue for 3.6.0. ZooKeeper is one of the rare
pieces of software that puts so much emphasis on compatibility that it just
works when you upgrade / downgrade, which is amazing. One guarantee we have
always had is that during a rolling upgrade the quorum will always be
available, leading to no service interruption. It would be sad to lose such a
capability given this is still a tractable problem.

Regarding the fix, can we just make 3.6.0 aware of the old protocol and
speak the old message format when it's talking to an old server? Basically, an
ugly if/else check against the protocol version should work, and there is no
need for multiple passes in the rolling upgrade process.
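
A rough sketch of what such an if/else could look like on the sending side.
This is not ZooKeeper code: the field layout (version, server id, payload
length, payload), the class name, and the `peerSpeaksOld` flag are all
illustrative assumptions, and how a server would actually learn which format
its peer understands is left open. -65535 is the version from the error log;
-65536 for the old protocol is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class InitialMessageWriterSketch {
    static final long OLD_PROTOCOL_VERSION = -65536L; // assumed pre-3.6 value
    static final long NEW_PROTOCOL_VERSION = -65535L; // value seen in the error log

    /**
     * Builds a handshake in the format the peer is believed to understand.
     * The payload strings stand in for whatever each format really carries.
     */
    static byte[] build(boolean peerSpeaksOld, long sid, String oldPayload, String newPayload) {
        long version;
        String payload;
        if (peerSpeaksOld) {
            version = OLD_PROTOCOL_VERSION;
            payload = oldPayload;
        } else {
            version = NEW_PROTOCOL_VERSION;
            payload = newPayload;
        }
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + Long.BYTES + Integer.BYTES + body.length)
                .putLong(version)
                .putLong(sid)
                .putInt(body.length)
                .put(body)
                .array();
    }

    public static void main(String[] args) {
        byte[] toOldPeer = build(true, 1L, "old-format-payload", "new-format-payload");
        byte[] toNewPeer = build(false, 1L, "old-format-payload", "new-format-payload");
        System.out.println(ByteBuffer.wrap(toOldPeer).getLong());
        System.out.println(ByteBuffer.wrap(toNewPeer).getLong());
    }
}
```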


On Mon, Feb 10, 2020 at 10:23 PM Enrico Olivelli <eo...@gmail.com>
wrote:

> I suggest this plan:
> - release 3.6.0 now
> - improve the migration story, the flow outlined by Mate is
> interesting, but it will take time
>
> 3.6.0rc2 got enough binding votes so I am going to finalize the
> release this evening (within 8-10 hours) if no one comes out in the
> VOTE thread with a -1
>
> Enrico
>
> Enrico
>
> Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
> <ph...@apache.org> ha scritto:
> >
> > On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Answers inline.
> > >
> > >
> > > > In my experience when you are close to a release it is better to to
> > > > make big changes. (I am among the approvers of that patch, so I am
> > > > responsible for this change)
> > >
> > >
> > >
> > > Although this statement is acceptable for me, I don’t feel this patch
> > > should not have been merged into 3.6.0. Submission has been preceded
> by a
> > > long argument with MAPR folks who originally wanted to be merged into
> 3.4
> > > branch (considering the pace how ZooKeeper community is moving
> forward) and
> > > we reached an agreement that release it with 3.6.0.
> > >
> > > Make a long story short, this patch has been outstanding for ages
> without
> > > much attention from the community and contributors made a lot of
> effort to
> > > get it done before the release.
> > >
> > >
> > > > I would like to ear from people that have been in the community for
> > > > long time, then I am ready to complete the release process for
> > > > 3.6.0rc2.
> > >
> > >
> > > Me too.
> > >
> > > I tend to accept the way rolling restart works now - as you described
> > > Enrico - and given that situation was pretty much the same between 3.4
> and
> > > 3.5, I don’t feel we have to make additional changes.
> > >
> > > On the other hand, the fix that Mate suggested sounds quite cool, I’m
> also
> > > happy to work on getting it in.
> > >
> > > Fyi, Release Management page says the following:
> > >
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> > >
> > > "major.minor release of ZooKeeper must be backwards compatible with the
> > > previous minor release, major.(minor-1)"
> > >
> > >
> > Our users, direct and indirect, value the ability to migrate to newer
> > versions - esp as we drop support for older. Frictions such as this can
> be
> > a reason to go elsewhere. I'm "pro" b/w compact - esp given our published
> > guidelines.
> >
> > Patrick
> >
> >
> > > Andor
> > >
> > >
> > >
> > >
> > > > On 2020. Feb 10., at 11:32, Enrico Olivelli <eo...@gmail.com>
> wrote:
> > > >
> > > > Thank you Mate for checking and explaining this story.
> > > >
> > > > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > > > - it is the last "big patch" committed to 3.6 before starting the
> > > > release process
> > > > - it is the cause of the failure of the first RC
> > > >
> > > > In my experience when you are close to a release it is better to to
> > > > make big changes. (I am among the approvers of that patch, so I am
> > > > responsible for this change)
> > > >
> > > > This is a pointer to the change to whom who wants to understand
> better
> > > > the context
> > > >
> > >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > > >
> > > > IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> > > > if this statement holds then I feel we can continue
> > > > with this release.
> > > >
> > > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too
> complex.
> > > > - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> > > > have tools to certify this compatibility (at least not in the short
> > > > term)
> > > >
> > > > I would like to ear from people that have been in the community for
> > > > long time, then I am ready to complete the release process for
> > > > 3.6.0rc2.
> > > >
> > > > I will update the website and the release notes with a specific
> > > > warning about the upgrade, we should also update the Wiki
> > > >
> > > > Enrico
> > > >
> > > >
> > > > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > > <sz...@gmail.com> ha scritto:
> > > >>
> > > >> Hi Enrico!
> > > >>
> > > >> This is caused by the different PROTOCOL_VERSION in the
> > > QuorumCnxManager.
> > > >> The Protocol version  was changed last time in ZOOKEEPER-2186
> released
> > > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs.
> Later I
> > > >> also changed the protocol version when the format of the initial
> message
> > > >> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> > > >> compatible in this case and is the 'expected' behavior if you
> upgrade
> > > e.g
> > > >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to 3.6.0.
> > > >>
> > > >> We had some discussion in the PR of ZOOKEEPER-3188 back then and
> got to
> > > the
> > > >> conclusion that it is not that bad, as there will be no data loss
> as you
> > > >> wrote. The tricky thing is that during rolling upgrade we should
> ensure
> > > >> both backward and forward compatibility to make sure that the old
> and
> > > the
> > > >> new part of the quorum can still speak to each other. The current
> > > solution
> > > >> (simply failing if the protocol versions mismatch) is more simple
> and
> > > still
> > > >> working just fine: as the servers are restarted one-by-one, the
> nodes
> > > with
> > > >> the old protocol version and the nodes with the new protocol version
> > > will
> > > >> form two partitions, but any given time only one partition will
> have the
> > > >> quorum.
> > > >>
> > > >> Still, thinking it trough, as a side effect in these cases there
> will
> > > be a
> > > >> short time when none of the partitions will have quorums (when we
> have N
> > > >> servers with the old protocol version, N servers with the new
> protocol
> > > >> version, and there is one server just being restarted). I am not
> sure
> > > if we
> > > >> can accept this.
> > > >>
> > > >> For ZOOKEEPER-3188 we can add a small patch to make it possible to
> parse
> > > >> the initial message of the old protocol version with the new code.
> But
> > > I am
> > > >> not sure if it would be enough (as the old code will not be able to
> > > parse
> > > >> the new initial message).
> > > >>
> > > >> One option can be to make a patch also for 3.5 to have a version
> which
> > > >> supports both protocol versions. (let's say in 3.5.8) Then we can
> write
> > > to
> > > >> the release note, that if you need rolling upgrade from any versions
> > > since
> > > >> 3.4.7, then you have to first upgrade from 3.5.8 before upgrading to
> > > 3.6.0.
> > > >> We can even make the same thing on the 3.4 branch.
> > > >>
> > > >> But I am also new to the community... It would be great to hear the
> > > opinion
> > > >> of more experienced people.
> > > >> Whatever the decision will be, I am happy to make the changes.
> > > >>
> > > >> And sorry for breaking the RC (if we decide that this needs to be
> > > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > > >>
> > > >> Kind regards,
> > > >> Mate
> > > >>
> > > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <
> eolivelli@gmail.com>
> > > wrote:
> > > >>
> > > >>> Hi,
> > > >>> even if we had enough binding +1 on 3.6.0rc2 before closing the
> VOTE
> > > >>> of 3.6.0 I wanted to finish my tests and I am coming to an apparent
> > > >>> blocker.
> > > >>>
> > > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
> > > >>> peers are not able to talk to each other.
> > > >>> I have a cluster of 3, server1, server2 and server3.
> > > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on 3.5
> > > nodes:
> > > >>>
> > > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] -
> Received
> > > >>> connection request 127.0.0.1:62591
> > > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > > >>>
> > > >>>
> > >
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > > >>> Got unrecognized protocol version -65535
> > > >>>
> > > >>> Once I upgrade all of the peers the system is up and running,
> without
> > > >>> apparently no data loss.
> > > >>>
> > > >>> During the upgrade as soon as I upgrade the first node, say,
> server1,
> > > >>> server1 is not able to accept connections (error "Close of session
> 0x0
> > > >>> java.io.IOException: ZooKeeperServer not running")  from clients,
> this
> > > >>> is expected, because as far as it cannot talk with the other peers
> it
> > > >>> is practically partitioned away from the cluster.
> > > >>>
> > > >>> My questions are:
> > > >>> 1) is this expected ? I can't remember protocol changes from 3.5 to
> > > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago, and I
> was
> > > >>> not in the community as dev so I cannot tell
> > > >>> 2) is this a viable option for users ? to have some temporary
> glitch
> > > >>> during the upgrade and hope that the upgrade completes without
> > > >>> troubles ?
> > > >>>
> > > >>> In theory as long as two servers are running the same major version
> > > >>> (3.5 or 3.6) we have a quorum and the system is able to make
> progress
> > > >>> and to server clients.
> > > >>> I feel that this is quite dangerous, but I don't have enough
> context
> > > >>> to understand how this problem is possible and when we decided to
> > > >>> break compatibility.
> > > >>>
> > > >>> The other option is that I am wrong in my test and I am messing up
> :-)
> > > >>>
> > > >>> The other upgrade path I would like to see working like a charm is
> the
> > > >>> upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
> > > >>> should encourage users to move to 3.6 and not to 3.5.
> > > >>>
> > > >>> Regards
> > > >>> Enrico
> > > >>>
> > >
> > >
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
I suggest this plan:
- release 3.6.0 now
- improve the migration story, the flow outlined by Mate is
interesting, but it will take time

3.6.0rc2 got enough binding votes, so I am going to finalize the
release this evening (within 8-10 hours) unless someone casts a -1
in the VOTE thread.

Enrico

Il giorno lun 10 feb 2020 alle ore 19:33 Patrick Hunt
<ph...@apache.org> ha scritto:
>
> On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:
>
> > Hi,
> >
> > Answers inline.
> >
> >
> > > In my experience when you are close to a release it is better to to
> > > make big changes. (I am among the approvers of that patch, so I am
> > > responsible for this change)
> >
> >
> >
> > Although this statement is acceptable for me, I don’t feel this patch
> > should not have been merged into 3.6.0. Submission has been preceded by a
> > long argument with MAPR folks who originally wanted to be merged into 3.4
> > branch (considering the pace how ZooKeeper community is moving forward) and
> > we reached an agreement that release it with 3.6.0.
> >
> > Make a long story short, this patch has been outstanding for ages without
> > much attention from the community and contributors made a lot of effort to
> > get it done before the release.
> >
> >
> > > I would like to ear from people that have been in the community for
> > > long time, then I am ready to complete the release process for
> > > 3.6.0rc2.
> >
> >
> > Me too.
> >
> > I tend to accept the way rolling restart works now - as you described
> > Enrico - and given that situation was pretty much the same between 3.4 and
> > 3.5, I don’t feel we have to make additional changes.
> >
> > On the other hand, the fix that Mate suggested sounds quite cool, I’m also
> > happy to work on getting it in.
> >
> > Fyi, Release Management page says the following:
> > https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
> >
> > "major.minor release of ZooKeeper must be backwards compatible with the
> > previous minor release, major.(minor-1)"
> >
> >
> Our users, direct and indirect, value the ability to migrate to newer
> versions - esp as we drop support for older. Frictions such as this can be
> a reason to go elsewhere. I'm "pro" b/w compact - esp given our published
> guidelines.
>
> Patrick
>
>
> > Andor
> >
> >
> >
> >
> > > On 2020. Feb 10., at 11:32, Enrico Olivelli <eo...@gmail.com> wrote:
> > >
> > > Thank you Mate for checking and explaining this story.
> > >
> > > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > > - it is the last "big patch" committed to 3.6 before starting the
> > > release process
> > > - it is the cause of the failure of the first RC
> > >
> > > In my experience when you are close to a release it is better to to
> > > make big changes. (I am among the approvers of that patch, so I am
> > > responsible for this change)
> > >
> > > This is a pointer to the change to whom who wants to understand better
> > > the context
> > >
> > https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> > >
> > > IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> > > if this statement holds then I feel we can continue
> > > with this release.
> > >
> > > - Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
> > > - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> > > have tools to certify this compatibility (at least not in the short
> > > term)
> > >
> > > I would like to ear from people that have been in the community for
> > > long time, then I am ready to complete the release process for
> > > 3.6.0rc2.
> > >
> > > I will update the website and the release notes with a specific
> > > warning about the upgrade, we should also update the Wiki
> > >
> > > Enrico
> > >
> > >
> > > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > > <sz...@gmail.com> ha scritto:
> > >>
> > >> Hi Enrico!
> > >>
> > >> This is caused by the different PROTOCOL_VERSION in the
> > QuorumCnxManager.
> > >> The Protocol version  was changed last time in ZOOKEEPER-2186 released
> > >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs. Later I
> > >> also changed the protocol version when the format of the initial message
> > >> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> > >> compatible in this case and is the 'expected' behavior if you upgrade
> > e.g
> > >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to 3.6.0.
> > >>
> > >> We had some discussion in the PR of ZOOKEEPER-3188 back then and got to
> > the
> > >> conclusion that it is not that bad, as there will be no data loss as you
> > >> wrote. The tricky thing is that during rolling upgrade we should ensure
> > >> both backward and forward compatibility to make sure that the old and
> > the
> > >> new part of the quorum can still speak to each other. The current
> > solution
> > >> (simply failing if the protocol versions mismatch) is more simple and
> > still
> > >> working just fine: as the servers are restarted one-by-one, the nodes
> > with
> > >> the old protocol version and the nodes with the new protocol version
> > will
> > >> form two partitions, but any given time only one partition will have the
> > >> quorum.
> > >>
> > >> Still, thinking it trough, as a side effect in these cases there will
> > be a
> > >> short time when none of the partitions will have quorums (when we have N
> > >> servers with the old protocol version, N servers with the new protocol
> > >> version, and there is one server just being restarted). I am not sure
> > if we
> > >> can accept this.
> > >>
> > >> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
> > >> the initial message of the old protocol version with the new code. But
> > I am
> > >> not sure if it would be enough (as the old code will not be able to
> > parse
> > >> the new initial message).
> > >>
> > >> One option can be to make a patch also for 3.5 to have a version which
> > >> supports both protocol versions. (let's say in 3.5.8) Then we can write
> > to
> > >> the release note, that if you need rolling upgrade from any versions
> > since
> > >> 3.4.7, then you have to first upgrade from 3.5.8 before upgrading to
> > 3.6.0.
> > >> We can even make the same thing on the 3.4 branch.
> > >>
> > >> But I am also new to the community... It would be great to hear the
> > opinion
> > >> of more experienced people.
> > >> Whatever the decision will be, I am happy to make the changes.
> > >>
> > >> And sorry for breaking the RC (if we decide that this needs to be
> > >> changed...).  ZOOKEEPER-3188 was a complex patch.
> > >>
> > >> Kind regards,
> > >> Mate
> > >>
> > >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <eo...@gmail.com>
> > wrote:
> > >>
> > >>> Hi,
> > >>> even if we had enough binding +1 on 3.6.0rc2 before closing the VOTE
> > >>> of 3.6.0 I wanted to finish my tests and I am coming to an apparent
> > >>> blocker.
> > >>>
> > >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
> > >>> peers are not able to talk to each other.
> > >>> I have a cluster of 3, server1, server2 and server3.
> > >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on 3.5
> > nodes:
> > >>>
> > >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> > >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
> > >>> connection request 127.0.0.1:62591
> > >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> > >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> > >>>
> > >>>
> > org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> > >>> Got unrecognized protocol version -65535
> > >>>
> > >>> Once I upgrade all of the peers the system is up and running, without
> > >>> apparently no data loss.
> > >>>
> > >>> During the upgrade as soon as I upgrade the first node, say, server1,
> > >>> server1 is not able to accept connections (error "Close of session 0x0
> > >>> java.io.IOException: ZooKeeperServer not running")  from clients, this
> > >>> is expected, because as far as it cannot talk with the other peers it
> > >>> is practically partitioned away from the cluster.
> > >>>
> > >>> My questions are:
> > >>> 1) is this expected ? I can't remember protocol changes from 3.5 to
> > >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago, and I was
> > >>> not in the community as dev so I cannot tell
> > >>> 2) is this a viable option for users ? to have some temporary glitch
> > >>> during the upgrade and hope that the upgrade completes without
> > >>> troubles ?
> > >>>
> > >>> In theory as long as two servers are running the same major version
> > >>> (3.5 or 3.6) we have a quorum and the system is able to make progress
> > >>> and to server clients.
> > >>> I feel that this is quite dangerous, but I don't have enough context
> > >>> to understand how this problem is possible and when we decided to
> > >>> break compatibility.
> > >>>
> > >>> The other option is that I am wrong in my test and I am messing up :-)
> > >>>
> > >>> The other upgrade path I would like to see working like a charm is the
> > >>> upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
> > >>> should encourage users to move to 3.6 and not to 3.5.
> > >>>
> > >>> Regards
> > >>> Enrico
> > >>>
> >
> >

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Patrick Hunt <ph...@apache.org>.
On Mon, Feb 10, 2020 at 3:38 AM Andor Molnar <an...@apache.org> wrote:

> Hi,
>
> Answers inline.
>
>
> > In my experience when you are close to a release it is better to to
> > make big changes. (I am among the approvers of that patch, so I am
> > responsible for this change)
>
>
>
> Although this statement is acceptable for me, I don’t feel this patch
> should not have been merged into 3.6.0. Submission has been preceded by a
> long argument with MAPR folks who originally wanted to be merged into 3.4
> branch (considering the pace how ZooKeeper community is moving forward) and
> we reached an agreement that release it with 3.6.0.
>
> Make a long story short, this patch has been outstanding for ages without
> much attention from the community and contributors made a lot of effort to
> get it done before the release.
>
>
> > I would like to ear from people that have been in the community for
> > long time, then I am ready to complete the release process for
> > 3.6.0rc2.
>
>
> Me too.
>
> I tend to accept the way rolling restart works now - as you described
> Enrico - and given that situation was pretty much the same between 3.4 and
> 3.5, I don’t feel we have to make additional changes.
>
> On the other hand, the fix that Mate suggested sounds quite cool, I’m also
> happy to work on getting it in.
>
> Fyi, Release Management page says the following:
> https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement
>
> "major.minor release of ZooKeeper must be backwards compatible with the
> previous minor release, major.(minor-1)"
>
>
Our users, direct and indirect, value the ability to migrate to newer
versions, especially as we drop support for older ones. Friction such as
this can be a reason to go elsewhere. I'm "pro" backward compatibility,
especially given our published guidelines.

Patrick


> Andor
>
>
>
>
> > On 2020. Feb 10., at 11:32, Enrico Olivelli <eo...@gmail.com> wrote:
> >
> > Thank you Mate for checking and explaining this story.
> >
> > I find it very interesting that the cause is ZOOKEEPER-3188 as:
> > - it is the last "big patch" committed to 3.6 before starting the
> > release process
> > - it is the cause of the failure of the first RC
> >
> > In my experience when you are close to a release it is better to to
> > make big changes. (I am among the approvers of that patch, so I am
> > responsible for this change)
> >
> > This is a pointer to the change to whom who wants to understand better
> > the context
> >
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> >
> > IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> > if this statement holds then I feel we can continue
> > with this release.
> >
> > - Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
> > - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> > have tools to certify this compatibility (at least not in the short
> > term)
> >
> > I would like to ear from people that have been in the community for
> > long time, then I am ready to complete the release process for
> > 3.6.0rc2.
> >
> > I will update the website and the release notes with a specific
> > warning about the upgrade, we should also update the Wiki
> >
> > Enrico
> >
> >
> > Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> > <sz...@gmail.com> ha scritto:
> >>
> >> Hi Enrico!
> >>
> >> This is caused by the different PROTOCOL_VERSION in the
> QuorumCnxManager.
> >> The Protocol version  was changed last time in ZOOKEEPER-2186 released
> >> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs. Later I
> >> also changed the protocol version when the format of the initial message
> >> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
> >> compatible in this case and is the 'expected' behavior if you upgrade
> e.g
> >> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to 3.6.0.
> >>
> >> We had some discussion in the PR of ZOOKEEPER-3188 back then and got to
> the
> >> conclusion that it is not that bad, as there will be no data loss as you
> >> wrote. The tricky thing is that during rolling upgrade we should ensure
> >> both backward and forward compatibility to make sure that the old and
> the
> >> new part of the quorum can still speak to each other. The current
> solution
> >> (simply failing if the protocol versions mismatch) is more simple and
> still
> >> working just fine: as the servers are restarted one-by-one, the nodes
> with
> >> the old protocol version and the nodes with the new protocol version
> will
> >> form two partitions, but any given time only one partition will have the
> >> quorum.
> >>
> >> Still, thinking it trough, as a side effect in these cases there will
> be a
> >> short time when none of the partitions will have quorums (when we have N
> >> servers with the old protocol version, N servers with the new protocol
> >> version, and there is one server just being restarted). I am not sure
> if we
> >> can accept this.
> >>
> >> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
> >> the initial message of the old protocol version with the new code. But
> I am
> >> not sure if it would be enough (as the old code will not be able to
> parse
> >> the new initial message).
> >>
> >> One option can be to make a patch also for 3.5 to have a version which
> >> supports both protocol versions. (let's say in 3.5.8) Then we can write
> to
> >> the release note, that if you need rolling upgrade from any versions
> since
> >> 3.4.7, then you have to first upgrade from 3.5.8 before upgrading to
> 3.6.0.
> >> We can even make the same thing on the 3.4 branch.
> >>
> >> But I am also new to the community... It would be great to hear the
> opinion
> >> of more experienced people.
> >> Whatever the decision will be, I am happy to make the changes.
> >>
> >> And sorry for breaking the RC (if we decide that this needs to be
> >> changed...).  ZOOKEEPER-3188 was a complex patch.
> >>
> >> Kind regards,
> >> Mate
> >>
> >> On Mon, Feb 10, 2020 at 9:47 AM Enrico Olivelli <eo...@gmail.com>
> wrote:
> >>
> >>> Hi,
> >>> even if we had enough binding +1 on 3.6.0rc2 before closing the VOTE
> >>> of 3.6.0 I wanted to finish my tests and I am coming to an apparent
> >>> blocker.
> >>>
> >>> I am trying to upgrade a 3.5.6 cluster to 3.6.0, but it looks like
> >>> peers are not able to talk to each other.
> >>> I have a cluster of 3, server1, server2 and server3.
> >>> When I upgrade server1 to 3.6.0rc2 I see this kind of errors on 3.5
> nodes:
> >>>
> >>> 2020-02-10 09:35:07,745 [myid:3] - INFO
> >>> [localhost/127.0.0.1:3334:QuorumCnxManager$Listener@918] - Received
> >>> connection request 127.0.0.1:62591
> >>> 2020-02-10 09:35:07,746 [myid:3] - ERROR
> >>> [localhost/127.0.0.1:3334:QuorumCnxManager@527] -
> >>>
> >>>
> org.apache.zookeeper.server.quorum.QuorumCnxManager$InitialMessage$InitialMessageException:
> >>> Got unrecognized protocol version -65535
> >>>
> >>> Once I upgrade all of the peers the system is up and running, without
> >>> apparently no data loss.
> >>>
> >>> During the upgrade as soon as I upgrade the first node, say, server1,
> >>> server1 is not able to accept connections (error "Close of session 0x0
> >>> java.io.IOException: ZooKeeperServer not running")  from clients, this
> >>> is expected, because as far as it cannot talk with the other peers it
> >>> is practically partitioned away from the cluster.
> >>>
> >>> My questions are:
> >>> 1) is this expected ? I can't remember protocol changes from 3.5 to
> >>> 3.6, but actually 3.6 diverged from 3.5 branch so long ago, and I was
> >>> not in the community as dev so I cannot tell
> >>> 2) is this a viable option for users ? to have some temporary glitch
> >>> during the upgrade and hope that the upgrade completes without
> >>> troubles ?
> >>>
> >>> In theory as long as two servers are running the same major version
> >>> (3.5 or 3.6) we have a quorum and the system is able to make progress
> >>> and to server clients.
> >>> I feel that this is quite dangerous, but I don't have enough context
> >>> to understand how this problem is possible and when we decided to
> >>> break compatibility.
> >>>
> >>> The other option is that I am wrong in my test and I am messing up :-)
> >>>
> >>> The other upgrade path I would like to see working like a charm is the
> >>> upgrade from 3.4 to 3.6, as I see that as soon as we release 3.6 we
> >>> should encourage users to move to 3.6 and not to 3.5.
> >>>
> >>> Regards
> >>> Enrico
> >>>
>
>

Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Andor Molnar <an...@apache.org>.
Hi,

Answers inline.


> In my experience when you are close to a release it is better to to
> make big changes. (I am among the approvers of that patch, so I am
> responsible for this change)



Although this statement is acceptable to me, I don’t feel that merging this patch into 3.6.0 was a mistake. Its submission was preceded by a long discussion with the MapR folks, who originally wanted it merged into the 3.4 branch (considering the pace at which the ZooKeeper community is moving forward), and we reached an agreement to release it with 3.6.0.

To make a long story short, this patch had been outstanding for ages without much attention from the community, and the contributors made a lot of effort to get it done before the release.


> I would like to ear from people that have been in the community for
> long time, then I am ready to complete the release process for
> 3.6.0rc2.


Me too.

I tend to accept the way rolling restart works now, as you described, Enrico, and given that the situation was pretty much the same between 3.4 and 3.5, I don’t feel we have to make additional changes.
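
To illustrate the rolling-restart behaviour discussed in this thread, here is a small sketch that counts which partition (old or new protocol version) holds a majority at each step. It is only a model under the assumption that old and new servers cannot talk to each other at all; the class and method names are hypothetical and nothing here is ZooKeeper code.

```java
public class RollingUpgradeQuorumSketch {
    // A partition has quorum when it holds a strict majority of the ensemble.
    static boolean hasQuorum(int partitionSize, int ensembleSize) {
        return partitionSize > ensembleSize / 2;
    }

    // Returns "old", "new" or "none" depending on which partition can serve,
    // given how many servers already run the new version and how many are
    // currently down for restart.
    static String servingPartition(int ensembleSize, int upgraded, int restarting) {
        int old = ensembleSize - upgraded - restarting;
        if (hasQuorum(old, ensembleSize)) {
            return "old";
        }
        if (hasQuorum(upgraded, ensembleSize)) {
            return "new";
        }
        return "none";
    }

    public static void main(String[] args) {
        int ensemble = 3;
        // server1 is being restarted: the two old servers still have quorum.
        System.out.println(servingPartition(ensemble, 0, 1));
        // server1 is up on the new version but isolated: old partition serves.
        System.out.println(servingPartition(ensemble, 1, 0));
        // server2 is being restarted: one old, one new, nobody has quorum.
        System.out.println(servingPartition(ensemble, 1, 1));
        // server2 is up on the new version: the new partition takes over.
        System.out.println(servingPartition(ensemble, 2, 0));
    }
}
```

The third step is the window Mate mentioned: with N old servers, N new servers and one server restarting, neither side reaches N+1.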

On the other hand, the fix that Mate suggested sounds quite cool; I’m also happy to work on getting it in.

FYI, the Release Management page says the following: https://cwiki.apache.org/confluence/display/ZOOKEEPER/ReleaseManagement

"major.minor release of ZooKeeper must be backwards compatible with the previous minor release, major.(minor-1)"

Andor




> On 2020. Feb 10., at 11:32, Enrico Olivelli <eo...@gmail.com> wrote:
> 
> Thank you Mate for checking and explaining this story.
> 
> I find it very interesting that the cause is ZOOKEEPER-3188 as:
> - it is the last "big patch" committed to 3.6 before starting the
> release process
> - it is the cause of the failure of the first RC
> 
> In my experience when you are close to a release it is better to to
> make big changes. (I am among the approvers of that patch, so I am
> responsible for this change)
> 
> This is a pointer to the change to whom who wants to understand better
> the context
> https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11
> 
> IIUC even for the upgrade from 3.4 to 3.5 the story was the same and
> if this statement holds then I feel we can continue
> with this release.
> 
> - Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
> - Making 3.5 and 3.6 "compatible" can be very tricky and we do not
> have tools to certify this compatibility (at least not in the short
> term)
> 
> I would like to ear from people that have been in the community for
> long time, then I am ready to complete the release process for
> 3.6.0rc2.
> 
> I will update the website and the release notes with a specific
> warning about the upgrade, we should also update the Wiki
> 
> Enrico
> 
> 
> Il giorno lun 10 feb 2020 alle ore 11:17 Szalay-Bekő Máté
> <sz...@gmail.com> ha scritto:
>> 
>> Hi Enrico!
>> 
>> This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
>> The Protocol version  was changed last time in ZOOKEEPER-2186 released
>> first in 3.4.7 and 3.5.1 to avoid some crashing / fix some bugs. Later I
>> also changed the protocol version when the format of the initial message
>> changed in ZOOKEEPER-3188. So actually the quorum protocol is not
>> compatible in this case and is the 'expected' behavior if you upgrade e.g
>> from 3.4.6 to 3.4.7, or 3.4.6 to 3.5.5 or e.g from 3.5.6 to 3.6.0.
>> 
>> We had some discussion in the PR of ZOOKEEPER-3188 back then and got to the
>> conclusion that it is not that bad, as there will be no data loss as you
>> wrote. The tricky thing is that during rolling upgrade we should ensure
>> both backward and forward compatibility to make sure that the old and the
>> new part of the quorum can still speak to each other. The current solution
>> (simply failing if the protocol versions mismatch) is more simple and still
>> working just fine: as the servers are restarted one-by-one, the nodes with
>> the old protocol version and the nodes with the new protocol version will
>> form two partitions, but any given time only one partition will have the
>> quorum.
>> 
>> Still, thinking it trough, as a side effect in these cases there will be a
>> short time when none of the partitions will have quorums (when we have N
>> servers with the old protocol version, N servers with the new protocol
>> version, and there is one server just being restarted). I am not sure if we
>> can accept this.
>> 
>> For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
>> the initial message of the old protocol version with the new code. But I am
>> not sure if it would be enough (as the old code will not be able to parse
>> the new initial message).
>> 
>> One option can be to make a patch also for 3.5 to have a version which
>> supports both protocol versions. (let's say in 3.5.8) Then we can write to
>> the release note, that if you need rolling upgrade from any versions since
>> 3.4.7, then you have to first upgrade from 3.5.8 before upgrading to 3.6.0.
>> We can even make the same thing on the 3.4 branch.
>> 
>> But I am also new to the community... It would be great to hear the opinion
>> of more experienced people.
>> Whatever the decision will be, I am happy to make the changes.
>> 
>> And sorry for breaking the RC (if we decide that this needs to be
>> changed...).  ZOOKEEPER-3188 was a complex patch.
>> 
>> Kind regards,
>> Mate
>> 


Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Enrico Olivelli <eo...@gmail.com>.
Thank you Mate for checking and explaining this story.

I find it very interesting that the cause is ZOOKEEPER-3188 as:
- it is the last "big patch" committed to 3.6 before starting the
release process
- it is the cause of the failure of the first RC

In my experience, when you are close to a release it is better not to
make big changes. (I am among the approvers of that patch, so I am
responsible for this change.)

Here is a pointer to the change, for anyone who wants to better understand
the context:
https://github.com/apache/zookeeper/pull/1048/files#diff-7a209d890686bcba351d758b64b22a7dR11

IIUC the story was the same even for the upgrade from 3.4 to 3.5, and
if that holds then I feel we can continue with this release.

- Reverting ZOOKEEPER-3188 is not an option for me, it is too complex.
- Making 3.5 and 3.6 "compatible" can be very tricky and we do not
have tools to certify this compatibility (at least not in the short
term)

I would like to hear from people who have been in the community for a
long time; then I am ready to complete the release process for
3.6.0rc2.

I will update the website and the release notes with a specific
warning about the upgrade. We should also update the wiki.

Enrico



Re: Rolling upgrade from 3.5 to 3.6 - expected behaviour

Posted by Szalay-Bekő Máté <sz...@gmail.com>.
Hi Enrico!

This is caused by the different PROTOCOL_VERSION in the QuorumCnxManager.
The previous protocol version change was in ZOOKEEPER-2186, first released
in 3.4.7 and 3.5.1, to avoid some crashes and fix some bugs. Later I
also changed the protocol version when the format of the initial message
changed in ZOOKEEPER-3188. So the quorum protocol is actually not
compatible in this case, and this is the 'expected' behavior if you upgrade
e.g. from 3.4.6 to 3.4.7, from 3.4.6 to 3.5.5, or from 3.5.6 to 3.6.0.

We had some discussion in the PR of ZOOKEEPER-3188 back then and came to the
conclusion that it is not that bad, as there will be no data loss, as you
wrote. The tricky thing is that during a rolling upgrade we should ensure
both backward and forward compatibility, to make sure that the old and the
new part of the quorum can still speak to each other. The current solution
(simply failing if the protocol versions mismatch) is simpler and still
works fine: as the servers are restarted one by one, the nodes with
the old protocol version and the nodes with the new protocol version will
form two partitions, but at any given time only one partition will have the
quorum.

Still, thinking it through, as a side effect there will be a short time
when neither partition has a quorum (when we have N servers with the old
protocol version, N servers with the new protocol version, and one server
just being restarted). I am not sure if we can accept this.
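
To make that window concrete, here is a minimal sketch of the counting
argument. It is an illustration only, not ZooKeeper code: the class and
method names are hypothetical, and it assumes a simple majority quorum,
servers upgraded strictly one at a time, and that the server being
restarted belongs to neither partition while it is down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a rolling upgrade with incompatible protocol versions.
class RollingUpgradeSim {
    // A partition has quorum when it holds a strict majority of the ensemble.
    static boolean hasQuorum(int members, int ensembleSize) {
        return members > ensembleSize / 2;
    }

    // Returns the 0-based upgrade steps during which neither the old nor the
    // new partition has quorum while the server being upgraded is down.
    static List<Integer> stepsWithoutQuorum(int ensembleSize) {
        List<Integer> result = new ArrayList<>();
        for (int upgraded = 0; upgraded < ensembleSize; upgraded++) {
            // One server is restarting, so it counts for neither side.
            int oldUp = ensembleSize - upgraded - 1;
            int newUp = upgraded;
            if (!hasQuorum(oldUp, ensembleSize) && !hasQuorum(newUp, ensembleSize)) {
                result.add(upgraded);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int size : new int[] {3, 5}) {
            System.out.println("ensemble of " + size
                + ": no quorum while restarting step(s) " + stepsWithoutQuorum(size));
        }
    }
}
```

For an ensemble of 2N+1 servers the only affected step is the restart of
server N+1, which matches the case described above. With compatible
protocol versions the 2N servers still running would keep the quorum
during that restart.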

For ZOOKEEPER-3188 we can add a small patch to make it possible to parse
the initial message of the old protocol version with the new code. But I am
not sure if it would be enough (as the old code will not be able to parse
the new initial message).

One option can be to also make a patch for 3.5, so that there is a version
(let's say 3.5.8) which supports both protocol versions. Then we can write in
the release notes that if you need a rolling upgrade from any version since
3.4.7, then you have to first upgrade to 3.5.8 before upgrading to 3.6.0.
We can even do the same thing on the 3.4 branch.
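
A rough sketch of what "supporting both protocol versions" could look like
on the receiving side follows. This is not the actual QuorumCnxManager
code: the class and helper names are hypothetical, it assumes the initial
message starts with an 8-byte protocol version, and it assumes -65536 as
the older value (only -65535 appears in the error log in this thread).
Parsing the rest of the message, which is where the two formats differ,
is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative sketch only; names and structure are hypothetical.
class InitialMessageSketch {
    // Value rejected by the 3.5 nodes in the error log quoted in this thread.
    static final long PROTOCOL_VERSION_NEW = -65535L;
    // Assumption for illustration: the value used by the older protocol.
    static final long PROTOCOL_VERSION_OLD = -65536L;

    static boolean isSupported(long version) {
        return version == PROTOCOL_VERSION_OLD || version == PROTOCOL_VERSION_NEW;
    }

    // Reads the leading version and rejects anything unknown.
    // A real parser would then branch on the version to read the payload.
    static long readProtocolVersion(DataInputStream in) throws IOException {
        long version = in.readLong();
        if (!isSupported(version)) {
            throw new IOException("Got unrecognized protocol version " + version);
        }
        return version;
    }

    // Hypothetical helper: true if a message starting with these bytes is accepted.
    static boolean accepts(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            readProtocolVersion(in);
            return true;
        } catch (IOException e) {
            return false; // unknown version or truncated message
        }
    }

    // Hypothetical helper: encodes only the version prefix (big-endian long).
    static byte[] versionPrefix(long version) {
        return ByteBuffer.allocate(Long.BYTES).putLong(version).array();
    }

    public static void main(String[] args) {
        System.out.println("old accepted: " + accepts(versionPrefix(PROTOCOL_VERSION_OLD)));
        System.out.println("new accepted: " + accepts(versionPrefix(PROTOCOL_VERSION_NEW)));
        System.out.println("other accepted: " + accepts(versionPrefix(-1L)));
    }
}
```

Accepting both versions on read is only half of the problem: a
dual-version server would also have to decide which format to send to
each peer, which this sketch does not address.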

But I am also new to the community... It would be great to hear the opinion
of more experienced people.
Whatever the decision will be, I am happy to make the changes.

And sorry for breaking the RC (if we decide that this needs to be
changed...).  ZOOKEEPER-3188 was a complex patch.

Kind regards,
Mate
