You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Andor Molnar <an...@apache.org> on 2019/02/07 09:11:47 UTC

Re: JAVA 11 build is broken on 3.5

Hi folks,

Looks like we’re in a very good shape: all builds of 3.5 release are green now.
Are you thinking of anything which would be good to get in before the stable release?

Regards,
Andor




> On 2019. Jan 9., at 22:18, Andor Molnár <an...@apache.org> wrote:
> 
> Thank you guys for your support.
> 
> I leave this open for the community to decide, but if none of the
> binding voters (committers, PMCs) respond in 1 week, I'll take
> responsibility and commit the patches. I know holidays are still going
> on, but 3.5 release is important we should make progress.
> 
> 
> Thanks,
> 
> Andor
> 
> 
> 
> On 1/9/19 21:54, Norbert Kalmar wrote:
>> I agree with backporting Netty 4 and the related patches to 3.5.5.
>> +1 (non binding) (and I might have already voted)
>> 
>> If it took so long to release a stable 3.5, I don't think we should leave a
>> known bug in the system, which has such an impact (not supporting current
>> LTS java).
>> The PRs shouldn't be hard to backport, no new development is required, and
>> unfortunately maven migration isn't finished yet, so it's not like we need
>> to postpone the release just because of Netty upgrade. That's my two cents
>> anyway.
>> 
>> Regards,
>> Norbert
>> 
>> On Wed, Jan 9, 2019 at 9:19 PM Enrico Olivelli <eo...@gmail.com> wrote:
>> 
>>> Il giorno mer 9 gen 2019 alle ore 21:16 Andor Molnar
>>> <an...@apache.org> ha scritto:
>>>> So, you guys saying we should upgrade to Netty 4 and backport the
>>> following patches too:
>>>> ZOOKEEPER-3131 org.apache.zookeeper.server.WatchManager resource leak
>>>> ZOOKEEPER-3163 Use session map to improve the performance when closing
>>> session in Netty
>>>> ZOOKEEPER-3146 Limit the maximum client connections per IP in
>>> NettyServerCnxnFactory
>>>> ZOOKEEPER-3177 Refactor request throttle logic in NIO and Netty to keep
>>> the same behavior and make the code easier to maintain
>>>> I think it’s still deliverable with the stable release. Regarding the
>>> timing, let’s say we release 3.5.5 with Netty 3 and do the upgrade in
>>> 3.5.6. Is it acceptable to do such a big with a minor version change?
>>> 
>>> I think we can do the upgrade in 3.5.5.
>>> We can't say Java 11 is supported if we don't have tests working. And
>>> Java 11 is the current LTS release from Oracle.
>>> 
>>> If we release now 3.5.5 then we need to work for 3.5.6 and then
>>> release....it is a double work, we don't have so much resources :-(
>>> 
>>> IMHO it is better to release 3.5.5 and start focusing on 3.6.0
>>> 
>>> Enrico
>>> 
>>>> Upgrading now isn't ideal either, but maybe somewhat less brutal.
>>>> 
>>>> Regards,
>>>> Andor
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 2019. Jan 8., at 2:10, Brian Nixon <br...@gmail.com>
>>> wrote:
>>>>> Also worth consideration for the Netty fixes backport bundle:
>>>>> 
>>>>> ZOOKEEPER-3131 org.apache.zookeeper.server.WatchManager resource leak
>>>>> 
>>>>> Our experience with Netty has been positive so far though I don't
>>> think we
>>>>> have enough info to declare it is stable. It makes a lot of sense to
>>> me for
>>>>> 3.5 to use Netty 4, the biggest question is the timing.
>>>>> 
>>>>> 
>>>>> On Fri, Jan 4, 2019 at 10:44 AM Andor Molnár <an...@apache.org> wrote:
>>>>> 
>>>>>> I got another one:
>>>>>> 
>>>>>> ZOOKEEPER-3163 Use session map to improve the performance when closing
>>>>>> session in Netty
>>>>>> 
>>>>>> 
>>>>>> So, that doesn't seem too many. We can talk about backporting them,
>>> but
>>>>>> I don't think they're super critical for the first stable release.
>>>>>> 
>>>>>> Regarding 3.4, I need to validate the embedded scenario, but at least
>>>>>> the Java 11 build is green. :)
>>>>>> 
>>>>>> 
>>>>>> Andor
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 1/4/19 18:08, Enrico Olivelli wrote:
>>>>>>> Il giorno ven 4 gen 2019 alle ore 17:04 Norbert Kalmar
>>>>>>> <nk...@cloudera.com.invalid> ha scritto:
>>>>>>>> +1 for Netty 4 in 3.5
>>>>>>>> 
>>>>>>>> Pretty much all the pros and cons has been said before me.
>>>>>>>> I would only add that this is not a new functionality that we wan't
>>> to
>>>>>>>> backport. It's a criticall(ish?) bugfix, which requires quite a bit
>>> of
>>>>>>>> change unfortunately.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Norbert
>>>>>>>> 
>>>>>>>> On Fri, Jan 4, 2019 at 4:36 PM Andor Molnar <an...@apache.org>
>>> wrote:
>>>>>>>>> What are those patches exactly?
>>>>>>>>> 
>>>>>>>>> Comparing the ported version of 3.5 with master I’ve only found 2
>>>>>> patches
>>>>>>>>> which are missing:
>>>>>>>>> 
>>>>>>>>> ZOOKEEPER-3146 Limit the maximum client connections per IP in
>>>>>>>>> NettyServerCnxnFactory
>>>>>>>>> ZOOKEEPER-3177 Refactor request throttle logic in NIO and Netty to
>>> keep
>>>>>>>>> the same behavior and make the code easier to maintain
>>>>>>>>> 
>>>>>>>>> None of them are critical I would say.
>>>>>>>>> Is there anything else I’m missing?
>>>>>>> I have compared the histories and I think Andor you are right.
>>>>>>> 
>>>>>>> master:
>>>>>>> https://github.com/apache/zookeeper/commits/master
>>>>>>> 
>>>>>>> branch-3.5:
>>>>>>> https://github.com/apache/zookeeper/commits/branch-3.5
>>>>>>> 
>>>>>>> Sorry I was thinking to the amount of patches related to TLS stuff ,
>>>>>>> but they are not related to Netty 4.
>>>>>>> 
>>>>>>> What about branch 3.4 ?
>>>>>>> We don't have reconfig but the case "Start embedded ZK, stop it and
>>>>>>> try to restart on the same port" should apply as well.
>>>>>>> 
>>>>>>> Enrico
>>>>>>> 
>>>>>>>>> Andor
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 2019. Jan 4., at 16:27, Enrico Olivelli <eo...@gmail.com>
>>>>>> wrote:
>>>>>>>>>> Il giorno ven 4 gen 2019 alle ore 14:23 Andor Molnar
>>>>>>>>>> <andor@apache.org <ma...@apache.org>> ha scritto:
>>>>>>>>>>> Hi team / Enrico,
>>>>>>>>>>> 
>>>>>>>>>>> I’d like to get feedback from the community on the following
>>> patch
>>>>>>>>> (moving the discussion from GitHub to here):
>>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-3204 <
>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-3204>
>>>>>>>>>>> https://github.com/apache/zookeeper/pull/753 <
>>>>>>>>> https://github.com/apache/zookeeper/pull/753>
>>>>>>>>>>> In a nutshell: looks like that Netty 3.10 is broken under Java
>>> 11: it
>>>>>>>>> doesn’t properly close the underlying socket (probably not closing
>>> the
>>>>>>>>> registered NIO selectors) and reconfig tests are unable to re-bind
>>> the
>>>>>>>>> ports. This problem is similar that we already fixed in NIO with
>>> the
>>>>>>>>> following patch:
>>>>>>>>> 
>>> https://github.com/apache/zookeeper/commit/c3babb94275ad667dc71c10dcb08a383a3c154c2
>>>>>>>>> <
>>>>>>>>> 
>>> https://github.com/apache/zookeeper/commit/c3babb94275ad667dc71c10dcb08a383a3c154c2
>>>>>>>>>>> The problem doesn’t show up on trunk which has been recently
>>> upgraded
>>>>>>>>> to Netty 4.
>>>>>>>>>>> Repro:
>>>>>>>>>>> - Start embedded ZK, stop it and try to restart on the same
>>> port, or
>>>>>>>>>>> - Start normal ensemble and reconfig to use different (client)
>>> port.
>>>>>>>>> Then reconfig back to the original port which should fail. (that’s
>>> the
>>>>>>>>> scenario which is covered in ReconfigTest)
>>>>>>>>>>> I created the above patch (#753) to backport Netty 4 upgrade to
>>> 3.5
>>>>>> and
>>>>>>>>> it fixes the problem with Java 11 (it doesn’t cause regression in
>>> the
>>>>>>>>> pre-commit build either), but Enrico is having concerns about
>>> making
>>>>>> such
>>>>>>>>> big change before the release.
>>>>>>>>>>> I tend to agree, but let’s see what are the options.
>>>>>>>>>>> 
>>>>>>>>>>> Thoughts:
>>>>>>>>>>> - Do we have to fix this? - Yes. Java 11 is LTS and I the bug is
>>>>>>>>> critical.
>>>>>>>>>>> - Can we fix Netty 3? - Maybe. Let’s say we find the bug in
>>> Netty 3,
>>>>>>>>> what can we do?
>>>>>>>>>>>     a) We cannot workaround from ZooKeeper itself and have to
>>>>>> submit
>>>>>>>>> a pull request for Netty. I think it’s quite unlikely that they
>>> will
>>>>>> accept
>>>>>>>>> the change given it’s not a security bug, but even if they did,
>>> only
>>>>>> the
>>>>>>>>> upgraded version of Netty 3 would work properly with ZooKeeper.
>>> Err.
>>>>>>>>>>>     b) We can workaround it from ZooKeeper: that could be option
>>>>>> #1,
>>>>>>>>> but I have a strong feeling about it’s not going to be the case.
>>>>>>>>>>> - Shall we upgrade to Netty 4? - this is option #2
>>>>>>>>>>> 
>>>>>>>>>>> Please share your thoughts, maybe you know about an option #3.
>>>>>>>>>> Thank you Andor
>>>>>>>>>> 
>>>>>>>>>> I have thought more about this problem, and I have checked that
>>> Netty
>>>>>>>>>> 3 is really dead/unmantained (last release in 2016).
>>>>>>>>>> If I understand correctly there is no easy workaround (nothing
>>> without
>>>>>>>>>> hacking Netty 3 internals)
>>>>>>>>>> 
>>>>>>>>>> As soon as we will declare 3.5.5 "stable" the world will hopefully
>>>>>>>>>> abandon 3.4 and switch to 3.5 + Netty (because of SSL support).
>>>>>>>>>> The network stack is very important so it is better to have Netty
>>> 4 as
>>>>>>>>>> foundation, I am thinking about security issues, we won't make an
>>>>>>>>>> "hotfix" release with the switch to Netty 4 because there is a
>>> bad bug
>>>>>>>>>> in Netty 3.
>>>>>>>>>> So better to switch now.
>>>>>>>>>> 
>>>>>>>>>> But Facebooks friends, expecially @ivmaykov did a lot of bugfixes
>>>>>>>>>> around Netty on master branch, we must be sure that what we are
>>>>>>>>>> delivering in 3.5.5 is stable.
>>>>>>>>>> 
>>>>>>>>>> We will also have to state clearly in the "release notes" that
>>> Netty
>>>>>>>>>> version is changed, as this may have a non trivial impact to
>>> memory
>>>>>>>>>> usage (i.e. Netty 4 uses more Direct memory by default)
>>>>>>>>>> 
>>>>>>>>>> So to recap my final opinion: +1 to switch to Netty 4 if we take
>>> care
>>>>>>>>>> of port all of the fixes around Netty 4 from master branch and we
>>>>>>>>>> state the switch clearly in the release notes
>>>>>>>>>> 
>>>>>>>>>> Enrico
>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Andor