Posted to dev@zookeeper.apache.org by 岭秀 <ma...@sina.com> on 2018/09/03 14:01:00 UTC

Re: Re: ZooKeeper 3.5 blocker issues

Waiting for the release of 3.5.
I will take on ZOOKEEPER-2778 if I have the time.
----- Original Message -----
From: Andor Molnar <an...@cloudera.com.INVALID>
To: DevZooKeeper <de...@zookeeper.apache.org>
Subject: Re: ZooKeeper 3.5 blocker issues
Date: 2018-09-03 18:17

Unfortunately I haven't received any feedback on this blocker list. Can we
say that we agree on it? :)

- ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
- ZOOKEEPER-1549 (Data inconsistency when follower is receiving a DIFF with
a dirty snapshot)
- ZOOKEEPER-1818 (Fix don't care for trunk)
- ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
followers)
- ZOOKEEPER-2778 (Potential server deadlock between follower sync with
leader and follower receiving external connection requests.)
- ZOOKEEPER-2846 (Leader follower sync with on disk txns can possibly leads
to data inconsistency) (might be a duplicate of ZK-2418)
- ZOOKEEPER-2930 (Leader cannot be elected due to network timeout of some
members.)

Most of these tickets have a patch available in Jira or on GitHub. I'll
review them in the next few weeks to see how we could resolve the issues. I
believe that if these tickets can be closed one way or another, we'll
be able to release the upcoming 3.5 version as stable.

Thoughts?

Regards,
Andor

On Tue, Aug 14, 2018 at 11:08 AM, Andor Molnar <an...@cloudera.com> wrote:
> One more item for the list, which is very, very important:
>
> ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
>
> Andor
>
>
>
> On Mon, Aug 13, 2018 at 2:09 PM, Andor Molnar <an...@apache.org> wrote:
>
>> Looking at the list of Critical issues, I think the following items might
>> be considered as blockers too:
>>
>> ZOOKEEPER-2418 (data inconsistency)
>> ZOOKEEPER-2778
>> ZOOKEEPER-2846 (data inconsistency)
>> ZOOKEEPER-2930 (leader election deadlock)
>>
>> Regards,
>> Andor
>>
>>
>>
>>
>> > On 2018. Aug 13., at 12:38, Andor Molnar <an...@apache.org> wrote:
>> >
>> > Hi folks,
>> >
>> > This has been raised on the user list recently as well, so I think it’s a
>> > good time to review what’s left for a stable 3.5 release. The list of
>> > unresolved blocker issues from Jira is the following:
>> >
>> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ZooKeeper%20AND%20priority%20in%20(Blocker)%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC
>> >
>> > It seems to me that the following items are the real blockers for a
>> > stable 3.5 release:
>> >
>> > ZOOKEEPER-1549
>> > ZOOKEEPER-1818
>> >
>> > The rest of the list should be revisited and Jira should be amended
>> > accordingly (e.g. ZOOKEEPER-2159 is a blocker improvement??).
>> >
>> > Please correct me if I’m wrong and share your thoughts.
>> >
>> > Regards,
>> > Andor
>> >
>> >
>>
>>
>
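
Note on the Jira link quoted above: it is a percent-encoded JQL search. The sketch below is not from the thread; it is a minimal Python illustration, assuming the standard `urllib.parse` module, of how that link can be rebuilt from the readable query text. The helper name `jira_search_url` is hypothetical.

```python
from urllib.parse import quote, unquote

# Readable form of the JQL filter embedded in the link quoted in the thread.
JQL = ("project = ZooKeeper AND priority in (Blocker) "
       "AND resolution = Unresolved ORDER BY priority DESC")

# Base of the search URL, as it appears in the quoted link.
BASE = "https://issues.apache.org/jira/issues/?jql="


def jira_search_url(jql: str) -> str:
    """Hypothetical helper: percent-encode a JQL string into a search URL.

    Spaces become %20 and '=' becomes %3D; parentheses are left readable,
    matching the form of the link in the thread.
    """
    return BASE + quote(jql, safe="()")


url = jira_search_url(JQL)
print(url)

# Decoding the query part gives back the readable JQL text.
decoded = unquote(url.split("jql=", 1)[1])
print(decoded)
```

Editing the `JQL` string (for example, changing `Blocker` to `Critical`) and re-running gives the corresponding filter for the other priority level mentioned in the thread.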

Re: Re: ZooKeeper 3.5 blocker issues

Posted by Andor Molnar <an...@cloudera.com.INVALID>.
Fine.

I'm happy to ignore 1549, 2846 and 2930. That still leaves us with:

- ZOOKEEPER-236 (SSL/TLS support for Atomic Broadcast protocol)
- ZOOKEEPER-1818 (Fix don't care for trunk)
- ZOOKEEPER-2418 (txnlog diff sync can skip sending some transactions to
followers)
- ZOOKEEPER-2778 (Potential server deadlock between follower sync with
leader and follower receiving external connection requests.)

SSL (ZK-236) is a feature which is essential for the 3.5 release, hence I
wouldn't leave it out or postpone it to the next stable release. The PR has
been out for a long time, so please start reviewing it.
The rest are also long-outstanding issues which were found in the 3.5
branch.
ZK-1818 is something which was found and fixed in 3.4, but has never
been fixed in 3.5. It is quite a serious issue if still present.

I think we should at least run some manual testing and see if we can
reproduce any of these issues before going ahead with a stable release.

Regards,
Andor




On Fri, Sep 7, 2018 at 3:24 AM, Michael Han <ha...@apache.org> wrote:

> I haven't gone through the entire list, but it looks like many of the JIRA
> issues listed in this thread, such as ZOOKEEPER-1549 and 2846, also affect
> 3.4 releases. Should we scope these issues out?
>
> I think historically the single outstanding blocking issue for a stable 3.5
> release is the reconfig feature and security concerns around it (somehow
> addressed in ZOOKEEPER-2014), and the alpha and beta releases were created
> to stabilize that feature.
>
> http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html
>
> So it looks like we are in good shape to release. A couple of things might
> be worth doing before we claim that the quality of 3.5 is on par with 3.4:
>
> * Run Jepsen on 3.5 (3.4 passed the test, for the record):
> https://aphyr.com/posts/291-jepsen-zookeeper
> * Fix all flaky tests on 3.5 (3.4 has few or no flaky tests).
>
>
> On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid>
> wrote:
>
> > Thanks Maoling! That would be a huge help, I appreciate it.
> >
> > Andor
> >
>

Re: Re: ZooKeeper 3.5 blocker issues

Posted by Michael Han <ha...@apache.org>.
I haven't gone through the entire list, but it looks like many of the JIRA
issues listed in this thread, such as ZOOKEEPER-1549 and 2846, also affect
3.4 releases. Should we scope these issues out?

I think historically the single outstanding blocking issue for a stable 3.5
release is the reconfig feature and security concerns around it (somehow
addressed in ZOOKEEPER-2014), and the alpha and beta releases were created
to stabilize that feature.

http://zookeeper-user.578899.n2.nabble.com/Zookeeper-with-SSL-release-date-tt7581744.html

So it looks like we are in good shape to release. A couple of things might
be worth doing before we claim that the quality of 3.5 is on par with 3.4:

* Run Jepsen on 3.5 (3.4 passed the test, for the record):
https://aphyr.com/posts/291-jepsen-zookeeper
* Fix all flaky tests on 3.5 (3.4 has few or no flaky tests).


On Tue, Sep 4, 2018 at 1:48 AM, Andor Molnar <an...@cloudera.com.invalid>
wrote:

> Thanks Maoling! That would be a huge help, I appreciate it.
>
> Andor
>

Re: Re: ZooKeeper 3.5 blocker issues

Posted by Andor Molnar <an...@cloudera.com.INVALID>.
Thanks Maoling! That would be a huge help, I appreciate it.

Andor

Re: Re: ZooKeeper 3.5 blocker issues

Posted by Norbert Kalmar <nk...@cloudera.com.INVALID>.
I agree on the list.

It looks like each one has someone working on it or even a PR available,
except ZOOKEEPER-2846. That looks like a tough one. I will take a look, but
can't promise I will be able to work much on it in September.
So if anyone has an idea, feel free to chip in! :)

Regards,
Norbert
