Posted to dev@zookeeper.apache.org by Michael Han <ha...@apache.org> on 2020/09/02 04:30:52 UTC

Re: May violate the ZAB agreement -- version 3.6.1

I can confirm this is a bug; I now have a unit test that reproduces the issue.
I will submit a pull request soon. Let's move discussion of this topic to
JIRA and GitHub.

On Fri, Aug 28, 2020 at 10:02 PM li xun <27...@qq.com> wrote:

> Hi hanm
>
>
>
> Thanks
>
> This is the issue in jira
> https://issues.apache.org/jira/browse/ZOOKEEPER-3911
>
> ———————————————————————————————————————————————————————
>
> Below are my thoughts
>
> Before a server becomes the real leader, the followers need to
> synchronize data with it. With a large data set this can be very slow,
> leaving the server temporarily unavailable. Could the leader communicate
> with the followers before synchronization starts, calculate the maximum
> zxid_n [reference 1] among its own proposals that have reached a quorum,
> and then immediately serve external clients, exposing only data with
> zxid <= zxid_n? (A webapp, for example, could then access the leader, which
> reduces the time that zk is inaccessible.) There may be two solutions for
> the follower:
> 1) Since the follower has not synchronized the data, external webapp access
> to it is temporarily not allowed, so that even if the data the follower
> needs to synchronize is large, the external service provided by zk is not
> affected. Disadvantages: the access load is concentrated on the leader, so
> during this time the cluster is not really distributed and is prone to a
> single point of failure.
> 2) The follower immediately provides service to the outside world. But
> since the follower has not synchronized with the leader, if it has just
> restarted it cannot confirm that the largest zxid_x it currently holds has
> reached a quorum. The follower may need one additional inquiry to confirm
> whether zxid_x has reached a quorum (or keep a separate flag per zxid to
> indicate whether it has reached a quorum). The follower then serves
> external clients, exposing only data with zxid <= zxid_x.
> Disadvantages: complex implementation and increased communication volume.
>
>
> Reference 1: from "Paxos Made Simple", Leslie Lamport, 01 Nov 2001
> "
> 2.3 Learning a Chosen Value
> To learn that a value has been chosen, a learner must find out that a
> proposal has been accepted by a majority of acceptors. The obvious algorithm
> is to have each acceptor, whenever it accepts a proposal, respond to all
> learners, sending them the proposal. This allows learners to find out about
> a chosen value as soon as possible, but it requires each acceptor to
> respond to each learner—a number of responses equal to the product of the
> number of acceptors and the number of learners.
> The assumption of non-Byzantine failures makes it easy for one learner to
> find out from another learner that a value has been accepted. We can have
> the acceptors respond with their acceptances to a distinguished learner,
> which in turn informs the other learners when a value has been chosen. This
> approach requires an extra round for all the learners to discover the
> chosen value. It is also less reliable, since the distinguished learner
> could fail. But it requires a number of responses equal only to the sum of
> the number of acceptors and the number of learners.
> More generally, the acceptors could respond with their acceptances to some
> set of distinguished learners, each of which can then inform all the
> learners when a value has been chosen. Using a larger set of distinguished
> learners provides greater reliability at the cost of greater communication
> complexity.
> Because of message loss, a value could be chosen with no learner ever
> finding out. The learner could ask the acceptors what proposals they have
> accepted, but failure of an acceptor could make it impossible to know
> whether or not a majority had accepted a particular proposal. In that
> case, learners will find out what value is chosen only when a new proposal
> is chosen. If a learner needs to know whether a value has been chosen, it
> can have a proposer issue a proposal, using the algorithm described above.
> "
>
>
>
> Best,
> li xun
>
>
>
> On Aug 29, 2020, at 10:59, Michael Han <ha...@apache.org> wrote:
>
> Hi Xun,
>
> I think this is a bug; your test case looks sound to me. Would you mind
> creating a JIRA for this issue?
>
> Followers should not ACK NEWLEADER without first ACKing every transaction
> from the DIFF sync. To ACK every transaction, a follower either persists
> the transaction in its log or takes a snapshot before sending the ACK of
> NEWLEADER (which we did before ZOOKEEPER-2678, where the snapshot
> optimization was introduced).
>
> A potential fix I have in mind is to make sure we persist all DIFF sync
> proposals from the leader (similar to what we already do for proposals
> arriving between NEWLEADER and UPTODATE). By doing so, when the leader
> receives the NEWLEADER ACK from a quorum, it is guaranteed that every
> transaction the leader DIFF-synced to the followers is quorum committed.
> Thus there will be no inconsistent views moving forward. Alternatively, we
> can take a snapshot before ACKing NEWLEADER, but that would be a big
> performance hit for big data trees.
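
The ordering described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class DiffSyncSketch, its methods, and the persistBeforeAck switch are hypothetical names made up for illustration and are not ZooKeeper's Learner code. The follower starts with zxids 1 and 2 on disk, receives zxid 3 through the DIFF sync, ACKs NEWLEADER, and then crashes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a follower during DIFF sync; none of this is ZooKeeper's real code. */
class DiffSyncSketch {
    /** Transactions visible in the follower's in-memory state. */
    private final List<Long> inMemory = new ArrayList<>();
    /** Transactions that have reached the on-disk transaction log. */
    private final List<Long> durableLog = new ArrayList<>();
    /** Hypothetical switch: persist DIFF proposals before ACKing NEWLEADER. */
    private final boolean persistBeforeAck;

    DiffSyncSketch(boolean persistBeforeAck, long... alreadyDurable) {
        this.persistBeforeAck = persistBeforeAck;
        for (long zxid : alreadyDurable) {
            inMemory.add(zxid);
            durableLog.add(zxid);
        }
    }

    /** A proposal received from the leader as part of the DIFF sync. */
    void onDiffProposal(long zxid) {
        inMemory.add(zxid);
    }

    /** The follower handles NEWLEADER and sends its ACK. */
    void ackNewLeader() {
        if (persistBeforeAck) {
            // Proposed ordering: everything received via DIFF is logged before the ACK.
            for (long zxid : inMemory) {
                if (!durableLog.contains(zxid)) {
                    durableLog.add(zxid);
                }
            }
        }
        // Otherwise the ACK goes out while the DIFF proposals are memory-only.
    }

    /** A crash discards memory; a restart reloads only what was durable. */
    void crashAndRestart() {
        inMemory.clear();
        inMemory.addAll(durableLog);
    }

    long lastZxid() {
        return inMemory.isEmpty() ? 0L : inMemory.get(inMemory.size() - 1);
    }

    /** Runs the scenario and returns the follower's last zxid after the restart. */
    static long lastZxidAfterCrash(boolean persistBeforeAck) {
        DiffSyncSketch follower = new DiffSyncSketch(persistBeforeAck, 1L, 2L);
        follower.onDiffProposal(3L); // zxid_n, sent by the new leader via DIFF
        follower.ackNewLeader();
        follower.crashAndRestart();
        return follower.lastZxid();
    }

    public static void main(String[] args) {
        System.out.println("ACK without persisting: " + lastZxidAfterCrash(false));
        System.out.println("persist before ACK:     " + lastZxidAfterCrash(true));
    }
}
```

With these toy values, the follower that ACKs without persisting comes back at zxid 2, so a leader elected from it would not have zxid 3. The follower that persists first comes back at zxid 3.
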
>
> I am also interested to hear what others think about this.
>
> On Fri, Aug 28, 2020 at 12:20 AM li xun <27...@qq.com> wrote:
>
> There is an example at the link below. Does it make clear what I mean?
>
>
>
>
> https://drive.google.com/file/d/1jy3kkVQTDYGb4iV1RaPMBbEWLZZltTQG/view?usp=sharing
>
> Since version 3.4, the leader and the quorum of followers do not flush
> the data to files immediately when synchronization completes, so for a
> short time the data is not persisted. During this window the zk server can
> already serve external clients, such as a webapp. If a failure occurs in
> this window, phantom reads may occur.
>
>
> On Aug 28, 2020, at 14:51, Justin Ling Mao <ma...@sina.com> wrote:
>
> @李珣 The situation you describe may reflect a conceptual misunderstanding
> of how the consensus protocol works:
>
> ---> Since the data of the follower when the follower uses the DIFF method
> to synchronize with the leader is still in the memory, it has not had time
> to persist
>
> 1. The write path is: write to the transaction log (WAL) first, then,
> after reaching consensus, apply to memory, not the other way around.
>
> ---> but at this time, the latest zxid_n of the leader has not been
> supported by the quorum of the follower. At this time, if a client connects
> to the leader and sees zxid_n,
>
> 2. If a write has not been supported by the quorum, it is not safe to apply
> it to the state machine, and the client is not able to see this write.
>
> I guess that your question may be: how does the system handle the
> uncommitted logs when the leader changes?
>
>
>
>
> ----- Original Message -----
> From: Ted Dunning <te...@gmail.com>
> To: dev@zookeeper.apache.org
> Subject: Re: May violate the ZAB agreement -- version 3.6.1
> Date: 2020-08-28 01:25
>
> How is it that participant A would have a later zxid than the leader?
> In particular, it seems to me that it should be impossible to have these
> two facts be true:
> 1) a transaction has been committed with zxid = z_0. This implies that a
> quorum of the cluster has accepted this transaction and it has been
> committed.
> 2) a new leader election nominates a leader with latest zxid < z_0.
> My reasoning is that any new leader election has to involve a quorum and at
> least a sufficient number of that quorum must have accepted zxid >= z_0 and
> therefore would refuse to be part of the quorum (this is a contradiction).
> Thus, no leader could be elected with zxid < z_0 if fact (1) is true.
> What you are describing seems to require both of these facts.
> Perhaps I am missing something about your suggested scenario. Could you
> describe what you are thinking in more detail?
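
The counting step behind this argument (any two majorities of the same ensemble share at least one server) can be checked by brute force. This is a minimal sketch: the class and method names are hypothetical, and it covers only the set arithmetic. It assumes that a server which accepted a transaction still has it after a restart, which is the assumption under discussion in this thread.

```java
/** Brute-force check that any two majorities of an n-server ensemble overlap. */
class QuorumIntersectionSketch {
    /** members is a bitmask of server ids; a majority has more than n/2 bits set. */
    static boolean isMajority(int members, int n) {
        return Integer.bitCount(members) > n / 2;
    }

    /** Returns true if every pair of majorities shares at least one server. */
    static boolean allMajoritiesIntersect(int n) {
        int subsets = 1 << n;
        for (int ackSet = 0; ackSet < subsets; ackSet++) {
            if (!isMajority(ackSet, n)) {
                continue;
            }
            for (int electionSet = 0; electionSet < subsets; electionSet++) {
                if (!isMajority(electionSet, n)) {
                    continue;
                }
                if ((ackSet & electionSet) == 0) {
                    return false; // two disjoint majorities would break the argument
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println("n=" + n + " all majorities intersect: " + allMajoritiesIntersect(n));
        }
    }
}
```

Here ackSet stands for the servers that accepted zxid z_0 and electionSet for the servers taking part in the next election. Because the two sets always overlap, at least one voter has seen zxid >= z_0, provided its copy of that transaction was durable.
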
> On Thu, Aug 27, 2020 at 2:08 AM 李珣 <27...@qq.com> wrote:
>
> version 3.6.1
> org.apache.zookeeper.server.quorum.Learner.java line:605
> Suppose the following situation.
> zxid_n is the largest zxid of Participant A (the leader, which has just
> recovered from downtime). zxid_n has not been acknowledged by a quorum.
> Assume Participant A is elected as the Leader, and a follower then uses
> DIFF to synchronize data with the Leader. After sending UPTODATE, the
> leader can already serve external clients, but at this time the latest
> zxid_n of the leader has not been supported by a quorum of followers.
> Suppose a client now connects to the leader and sees zxid_n, and then both
> the leader and the follower go down. For some reason the leader cannot be
> restarted, while the follower starts normally. A new leader can then only
> be elected from the followers. Since the data the follower received through
> the DIFF sync was still in memory and had not yet been persisted, the newly
> elected leader does not have the data of zxid_n. But zxid_n has already
> been seen by the client on the old leader, so there will be inconsistencies
> in the data view.
> Is the above situation possible?
>
>
>
>
>