You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by Meng Xu <xu...@gmail.com> on 2019/01/09 18:45:46 UTC

Question about the leader-based atomic broadcast

Hi,

I have a question about the leader-based atomic broadcast used in ZooKeeper.

According to the Zab paper "A simple totally ordered broadcast
protocol"[1], the protocol has the requirement of reliable delivery:
If a message is delivered to one server, it will be eventually
delivered by all correct servers.

***My question is:
When will the leader delete the message from its FIFO queue?

I assume the leader won't delete the message until the message is
delivered to all servers?

If so, will the leader's FIFO queue keep increasing when there exists
a slow follower (or slow connection), where it takes a long time for
the slow follower to receive a message?

Thank you very much!

[1] https://www.datadoghq.com/pdf/zab.totally-ordered-broadcast-protocol.2008.pdf

Best,

Meng

Re: Question about the leader-based atomic broadcast

Posted by Michael Han <ha...@apache.org>.

First of all, these questions are more related to implementation details
rather than the Zab protocol itself. My answer thus is a reflection of the
current ZK implementation. Zab itself could be implemented differently
though.

>> When will the leader delete the message from its FIFO queue?

A message is materialized as a Proposal in ZK's implementation. A leader
processes a (write) proposal in a similar way like a two phased commit:
first phase is sync phase where leader persists the proposal to local non
volatile storage, and meanwhile leader will ask followers to do the same. A
follower, upon receiving the request from leader, will send acknowledgement
to leader after the proposal persistent finishes. Leader will wait until it
receives acknowledgments from enough number of followers (such that the
number of followers plus leader itself forms a quorum - typically a
majority quorum which is ZK's default quorum implementation), then proceeds
to next phase, which is commit phase. In commit phase, leader will apply
the proposal to the in memory database, and asks followers to do the same.

The proposal is deleted from leader's queue (in implementation, it's called
outstandingProposals) during the commit phase. In other words, after sync
phase, it's safe to delete the proposal, because it's already persist on a
quorum of servers. And because how majority quorum works, it's safe to
recover in crash cases.

>> I assume the leader won't delete the message until the message is
delivered to all servers?

As previously stated, it's safe to delete as long as the message is persist
on a set of followers such that leader + these servers form a quorum.

>> If so, will the leader's FIFO queue keep increasing when there exists a
slow follower

The size of leader's proposal queue depends on both the throughput of the
system, and the workload. If the throughput is low due to one or more slow
followers, but the workload is also light, it's still possible to clear the
proposal queue before the next client comes in.

But in general, yes, under high workload if one or more followers are slow
(but not too slow so they still in quorum instead of getting restarted
because of failed health check), the leader's proposal queue will keep
grow. There is no way to work around this as long as we use majority quorum
and want to maintain the safety property of the system. There are recent
researches such as Flexible Paxos which weakened the quorum intersection
requirement, which might improve the performance in such case, but ZK does
not implement such optimization.

Hope these help.

On Wed, Jan 9, 2019 at 2:03 PM Ted Dunning <te...@gmail.com> wrote:

> All updates to data in Zookeeper are modified to be idempotent before they
> are accepted into the leader's queue. That means that items in the queue
> can be committed in groups and once each group is acknowledged by a quorum
> of servers, it can be deleted from the queue. Any server not in the
> acknowledging quorum can get up to date by duplicating the contents of the
> leader. As with snapshots, this doesn't require updates to be stopped. The
> idempotency of all updates means that the server can duplicate the leader
> even as updates are being applied and then can apply all updates that
> occurred after the start of the concurrence process. If some of the leader
> state was copied before it was affected by an update after the start of
> synchronization, it will be updated in the follower's image when replaying
> very recent events. If it was copied after being updated, then replaying
> recent events will merely write the same value.
>
> The effect of all of this is that there is no penalty for deleting
> transactions immediately after they are committed and acknowledged.
>
>
>
> On Wed, Jan 9, 2019 at 10:55 AM Meng Xu <xu...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a question about the leader-based atomic broadcast used in
> > ZooKeeper.
> >
> > According to the Zab paper "A simple totally ordered broadcast
> > protocol"[1], the protocol has the requirement of reliable delivery:
> > If a message is delivered to one server, it will be eventually
> > delivered by all correct servers.
> >
> > ***My question is:
> > When will the leader delete the message from its FIFO queue?
> >
> > I assume the leader won't delete the message until the message is
> > delivered to all servers?
> >
> > If so, will the leader's FIFO queue keep increasing when there exists
> > a slow follower (or slow connection), where it takes a long time for
> > the slow follower to receive a message?
> >
> > Thank you very much!
> >
> > [1]
> >
> https://www.datadoghq.com/pdf/zab.totally-ordered-broadcast-protocol.2008.pdf
> >
> > Best,
> >
> > Meng
> >
>

Re: Question about the leader-based atomic broadcast

Posted by Ted Dunning <te...@gmail.com>.

All updates to data in Zookeeper are modified to be idempotent before they
are accepted into the leader's queue. That means that items in the queue
can be committed in groups and once each group is acknowledged by a quorum
of servers, it can be deleted from the queue. Any server not in the
acknowledging quorum can get up to date by duplicating the contents of the
leader. As with snapshots, this doesn't require updates to be stopped. The
idempotency of all updates means that the server can duplicate the leader
even as updates are being applied and then can apply all updates that
occurred after the start of the concurrence process. If some of the leader
state was copied before it was affected by an update after the start of
synchronization, it will be updated in the follower's image when replaying
very recent events. If it was copied after being updated, then replaying
recent events will merely write the same value.

The effect of all of this is that there is no penalty for deleting
transactions immediately after they are committed and acknowledged.

On Wed, Jan 9, 2019 at 10:55 AM Meng Xu <xu...@gmail.com> wrote:

> Hi,
>
> I have a question about the leader-based atomic broadcast used in
> ZooKeeper.
>
> According to the Zab paper "A simple totally ordered broadcast
> protocol"[1], the protocol has the requirement of reliable delivery:
> If a message is delivered to one server, it will be eventually
> delivered by all correct servers.
>
> ***My question is:
> When will the leader delete the message from its FIFO queue?
>
> I assume the leader won't delete the message until the message is
> delivered to all servers?
>
> If so, will the leader's FIFO queue keep increasing when there exists
> a slow follower (or slow connection), where it takes a long time for
> the slow follower to receive a message?
>
> Thank you very much!
>
> [1]
> https://www.datadoghq.com/pdf/zab.totally-ordered-broadcast-protocol.2008.pdf
>
> Best,
>
> Meng
>