You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by Karolos Antoniadis <ka...@gmail.com> on 2019/07/30 00:28:20 UTC

Leader election logging during reconfiguration

Hi ZooKeeper developers,

ZooKeeper seems to be logging a "*LEADER ELECTION TOOK*" message even
though no leader election takes place during a reconfiguration.

This can be reproduced by following these steps:
1) start a ZooKeeper cluster (e.g., 3 participants)
2) start a client that connects to some follower
3) perform a *reconfig* operation that removes the leader from the cluster

After the reconfiguration takes place, we can see that the log files of the
remaining participants contain a "*LEADER ELECTION TOOK*" message. For
example, a line that contains

*2019-07-29 23:07:38,518 [myid:2] - INFO
 [QuorumPeer[myid=2](plain=0.0.0.0:2792)(secure=disabled):Follower@75] -
FOLLOWING - LEADER ELECTION TOOK - 57 MS*

However, no leader election took place. With that, I mean that no server
went *LOOKING *and then started voting and sending notifications to other
participants as would be in a normal leader election.
It seems, that before the *reconfig *is committed, the participant that is
going to be the next leader is already decided (see here:
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L865
).

I think the above issue raises the following questions:
- Should we avoid logging LEADER ELECTION messages altogether in such cases?
- Or, should there be some logging for the time it took for the
reconfiguration (e.g., the time between a participant gets a *reconfig*
operation till the operation is committed)? Would such a time value be
useful?

Best,
Karolos

Re: Leader election logging during reconfiguration

Posted by Michael Han <ha...@apache.org>.

>> we should measure the total time more accurately

+1 - it would be good to have a new metric to measure reconfiguration time,
and leaving existing LE time metric dedicated to measure the conventional
FLE time. Mixing both (as of today) will provide some confusing insights on
how long the conventional FLE actually took.

On Mon, Jul 29, 2019 at 7:13 PM Alexander Shraer <sh...@gmail.com> wrote:

> Please see comments inline.
>
> Thanks,
> Alex
>
> On Mon, Jul 29, 2019 at 5:29 PM Karolos Antoniadis <ka...@gmail.com>
> wrote:
>
> > Hi ZooKeeper developers,
> >
> > ZooKeeper seems to be logging a "*LEADER ELECTION TOOK*" message even
> > though no leader election takes place during a reconfiguration.
> >
> > This can be reproduced by following these steps:
> > 1) start a ZooKeeper cluster (e.g., 3 participants)
> > 2) start a client that connects to some follower
> > 3) perform a *reconfig* operation that removes the leader from the
> cluster
> >
> > After the reconfiguration takes place, we can see that the log files of
> the
> > remaining participants contain a "*LEADER ELECTION TOOK*" message. For
> > example, a line that contains
> >
> > *2019-07-29 23:07:38,518 [myid:2] - INFO
> >  [QuorumPeer[myid=2](plain=0.0.0.0:2792)(secure=disabled):Follower@75] -
> > FOLLOWING - LEADER ELECTION TOOK - 57 MS*
> >
> > However, no leader election took place. With that, I mean that no server
> > went *LOOKING *and then started voting and sending notifications to other
> > participants as would be in a normal leader election.
> > It seems, that before the *reconfig *is committed, the participant that
> is
> > going to be the next leader is already decided (see here:
> >
> >
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L865
> > ).
> >
> > I think the above issue raises the following questions:
> > - Should we avoid logging LEADER ELECTION messages altogether in such
> > cases?
> >
>
> In the specific scenario you described, the leader has changed, but our
> heuristic for choosing the leader apparently worked and a new leader could
> be elected without running the full election.
> Notice that we could be unlucky and the designated leader could be offline,
> and then we'll fall back on election. It's useful to know how much time it
> takes to start following the new leader.
>
>
> > - Or, should there be some logging for the time it took for the
> > reconfiguration (e.g., the time between a participant gets a *reconfig*
> > operation till the operation is committed)? Would such a time value be
> > useful?
> >
>
> IIRC the LEADER ELECTION message is used for this purpose. if you just look
> on the time to commit the reconfig operation, you won't
> account for the work that happens when the commit message is received, such
> as leader re-election, role-change (follower->observer conversion and such)
> etc which is what takes most of the time.
> Committing a reconfig operation is usually not much more expensive than
> committing a normal operation. But perhaps you're right that we should
> measure the total time more accurately. Would you
> like to open a Jira and perhaps take a stab at improving this ?
>
> >
> > Best,
> > Karolos
> >
>

Re: Leader election logging during reconfiguration

Posted by Alexander Shraer <sh...@gmail.com>.

Please see comments inline.

Thanks,
Alex

On Mon, Jul 29, 2019 at 5:29 PM Karolos Antoniadis <ka...@gmail.com>
wrote:

> Hi ZooKeeper developers,
>
> ZooKeeper seems to be logging a "*LEADER ELECTION TOOK*" message even
> though no leader election takes place during a reconfiguration.
>
> This can be reproduced by following these steps:
> 1) start a ZooKeeper cluster (e.g., 3 participants)
> 2) start a client that connects to some follower
> 3) perform a *reconfig* operation that removes the leader from the cluster
>
> After the reconfiguration takes place, we can see that the log files of the
> remaining participants contain a "*LEADER ELECTION TOOK*" message. For
> example, a line that contains
>
> *2019-07-29 23:07:38,518 [myid:2] - INFO
>  [QuorumPeer[myid=2](plain=0.0.0.0:2792)(secure=disabled):Follower@75] -
> FOLLOWING - LEADER ELECTION TOOK - 57 MS*
>
> However, no leader election took place. With that, I mean that no server
> went *LOOKING *and then started voting and sending notifications to other
> participants as would be in a normal leader election.
> It seems, that before the *reconfig *is committed, the participant that is
> going to be the next leader is already decided (see here:
>
> https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L865
> ).
>
> I think the above issue raises the following questions:
> - Should we avoid logging LEADER ELECTION messages altogether in such
> cases?
>

In the specific scenario you described, the leader has changed, but our
heuristic for choosing the leader apparently worked and a new leader could
be elected without running the full election.
Notice that we could be unlucky and the designated leader could be offline,
and then we'll fall back on election. It's useful to know how much time it
takes to start following the new leader.

> - Or, should there be some logging for the time it took for the
> reconfiguration (e.g., the time between a participant gets a *reconfig*
> operation till the operation is committed)? Would such a time value be
> useful?
>

IIRC the LEADER ELECTION message is used for this purpose. if you just look
on the time to commit the reconfig operation, you won't
account for the work that happens when the commit message is received, such
as leader re-election, role-change (follower->observer conversion and such)
etc which is what takes most of the time.
Committing a reconfig operation is usually not much more expensive than
committing a normal operation. But perhaps you're right that we should
measure the total time more accurately. Would you
like to open a Jira and perhaps take a stab at improving this ?

>
> Best,
> Karolos
>