Posted to dev@zookeeper.apache.org by Vishal Kher <vi...@gmail.com> on 2011/06/06 19:11:05 UTC

Potential bug with Leader.TRUNC?

Hi,

In LearnerHandler, the code that sends TRUNC to the follower is shown below:

    if (proposals.size() != 0) {                    // <-- guard in question
        if ((maxCommittedLog >= peerLastZxid)
                && (minCommittedLog <= peerLastZxid)) {
            packetToSend = Leader.DIFF;
            zxidToSend = maxCommittedLog;
            for (Proposal propose: proposals) {
                if (propose.packet.getZxid() > peerLastZxid) {
                    queuePacket(propose.packet);
                    QuorumPacket qcommit = new QuorumPacket(Leader.COMMIT,
                            propose.packet.getZxid(), null, null);
                    queuePacket(qcommit);
                }
            }
        } else if (peerLastZxid > maxCommittedLog) {
            packetToSend = Leader.TRUNC;            // <-- TRUNC is set only here
            zxidToSend = maxCommittedLog;
            updates = zxidToSend;
        }
    } else {
        // just let the state transfer happen
    }


Why does it send a TRUNC only when the list of in-memory proposals is
non-empty? Suppose that, after a new leader election, the leader's
transaction log is empty (i.e., no proposals). If an old leader then tries
to join with a higher zxid recorded in its transaction log, it won't
truncate that log. I think that, with the right combination of transaction
log and snapshot zxids, we might see this problem.

Am I misreading the code?

Thanks,
-Vishal

Re: Potential bug with Leader.TRUNC?

Posted by "Fournier, Camille F. [Tech]" <Ca...@gs.com>.
Well, the follower would get a full SNAP in that case, leaving its log in a weird state. It would be useful, though, to list the series of events that would cause such a scenario within the bounds of otherwise normal operation (i.e., all machine logs were created by members of this cluster during the last round of healthy operation). Should this ever happen at all?

C
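
For readers following the thread, here is a minimal sketch of the decision being discussed. It is a simplified model, not the actual LearnerHandler code: the class SyncChoice, the enum Packet, and the method choose are hypothetical names, and the SNAP default is an assumption taken from the reply above rather than from the quoted snippet. The zxid values in main are made up for illustration.

```java
// Hypothetical, simplified model of the branch quoted in this thread.
// Names (SyncChoice, Packet, choose) are illustrative, not ZooKeeper APIs.
class SyncChoice {
    enum Packet { DIFF, TRUNC, SNAP }

    static Packet choose(int proposalCount, long minCommittedLog,
                         long maxCommittedLog, long peerLastZxid) {
        // Assumption: a full snapshot is the fallback when no branch matches.
        Packet packetToSend = Packet.SNAP;
        if (proposalCount != 0) {
            if (maxCommittedLog >= peerLastZxid
                    && minCommittedLog <= peerLastZxid) {
                packetToSend = Packet.DIFF;   // peer is within the committed window
            } else if (peerLastZxid > maxCommittedLog) {
                packetToSend = Packet.TRUNC;  // peer is ahead of the leader
            }
        }
        return packetToSend;
    }

    public static void main(String[] args) {
        // Leader has no in-memory proposals; peer is ahead: falls through to SNAP.
        System.out.println(choose(0, 0L, 0L, 0x200000005L));
        // Leader has proposals; peer is ahead of maxCommittedLog: TRUNC.
        System.out.println(choose(3, 0x300000001L, 0x300000003L, 0x300000005L));
    }
}
```

In this model, a peer whose last zxid is ahead of the leader is told to truncate only if the leader holds at least one proposal in memory; with an empty proposal list the same peer falls through to the snapshot path, which is the case raised in the original question.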

