Posted to dev@zookeeper.apache.org by "Akshay Chander (JIRA)" <ji...@apache.org> on 2013/10/17 20:18:45 UTC

[jira] [Commented] (ZOOKEEPER-1674) There is no need to clear & load the database across leader election

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798219#comment-13798219 ] 

Akshay Chander commented on ZOOKEEPER-1674:
-------------------------------------------

I am working with Thawan on this feature. I'd appreciate comments and suggestions for the analysis done so far.

Retaining the database across leader election should improve the recovery time after leader election. In order to support such a feature, the following changes will be required to ensure that the existing behavior is maintained. 
 
1) Anything that has reached the PrepRequestProcessor should make it to the SyncRequestProcessor. Similarly, anything that has reached the commitProcessor should eventually reach the FinalRequestProcessor. To maintain this invariant:
 
a) Currently, we drop the database and reload from disk (snapshot + txnlog). We can effectively mimic this behavior in one of two ways.
    i) We retain outstandingProposals and toBeApplied (in the case of leader) or pendingTxns (in the case of followers) across the leader election.
           We would then apply the txns in these data structures to the data tree before calling getInitLastLoggedZxid in lookForLeader(). This ensures that the lastSeenZxid sent by the participant during leader election remains the same as it was before this feature.
    ii) Alternatively, we could apply these txns to the data tree during the shutdown phase. This way, we don't need to do the extra work of retaining these data structures across leader elections.
 
b) During shutdown, we should ensure that all appends to the txnlog have actually been flushed to the disk.
 
c) By retaining the zkDataBase, we will also be retaining the sessionsWithTimeouts, which is a listing of global sessions. We need to ensure that this listing is clean after the leader election.
        Leader: If there is an upgrade request for a session (from local to global), we add it to the global session tracker. Since this is going to persist across leader election, we need to ensure that the txn corresponding to this createSession is present in at least the txnlog.
            Therefore we need to ensure that requests that are in the PrepRequestProcessor make their way to the SyncRequestProcessor even if there is a shutdown at any point in between.
 
d) Ensure that anything in the FinalRequestProcessor gets applied to the Data Tree.
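 
To make option (ii) concrete, here is a minimal sketch of draining the pending txns into the retained data tree at shutdown. It is not ZooKeeper code: SketchTxn, RetainedDbSketch and all of their methods are hypothetical names invented for illustration, and the sketch assumes every txn in the pending queue has already been appended to the txnlog, so applying it yields the same last zxid a reload from disk would.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a logged transaction; not a ZooKeeper class. */
final class SketchTxn {
    final long zxid;
    final String path;
    final String data;

    SketchTxn(long zxid, String path, String data) {
        this.zxid = zxid;
        this.path = path;
        this.data = data;
    }
}

/** Hypothetical in-memory database that is kept across a leader election. */
final class RetainedDbSketch {
    private final Map<String, String> tree = new HashMap<>();
    // Assumption: these txns are already in the txnlog but not yet applied.
    private final Deque<SketchTxn> pending = new ArrayDeque<>();
    private long lastProcessedZxid = 0L;

    /** Record a txn that has been logged but not yet committed. */
    void logged(SketchTxn txn) {
        pending.addLast(txn);
    }

    /** Apply the oldest pending txn; returns false if nothing is pending. */
    boolean commitNext() {
        SketchTxn txn = pending.pollFirst();
        if (txn == null) {
            return false;
        }
        tree.put(txn.path, txn.data);
        lastProcessedZxid = txn.zxid;
        return true;
    }

    /** Option (ii): on shutdown, apply everything still pending, in order. */
    void drainOnShutdown() {
        while (commitNext()) {
            // keep applying until the queue is empty
        }
    }

    long lastProcessedZxid() {
        return lastProcessedZxid;
    }

    String get(String path) {
        return tree.get(path);
    }

    public static void main(String[] args) {
        RetainedDbSketch db = new RetainedDbSketch();
        db.logged(new SketchTxn(1L, "/a", "x"));
        db.logged(new SketchTxn(2L, "/b", "y"));
        db.logged(new SketchTxn(3L, "/c", "z"));
        db.commitNext();       // only zxid 1 is committed before the shutdown
        db.drainOnShutdown();  // zxids 2 and 3 are applied during shutdown
        System.out.println("last zxid after drain: " + db.lastProcessedZxid());
    }
}
```

The point of the sketch is only the ordering: the drain has to finish before the zxid is read for the next election, otherwise the participant would advertise an older zxid than it does today.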
 
2) Don't take a dirty snapshot. We don't want txns that haven't been accepted by a majority of the quorum to be part of any snapshot. Currently, we take snapshots on shutdown and in loadData, which we will stop doing.
 
3) In followers, there is a bug in the local session code. When there is an upgrade request, we currently remove the session from the local session tracker and add it to globalSessionWithTimeouts in the local request processor itself (checkUpgradeSession).
We probably should not add it to the global sessions just yet and let it be done in the final request processor.
 
4) Another small bug: In learnerSessionTracker::touchSession, currently if a session is not in the localSessionTracker and is not a global session, then we return false. This should not be the case any longer.
    This is because we may have removed the session from the local session tracker for an upgrade request. So just add it to the touchTable and return true.
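 
Points 3 and 4 can be sketched together as follows. This is a toy model, not the actual tracker: SessionTrackerSketch and its method names (startUpgrade, commitGlobalSession, and so on) are hypothetical, and timeouts are plain ints. It only shows the intended behavior: the upgrade removes the session from the local table without making it global, the final stage makes it global once the createSession txn is committed, and touchSession accepts the session in between.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a learner-side session tracker; not ZooKeeper code. */
final class SessionTrackerSketch {
    private final Map<Long, Integer> localSessions = new HashMap<>();
    private final Map<Long, Integer> globalSessions = new HashMap<>();
    private final Map<Long, Integer> touchTable = new HashMap<>();

    void addLocalSession(long id, int timeout) {
        localSessions.put(id, timeout);
    }

    /**
     * Point 3: on an upgrade request, only remove the session from the
     * local table. Returns its timeout, or -1 if it was not a local session.
     */
    int startUpgrade(long id) {
        Integer timeout = localSessions.remove(id);
        return timeout == null ? -1 : timeout;
    }

    /** Called from the final stage once the createSession txn is committed. */
    void commitGlobalSession(long id, int timeout) {
        globalSessions.put(id, timeout);
    }

    boolean isLocal(long id) {
        return localSessions.containsKey(id);
    }

    boolean isGlobal(long id) {
        return globalSessions.containsKey(id);
    }

    boolean wasTouched(long id) {
        return touchTable.containsKey(id);
    }

    /**
     * Point 4: a session that is neither local nor global may be in the
     * middle of an upgrade, so record the touch and return true.
     */
    boolean touchSession(long id, int timeout) {
        if (localSessions.containsKey(id)) {
            localSessions.put(id, timeout); // refresh the local session
            return true;
        }
        touchTable.put(id, timeout);
        return true;
    }

    public static void main(String[] args) {
        SessionTrackerSketch tracker = new SessionTrackerSketch();
        tracker.addLocalSession(7L, 3000);
        int timeout = tracker.startUpgrade(7L);
        // The session is in neither table here, yet the touch is accepted.
        System.out.println("touch during upgrade: " + tracker.touchSession(7L, timeout));
        tracker.commitGlobalSession(7L, timeout);
        System.out.println("global after commit: " + tracker.isGlobal(7L));
    }
}
```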
 
This analysis was done on our internal branch, which is based off 3.4. Therefore, we haven't investigated how this feature would be affected by the Dynamic Reconfiguration feature.

> There is no need to clear & load the database across leader election
> --------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1674
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1674
>             Project: ZooKeeper
>          Issue Type: Improvement
>            Reporter: Jacky007
>
> It is interesting to notice this piece of code in QuorumPeer.java:
>  /* ZKDatabase is a top level member of quorumpeer 
>      * which will be used in all the zookeeperservers
>      * instantiated later. Also, it is created once on 
>      * bootup and only thrown away in case of a truncate
>      * message from the leader
>      */
>     private ZKDatabase zkDb;
> It is introduced by ZOOKEEPER-596. Now, we just drop the database every leader election.
> We can keep it safely with ZOOKEEPER-1549.


