Posted to dev@zookeeper.apache.org by lvfangmin <gi...@git.apache.org> on 2018/06/25 23:31:25 UTC

[GitHub] zookeeper issue #353: [ZOOKEEPER-2886] Permanent session moved error in mult...

Github user lvfangmin commented on the issue:

    https://github.com/apache/zookeeper/pull/353
  
    @anmolnar thanks for reviewing. The testNoLogBeforeLeaderEstablishment test was introduced by mistake during a rebase; sorry for the confusion. I've fixed the other test so that it catches the issue I'm trying to reproduce, by removing the zk.dontReconnect() statement.
    
    Here is the problem I'm trying to address in this diff:
    
    1. The client tries to renew session A on server S1.
    2. S1 is slow (e.g. a full GC, or high network delay due to packet loss) to send the revalidate request to the leader.
    3. The client times out renewing session A on S1 and tries to connect to S2.
    4. S2 is faster than S1; it revalidates the session on the leader and now owns the session.
    5. S1's revalidate request finally reaches the leader, and the leader updates the owner to S1.
    6. From now on, requests from this client always get a session moved error, although S2 is the server that rightfully owns the session.
    
    The server needs to close the session in this case so that the client can reconnect, which resolves this corner case.
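
    The six steps above can be replayed with a toy model. This is only a sketch: the Leader class, its revalidate() and isOwner() methods, and the session id are hypothetical stand-ins, not ZooKeeper's actual classes. It assumes nothing more than "the last revalidate to reach the leader sets the owner", as described in step 5.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionOwnerRace {
    /** Hypothetical stand-in for the leader's session-owner table (not ZooKeeper code). */
    static class Leader {
        private final Map<Long, String> owners = new HashMap<>();

        // Assumption: the last revalidate request to arrive overwrites the owner.
        void revalidate(long sessionId, String server) {
            owners.put(sessionId, server);
        }

        // A request is accepted only if it arrives via the recorded owner;
        // otherwise the client would see a session moved error.
        boolean isOwner(long sessionId, String server) {
            return server.equals(owners.get(sessionId));
        }
    }

    /** Replays steps 4 to 6 and reports whether a request via S2 is accepted. */
    static boolean requestViaS2AcceptedAfterRace() {
        long sessionA = 1L;               // illustrative session id
        Leader leader = new Leader();
        leader.revalidate(sessionA, "S2"); // step 4: S2 is faster and becomes owner
        leader.revalidate(sessionA, "S1"); // step 5: S1's delayed revalidate arrives
        return leader.isOwner(sessionA, "S2"); // step 6: client is connected to S2
    }

    public static void main(String[] args) {
        System.out.println("request via S2 accepted: " + requestViaS2AcceptedAfterRace());
    }
}
```

    In this model the client stays connected to S2 while the leader records S1 as owner, so every request is rejected until something forces the client to reconnect and revalidate again.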
    
    Jira ZOOKEEPER-710 solved this for the non-multi-op cases, but a client that only sends multi-ops can still hit the problem; this diff addresses that case.


---