Posted to dev@zookeeper.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2018/09/11 23:55:00 UTC

[jira] [Commented] (ZOOKEEPER-3144) Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611384#comment-16611384 ] 

Hadoop QA commented on ZOOKEEPER-3144:
--------------------------------------

-1 overall.  GitHub Pull Request  Build
      

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2153//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2153//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2153//console

This message is automatically generated.

> Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot
> ---------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3144
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3144
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.5.4, 3.6.0, 3.4.13
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.6.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Found this issue recently while investigating another production issue: the current code updates lastProcessedZxid before it actually applies the global session change to the DataTree.
>  
> If a snapshot is in progress and there is a short stall between setting lastProcessedZxid and updating the session in the DataTree (for example, due to a thread context switch or GC), the snapshot can record a lastProcessedZxid that is ahead of its contents, i.e. one whose txn's global session change (add or remove) is not included in the snapshot.
>  
> When this snapshot and its txns are reloaded, replay starts from lastProcessedZxid + 1, so the global session is never created, which can leave the data inconsistent.
>  
> When global sessions are inconsistent, ephemeral nodes can become inconsistent as well: the leader deletes locally all ephemerals that have no associated global session, and any server that does a snapshot sync with it will also lack those ephemerals, while the other servers still have them. The affected session will also run into global session renewal problems.
>  
> The same issue exists for the closeSession txn. We need to move the global session update logic before processTxn, so that lastProcessedZxid never gets ahead of the global session change.
>  
>  
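The ordering problem described in the quoted report can be sketched with a toy model. This is not ZooKeeper code: ToyState, ToySnapshot, applyCreateSession and restore are hypothetical names made up for illustration, and the snapshot is modelled as one atomic copy taken in the gap between the two steps, which is a simplification of a real fuzzy snapshot. The sketch only shows why setting the zxid before the session update can lose a session on reload, and why reversing the order avoids that as long as replaying the txn is idempotent.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only; none of these types exist in ZooKeeper.
class FuzzySnapshotSketch {

    /** Hypothetical stand-in for server state (zxid plus global sessions). */
    static class ToyState {
        long lastProcessedZxid = 0;
        final Set<Long> globalSessions = new TreeSet<>();
    }

    /** What a snapshot records: the zxid it claims to cover and the sessions it saw. */
    static class ToySnapshot {
        final long zxid;
        final Set<Long> sessions;

        ToySnapshot(ToyState state) {
            this.zxid = state.lastProcessedZxid;
            this.sessions = new TreeSet<>(state.globalSessions);
        }
    }

    /**
     * Applies a createSession txn in two steps and takes a snapshot in the
     * gap between them, simulating a stall (context switch, GC pause).
     * sessionFirst = false models the reported ordering (zxid, then session);
     * sessionFirst = true models the proposed ordering (session, then zxid).
     */
    static ToySnapshot applyCreateSession(ToyState state, long zxid,
                                          long sessionId, boolean sessionFirst) {
        ToySnapshot snapshot;
        if (sessionFirst) {
            state.globalSessions.add(sessionId);
            snapshot = new ToySnapshot(state);   // stall happens here
            state.lastProcessedZxid = zxid;
        } else {
            state.lastProcessedZxid = zxid;
            snapshot = new ToySnapshot(state);   // stall happens here
            state.globalSessions.add(sessionId);
        }
        return snapshot;
    }

    /** Restores from a snapshot, replaying the txn only if it is newer than the snapshot zxid. */
    static Set<Long> restore(ToySnapshot snapshot, long txnZxid, long sessionId) {
        Set<Long> sessions = new TreeSet<>(snapshot.sessions);
        if (txnZxid > snapshot.zxid) {
            sessions.add(sessionId);             // idempotent replay
        }
        return sessions;
    }

    public static void main(String[] args) {
        long zxid = 1L;
        long sessionId = 0x1234L;

        ToySnapshot zxidFirst = applyCreateSession(new ToyState(), zxid, sessionId, false);
        ToySnapshot sessionFirst = applyCreateSession(new ToyState(), zxid, sessionId, true);

        System.out.println("zxid first, session restored: "
                + restore(zxidFirst, zxid, sessionId).contains(sessionId));
        System.out.println("session first, session restored: "
                + restore(sessionFirst, zxid, sessionId).contains(sessionId));
    }
}
```

In the zxid-first case the snapshot claims to cover the txn but does not contain the session, so replay skips the txn and the session is lost. In the session-first case the snapshot either already contains the session or carries an older zxid, so replay adds it again harmlessly.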



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)