You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Botond Hejj (Created) (JIRA)" <ji...@apache.org> on 2012/03/12 10:14:38 UTC

[jira] [Created] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

java client watches inconsistently triggered on reconnect
---------------------------------------------------------

                 Key: ZOOKEEPER-1412
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client
    Affects Versions: 3.4.3, 3.4.2, 3.4.1, 3.4.0, 3.3.4, 3.3.3
            Reporter: Botond Hejj
             Fix For: 3.3.5, 3.4.4


I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.

After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
This is not always the case in the java client.

I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Benjamin Reed (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228483#comment-13228483 ] 

Benjamin Reed commented on ZOOKEEPER-1412:
------------------------------------------

i agree with botond that we should also fix the java client to update lastzxid on every message. it might be worth doing in a separate jira.
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227960#comment-13227960 ] 

Hadoop QA commented on ZOOKEEPER-1412:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518091/ZOOKEEPER-1412_trunk.patch
  against trunk revision 1297740.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/990//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/990//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/990//console

This message is automatically generated.
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-1412:
---------------------------------------

    Assignee: Patrick Hunt
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>             Fix For: 3.3.5, 3.4.4
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227692#comment-13227692 ] 

Patrick Hunt commented on ZOOKEEPER-1412:
-----------------------------------------

This is a serious issue, although it does have a pretty straightforward workaround. I'll respin a new 3.3.5 rc that includes a fix for this (I'm working on it).
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Benjamin Reed (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13227994#comment-13227994 ] 

Benjamin Reed commented on ZOOKEEPER-1412:
------------------------------------------

+1 looks good pat. my one small comment is that you comment one of the methods in TestableZooKeeper, but not the other. since it is kind of a public interface it might be good to comment both.
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Attachment: ZOOKEEPER-1412_trunk.patch

Ok, updated with comments. Good to see I was right (at least initially). thanks.
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Attachment: ZOOKEEPER-1412_trunk.patch

These three patches fix the issue on the respective branches. I verified the tests failed before the fix, and passed subsequent to the fix. All tests are passing for me on the three branches.

I believe accessing/returning the lastzxid as I do in FinalRequestProcessor is valid, but it took me some time to convince myself. Please double check that the lastzxid I send back to the client is the one corresponding to the zxid at the time the read was performed
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Hadoop Flags: Reviewed
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228518#comment-13228518 ] 

Patrick Hunt commented on ZOOKEEPER-1412:
-----------------------------------------

Honestly I didn't have time to look into all of these scenarios, but I have a concern wrt introducing this change into 3.3 code line, I'd be more comfortable look at such a change for 3.4 or 3.5. For example, if we did introduce this further changes, is there any chance to get a ping response which updates the lastzxid, then disconnect from the server before getting a notification that would then be missed on reconnect?

I'd like to cut a 3.3.5 with what I have so far, which clearly fixes the issue identified (and 3.3.5 fixes another critical issues to get in user's hands). We can then look at further improvements in a separate jira. Make sense?
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Fix Version/s: 3.5.0
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228005#comment-13228005 ] 

Mahadev konar commented on ZOOKEEPER-1412:
------------------------------------------

Looks good. +1 for the patch (pending fixing ben's comments :)).
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Attachment: ZOOKEEPER-1412_br34.patch
                ZOOKEEPER-1412_br33.patch
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228049#comment-13228049 ] 

Hadoop QA commented on ZOOKEEPER-1412:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518111/ZOOKEEPER-1412_trunk.patch
  against trunk revision 1297740.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/991//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/991//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/991//console

This message is automatically generated.
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Attachment: ZOOKEEPER-1412_br34.patch
                ZOOKEEPER-1412_br33.patch
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Botond Hejj (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228278#comment-13228278 ] 

Botond Hejj commented on ZOOKEEPER-1412:
----------------------------------------

Thanks for the patch Patrick.

As I mentioned in my second mail I see a difference between java and c client in regards how they update lastzxid. The c client updates it for all operations and java not. This bug was not surfacing when c client was used because c client updates the lastzxid on every ping. 

Do you see any problem with this inconsistency between the c and java client? Is it worth changing one to behave like the other?
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-1412:
------------------------------------

    Component/s:     (was: java client)
                 server
       Priority: Blocker  (was: Major)
    
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288987#comment-13288987 ] 

Patrick Hunt commented on ZOOKEEPER-1412:
-----------------------------------------

Hi César, I just tried reproducing using the java cli (zk shell) client and was unable (iirc I was able to reproduce previously) using the latest on branch-3.4 (includes this fix).

started client against a follower (3 server ensemble)

WatchedEvent state:SyncConnected type:None path:null
[zk: (CONNECTED) 0] ls /foo 1

killed the server
WatchedEvent state:Disconnected type:None path:null

restarted the server
WatchedEvent state:SyncConnected type:None path:null

that was it

I then started a second client and deleted /foo, which caused

WatchedEvent state:SyncConnected type:NodeDeleted path:/foo

so afaict it looks right to me.

If you still see issues please open a separate jira. A test that reproduces this would also be useful.


                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "César Álvarez Núñez (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288551#comment-13288551 ] 

César Álvarez Núñez commented on ZOOKEEPER-1412:
------------------------------------------------

Once applied this fix, a NodeDataChanged is received instead of a NodeCreated.
More information in user mailing list thread "Unexpected NodeCreated with 3.4.3?"
Affected versions: 3.3.5 and 3.4.3 + ZK-1412
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231063#comment-13231063 ] 

Hudson commented on ZOOKEEPER-1412:
-----------------------------------

Integrated in ZooKeeper-trunk #1493 (See [https://builds.apache.org/job/ZooKeeper-trunk/1493/])
    ZOOKEEPER-1412. java client watches inconsistently triggered on reconnect (phunt) (Revision 1301089)

     Result = SUCCESS
phunt : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1301089
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/TestableZooKeeper.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java

                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (ZOOKEEPER-1412) java client watches inconsistently triggered on reconnect

Posted by "Mahadev konar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228740#comment-13228740 ] 

Mahadev konar commented on ZOOKEEPER-1412:
------------------------------------------

I think its ok to go with the current patch for 3.3 line and update the patch to match the zxid's for 3.4 and trunk. I agree with Pat on keeping the changes minimal for 3.3 line. 
                
> java client watches inconsistently triggered on reconnect
> ---------------------------------------------------------
>
>                 Key: ZOOKEEPER-1412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1412
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3
>            Reporter: Botond Hejj
>            Assignee: Patrick Hunt
>            Priority: Blocker
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br33.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_br34.patch, ZOOKEEPER-1412_trunk.patch, ZOOKEEPER-1412_trunk.patch
>
>
> I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.
> After the client reconnects to the ensemble only those watches should trigger which should have been triggered also if the connections was not lost. This means if I watch for changes in node /foo and there is no change there than my watch should not be triggered on reconnecting to the ensemble.
> This is not always the case in the java client.
> I've debugged the issues and I could locate the case when the watch is always triggered on reconnect. This is consistently happening if I connect to a follower in the ensemble and I don't do any operation which goes through the leader.
> Looking at the code I see that the client stores the lastzxid and sends that with its request. This is 0 on startup and will be updated everytime from the server replies. This lastzxid is also sent to the server after reconnect together with watches. The server decides which watch to trigger based on this lastzxid probably because that should mean the last known state of the client. If this lastzxid is 0 than all the watches are triggered.
> I've checked why is this lastzxid 0. I thought it shouldn't be since there was already a request to the server to set the watch and in the reply the server could have sent back the zxid but it turns out that it sends just 0. Looking at the server code I see that for requests which doesn't go through the leader the follower server just sends back the same zxid that the client sent.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira