Posted to dev@lucene.apache.org by "Mark Miller (JIRA)" <ji...@apache.org> on 2012/06/27 19:25:43 UTC

[jira] [Created] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Mark Miller created SOLR-3582:
---------------------------------

             Summary: Leader election zookeeper watcher is responding to con/discon notifications incorrectly.
                 Key: SOLR-3582
                 URL: https://issues.apache.org/jira/browse/SOLR-3582
             Project: Solr
          Issue Type: Bug
            Reporter: Mark Miller
            Assignee: Mark Miller
            Priority: Minor
             Fix For: 4.0, 5.0


As brought up by Trym R. Møller on the mailing list, we are responding to watcher events about connection/disconnection as if they were notifications about node changes.

http://www.lucidimagination.com/search/document/e13ef390b88eeee2
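
A minimal sketch of the distinction, under stated assumptions: the ZooKeeper Java client delivers session-state notifications to a watcher with event type None, and node changes with one of the Node* types. The enum below is a local stand-in for Watcher.Event.EventType so the example runs without the ZooKeeper jar, and ElectionWatcher and checkLeader are hypothetical names, not the actual LeaderElector code.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for org.apache.zookeeper.Watcher.Event.EventType
// (assumption: only the constants needed for this sketch are modeled).
enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

// Hypothetical watcher used for illustration only.
class ElectionWatcher {
    final List<EventType> handled = new ArrayList<>();

    // Called for every notification delivered to this watcher.
    void process(EventType type) {
        // Session events (connect, disconnect, expire) arrive with type None.
        // They say nothing about the watched node, so they are not treated as changes.
        if (type == EventType.None) {
            return;
        }
        checkLeader(type);
    }

    // Placeholder for the real work: re-check the election node and set a new watch.
    void checkLeader(EventType type) {
        handled.add(type);
    }
}

class ElectionWatcherDemo {
    public static void main(String[] args) {
        ElectionWatcher watcher = new ElectionWatcher();
        watcher.process(EventType.None);        // disconnect: ignored
        watcher.process(EventType.None);        // reconnect: ignored
        watcher.process(EventType.NodeDeleted); // real node change: handled
        System.out.println("handled events: " + watcher.handled);
    }
}
```

Without the guard, every connect/disconnect notification would run the node-change path and set another watch.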

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Comment Edited] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Trym Møller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402851#comment-13402851 ] 

Trym Møller edited comment on SOLR-3582 at 6/28/12 5:10 AM:
------------------------------------------------------------

Debugging the provided test shows this behaviour as well; that is, the watch is kept even though it is notified about Disconnected and SyncConnected events, and the watch will only "stop" once it has been notified about a node change.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector and it can be committed to svn independently of problems with other watchers.

Best regards Trym
                


[jira] [Comment Edited] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Trym Møller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402851#comment-13402851 ] 

Trym Møller edited comment on SOLR-3582 at 6/28/12 5:12 AM:
------------------------------------------------------------

Debugging the provided test shows this behaviour as well; that is, the watch is kept even though it is notified about Disconnected and SyncConnected events, and the watch will only "stop" once it has been notified about a node change.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector and it can be committed to svn independently of problems with other watchers. Are there other things I can do to show that the solution is the right one?

Best regards Trym
                


[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402388#comment-13402388 ] 

Mark Miller commented on SOLR-3582:
-----------------------------------

I'm unsure about the solution proposed on the mailing list.

On a connection event the watch will fire and we will skip doing anything, but watches are one-time events, so won't we be left with no watch in place?
                


[jira] [Comment Edited] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Trym Møller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402851#comment-13402851 ] 

Trym Møller edited comment on SOLR-3582 at 6/28/12 5:13 AM:
------------------------------------------------------------

Debugging the provided test shows this behaviour as well; that is, the watch is kept even though it is notified about Disconnected and SyncConnected events, and the watch will only "stop" once it has been notified about a node change.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector and it can be committed to svn independently of problems with other watchers. Are there other things I can do to show that the provided solution is the right one?

Best regards Trym
                


[jira] [Resolved] (SOLR-3582) Our ZooKeeper watchers respond to session events as if they are change events, creating undesirable side effects.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller resolved SOLR-3582.
-------------------------------

    Resolution: Fixed

Thanks Trym!
                


[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Trym Møller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402851#comment-13402851 ] 

Trym Møller commented on SOLR-3582:
-----------------------------------

Debugging the provided test shows this behaviour as well; that is, the watch is kept even though it is notified about Disconnected and SyncConnected events, and the watch will only "stop" after a node change occurs.

As Mark writes on the mailing list, there might be other ZooKeeper Watchers in Solr which might add new watchers on reconnect.

If we agree about the ZooKeeper watcher behaviour, then I think that the provided bug fix solves the problem in the LeaderElector and it can be committed to svn independently of problems with other watchers.

Best regards Trym
                


[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402398#comment-13402398 ] 

Mark Miller commented on SOLR-3582:
-----------------------------------

Never mind - I found confirmation elsewhere that session events do not remove the watcher. The ZooKeeper programming guide is not very clear on this when it describes watches as one-time triggers.
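
To make the semantics concrete, here is a toy model of a client-side watch table. It is a simplification written for illustration, not ZooKeeper's implementation: WatchTable, Listener and the string event names are all hypothetical, and a plain list stands in for however the real client tracks watchers. It only encodes the behaviour described above: a session event notifies every watcher and leaves the watches in place, while a node event notifies the watchers on that path once and consumes the watch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a client-side watch table (illustrative only).
class WatchTable {
    interface Listener {
        void onEvent(String kind, String path);
    }

    private final Map<String, List<Listener>> watches = new HashMap<>();

    // Set a watch on a path.
    void register(String path, Listener listener) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(listener);
    }

    // Session event: every watcher is notified, and every watch stays in place.
    void sessionEvent(String state) {
        // Snapshot first so listeners may register new watches while being notified.
        List<Listener> snapshot = new ArrayList<>();
        for (List<Listener> listeners : watches.values()) {
            snapshot.addAll(listeners);
        }
        for (Listener listener : snapshot) {
            listener.onEvent(state, null);
        }
    }

    // Node event: watchers on that path are notified once and the watch is consumed.
    void nodeEvent(String path) {
        List<Listener> listeners = watches.remove(path);
        if (listeners != null) {
            for (Listener listener : listeners) {
                listener.onEvent("NodeChanged", path);
            }
        }
    }

    // Total number of watches currently in place.
    int watchCount() {
        int n = 0;
        for (List<Listener> listeners : watches.values()) {
            n += listeners.size();
        }
        return n;
    }
}

class WatchTableDemo {
    public static void main(String[] args) {
        WatchTable table = new WatchTable();
        table.register("/election/n_1", (kind, path) -> System.out.println(kind + " " + path));
        table.sessionEvent("Disconnected");
        table.sessionEvent("SyncConnected");
        System.out.println("watches after session events: " + table.watchCount());
        table.nodeEvent("/election/n_1");
        System.out.println("watches after node event: " + table.watchCount());
    }
}
```

In this model a handler that re-registers itself on a session event ends up with two watches on the same path, which is the side effect this issue is about.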
                


[jira] [Commented] (SOLR-3582) Leader election zookeeper watcher is responding to con/discon notifications incorrectly.

Posted by "Per Steffensen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403041#comment-13403041 ] 

Per Steffensen commented on SOLR-3582:
--------------------------------------

Trym didn't mention it, but this is not a negligible problem that never shows up in real-world usage. We discovered it during one of our performance/endurance tests of our real-world application, in a real-world setup and under a real-world (high) workload. We are running numerous Solr instances in a SolrCloud cluster, with numerous collections, each having about 25 slices with 2 shards each (one replica for each slice). During the test, Solr instances lose their ZK connection (probably due to overly long GC pauses) and reconnect, resulting in more watchers. The next time a disconnect/reconnect to ZK happens, each instance gets many watcher events, resulting in even more watchers the time after that. Seen from the outside, this breaks our performance/endurance test: at first things start to slow down, and eventually the JVMs break down with OOM errors. The problem is self-reinforcing, because on every iteration the garbage collector has to spend more time collecting watchers (twice as many as last time), which increases the probability of new ZK timeouts, and more time has to be spent creating new watchers (twice as many as last time).
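
As a rough illustration of that growth, the sketch below assumes exactly what is described above and nothing more: every existing watcher adds one new watcher per disconnect/reconnect cycle, so the count doubles each time. The starting figure of 50 (25 slices with 2 shards for a single collection) is only an illustrative number, and WatcherGrowth is a hypothetical name; this is arithmetic, not a measurement.

```java
// Back-of-the-envelope model of watcher growth (illustrative only).
class WatcherGrowth {
    // Without the fix: each existing watcher adds one more per cycle, so the count doubles.
    static long afterCyclesBuggy(long initial, int cycles) {
        long watchers = initial;
        for (int i = 0; i < cycles; i++) {
            watchers += watchers;
        }
        return watchers;
    }

    // With session events ignored, reconnects add no watchers.
    static long afterCyclesFixed(long initial, int cycles) {
        return initial;
    }

    public static void main(String[] args) {
        long initial = 50; // assumed starting count, for illustration
        for (int cycles = 0; cycles <= 10; cycles++) {
            System.out.println(cycles + " cycles: buggy=" + afterCyclesBuggy(initial, cycles)
                    + " fixed=" + afterCyclesFixed(initial, cycles));
        }
    }
}
```

Under these assumptions, ten cycles turn 50 watchers into 51200, while the count stays at 50 once session events are ignored.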

I think you should commit the fix, basically because it makes a (our) real-world application able to run for a long time, which it could not before. Commit it not so much for our sake, because we are using our own build of Solr anyway (incl. this fix, other fixes, and a nice implementation of optimistic locking, etc.; SOLR-3173, SOLR-3178, etc.), but to save others (who might also be among the "first movers" using Solr 4.0 for high-scale real-world applications) from having to spend weeks tracking down the essence of this issue and making a fix.

If you think this observation/fix should lead to a walk-through of the code to check whether watchers are used undesirably in other places, and maybe even to a more generic fix, I would endorse such a task. But for now I urge you to commit, to save others from weeks of debugging. If/when you come to a better or more generic solution, you can always refactor.

Regards, Per Steffensen
                


[jira] [Updated] (SOLR-3582) Our ZooKeeper watchers respond to session events as if they are change events, creating undesirable side effects.

Posted by "Mark Miller (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-3582:
------------------------------

    Summary: Our ZooKeeper watchers respond to session events as if they are change events, creating undesirable side effects.  (was: Leader election zookeeper watcher is responding to con/discon notifications incorrectly.)
    