Posted to dev@zookeeper.apache.org by "Alexander Shraer (JIRA)" <ji...@apache.org> on 2013/11/02 00:46:17 UTC

[jira] [Commented] (ZOOKEEPER-1807) Observers spam each other creating connections to the election addr

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811771#comment-13811771 ] 

Alexander Shraer commented on ZOOKEEPER-1807:
---------------------------------------------

Hi Raul,

ZOOKEEPER-107 allows changing server roles: in one config a server is an observer, and in the next it may be a follower. I haven't looked closely, but I think the intention was to talk to every server you know about in order to get the most up-to-date config information. Instead of reverting to the previous code, consider adding a check (regardless of whether this server is an observer or a participant) that won't attempt to create a connection if one already exists to the same server at the same election address (election addresses may change from one view to the next).
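
A minimal sketch of the kind of check I mean, written as a standalone class. The names here (ElectionConnectionTracker, shouldConnect, markConnected, markDisconnected) are hypothetical and are not taken from the ZooKeeper code; the real check would presumably live wherever election connections are opened.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, NOT part of ZooKeeper: remembers the election address
// each server id is currently connected on.
class ElectionConnectionTracker {
    private final Map<Long, InetSocketAddress> connected = new ConcurrentHashMap<>();

    // True if there is no connection to this server yet, or if the existing
    // connection uses a different election address (for example after a
    // reconfiguration changed the address).
    boolean shouldConnect(long sid, InetSocketAddress electionAddr) {
        InetSocketAddress current = connected.get(sid);
        return current == null || !current.equals(electionAddr);
    }

    // Record that a connection to sid at electionAddr is now established.
    void markConnected(long sid, InetSocketAddress electionAddr) {
        connected.put(sid, electionAddr);
    }

    // Forget the connection, e.g. after it was closed or failed.
    void markDisconnected(long sid) {
        connected.remove(sid);
    }
}
```

The check is keyed on the server id and compares the stored election address, so it behaves the same for observers and participants and still reconnects when an address changes between views.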

The code should handle observer ids > 0; please file a JIRA if you find a problem somewhere.

Thanks,
Alex



> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a connection already for server 14
> {noformat}
> and so on, ad nauseam.
> Now, looking around I found this inside FastLeaderElection.java from when you committed ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to connect to each other (as opposed to just connecting to participants). I'll give it a try now and let you know. (Also, we use observer ids that are > 0, and I saw some parts of the code that might not handle that case, so it could be that too.)


