Posted to dev@zookeeper.apache.org by "Austin Shoemaker (JIRA)" <ji...@apache.org> on 2008/09/18 08:37:44 UTC

[jira] Issue Comment Edited: (ZOOKEEPER-127) Use of non-standard election ports in config breaks services

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632111#action_12632111 ] 

austin edited comment on ZOOKEEPER-127 at 9/17/08 11:36 PM:
----------------------------------------------------------------------

After several more runs of our unit test using the patched algorithm 3, the test hangs as the service repeatedly tries to reelect the killed leader. This behavior is similar to ZOOKEEPER-131, which we experienced using algorithms 0 and 1.

Server 10 is 10.50.65.40 and has been explicitly killed. The following log is from server 5; the logs on all the other servers show the same pattern.

Any idea what's happening here?

2008-09-18 00:28:20,029 - INFO  [QuorumPeer:QuorumPeer@394] - LOOKING
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:ZooKeeperServer@198] - unable to parse zxid string into long: txt
2008-09-18 00:28:20,029 - WARN  [QuorumPeer:FastLeaderElection@493] - New election: 8589935405
2008-09-18 00:28:20,031 - WARN  [WorkerSender Thread:QuorumCnxManager@381] - Cannot open channel to 10( java.net.ConnectException: Connection refused)
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:QuorumPeer@403] - FOLLOWING
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:ZooKeeperServer@166] - Created server with dataDir:/zookeeper_data/5_data dataLogDir:/zookeeper_data/5_data tickTime:2000
2008-09-18 00:28:20,031 - INFO  [QuorumPeer:Follower@128] - Following /10.50.65.40:2888

[[[ exception below repeats 5 times ]]]

2008-09-18 00:28:20,032 - WARN  [QuorumPeer:Follower@145] - Unexpected exception
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:519)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:137)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:405)

[[[ then the follower is restarted ]]]

2008-09-18 00:28:24,049 - ERROR [QuorumPeer:Follower@370] - FIXMSG
java.lang.Exception: shutdown Follower
        at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:370)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:409)

[[[ at this point the log repeats from the beginning ]]]


      was (Author: austin):
    After about 6 runs of our unit test the test hangs as the service repeatedly tries to reelect the killed leader (similar to ZOOKEEPER-131 with algorithms 0 and 1). 


  
> Use of non-standard election ports in config breaks services
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-127
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-127
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.0.0
>            Reporter: Mark Harwood
>            Assignee: Flavio Paiva Junqueira
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: mhPortChanges.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch, ZOOKEEPER-127.patch
>
>
> In QuorumCnxManager.toSend there is a call to create a connection as follows:
>     channel = SocketChannel.open(new InetSocketAddress(addr, port));
> Unfortunately "addr" is the IP address of a remote server while "port" is the electionPort of *this* server.
> As an example, given this configuration (taken from my zoo.cfg):
>   server.1=10.20.9.254:2881
>   server.2=10.20.9.9:2882
>   server.3=10.20.9.254:2883
> Server 3 was observed trying to make a connection to host 10.20.9.9 on port 2883 and obviously failing.
> In tests where all machines use the same electionPort this bug would not manifest itself.
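
The mismatch described in the issue can be illustrated with a small, self-contained sketch. It is not the actual QuorumCnxManager code or the attached patch: the class ElectionAddressSketch and the helpers parse, buggyTarget and fixedTarget are hypothetical names introduced here, and the sketch assumes each "server.N=host:port" entry carries that server's election port, as in the zoo.cfg excerpt above. It only contrasts pairing a remote host with the local port against looking up both host and port by the remote server's id.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;

public class ElectionAddressSketch {

    // Hypothetical helper: turns "server.N=host:port" lines into a map
    // from server id to that server's election address.
    static Map<Long, InetSocketAddress> parse(String... lines) {
        Map<Long, InetSocketAddress> peers = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.trim().split("=", 2);
            long id = Long.parseLong(kv[0].substring("server.".length()));
            int colon = kv[1].lastIndexOf(':');
            String host = kv[1].substring(0, colon);
            int port = Integer.parseInt(kv[1].substring(colon + 1));
            // createUnresolved skips name resolution; real connection code
            // would need a resolved address before opening a channel.
            peers.put(id, InetSocketAddress.createUnresolved(host, port));
        }
        return peers;
    }

    // Mirrors the reported behaviour: remote host, but *this* server's port.
    static InetSocketAddress buggyTarget(Map<Long, InetSocketAddress> peers,
                                         long self, long remote) {
        return InetSocketAddress.createUnresolved(
                peers.get(remote).getHostString(), peers.get(self).getPort());
    }

    // Assumed correction: take both host and port from the remote entry.
    static InetSocketAddress fixedTarget(Map<Long, InetSocketAddress> peers,
                                         long remote) {
        return peers.get(remote);
    }

    static String show(InetSocketAddress a) {
        return a.getHostString() + ":" + a.getPort();
    }

    public static void main(String[] args) {
        Map<Long, InetSocketAddress> peers = parse(
                "server.1=10.20.9.254:2881",
                "server.2=10.20.9.9:2882",
                "server.3=10.20.9.254:2883");
        // Server 3 contacting server 2.
        System.out.println("buggy: " + show(buggyTarget(peers, 3, 2)));
        System.out.println("fixed: " + show(fixedTarget(peers, 2)));
    }
}
```

With the three-server configuration from the description, the buggy lookup for server 3 contacting server 2 pairs host 10.20.9.9 with port 2883, which matches the failed connection the reporter observed, while the per-server lookup yields port 2882.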
