You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "helei (JIRA)" <ji...@apache.org> on 2011/07/04 07:54:22 UTC

[jira] [Commented] (ZOOKEEPER-1115) follower can not sync with leader

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059326#comment-13059326 ] 

helei commented on ZOOKEEPER-1115:
----------------------------------

Hi, guys. After follower get synced with leader, we will call zk.takeSnapshot in communication thread in Learner.java. And this is a function may block. In my case, it takes about 4 seconds to finish the snap. During this period, follower would not ack leader, and leader would thought that something happend to this follower, and close the connection with this follower. After that the follower will receive a broken pipe exception, and it will retry after that. Of cause it will fail time after time.
I think it is a problem.
thanks

> follower can not sync with leader
> ---------------------------------
>
>                 Key: ZOOKEEPER-1115
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1115
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.3
>         Environment: linux rhel 4, x64, java version 1.4.2
>            Reporter: helei
>            Priority: Critical
>
> there are 5 members in the quorum. one follower can not sync with leader after restart. it seems leader has close the data connection with this follower because of read timeout. here is the key log in follower:
> 2011-06-30 22:14:45,069 - WARN  [Thread-17:QuorumCnxManager$RecvWorker@658] - Connection broken: 
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
> 2011-06-30 22:14:45,069 - INFO  [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@689] - Notification: 3, 17198470148, 3, 3, LOOKING, LOOKING, 3
> 2011-06-30 22:14:45,070 - ERROR [Thread-16:QuorumCnxManager$SendWorker@559] - Failed to send last message. Shutting down thread.
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557)
> 2011-06-30 22:14:45,082 - INFO  [QuorumPeer:/0.0.0.0:2181:Learner@282] - Getting a diff from the leader 0x4011bd462
> 2011-06-30 22:14:45,083 - WARN  [Thread-18:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
> 2011-06-30 22:14:45,085 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x4011bd405 expected 0x1
> 2011-06-30 22:14:45,090 - INFO  [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 4011bd462
> 2011-06-30 22:14:53,397 - WARN  [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
> 2011-06-30 22:14:53,398 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:358)
>         at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108)
>         at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:634)
> 2011-06-30 22:14:53,398 - WARN  [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send
> java.net.SocketException: Socket closed
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>         at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
>         at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
>         at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
>         at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
> 2011-06-30 22:14:53,399 - INFO  [QuorumPeer:/0.0.0.0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>         at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:638)
> and these are the leader's:
> 2011-06-30 22:14:35,943 - ERROR [LearnerHandler-/10.23.247.163:14975:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
> 2011-06-30 22:14:35,943 - WARN  [LearnerHandler-/10.23.247.163:14975:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14975 ********
> 2011-06-30 22:14:48,943 - ERROR [CommitProcessor:4:NIOServerCnxn@422] - Unexpected Exception: 
> java.nio.channels.CancelledKeyException
>         at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
>         at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
>         at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1360)
>         at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
>         at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:535)
>         at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
> 2011-06-30 22:14:49,084 - ERROR [LearnerHandler-/10.23.247.163:14998:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
> 2011-06-30 22:14:49,084 - WARN  [LearnerHandler-/10.23.247.163:14998:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14998 ********

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira