You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2012/04/25 00:06:07 UTC

[jira] [Created] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Patrick Hunt created ZOOKEEPER-1453:
---------------------------------------

             Summary: corrupted logs may not be correctly identified by FileTxnIterator
                 Key: ZOOKEEPER-1453
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.3.3
            Reporter: Patrick Hunt
            Priority: Critical


See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408418#comment-13408418 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

OK, reproduced this again.... This time I made sure to capture a tar file of the data directory on each node and there is now a zookeeper_trace.log file in each tar file. In this scenario, I had a 3-node ensemble {10.10.5.42, 10.10.5.44, 10.10.5.123} and I rebooted 10.10.5.42 via issuing "reboot -f" on that node. When the node comes back up, it spews the following log message as before. Around 16:35 it is rebooted, and comes back around 16:38.

2012-07-06 16:35:51,512 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@158][] - sessionid:0x3385e6145f2000b type:setData cxid:0x4ff77654 zxid:0x100000d4f txntype:5 reqpath:n/a

...[REBOOTED]...

2012-07-06 16:38:38,026 [myid:] - INFO  [main:QuorumPeerConfig@99][] - Reading configuration from: /sf/data/zookeeper/10.10.5.42/10.10.5.42_2181.cfg
2012-07-06 16:38:38,038 [myid:2] - INFO  [main:DatadirCleanupManager@78][] - autopurge.snapRetainCount set to 5
2012-07-06 16:38:38,038 [myid:2] - INFO  [main:DatadirCleanupManager@79][] - autopurge.purgeInterval set to 1
2012-07-06 16:38:38,039 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138][] - Purge task started.
2012-07-06 16:38:38,045 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:hiearchy=default
2012-07-06 16:38:38,045 [myid:2] - DEBUG [main:HierarchyDynamicMBean@260][] - postRegister is called.
2012-07-06 16:38:38,046 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=root
2012-07-06 16:38:38,047 [myid:2] - DEBUG [main:LoggerDynamicMBean@258][] - Adding AppenderMBean for appender named CONSOLE
2012-07-06 16:38:38,048 [myid:2] - DEBUG [main:AppenderDynamicMBean@158][] - getMBeanInfo called.
2012-07-06 16:38:38,048 [myid:2] - DEBUG [main:AppenderDynamicMBean@337][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:appender=CONSOLE
2012-07-06 16:38:38,048 [myid:2] - DEBUG [main:AppenderDynamicMBean@197][] - Adding LayoutMBean:CONSOLE,layout=org.apache.log4j.PatternLayout
2012-07-06 16:38:38,049 [myid:2] - DEBUG [main:LayoutDynamicMBean@140][] - getMBeanInfo called.
2012-07-06 16:38:38,049 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:appender=CONSOLE,layout=org.apache.log4j.PatternLayout
2012-07-06 16:38:38,049 [myid:2] - DEBUG [main:LoggerDynamicMBean@258][] - Adding AppenderMBean for appender named TRACEFILE
2012-07-06 16:38:38,050 [myid:2] - DEBUG [main:AppenderDynamicMBean@158][] - getMBeanInfo called.
2012-07-06 16:38:38,050 [myid:2] - DEBUG [main:AppenderDynamicMBean@337][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:appender=TRACEFILE
2012-07-06 16:38:38,051 [myid:2] - DEBUG [main:AppenderDynamicMBean@197][] - Adding LayoutMBean:TRACEFILE,layout=org.apache.log4j.PatternLayout
2012-07-06 16:38:38,052 [myid:2] - DEBUG [main:LayoutDynamicMBean@140][] - getMBeanInfo called.
2012-07-06 16:38:38,052 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:appender=TRACEFILE,layout=org.apache.log4j.PatternLayout
2012-07-06 16:38:38,052 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144][] - Purge task completed.
2012-07-06 16:38:38,052 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [root] as listener.
2012-07-06 16:38:38,052 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.persistence.FileSnap
2012-07-06 16:38:38,053 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.persistence.FileSnap] as listener.
2012-07-06 16:38:38,053 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.log4j.jmx.LoggerDynamicMBean
2012-07-06 16:38:38,053 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.log4j.jmx.LoggerDynamicMBean] as listener.
2012-07-06 16:38:38,053 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.jmx.MBeanRegistry
2012-07-06 16:38:38,053 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.jmx.MBeanRegistry] as listener.
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.persistence.FileTxnSnapLog
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.persistence.FileTxnSnapLog] as listener.
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.quorum.QuorumPeerConfig
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.quorum.QuorumPeerConfig] as listener.
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.log4j.jmx.AppenderDynamicMBean
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.log4j.jmx.AppenderDynamicMBean] as listener.
2012-07-06 16:38:38,054 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.persistence.Util
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.persistence.Util] as listener.
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.quorum.QuorumPeerMain
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.quorum.QuorumPeerMain] as listener.
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.log4j.jmx.HierarchyDynamicMBean
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.log4j.jmx.HierarchyDynamicMBean] as listener.
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.DatadirCleanupManager
2012-07-06 16:38:38,055 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.DatadirCleanupManager] as listener.
2012-07-06 16:38:38,056 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.log4j.jmx.LayoutDynamicMBean
2012-07-06 16:38:38,056 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.log4j.jmx.LayoutDynamicMBean] as listener.
2012-07-06 16:38:38,056 [myid:2] - DEBUG [main:AbstractDynamicMBean@148][] - preRegister called. Server=com.sun.jmx.mbeanserver.JmxMBeanServer@49b979c0, name=log4j:logger=org.apache.zookeeper.server.persistence.FileTxnLog
2012-07-06 16:38:38,056 [myid:2] - DEBUG [main:HierarchyDynamicMBean@128][] - ---Adding logger [org.apache.zookeeper.server.persistence.FileTxnLog] as listener.
2012-07-06 16:38:38,056 [myid:2] - INFO  [main:QuorumPeerMain@131][] - Starting quorum peer
2012-07-06 16:38:38,072 [myid:2] - INFO  [main:NIOServerCnxnFactory@108][] - binding to port /10.10.5.42:2181
2012-07-06 16:38:38,080 [myid:2] - INFO  [main:QuorumPeer@1107][] - tickTime set to 2000
2012-07-06 16:38:38,080 [myid:2] - INFO  [main:QuorumPeer@1127][] - minSessionTimeout set to -1
2012-07-06 16:38:38,080 [myid:2] - INFO  [main:QuorumPeer@1138][] - maxSessionTimeout set to -1
2012-07-06 16:38:38,081 [myid:2] - INFO  [main:QuorumPeer@1153][] - initLimit set to 10
2012-07-06 16:38:38,093 [myid:2] - INFO  [main:FileSnap@83][] - Reading snapshot /sf/data/zookeeper/10.10.5.42/version-2/snapshot.0
2012-07-06 16:38:38,097 [myid:2] - DEBUG [main:FileTxnLog$FileTxnIterator@575][] - Created new input stream /sf/data/zookeeper/10.10.5.42/version-2/log.100000001
2012-07-06 16:38:38,097 [myid:2] - DEBUG [main:FileTxnLog$FileTxnIterator@578][] - Created new input archive /sf/data/zookeeper/10.10.5.42/version-2/log.100000001
2012-07-06 16:38:38,124 [myid:2] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-07-06 16:38:38,124 [myid:2] - DEBUG [main:FileTxnSnapLog@241][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-07-06 16:38:38,124 [myid:2] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-07-06 16:38:38,124 [myid:2] - DEBUG [main:FileTxnSnapLog@241][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-07-06 16:38:38,126 [myid:2] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-07-06 16:38:38,126 [myid:2] - DEBUG [main:FileTxnSnapLog@241][] - Ignoring processTxn failure hdr: -1 : error: -110

... [ REPEATS FOR A LONG TIME ] ...

2012-07-06 16:38:38,432 [myid:2] - DEBUG [main:FileTxnLog$FileTxnIterator@618][] - EOF excepton java.io.EOFException: Failed to read /sf/data/zookeeper/10.10.5.42/version-2/log.100000001
2012-07-06 16:38:38,437 [myid:2] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /10.10.5.42:2183
2012-07-06 16:38:38,443 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumPeer@825][] - Starting quorum peer
2012-07-06 16:38:38,447 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumPeer@860][] - LOOKING
2012-07-06 16:38:38,447 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumPeer@789][] - Initializing leader election protocol...
2012-07-06 16:38:38,448 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@732][] - Updating proposal: 2 (newleader), 0x100000d54 (newzxid), -1 (oldleader), 0xffffffffffffffff (oldzxid)
2012-07-06 16:38:38,449 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@831][] - New election. My id =  2, proposed zxid=0x100000d54
2012-07-06 16:38:38,449 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,450 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,450 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 3 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,450 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@375][] - Opening channel to server 1
2012-07-06 16:38:38,453 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@381][] - Connected to server 1
2012-07-06 16:38:38,454 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager$SendWorker@616][] - Address of remote peer: 1
2012-07-06 16:38:38,456 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@375][] - Opening channel to server 3
2012-07-06 16:38:38,456 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@381][] - Connected to server 3
2012-07-06 16:38:38,456 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191][] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-06 16:38:38,457 [myid:2] - DEBUG [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@295][] - Receive new notification message. My id = 2
2012-07-06 16:38:38,457 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635][] - Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-06 16:38:38,458 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@652][] - id: 2, proposed id: 2, zxid: 0x100000d54, proposed zxid: 0x100000d54
2012-07-06 16:38:38,458 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@901][] - Adding vote: from=2, proposed leader=2, proposed zxid=0x100000d54, proposed election epoch=0x1
2012-07-06 16:38:38,659 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumCnxManager@444][] - Queue size: 0
2012-07-06 16:38:38,659 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,659 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,659 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@408][] - There is a connection already for server 1
2012-07-06 16:38:38,659 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 3 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:38,659 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865][] - Notification time out: 400
2012-07-06 16:38:38,660 [myid:2] - DEBUG [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@295][] - Receive new notification message. My id = 2
2012-07-06 16:38:38,660 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@375][] - Opening channel to server 3
2012-07-06 16:38:38,660 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@381][] - Connected to server 3
2012-07-06 16:38:38,661 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191][] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-06 16:38:38,661 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635][] - Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-06 16:38:38,661 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@652][] - id: 2, proposed id: 2, zxid: 0x100000d54, proposed zxid: 0x100000d54
2012-07-06 16:38:38,661 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@901][] - Adding vote: from=2, proposed leader=2, proposed zxid=0x100000d54, proposed election epoch=0x1
2012-07-06 16:38:39,061 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumCnxManager@444][] - Queue size: 0
2012-07-06 16:38:39,062 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,062 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@408][] - There is a connection already for server 1
2012-07-06 16:38:39,062 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,062 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 3 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,062 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865][] - Notification time out: 800
2012-07-06 16:38:39,062 [myid:2] - DEBUG [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@295][] - Receive new notification message. My id = 2
2012-07-06 16:38:39,062 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@375][] - Opening channel to server 3
2012-07-06 16:38:39,063 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@381][] - Connected to server 3
2012-07-06 16:38:39,063 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191][] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-06 16:38:39,063 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635][] - Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-06 16:38:39,064 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@652][] - id: 2, proposed id: 2, zxid: 0x100000d54, proposed zxid: 0x100000d54
2012-07-06 16:38:39,064 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@901][] - Adding vote: from=2, proposed leader=2, proposed zxid=0x100000d54, proposed election epoch=0x1
2012-07-06 16:38:39,864 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumCnxManager@444][] - Queue size: 0
2012-07-06 16:38:39,864 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,864 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@408][] - There is a connection already for server 1
2012-07-06 16:38:39,864 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,864 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@583][] - Sending Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), 3 (recipient), 2 (myid), 0x1 (n.peerEpoch)
2012-07-06 16:38:39,864 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865][] - Notification time out: 1600
2012-07-06 16:38:39,865 [myid:2] - DEBUG [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@295][] - Receive new notification message. My id = 2
2012-07-06 16:38:39,865 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@375][] - Opening channel to server 3
2012-07-06 16:38:39,865 [myid:2] - DEBUG [WorkerSender[myid=2]:QuorumCnxManager@381][] - Connected to server 3
2012-07-06 16:38:39,865 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191][] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-06 16:38:39,866 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635][] - Notification: 2 (n.leader), 0x100000d54 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-06 16:38:39,866 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@652][] - id: 2, proposed id: 2, zxid: 0x100000d54, proposed zxid: 0x100000d54
2012-07-06 16:38:39,866 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@901][] - Adding vote: from=2, proposed leader=2, proposed zxid=0x100000d54, proposed election epoch=0x1
2012-07-06 16:38:41,259 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /10.10.5.44:50944
2012-07-06 16:38:41,263 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-06 16:38:41,264 [myid:2] - DEBUG [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@358][] - IOException stack trace
java.io.IOException: ZooKeeperServer not running
    at org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:926)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:229)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:238)
    at java.lang.Thread.run(Thread.java:722)


Thereafter I see " Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running" over and over. See full trace files on all three nodes in the attachments.

I really need help on this, so anything anyone can offer will be immensely appreciated.

                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Comment Edited] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399853#comment-13399853 ] 

Patrick Hunt edited comment on ZOOKEEPER-1453 at 6/24/12 4:37 AM:
------------------------------------------------------------------

Patrick, thanks for taking the time to explain.. I read throw the other bug and your explanation is very clear. I'd like to work on a fix for this as it's hitting us very frequently with a stress test we do where we continually reboot one of our machines that is hosting one of our zk servers. Anyhow, I'm looking at the FileTxnIterator code, and I definitely see the bug in next() method in that it always assumes EOF is success. Have you given thought to the right solution here? Maybe giving precedence to validating CRC before checking for EOF? 

What do you think about this:

{noformat}
public boolean next() throws IOException {
    if (ia == null) {
        return false;
    }
    try {
        long crcValue = ia.readLong("crcvalue");
        byte[] bytes = Util.readTxnBytes(ia);
        // validate CRC
        Checksum crc = makeChecksumAlgorithm();
        if (bytes) {
            crc.update(bytes, 0, bytes.length);
        }
        if (crcValue != crc.getValue())
            throw new IOException(CRC_ERROR);
        if (bytes == null || bytes.length == 0)
            throw new EOFException("Failed to read " + logFile);
        hdr = new TxnHeader();
        record = SerializeUtils.deserializeTxn(bytes, hdr);
    } catch (EOFException e) {
    ...
{noformat}

                
      was (Author: marshall):
    Patrick, thanks for taking the time to explain.. I read throw the other bug and your explanation is very clear. I'd like to work on a fix for this as it's hitting us very frequently with a stress test we do where we continually reboot one of our machines that is hosting one of our zk servers. Anyhow, I'm looking at the FileTxnIterator code, and I definitely see the bug in next() method in that it always assumes EOF is success. Have you given thought to the right solution here? Maybe giving precedence to validating CRC before checking for EOF? 

What do you think about this:

public boolean next() throws IOException {
    if (ia == null) {
        return false;
    }
    try {
        long crcValue = ia.readLong("crcvalue");
        byte[] bytes = Util.readTxnBytes(ia);
        // validate CRC
        Checksum crc = makeChecksumAlgorithm();
        if (bytes) {
            crc.update(bytes, 0, bytes.length);
        }
        if (crcValue != crc.getValue())
            throw new IOException(CRC_ERROR);
        if (bytes == null || bytes.length == 0)
            throw new EOFException("Failed to read " + logFile);
        hdr = new TxnHeader();
        record = SerializeUtils.deserializeTxn(bytes, hdr);
    } catch (EOFException e) {
    ...
                  
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408986#comment-13408986 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Is it possible that the issue I'm seeing isn't necessarily caused by corrupted logs, but some sort of failed leader election? It seems very suspect that if I "reboot -f" the leader node, that it can't joint he ensemble on reboot. But once I restart the other two zk processes, then everyone syncs up and we're OK again. Must be a failure in leader election...?
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400260#comment-13400260 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

By the way, when you untar the attached zookeeper.tar.gz file, inside is the full zookeeper_trace.log file.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: zookeeper.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409500#comment-13409500 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Just wanted to confirm that this was indeed the problem. We use zookeeper in an embedded manner so our application code creates the myid file programatically when we start zookeeper. We now know there is a bug in how it's doing that. I can't believe how much time was wasted on such a simple configuration problem. So thankful to Flavio and everyone else here for helping me sort that out. Given how fatal this was, it might have been useful if ZK could have detected multiple servers with the same ID and given a more helpful error message.

In any event I agree with Bill that there is still a potential issue with corruption from a reboot that should be addressed under this jira.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409226#comment-13409226 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

I have just checked the ids of the servers, both traces and myid file, and the id of the server 10.10.5.42 is wrong. From the traces, you can actually tell that it was 3 and later switched to 2 after the reboot. Having two servers with the same id seems to be causing the confusion you're observing here.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411382#comment-13411382 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

I'm glad it worked, Marshall.

bq. there is still a potential issue with corruption from a reboot that should be addressed under this jira.

In my understanding, the partial writes during syncs that Bill is referring to would be captured by the crc checks. Is this not enough?
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408956#comment-13408956 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Two other observations about this massive roadblock that I've seen:

(1) Far, far more likely to be a problem if it's the leader node that is rebooted

(2) Once rebooted, even if I clear out the version-2 directory and restart zookeeper on that node it cannot rejoin the ensemble. If I restart one of the other zookeeper processes that is already in the ensemble, then those *two nodes* can now properly see one another and form quorum. The third node that was already in the ensemble is suddenly not connectable. If I restart zookeeper on that node, then it joins the new ensemble and everything is fine.

This seems pretty catastrophic to me. 

                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399671#comment-13399671 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

See this comment, it's a very clear (and easy to reproduce) use case:
https://issues.apache.org/jira/browse/ZOOKEEPER-1449?focusedCommentId=13261071&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13261071

Basically the crc is after the record, we don't have protection against issues prior to verifying the xsum (it's more complicated than that though - see the linked comment).
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408714#comment-13408714 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

The only way I am able to get this node to rejoin the ensemble is to stop zookeeper, removing it's data directory entirely, then copy the entire version-2 directory from another peer by hand. Then when I start zk on that node everything works as expected.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408440#comment-13408440 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Testing this with the fix from ZOOKEEPER-1489 right now. Usually takes about 30 minutes for our test to fail. Will report back with results when that finishes.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408952#comment-13408952 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Well this seems like a pretty huge limitation if an ensemble node can't reliably survive a power failure on one of the nodes without data corruption. 

Surely someone has seen this before or has an idea how this can be fixed? Why can't we simply write to a *.tmp file and then move it over to the real file only after we know all the writes to the *.tmp file completed? POSIX guarantees rename operation must be atomic (http://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html). 
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408363#comment-13408363 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

I don't think the trace file is here, this is the output of ls after untaring it:

{noformat}
drwxr-xr-x  0 root   root        0 Jun 25 00:54 zookeeper/
-rw-r-----  0 root   root      247 Jun 25 01:12 zookeeper/10.10.5.123_2181.cfg
-rw-r-----  0 root   root      176 Jun 25 01:12 zookeeper/10.10.5.123_2181.cfg.dynamic
-rw-r-----  0 root   root        1 Jun 25 01:12 zookeeper/myid
drwxr-xr-x  0 root   root        0 Jun 25 00:54 zookeeper/version-2/
-rw-r--r--  0 root   root        1 Jun 25 00:54 zookeeper/version-2/currentEpoch
-rw-r--r--  0 root   root        1 Jun 25 00:54 zookeeper/version-2/acceptedEpoch
-rw-r--r--  0 root   root      669 Jun 25 00:54 zookeeper/version-2/snapshot.0
-rw-r--r--  0 root   root 67108880 Jun 25 01:09 zookeeper/version-2/log.100000001
{noformat}

I was able to read the transaction log ok, though.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: zookeeper.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall McMullen updated ZOOKEEPER-1453:
-----------------------------------------

    Attachment: 10.10.5.123-withPath1489.tar.gz
                10.10.5.44-withPath1489.tar.gz
                10.10.5.42-withPath1489.tar.gz
    
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412379#comment-13412379 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

Yes a full redesign of the logging system would be too much for this problem. Corruptions usually come from software bugs that misdirect data when writing or when the storage accidentally fails to write (note that writing to the wrong place can look like sa lost write at the correct place). Another common source is administrator mistakes like copying data to the wrong file or simultaneously assigning the same file to two different uses. Bit flips within a sector do not happen with disks. I think the CRC is a reasonable check value for the kinds of corruption we are likely to encounter.

Sorry, I omitted a critical point about preformattting logs. The same log file is used over and over again so that there is no allocation when writing. Before the first use, the log is initialized to contain valid empty blocks for log sequence zero. Since every reuse is at a higher sequence number the EOF is the first block with a lower sequence number than the sequence number recorded in the log header. This is how Oracle writes online logs. [Oracle Online Redo Log|http://docs.oracle.com/cd/E11882_01/server.112/e25789/physical.htm#i1006163]

ZooKeeper is different, it uses a new file for every log. It incrementally preallocates with zero to batch the allocations, and the zeroes are not forced to disk. The real data writes usually overwrite the zeroes in the filesystem buffer cache. Thus the zeroes are not likely to be on disk if there is a partial write due to a crash. I suppose there are times when the fsync unnecessarily forces the zeros to disk. I guess the consequences of a crash during fsync are file system dependent. Maybe checking for the 0x42 being 0 at the end of a record indicates a partial record when the zeros were flushed earlier, and EOF in the middle of a record means a partial write as well.


                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409435#comment-13409435 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

Glad to hear it was just a configuration issue. I hope fixing the configuration eliminates the corruption. If it does then the partial write problem is much less likely than your experiment implies. However it is a real world issue. If the OS crashes during a file sync, it is possible to have some blocks written and others not. The fix is to recognize partially written records and throw them away at recovery time. Oracle recovery has done this for many years.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412560#comment-13412560 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

Hi Bill. 

bq. I am guessing that the "sentinel iirc" is the 0x42

that's correct.

bq. ZooKeeper is different, it uses a new file for every log.

Ben was talking about changing this at one point. He wanted to reuse log files. Seemed like a good idea but he never completed the patch.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400257#comment-13400257 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

I was able to reproduce this problem again. After I power cycled the server a few times, the node in question refuses to join the ensemble and no clients can connect to it. When I try to telnet to the host in question and issue 'stat' it fails with:

This ZooKeeper instance is not currently serving requests

I enabled tracing and in the log file as it's starting up it fails with:

2012-06-24 20:34:31,734 [myid:1] - INFO  [main:FileSnap@83][] - Reading snapshot /sf/data/zookeeper/10.10.5.123/version-2/snapshot.0
2012-06-24 20:34:31,738 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@575][] - Created new input stream /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:31,738 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@578][] - Created new input archive /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:FileTxnSnapLog@241][] - Ignoring processTxn failure hdr: -1 : error: -110
2012-06-24 20:34:31,763 [myid:1] - DEBUG [main:DataTree@951][] - Ignoring processTxn failure hdr: -1 : error: -110

...[ repeats many many times ]...

2012-06-24 20:34:32,065 [myid:1] - DEBUG [main:FileTxnLog$FileTxnIterator@618][] - EOF excepton java.io.EOFException: Failed to read /sf/data/zookeeper/10.10.5.123/version-2/log.100000001
2012-06-24 20:34:32,067 [myid:1] - INFO  [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxnFactory@227][] - Accepted socket connection from /10.10.5.123:39623
2012-06-24 20:34:32,069 [myid:1] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@530][] - My election bind port: /10.10.5.123:2183
2012-06-24 20:34:32,071 [myid:1] - WARN  [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxn@354][] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-06-24 20:34:32,071 [myid:1] - DEBUG [NIOServerCxn.Factory:/10.10.5.123:2181:NIOServerCnxn@358][] - IOException stack trace

I also have a copy of the data directory if it would help.

                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408705#comment-13408705 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

I disabled write cache on the drive that holds my zookeeper database, and it still fails in exactly the same way :-<. 

Here's the part that really baffles me, I tried removing the on-disk database entirely (the version-2 directory) and starting up zookeeper again on the thought that it would just pull down a fresh copy of the database from one of its peers. Unfortunately it still fails to connect. See the output below:

root@SF-42:/sf/data# java -cp /opt/zookeeper-3.5.0-p7/zookeeper-3.5.0-p7.jar:/opt/zookeeper-3.5.0-p7/lib/log4j-1.2.16.jar:/opt/zookeeper-3.5.0-p7/lib/commons-cli-1.2.jar:/opt/zookeeper-3.5.0-p7/lib/slf4j-log4j12-1.6.2.jar:/opt/zookeeper-3.5.0-p7/lib/netty-3.2.5.Final.jar:/opt/zookeeper-3.5.0-p7/lib/jline-0.9.94.jar:/opt/zookeeper-3.5.0-p7/lib/javacc.jar:/opt/zookeeper-3.5.0-p7/lib/slf4j-api-1.6.2.jar:/opt/zookeeper-3.5.0-p7/conf -Dzookeeper.root.logger=DEBUG,CONSOLE -Dzookeeper.log.dir=. -Dzookeeper.tracelog.dir=/sf/data/zookeeper/10.10.5.42/ -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false -Djute.maxbuffer=4194304 org.apache.zookeeper.server.quorum.QuorumPeerMain /sf/data/zookeeper/10.10.5.42/10.10.5.42_2181.cfg
2012-07-07 10:20:23,270 [myid:] - INFO  [main:QuorumPeerConfig@99] - Reading configuration from: /sf/data/zookeeper/10.10.5.42/10.10.5.42_2181.cfg
2012-07-07 10:20:23,279 [myid:2] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 5
2012-07-07 10:20:23,279 [myid:2] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
2012-07-07 10:20:23,280 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2012-07-07 10:20:23,289 [myid:2] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
2012-07-07 10:20:23,290 [myid:2] - INFO  [main:QuorumPeerMain@131] - Starting quorum peer
2012-07-07 10:20:23,300 [myid:2] - INFO  [main:NIOServerCnxnFactory@108] - binding to port /10.10.5.42:2181
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1107] - tickTime set to 2000
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1127] - minSessionTimeout set to -1
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1138] - maxSessionTimeout set to -1
2012-07-07 10:20:23,308 [myid:2] - INFO  [main:QuorumPeer@1153] - initLimit set to 10
2012-07-07 10:20:23,321 [myid:2] - INFO  [main:QuorumPeer@620] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-07-07 10:20:23,322 [myid:2] - INFO  [main:QuorumPeer@635] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2012-07-07 10:20:23,325 [myid:2] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@530] - My election bind port: /10.10.5.42:2183
2012-07-07 10:20:23,333 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:QuorumPeer@860] - LOOKING
2012-07-07 10:20:23,334 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@831] - New election. My id =  2, proposed zxid=0x0
2012-07-07 10:20:23,341 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted socket connection from /10.10.5.44:48534
2012-07-07 10:20:23,342 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-07 10:20:23,342 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-07 10:20:23,345 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:23,346 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket connection for client /10.10.5.44:48534 (no session established for client)
2012-07-07 10:20:23,544 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time out: 400
2012-07-07 10:20:23,545 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-07 10:20:23,545 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-07 10:20:23,680 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted socket connection from /10.10.5.44:48535
2012-07-07 10:20:23,680 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:23,680 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket connection for client /10.10.5.44:48535 (no session established for client)
2012-07-07 10:20:23,946 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time out: 800
2012-07-07 10:20:23,946 [myid:2] - INFO  [WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-07 10:20:23,947 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@635] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config version)
2012-07-07 10:20:24,014 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted socket connection from /10.10.5.44:48536
2012-07-07 10:20:24,014 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,015 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket connection for client /10.10.5.44:48536 (no session established for client)
2012-07-07 10:20:24,349 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted socket connection from /10.10.5.44:48650
2012-07-07 10:20:24,349 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,349 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket connection for client /10.10.5.44:48650 (no session established for client)
2012-07-07 10:20:24,683 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxnFactory@227] - Accepted socket connection from /10.10.5.44:48678
2012-07-07 10:20:24,683 [myid:2] - WARN  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2012-07-07 10:20:24,683 [myid:2] - INFO  [NIOServerCxn.Factory:/10.10.5.42:2181:NIOServerCnxn@1002] - Closed socket connection for client /10.10.5.44:48678 (no session established for client)
2012-07-07 10:20:24,747 [myid:2] - INFO  [QuorumPeer[myid=2]/10.10.5.42:2181:FastLeaderElection@865] - Notification time out: 1600
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408433#comment-13408433 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

Sweet, thx.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall McMullen updated ZOOKEEPER-1453:
-----------------------------------------

    Attachment: zookeeper.tar.gz

Data directory of the node that won't rejoin zookeeper ensemble.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: zookeeper.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412037#comment-13412037 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

Hi Bill, Thanks a lot for your input. I really appreciate that you're taking your time to revisit this part of the code. Check some comments I have below, please:

bq. If a CRC failure is always treated as EOF, then corruption that is not from a partial write during a crash will not be treated like corruption

I think that if we are to assume corruptions like bit flips and such, we need to do much more than adding CRCs and block-aligned writes. I'm a bit concerned about a full redesign of the transaction log scheme to consider cases that the current fault model of zookeeper does not cover.

bq. The log is preformatted to contain valid blocks with an earlier log sequence number.

I'm not sure I understand this step. How do we know the log sequence numbers beforehand?

bq. It may be sufficient to decide a CRC failure is EOF if it is caused by the CRC value being zero.

It sounds right to me.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408100#comment-13408100 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

I'm seeing this again, same scenario as described before. Anyone have a chance to look at my log files to see what might be wrong here? 

Also, I'm wondering if the new code for https://issues.apache.org/jira/browse/ZOOKEEPER-1427 could be helpful here. I noticed the code in 1427 makes the writes to acceptedEpoch and currentEpoch files atomic, but what about the log and snapshot files? Are those similarly written atomically? Maybe that is causing my corruption issues....? 

Could really use some help here.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: zookeeper.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408430#comment-13408430 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Patrick, yes, I will try that right now and get back to you.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400098#comment-13400098 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

Marshall are you sure you have this fix? https://issues.apache.org/jira/browse/ZOOKEEPER-1156

I think it would be good to fix this issue, however I'm more concerned about the fact that you're seeing corruption in your stress test environment. I think we need to track down the cause of that, my initial reaction is that you are seeing some bug that we need to fix. The ability to handle a corrupted log file is secondary to actually fixing the issue causing the corruption in the first place. Any insights on what might be causing the issue you are seeing? Perhaps you could file a bug report with log4j logs and a copy of your datadir?

To answer your question, I think we need to capture any failure to verify the crc as a failure, rather than EOF (as you suggest).
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408457#comment-13408457 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

The ZooKeeper log records are not sector aligned on disk. When you crash a server there can be disk I/O in flight that is an integral number of sectors. Some will complete and some will not. The completion order is not guaranteed to be the same as the order in the file.  There is a reasonable chance that the last log record will be partial. There may be some unwritten sectors in the file and then some blocks with records.

The code needs to recognize that partial records at the end of a log are a possibility and pretend they were not written. One hazard with doing that is a corruption in the middle of a log might be considered an EOF. One sanity check would be to include in every log record the highest known to be persistent record id. After finding the end of the log the code could scan farther for a valid record and declare the log corrupt if the valid record implies the log was previously committed beyond the apparent EOF.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400160#comment-13400160 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

I'm running trunk revision 1339307, and it looks like ZOOKEEPER-1156 is included therein. So I think that I'm OK with regard to that particular bug.

You bring up a very valid point. I hadn't thought about pinpointing why we are seeing this corruption, only that we are and trying to prevent it from causing that node to not re-join the ensemble when it comes back up. 

The next time this happens I'll grab log files and a copy of the datadir.

Thanks!
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408458#comment-13408458 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Unfortunately it failed again. I've uploaded new tar files with "-withPath1489" in the name (that was meant to be Patch not Path but anyhow...). Looks to be the same exact symptom. The rebooted node is never able to rejoin the ensemble. Any ideas or thoughts are greatly appreciated. 

If this is a different bug than what's here in ZOOKEEPER-1453, then I can open a separate jira... just let me know.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398997#comment-13398997 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

I don't think we need to do that, we just need to handle these cases better. We already have a checksum on the record but we're not handling all the cases.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409401#comment-13409401 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Oh my gosh, I can't believe it was a simple configuration issue!! Thank you so much for spotting the problem Flavio, you have seriously saved me!

Sorry to have spammed this jira with noise that wasn't related. It sure looked like it at first. 
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400162#comment-13400162 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

Hi Marshall, One quick question here. When you say you're rebooting servers in your setting, I was wondering how you're doing that exactly.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398809#comment-13398809 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

We've hit this bug a few times now and are wondering if there are any thoughts around this or what the fix for this would entail. Perhaps adding an md5 file for each snapshot/log file.  Then on startup it can discard any that are corrupted... Thoughts?
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408996#comment-13408996 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

We force it to disk, so if the disk write cache is disabled, then force returning successfully is an indication that it has been written to media. The observation about the OS buffers only holds if we don't sync to disk.

I checked the files in the previous tar.gz without the traces, but I couldn't spot anything suspicious. I'll look into the last batch uploaded. Please don't delete old files. 
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411591#comment-13411591 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

The CRC checks certainly should catch a partially written record at the end of the log. However since this is an expected event we do not want to discard the log as corrupt. We want to ignore the partial record at the end of the log, and any records that might follow. If a CRC failure is always treated as EOF, then corruption that is not from a partial write during a crash will not be treated like corruption. Thus the CRC will no longer fulfill its current role as an assurance of no corruption.

You can solve this problem by putting a block header on every sector of a log file. The block header includes a check value. Every log write is an integral number of blocks. The log is preformatted to contain valid blocks with an earlier log sequence number. Encountering valid blocks with a lower sequence number is EOF. Encountering a block with an invalid check value is a corruption. This is based on the assumption that a disk write always completes an integral number of sector writes. This is true, but in very rare circumstances writing a block can result in an I/O error on reading the block. This is infrequent enough that the multiple logs ZooKeeper uses should be sufficient protection.

This is not a quick fix. I think there should be another jira to propose this. In the mean time it may be sufficient to decide a CRC failure is EOF if it is caused by the CRC value being zero. This is based on the fact that you clear ahead with zero to reduce the number of allocations. I am hoping the CRC is the very last value in a record and on disk it is aligned to its size so that a partial record ends in zero. A partial record that ends at the file EOF should also be ignored and not considered corruption.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400163#comment-13400163 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Flavio, 

This is on Linux servers, and we're trying to simulate non-graceful node failures, so we're calling "reboot -f". Since this doesn't call shutdown, it doesn't allow zookeeper a chance to gracefully shutdown. What I suspect is happening is if zookeeper happened to be in the middle of writing its logs or snapshots out to disk, this would get truncated or suffer some other file system corruption. When it comes back up we restart zookeeper and it never rejoins the ensemble. 
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408124#comment-13408124 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Patrick Hunt commented on 1427 that atomic writes of the snapshot and log files is not necessary. 

Anyhow, could still use some advice on what might be wrong here.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: zookeeper.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412448#comment-13412448 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

I am not up to speed on ZooKeeper terminology so I am guessing that the "sentinel iirc" is the 0x42 (one of my favorite constants) at the end of each record. I see that in Util.readTxnBytes it is validated to be 'B' (aka 0x42) and the record is considered partial if it is wrong. This will certainly catch a partial record. 

Is there any way it could legitimately be anything other than 0x00 or 0x42? If not, then it would be marginally more robust to consider it partial only if it is 0x00, and corrupt if not 0x42. This could only happen if something outside of Zookeeper damaged the log after it was written. Once the head DBA (DataBase Administrator) from Amazon said to me "Sometimes blocks just go bad". Amazon has a lot of blocks.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399853#comment-13399853 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Patrick, thanks for taking the time to explain.. I read throw the other bug and your explanation is very clear. I'd like to work on a fix for this as it's hitting us very frequently with a stress test we do where we continually reboot one of our machines that is hosting one of our zk servers. Anyhow, I'm looking at the FileTxnIterator code, and I definitely see the bug in next() method in that it always assumes EOF is success. Have you given thought to the right solution here? Maybe giving precedence to validating CRC before checking for EOF? 

What do you think about this:

public boolean next() throws IOException {
    if (ia == null) {
        return false;
    }
    try {
        long crcValue = ia.readLong("crcvalue");
        byte[] bytes = Util.readTxnBytes(ia);
        // validate CRC
        Checksum crc = makeChecksumAlgorithm();
        if (bytes) {
            crc.update(bytes, 0, bytes.length);
        }
        if (crcValue != crc.getValue())
            throw new IOException(CRC_ERROR);
        if (bytes == null || bytes.length == 0)
            throw new EOFException("Failed to read " + logFile);
        hdr = new TxnHeader();
        record = SerializeUtils.deserializeTxn(bytes, hdr);
    } catch (EOFException e) {
    ...
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399664#comment-13399664 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Oh, I didn't realize we already had a checksum on the record, that's helpful. Can you elaborate a little more on what cases we're not handling properly? I'd definitely like to help work on this but don't have any idea where to start looking.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408421#comment-13408421 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

Any chance you could apply the fix for ZOOKEEPER-1489 and try reproducing?
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408707#comment-13408707 ] 

Marshall McMullen commented on ZOOKEEPER-1453:
----------------------------------------------

Can anyone explain two things in the above that are really puzzling to me:

(1) Why does it continually show "ZooKeeperServer not running" ? I am certain the *process* is running. So this must not refer to the process itself but something else. Can someone explain?

(2) What does this message signify:

2012-07-07 10:20:23,342 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@191] - Have smaller server identifier, so dropping the connection: (3, 2)

                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall McMullen updated ZOOKEEPER-1453:
-----------------------------------------

    Attachment: 10.10.5.123.tar.gz
                10.10.5.44.tar.gz
                10.10.5.42.tar.gz
    
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Bill Bridge (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408948#comment-13408948 ] 

Bill Bridge commented on ZOOKEEPER-1453:
----------------------------------------

This is not just about the disk write cache. It is necessary to disable the disk write cache to ensure a commit is truly persistent, but that is not the only source of this problem. 

The problem is that the OS maintains a queue of disk writes. Some number of them have been submitted to the disk and some are still in the queue. These writes are not necessarily in file block order. When there is a reboot the write for the last data block may still be in the queue while the write to the block before it has gone to the disk. 

If the reboot does a hardware reset or is caused by power failure, a similar thing can happen for the writes that have been given to the disk. Disks like to have about 4 - 8 requests queued to them so that they can order them to reduce seek/rotational latency. If there is a reset from the disk controller or loss of power, the disk cannot complete the requests it was given. This can result in some completing and some not. Since the disk reorders based on location of the sectors and the current location of the disk arm and current location of spindle rotation, it is not possible to predict what order writes will complete in.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Flavio Junqueira (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408600#comment-13408600 ] 

Flavio Junqueira commented on ZOOKEEPER-1453:
---------------------------------------------

I was wondering the same thing as Bill when I asked how you're doing this experiment. One way to verify is to disable the disk write cache and see if the problem goes away.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Patrick Hunt (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412241#comment-13412241 ] 

Patrick Hunt commented on ZOOKEEPER-1453:
-----------------------------------------

We already handle partial writes via the sentinel iirc. 

My understanding is that we should treat any corruption as EOF as these records were written but not sync'd, therefore we would not have ACK'd the proposal. This sound right? The server will rejoin the quorum, if it does have a "gap" (say due to corruption) it will download the necessary transactions from the leader to catch up with the quorum.
                
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123-withPath1489.tar.gz, 10.10.5.123.tar.gz, 10.10.5.42-withPath1489.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44-withPath1489.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1453) corrupted logs may not be correctly identified by FileTxnIterator

Posted by "Marshall McMullen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marshall McMullen updated ZOOKEEPER-1453:
-----------------------------------------

    Attachment:     (was: zookeeper.tar.gz)
    
> corrupted logs may not be correctly identified by FileTxnIterator
> -----------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1453
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1453
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Priority: Critical
>         Attachments: 10.10.5.123.tar.gz, 10.10.5.42.tar.gz, 10.10.5.44.tar.gz
>
>
> See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira