Posted to dev@zookeeper.apache.org by "Patrick Hunt (JIRA)" <ji...@apache.org> on 2010/02/02 18:32:19 UTC

[jira] Created: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

hudson failure in ZKDatabaseCorruptionTest
------------------------------------------

                 Key: ZOOKEEPER-663
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-663
             Project: Zookeeper
          Issue Type: Bug
          Components: server
            Reporter: Patrick Hunt
            Assignee: Mahadev konar
            Priority: Critical
             Fix For: 3.3.0


http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/686/

java.lang.RuntimeException: Unable to run quorum server 
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:380)
	at org.apache.zookeeper.test.ZkDatabaseCorruptionTest.testCorruption(ZkDatabaseCorruptionTest.java:99)
Caused by: java.io.IOException: Invalid magic number 0 != 1514884167
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:455)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:471)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:438)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:519)
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:145)
	at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:193)
	at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:377)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12842160#action_12842160 ] 

Hadoop QA commented on ZOOKEEPER-663:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438038/ZOOKEEPER-663.patch
  against trunk revision 919640.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/129/console

This message is automatically generated.


[jira] Updated: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Henry Robinson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-663:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Mahadev!


[jira] Updated: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Reed updated ZOOKEEPER-663:
------------------------------------

    Hadoop Flags: [Reviewed]

No need for a test. It just changes messages and docs.


[jira] Updated: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-663:
------------------------------------

    Attachment: ZOOKEEPER-663.patch

This patch fixes the logging to mention which file is corrupted and adds forrest docs on handling this kind of failure.


[jira] Commented: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841977#action_12841977 ] 

Mahadev konar commented on ZOOKEEPER-663:
-----------------------------------------

This looks like a quorum peer was creating a new txn log file and was shut down in the middle of that. This probably led to corruption of the txn logs in the data directory of one of the quorum peers. We do not actually have a good story for handling corruption of the transaction logs. Currently we depend on admins manually going to the node and deciding how to resolve this.

As part of this jira we can, for now, add documentation to the forrest docs on how to deal with such situations. Also, the logging needs to change to point out which file was corrupted.
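
To illustrate the logging change described above, here is a minimal sketch of a header check that names the offending file in its message. It is not the code from ZOOKEEPER-663.patch: the class and method names are hypothetical, and the only value taken from this thread is the expected magic number 1514884167 from the stack trace (the big-endian int formed by the ASCII bytes "ZKLG"). It assumes the magic number occupies the first four bytes of the file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only; not part of ZooKeeper.
class TxnLogMagicCheck {
    // Expected value as printed in the exception message in this thread.
    static final int EXPECTED_MAGIC = 1514884167;

    // Returns null when the header starts with the expected magic number,
    // otherwise a message that names the file being inspected.
    static String describeProblem(byte[] header, String fileName) {
        if (header.length < 4) {
            return "Truncated header in " + fileName;
        }
        // ByteBuffer reads big-endian by default, matching DataOutput.writeInt.
        int magic = ByteBuffer.wrap(header, 0, 4).getInt();
        if (magic != EXPECTED_MAGIC) {
            return "Invalid magic number " + magic + " != " + EXPECTED_MAGIC
                    + " in " + fileName;
        }
        return null;
    }

    // Reads the first four bytes of the file and checks them.
    static String checkFile(File file) throws IOException {
        byte[] header = new byte[4];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(header);
        } catch (EOFException e) {
            // The file ended before four bytes could be read.
            return describeProblem(new byte[0], file.getPath());
        }
        return describeProblem(header, file.getPath());
    }

    public static void main(String[] args) throws IOException {
        // Each argument is a path to a candidate txn log file.
        for (String path : args) {
            String problem = checkFile(new File(path));
            System.out.println(problem == null ? path + ": header OK" : problem);
        }
    }
}
```

A file whose first four bytes are all zero, which is what the "0 != 1514884167" in the trace reports, would produce a message of the form "Invalid magic number 0 != 1514884167 in <path>", so an admin can tell which file to inspect.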


[jira] Updated: (ZOOKEEPER-663) hudson failure in ZKDatabaseCorruptionTest

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-663:
------------------------------------

    Status: Patch Available  (was: Open)
