You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Paul Querna (JIRA)" <ji...@apache.org> on 2011/02/07 20:41:57 UTC

[jira] Created: (CASSANDRA-2128) Corrupted Commit logs

Corrupted Commit logs
---------------------

                 Key: CASSANDRA-2128
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6.11
            Reporter: Paul Querna


Two of our nodes had a hard failure.

They both came up with a corrupted commit log.

On startup we get this:
{quote}
011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
2011-02-07_19:34:03.95403 java.io.EOFException
2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
2011-02-07_19:34:03.95422 Exception encountered during startup.
2011-02-07_19:34:03.95436 java.io.EOFException
2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
{quote}

On node A, the commit log in question is 100mb.

On node B, the commit log in question is 60mb.

An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991549#comment-12991549 ] 

Gary Dusbabek commented on CASSANDRA-2128:
------------------------------------------

Actually, this is kind of bizarre. 0.6 does use a checksum.  The failure is happening after the RM has been read (in full) while we are attempting to deserialize it.

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992518#comment-12992518 ] 

Jonathan Ellis commented on CASSANDRA-2128:
-------------------------------------------

CASSANDRA-2055 reports the exact same stacktrace: RM.deserialize EOFing on the very first field it tries to read.  Suspicious!

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2128:
--------------------------------------

    Fix Version/s: 0.7.2

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.12, 0.7.2
>
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2128:
--------------------------------------

    Attachment: 2128.txt

Patch adds a check to the length of the row mutation read to make sure it is sane.  (Sane lengths will make sure the CRC can do its job.)

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991543#comment-12991543 ] 

Gary Dusbabek commented on CASSANDRA-2128:
------------------------------------------

0.7 handles this gracefully by using checksums when the RowMutations are read from the CL. 0.6 could handle this better with a try{}catch{} where the RowMutation is deserialized.

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992572#comment-12992572 ] 

Gary Dusbabek commented on CASSANDRA-2128:
------------------------------------------

+1

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.12
>
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Issue Comment Edited: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992572#comment-12992572 ] 

Gary Dusbabek edited comment on CASSANDRA-2128 at 2/9/11 5:01 PM:
------------------------------------------------------------------

+1, except don't modify the log4j config!

      was (Author: gdusbabek):
    +1
  
> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.12
>
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Assigned: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Dusbabek reassigned CASSANDRA-2128:
----------------------------------------

    Assignee: Gary Dusbabek

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992551#comment-12992551 ] 

Jonathan Ellis commented on CASSANDRA-2128:
-------------------------------------------

Paul sent me one of the CommitLog files exhibiting this problem.  It is 99614720 bytes long.

I added one more log entry when replaying it:

{code}
                    logger.debug("checksum of " + bytes.length + " successful: " + claimedCRC32);
                    /* deserialize the commit log entry */
{code}

The last entries in the log before dying are

{noformat}
DEBUG [main] 2011-02-09 09:39:15,577 CommitLog.java (line 213) Reading mutation at 97750254
DEBUG [main] 2011-02-09 09:39:15,577 CommitLog.java (line 240) checksum of 1209 successful: 2281213824
DEBUG [main] 2011-02-09 09:39:15,578 CommitLog.java (line 244) replaying mutation for MonitorApp...
DEBUG [main] 2011-02-09 09:39:15,578 CommitLog.java (line 213) Reading mutation at 97751479
DEBUG [main] 2011-02-09 09:39:15,580 CommitLog.java (line 240) checksum of 0 successful: 0
 INFO [main] 2011-02-09 09:39:15,604 CommitLog.java (line 253) Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
ERROR [main] 2011-02-09 09:39:15,652 CassandraDaemon.java (line 242) Exception encountered during startup.
java.io.EOFException
	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
{noformat}

This is a good 2MB before the end of the file.  So I looked in a hex editor to see what was there, starting at offset 97751479.  There's about 8KB of pure 00 bytes, followed by nonsense that occasionally looks like a RowMutation (some appearances of column names cnt, avg, dev that are present in the early, intact part of the commitlog) but mostly does not: there are no more appearances of the keyspace name, MonitorApp, which is the first part of any real RowMutation in the 0.6 serialization format, and in one place there is even the string "java.util.BitSet" visible, which should only appear in the commitlog segment header at the start of the file (that we rewrite at each flush).

To me it looks like "EC2 can write a bunch of nonsense in your commitlog during a hard failure," and the fix is to make the checksum process more robust against nonsense that happens to conform to the first couple bytes we read for a RowMutation.

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991556#comment-12991556 ] 

Jonathan Ellis commented on CASSANDRA-2128:
-------------------------------------------

0.6 uses checksums too, for this area.  (The one place it doesn't is the Header, which I posted a patch for in CASSANDRA-1815, but that doesn't seem to actually matter in practice.)

Here's the code in question:

{code}
                    long claimedCRC32;
                    byte[] bytes;
                    try
                    {
                        bytes = new byte[(int) reader.readLong()]; // readlong can throw EOFException too
                        reader.readFully(bytes);
                        claimedCRC32 = reader.readLong();
                    }
                    catch (EOFException e)
                    {
                        // last CL entry didn't get completely written.  that's ok.
                        break;
                    }

                    ByteArrayInputStream bufIn = new ByteArrayInputStream(bytes);
                    Checksum checksum = new CRC32();
                    checksum.update(bytes, 0, bytes.length);
                    if (claimedCRC32 != checksum.getValue())
                    {
                        // this part of the log must not have been fsynced.  probably the rest is bad too,
                        // but just in case there is no harm in trying them.
                        continue;
                    }

                    /* deserialize the commit log entry */
                    final RowMutation rm = RowMutation.serializer().deserialize(new DataInputStream(bufIn));
{code}

In other words, first we read the mutation into a buffer and checksum it.  If it passes, we try to deserialize that into a mutation.

It's expected that read-into-buffer can throw an EOF (which we expect and catch) but once it passes checksum it's not expected that mutation deserialize should fail.

(Yes, checksums aren't perfect, especially not 32bit ones, but for the checksum to generate a false positive on the last entry in the commitlog, on two different machines, is not a coincidence I'm buying.)

At first glance, the most likely explanation is a bug in RowMutation serializer.  But it's also possible that I'm wrong and it really is erroring out due to some unflushed data somehow.  If you turn on debug logging, it will include the offset in the commitlog of the mutation being replayed -- if it's very very close to the end of the file, then that would increase that likelihood.

bq. An ideal resolution would be if EOF is hit early, log something, but don't stop the startup. Instead process everything that we have done so far, and keep going.

Disagreed.  This is a serious enough bug that we should require human intervention before ignoring it and starting up anyway.

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.11
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Gary Dusbabek
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992657#comment-12992657 ] 

Hudson commented on CASSANDRA-2128:
-----------------------------------

Integrated in Cassandra-0.6 #58 (See [https://hudson.apache.org/hudson/job/Cassandra-0.6/58/])
    update commitlog replay to catch bogus RowMutation lengths
patch by jbellis; reviewed by gdusbabek for CASSANDRA-2128


> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.12
>
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2128) Corrupted Commit logs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2128:
--------------------------------------

             Reviewer: gdusbabek
    Affects Version/s:     (was: 0.6.11)
                       0.6
        Fix Version/s: 0.6.12
             Assignee: Jonathan Ellis  (was: Gary Dusbabek)

0.7 already has a weaker fix for this (checking that serializedSize > 0, which isn't quite as strong as >= 10, but probably adequate in practice, and adding a separate checksum for the size itself.)

> Corrupted Commit logs
> ---------------------
>
>                 Key: CASSANDRA-2128
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2128
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: cassandra-0.6 @ r1064246 (0.6.11)
> Ubuntu 9.10
> Rackspace Cloud
>            Reporter: Paul Querna
>            Assignee: Jonathan Ellis
>             Fix For: 0.6.12
>
>         Attachments: 2128.txt
>
>
> Two of our nodes had a hard failure.
> They both came up with a corrupted commit log.
> On startup we get this:
> {quote}
> 011-02-07_19:34:03.95124 INFO - Finished reading /var/lib/cassandra/commitlog/CommitLog-1297099954252.log
> 2011-02-07_19:34:03.95400 ERROR - Exception encountered during startup.
> 2011-02-07_19:34:03.95403 java.io.EOFException
> 2011-02-07_19:34:03.95403 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95404 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95405 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95406 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95407 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95408 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95409 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95410 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> 2011-02-07_19:34:03.95422 Exception encountered during startup.
> 2011-02-07_19:34:03.95436 java.io.EOFException
> 2011-02-07_19:34:03.95447 	at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
> 2011-02-07_19:34:03.95458 	at java.io.DataInputStream.readUTF(DataInputStream.java:572)
> 2011-02-07_19:34:03.95468 	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
> 2011-02-07_19:34:03.95478 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:363)
> 2011-02-07_19:34:03.95489 	at org.apache.cassandra.db.RowMutationSerializer.deserialize(RowMutation.java:318)
> 2011-02-07_19:34:03.95499 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:240)
> 2011-02-07_19:34:03.95510 	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
> 2011-02-07_19:34:03.95521 	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:115)
> 2011-02-07_19:34:03.95531 	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
> {quote}
> On node A, the commit log in question is 100mb.
> On node B, the commit log in question is 60mb.
> An ideal resolution would be if EOF is hit early, log something, but don't stop the startup.  Instead process everything that we have done so far, and keep going.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira