Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/06/11 01:02:16 UTC

[jira] Created: (CASSANDRA-1179) split commitlog into header + mutations files

split commitlog into header + mutations files
---------------------------------------------

                 Key: CASSANDRA-1179
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1179
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jonathan Ellis
            Assignee: Matthew F. Dennis
             Fix For: 0.7


As mentioned in CASSANDRA-1119, it seems possible that a commitlog header could be corrupted by a power loss during update of the header, post-flush.  We could try to make it more robust (by writing the size of the commitlogheader first, and skipping to the end if we encounter corruption) but it seems to me that the most foolproof method would be to split the log into two files: the header, which we'll overwrite, and the data, which is truly append only.  If the header is corrupt on replay, we just replay the data from the beginning; the header allows us to avoid replaying data redundantly, but it's strictly an optimization and not required for correctness.
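
A minimal sketch of the "header is only an optimization" idea, not the actual patch. It assumes a hypothetical header layout (an 8-byte replay position followed by an 8-byte CRC32 of that position); the class and method names are invented for illustration. A missing, truncated, or corrupt header simply means replay starts at offset 0 of the data file.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class HeaderSketch {
    // Hypothetical layout: [replay position: 8 bytes][CRC32 of those 8 bytes: 8 bytes]
    static byte[] encode(long position) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(position);
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, 8);
        buf.putLong(crc.getValue());
        return buf.array();
    }

    // Returns the offset in the data file at which replay should begin.
    // Any problem with the header falls back to 0, i.e. replay everything.
    static long replayStart(byte[] headerBytes) {
        if (headerBytes == null || headerBytes.length < 16) {
            return 0L; // header missing or truncated
        }
        ByteBuffer buf = ByteBuffer.wrap(headerBytes);
        long position = buf.getLong();
        long storedCrc = buf.getLong();
        CRC32 crc = new CRC32();
        crc.update(headerBytes, 0, 8);
        boolean valid = crc.getValue() == storedCrc && position >= 0;
        return valid ? position : 0L; // corrupt header: replay from the beginning
    }
}
```

Replaying from 0 may reapply data that was already flushed, which costs time but does not affect correctness, as the description above argues.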

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879229#action_12879229 ] 

Jonathan Ellis commented on CASSANDRA-1179:
-------------------------------------------

(actually BRAF.read should be returning -1, so that RAF.readFully throws EOFException)
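
A small sketch of the read contract being described, using an in-memory stand-in rather than the real BufferedRandomAccessFile (the class below is hypothetical). The point is only that read(byte[], int, int) reports end of data with -1, and a readFully loop modeled on the JDK's turns that into EOFException.

```java
import java.io.EOFException;

final class ReadContractSketch {
    private final byte[] data; // stands in for the file contents
    private int pos = 0;

    ReadContractSketch(byte[] data) {
        this.data = data;
    }

    // Returns the number of bytes copied, 0 for a zero-length request,
    // or -1 once the end of the data has been reached.
    int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        if (pos >= data.length) {
            return -1;
        }
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    // Loops until the buffer is full; a -1 from read() becomes EOFException.
    void readFully(byte[] b) throws EOFException {
        int filled = 0;
        while (filled < b.length) {
            int count = read(b, filled, b.length - filled);
            if (count < 0) {
                throw new EOFException();
            }
            filled += count;
        }
    }
}
```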

[jira] Commented: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879227#action_12879227 ] 

Jonathan Ellis commented on CASSANDRA-1179:
-------------------------------------------

I'd rather fix BRAF to generate correct EOFExceptions in case other code runs into this.  (And by removing the EOFException check, we introduce a new bug that if the size int is incomplete, we die again.)

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1179:
--------------------------------------

    Attachment: 1179-v2.txt

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1179:
--------------------------------------

    Attachment: 1179-v2.txt

made some minor changes, primarily using BRAF in writeCommitLogHeader (you don't get buffering w/ raw FileOutputStream, and BRAF is simpler than doing the FOS/BufferedOutputStream/FileChannel dance).  also added RecoveryManager3Test to test the case where the .header file is missing entirely.

todo: still needs to delete the .headers after a successful replay as well as the .log.

more severe: after running "bin/cassandra -f" and C-c-ing several times in a row, I get

ERROR 16:02:14,537 Exception encountered during startup.
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at org.apache.cassandra.io.util.BufferedRandomAccessFile.read(BufferedRandomAccessFile.java:332)
	at java.io.RandomAccessFile.readFully(RandomAccessFile.java:381)
	at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:213)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:172)
	at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:221)

(This looks like BRAF is throwing AIOOBE when it should really be EOFException)


[jira] Commented: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879122#action_12879122 ] 

Jonathan Ellis commented on CASSANDRA-1179:
-------------------------------------------

(fixed trying to be too clever w/ metadata fsync -- we actually do need to include that the first time we write the file)

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1179:
--------------------------------------

    Attachment:     (was: 1179-v2.txt)

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1179:
-----------------------------------------

    Attachment: trunk-1179-v3.txt

trunk-1179-v3.txt deletes header files after successful replay and handles short entries and garbage size writes

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1179:
-----------------------------------------

    Attachment: trunk-1179.txt

[jira] Updated: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Matthew F. Dennis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1179:
-----------------------------------------

    Attachment: trunk-1179-v4.txt

{quote}
made some minor changes, primarily using BRAF in writeCommitLogHeader (you don't get buffering w/ raw FileOutputStream, and BRAF is simpler than doing the FOS/BufferedOutputStream/FileChannel dance).
{quote}

FOS doesn't sync on flush/close and as headers are "optional" now there is no reason to waste the IO.  Just to be sure I was remembering this correctly, I just now tested it.  It provides 80+% improvement over BRAF, even more on a heavily loaded system.  This was clearly a failure on my part to document it as such.  The header is so small (56 bytes I think) the OS will cache it just fine and not using buffered output will avoid both the memcopies and GC from the buffers.
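
A rough sketch of the difference, with hypothetical names and not taken from the patch: a plain FileOutputStream write hands the bytes to the OS and returns, and nothing is forced to disk unless the caller asks for it explicitly (here via FileDescriptor.sync()). The `sync` flag exists only to show both paths side by side.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;

final class HeaderWriteSketch {
    // Overwrites the header file with the given bytes.
    // sync=false: unbuffered write, no fsync; the OS page cache holds the data.
    // sync=true:  additionally forces the bytes to the storage device.
    static void writeHeader(Path file, byte[] header, boolean sync) {
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(header); // one small write, no user-space buffer to copy through
            if (sync) {
                out.getFD().sync(); // the costly step an optional header can skip
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Skipping the sync is only safe because a lost or torn header is tolerated at replay time.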

{quote}
todo: still needs to delete the .headers after a successful replay as well as the .log.
{quote}

thank you, I hadn't realized there were two places the logs were getting removed.  Done.

{quote}
I'd rather fix BRAF to generate correct EOFExceptions in case other code runs into this. (And by removing the EOFException check, we introduce a new bug that if the size int is incomplete, we die again.)

(actually BRAF.read should be returning -1, so that RAF.readFully throws EOFException) 
{quote}

It was not at EOF; the buffer the data was supposed to be written into was zero length.  There was data in the file, but nowhere to write it in the buffer (because the size read was 0, new byte[size] resulted in a zero length array which was then supposed to be filled by BRAF.readFully).

I've added tests to catch this problem (as well as other related ones) and also changed BRAF to throw a more reasonable exception (but not EOF).  I believe BRAF.readFully will already throw EOF if it is at the end of the file.

The size of the log entry is now CRCed on its own.  While testing with random garbage at the end of a commit log, I had written a really large int to the size field which resulted in recover() trying to allocate a massive byte[] and getting OOM.
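
To illustrate why checksumming the size separately helps, here is a sketch under an assumed entry layout of [size: int][CRC32 of size: long][payload]. The layout and names are hypothetical and may not match the patch. The size is validated before any array is allocated, so a zero or garbage size ends the read instead of causing a zero-length buffer or an OOM.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class EntrySizeSketch {
    // CRC32 over the 4 big-endian bytes of the size field.
    static long checksumOf(int size) {
        CRC32 crc = new CRC32();
        crc.update(ByteBuffer.allocate(4).putInt(size).array());
        return crc.getValue();
    }

    // Hypothetical layout: [size:int][crc(size):long][payload bytes]
    static byte[] encodeEntry(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + payload.length);
        buf.putInt(payload.length).putLong(checksumOf(payload.length)).put(payload);
        return buf.array();
    }

    // Returns the payload, or null if the entry is truncated or its size
    // field cannot be trusted. Nothing is allocated until the size checks pass.
    static byte[] readEntry(ByteBuffer log) {
        if (log.remaining() < 12) {
            return null; // incomplete size or checksum
        }
        int size = log.getInt();
        long storedCrc = log.getLong();
        if (size <= 0 || storedCrc != checksumOf(size) || size > log.remaining()) {
            return null; // zero, garbage, or short entry
        }
        byte[] payload = new byte[size];
        log.get(payload);
        return payload;
    }
}
```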

{quote}
by removing the EOFException check, we introduce a new bug that if the size int is incomplete, we die again.
{quote}

good catch.  I have no idea WTF I was thinking, there was even a comment that warned about it that got removed when the try/catch was removed.  I was probably trying to test something and removed it so it'd spew but forgot to put it back.


[jira] Commented: (CASSANDRA-1179) split commitlog into header + mutations files

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880625#action_12880625 ] 

Hudson commented on CASSANDRA-1179:
-----------------------------------

Integrated in Cassandra #471 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/471/])
    split commitlog header into separate file and add size checksum to mutations.  patch by mdennis and jbellis for CASSANDRA-1179

