Posted to dev@kafka.apache.org by "Neha Narkhede (Created) (JIRA)" <ji...@apache.org> on 2012/03/18 02:16:39 UTC

[jira] [Created] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Bug in FileMessageSet's append API can corrupt on disk log
----------------------------------------------------------

                 Key: KAFKA-309
                 URL: https://issues.apache.org/jira/browse/KAFKA-309
             Project: Kafka
          Issue Type: Sub-task
          Components: core
    Affects Versions: 0.7
            Reporter: Neha Narkhede
            Assignee: Neha Narkhede
            Priority: Critical


In FileMessageSet's append API, we write a ByteBufferMessageSet to a log as follows:

    while(written < messages.sizeInBytes)
      written += messages.writeTo(channel, 0, messages.sizeInBytes)

In ByteBufferMessageSet, the writeTo API uses buffer.duplicate() to append to a channel:

  def writeTo(channel: GatheringByteChannel, offset: Long, size: Long): Long =
    channel.write(buffer.duplicate)

If the channel does not write the whole ByteBuffer in one call, the loop calls writeTo again until sizeInBytes bytes have been written. But every call writes a fresh buffer.duplicate() to the FileChannel, and a duplicate starts at the original buffer's position, so the next call writes the ByteBufferMessageSet from the beginning again instead of resuming where the previous write stopped.

The result is a corrupted set of messages on disk.

FileChannel is a blocking channel, so ideally the entire ByteBuffer is written to the FileChannel in one call. I wrote a test (attached here) and saw that it is. But I don't know whether there are corner cases where it is not. In those cases, Kafka will end up corrupting the on-disk log segment.
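
The failure mode described above can be reproduced without a real file. The following is a minimal sketch, not code from Kafka: PartialWriteChannel and buggyAppend are hypothetical names, the channel is a plain WritableByteChannel rather than a GatheringByteChannel to keep the example short, and buffer.limit() stands in for sizeInBytes. The channel deliberately accepts only a few bytes per call so that the loop has to run more than once.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Hypothetical stand-in for a channel that accepts at most maxPerCall bytes per write call.
class PartialWriteChannel(maxPerCall: Int) extends WritableByteChannel {
  private val sink = new ByteArrayOutputStream()
  private var open = true

  override def write(src: ByteBuffer): Int = {
    val n = math.min(maxPerCall, src.remaining())
    val chunk = new Array[Byte](n)
    src.get(chunk)              // advances src's position by n
    sink.write(chunk, 0, n)
    n
  }
  override def isOpen(): Boolean = open
  override def close(): Unit = { open = false }

  // Everything the channel has received so far.
  def bytes: Array[Byte] = sink.toByteArray
}

object DuplicateWriteDemo {
  // Same shape as the loop in the report: each iteration writes a fresh duplicate,
  // so each iteration starts again from the original buffer's position.
  def buggyAppend(buffer: ByteBuffer, channel: WritableByteChannel): Int = {
    var written = 0
    while (written < buffer.limit())
      written += channel.write(buffer.duplicate())
    written
  }

  def main(args: Array[String]): Unit = {
    val channel = new PartialWriteChannel(4)
    buggyAppend(ByteBuffer.wrap("abcdef".getBytes("US-ASCII")), channel)
    println(new String(channel.bytes, "US-ASCII")) // prints "abcdabcd"
  }
}
```

With a six-byte payload and a four-byte limit per call, the first call writes "abcd" and the second call writes "abcd" again, so the tail "ef" never reaches the sink and the first four bytes appear twice.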


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Work started] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Work started) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on KAFKA-309 started by Neha Narkhede.

> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch
>


[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-309:
--------------------------------

    Attachment: kafka-309.patch

This patch changes the writeTo API of the ByteBufferMessageSet to use the message set's buffer to write to the FileChannel. The writeTo API does *not* change the underlying buffer's position marker. 

The right fix might be to stop calling ByteBufferMessageSet's writeTo in a loop in FileMessageSet's append API, since a blocking channel should not return without either writing the entire message set or throwing an error. But that fix is arguably higher risk, so I'm punting on it for now, until we fully understand the guarantees of FileChannel.
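
For illustration only, here is one way a write can resume after a short write while leaving the underlying buffer's position untouched. This is a sketch under stated assumptions and not the attached kafka-309.patch, whose contents are not shown in this thread: ChunkedChannel and resumableAppend are hypothetical names, and a plain WritableByteChannel is used in place of GatheringByteChannel. The idea is to take a single duplicate before the loop and keep writing that same view until it has no bytes remaining.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Hypothetical channel that accepts at most maxPerCall bytes per write call.
class ChunkedChannel(maxPerCall: Int) extends WritableByteChannel {
  private val sink = new ByteArrayOutputStream()
  private var open = true

  override def write(src: ByteBuffer): Int = {
    val n = math.min(maxPerCall, src.remaining())
    val chunk = new Array[Byte](n)
    src.get(chunk)              // advances src's position by n
    sink.write(chunk, 0, n)
    n
  }
  override def isOpen(): Boolean = open
  override def close(): Unit = { open = false }
  def bytes: Array[Byte] = sink.toByteArray
}

object ResumableWriteSketch {
  // One duplicate for the whole append: the view's position advances across calls,
  // while the original buffer's position, limit and mark are left alone.
  def resumableAppend(buffer: ByteBuffer, channel: WritableByteChannel): Int = {
    val view = buffer.duplicate()
    var written = 0
    while (view.hasRemaining)
      written += channel.write(view)
    written
  }

  def main(args: Array[String]): Unit = {
    val buffer = ByteBuffer.wrap("abcdef".getBytes("US-ASCII"))
    val channel = new ChunkedChannel(4)
    resumableAppend(buffer, channel)
    println(new String(channel.bytes, "US-ASCII")) // prints "abcdef"
    println(buffer.position())                     // prints 0
  }
}
```

Because the duplicate shares content with the original but has its own position, the second call continues at byte 4 and the caller's buffer can still be read from the start afterwards.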
                
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch, kafka-309.patch
>


[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-309:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)
    
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch, kafka-309.patch
>


[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-309:
--------------------------------

    Issue Type: Bug  (was: Sub-task)
        Parent:     (was: KAFKA-308)
    
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch
>


[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-309:
--------------------------------

    Status: Patch Available  (was: In Progress)
    
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch, kafka-309.patch
>


[jira] [Commented] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235679#comment-13235679 ] 

Jun Rao commented on KAFKA-309:
-------------------------------

+1 on the patch.
                
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch, kafka-309.patch
>


[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log

Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-309:
--------------------------------

    Attachment: kafka-309-test.patch

The test includes a FileChannelTest that writes byte buffers of varying lengths to a file channel in a single call and checks whether each buffer was completely written.
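
A check of this kind might look roughly like the sketch below. It is not the attached FileChannelTest: the object name, the writtenPerCall helper, the temp-file prefix and the buffer sizes are all assumptions made for illustration. For each size it issues exactly one FileChannel.write call and records how many bytes that call reported, so a short write would show up as a pair whose two numbers differ.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer

object FileChannelSingleWriteCheck {
  // For each requested size, perform one write call on a FileChannel and
  // return (requested size, bytes reported written by that single call).
  def writtenPerCall(sizes: Seq[Int]): List[(Int, Int)] = {
    val file = File.createTempFile("filechannel-check", ".log")
    file.deleteOnExit()
    val raf = new RandomAccessFile(file, "rw")
    try {
      val channel = raf.getChannel
      // toList forces evaluation before the file is closed in the finally block.
      sizes.toList.map { size =>
        val buffer = ByteBuffer.allocate(size) // zero-filled, position 0, limit = size
        (size, channel.write(buffer))
      }
    } finally {
      raf.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val results = writtenPerCall(Seq(1, 1024, 1024 * 1024, 16 * 1024 * 1024))
    for ((size, n) <- results)
      println(s"requested=$size written=$n complete=${n == size}")
  }
}
```

A run where every line reports complete=true only shows that no short write happened on that machine, file system and JVM; it does not establish a guarantee for FileChannel in general.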
                
> Bug in FileMessageSet's append API can corrupt on disk log
> ----------------------------------------------------------
>
>                 Key: KAFKA-309
>                 URL: https://issues.apache.org/jira/browse/KAFKA-309
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Critical
>         Attachments: kafka-309-test.patch
>
