Posted to dev@kafka.apache.org by "Neha Narkhede (Created) (JIRA)" <ji...@apache.org> on 2012/03/18 02:16:39 UTC
[jira] [Created] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Bug in FileMessageSet's append API can corrupt on disk log
----------------------------------------------------------
Key: KAFKA-309
URL: https://issues.apache.org/jira/browse/KAFKA-309
Project: Kafka
Issue Type: Sub-task
Components: core
Affects Versions: 0.7
Reporter: Neha Narkhede
Assignee: Neha Narkhede
Priority: Critical
In FileMessageSet's append API, we write a ByteBufferMessageSet to a log in the following manner -

  while(written < messages.sizeInBytes)
    written += messages.writeTo(channel, 0, messages.sizeInBytes)

In ByteBufferMessageSet, the writeTo API uses buffer.duplicate() to append to a channel -

  def writeTo(channel: GatheringByteChannel, offset: Long, size: Long): Long =
    channel.write(buffer.duplicate)

If the channel doesn't write the entire ByteBuffer in one call, we call writeTo again until sizeInBytes bytes are written. But each retry calls buffer.duplicate() again, and the duplicate's position starts at the original buffer's (never advanced) position, so the retry writes the ByteBufferMessageSet to the file from the beginning again. Effectively, we end up with a corrupted set of messages on disk.
Thinking about it, FileChannel is a blocking channel, so ideally the entire ByteBuffer should be written to the FileChannel in one call. I wrote a test (attached here) and saw that it does. But there may be corner cases where it doesn't, and in those cases Kafka will end up corrupting the on-disk log segment.
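The position-resetting behaviour can be sketched with a test double; PartialChannel and the byte values below are made up for illustration and are not Kafka code -

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Hypothetical test double: accepts at most `chunk` bytes per write()
// call, simulating a partial write by the underlying channel.
class PartialChannel(chunk: Int) extends WritableByteChannel {
  val sink = new ByteArrayOutputStream
  def write(src: ByteBuffer): Int = {
    val n = math.min(chunk, src.remaining)
    val bytes = new Array[Byte](n)
    src.get(bytes)            // advances src's position by n
    sink.write(bytes)
    n
  }
  def isOpen: Boolean = true
  def close(): Unit = ()
}

// The buggy loop: duplicate() copies the buffer's current position, which
// the loop never advances, so every retry re-sends bytes from position 0.
def buggyAppend(channel: WritableByteChannel, buffer: ByteBuffer): Long = {
  val size = buffer.limit
  var written = 0L
  while (written < size)
    written += channel.write(buffer.duplicate)
  written
}

val ch = new PartialChannel(3)
buggyAppend(ch, ByteBuffer.wrap("abcdefgh".getBytes)) // 8 bytes requested
// ch.sink now holds "abcabcabc" - 9 bytes of repeated prefix, not "abcdefgh"
```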
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Work started] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Work started) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on KAFKA-309 started by Neha Narkhede.
[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-309:
--------------------------------
Attachment: kafka-309.patch
This patch changes the writeTo API of the ByteBufferMessageSet to use the message set's buffer to write to the FileChannel. The writeTo API does *not* change the underlying buffer's position marker.
The right fix might be to not call ByteBufferMessageSet's writeTo in a loop in FileMessageSet's append API at all, since the guarantee of a blocking channel is that it does not return without either writing the entire message set or throwing an error. But that fix is arguably higher risk, so punting on it for now, until we fully understand FileChannel's guarantees.
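This is not the attached patch itself, but the idea of making retries safe can be sketched as follows: take one duplicate outside the loop so its position advances across partial writes, while the caller's buffer position stays untouched. PartialChannel is a hypothetical test double, not Kafka code -

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.WritableByteChannel

// Hypothetical test double: accepts at most `chunk` bytes per write() call.
class PartialChannel(chunk: Int) extends WritableByteChannel {
  val sink = new ByteArrayOutputStream
  def write(src: ByteBuffer): Int = {
    val n = math.min(chunk, src.remaining)
    val bytes = new Array[Byte](n)
    src.get(bytes)
    sink.write(bytes)
    n
  }
  def isOpen: Boolean = true
  def close(): Unit = ()
}

// One duplicate is taken outside the loop: its position advances with each
// partial write, so retries resume where the last write stopped, and the
// caller's buffer position marker is never modified.
def safeAppend(channel: WritableByteChannel, messages: ByteBuffer): Long = {
  val view = messages.duplicate
  var written = 0L
  while (written < view.limit)
    written += channel.write(view)
  written
}

val ch = new PartialChannel(3)
val buf = ByteBuffer.wrap("abcdefgh".getBytes)
safeAppend(ch, buf)
// ch.sink holds "abcdefgh" exactly once; buf's position is still 0
```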
[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-309:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-309:
--------------------------------
Issue Type: Bug (was: Sub-task)
Parent: (was: KAFKA-308)
[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-309:
--------------------------------
Status: Patch Available (was: In Progress)
[jira] [Commented] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Jun Rao (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235679#comment-13235679 ]
Jun Rao commented on KAFKA-309:
-------------------------------
+1 on the patch.
[jira] [Updated] (KAFKA-309) Bug in FileMessageSet's append API can corrupt on disk log
Posted by "Neha Narkhede (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-309:
--------------------------------
Attachment: kafka-309-test.patch
The patch includes a FileChannelTest that writes byte buffers of varying lengths to a file channel, each in a single call, and checks that every buffer was completely written.
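A hypothetical sketch of what such a test might look like (the actual test is in the attached kafka-309-test.patch; the file name and sizes here are made up) -

```scala
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer

// For several buffer sizes, a single FileChannel.write() call is expected
// to consume the whole buffer, since FileChannel is a blocking channel.
val file = File.createTempFile("kafka-309", ".log")
file.deleteOnExit()
val channel = new RandomAccessFile(file, "rw").getChannel
val sizes = Seq(1, 512, 65536)
for (size <- sizes) {
  val buf = ByteBuffer.wrap(Array.fill(size)(7.toByte))
  val written = channel.write(buf)
  assert(written == size, s"partial write: $written of $size bytes")
  assert(!buf.hasRemaining)
}
channel.close()
```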