You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Sriharsha Chintalapani <ha...@hortonworks.com> on 2014/10/05 00:28:01 UTC

Review Request 26346: Patch for KAFKA-1670

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs
-----

  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

(Updated Oct. 8, 2014, 1:39 a.m.)


Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description (updated)
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs (updated)
-----

  clients/src/main/java/org/apache/kafka/common/errors/RecordBatchTooLargeException.java PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/protocol/Errors.java d434f420ad63406ee2a2fde9435762ae027d85f3 
  core/src/main/scala/kafka/common/ErrorMapping.scala 3fae7910e4ce17bc8325887a046f383e0c151d44 
  core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
  core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
  core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
  core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.

> On Oct. 7, 2014, 11:03 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, line 242
> > <https://reviews.apache.org/r/26346/diff/5/?file=714849#file714849line242>
> >
> >     By increasing the segment size to 100, does the log still roll on every message as indicated by the comment?
> 
> Sriharsha Chintalapani wrote:
>     yes it rolls on every messageset. I can add a assert to test that if it required.

Will it? In each append, we add 2 messages with a total of 10 bytes. If we add a 10 byte per message overhead, with compression, it seems both message sets can fit in the same log segment of 100 bytes?


- Jun


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55720
-----------------------------------------------------------


On Oct. 7, 2014, 8:49 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:49 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.

> On Oct. 7, 2014, 11:03 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, line 242
> > <https://reviews.apache.org/r/26346/diff/5/?file=714849#file714849line242>
> >
> >     By increasing the segment size to 100, does the log still roll on every message as indicated by the comment?
> 
> Sriharsha Chintalapani wrote:
>     yes it rolls on every messageset. I can add a assert to test that if it required.
> 
> Jun Rao wrote:
>     Will it? In each append, we add 2 messages with a total of 10 bytes. If we add a 10 byte per message overhead, with compression, it seems both message sets can fit in the same log segment of 100 bytes?
> 
> Sriharsha Chintalapani wrote:
>     I am using validMessages.sizeInBytes which is showing the size of ByteBufferedMessageSet 
>     new ByteBufferMessageSet(DefaultCompressionCodec, new Message("hello".getBytes), new Message("there".getBytes)) as 83

Great. Then, this is fine.


- Jun


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55720
-----------------------------------------------------------


On Oct. 7, 2014, 8:49 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:49 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 7, 2014, 11:03 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala, line 20
> > <https://reviews.apache.org/r/26346/diff/5/?file=714846#file714846line20>
> >
> >     We need to add this exception to ErrorMapping and Errors. We also need to add this class to org.apache.kafka.common.errors in the client.

sorry I missed it


> On Oct. 7, 2014, 11:03 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, line 242
> > <https://reviews.apache.org/r/26346/diff/5/?file=714849#file714849line242>
> >
> >     By increasing the segment size to 100, does the log still roll on every message as indicated by the comment?

yes it rolls on every messageset. I can add a assert to test that if it required.


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55720
-----------------------------------------------------------


On Oct. 7, 2014, 8:49 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:49 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 7, 2014, 11:03 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, line 242
> > <https://reviews.apache.org/r/26346/diff/5/?file=714849#file714849line242>
> >
> >     By increasing the segment size to 100, does the log still roll on every message as indicated by the comment?
> 
> Sriharsha Chintalapani wrote:
>     yes it rolls on every messageset. I can add a assert to test that if it required.
> 
> Jun Rao wrote:
>     Will it? In each append, we add 2 messages with a total of 10 bytes. If we add a 10 byte per message overhead, with compression, it seems both message sets can fit in the same log segment of 100 bytes?

I am using validMessages.sizeInBytes which is showing the size of ByteBufferedMessageSet 
new ByteBufferMessageSet(DefaultCompressionCodec, new Message("hello".getBytes), new Message("there".getBytes)) as 83


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55720
-----------------------------------------------------------


On Oct. 7, 2014, 8:49 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:49 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55720
-----------------------------------------------------------



core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala
<https://reviews.apache.org/r/26346/#comment96114>

    We need to add this exception to ErrorMapping and Errors. We also need to add this class to org.apache.kafka.common.errors in the client.



core/src/test/scala/unit/kafka/log/LogTest.scala
<https://reviews.apache.org/r/26346/#comment96104>

    By increasing the segment size to 100, does the log still roll on every message as indicated by the comment?


- Jun Rao


On Oct. 7, 2014, 8:49 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:49 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

(Updated Oct. 7, 2014, 8:49 p.m.)


Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description (updated)
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs (updated)
-----

  core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
  core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
  core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
  core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

(Updated Oct. 7, 2014, 8:39 p.m.)


Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description (updated)
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Merge remote-tracking branch 'origin/trunk' into KAFKA-1670


Diffs (updated)
-----

  core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
  core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
  core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
  core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 6, 2014, 5:24 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 502
> > <https://reviews.apache.org/r/26346/diff/3/?file=714142#file714142line502>
> >
> >     We probably should enforce that a segment is never larger than config.segmentSize. So, if messageSize is larger than config.segmentSize, we should just throw an exception. Once we do that, it seems that we can just use the following check to cover both conditions.
> >     
> >     if (segment.size + messageSize > (long) config.segmentSize
> >     
> >     We likely need to adjust the comment above accordingly.

Sorry I am not able follow . can you please elaborate on this " So, if messageSize is larger than config.segmentSize,".
Here the issue is not the messageSize is larger than the config.segmentSize.
Currently we only roll when segment.size is greater than the config.segmentSize and the edge case here is
if the config.segmentSize is Int.MaxValue and the current segment.size is Int.MaxValue - 1 we still wouldn't roll the segment
and append the current batch to the same segment and next time we check segment.size is overflown will be negative and still fail to pass the check segment.size > config.segmentSize and we keep appending to the same LogSegment.

if (segment.size + messageSize > (long) config.segmentSize
This condition wouldn't work since segment.size is Int and if its value is anywhere closer to Int.MaxValue adding the current messages size will cause it overflown.

we can change the above condition to 
if (segment.size.toLong + messageSize > config.segmentSize) 
and changing the comment to 
LogSegment will be rolled if  segment.size + messagesBatch.size is greater than config.segmentSize. 
Please let me know if these changes looks good. Thanks.


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
-----------------------------------------------------------


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.

> On Oct. 6, 2014, 5:24 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 502
> > <https://reviews.apache.org/r/26346/diff/3/?file=714142#file714142line502>
> >
> >     We probably should enforce that a segment is never larger than config.segmentSize. So, if messageSize is larger than config.segmentSize, we should just throw an exception. Once we do that, it seems that we can just use the following check to cover both conditions.
> >     
> >     if (segment.size + messageSize > (long) config.segmentSize
> >     
> >     We likely need to adjust the comment above accordingly.
> 
> Sriharsha Chintalapani wrote:
>     Sorry I am not able follow . can you please elaborate on this " So, if messageSize is larger than config.segmentSize,".
>     Here the issue is not the messageSize is larger than the config.segmentSize.
>     Currently we only roll when segment.size is greater than the config.segmentSize and the edge case here is
>     if the config.segmentSize is Int.MaxValue and the current segment.size is Int.MaxValue - 1 we still wouldn't roll the segment
>     and append the current batch to the same segment and next time we check segment.size is overflown will be negative and still fail to pass the check segment.size > config.segmentSize and we keep appending to the same LogSegment.
>     
>     if (segment.size + messageSize > (long) config.segmentSize
>     This condition wouldn't work since segment.size is Int and if its value is anywhere closer to Int.MaxValue adding the current messages size will cause it overflown.
>     
>     we can change the above condition to 
>     if (segment.size.toLong + messageSize > config.segmentSize) 
>     and changing the comment to 
>     LogSegment will be rolled if  segment.size + messagesBatch.size is greater than config.segmentSize. 
>     Please let me know if these changes looks good. Thanks.

Yes, my point is that the current implementation of config.segmentSize is a bit misleading. It actually allows a log segment to be larger than config.segmentSize. It's easier to understand if we disallow that. Also, if we never allow a segment to go beyond config.segmentSize, it will never overflow since config.segmentSize is capped at 2G. The following check that you suggested will suffice.

if (segment.size.toLong + messageSize > config.segmentSize)

Then the issue is what happens when the appended message set itself is larger than config.segmentSize. If you just have the above check, you will still allow a segment to have a size larger than config.segmentSize. Another issue is that if the active segment is empty, we will end up rolling a new segment with the same name, which will confuse the broker. In this case, it will be easier to reject the message set by throwing an exception.


- Jun


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
-----------------------------------------------------------


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55526
-----------------------------------------------------------



core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95904>

    We probably should enforce that a segment is never larger than config.segmentSize. So, if messageSize is larger than config.segmentSize, we should just throw an exception. Once we do that, it seems that we can just use the following check to cover both conditions.
    
    if (segment.size + messageSize > (long) config.segmentSize
    
    We likely need to adjust the comment above accordingly.


- Jun Rao


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 7, 2014, 12:42 a.m., Jay Kreps wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 502
> > <https://reviews.apache.org/r/26346/diff/3/?file=714142#file714142line502>
> >
> >     It is a bit subtle that you are checking for overflow this way. What we mean to check is just that there is sufficient room in the segment for this message, which I think we can do by checking:
> >     
> >     segment.size > config.segmentSize - messagesSize

Thanks Jay and Jun for the review and suggesstions. Please check the latest patch.


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55619
-----------------------------------------------------------


On Oct. 7, 2014, 8:39 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2014, 8:39 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Merge remote-tracking branch 'origin/trunk' into KAFKA-1670
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/common/MessageSetSizeTooLargeException.scala PRE-CREATION 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 59bd8a981b3fb8595dd6e790a30071092978a88d 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
>   core/src/test/scala/unit/kafka/server/LogOffsetTest.scala 9556ed92c61ffee5423be962bcdbe64c71e1f2fa 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jay Kreps <bo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55619
-----------------------------------------------------------



core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95982>

    I don't know if this comment is very intuitive. Let's just document that messageSize is the size of the message about to be appended and that this method will roll the log if either (1) the log is full, (2) the max time has elapsed, or (3) the index is full.



core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95983>

    It is a bit subtle that you are checking for overflow this way. What we mean to check is just that there is sufficient room in the segment for this message, which I think we can do by checking:
    
    segment.size > config.segmentSize - messagesSize


- Jay Kreps


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

(Updated Oct. 6, 2014, 4:48 p.m.)


Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs (updated)
-----

  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 5, 2014, 11:35 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/log/Log.scala, line 499
> > <https://reviews.apache.org/r/26346/diff/2/?file=713902#file713902line499>
> >
> >     Would it be simpler to just do the following?
> >     
> >     if (segment.size + messageSize < (long) config.segmentSize)

that would cause it roll everytime a message batch is appended . Lets say the segment.size is 10 and messagesSize is 10 and config.segmentSize is 2147483647
the above check will pass and roll the current segment without reaching the  config.segmentSize.


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55479
-----------------------------------------------------------


On Oct. 5, 2014, 3:17 a.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 5, 2014, 3:17 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Neha Narkhede <ne...@gmail.com>.

> On Oct. 5, 2014, 11:35 p.m., Jun Rao wrote:
> > core/src/test/scala/unit/kafka/log/LogTest.scala, lines 113-129
> > <https://reviews.apache.org/r/26346/diff/2/?file=713903#file713903line113>
> >
> >     Appending 2GB of data in a unit test is probably too long. We can probably just manually validate the fix and skip the unit test.

The concern I have with manual validation only is that over time with enough of these bugs, we don't any coverage for regression testing. Any chance this can at least be a system test?


- Neha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55479
-----------------------------------------------------------


On Oct. 6, 2014, 4:48 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 6, 2014, 4:48 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Jun Rao <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55479
-----------------------------------------------------------


Thanks for the patch. Some comments below.


core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95851>

    This probably should be moved to before line 285 since validMessages may change before we do the append.



core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95850>

    Would it be simpler to just do the following?
    
    if (segment.size + messageSize < (long) config.segmentSize)



core/src/test/scala/unit/kafka/log/LogTest.scala
<https://reviews.apache.org/r/26346/#comment95849>

    Appending 2GB of data in a unit test is probably too long. We can probably just manually validate the fix and skip the unit test.


- Jun Rao


On Oct. 5, 2014, 3:17 a.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 5, 2014, 3:17 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/
-----------------------------------------------------------

(Updated Oct. 5, 2014, 3:17 a.m.)


Review request for kafka.


Bugs: KAFKA-1670
    https://issues.apache.org/jira/browse/KAFKA-1670


Repository: kafka


Description
-------

KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.


Diffs (updated)
-----

  core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
  core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 

Diff: https://reviews.apache.org/r/26346/diff/


Testing
-------


Thanks,

Sriharsha Chintalapani


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Sriharsha Chintalapani <ha...@hortonworks.com>.

> On Oct. 4, 2014, 11:28 p.m., Neha Narkhede wrote:
> > Can you try to add a unit test for this?

I added unit test under LogTest. This test very basic its just adds messages to file until the file reaches Int.Max. 
This will result in couple of issues first being atleast having 2G+ tmp space and also adds atleast 1 min to test execution latency.


- Sriharsha


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55455
-----------------------------------------------------------


On Oct. 5, 2014, 3:17 a.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 5, 2014, 3:17 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 577d102fc2eb6bb1a72326141ecd431db6d66f04 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>


Re: Review Request 26346: Patch for KAFKA-1670

Posted by Neha Narkhede <ne...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26346/#review55455
-----------------------------------------------------------


Can you try to add a unit test for this?


core/src/main/scala/kafka/log/Log.scala
<https://reviews.apache.org/r/26346/#comment95822>

    Could you please change the API docs to reflect this change?


- Neha Narkhede


On Oct. 4, 2014, 10:28 p.m., Sriharsha Chintalapani wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26346/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2014, 10:28 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1670
>     https://issues.apache.org/jira/browse/KAFKA-1670
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> KAFKA-1670. Corrupt log files for segment.bytes values close to Int.MaxInt.
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/Log.scala 0ddf97bd30311b6039e19abade41d2fbbad2f59b 
> 
> Diff: https://reviews.apache.org/r/26346/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>