Posted to dev@kafka.apache.org by Manikumar Reddy O <ma...@gmail.com> on 2014/09/23 18:20:08 UTC

Re: Review Request 24214: Patch for KAFKA-1374

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated Sept. 23, 2014, 4:20 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description (updated)
-------

Addressing Jun's comments


Diffs (updated)
-----

  core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff 
  core/src/main/scala/kafka/tools/TestLogCleaning.scala 1d4ea93f2ba8d4d4d47a307cd47f54a15d3d30dd 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 5bfa764638e92f217d0ff7108ec8f53193c22978 

Diff: https://reviews.apache.org/r/24214/diff/


Testing
-------


Thanks,

Manikumar Reddy O


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.

> On May 12, 2015, 2:01 p.m., Joel Koshy wrote:
> > core/src/main/scala/kafka/log/LogCleaner.scala, line 409
> > <https://reviews.apache.org/r/24214/diff/9/?file=824405#file824405line409>
> >
> >     I would suggest one of two options over this (i.e., instead of two helper methods)
> >     - Inline both here and get rid of those
> >     - Have a single private helper (e.g., collectRetainedMessages)

Removed the helper methods.


> On May 12, 2015, 2:01 p.m., Joel Koshy wrote:
> > core/src/main/scala/kafka/log/LogCleaner.scala, line 479
> > <https://reviews.apache.org/r/24214/diff/9/?file=824405#file824405line479>
> >
> >     We should now compress with the compression codec of the topic (KAFKA-1499)

Will do this in a separate JIRA.


> On May 12, 2015, 2:01 p.m., Joel Koshy wrote:
> > core/src/main/scala/kafka/log/LogCleaner.scala, line 498
> > <https://reviews.apache.org/r/24214/diff/9/?file=824405#file824405line498>
> >
> >     We should instead do a trivial refactor in ByteBufferMessageSet to compress messages in a preallocated buffer. It would be preferable to avoid having this compression logic in different places.

Moved the compressMessages() method to the ByteBufferMessageSet class. Please let me know your thoughts.


- Manikumar Reddy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review83392
-----------------------------------------------------------


On May 18, 2015, 5:29 p.m., Manikumar Reddy O wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24214/
> -----------------------------------------------------------
> 
> (Updated May 18, 2015, 5:29 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1374
>     https://issues.apache.org/jira/browse/KAFKA-1374
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Addressing Joel's comments
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/LogCleaner.scala abea8b251895a5cc0788c6e25b112a2935a3f631 
>   core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 9dfe914991aaf82162e5e300c587c794555d5fd0 
>   core/src/main/scala/kafka/message/MessageSet.scala 28b56e68cfdbbf107dd7cbd248ffa8fa6bbcd13f 
>   core/src/test/scala/kafka/tools/TestLogCleaning.scala 844589427cb9337acd89a5239a98b811ee58118e 
>   core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 3b5aa9dc3b7ac5893c1d281ae1326be0e9ed8aad 
>   core/src/test/scala/unit/kafka/log/LogTest.scala 76d3bfd378f32fd2b216b3ebdec86e2070491924 
> 
> Diff: https://reviews.apache.org/r/24214/diff/
> 
> 
> Testing
> -------
> 
> /*TestLogCleaning stress test output for compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-6014466306002699464.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-177538909590644701.txt
> 100000 rows of data produced, 13165 rows of data consumed (86.8% reduction).
> De-duplicating and validating output files...
> Validated 9005 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-3298578695475992991.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-7192293977610206930.txt
> 1000000 rows of data produced, 119926 rows of data consumed (88.0% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-3336255463347572934.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-9149188270705707725.txt
> 10000000 rows of data produced, 1645281 rows of data consumed (83.5% reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> /*TestLogCleaning stress test output for non-compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-5174543709786189363.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5143455017777144701.txt
> 100000 rows of data produced, 22775 rows of data consumed (77.2% reduction).
> De-duplicating and validating output files...
> Validated 17874 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-7814446915546169271.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5172557663160447626.txt
> 1000000 rows of data produced, 129230 rows of data consumed (87.1% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-6092986571905399164.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-63626021421841220.txt
> 10000000 rows of data produced, 1136608 rows of data consumed (88.6% reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> Thanks,
> 
> Manikumar Reddy O
> 
>


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Joel Koshy <jj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review83392
-----------------------------------------------------------


Sorry for the delay. Overall, this looks good.

As discussed earlier, this patch needs a minor rebase.

There are a couple of points to note:
- In KAFKA-1499 you added broker-side compression. When writing out the compacted messages, we should compress using the configured compression codec. We can do this as an incremental change if you prefer; i.e., your current patch makes the log cleaner compression-aware, and a subsequent patch can handle writing out with the configured codec. That part could be non-trivial, as we would then probably want to do some batching when writing out compacted compressed messages.
- In KAFKA-1755 I had added some defensive code to prevent compressed messages and unkeyed messages from getting in. The compression-related code will need to be removed.

Let me know if you need help with any of this.


core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment134376>

    I would suggest one of two options over this (i.e., instead of two helper methods)
    - Inline both here and get rid of those
    - Have a single private helper (e.g., collectRetainedMessages)



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment134377>

    We should now compress with the compression codec of the topic (KAFKA-1499)



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment134378>

    We should instead do a trivial refactor in ByteBufferMessageSet to compress messages in a preallocated buffer. It would be preferable to avoid having this compression logic in different places.


- Joel Koshy


On Jan. 17, 2015, 6:53 p.m., Manikumar Reddy O wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24214/
> -----------------------------------------------------------
> 
> (Updated Jan. 17, 2015, 6:53 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1374
>     https://issues.apache.org/jira/browse/KAFKA-1374
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Updating the rebased code
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/log/LogCleaner.scala f8e7cd5fabce78c248a9027c4bb374a792508675 
>   core/src/main/scala/kafka/tools/TestLogCleaning.scala af496f7c547a5ac7a4096a6af325dad0d8feec6f 
>   core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 07acd460b1259e0a3f4069b8b8dcd8123ef5810e 
> 
> Diff: https://reviews.apache.org/r/24214/diff/
> 
> 
> Testing
> -------
> 
> /*TestLogCleaning stress test output for compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-6014466306002699464.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-177538909590644701.txt
> 100000 rows of data produced, 13165 rows of data consumed (86.8% reduction).
> De-duplicating and validating output files...
> Validated 9005 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-3298578695475992991.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-7192293977610206930.txt
> 1000000 rows of data produced, 119926 rows of data consumed (88.0% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-3336255463347572934.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-9149188270705707725.txt
> 10000000 rows of data produced, 1645281 rows of data consumed (83.5% reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> /*TestLogCleaning stress test output for non-compressed messages*/
> 
> Producing 100000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-5174543709786189363.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5143455017777144701.txt
> 100000 rows of data produced, 22775 rows of data consumed (77.2% reduction).
> De-duplicating and validating output files...
> Validated 17874 values, 0 mismatches.
> 
> Producing 1000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-7814446915546169271.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5172557663160447626.txt
> 1000000 rows of data produced, 129230 rows of data consumed (87.1% reduction).
> De-duplicating and validating output files...
> Validated 89947 values, 0 mismatches.
> 
> Producing 10000000 messages...
> Logging produce requests to /tmp/kafka-log-cleaner-produced-6092986571905399164.txt
> Sleeping for 120 seconds...
> Consuming messages...
> Logging consumed messages to /tmp/kafka-log-cleaner-consumed-63626021421841220.txt
> 10000000 rows of data produced, 1136608 rows of data consumed (88.6% reduction).
> De-duplicating and validating output files...
> Validated 899853 values, 0 mismatches.
> 
> 
> Thanks,
> 
> Manikumar Reddy O
> 
>


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Eric Olander <ol...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review68569
-----------------------------------------------------------



core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala
<https://reviews.apache.org/r/24214/#comment112888>

    Could be simplified to just:
    for (codec <- CompressionType.values) yield Array(codec.name)
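
As a minimal sketch of the simplification suggested above: the `Codec` hierarchy and the `CodecParameters` object below are hypothetical stand-ins, since the test's actual parameter method and the real `CompressionType` are not reproduced in this thread. The sketch only contrasts an explicit accumulation loop with the equivalent `for ... yield`.

```scala
object CodecParameters {
  // Hypothetical stand-in for the codec enumeration used by the test.
  sealed abstract class Codec(val name: String)
  case object NoCodec extends Codec("none")
  case object Gzip extends Codec("gzip")
  case object Snappy extends Codec("snappy")

  val values: Seq[Codec] = Seq(NoCodec, Gzip, Snappy)

  // Explicit loop that appends one parameter array per codec.
  def parametersVerbose: Seq[Array[String]] = {
    val list = scala.collection.mutable.ListBuffer.empty[Array[String]]
    for (codec <- values) list += Array(codec.name)
    list.toList
  }

  // The same result expressed as a for-comprehension with yield.
  def parameters: Seq[Array[String]] =
    for (codec <- values) yield Array(codec.name)
}
```

Both methods produce one single-element array per codec, in the order of `values`.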


- Eric Olander




Re: Review Request 24214: Patch for KAFKA-1374

Posted by Joel Koshy <jj...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/#review84278
-----------------------------------------------------------


Thanks for the updated patch. This looks good. I ended up rebasing while you were working on this :) I have a few additional edits, noted below, which I will upload shortly.


core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135488>

    Minor improvement: we can avoid an extra copy by filtering the iterator above, and then materializing once.
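
One way to read this suggestion, as a minimal sketch: `Entry`, `retain`, and both method names below are hypothetical and are not taken from LogCleaner.scala. Filtering an `Iterator` is lazy, so only the final `toList` allocates a collection.

```scala
object FilterThenMaterialize {
  // Hypothetical stand-in for a decoded message; not the real kafka.message.Message.
  final case class Entry(key: String, offset: Long)

  // Hypothetical retention rule: keep entries at or above a cutoff offset.
  def retain(e: Entry, cutoff: Long): Boolean = e.offset >= cutoff

  // Materializes first, then filters: two collections are allocated.
  def retainedTwoCopies(entries: Iterator[Entry], cutoff: Long): List[Entry] =
    entries.toList.filter(retain(_, cutoff))

  // Filters the iterator lazily, then materializes once.
  def retainedOneCopy(entries: Iterator[Entry], cutoff: Long): List[Entry] =
    entries.filter(retain(_, cutoff)).toList
}
```

Both methods return the same entries; the second simply skips the intermediate list.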



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135497>

    I'm wondering if it would be helpful to split stats into compressed vs noncompressed.
    
    E.g., x bytes read (from y compressed bytes); n messages read (from m compressed messages) and so on...



core/src/main/scala/kafka/log/LogCleaner.scala
<https://reviews.apache.org/r/24214/#comment135489>

    The last statement can be !redundant && !obsoleteDelete
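
The diff line under review is not reproduced in this thread, so the verbose form below is only a guess at the kind of statement being replaced. Under that assumption, a small sketch of the equivalence (all names here are hypothetical apart from the two flags quoted in the comment):

```scala
object RetainDecision {
  // Verbose form: branch on the flags and return a literal.
  def retainVerbose(redundant: Boolean, obsoleteDelete: Boolean): Boolean =
    if (redundant || obsoleteDelete) false else true

  // Equivalent single expression, by De Morgan's law.
  def retain(redundant: Boolean, obsoleteDelete: Boolean): Boolean =
    !redundant && !obsoleteDelete
}
```

A message is retained only when it is neither superseded by a later entry for the same key nor a delete marker that is no longer needed.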



core/src/main/scala/kafka/message/ByteBufferMessageSet.scala
<https://reviews.apache.org/r/24214/#comment135490>

    I actually had a different thought - i.e., to avoid duplicating the compression code in BBMS. Then I ran into the issue that you probably saw - i.e., the BBMS create method isn't very amenable to refactoring with pre-assigned offsets. So I think what you originally had was actually better.
    
    Ideally we should have a compress (raw bytes) method and just use that in both places. In fact, we can consider using the Compressor in clients - which will have the added benefit of identical compression in use in both the broker and clients. E.g., right now it is possible to be under the message size limit on the client and still exceed it on the broker.
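
A rough sketch of what a shared "compress raw bytes" helper might look like, assuming GZIP via the JDK's `java.util.zip` classes. The `RawCompression` object and its method names are hypothetical; this is neither the ByteBufferMessageSet code nor the client-side Compressor mentioned above, and it ignores codec selection and message framing entirely.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object RawCompression {
  // Compresses a raw byte array with GZIP and returns the compressed bytes.
  def compress(raw: Array[Byte]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(out)
    try gzip.write(raw) finally gzip.close() // close() writes the GZIP trailer
    out.toByteArray
  }

  // Inverse of compress, included only to check the round trip.
  def decompress(compressed: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(compressed))
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.toByteArray
    } finally in.close()
  }
}
```

Having a single helper of this shape called from both code paths is what would keep the compressed size identical wherever the same bytes are compressed.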



core/src/main/scala/kafka/message/MessageSet.scala
<https://reviews.apache.org/r/24214/#comment135491>

    Can do without this addition.



core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala
<https://reviews.apache.org/r/24214/#comment135494>

    Minor improvement here to avoid the extra hashmap.



core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala
<https://reviews.apache.org/r/24214/#comment135495>

    Can use Stream.cons for convenience.



core/src/test/scala/unit/kafka/log/LogTest.scala
<https://reviews.apache.org/r/24214/#comment135496>

    A few more minor edits to test appending keyed compressed messages.


- Joel Koshy




Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated May 18, 2015, 5:29 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description (updated)
-------

Addressing Joel's comments


Diffs (updated)
-----

  core/src/main/scala/kafka/log/LogCleaner.scala abea8b251895a5cc0788c6e25b112a2935a3f631 
  core/src/main/scala/kafka/message/ByteBufferMessageSet.scala 9dfe914991aaf82162e5e300c587c794555d5fd0 
  core/src/main/scala/kafka/message/MessageSet.scala 28b56e68cfdbbf107dd7cbd248ffa8fa6bbcd13f 
  core/src/test/scala/kafka/tools/TestLogCleaning.scala 844589427cb9337acd89a5239a98b811ee58118e 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 3b5aa9dc3b7ac5893c1d281ae1326be0e9ed8aad 
  core/src/test/scala/unit/kafka/log/LogTest.scala 76d3bfd378f32fd2b216b3ebdec86e2070491924 

Diff: https://reviews.apache.org/r/24214/diff/


Testing
-------

/*TestLogCleaning stress test output for compressed messages*/

Producing 100000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-6014466306002699464.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-177538909590644701.txt
100000 rows of data produced, 13165 rows of data consumed (86.8% reduction).
De-duplicating and validating output files...
Validated 9005 values, 0 mismatches.

Producing 1000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-3298578695475992991.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-7192293977610206930.txt
1000000 rows of data produced, 119926 rows of data consumed (88.0% reduction).
De-duplicating and validating output files...
Validated 89947 values, 0 mismatches.

Producing 10000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-3336255463347572934.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-9149188270705707725.txt
10000000 rows of data produced, 1645281 rows of data consumed (83.5% reduction).
De-duplicating and validating output files...
Validated 899853 values, 0 mismatches.


/*TestLogCleaning stress test output for non-compressed messages*/

Producing 100000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-5174543709786189363.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5143455017777144701.txt
100000 rows of data produced, 22775 rows of data consumed (77.2% reduction).
De-duplicating and validating output files...
Validated 17874 values, 0 mismatches.

Producing 1000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-7814446915546169271.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5172557663160447626.txt
1000000 rows of data produced, 129230 rows of data consumed (87.1% reduction).
De-duplicating and validating output files...
Validated 89947 values, 0 mismatches.

Producing 10000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-6092986571905399164.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-63626021421841220.txt
10000000 rows of data produced, 1136608 rows of data consumed (88.6% reduction).
De-duplicating and validating output files...
Validated 899853 values, 0 mismatches.
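
For readers checking the figures above, a small sketch of the arithmetic, assuming the reported reduction is 1 - consumed/produced expressed as a percentage with one decimal place. That formula is an assumption (the TestLogCleaning source is not quoted here), though it reproduces the numbers in the output; `ReductionCheck` is a hypothetical name.

```scala
object ReductionCheck {
  // Percentage reduction, formatted to one decimal place as in the log lines.
  def reductionPercent(produced: Long, consumed: Long): String =
    "%.1f".formatLocal(java.util.Locale.ROOT, 100.0 * (1.0 - consumed.toDouble / produced))
}
```

For the first compressed run, `ReductionCheck.reductionPercent(100000L, 13165L)` yields `86.8`, matching the logged value.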


Thanks,

Manikumar Reddy O


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated Jan. 17, 2015, 6:53 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description
-------

Updating the rebased code


Diffs
-----

  core/src/main/scala/kafka/log/LogCleaner.scala f8e7cd5fabce78c248a9027c4bb374a792508675 
  core/src/main/scala/kafka/tools/TestLogCleaning.scala af496f7c547a5ac7a4096a6af325dad0d8feec6f 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 07acd460b1259e0a3f4069b8b8dcd8123ef5810e 

Diff: https://reviews.apache.org/r/24214/diff/


Testing (updated)
-------

/*TestLogCleaning stress test output for compressed messages*/

Producing 100000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-6014466306002699464.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-177538909590644701.txt
100000 rows of data produced, 13165 rows of data consumed (86.8% reduction).
De-duplicating and validating output files...
Validated 9005 values, 0 mismatches.

Producing 1000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-3298578695475992991.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-7192293977610206930.txt
1000000 rows of data produced, 119926 rows of data consumed (88.0% reduction).
De-duplicating and validating output files...
Validated 89947 values, 0 mismatches.

Producing 10000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-3336255463347572934.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-9149188270705707725.txt
10000000 rows of data produced, 1645281 rows of data consumed (83.5% reduction).
De-duplicating and validating output files...
Validated 899853 values, 0 mismatches.


/*TestLogCleaning stress test output for non-compressed messages*/

Producing 100000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-5174543709786189363.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5143455017777144701.txt
100000 rows of data produced, 22775 rows of data consumed (77.2% reduction).
De-duplicating and validating output files...
Validated 17874 values, 0 mismatches.

Producing 1000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-7814446915546169271.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-5172557663160447626.txt
1000000 rows of data produced, 129230 rows of data consumed (87.1% reduction).
De-duplicating and validating output files...
Validated 89947 values, 0 mismatches.

Producing 10000000 messages...
Logging produce requests to /tmp/kafka-log-cleaner-produced-6092986571905399164.txt
Sleeping for 120 seconds...
Consuming messages...
Logging consumed messages to /tmp/kafka-log-cleaner-consumed-63626021421841220.txt
10000000 rows of data produced, 1136608 rows of data consumed (88.6% reduction).
De-duplicating and validating output files...
Validated 899853 values, 0 mismatches.


Thanks,

Manikumar Reddy O


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated Jan. 17, 2015, 6:51 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description (updated)
-------

Updating the rebased code


Diffs (updated)
-----

  core/src/main/scala/kafka/log/LogCleaner.scala f8e7cd5fabce78c248a9027c4bb374a792508675 
  core/src/main/scala/kafka/tools/TestLogCleaning.scala af496f7c547a5ac7a4096a6af325dad0d8feec6f 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 07acd460b1259e0a3f4069b8b8dcd8123ef5810e 

Diff: https://reviews.apache.org/r/24214/diff/


Testing (updated)
-------

/safe/KAFKA/docs/TestLogCleaning.txt


Thanks,

Manikumar Reddy O


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated Oct. 3, 2014, 1:50 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description
-------

Fixed a couple of bugs and updated the stress test details


Diffs (updated)
-----

  core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff 
  core/src/main/scala/kafka/tools/TestLogCleaning.scala 1d4ea93f2ba8d4d4d47a307cd47f54a15d3d30dd 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 5bfa764638e92f217d0ff7108ec8f53193c22978 

Diff: https://reviews.apache.org/r/24214/diff/


Testing
-------


Thanks,

Manikumar Reddy O


Re: Review Request 24214: Patch for KAFKA-1374

Posted by Manikumar Reddy O <ma...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24214/
-----------------------------------------------------------

(Updated Oct. 3, 2014, 1:22 p.m.)


Review request for kafka.


Bugs: KAFKA-1374
    https://issues.apache.org/jira/browse/KAFKA-1374


Repository: kafka


Description (updated)
-------

Fixed a couple of bugs and updated the stress test details


Diffs (updated)
-----

  core/src/main/scala/kafka/log/LogCleaner.scala c20de4ad4734c0bd83c5954fdb29464a27b91dff 
  core/src/main/scala/kafka/tools/TestLogCleaning.scala 1d4ea93f2ba8d4d4d47a307cd47f54a15d3d30dd 
  core/src/test/scala/unit/kafka/log/LogCleanerIntegrationTest.scala 5bfa764638e92f217d0ff7108ec8f53193c22978 

Diff: https://reviews.apache.org/r/24214/diff/


Testing
-------


Thanks,

Manikumar Reddy O