Posted to dev@kafka.apache.org by "Christopher Beard (BLOOMBERG/ 919 3RD A)" <cb...@bloomberg.net> on 2020/08/17 20:23:51 UTC

[DISCUSS] KIP-640 Add log compression analysis tool

Hi everyone,

I would like to start a discussion on KIP-640:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-640%3A+Add+log+compression+analysis+tool

This KIP outlines a new CLI tool which helps compare how the various compression types supported by Kafka reduce the size of a log (and therefore more broadly, of a topic).
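
To illustrate the kind of comparison described above, here is a minimal sketch and not the tool from the PR. It assumes a batch is just a byte string and uses only Python's standard library, so `gzip` is the only codec that overlaps with Kafka's supported types; `bz2` and `lzma` are stand-ins for snappy, lz4 and zstd, which would need third-party packages. The `CODECS` table and `compare_codecs` function are hypothetical names.

```python
import bz2
import gzip
import lzma
from typing import Callable, Dict

# Hypothetical codec table. Only gzip is a codec Kafka itself supports;
# bz2 and lzma are stdlib stand-ins for snappy, lz4 and zstd.
CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "gzip": gzip.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compare_codecs(payload: bytes) -> Dict[str, float]:
    """Return compressed_size / original_size for each codec."""
    if not payload:
        raise ValueError("payload must not be empty")
    return {name: len(fn(payload)) / len(payload) for name, fn in CODECS.items()}


if __name__ == "__main__":
    # A made-up batch of repetitive JSON-like records.
    batch = b'{"user": "alice", "action": "click", "page": "/home"}\n' * 1000
    for name, ratio in compare_codecs(batch).items():
        print(f"{name}: {ratio:.3f}")
```

A lower ratio means a larger size reduction for that codec on the given payload.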

I've put together a PR that might help serve as a starting point for comments and suggestions.
[WIP] PR: https://github.com/apache/kafka/pull/9193

Thanks,
Chris Beard

Re: [DISCUSS] KIP-640 Add log compression analysis tool

Posted by Christopher Beard <cb...@bloomberg.net>.
Hi Alex, thanks for the question!

In the simplest sense, the tool doesn't know anything about the messages in the log or any particular batch. The tool would compress the encrypted data to measure the resulting size, but the results would likely show no reduction in data size. Effectively, the tool would just spin a bunch of CPU cycles and produce no interesting results.
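
A rough way to see this effect, as a sketch under assumptions: random bytes stand in for ciphertext (KIP-317's actual format is not described in this thread), and gzip from Python's standard library stands in for whichever codec the tool would try. The helper name `gzip_ratio` is made up for illustration.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (values near 1.0 mean no gain)."""
    return len(gzip.compress(data)) / len(data)


# Random bytes stand in for ciphertext here; this is an assumption, since
# KIP-317's actual on-disk format is not specified in this thread.
ciphertext_like = os.urandom(64 * 1024)
plaintext_like = b"2020-08-24 INFO request served in 12 ms\n" * 1600

print(f"plaintext-like:  {gzip_ratio(plaintext_like):.3f}")
print(f"ciphertext-like: {gzip_ratio(ciphertext_like):.3f}")
```

The repetitive plaintext should shrink to a small fraction of its size, while the random input should stay at roughly its original size or grow slightly because of the gzip framing.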

It looks like concerns around compression were raised in the KIP-317 discussion, including the possibility of disabling compression when encryption is used because of security concerns (which I think are quite valid). My general take in the context of this KIP is that this tool is relatively simple in nature and could be extended if needed. If KIP-317 were to change the semantics of how compression is applied to encrypted messages, or whether compression is allowed at all, this tool can match those semantics, whatever they may be.

Chris

On 2020/08/24 21:49:29, Alex Wang <al...@linkedin.com.INVALID> wrote: 
> Hi, how will this work with encrypted data in logs if/when KIP-317 gets merged? Encrypted data will be hard to compress, so the analyzer tool might need to acquire the decryption key from somewhere to measure the compression stats.

Re: [DISCUSS] KIP-640 Add log compression analysis tool

Posted by Alex Wang <al...@linkedin.com.INVALID>.
Hi, how will this work with encrypted data in logs if/when KIP-317 gets merged? Encrypted data will be hard to compress, so the analyzer tool might need to acquire the decryption key from somewhere to measure the compression stats.

Re: [DISCUSS] KIP-640 Add log compression analysis tool

Posted by James Cheng <wu...@gmail.com>.
Chris,

This (understandably) requires access to the log segment files on disk. Managed Kafka services are becoming more popular (Confluent Cloud, Amazon MSK) and they do not expose the log segment files on disk. It’d be great to have equivalent functionality that works on managed services.

But, that would be a completely different implementation, so maybe it’s not something you want to handle right now.

-James

> On Nov 27, 2020, at 10:14 AM, Christopher Beard <cb...@bloomberg.net> wrote:
> 
> Bump. I'd like to gather more feedback on this!
> 
> Chris

Re: [DISCUSS] KIP-640 Add log compression analysis tool

Posted by Christopher Beard <cb...@bloomberg.net>.
Bump. I'd like to gather more feedback on this!

Chris
