Posted to dev@kafka.apache.org by Dana Powers <da...@gmail.com> on 2016/04/25 18:30:28 UTC

KIP-57: Interoperable LZ4 Framing

Hi all,

I've written up a new KIP based on KAFKA-3160 / fixing LZ4 framing.
The write-up is here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing

Please take a look if you are using LZ4 compression or are interested
in framing specs.
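
For anyone not already familiar with the underlying issue, here is a rough,
illustrative sketch of the header-checksum (HC) difference at stake, assuming
the KAFKA-3160 description (the old Kafka framing fed the 4-byte magic number
into the HC calculation, while the LZ4 frame spec hashes only the frame
descriptor bytes). The class and method names here are hypothetical; only the
XXHash32 API from lz4-java is real:

    import net.jpountz.xxhash.XXHash32;
    import net.jpountz.xxhash.XXHashFactory;

    // Illustrative sketch only -- not the Kafka implementation.
    // Assumed header layout: [0..3] magic 0x184D2204, [4] FLG, [5] BD, [6] HC.
    public class Lz4FramingSketch {
        private static final XXHash32 XXHASH = XXHashFactory.fastestInstance().hash32();

        // Spec-compliant HC: second byte of xxh32 over the frame descriptor only (FLG + BD).
        static byte specCompliantHC(byte[] header) {
            return (byte) ((XXHASH.hash(header, 4, 2, 0) >>> 8) & 0xFF);
        }

        // Legacy (incorrect) HC as described in KAFKA-3160: the hash also covered the magic bytes.
        static byte legacyHC(byte[] header) {
            return (byte) ((XXHASH.hash(header, 0, 6, 0) >>> 8) & 0xFF);
        }
    }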

One question: does anyone out there fall into all of these categories:
(1) deploys from trunk, (2) has upgraded to the v1 message format, and
(3) uses LZ4-compressed messages?


Feedback welcome,

-Dana

Re: KIP-57: Interoperable LZ4 Framing

Posted by Ismael Juma <is...@juma.me.uk>.
Thanks for submitting the KIP, Dana.

I think it would make a lot of sense to include this change as part of the
bump to message format 1 (although it's a bit tight given the current
release plan).

Ismael

On Mon, Apr 25, 2016 at 9:30 AM, Dana Powers <da...@gmail.com> wrote:

> Hi all,
>
> I've written up a new KIP based on KAFKA-3160 / fixing LZ4 framing.
> The write-up is here:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
>
> Please take a look if you are using LZ4 compression or are interested
> in framing specs.
>
> One question: does anyone out there fall into this category: (1)
> deploy from trunk, (2) upgraded to v1 message format, (3) using
> LZ4-compressed messages ?
>
>
> Feedback welcome,
>
> -Dana
>

Re: KIP-57: Interoperable LZ4 Framing

Posted by Dana Powers <da...@gmail.com>.
OK -- I've removed the "Public Interfaces" discussion. It should be up to
date with the PR review comments now.

-Dana

On Fri, May 6, 2016 at 2:15 PM, Ismael Juma <is...@juma.me.uk> wrote:
> One more suggestion Dana, I would remove the "Public interfaces" section as
> those classes are not actually public (only the classes with Javadoc are
> public: https://kafka.apache.org/090/javadoc/index.html) and the
> information in the KIP is a bit stale when compared to the PR.
>
> Ismael
>
> On Fri, May 6, 2016 at 10:12 PM, Ismael Juma <is...@juma.me.uk> wrote:
>
>> On Sun, May 1, 2016 at 3:57 AM, Dana Powers <da...@gmail.com> wrote:
>>>
>>> > 2. We're completely disabling checksumming of the compressed payload on
>>> > consumption. Normally you'd want to validate each level of framing for
>>> > correct end-to-end validation. You could still do this (albeit more
>>> weakly)
>>> > by validating the checksum is one of the two potentially valid values
>>> > (correct checksum or old, incorrect checksum). This obviously has
>>> > computational cost. Are we sure the tradeoff we're going with makes
>>> sense?
>>>
>>> Yes, to be honest, not validating on consumption is mostly because I just
>>> haven't dug into the bowels of the java client compressor / memory records
>>> call chains. It seems non-trivial to switch validation based on the
>>> message
>>> version in the consumer code. I did not opt for the weak validation that
>>> you
>>> suggest because I think the broker should always validate v1 messages on
>>> produce, and that piece shares the same code path within the lz4 java
>>> classes.
>>> I suppose we could make the default to raise an error on checksums that
>>> fail
>>> weak validation, and then switch to strong validation in the broker.
>>> Alternately,
>>> if you have suggestions on how to wire up the consumer code to switch lz4
>>> behavior based on message version, I would be happy to run with that.
>>
>>
>> The lack of checksum validation on consumption was a concern I had as well
>> (and Jun too, when I checked with him) so I helped Dana with this and the
>> PR now includes consumer validation for V1 messages. Dana, can you please
>> update the KIP?
>>
>> Ismael
>>

Re: KIP-57: Interoperable LZ4 Framing

Posted by Ismael Juma <is...@juma.me.uk>.
One more suggestion, Dana: I would remove the "Public interfaces" section,
as those classes are not actually public (only the classes with Javadoc are
public: https://kafka.apache.org/090/javadoc/index.html) and the
information in the KIP is a bit stale when compared to the PR.

Ismael

On Fri, May 6, 2016 at 10:12 PM, Ismael Juma <is...@juma.me.uk> wrote:

> On Sun, May 1, 2016 at 3:57 AM, Dana Powers <da...@gmail.com> wrote:
>>
>> > 2. We're completely disabling checksumming of the compressed payload on
>> > consumption. Normally you'd want to validate each level of framing for
>> > correct end-to-end validation. You could still do this (albeit more
>> weakly)
>> > by validating the checksum is one of the two potentially valid values
>> > (correct checksum or old, incorrect checksum). This obviously has
>> > computational cost. Are we sure the tradeoff we're going with makes
>> sense?
>>
>> Yes, to be honest, not validating on consumption is mostly because I just
>> haven't dug into the bowels of the java client compressor / memory records
>> call chains. It seems non-trivial to switch validation based on the
>> message
>> version in the consumer code. I did not opt for the weak validation that
>> you
>> suggest because I think the broker should always validate v1 messages on
>> produce, and that piece shares the same code path within the lz4 java
>> classes.
>> I suppose we could make the default to raise an error on checksums that
>> fail
>> weak validation, and then switch to strong validation in the broker.
>> Alternately,
>> if you have suggestions on how to wire up the consumer code to switch lz4
>> behavior based on message version, I would be happy to run with that.
>
>
> The lack of checksum validation on consumption was a concern I had as well
> (and Jun too, when I checked with him) so I helped Dana with this and the
> PR now includes consumer validation for V1 messages. Dana, can you please
> update the KIP?
>
> Ismael
>

Re: KIP-57: Interoperable LZ4 Framing

Posted by Gwen Shapira <gw...@confluent.io>.
Agree.

On Fri, May 6, 2016 at 3:14 PM, Ismael Juma <is...@juma.me.uk> wrote:
> On Fri, May 6, 2016 at 11:07 PM, Dana Powers <da...@gmail.com> wrote:
>
>> Updated:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
>>
>> Should I restart the vote?
>>
>
> I think the update is small enough that we don't need to restart the vote.
>
> Ismael

Re: KIP-57: Interoperable LZ4 Framing

Posted by Ismael Juma <is...@juma.me.uk>.
On Fri, May 6, 2016 at 11:07 PM, Dana Powers <da...@gmail.com> wrote:

> Updated:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
>
> Should I restart the vote?
>

I think the update is small enough that we don't need to restart the vote.

Ismael

Re: KIP-57: Interoperable LZ4 Framing

Posted by Dana Powers <da...@gmail.com>.
Updated: https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing

Should I restart the vote?

-Dana

On Fri, May 6, 2016 at 2:12 PM, Ismael Juma <is...@juma.me.uk> wrote:
> On Sun, May 1, 2016 at 3:57 AM, Dana Powers <da...@gmail.com> wrote:
>>
>> > 2. We're completely disabling checksumming of the compressed payload on
>> > consumption. Normally you'd want to validate each level of framing for
>> > correct end-to-end validation. You could still do this (albeit more
>> weakly)
>> > by validating the checksum is one of the two potentially valid values
>> > (correct checksum or old, incorrect checksum). This obviously has
>> > computational cost. Are we sure the tradeoff we're going with makes
>> sense?
>>
>> Yes, to be honest, not validating on consumption is mostly because I just
>> haven't dug into the bowels of the java client compressor / memory records
>> call chains. It seems non-trivial to switch validation based on the message
>> version in the consumer code. I did not opt for the weak validation that
>> you
>> suggest because I think the broker should always validate v1 messages on
>> produce, and that piece shares the same code path within the lz4 java
>> classes.
>> I suppose we could make the default to raise an error on checksums that
>> fail
>> weak validation, and then switch to strong validation in the broker.
>> Alternately,
>> if you have suggestions on how to wire up the consumer code to switch lz4
>> behavior based on message version, I would be happy to run with that.
>
>
> The lack of checksum validation on consumption was a concern I had as well
> (and Jun too, when I checked with him) so I helped Dana with this and the
> PR now includes consumer validation for V1 messages. Dana, can you please
> update the KIP?
>
> Ismael

Re: KIP-57: Interoperable LZ4 Framing

Posted by Ismael Juma <is...@juma.me.uk>.
On Sun, May 1, 2016 at 3:57 AM, Dana Powers <da...@gmail.com> wrote:
>
> > 2. We're completely disabling checksumming of the compressed payload on
> > consumption. Normally you'd want to validate each level of framing for
> > correct end-to-end validation. You could still do this (albeit more
> weakly)
> > by validating the checksum is one of the two potentially valid values
> > (correct checksum or old, incorrect checksum). This obviously has
> > computational cost. Are we sure the tradeoff we're going with makes
> sense?
>
> Yes, to be honest, not validating on consumption is mostly because I just
> haven't dug into the bowels of the java client compressor / memory records
> call chains. It seems non-trivial to switch validation based on the message
> version in the consumer code. I did not opt for the weak validation that
> you
> suggest because I think the broker should always validate v1 messages on
> produce, and that piece shares the same code path within the lz4 java
> classes.
> I suppose we could make the default to raise an error on checksums that
> fail
> weak validation, and then switch to strong validation in the broker.
> Alternately,
> if you have suggestions on how to wire up the consumer code to switch lz4
> behavior based on message version, I would be happy to run with that.


The lack of checksum validation on consumption was a concern I had as well
(and one Jun shared when I checked with him), so I helped Dana with this,
and the PR now includes consumer validation for v1 messages. Dana, can you
please update the KIP?

Ismael

Re: KIP-57: Interoperable LZ4 Framing

Posted by Dana Powers <da...@gmail.com>.
On Fri, Apr 29, 2016 at 6:29 PM, Ewen Cheslack-Postava
<ew...@confluent.io> wrote:
> Two questions:
>
> 1. My understanding based on KIP-35 is that this won't be a problem for
> clients that want to support older broker versions since they will use v0
> produce requests with broken checksum to send to those, and any broker
> advertising support for v1 produce requests will also support valid
> checksums? In other words, the KIP is structured in terms of Java client
> versions, but I'd like to make sure we have the compatibility path for
> non-Java clients cleanly mapped out. (And think we do, especially given
> that Dana is proposing, but still would like an ack on that.)

Yes, I'm treating these as the same:

broker/client <= 0.9
messages == v0
Fetch api version <= 1
Produce api version <= 1

broker/client >= 0.10
messages >= v1
Fetch api version >= 2
Produce api version >= 2

I don't think there will be any problem for clients that want to
support both encodings.
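
To make that mapping concrete, here is a minimal sketch of how a client could
key off the broker's advertised Produce api version (e.g. from a KIP-35
ApiVersions response). The class and helper names are hypothetical, not from
any client codebase:

    // Hypothetical sketch keyed off the mapping above.
    public class Lz4FramingSelection {
        // maxProduceApiVersion as advertised by the broker (KIP-35 ApiVersions).
        static boolean useSpecCompliantLz4Framing(int maxProduceApiVersion) {
            // Produce v2+ implies message format v1+, which carries the corrected framing.
            return maxProduceApiVersion >= 2;
        }

        // Magic byte 0 for the old message format, 1 for the new one.
        static byte messageFormatMagic(int maxProduceApiVersion) {
            return (byte) (maxProduceApiVersion >= 2 ? 1 : 0);
        }
    }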

> 2. We're completely disabling checksumming of the compressed payload on
> consumption. Normally you'd want to validate each level of framing for
> correct end-to-end validation. You could still do this (albeit more weakly)
> by validating the checksum is one of the two potentially valid values
> (correct checksum or old, incorrect checksum). This obviously has
> computational cost. Are we sure the tradeoff we're going with makes sense?

Yes, to be honest, not validating on consumption is mostly because I just
haven't dug into the bowels of the java client compressor / memory records
call chains. It seems non-trivial to switch validation based on the message
version in the consumer code. I did not opt for the weak validation that you
suggest because I think the broker should always validate v1 messages on
produce, and that piece shares the same code path within the lz4 java classes.
I suppose we could make the default be to raise an error on checksums that
fail weak validation, and then switch to strong validation in the broker.
Alternatively, if you have suggestions on how to wire up the consumer code
to switch lz4 behavior based on message version, I would be happy to run
with that.
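
For concreteness, a rough sketch of the "weak validation" idea under
discussion -- accept either the spec-compliant HC or the legacy, incorrect
one -- using the XXHash32 API from lz4-java. The class is hypothetical and
is not the code in the PR:

    import java.io.IOException;
    import net.jpountz.xxhash.XXHash32;
    import net.jpountz.xxhash.XXHashFactory;

    // Hypothetical sketch of "weak" header-checksum validation; not the PR code.
    // Assumed header layout: [0..3] magic, [4] FLG, [5] BD; observedHC is the HC byte read.
    public class WeakHcValidationSketch {
        private static final XXHash32 XXHASH = XXHashFactory.fastestInstance().hash32();

        static void validate(byte[] header, byte observedHC) throws IOException {
            byte specHC = (byte) ((XXHASH.hash(header, 4, 2, 0) >>> 8) & 0xFF);   // descriptor only
            byte legacyHC = (byte) ((XXHASH.hash(header, 0, 6, 0) >>> 8) & 0xFF); // magic + descriptor
            if (observedHC != specHC && observedHC != legacyHC) {
                throw new IOException("corrupt LZ4 frame: header checksum mismatch");
            }
        }
    }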

-Dana

Re: KIP-57: Interoperable LZ4 Framing

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Two questions:

1. My understanding based on KIP-35 is that this won't be a problem for
clients that want to support older broker versions since they will use v0
produce requests with broken checksum to send to those, and any broker
advertising support for v1 produce requests will also support valid
checksums? In other words, the KIP is structured in terms of Java client
versions, but I'd like to make sure we have the compatibility path for
non-Java clients cleanly mapped out. (And I think we do, especially given
that Dana is the one proposing this, but I'd still like an ack on that.)
2. We're completely disabling checksumming of the compressed payload on
consumption. Normally you'd want to validate each level of framing for
correct end-to-end validation. You could still do this (albeit more weakly)
by validating the checksum is one of the two potentially valid values
(correct checksum or old, incorrect checksum). This obviously has
computational cost. Are we sure the tradeoff we're going with makes sense?

-Ewen




On Mon, Apr 25, 2016 at 2:01 PM, Gwen Shapira <gw...@confluent.io> wrote:

> Hi Dana,
>
> Thank you for proposing this fix. It looks great to me (and LinkedIn,
> who are running trunk confirmed that they did not use LZ4 yet).
> Since for 0.10.0 time is of essence, do you mind starting a voting
> thread in parallel?
>
> Gwen
>
> On Mon, Apr 25, 2016 at 9:30 AM, Dana Powers <da...@gmail.com>
> wrote:
> > Hi all,
> >
> > I've written up a new KIP based on KAFKA-3160 / fixing LZ4 framing.
> > The write-up is here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
> >
> > Please take a look if you are using LZ4 compression or are interested
> > in framing specs.
> >
> > One question: does anyone out there fall into this category: (1)
> > deploy from trunk, (2) upgraded to v1 message format, (3) using
> > LZ4-compressed messages ?
> >
> >
> > Feedback welcome,
> >
> > -Dana
>



-- 
Thanks,
Ewen

Re: KIP-57: Interoperable LZ4 Framing

Posted by Gwen Shapira <gw...@confluent.io>.
Hi Dana,

Thank you for proposing this fix. It looks great to me (and LinkedIn, who
are running trunk, confirmed that they are not using LZ4 yet).
Since time is of the essence for 0.10.0, do you mind starting a voting
thread in parallel?

Gwen

On Mon, Apr 25, 2016 at 9:30 AM, Dana Powers <da...@gmail.com> wrote:
> Hi all,
>
> I've written up a new KIP based on KAFKA-3160 / fixing LZ4 framing.
> The write-up is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing
>
> Please take a look if you are using LZ4 compression or are interested
> in framing specs.
>
> One question: does anyone out there fall into this category: (1)
> deploy from trunk, (2) upgraded to v1 message format, (3) using
> LZ4-compressed messages ?
>
>
> Feedback welcome,
>
> -Dana