Posted to dev@kafka.apache.org by Ewen Cheslack-Postava <ew...@confluent.io> on 2016/05/03 19:38:28 UTC

Re: [VOTE] KIP-57: Interoperable LZ4 Framing

+1

One caveat on the vote though -- I don't know the details of LZ4 (format,
libraries, etc) well enough to have a handle on whether the changes under
"KafkaLZ4* code" are going to be sufficient to get broad support from other
LZ4 libraries. Are we going to have multiple implementations we can test a
PR with before we merge? Or would any subsequent fixes also have to come
with bumping the produce request version to indicate varying feature
support if we had to further change that code? What I want to avoid is
clients in some languages having to work with broker version numbers
instead of protocol version numbers due to further incompatibilities we
might find.

-Ewen

On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <da...@gmail.com> wrote:

> Sure thing. Yes, the substantive change is fixing the HC checksum.
>
> But to further improve interoperability, the kafka LZ4 class would no
> longer reject messages that have these optional header flags set. The
> flags might get set if the client/user chooses to use a non-java lz4
> compression library that includes them. In practice, naive support for
> the flags just means reading a few extra bytes in the header and/or
> footer of the payload. The KIP does not intend to use or validate this
> extra data.
>
> ContentSize is described as: "This field has no impact on decoding, it
> just informs the decoder how much data the frame holds (for example,
> to display it during decoding process, or for verification purpose).
> It can be safely skipped by a conformant decoder." We skip it.
>
> ContentChecksum is "Content Checksum validates the result, that all
> blocks were fully transmitted in the correct order and without error,
> and also that the encoding/decoding process itself generated no
> distortion." We skip it.
>
> -Dana
>
>
> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <ju...@confluent.io> wrote:
> > Hi, Dana,
> >
> > Could you explain the following from the KIP a bit more? The KIP is
> > intended to just fix the HC checksum, but the following seems to suggest
> > there are other format changes?
> >
> > KafkaLZ4* code:
> >
> >    - add naive support for optional header flags (ContentSize,
> >    ContentChecksum) to enable interoperability with off-the-shelf lz4
> libraries
> >    - the only flag left unsupported is dependent-block compression, which
> >    our implementation does not currently support.
> >
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <da...@gmail.com>
> wrote:
> >
> >> Hi all,
> >>
> >> Initiating a vote thread because the KIP-57 proposal is specific to
> >> the 0.10 release.
> >>
> >> KIP-57 can be accessed here:
> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing>.
> >>
> >> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
> >> and working github PR at https://github.com/apache/kafka/pull/1212
> >>
> >> The vote will run for 72 hours.
> >>
> >> +1 (non-binding)
> >>
>



-- 
Thanks,
Ewen

Re: [VOTE] KIP-57: Interoperable LZ4 Framing

Posted by Magnus Edenhill <ma...@edenhill.se>.
Good work on this Dana.

I'll test it with librdkafka (which uses the official liblz4) and report
back.


2016-05-03 20:02 GMT+02:00 Dana Powers <da...@gmail.com>:

> Yes, great point. The intent of adding "naive" support for the
> remaining LZ4 header flags (contentsize and contentchecksum) is to
> avoid rejecting any lz4 messages framed according to the interoperable
> spec (v1.5.1). If we can accept all such messages (which this KIP
> does), we should never have to make similar 'breaking' changes in the
> future.
>
> That said, there is one feature that is left unsupported:
> block-dependent (de)compression, aka the lz4 "streaming" api. If a
> library *only* supported lz4 streaming, it would be incompatible with
> this kafka implementation. I haven't done a survey of libraries, but I
> would be shocked to find an lz4 implementation that only supported
> streaming -- or even that used streaming as the default. This is
> because it is an optimization on the default block api (this is what
> kafka has used and continues to use -- no change here). There's more
> detail on this here:
> https://github.com/Cyan4973/lz4/wiki/LZ4-Streaming-API-Basics
>
> I should also note that we do not support the old dictionary-id flag,
> which was available in earlier lz4f spec versions but has since been
> removed. I can't find any evidence of it ever being used, which is
> probably why it was dropped from the spec.
>
> Re testing, kafka-python uses https://github.com/darkdragn/lz4tools
> which implements 1.5.0 of the lz4f spec and is listed as one of the
> interoperable libraries. I'm not sure which library Magnus uses in
> librdkafka, but it would be great to get independent verification from
> him that this patch works there as well. You'll note that there is
> currently no interoperable java library :( There is some desire to
> merge the kafka classes back into jpountz/lz4-java, which I think
> would be reasonable after these compatibility fixes. See
> https://github.com/jpountz/lz4-java/issues/21 .
>
> -Dana
>
> On Tue, May 3, 2016 at 10:38 AM, Ewen Cheslack-Postava
> <ew...@confluent.io> wrote:
> > +1
> >
> > One caveat on the vote though -- I don't know the details of LZ4 (format,
> > libraries, etc) well enough to have a handle on whether the changes under
> > "KafkaLZ4* code" are going to be sufficient to get broad support from
> other
> > LZ4 libraries. Are we going to have multiple implementations we can test
> a
> > PR with before we merge? Or would any subsequent fixes also have to come
> > with bumping the produce request version to indicate varying feature
> > support if we had to further change that code? What I want to avoid is
> > clients in some languages having to work with broker version numbers
> > instead of protocol version numbers due to further incompatibilities we
> > might find.
> >
> > -Ewen
> >
> > On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <da...@gmail.com>
> wrote:
> >
> >> Sure thing. Yes, the substantive change is fixing the HC checksum.
> >>
> >> But to further improve interoperability, the kafka LZ4 class would no
> >> longer reject messages that have these optional header flags set. The
> >> flags might get set if the client/user chooses to use a non-java lz4
> >> compression library that includes them. In practice, naive support for
> >> the flags just means reading a few extra bytes in the header and/or
> >> footer of the payload. The KIP does not intend to use or validate this
> >> extra data.
> >>
> >> ContentSize is described as: "This field has no impact on decoding, it
> >> just informs the decoder how much data the frame holds (for example,
> >> to display it during decoding process, or for verification purpose).
> >> It can be safely skipped by a conformant decoder." We skip it.
> >>
> >> ContentChecksum is "Content Checksum validates the result, that all
> >> blocks were fully transmitted in the correct order and without error,
> >> and also that the encoding/decoding process itself generated no
> >> distortion." We skip it.
> >>
> >> -Dana
> >>
> >>
> >> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <ju...@confluent.io> wrote:
> >> > Hi, Dana,
> >> >
> >> > Could you explain the following from the KIP a bit more? The KIP is
> >> > intended to just fix the HC checksum, but the following seems to
> suggest
> >> > there are other format changes?
> >> >
> >> > KafkaLZ4* code:
> >> >
> >> >    - add naive support for optional header flags (ContentSize,
> >> >    ContentChecksum) to enable interoperability with off-the-shelf lz4
> >> libraries
> >> >    - the only flag left unsupported is dependent-block compression,
> which
> >> >    our implementation does not currently support.
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <da...@gmail.com>
> >> wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> Initiating a vote thread because the KIP-57 proposal is specific to
> >> >> the 0.10 release.
> >> >>
> >> >> KIP-57 can be accessed here:
> >> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing>.
> >> >>
> >> >> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
> >> >> and working github PR at https://github.com/apache/kafka/pull/1212
> >> >>
> >> >> The vote will run for 72 hours.
> >> >>
> >> >> +1 (non-binding)
> >> >>
> >>
> >
> >
> >
> > --
> > Thanks,
> > Ewen
>

Re: [VOTE] KIP-57: Interoperable LZ4 Framing

Posted by Dana Powers <da...@gmail.com>.
Yes, great point. The intent of adding "naive" support for the
remaining LZ4 header flags (contentsize and contentchecksum) is to
avoid rejecting any lz4 messages framed according to the interoperable
spec (v1.5.1). If we can accept all such messages (which this KIP
does), we should never have to make similar 'breaking' changes in the
future.
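To make "naive support" concrete, here is an illustrative Python sketch (not the actual Kafka patch) of what a decoder has to do: inspect the FLG byte of the frame descriptor (lz4 frame spec v1.5.1) and work out how many optional bytes to read past, without validating their contents.

```python
def lz4f_optional_lengths(flg: int):
    """Given the FLG byte of an lz4 frame descriptor (frame spec v1.5.1),
    return (extra_header_bytes, footer_bytes) a decoder must read past,
    without validating their contents, to accept the frame."""
    if flg >> 6 != 1:        # bits 7-6: version field, must be 01
        raise ValueError("unsupported lz4 frame version")
    if not flg & 0x20:       # bit 5: block independence
        raise ValueError("dependent-block frames not supported")
    extra_header = 8 if flg & 0x08 else 0  # bit 3: 8-byte ContentSize field
    footer = 4 if flg & 0x04 else 0        # bit 2: 4-byte ContentChecksum after EndMark
    return extra_header, footer
```

For example, a frame with ContentSize set (FLG 0x68) carries 8 extra descriptor bytes, and one with ContentChecksum set (FLG 0x64) carries a 4-byte footer; in both cases the bytes are simply skipped, exactly as described above.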

That said, there is one feature that is left unsupported:
block-dependent (de)compression, aka the lz4 "streaming" api. If a
library *only* supported lz4 streaming, it would be incompatible with
this kafka implementation. I haven't done a survey of libraries, but I
would be shocked to find an lz4 implementation that only supported
streaming -- or even that used streaming as the default. This is
because it is an optimization on the default block api (this is what
kafka has used and continues to use -- no change here). There's more
detail on this here:
https://github.com/Cyan4973/lz4/wiki/LZ4-Streaming-API-Basics
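For reference, the "HC checksum" that the substantive fix concerns is the one-byte header checksum, which the frame spec defines as the second byte of XXH32 over the descriptor fields (magic excluded), seed 0. A rough illustration follows; the pure-Python xxh32 is included only to keep the sketch self-contained and is not Kafka's code.

```python
import struct

def xxh32(data: bytes, seed: int = 0) -> int:
    """Minimal pure-Python XXH32 (32-bit xxHash), for illustration only."""
    P1, P2, P3, P4, P5 = 2654435761, 2246822519, 3266489917, 668265263, 374761393
    mask = 0xFFFFFFFF
    rotl = lambda x, r: ((x << r) | (x >> (32 - r))) & mask
    n, i = len(data), 0
    if n >= 16:
        v = [(seed + P1 + P2) & mask, (seed + P2) & mask, seed & mask, (seed - P1) & mask]
        while i <= n - 16:
            for j in range(4):  # four parallel 32-bit accumulators
                lane, = struct.unpack_from("<I", data, i)
                v[j] = (rotl((v[j] + lane * P2) & mask, 13) * P1) & mask
                i += 4
        acc = (rotl(v[0], 1) + rotl(v[1], 7) + rotl(v[2], 12) + rotl(v[3], 18)) & mask
    else:
        acc = (seed + P5) & mask
    acc = (acc + n) & mask
    while i <= n - 4:           # remaining 4-byte lanes
        lane, = struct.unpack_from("<I", data, i)
        acc = (rotl((acc + lane * P3) & mask, 17) * P4) & mask
        i += 4
    while i < n:                # remaining single bytes
        acc = (rotl((acc + data[i] * P5) & mask, 11) * P1) & mask
        i += 1
    acc ^= acc >> 15            # final avalanche
    acc = (acc * P2) & mask
    acc ^= acc >> 13
    acc = (acc * P3) & mask
    acc ^= acc >> 16
    return acc

def header_checksum(descriptor: bytes) -> int:
    """HC = second byte of XXH32(descriptor, seed=0), per the lz4 frame spec.
    The descriptor covers FLG through the optional ContentSize, magic excluded."""
    return (xxh32(descriptor) >> 8) & 0xFF
```

A decoder fixed per the spec would compute `header_checksum` over exactly these descriptor bytes and compare it against the HC byte in the frame.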

I should also note that we do not support the old dictionary-id flag,
which was available in earlier lz4f spec versions but has since been
removed. I can't find any evidence of it ever being used, which is
probably why it was dropped from the spec.

Re testing, kafka-python uses https://github.com/darkdragn/lz4tools
which implements 1.5.0 of the lz4f spec and is listed as one of the
interoperable libraries. I'm not sure which library Magnus uses in
librdkafka, but it would be great to get independent verification from
him that this patch works there as well. You'll note that there is
currently no interoperable java library :( There is some desire to
merge the kafka classes back into jpountz/lz4-java, which I think
would be reasonable after these compatibility fixes. See
https://github.com/jpountz/lz4-java/issues/21 .

-Dana

On Tue, May 3, 2016 at 10:38 AM, Ewen Cheslack-Postava
<ew...@confluent.io> wrote:
> +1
>
> One caveat on the vote though -- I don't know the details of LZ4 (format,
> libraries, etc) well enough to have a handle on whether the changes under
> "KafkaLZ4* code" are going to be sufficient to get broad support from other
> LZ4 libraries. Are we going to have multiple implementations we can test a
> PR with before we merge? Or would any subsequent fixes also have to come
> with bumping the produce request version to indicate varying feature
> support if we had to further change that code? What I want to avoid is
> clients in some languages having to work with broker version numbers
> instead of protocol version numbers due to further incompatibilities we
> might find.
>
> -Ewen
>
> On Thu, Apr 28, 2016 at 9:01 AM, Dana Powers <da...@gmail.com> wrote:
>
>> Sure thing. Yes, the substantive change is fixing the HC checksum.
>>
>> But to further improve interoperability, the kafka LZ4 class would no
>> longer reject messages that have these optional header flags set. The
>> flags might get set if the client/user chooses to use a non-java lz4
>> compression library that includes them. In practice, naive support for
>> the flags just means reading a few extra bytes in the header and/or
>> footer of the payload. The KIP does not intend to use or validate this
>> extra data.
>>
>> ContentSize is described as: "This field has no impact on decoding, it
>> just informs the decoder how much data the frame holds (for example,
>> to display it during decoding process, or for verification purpose).
>> It can be safely skipped by a conformant decoder." We skip it.
>>
>> ContentChecksum is "Content Checksum validates the result, that all
>> blocks were fully transmitted in the correct order and without error,
>> and also that the encoding/decoding process itself generated no
>> distortion." We skip it.
>>
>> -Dana
>>
>>
>> On Thu, Apr 28, 2016 at 7:43 AM, Jun Rao <ju...@confluent.io> wrote:
>> > Hi, Dana,
>> >
>> > Could you explain the following from the KIP a bit more? The KIP is
>> > intended to just fix the HC checksum, but the following seems to suggest
>> > there are other format changes?
>> >
>> > KafkaLZ4* code:
>> >
>> >    - add naive support for optional header flags (ContentSize,
>> >    ContentChecksum) to enable interoperability with off-the-shelf lz4
>> libraries
>> >    - the only flag left unsupported is dependent-block compression, which
>> >    our implementation does not currently support.
>> >
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Mon, Apr 25, 2016 at 2:26 PM, Dana Powers <da...@gmail.com>
>> wrote:
>> >
>> >> Hi all,
>> >>
>> >> Initiating a vote thread because the KIP-57 proposal is specific to
>> >> the 0.10 release.
>> >>
>> >> KIP-57 can be accessed here:
>> >> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-57+-+Interoperable+LZ4+Framing>.
>> >>
>> >> The related JIRA is https://issues.apache.org/jira/browse/KAFKA-3160
>> >> and working github PR at https://github.com/apache/kafka/pull/1212
>> >>
>> >> The vote will run for 72 hours.
>> >>
>> >> +1 (non-binding)
>> >>
>>
>
>
>
> --
> Thanks,
> Ewen