Posted to dev@kafka.apache.org by Dongjin Lee <do...@apache.org> on 2018/06/09 13:13:22 UTC

[DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Since there have been no responses for a week, I decided to restart the
discussion thread.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression

This KIP proposes introducing ZStandard compression into Apache Kafka. There is a
story behind why it is being posted again: it was originally posted to the dev
mailing list more than a year ago, but because it did not include a performance
report, it was postponed. Now that some people (including Ivan) have reported
excellent performance with the draft PR, this work has been reactivated.

The updated KIP document includes some expected problems and their candidate
alternatives. Please have a look when you are free, and give me feedback. All
kinds of participation are welcome.

Best,
Dongjin

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github: github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare: www.slideshare.net/dongjinleekr*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Hi Ismael,

Sure. Thanks.

- Dongjin

On Wed, Sep 12, 2018 at 11:56 PM Ismael Juma <is...@juma.me.uk> wrote:

> Dongjin, can you please start a vote?
>
> Ismael
>
> On Sun, Sep 9, 2018 at 11:15 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Hi Jason,
> >
> > You are right. Explicit statements are always better. I updated the
> > document following your suggestion.
> >
> > @Magnus
> >
> > Thanks for the inspection. It seems like a new error code is not a
> problem.
> >
> > Thanks,
> > Dongjin
> >
> > On Fri, Sep 7, 2018 at 3:23 AM Jason Gustafson <ja...@confluent.io>
> wrote:
> >
> > > Hi Dongjin,
> > >
> > > The KIP looks good to me. I'd suggest starting a vote. A couple minor
> > > points that might be worth calling out explicitly in the compatibility
> > > section:
> > >
> > > 1. Zstd will only be allowed for the bumped produce API. For older
> > > versions, we return UNSUPPORTED_COMPRESSION_TYPE regardless of the
> > message
> > > format.
> > > 2. Down-conversion of zstd-compressed records will not be supported.
> > > Instead we will return UNSUPPORTED_COMPRESSION_TYPE.
> > >
> > > Does that sound right?
> > >
> > > Thanks,
> > > Jason
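[Aside: the two compatibility points above amount to a version gate on the produce path. A minimal, hypothetical Java sketch of such a gate follows; the class name, constants, and the bumped version number are assumptions for illustration only, not the actual broker code (which lives in Scala).]

    // Hypothetical sketch: reject zstd-compressed batches on produce requests
    // older than the bumped version, regardless of message format.
    public final class ProduceVersionGate {
        static final byte ZSTD_ID = 4;                      // compression id assumed for the sketch
        static final short FIRST_ZSTD_PRODUCE_VERSION = 7;  // "bumped" produce version, assumed

        enum Result { NONE, UNSUPPORTED_COMPRESSION_TYPE }

        static Result validate(short produceVersion, byte batchCompressionId) {
            if (batchCompressionId == ZSTD_ID && produceVersion < FIRST_ZSTD_PRODUCE_VERSION)
                return Result.UNSUPPORTED_COMPRESSION_TYPE;  // old API version: reject
            return Result.NONE;                              // new enough, or not zstd
        }

        public static void main(String[] args) {
            System.out.println(validate((short) 5, ZSTD_ID)); // UNSUPPORTED_COMPRESSION_TYPE
            System.out.println(validate((short) 7, ZSTD_ID)); // NONE
        }
    }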
> > >
> > > On Thu, Sep 6, 2018 at 1:45 AM, Magnus Edenhill <ma...@edenhill.se>
> > > wrote:
> > >
> > > > > Ismael wrote:
> > > > > Jason, that's an interesting point regarding the Java client. Do we
> > > know
> > > > > what clients in other languages do in these cases?
> > > >
> > > > librdkafka (and its bindings) passes unknown/future errors through to the
> > > > application; the error code remains intact, while the error string is set to
> > > > something like "Err-123?", which isn't very helpful to the user but at least
> > > > preserves the original error code for further troubleshooting.
> > > > For the producer any unknown error returned in the ProduceResponse
> will
> > > be
> > > > considered a permanent delivery failure (no retries),
> > > > and for the consumer any unknown FetchResponse errors will propagate
> > > > directly to the application, trigger a fetch backoff, and then
> > > > continue fetching past that offset.
> > > >
> > > > So, from the client's perspective it is not really a problem if new
> > error
> > > > codes are added to older API versions.
> > > >
> > > > /Magnus
> > > >
> > > >
> > > > On Thu, Sep 6, 2018 at 09:45, Dongjin Lee <dongjin@apache.org> wrote:
> > > >
> > > > > I updated the KIP page
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > following the discussion here. Please take a look when you are free.
> > > > > If you have any opinion, don't hesitate to send me a message.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > I just updated the draft implementation[^1], rebasing it against the
> > > > > > latest trunk and implementing the error routine (i.e., error code 74 for
> > > > > > UnsupportedCompressionTypeException.) Since we decided to disallow all
> > > > > > fetch requests below version 2.1.0 for topics specifying ZStandard, I
> > > > > > added only the error logic.
> > > > > >
> > > > > > Please have a look when you are free.
> > > > > >
> > > > > > Thanks,
> > > > > > Dongjin
> > > > > >
> > > > > > [^1]: Please check the last commit here:
> > > > > > https://github.com/apache/kafka/pull/2267
> > > > > >
> > > > > > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > > >
> > > > > >> Jason,
> > > > > >>
> > > > > >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
> > > > > >>
> > > > > >> Best,
> > > > > >> Dongjin
> > > > > >>
> > > > > >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <
> > jason@confluent.io
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hey Dongjin,
> > > > > >>>
> > > > > >>> Yeah that's right. For what it's worth, librdkafka also appears
> > to
> > > > > handle
> > > > > >>> unexpected error codes. I expect that most client
> implementations
> > > > would
> > > > > >>> either pass through the raw type or convert to an enum using
> > > > something
> > > > > >>> like
> > > > > >>> what the java client does. Since we're expecting the client to
> > fail
> > > > > >>> anyway,
> > > > > >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE
> > > error
> > > > > >>> code.
> > > > > >>>
> > > > > >>> -Jason
> > > > > >>>
> > > > > >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>> > Jason and Ismael,
> > > > > >>> >
> > > > > >>> > It seems like the only thing we need to consider if we define a new
> > > > > >>> > error code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the
> > > > > >>> > implementation of the other language clients, right? At least, this
> > > > > >>> > strategy doesn't cause any problem for the Java client. Do I
> > > > > >>> > understand correctly?
> > > > > >>> >
> > > > > >>> > Thanks,
> > > > > >>> > Dongjin
> > > > > >>> >
> > > > > >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <
> > dongjin@apache.org>
> > > > > >>> wrote:
> > > > > >>> >
> > > > > >>> > > Jason,
> > > > > >>> > >
> > > > > >>> > > > I think we would only use this error code when we /know/
> > that
> > > > > zstd
> > > > > >>> was
> > > > > >>> > > in use and the client doesn't support it? This is true if
> > > either
> > > > 1)
> > > > > >>> the
> > > > > >>> > > message needs down-conversion and we encounter a zstd
> > > compressed
> > > > > >>> message,
> > > > > >>> > > or 2) if the topic is explicitly configured to use zstd.
> > > > > >>> > >
> > > > > >>> > > Yes, that's right. And as you know, case 1 includes 1.a) old
> > > > > >>> > > clients requesting v0 or v1 records, and 1.b) implicit zstd, i.e.,
> > > > > >>> > > a compression type of "producer" with zstd-compressed data.
> > > > > >>> > >
> > > > > >>> > > > However, if the compression type is set to "producer,"
> then
> > > the
> > > > > >>> fetched
> > > > > >>> > > data may or may not be compressed with zstd. In this case,
> we
> > > > > return
> > > > > >>> the
> > > > > >>> > > data to the client and expect it to fail parsing. Is that
> > > > correct?
> > > > > >>> > >
> > > > > >>> > > Exactly.
> > > > > >>> > >
> > > > > >>> > > Following your message, I reviewed the implementation of
> > > > > >>> > > `KafkaApis#handleFetchRequest,` which handles the fetch
> > > request.
> > > > > And
> > > > > >>> > found
> > > > > >>> > > that the information we can use is like the following:
> > > > > >>> > >
> > > > > >>> > > 1. Client's fetch request version. (`versionId` variable)
> > > > > >>> > > 2. Log's compression type. (`logConfig` variable)
> > > > > >>> > >
> > > > > >>> > > We can't detect the actual compression type of the data unless we
> > > > > >>> > > inspect the `RecordBatch` included in the `Records` instance (i.e.,
> > > > > >>> > > the `unconvertedRecords` variable.) Since that inspection carries a
> > > > > >>> > > performance cost, it is not an option for us - in short, we can't
> > > > > >>> > > be sure whether the given chunks of data are compressed with zstd
> > > > > >>> > > or not.
> > > > > >>> > >
> > > > > >>> > > So, in conclusion: we can easily return an error in cases 1.a and
> > > > > >>> > > 2 with the information above. In case 1.b (implicit zstd), we can
> > > > > >>> > > just return the data, doing nothing special, and expect the client
> > > > > >>> > > to fail parsing it.
> > > > > >>> > >
> > > > > >>> > > Thanks,
> > > > > >>> > > Dongjin
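[Aside: the decision described above boils down to a small check over the fetch request version and the topic's compression type, with batch inspection happening only on the down-conversion path. The Java below is a standalone, hypothetical sketch; the names and version numbers are assumptions, and the real logic lives in the Scala KafkaApis#handleFetchRequest.]

    // Hypothetical sketch of the fetch-side decision for cases 1.a, 1.b, and 2 above.
    public final class FetchZstdCheck {
        static final short FIRST_ZSTD_FETCH_VERSION = 10;   // "bumped" fetch version, assumed

        enum TopicCompression { NONE, GZIP, SNAPPY, LZ4, ZSTD, PRODUCER }
        enum Outcome { RETURN_ERROR, RETURN_DATA_AS_IS }

        static Outcome handle(short fetchVersion, TopicCompression topicCompression,
                              boolean needsDownConversion, boolean zstdSeenDuringConversion) {
            if (fetchVersion >= FIRST_ZSTD_FETCH_VERSION)
                return Outcome.RETURN_DATA_AS_IS;            // zstd-capable client
            // Case 2: topic explicitly configured with zstd -> cheap check, error immediately.
            if (topicCompression == TopicCompression.ZSTD)
                return Outcome.RETURN_ERROR;                 // UNSUPPORTED_COMPRESSION_TYPE
            // Case 1.a: down-conversion inspects batches anyway, so zstd can be detected there.
            if (needsDownConversion && zstdSeenDuringConversion)
                return Outcome.RETURN_ERROR;
            // Case 1.b: "producer" compression, no inspection -> return the data unchanged
            // and let an old client fail parsing if the batches turn out to be zstd.
            return Outcome.RETURN_DATA_AS_IS;
        }
    }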
> > > > > >>> > >
> > > > > >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <
> > > ismaelj@gmail.com>
> > > > > >>> wrote:
> > > > > >>> > >
> > > > > >>> > >> Jason, that's an interesting point regarding the Java
> > client.
> > > Do
> > > > > we
> > > > > >>> know
> > > > > >>> > >> what clients in other languages do in these cases?
> > > > > >>> > >>
> > > > > >>> > >> Ismael
> > > > > >>> > >>
> > > > > >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <
> > > jason@confluent.io
> > > > >
> > > > > >>> wrote:
> > > > > >>> > >>
> > > > > >>> > >> > Hi Dongjin,
> > > > > >>> > >> >
> > > > > >>> > >> > One of the complications is that old versions of the API
> > > will
> > > > > not
> > > > > >>> > >> expect a
> > > > > >>> > >> > new error code. However, since we expect this to be a
> > fatal
> > > > > error
> > > > > >>> > anyway
> > > > > >>> > >> > for old clients, it may still be more useful to return
> the
> > > > > correct
> > > > > >>> > error
> > > > > >>> > >> > code. For example, the Kafka clients use the following
> > code
> > > to
> > > > > >>> convert
> > > > > >>> > >> the
> > > > > >>> > >> > error code:
> > > > > >>> > >> >
> > > > > >>> > >> >     public static Errors forCode(short code) {
> > > > > >>> > >> >         Errors error = codeToError.get(code);
> > > > > >>> > >> >         if (error != null) {
> > > > > >>> > >> >             return error;
> > > > > >>> > >> >         } else {
> > > > > >>> > >> >             log.warn("Unexpected error code: {}.", code);
> > > > > >>> > >> >             return UNKNOWN_SERVER_ERROR;
> > > > > >>> > >> >         }
> > > > > >>> > >> >     }
> > > > > >>> > >> >
> > > > > >>> > >> > If we return an unsupported error code, it will be
> > converted
> > > > to
> > > > > an
> > > > > >>> > >> UNKNOWN
> > > > > >>> > >> > error, but at least we will get the message in the log
> > with
> > > > the
> > > > > >>> > correct
> > > > > >>> > >> > code. That seems preferable to returning a misleading
> > error
> > > > > code.
> > > > > >>> So I
> > > > > >>> > >> > wonder if we can use the new
> UNSUPPORTED_COMPRESSION_TYPE
> > > > error
> > > > > >>> even
> > > > > >>> > for
> > > > > >>> > >> > older versions.
> > > > > >>> > >> >
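[Aside: an illustrative usage of the forCode snippet above. On a client whose Errors enum predates the new code, an unrecognized value falls into the fallback branch; the value 74 is the code mentioned later in this thread for UnsupportedCompressionTypeException and is used here purely for illustration.]

    import org.apache.kafka.common.protocol.Errors;

    public class ForCodeExample {
        public static void main(String[] args) {
            // On a client built before the new code was defined, an unrecognized
            // value such as 74 hits the fallback branch shown above: it logs
            // "Unexpected error code: 74." and maps to UNKNOWN_SERVER_ERROR rather
            // than a misleading existing error.
            Errors error = Errors.forCode((short) 74);
            System.out.println(error);
        }
    }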
> > > > > >>> > >> > Also, one question just to check my understanding. I
> think
> > > we
> > > > > >>> would
> > > > > >>> > only
> > > > > >>> > >> > use this error code when we /know/ that zstd was in use
> > and
> > > > the
> > > > > >>> client
> > > > > >>> > >> > doesn't support it? This is true if either 1) the
> message
> > > > needs
> > > > > >>> > >> > down-conversion and we encounter a zstd compressed
> > message,
> > > or
> > > > > 2)
> > > > > >>> if
> > > > > >>> > the
> > > > > >>> > >> > topic is explicitly configured to use zstd. However, if
> > the
> > > > > >>> > compression
> > > > > >>> > >> > type is set to "producer," then the fetched data may or
> > may
> > > > not
> > > > > be
> > > > > >>> > >> > compressed with zstd. In this case, we return the data
> to
> > > the
> > > > > >>> client
> > > > > >>> > and
> > > > > >>> > >> > expect it to fail parsing. Is that correct?
> > > > > >>> > >> >
> > > > > >>> > >> > Thanks,
> > > > > >>> > >> > Jason
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> >
> > > > > >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <
> > > > > dongjin@apache.org>
> > > > > >>> > >> wrote:
> > > > > >>> > >> >
> > > > > >>> > >> > > Ismael, Jason and all,
> > > > > >>> > >> > >
> > > > > >>> > >> > > I rewrote the backward compatibility strategy & its
> > > > > >>> > >> > > alternatives as follows, based on Ismael & Jason's comments.
> > > > > >>> > >> > > Since it has not been updated on the wiki yet, don't hesitate
> > > > > >>> > >> > > to send me a message if you have any opinion on it.
> > > > > >>> > >> > >
> > > > > >>> > >> > > ```
> > > > > >>> > >> > > *Backward Compatibility*
> > > > > >>> > >> > >
> > > > > >>> > >> > > We need to establish a backward-compatibility strategy for
> > > > > >>> > >> > > the case where an old client subscribes to a topic that uses
> > > > > >>> > >> > > ZStandard implicitly (i.e., the 'compression.type'
> > > > > >>> > >> > > configuration of the given topic is 'producer' and the
> > > > > >>> > >> > > producer compressed the records with ZStandard). We have the
> > > > > >>> > >> > > following options for this situation:
> > > > > >>> > >> > >
> > > > > >>> > >> > > *A. Support ZStandard for the old clients, which can
> > > > > >>> > >> > > understand v0 and v1 messages only.*
> > > > > >>> > >> > >
> > > > > >>> > >> > > This strategy necessarily requires down-converting v2
> > > > > >>> > >> > > messages compressed with ZStandard into v0 or v1 messages,
> > > > > >>> > >> > > which means a considerable performance degradation. So we
> > > > > >>> > >> > > rejected this strategy.
> > > > > >>> > >> > >
> > > > > >>> > >> > > *B. Bump the API version and support only v2-capable
> > > > > >>> > >> > > clients*
> > > > > >>> > >> > >
> > > > > >>> > >> > > With this approach, we can signal to the old clients that
> > > > > >>> > >> > > they are out of date and should be upgraded. However, there
> > > > > >>> > >> > > are still several options for the error code.
> > > > > >>> > >> > >
> > > > > >>> > >> > > *B.1. INVALID_REQUEST (42)*
> > > > > >>> > >> > >
> > > > > >>> > >> > > This option gives the client very little information; the
> > > > > >>> > >> > > user can be confused about why a client that worked correctly
> > > > > >>> > >> > > in the past suddenly encounters a problem. So we rejected
> > > > > >>> > >> > > this strategy.
> > > > > >>> > >> > >
> > > > > >>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> > > > > >>> > >> > >
> > > > > >>> > >> > > This option gives inaccurate information; the user may be
> > > > > >>> > >> > > surprised and mistakenly conclude that the log files are
> > > > > >>> > >> > > broken in some way. So we rejected this strategy.
> > > > > >>> > >> > >
> > > > > >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > > > > >>> > >> > >
> > > > > >>> > >> > > The advantage of this approach is that we don't need to
> > > > > >>> > >> > > define a new error code; we can simply reuse an existing one.
> > > > > >>> > >> > >
> > > > > >>> > >> > > The disadvantage of this approach is that it is also a little
> > > > > >>> > >> > > bit vague; this error code was defined as part of KIP-98[^1]
> > > > > >>> > >> > > and is currently returned for transaction errors.
> > > > > >>> > >> > >
> > > > > >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > > > > >>> > >> > >
> > > > > >>> > >> > > The advantage of this approach is that it is clear and
> > > > > provides
> > > > > >>> an
> > > > > >>> > >> exact
> > > > > >>> > >> > > description. The disadvantage is we need to add a new
> > > error
> > > > > >>> code.
> > > > > >>> > >> > > ```
> > > > > >>> > >> > >
> > > > > >>> > >> > > *It seems like the choice we need to make is now clear:
> > > > > >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or
> > > > > >>> > >> > > UNSUPPORTED_COMPRESSION_TYPE (B.4).*
> > > > > >>> > >> > > The first one doesn't need a new error code, but the latter
> > > > > >>> > >> > > is more explicit. Which one do you prefer? Since all of you
> > > > > >>> > >> > > have much more experience and knowledge than I do, I will
> > > > > >>> > >> > > follow your decision. The wiki page will also be updated
> > > > > >>> > >> > > following the decision.
> > > > > >>> > >> > >
> > > > > >>> > >> > > Best,
> > > > > >>> > >> > > Dongjin
> > > > > >>> > >> > >
> > > > > >>> > >> > > [^1]:
> https://issues.apache.org/jira/browse/KAFKA-4990
> > > > > >>> > >> > >
> > > > > >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <
> > > > > ismaelj@gmail.com>
> > > > > >>> > >> wrote:
> > > > > >>> > >> > >
> > > > > >>> > >> > > > Sounds reasonable to me.
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > Ismael
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
> > > > > >>> jason@confluent.io>
> > > > > >>> > >> > wrote:
> > > > > >>> > >> > > >
> > > > > >>> > >> > > > > Hey Ismael,
> > > > > >>> > >> > > > >
> > > > > >>> > >> > > > > Your summary looks good to me. I think it might
> also
> > > be
> > > > a
> > > > > >>> good
> > > > > >>> > >> idea
> > > > > >>> > >> > to
> > > > > >>> > >> > > > add
> > > > > >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to
> go
> > > > along
> > > > > >>> with
> > > > > >>> > the
> > > > > >>> > >> > > > version
> > > > > >>> > >> > > > > bumps. We won't be able to use it for old api
> > versions
> > > > > >>> since the
> > > > > >>> > >> > > clients
> > > > > >>> > >> > > > > will not understand it, but we can use it going
> > > forward
> > > > so
> > > > > >>> that
> > > > > >>> > >> we're
> > > > > >>> > >> > > not
> > > > > >>> > >> > > > > stuck in a similar situation with a new message
> > format
> > > > and
> > > > > >>> a new
> > > > > >>> > >> > codec
> > > > > >>> > >> > > to
> > > > > >>> > >> > > > > support. Another option is to use
> > > > UNSUPPORTED_FOR_MESSAGE_
> > > > > >>> > FORMAT,
> > > > > >>> > >> but
> > > > > >>> > >> > > it
> > > > > >>> > >> > > > is
> > > > > >>> > >> > > > > not as explicit.
> > > > > >>> > >> > > > >
> > > > > >>> > >> > > > > -Jason
> > > > > >>> > >> > > > >
> > > > > >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> > > > > >>> ismael@juma.me.uk
> > > > > >>> > >
> > > > > >>> > >> > > wrote:
> > > > > >>> > >> > > > >
> > > > > >>> > >> > > > > > Hi Dongjin and Jason,
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > I would agree. My summary:
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > 1. Support zstd with message format 2 only.
> > > > > >>> > >> > > > > > 2. Bump produce and fetch request versions.
> > > > > >>> > >> > > > > > 3. Provide broker errors whenever possible based
> > on
> > > > the
> > > > > >>> > request
> > > > > >>> > >> > > version
> > > > > >>> > >> > > > > and
> > > > > >>> > >> > > > > > rely on clients for the cases where the broker
> > can't
> > > > > >>> validate
> > > > > >>> > >> > > > efficiently
> > > > > >>> > >> > > > > > (example message format 2 consumer that supports
> > the
> > > > > >>> latest
> > > > > >>> > >> fetch
> > > > > >>> > >> > > > version
> > > > > >>> > >> > > > > > but doesn't support zstd).
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > If there's general agreement on this, I suggest
> we
> > > > > update
> > > > > >>> the
> > > > > >>> > >> KIP
> > > > > >>> > >> > to
> > > > > >>> > >> > > > > state
> > > > > >>> > >> > > > > > the proposal and to move the rejected options to
> > its
> > > > own
> > > > > >>> > >> section.
> > > > > >>> > >> > And
> > > > > >>> > >> > > > > then
> > > > > >>> > >> > > > > > start a vote!
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > Ismael
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson
> <
> > > > > >>> > >> > jason@confluent.io>
> > > > > >>> > >> > > > > > wrote:
> > > > > >>> > >> > > > > >
> > > > > >>> > >> > > > > > > Hi Dongjin,
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > Yes, that's a good summary. For clients which
> > > > support
> > > > > >>> v2,
> > > > > >>> > the
> > > > > >>> > >> > > client
> > > > > >>> > >> > > > > can
> > > > > >>> > >> > > > > > > parse the message format and hopefully raise a
> > > > useful
> > > > > >>> error
> > > > > >>> > >> > message
> > > > > >>> > >> > > > > > > indicating the unsupported compression type.
> For
> > > > older
> > > > > >>> > >> clients,
> > > > > >>> > >> > our
> > > > > >>> > >> > > > > > options
> > > > > >>> > >> > > > > > > are probably (1) to down-convert to the old
> > format
> > > > > >>> using no
> > > > > >>> > >> > > > compression
> > > > > >>> > >> > > > > > > type, or (2) to return an error code. I'm
> > leaning
> > > > > >>> toward the
> > > > > >>> > >> > latter
> > > > > >>> > >> > > > as
> > > > > >>> > >> > > > > > the
> > > > > >>> > >> > > > > > > simpler solution, but the challenge is
> finding a
> > > > good
> > > > > >>> error
> > > > > >>> > >> code.
> > > > > >>> > >> > > Two
> > > > > >>> > >> > > > > > > possibilities might be INVALID_REQUEST or
> > > > > >>> CORRUPT_MESSAGE.
> > > > > >>> > The
> > > > > >>> > >> > > > downside
> > > > > >>> > >> > > > > > is
> > > > > >>> > >> > > > > > > that old clients probably won't get a helpful
> > > > message.
> > > > > >>> > >> However,
> > > > > >>> > >> > at
> > > > > >>> > >> > > > > least
> > > > > >>> > >> > > > > > > the behavior will be consistent in the sense
> > that
> > > > all
> > > > > >>> > clients
> > > > > >>> > >> > will
> > > > > >>> > >> > > > fail
> > > > > >>> > >> > > > > > if
> > > > > >>> > >> > > > > > > they do not support zstandard.
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > What do you think?
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > Thanks,
> > > > > >>> > >> > > > > > > Jason
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> > > > > >>> > >> dongjin@apache.org
> > > > > >>> > >> > >
> > > > > >>> > >> > > > > wrote:
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting
> > > logic
> > > > > >>> > following
> > > > > >>> > >> > your
> > > > > >>> > >> > > > > > > > explanation.[^1] You mean the following
> > > routines,
> > > > > >>> right?
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
> > > > > >>> > >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
> > > > > >>> > >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > It seems like your stance is as follows:
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > 1. In principle, Kafka does not change the
> > > > > >>> > >> > > > > > > > compression codec when down-converting, since doing
> > > > > >>> > >> > > > > > > > so requires inspecting the fetched data, which is
> > > > > >>> > >> > > > > > > > expensive.
> > > > > >>> > >> > > > > > > > 2. However, there are some cases where the fetched
> > > > > >>> > >> > > > > > > > data is inspected anyway. In those cases, we can
> > > > > >>> > >> > > > > > > > provide compression conversion from ZStandard to
> > > > > >>> > >> > > > > > > > the classical codecs[^2].
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > And from what I understand, the cases where
> > the
> > > > > client
> > > > > >>> > >> without
> > > > > >>> > >> > > > > > ZStandard
> > > > > >>> > >> > > > > > > > support receives ZStandard compressed
> records
> > > can
> > > > be
> > > > > >>> > >> organized
> > > > > >>> > >> > > into
> > > > > >>> > >> > > > > two
> > > > > >>> > >> > > > > > > > cases:
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > a. The 'compression.type' configuration of
> > given
> > > > > >>> topic is
> > > > > >>> > >> > > > 'producer'
> > > > > >>> > >> > > > > > and
> > > > > >>> > >> > > > > > > > the producer compressed the records with
> > > > ZStandard.
> > > > > >>> (that
> > > > > >>> > >> is,
> > > > > >>> > >> > > using
> > > > > >>> > >> > > > > > > > ZStandard implicitly.)
> > > > > >>> > >> > > > > > > > b.  The 'compression.type' configuration of
> > > given
> > > > > >>> topic is
> > > > > >>> > >> > > 'zstd';
> > > > > >>> > >> > > > > that
> > > > > >>> > >> > > > > > > is,
> > > > > >>> > >> > > > > > > > using ZStandard explicitly.
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > As you stated, we don't have to handle the
> > case
> > > b
> > > > > >>> > specially.
> > > > > >>> > >> > So,
> > > > > >>> > >> > > It
> > > > > >>> > >> > > > > > seems
> > > > > >>> > >> > > > > > > > like we can narrow the focus of the problem
> by
> > > > > joining
> > > > > >>> > case
> > > > > >>> > >> 1
> > > > > >>> > >> > and
> > > > > >>> > >> > > > > case
> > > > > >>> > >> > > > > > b
> > > > > >>> > >> > > > > > > > like the following:
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > > Given the topic with 'producer' as its
> > > > > >>> > 'compression.type'
> > > > > >>> > >> > > > > > > configuration,
> > > > > >>> > >> > > > > > > > ZStandard compressed records and old client
> > > > without
> > > > > >>> > >> ZStandard,
> > > > > >>> > >> > is
> > > > > >>> > >> > > > > there
> > > > > >>> > >> > > > > > > any
> > > > > >>> > >> > > > > > > > case we need to inspect the records and can
> > > change
> > > > > the
> > > > > >>> > >> > > compression
> > > > > >>> > >> > > > > > type?
> > > > > >>> > >> > > > > > > If
> > > > > >>> > >> > > > > > > > so, can we provide compression type
> > converting?
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > Do I understand correctly?
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > Best,
> > > > > >>> > >> > > > > > > > Dongjin
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little
> > bit
> > > > > >>> > >> > misunderstanding
> > > > > >>> > >> > > > how
> > > > > >>> > >> > > > > > API
> > > > > >>> > >> > > > > > > > version works, after reviewing the
> downconvert
> > > > logic
> > > > > >>> & the
> > > > > >>> > >> > > protocol
> > > > > >>> > >> > > > > > > > documentation <
> > > https://kafka.apache.org/protocol
> > > > >.
> > > > > >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason
> > Gustafson
> > > <
> > > > > >>> > >> > > > jason@confluent.io>
> > > > > >>> > >> > > > > > > > wrote:
> > > > > >>> > >> > > > > > > >
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> > > fail
> > > > > >>> with the
> > > > > >>> > >> API
> > > > > >>> > >> > > > > version,
> > > > > >>> > >> > > > > > > so
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > > > anyway.
> > > > > >>> Isn't
> > > > > >>> > >> it?
> > > > > >>> > >> > > So, I
> > > > > >>> > >> > > > > > think
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry,
> I
> > am
> > > > not
> > > > > >>> > >> familiar
> > > > > >>> > >> > > with
> > > > > >>> > >> > > > > > > > > down-convert
> > > > > >>> > >> > > > > > > > > > logic.)
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > > Currently the broker down-converts
> > > automatically
> > > > > >>> when it
> > > > > >>> > >> > > receives
> > > > > >>> > >> > > > > an
> > > > > >>> > >> > > > > > > old
> > > > > >>> > >> > > > > > > > > version of the fetch request (a version
> > which
> > > is
> > > > > >>> known
> > > > > >>> > to
> > > > > >>> > >> > > predate
> > > > > >>> > >> > > > > the
> > > > > >>> > >> > > > > > > > > message format in use). Typically when
> > > > > >>> down-converting
> > > > > >>> > the
> > > > > >>> > >> > > > message
> > > > > >>> > >> > > > > > > > format,
> > > > > >>> > >> > > > > > > > > we use the same compression type, but
> there
> > is
> > > > not
> > > > > >>> much
> > > > > >>> > >> point
> > > > > >>> > >> > > in
> > > > > >>> > >> > > > > > doing
> > > > > >>> > >> > > > > > > so
> > > > > >>> > >> > > > > > > > > when we know the client doesn't support
> it.
> > So
> > > > if
> > > > > >>> > >> zstandard
> > > > > >>> > >> > is
> > > > > >>> > >> > > in
> > > > > >>> > >> > > > > > use,
> > > > > >>> > >> > > > > > > > and
> > > > > >>> > >> > > > > > > > > we have to down-convert anyway, then we
> can
> > > > choose
> > > > > >>> to
> > > > > >>> > use
> > > > > >>> > >> a
> > > > > >>> > >> > > > > different
> > > > > >>> > >> > > > > > > > > compression type or no compression type.
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > > From my perspective, there is no
> significant
> > > > > >>> downside to
> > > > > >>> > >> > > bumping
> > > > > >>> > >> > > > > the
> > > > > >>> > >> > > > > > > > > protocol version and it has several
> > potential
> > > > > >>> benefits.
> > > > > >>> > >> > Version
> > > > > >>> > >> > > > > bumps
> > > > > >>> > >> > > > > > > are
> > > > > >>> > >> > > > > > > > > cheap. The main question mark in my mind
> is
> > > > about
> > > > > >>> > >> > > > down-conversion.
> > > > > >>> > >> > > > > > > > Figuring
> > > > > >>> > >> > > > > > > > > out whether down-conversion is needed is
> > hard
> > > > > >>> generally
> > > > > >>> > >> > without
> > > > > >>> > >> > > > > > > > inspecting
> > > > > >>> > >> > > > > > > > > the fetched data, which is expensive. I
> > think
> > > we
> > > > > >>> agree
> > > > > >>> > in
> > > > > >>> > >> > > > principle
> > > > > >>> > >> > > > > > > that
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > do not want to have to pay this cost
> > generally
> > > > and
> > > > > >>> > prefer
> > > > > >>> > >> the
> > > > > >>> > >> > > > > clients
> > > > > >>> > >> > > > > > > to
> > > > > >>> > >> > > > > > > > > fail when they see an unhandled
> compression
> > > > type.
> > > > > >>> The
> > > > > >>> > >> point I
> > > > > >>> > >> > > was
> > > > > >>> > >> > > > > > > making
> > > > > >>> > >> > > > > > > > is
> > > > > >>> > >> > > > > > > > > that there are some cases where we are
> > either
> > > > > >>> inspecting
> > > > > >>> > >> the
> > > > > >>> > >> > > data
> > > > > >>> > >> > > > > > > anyway
> > > > > >>> > >> > > > > > > > > (because we have to down-convert the
> message
> > > > > >>> format), or
> > > > > >>> > >> we
> > > > > >>> > >> > > have
> > > > > >>> > >> > > > an
> > > > > >>> > >> > > > > > > easy
> > > > > >>> > >> > > > > > > > > way to tell whether zstandard is in use
> (the
> > > > topic
> > > > > >>> has
> > > > > >>> > it
> > > > > >>> > >> > > > > configured
> > > > > >>> > >> > > > > > > > > explicitly). In the latter case, we don't
> > have
> > > > to
> > > > > >>> handle
> > > > > >>> > >> it
> > > > > >>> > >> > > > > > specially.
> > > > > >>> > >> > > > > > > > But
> > > > > >>> > >> > > > > > > > > we do have to decide how we will handle
> > > > > >>> down-conversion
> > > > > >>> > to
> > > > > >>> > >> > > older
> > > > > >>> > >> > > > > > > formats.
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > > -Jason
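[Aside: a minimal, hypothetical sketch of the down-conversion option Jason describes, i.e. switching to a codec the old client understands when a zstd batch has to be rewritten anyway. The thread later settles on returning UNSUPPORTED_COMPRESSION_TYPE instead of down-converting; the codec names and method below are assumptions for illustration only.]

    // Hypothetical: pick the codec to use when a batch must be rewritten for an old client.
    public final class DownConversionCodecChoice {
        enum Codec { NONE, GZIP, SNAPPY, LZ4, ZSTD }

        static Codec codecForOldClient(Codec original) {
            // Zstd is unknown to old clients, so fall back to no compression here;
            // re-compressing with gzip/snappy/lz4 would also work, at extra CPU cost.
            return original == Codec.ZSTD ? Codec.NONE : original;
        }

        public static void main(String[] args) {
            System.out.println(codecForOldClient(Codec.ZSTD)); // NONE
            System.out.println(codecForOldClient(Codec.LZ4));  // LZ4 (kept as-is)
        }
    }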
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin
> > Lee <
> > > > > >>> > >> > > dongjin@apache.org
> > > > > >>> > >> > > > >
> > > > > >>> > >> > > > > > > wrote:
> > > > > >>> > >> > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Colin and Jason,
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Thanks for your opinions. To summarize, the
> > > > > >>> > >> > > > > > > > > > pros and cons of bumping the fetch API version
> > > > > >>> > >> > > > > > > > > > are:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Cons:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > - The Broker can't know whether a given
> > > > message
> > > > > >>> batch
> > > > > >>> > is
> > > > > >>> > >> > > > > compressed
> > > > > >>> > >> > > > > > > > with
> > > > > >>> > >> > > > > > > > > > zstd or not.
> > > > > >>> > >> > > > > > > > > > - Need some additional logic for the
> topic
> > > > > >>> explicitly
> > > > > >>> > >> > > > configured
> > > > > >>> > >> > > > > to
> > > > > >>> > >> > > > > > > use
> > > > > >>> > >> > > > > > > > > > zstd.
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Pros:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > - The broker doesn't need to conduct
> > > expensive
> > > > > >>> > >> > > down-conversion.
> > > > > >>> > >> > > > > > > > > > - Can message the users to update their
> > > > client.
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > So, opinions for the
> > backward-compatibility
> > > > > >>> policy by
> > > > > >>> > >> far:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin,
> > > Jason)
> > > > > >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Here are my additional comments:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > @Colin
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > I greatly appreciate your response. In
> the
> > > > case
> > > > > >>> of the
> > > > > >>> > >> > > > dictionary
> > > > > >>> > >> > > > > > > > > support,
> > > > > >>> > >> > > > > > > > > > of course, this issue should be
> addressed
> > > > later
> > > > > >>> so we
> > > > > >>> > >> don't
> > > > > >>> > >> > > > need
> > > > > >>> > >> > > > > it
> > > > > >>> > >> > > > > > > in
> > > > > >>> > >> > > > > > > > > the
> > > > > >>> > >> > > > > > > > > > first version. You are right - it is not
> > > late
> > > > to
> > > > > >>> try
> > > > > >>> > it
> > > > > >>> > >> > after
> > > > > >>> > >> > > > > some
> > > > > >>> > >> > > > > > > > > > benchmarks. What I mean is, we should
> keep
> > > in
> > > > > >>> mind on
> > > > > >>> > >> that
> > > > > >>> > >> > > > > > potential
> > > > > >>> > >> > > > > > > > > > feature.
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > @Jason
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > You wrote,
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > Similarly, if we have to down-convert
> > > anyway
> > > > > >>> because
> > > > > >>> > >> the
> > > > > >>> > >> > > > client
> > > > > >>> > >> > > > > > > does
> > > > > >>> > >> > > > > > > > > not
> > > > > >>> > >> > > > > > > > > > understand the message format, then we
> > could
> > > > > also
> > > > > >>> use
> > > > > >>> > a
> > > > > >>> > >> > > > different
> > > > > >>> > >> > > > > > > > > > compression type.
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> > > fail
> > > > > >>> with the
> > > > > >>> > >> API
> > > > > >>> > >> > > > > version,
> > > > > >>> > >> > > > > > > so
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > > > anyway.
> > > > > >>> Isn't
> > > > > >>> > >> it?
> > > > > >>> > >> > > So, I
> > > > > >>> > >> > > > > > think
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry,
> I
> > am
> > > > not
> > > > > >>> > >> familiar
> > > > > >>> > >> > > with
> > > > > >>> > >> > > > > > > > > down-convert
> > > > > >>> > >> > > > > > > > > > logic.)
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > Please give more opinions. Thanks!
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > - Dongjin
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason
> > > > Gustafson <
> > > > > >>> > >> > > > > jason@confluent.io
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > > > wrote:
> > > > > >>> > >> > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > Hey Colin,
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > The problem for the fetch API is that
> > the
> > > > > broker
> > > > > >>> > does
> > > > > >>> > >> not
> > > > > >>> > >> > > > > > generally
> > > > > >>> > >> > > > > > > > > know
> > > > > >>> > >> > > > > > > > > > if
> > > > > >>> > >> > > > > > > > > > > a batch was compressed with zstd
> unless
> > it
> > > > > >>> parses
> > > > > >>> > it.
> > > > > >>> > >> I
> > > > > >>> > >> > > think
> > > > > >>> > >> > > > > the
> > > > > >>> > >> > > > > > > > goal
> > > > > >>> > >> > > > > > > > > > here
> > > > > >>> > >> > > > > > > > > > > is to avoid the expensive
> > down-conversion
> > > > that
> > > > > >>> is
> > > > > >>> > >> needed
> > > > > >>> > >> > to
> > > > > >>> > >> > > > > > ensure
> > > > > >>> > >> > > > > > > > > > > compatibility because it is only
> > necessary
> > > > if
> > > > > >>> zstd
> > > > > >>> > is
> > > > > >>> > >> > > > actually
> > > > > >>> > >> > > > > in
> > > > > >>> > >> > > > > > > > use.
> > > > > >>> > >> > > > > > > > > > But
> > > > > >>> > >> > > > > > > > > > > as long as old clients can parse the
> > > message
> > > > > >>> format,
> > > > > >>> > >> they
> > > > > >>> > >> > > > > should
> > > > > >>> > >> > > > > > > get
> > > > > >>> > >> > > > > > > > a
> > > > > >>> > >> > > > > > > > > > > reasonable error if they see an
> > > unsupported
> > > > > >>> > >> compression
> > > > > >>> > >> > > type
> > > > > >>> > >> > > > in
> > > > > >>> > >> > > > > > the
> > > > > >>> > >> > > > > > > > > > > attributes. Basically the onus is on
> > users
> > > > to
> > > > > >>> ensure
> > > > > >>> > >> that
> > > > > >>> > >> > > > their
> > > > > >>> > >> > > > > > > > > consumers
> > > > > >>> > >> > > > > > > > > > > have been updated prior to using zstd.
> > It
> > > > > seems
> > > > > >>> > like a
> > > > > >>> > >> > > > > reasonable
> > > > > >>> > >> > > > > > > > > > tradeoff
> > > > > >>> > >> > > > > > > > > > > to me. There are a couple cases that
> > might
> > > > be
> > > > > >>> worth
> > > > > >>> > >> > > thinking
> > > > > >>> > >> > > > > > > through:
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured
> > to
> > > > use
> > > > > >>> zstd,
> > > > > >>> > >> then
> > > > > >>> > >> > we
> > > > > >>> > >> > > > > don't
> > > > > >>> > >> > > > > > > > need
> > > > > >>> > >> > > > > > > > > to
> > > > > >>> > >> > > > > > > > > > > check the fetched data for the
> > compression
> > > > > type
> > > > > >>> to
> > > > > >>> > >> know
> > > > > >>> > >> > if
> > > > > >>> > >> > > we
> > > > > >>> > >> > > > > > need
> > > > > >>> > >> > > > > > > > > > > down-conversion. If we did bump the
> > Fetch
> > > > API
> > > > > >>> > version,
> > > > > >>> > >> > then
> > > > > >>> > >> > > > we
> > > > > >>> > >> > > > > > > could
> > > > > >>> > >> > > > > > > > > > handle
> > > > > >>> > >> > > > > > > > > > > this case by either down-converting
> > using
> > > a
> > > > > >>> > different
> > > > > >>> > >> > > > > compression
> > > > > >>> > >> > > > > > > > type
> > > > > >>> > >> > > > > > > > > or
> > > > > >>> > >> > > > > > > > > > > returning an error.
> > > > > >>> > >> > > > > > > > > > > 2. Similarly, if we have to
> down-convert
> > > > > anyway
> > > > > >>> > >> because
> > > > > >>> > >> > the
> > > > > >>> > >> > > > > > client
> > > > > >>> > >> > > > > > > > does
> > > > > >>> > >> > > > > > > > > > not
> > > > > >>> > >> > > > > > > > > > > understand the message format, then we
> > > could
> > > > > >>> also
> > > > > >>> > use
> > > > > >>> > >> a
> > > > > >>> > >> > > > > different
> > > > > >>> > >> > > > > > > > > > > compression type.
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > For the produce API, I think it's
> > > reasonable
> > > > > to
> > > > > >>> bump
> > > > > >>> > >> the
> > > > > >>> > >> > > api
> > > > > >>> > >> > > > > > > version.
> > > > > >>> > >> > > > > > > > > > This
> > > > > >>> > >> > > > > > > > > > > can be used by clients to check
> whether
> > a
> > > > > broker
> > > > > >>> > >> supports
> > > > > >>> > >> > > > zstd.
> > > > > >>> > >> > > > > > For
> > > > > >>> > >> > > > > > > > > > > example, we might support a list of
> > > > preferred
> > > > > >>> > >> compression
> > > > > >>> > >> > > > types
> > > > > >>> > >> > > > > > in
> > > > > >>> > >> > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > producer and we could use the broker
> to
> > > > detect
> > > > > >>> which
> > > > > >>> > >> > > version
> > > > > >>> > >> > > > to
> > > > > >>> > >> > > > > > > use.
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > -Jason
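[Aside: a hypothetical sketch of the "list of preferred compression types" idea mentioned above, with the producer picking the first preferred codec the broker's advertised produce version can accept. Every name and version number here is an assumption for illustration, not an existing producer API.]

    import java.util.List;

    public final class CompressionNegotiation {
        enum Codec { NONE, GZIP, SNAPPY, LZ4, ZSTD }

        // Assumed: only zstd requires the bumped produce version (7 used for illustration).
        static short minProduceVersionFor(Codec c) {
            return c == Codec.ZSTD ? (short) 7 : (short) 0;
        }

        static Codec choose(List<Codec> preferred, short brokerMaxProduceVersion) {
            for (Codec c : preferred)
                if (brokerMaxProduceVersion >= minProduceVersionFor(c))
                    return c;                                // first codec the broker supports
            return Codec.NONE;                               // fall back to uncompressed
        }

        public static void main(String[] args) {
            System.out.println(choose(List.of(Codec.ZSTD, Codec.LZ4), (short) 6)); // LZ4
            System.out.println(choose(List.of(Codec.ZSTD, Codec.LZ4), (short) 7)); // ZSTD
        }
    }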
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin
> > > > McCabe <
> > > > > >>> > >> > > > > cmccabe@apache.org
> > > > > >>> > >> > > > > > >
> > > > > >>> > >> > > > > > > > > wrote:
> > > > > >>> > >> > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.
> > ZStd
> > > > is a
> > > > > >>> good
> > > > > >>> > >> > > > compression
> > > > > >>> > >> > > > > > > codec
> > > > > >>> > >> > > > > > > > > > and I
> > > > > >>> > >> > > > > > > > > > > > hope we can get this support in
> soon!
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > I would say we can just bump the API
> > > > version
> > > > > >>> to
> > > > > >>> > >> > indicate
> > > > > >>> > >> > > > that
> > > > > >>> > >> > > > > > > ZStd
> > > > > >>> > >> > > > > > > > > > > support
> > > > > >>> > >> > > > > > > > > > > > is expected in new clients.  We
> > probably
> > > > > need
> > > > > >>> some
> > > > > >>> > >> way
> > > > > >>> > >> > of
> > > > > >>> > >> > > > > > > > indicating
> > > > > >>> > >> > > > > > > > > to
> > > > > >>> > >> > > > > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > > older clients that they can't
> consume
> > > the
> > > > > >>> > >> partitions,
> > > > > >>> > >> > as
> > > > > >>> > >> > > > > well.
> > > > > >>> > >> > > > > > > > > Perhaps
> > > > > >>> > >> > > > > > > > > > > we
> > > > > >>> > >> > > > > > > > > > > > can use the
> > > UNSUPPORTED_FOR_MESSAGE_FORMAT
> > > > > >>> error?
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > The license thing seems
> > straightforward
> > > --
> > > > > >>> it's
> > > > > >>> > >> just a
> > > > > >>> > >> > > > matter
> > > > > >>> > >> > > > > > of
> > > > > >>> > >> > > > > > > > > adding
> > > > > >>> > >> > > > > > > > > > > > the text to the right files as per
> ASF
> > > > > >>> guidelines.
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > With regard to the dictionary
> support,
> > > do
> > > > we
> > > > > >>> > really
> > > > > >>> > >> > need
> > > > > >>> > >> > > > that
> > > > > >>> > >> > > > > > in
> > > > > >>> > >> > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > first
> > > > > >>> > >> > > > > > > > > > > > version?  Hopefully message batches
> > are
> > > > big
> > > > > >>> enough
> > > > > >>> > >> that
> > > > > >>> > >> > > > this
> > > > > >>> > >> > > > > > > isn't
> > > > > >>> > >> > > > > > > > > > > needed.
> > > > > >>> > >> > > > > > > > > > > > Some benchmarks might help here.
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > best,
> > > > > >>> > >> > > > > > > > > > > > Colin
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02,
> Dongjin
> > > Lee
> > > > > >>> wrote:
> > > > > >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's
> > > > reboot
> > > > > >>> this
> > > > > >>> > >> issue,
> > > > > >>> > >> > > > > KIP-110
> > > > > >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
> > > > > >>> > >> > > confluence/display/KAFKA/KIP-
> > > > > >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+
> > > > ZStandard+Compression>
> > > > > >>> > >> > > > > > > > > > > > > .
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > For newcomers, Here is some
> summary
> > of
> > > > the
> > > > > >>> > >> history:
> > > > > >>> > >> > > > KIP-110
> > > > > >>> > >> > > > > > was
> > > > > >>> > >> > > > > > > > > > > > originally
> > > > > >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514
> but,
> > > it
> > > > > >>> lacked
> > > > > >>> > >> > > benchmark
> > > > > >>> > >> > > > > > > results
> > > > > >>> > >> > > > > > > > to
> > > > > >>> > >> > > > > > > > > > get
> > > > > >>> > >> > > > > > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > > > agreement of the community. Later,
> > > Ivan
> > > > > >>> Babrou
> > > > > >>> > and
> > > > > >>> > >> > some
> > > > > >>> > >> > > > > other
> > > > > >>> > >> > > > > > > > users
> > > > > >>> > >> > > > > > > > > > who
> > > > > >>> > >> > > > > > > > > > > > > adopted the patch provided their
> > > > excellent
> > > > > >>> > >> > performance
> > > > > >>> > >> > > > > report
> > > > > >>> > >> > > > > > > > which
> > > > > >>> > >> > > > > > > > > > is
> > > > > >>> > >> > > > > > > > > > > > now
> > > > > >>> > >> > > > > > > > > > > > > included in the KIP, but it
> > postponed
> > > > > again
> > > > > >>> > >> because
> > > > > >>> > >> > of
> > > > > >>> > >> > > > the
> > > > > >>> > >> > > > > > > > > community
> > > > > >>> > >> > > > > > > > > > > was
> > > > > >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why
> I
> > > now
> > > > > >>> reboot
> > > > > >>> > >> this
> > > > > >>> > >> > > > issue.
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > The following is the current
> status
> > of
> > > > the
> > > > > >>> > >> feature:
> > > > > >>> > >> > You
> > > > > >>> > >> > > > can
> > > > > >>> > >> > > > > > > check
> > > > > >>> > >> > > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > > > current draft implementation here
> > > > > >>> > >> > > > > > > > > > > > > <
> > > > > https://github.com/apache/kafka/pull/2267>.
> > > > > >>> It
> > > > > >>> > >> is
> > > > > >>> > >> > > based
> > > > > >>> > >> > > > > on
> > > > > >>> > >> > > > > > > zstd
> > > > > >>> > >> > > > > > > > > > 1.3.5
> > > > > >>> > >> > > > > > > > > > > > and
> > > > > >>> > >> > > > > > > > > > > > > periodically rebased onto the
> latest
> > > > > >>> trunk[^1].
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > The issues that should be
> addressed
> > is
> > > > > like
> > > > > >>> > >> > following:
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > To support old consumers, we need
> to
> > > > take
> > > > > a
> > > > > >>> > >> strategy
> > > > > >>> > >> > to
> > > > > >>> > >> > > > > > handle
> > > > > >>> > >> > > > > > > > the
> > > > > >>> > >> > > > > > > > > > old
> > > > > >>> > >> > > > > > > > > > > > > consumers. Current candidates are:
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > - Bump API version
> > > > > >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old
> > clients
> > > > > fail.
> > > > > >>> > >> > > > > > > > > > > > > - Improve the error messages:
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > *2. Dictionary Support*
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > To support zstd's dictionary
> feature
> > > in
> > > > > the
> > > > > >>> > future
> > > > > >>> > >> > (if
> > > > > >>> > >> > > > > > needed),
> > > > > >>> > >> > > > > > > > we
> > > > > >>> > >> > > > > > > > > > need
> > > > > >>> > >> > > > > > > > > > > > to
> > > > > >>> > >> > > > > > > > > > > > > sketch how it should be and leave
> > some
> > > > > room
> > > > > >>> for
> > > > > >>> > >> it.
> > > > > >>> > >> > As
> > > > > >>> > >> > > of
> > > > > >>> > >> > > > > > now,
> > > > > >>> > >> > > > > > > > > there
> > > > > >>> > >> > > > > > > > > > > has
> > > > > >>> > >> > > > > > > > > > > > > been no discussion on this topic
> > yet.
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > *3. License*
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > > > > > > > > > > To use this feature, we need to
> add
> > > > > license
> > > > > >>> of
> > > > > >>> > >> zstd
> > > > > >>> > >> > and
> > > > > >>> > >> > > > > > > zstd-jni
> > > > > >>> > >> > > > > > > > to
> > > > > >>> > >> > > > > > > > > > the
> > > > > >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi
> > for
> > > > > >>> raising
> > > > > >>> > >> this
> > > > > >>> > >> > > > issue!)
> > > > > >>> > >> > > > > > It
> > > > > >>> > >> > > > > > > > > seems
> > > > > >>> > >> > > > > > > > > > > like
> > > > > >>> > >> > > > > > > > > > > > > what Apache Spark did would be a
> > good
> > > > > >>> example
> > > > > >>> > but
> > > > > >>> > >> > there
> > > > > >>> > >> > > > has
> > > > > >>> > >> > > > > > > been
> > > > > >>> > >> > > > > > > > no
> > > > > >>> > >> > > > > > > > > > > > > discussion yet.
> > > > > >>> > >> > > > > > > > > > > > >
> > > > > >>> > >> > > > >



-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github: github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare: www.slideshare.net/dongjinleekr*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ismael Juma <is...@juma.me.uk>.
Dongjin, can you please start a vote?

Ismael

On Sun, Sep 9, 2018 at 11:15 PM Dongjin Lee <do...@apache.org> wrote:

> Hi Jason,
>
> You are right. Explicit statements are always better. I updated the
> document following your suggestion.
>
> @Magnus
>
> Thanks for the inspection. It seems like a new error code is not a problem.
>
> Thanks,
> Dongjin
>
> On Fri, Sep 7, 2018 at 3:23 AM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Hi Dongjin,
> >
> > The KIP looks good to me. I'd suggest starting a vote. A couple minor
> > points that might be worth calling out explicitly in the compatibility
> > section:
> >
> > 1. Zstd will only be allowed for the bumped produce API. For older
> > versions, we return UNSUPPORTED_COMPRESSION_TYPE regardless of the
> message
> > format.
> > 2. Down-conversion of zstd-compressed records will not be supported.
> > Instead we will return UNSUPPORTED_COMPRESSION_TYPE.
> >
> > Does that sound right?
> >
> > Thanks,
> > Jason
> >
> > On Thu, Sep 6, 2018 at 1:45 AM, Magnus Edenhill <ma...@edenhill.se>
> > wrote:
> >
> > > > Ismael wrote:
> > > > Jason, that's an interesting point regarding the Java client. Do we
> > know
> > > > what clients in other languages do in these cases?
> > >
> > > librdkafka (and its bindings) passes unknown/future errors through to
> the
> > > application, the error code remains intact while
> > > the error string will be set to something like "Err-123?", which isn't
> > very
> > > helpful to the user but it at least
> > > preserves the original error code for further troubleshooting.
> > > For the producer any unknown error returned in the ProduceResponse will
> > be
> > > considered a permanent delivery failure (no retries),
> > > and for the consumer any unknown FetchResponse errors will propagate
> > > directly to the application, trigger a fetch backoff, and then
> > > continue fetching past that offset.
> > >
> > > So, from the client's perspective it is not really a problem if new
> error
> > > codes are added to older API versions.
> > >
> > > /Magnus
> > >
> > >
> > > Den tors 6 sep. 2018 kl 09:45 skrev Dongjin Lee <do...@apache.org>:
> > >
> > > > I updated the KIP page
> > > > <
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > >
> > > > following the discussion here. Please take a look when you are free.
> > > > If you have any opinion, don't hesitate to give me a message.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > I just updated the draft implementation[^1], rebasing against the
> > > latest
> > > > > trunk and implementing error routine (i.e., Error code 74 for
> > > > > UnsupportedCompressionTypeException.) Since we decided to disallow
> > all
> > > > > fetch request below version 2.1.0 for the topics specifying
> > ZStandard,
> > > I
> > > > > added an error logic only.
> > > > >
> > > > > Please have a look when you are free.
> > > > >
> > > > > Thanks,
> > > > > Dongjin
> > > > >
> > > > > [^1]: Please check the last commit here:
> > > > > https://github.com/apache/kafka/pull/2267
> > > > >
> > > > > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > > >
> > > > >> Jason,
> > > > >>
> > > > >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
> > > > >>
> > > > >> Best,
> > > > >> Dongjin
> > > > >>
> > > > >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <
> jason@confluent.io
> > >
> > > > >> wrote:
> > > > >>
> > > > >>> Hey Dongjin,
> > > > >>>
> > > > >>> Yeah that's right. For what it's worth, librdkafka also appears
> to
> > > > handle
> > > > >>> unexpected error codes. I expect that most client implementations
> > > would
> > > > >>> either pass through the raw type or convert to an enum using
> > > something
> > > > >>> like
> > > > >>> what the java client does. Since we're expecting the client to
> fail
> > > > >>> anyway,
> > > > >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE
> > error
> > > > >>> code.
> > > > >>>
> > > > >>> -Jason
> > > > >>>
> > > > >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <dongjin@apache.org
> >
> > > > wrote:
> > > > >>>
> > > > >>> > Jason and Ismael,
> > > > >>> >
> > > > >>> > It seems like the only thing we need to regard if we define a
> new
> > > > error
> > > > >>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the
> > > implementation
> > > > >>> of
> > > > >>> > the other language clients, right? At least, this strategy
> causes
> > > any
> > > > >>> > problem for Java client. Do I understand correctly?
> > > > >>> >
> > > > >>> > Thanks,
> > > > >>> > Dongjin
> > > > >>> >
> > > > >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <
> dongjin@apache.org>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > Jason,
> > > > >>> > >
> > > > >>> > > > I think we would only use this error code when we /know/
> that
> > > > zstd
> > > > >>> was
> > > > >>> > > in use and the client doesn't support it? This is true if
> > either
> > > 1)
> > > > >>> the
> > > > >>> > > message needs down-conversion and we encounter a zstd
> > compressed
> > > > >>> message,
> > > > >>> > > or 2) if the topic is explicitly configured to use zstd.
> > > > >>> > >
> > > > >>> > > Yes, that's right. And as you know, case 1 includes 1.a) old clients
> > > > >>> > > requesting v0/v1 records, or 1.b) implicit zstd, i.e., a compression
> > > > >>> > > type of "producer" with zstd-compressed data.
> > > > >>> > >
> > > > >>> > > > However, if the compression type is set to "producer," then
> > the
> > > > >>> fetched
> > > > >>> > > data may or may not be compressed with zstd. In this case, we
> > > > return
> > > > >>> the
> > > > >>> > > data to the client and expect it to fail parsing. Is that
> > > correct?
> > > > >>> > >
> > > > >>> > > Exactly.
> > > > >>> > >
> > > > >>> > > Following your message, I reviewed the implementation of
> > > > >>> > > `KafkaApis#handleFetchRequest,` which handles the fetch
> > request.
> > > > And
> > > > >>> > found
> > > > >>> > > that the information we can use is like the following:
> > > > >>> > >
> > > > >>> > > 1. Client's fetch request version. (`versionId` variable)
> > > > >>> > > 2. Log's compression type. (`logConfig` variable)
> > > > >>> > >
> > > > >>> > > We can't detect the actual compression type of the data,
> unless
> > > we
> > > > >>> > inspect
> > > > >>> > > the `RecordBatch` included in the `Records` instance (i.e.,
> > > > >>> > > `unconvertedRecords` variable.) Since that would add a performance
> > > > >>> > > cost, it is not an option for us - in short, we can't be sure
> > > > >>> > > whether a given chunk of data is compressed with zstd or not.
> > > > >>> > >
> > > > >>> > > So, conclusion: we can return an error in the case of 1.a
> and 2
> > > > >>> easily,
> > > > >>> > > with the information above. In the case 1.b (implicit zstd),
> we
> > > can
> > > > >>> just
> > > > >>> > > return the data without doing anything special and expect the
> > > > >>> > > client to fail parsing.
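> > > > >>> > >
> > > > >>> > > To illustrate why the implicit case is expensive: knowing the real
> > > > >>> > > codec would mean walking the batches, roughly like this (a sketch with
> > > > >>> > > stand-in types, not the actual Records/RecordBatch API):
> > > > >>> > >
> > > > >>> > > ```
> > > > >>> > > import java.util.List;
> > > > >>> > >
> > > > >>> > > public class CompressionProbe {
> > > > >>> > >
> > > > >>> > >     // Stand-in for the real RecordBatch abstraction.
> > > > >>> > >     interface RecordBatch {
> > > > >>> > >         String compressionType();
> > > > >>> > >     }
> > > > >>> > >
> > > > >>> > >     // Returns true if any batch in the fetched data is zstd-compressed.
> > > > >>> > >     // This per-batch walk is exactly the inspection we want to avoid
> > > > >>> > >     // on the broker's fetch path.
> > > > >>> > >     static boolean containsZstd(List<RecordBatch> batches) {
> > > > >>> > >         for (RecordBatch batch : batches) {
> > > > >>> > >             if ("zstd".equals(batch.compressionType()))
> > > > >>> > >                 return true;
> > > > >>> > >         }
> > > > >>> > >         return false;
> > > > >>> > >     }
> > > > >>> > >
> > > > >>> > >     public static void main(String[] args) {
> > > > >>> > >         List<RecordBatch> batches = List.of(() -> "lz4", () -> "zstd");
> > > > >>> > >         System.out.println(containsZstd(batches));  // true
> > > > >>> > >     }
> > > > >>> > > }
> > > > >>> > > ```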
> > > > >>> > >
> > > > >>> > > Thanks,
> > > > >>> > > Dongjin
> > > > >>> > >
> > > > >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <
> > ismaelj@gmail.com>
> > > > >>> wrote:
> > > > >>> > >
> > > > >>> > >> Jason, that's an interesting point regarding the Java
> client.
> > Do
> > > > we
> > > > >>> know
> > > > >>> > >> what clients in other languages do in these cases?
> > > > >>> > >>
> > > > >>> > >> Ismael
> > > > >>> > >>
> > > > >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <
> > jason@confluent.io
> > > >
> > > > >>> wrote:
> > > > >>> > >>
> > > > >>> > >> > Hi Dongjin,
> > > > >>> > >> >
> > > > >>> > >> > One of the complications is that old versions of the API
> > will
> > > > not
> > > > >>> > >> expect a
> > > > >>> > >> > new error code. However, since we expect this to be a
> fatal
> > > > error
> > > > >>> > anyway
> > > > >>> > >> > for old clients, it may still be more useful to return the
> > > > correct
> > > > >>> > error
> > > > >>> > >> > code. For example, the Kafka clients use the following
> code
> > to
> > > > >>> convert
> > > > >>> > >> the
> > > > >>> > >> > error code:
> > > > >>> > >> >
> > > > >>> > >> >     public static Errors forCode(short code) {
> > > > >>> > >> >         Errors error = codeToError.get(code);
> > > > >>> > >> >         if (error != null) {
> > > > >>> > >> >             return error;
> > > > >>> > >> >         } else {
> > > > >>> > >> >             log.warn("Unexpected error code: {}.", code);
> > > > >>> > >> >             return UNKNOWN_SERVER_ERROR;
> > > > >>> > >> >         }
> > > > >>> > >> >     }
> > > > >>> > >> >
> > > > >>> > >> > If we return an unsupported error code, it will be
> converted
> > > to
> > > > an
> > > > >>> > >> UNKNOWN
> > > > >>> > >> > error, but at least we will get the message in the log
> with
> > > the
> > > > >>> > correct
> > > > >>> > >> > code. That seems preferable to returning a misleading
> error
> > > > code.
> > > > >>> So I
> > > > >>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE
> > > error
> > > > >>> even
> > > > >>> > for
> > > > >>> > >> > older versions.
> > > > >>> > >> >
> > > > >>> > >> > Also, one question just to check my understanding. I think
> > we
> > > > >>> would
> > > > >>> > only
> > > > >>> > >> > use this error code when we /know/ that zstd was in use
> and
> > > the
> > > > >>> client
> > > > >>> > >> > doesn't support it? This is true if either 1) the message
> > > needs
> > > > >>> > >> > down-conversion and we encounter a zstd compressed
> message,
> > or
> > > > 2)
> > > > >>> if
> > > > >>> > the
> > > > >>> > >> > topic is explicitly configured to use zstd. However, if
> the
> > > > >>> > compression
> > > > >>> > >> > type is set to "producer," then the fetched data may or
> may
> > > not
> > > > be
> > > > >>> > >> > compressed with zstd. In this case, we return the data to
> > the
> > > > >>> client
> > > > >>> > and
> > > > >>> > >> > expect it to fail parsing. Is that correct?
> > > > >>> > >> >
> > > > >>> > >> > Thanks,
> > > > >>> > >> > Jason
> > > > >>> > >> >
> > > > >>> > >> >
> > > > >>> > >> >
> > > > >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <
> > > > dongjin@apache.org>
> > > > >>> > >> wrote:
> > > > >>> > >> >
> > > > >>> > >> > > Ismael, Jason and all,
> > > > >>> > >> > >
> > > > >>> > >> > > I rewrote the backward compatibility strategy & its
> > > > alternatives
> > > > >>> > like
> > > > >>> > >> > > following, based on Ismael & Jason's comments. Since it
> is
> > > not
> > > > >>> > >> updated to
> > > > >>> > >> > > the wiki yet, don't hesitate to give me a message if you
> > > have
> > > > >>> any
> > > > >>> > >> opinion
> > > > >>> > >> > > on it.
> > > > >>> > >> > >
> > > > >>> > >> > > ```
> > > > >>> > >> > > *Backward Compatibility*
> > > > >>> > >> > >
> > > > >>> > >> > > We need to establish some backward-compatibility
> strategy
> > > for
> > > > >>> the
> > > > >>> > >> case an
> > > > >>> > >> > > old client subscribes to a topic using ZStandard implicitly
> > > > (i.e.,
> > > > >>> > >> > > 'compression.type' configuration of given topic is
> > > 'producer'
> > > > >>> and
> > > > >>> > the
> > > > >>> > >> > > producer compressed the records with ZStandard). We have
> > the
> > > > >>> > following
> > > > >>> > >> > > options for this situation:
> > > > >>> > >> > >
> > > > >>> > >> > > *A. Support ZStandard for the old clients which can
> > > understand
> > > > >>> v0, v1
> > > > >>> > >> > > messages only.*
> > > > >>> > >> > >
> > > > >>> > >> > > This strategy necessarily requires the down-conversion
> of
> > v2
> > > > >>> message
> > > > >>> > >> > > compressed with Zstandard into v0 or v1 messages, which
> > > means
> > > > a
> > > > >>> > >> > > considerable performance degradation. So we rejected
> this
> > > > >>> strategy.
> > > > >>> > >> > >
> > > > >>> > >> > > *B. Bump the API version and support only v2-available
> > > > clients*
> > > > >>> > >> > >
> > > > >>> > >> > > With this approach, we can message the old clients that
> > they
> > > > >>> are old
> > > > >>> > >> and
> > > > >>> > >> > > should be upgraded. However, there are still several
> > options
> > > > >>> for the
> > > > >>> > >> > Error
> > > > >>> > >> > > code.
> > > > >>> > >> > >
> > > > >>> > >> > > *B.1. INVALID_REQUEST (42)*
> > > > >>> > >> > >
> > > > >>> > >> > > This option gives the client so little information; the
> > user
> > > > >>> can be
> > > > >>> > >> > > confused about why the client that worked correctly in the
> > > > >>> > >> > > past suddenly encounters a problem. So we rejected this strategy.
> > > > >>> > >> > >
> > > > >>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> > > > >>> > >> > >
> > > > >>> > >> > > This option gives inaccurate information; the user can
> be
> > > > >>> surprised
> > > > >>> > >> and
> > > > >>> > >> > > mistakenly conclude that the log files are broken in some way.
> > So
> > > we
> > > > >>> > >> rejected
> > > > >>> > >> > > this strategy.
> > > > >>> > >> > >
> > > > >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > > > >>> > >> > >
> > > > >>> > >> > > The advantage of this approach is we don't need to
> define
> > a
> > > > new
> > > > >>> > error
> > > > >>> > >> > code;
> > > > >>> > >> > > we can reuse it and that's all.
> > > > >>> > >> > >
> > > > >>> > >> > > The disadvantage of this approach is that it is also a
> > > little
> > > > >>> bit
> > > > >>> > >> vague;
> > > > >>> > >> > > This error code was defined as part of the work for KIP-98[^1]
> > > > >>> > >> > > and is now returned for transaction errors.
> > > > >>> > >> > >
> > > > >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > > > >>> > >> > >
> > > > >>> > >> > > The advantage of this approach is that it is clear and
> > > > provides
> > > > >>> an
> > > > >>> > >> exact
> > > > >>> > >> > > description. The disadvantage is we need to add a new
> > error
> > > > >>> code.
> > > > >>> > >> > > ```
> > > > >>> > >> > >
> > > > >>> > >> > > *It seems like what we need to choose is now clear:
> > > > >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or
> > > > >>> UNSUPPORTED_COMPRESSION_TYPE
> > > > >>> > >> > > (B.4).*
> > > > >>> > >> > > The first one doesn't need a new error code but the
> > > latter
> > > > is
> > > > >>> > more
> > > > >>> > >> > > explicit. Which one do you prefer? Since all of you have
> > > much
> > > > >>> more
> > > > >>> > >> > > experience and knowledge than me, I will follow your
> > > decision.
> > > > >>> The
> > > > >>> > >> wiki
> > > > >>> > >> > > page will be updated following the decision also.
> > > > >>> > >> > >
> > > > >>> > >> > > Best,
> > > > >>> > >> > > Dongjin
> > > > >>> > >> > >
> > > > >>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> > > > >>> > >> > >
> > > > >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <
> > > > ismaelj@gmail.com>
> > > > >>> > >> wrote:
> > > > >>> > >> > >
> > > > >>> > >> > > > Sounds reasonable to me.
> > > > >>> > >> > > >
> > > > >>> > >> > > > Ismael
> > > > >>> > >> > > >
> > > > >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
> > > > >>> jason@confluent.io>
> > > > >>> > >> > wrote:
> > > > >>> > >> > > >
> > > > >>> > >> > > > > Hey Ismael,
> > > > >>> > >> > > > >
> > > > >>> > >> > > > > Your summary looks good to me. I think it might also
> > be
> > > a
> > > > >>> good
> > > > >>> > >> idea
> > > > >>> > >> > to
> > > > >>> > >> > > > add
> > > > >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go
> > > along
> > > > >>> with
> > > > >>> > the
> > > > >>> > >> > > > version
> > > > >>> > >> > > > > bumps. We won't be able to use it for old api
> versions
> > > > >>> since the
> > > > >>> > >> > > clients
> > > > >>> > >> > > > > will not understand it, but we can use it going
> > forward
> > > so
> > > > >>> that
> > > > >>> > >> we're
> > > > >>> > >> > > not
> > > > >>> > >> > > > > stuck in a similar situation with a new message
> format
> > > and
> > > > >>> a new
> > > > >>> > >> > codec
> > > > >>> > >> > > to
> > > > >>> > >> > > > > support. Another option is to use
> > > UNSUPPORTED_FOR_MESSAGE_
> > > > >>> > FORMAT,
> > > > >>> > >> but
> > > > >>> > >> > > it
> > > > >>> > >> > > > is
> > > > >>> > >> > > > > not as explicit.
> > > > >>> > >> > > > >
> > > > >>> > >> > > > > -Jason
> > > > >>> > >> > > > >
> > > > >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> > > > >>> ismael@juma.me.uk
> > > > >>> > >
> > > > >>> > >> > > wrote:
> > > > >>> > >> > > > >
> > > > >>> > >> > > > > > Hi Dongjin and Jason,
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > I would agree. My summary:
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > 1. Support zstd with message format 2 only.
> > > > >>> > >> > > > > > 2. Bump produce and fetch request versions.
> > > > >>> > >> > > > > > 3. Provide broker errors whenever possible based
> on
> > > the
> > > > >>> > request
> > > > >>> > >> > > version
> > > > >>> > >> > > > > and
> > > > >>> > >> > > > > > rely on clients for the cases where the broker
> can't
> > > > >>> validate
> > > > >>> > >> > > > efficiently
> > > > >>> > >> > > > > > (example message format 2 consumer that supports
> the
> > > > >>> latest
> > > > >>> > >> fetch
> > > > >>> > >> > > > version
> > > > >>> > >> > > > > > but doesn't support zstd).
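> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > (For what it's worth, once this lands, enabling it from a
> > > > >>> > >> > > > > > producer should look no different from the existing codecs -
> > > > >>> > >> > > > > > e.g., something like the sketch below, assuming "zstd" simply
> > > > >>> > >> > > > > > becomes a valid value of compression.type:)
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > ```
> > > > >>> > >> > > > > > import java.util.Properties;
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > public class ZstdProducerConfig {
> > > > >>> > >> > > > > >     public static void main(String[] args) {
> > > > >>> > >> > > > > >         Properties props = new Properties();
> > > > >>> > >> > > > > >         props.put("bootstrap.servers", "localhost:9092");
> > > > >>> > >> > > > > >         // "zstd" as a compression.type value is what this KIP
> > > > >>> > >> > > > > >         // adds; today only none/gzip/snappy/lz4 are accepted.
> > > > >>> > >> > > > > >         props.put("compression.type", "zstd");
> > > > >>> > >> > > > > >         // new KafkaProducer<>(props) would then compress batches
> > > > >>> > >> > > > > >         // with zstd, given a broker that supports the bumped
> > > > >>> > >> > > > > >         // produce version.
> > > > >>> > >> > > > > >         System.out.println(props);
> > > > >>> > >> > > > > >     }
> > > > >>> > >> > > > > > }
> > > > >>> > >> > > > > > ```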
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > If there's general agreement on this, I suggest we
> > > > update
> > > > >>> the
> > > > >>> > >> KIP
> > > > >>> > >> > to
> > > > >>> > >> > > > > state
> > > > >>> > >> > > > > > the proposal and to move the rejected options to
> its
> > > own
> > > > >>> > >> section.
> > > > >>> > >> > And
> > > > >>> > >> > > > > then
> > > > >>> > >> > > > > > start a vote!
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > Ismael
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> > > > >>> > >> > jason@confluent.io>
> > > > >>> > >> > > > > > wrote:
> > > > >>> > >> > > > > >
> > > > >>> > >> > > > > > > Hi Dongjin,
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > Yes, that's a good summary. For clients which
> > > support
> > > > >>> v2,
> > > > >>> > the
> > > > >>> > >> > > client
> > > > >>> > >> > > > > can
> > > > >>> > >> > > > > > > parse the message format and hopefully raise a
> > > useful
> > > > >>> error
> > > > >>> > >> > message
> > > > >>> > >> > > > > > > indicating the unsupported compression type. For
> > > older
> > > > >>> > >> clients,
> > > > >>> > >> > our
> > > > >>> > >> > > > > > options
> > > > >>> > >> > > > > > > are probably (1) to down-convert to the old
> format
> > > > >>> using no
> > > > >>> > >> > > > compression
> > > > >>> > >> > > > > > > type, or (2) to return an error code. I'm
> leaning
> > > > >>> toward the
> > > > >>> > >> > latter
> > > > >>> > >> > > > as
> > > > >>> > >> > > > > > the
> > > > >>> > >> > > > > > > simpler solution, but the challenge is finding a
> > > good
> > > > >>> error
> > > > >>> > >> code.
> > > > >>> > >> > > Two
> > > > >>> > >> > > > > > > possibilities might be INVALID_REQUEST or
> > > > >>> CORRUPT_MESSAGE.
> > > > >>> > The
> > > > >>> > >> > > > downside
> > > > >>> > >> > > > > > is
> > > > >>> > >> > > > > > > that old clients probably won't get a helpful
> > > message.
> > > > >>> > >> However,
> > > > >>> > >> > at
> > > > >>> > >> > > > > least
> > > > >>> > >> > > > > > > the behavior will be consistent in the sense
> that
> > > all
> > > > >>> > clients
> > > > >>> > >> > will
> > > > >>> > >> > > > fail
> > > > >>> > >> > > > > > if
> > > > >>> > >> > > > > > > they do not support zstandard.
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > What do you think?
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > Thanks,
> > > > >>> > >> > > > > > > Jason
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> > > > >>> > >> dongjin@apache.org
> > > > >>> > >> > >
> > > > >>> > >> > > > > wrote:
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting
> > logic
> > > > >>> > following
> > > > >>> > >> > your
> > > > >>> > >> > > > > > > > explanation.[^1] You mean the following
> > routines,
> > > > >>> right?
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > -
> > > > >>> > >> > > > > > > >
> > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > > >>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > > >>> > >> > > > > > > > -
> > > > >>> > >> > > > > > > >
> > > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > >>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
> > > > >>> > >> > > LazyDownConversionRecords.
> > > > >>> > >> > > > > > > > java#L165
> > > > >>> > >> > > > > > > > -
> > > > >>> > >> > > > > > > >
> > > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > >>> > >> > > > > > > >
> > > > >>> > >>
> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > It seems like your stance is like following:
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > 1. In principle, Kafka does not change the
> > > > compression
> > > > >>> > codec
> > > > >>> > >> > when
> > > > >>> > >> > > > > > > > down-converting, since it requires inspecting
> > the
> > > > >>> fetched
> > > > >>> > >> data,
> > > > >>> > >> > > > which
> > > > >>> > >> > > > > > is
> > > > >>> > >> > > > > > > > expensive.
> > > > >>> > >> > > > > > > > 2. However, there are some cases the fetched
> > data
> > > is
> > > > >>> > >> inspected
> > > > >>> > >> > > > > anyway.
> > > > >>> > >> > > > > > In
> > > > >>> > >> > > > > > > > this case, we can provide compression
> conversion
> > > > from
> > > > >>> > >> Zstandard
> > > > >>> > >> > > to
> > > > >>> > >> > > > > > > > classical ones[^2].
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > And from what I understand, the cases where
> the
> > > > client
> > > > >>> > >> without
> > > > >>> > >> > > > > > ZStandard
> > > > >>> > >> > > > > > > > support receives ZStandard compressed records
> > can
> > > be
> > > > >>> > >> organized
> > > > >>> > >> > > into
> > > > >>> > >> > > > > two
> > > > >>> > >> > > > > > > > cases:
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > a. The 'compression.type' configuration of
> given
> > > > >>> topic is
> > > > >>> > >> > > > 'producer'
> > > > >>> > >> > > > > > and
> > > > >>> > >> > > > > > > > the producer compressed the records with
> > > ZStandard.
> > > > >>> (that
> > > > >>> > >> is,
> > > > >>> > >> > > using
> > > > >>> > >> > > > > > > > ZStandard implicitly.)
> > > > >>> > >> > > > > > > > b.  The 'compression.type' configuration of
> > given
> > > > >>> topic is
> > > > >>> > >> > > 'zstd';
> > > > >>> > >> > > > > that
> > > > >>> > >> > > > > > > is,
> > > > >>> > >> > > > > > > > using ZStandard explicitly.
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > As you stated, we don't have to handle the
> case
> > b
> > > > >>> > specially.
> > > > >>> > >> > So,
> > > > >>> > >> > > It
> > > > >>> > >> > > > > > seems
> > > > >>> > >> > > > > > > > like we can narrow the focus of the problem by
> > > > joining
> > > > >>> > case
> > > > >>> > >> 1
> > > > >>> > >> > and
> > > > >>> > >> > > > > case
> > > > >>> > >> > > > > > b
> > > > >>> > >> > > > > > > > like the following:
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > > Given the topic with 'producer' as its
> > > > >>> > 'compression.type'
> > > > >>> > >> > > > > > > configuration,
> > > > >>> > >> > > > > > > > ZStandard compressed records and old client
> > > without
> > > > >>> > >> ZStandard,
> > > > >>> > >> > is
> > > > >>> > >> > > > > there
> > > > >>> > >> > > > > > > any
> > > > >>> > >> > > > > > > > case we need to inspect the records and can
> > change
> > > > the
> > > > >>> > >> > > compression
> > > > >>> > >> > > > > > type?
> > > > >>> > >> > > > > > > If
> > > > >>> > >> > > > > > > > so, can we provide compression type
> converting?
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > Do I understand correctly?
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > Best,
> > > > >>> > >> > > > > > > > Dongjin
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little
> bit
> > > > >>> > >> > misunderstanding
> > > > >>> > >> > > > how
> > > > >>> > >> > > > > > API
> > > > >>> > >> > > > > > > > version works, after reviewing the downconvert
> > > logic
> > > > >>> & the
> > > > >>> > >> > > protocol
> > > > >>> > >> > > > > > > > documentation <
> > https://kafka.apache.org/protocol
> > > >.
> > > > >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason
> Gustafson
> > <
> > > > >>> > >> > > > jason@confluent.io>
> > > > >>> > >> > > > > > > > wrote:
> > > > >>> > >> > > > > > > >
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> > fail
> > > > >>> with the
> > > > >>> > >> API
> > > > >>> > >> > > > > version,
> > > > >>> > >> > > > > > > so
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > > anyway.
> > > > >>> Isn't
> > > > >>> > >> it?
> > > > >>> > >> > > So, I
> > > > >>> > >> > > > > > think
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I
> am
> > > not
> > > > >>> > >> familiar
> > > > >>> > >> > > with
> > > > >>> > >> > > > > > > > > down-convert
> > > > >>> > >> > > > > > > > > > logic.)
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > Currently the broker down-converts
> > automatically
> > > > >>> when it
> > > > >>> > >> > > receives
> > > > >>> > >> > > > > an
> > > > >>> > >> > > > > > > old
> > > > >>> > >> > > > > > > > > version of the fetch request (a version
> which
> > is
> > > > >>> known
> > > > >>> > to
> > > > >>> > >> > > predate
> > > > >>> > >> > > > > the
> > > > >>> > >> > > > > > > > > message format in use). Typically when
> > > > >>> down-converting
> > > > >>> > the
> > > > >>> > >> > > > message
> > > > >>> > >> > > > > > > > format,
> > > > >>> > >> > > > > > > > > we use the same compression type, but there
> is
> > > not
> > > > >>> much
> > > > >>> > >> point
> > > > >>> > >> > > in
> > > > >>> > >> > > > > > doing
> > > > >>> > >> > > > > > > so
> > > > >>> > >> > > > > > > > > when we know the client doesn't support it.
> So
> > > if
> > > > >>> > >> zstandard
> > > > >>> > >> > is
> > > > >>> > >> > > in
> > > > >>> > >> > > > > > use,
> > > > >>> > >> > > > > > > > and
> > > > >>> > >> > > > > > > > > we have to down-convert anyway, then we can
> > > choose
> > > > >>> to
> > > > >>> > use
> > > > >>> > >> a
> > > > >>> > >> > > > > different
> > > > >>> > >> > > > > > > > > compression type or no compression type.
> > > > >>> > >> > > > > > > > >
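> > > > >>> > >> > > > > > > > > Purely to illustrate that alternative (with made-up
> > > > >>> > >> > > > > > > > > stand-in types, not the actual down-conversion code),
> > > > >>> > >> > > > > > > > > the substitution would amount to something like:
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > ```
> > > > >>> > >> > > > > > > > > import java.util.ArrayList;
> > > > >>> > >> > > > > > > > > import java.util.List;
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > public class DownConvertSketch {
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > >     // Minimal stand-in for a batch: its codec plus payload.
> > > > >>> > >> > > > > > > > >     record Batch(String codec, List<String> records) {}
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > >     // Down-convert for an old client: keep codecs the client
> > > > >>> > >> > > > > > > > >     // understands, but rewrite zstd batches with a fallback
> > > > >>> > >> > > > > > > > >     // codec (or "none").
> > > > >>> > >> > > > > > > > >     static List<Batch> downConvert(List<Batch> in, String fallback) {
> > > > >>> > >> > > > > > > > >         List<Batch> out = new ArrayList<>();
> > > > >>> > >> > > > > > > > >         for (Batch b : in)
> > > > >>> > >> > > > > > > > >             out.add("zstd".equals(b.codec())
> > > > >>> > >> > > > > > > > >                     ? new Batch(fallback, b.records())  // re-compress
> > > > >>> > >> > > > > > > > >                     : b);                               // pass through
> > > > >>> > >> > > > > > > > >         return out;
> > > > >>> > >> > > > > > > > >     }
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > >     public static void main(String[] args) {
> > > > >>> > >> > > > > > > > >         List<Batch> fetched = List.of(
> > > > >>> > >> > > > > > > > >                 new Batch("zstd", List.of("a", "b")),
> > > > >>> > >> > > > > > > > >                 new Batch("lz4", List.of("c")));
> > > > >>> > >> > > > > > > > >         System.out.println(downConvert(fetched, "none"));
> > > > >>> > >> > > > > > > > >     }
> > > > >>> > >> > > > > > > > > }
> > > > >>> > >> > > > > > > > > ```
> > > > >>> > >> > > > > > > > >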
> > > > >>> > >> > > > > > > > > From my perspective, there is no significant
> > > > >>> downside to
> > > > >>> > >> > > bumping
> > > > >>> > >> > > > > the
> > > > >>> > >> > > > > > > > > protocol version and it has several
> potential
> > > > >>> benefits.
> > > > >>> > >> > Version
> > > > >>> > >> > > > > bumps
> > > > >>> > >> > > > > > > are
> > > > >>> > >> > > > > > > > > cheap. The main question mark in my mind is
> > > about
> > > > >>> > >> > > > down-conversion.
> > > > >>> > >> > > > > > > > Figuring
> > > > >>> > >> > > > > > > > > out whether down-conversion is needed is
> hard
> > > > >>> generally
> > > > >>> > >> > without
> > > > >>> > >> > > > > > > > inspecting
> > > > >>> > >> > > > > > > > > the fetched data, which is expensive. I
> think
> > we
> > > > >>> agree
> > > > >>> > in
> > > > >>> > >> > > > principle
> > > > >>> > >> > > > > > > that
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > do not want to have to pay this cost
> generally
> > > and
> > > > >>> > prefer
> > > > >>> > >> the
> > > > >>> > >> > > > > clients
> > > > >>> > >> > > > > > > to
> > > > >>> > >> > > > > > > > > fail when they see an unhandled compression
> > > type.
> > > > >>> The
> > > > >>> > >> point I
> > > > >>> > >> > > was
> > > > >>> > >> > > > > > > making
> > > > >>> > >> > > > > > > > is
> > > > >>> > >> > > > > > > > > that there are some cases where we are
> either
> > > > >>> inspecting
> > > > >>> > >> the
> > > > >>> > >> > > data
> > > > >>> > >> > > > > > > anyway
> > > > >>> > >> > > > > > > > > (because we have to down-convert the message
> > > > >>> format), or
> > > > >>> > >> we
> > > > >>> > >> > > have
> > > > >>> > >> > > > an
> > > > >>> > >> > > > > > > easy
> > > > >>> > >> > > > > > > > > way to tell whether zstandard is in use (the
> > > topic
> > > > >>> has
> > > > >>> > it
> > > > >>> > >> > > > > configured
> > > > >>> > >> > > > > > > > > explicitly). In the latter case, we don't
> have
> > > to
> > > > >>> handle
> > > > >>> > >> it
> > > > >>> > >> > > > > > specially.
> > > > >>> > >> > > > > > > > But
> > > > >>> > >> > > > > > > > > we do have to decide how we will handle
> > > > >>> down-conversion
> > > > >>> > to
> > > > >>> > >> > > older
> > > > >>> > >> > > > > > > formats.
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > -Jason
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin
> Lee <
> > > > >>> > >> > > dongjin@apache.org
> > > > >>> > >> > > > >
> > > > >>> > >> > > > > > > wrote:
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > > Colin and Jason,
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > Thanks for your opinions. In summarizing,
> > the
> > > > >>> Pros and
> > > > >>> > >> Cons
> > > > >>> > >> > > of
> > > > >>> > >> > > > > > > bumping
> > > > >>> > >> > > > > > > > > > fetch API version are:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > Cons:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > - The Broker can't know whether a given
> > > message
> > > > >>> batch
> > > > >>> > is
> > > > >>> > >> > > > > compressed
> > > > >>> > >> > > > > > > > with
> > > > >>> > >> > > > > > > > > > zstd or not.
> > > > >>> > >> > > > > > > > > > - Need some additional logic for the topic
> > > > >>> explicitly
> > > > >>> > >> > > > configured
> > > > >>> > >> > > > > to
> > > > >>> > >> > > > > > > use
> > > > >>> > >> > > > > > > > > > zstd.
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > Pros:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > - The broker doesn't need to conduct
> > expensive
> > > > >>> > >> > > down-conversion.
> > > > >>> > >> > > > > > > > > > - Can message the users to update their
> > > client.
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > So, opinions for the
> backward-compatibility
> > > > >>> policy by
> > > > >>> > >> far:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin,
> > Jason)
> > > > >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > Here are my additional comments:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > @Colin
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > I greatly appreciate your response. In the
> > > case
> > > > >>> of the
> > > > >>> > >> > > > dictionary
> > > > >>> > >> > > > > > > > > support,
> > > > >>> > >> > > > > > > > > > of course, this issue should be addressed
> > > later
> > > > >>> so we
> > > > >>> > >> don't
> > > > >>> > >> > > > need
> > > > >>> > >> > > > > it
> > > > >>> > >> > > > > > > in
> > > > >>> > >> > > > > > > > > the
> > > > >>> > >> > > > > > > > > > first version. You are right - it is not
> > late
> > > to
> > > > >>> try
> > > > >>> > it
> > > > >>> > >> > after
> > > > >>> > >> > > > > some
> > > > >>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep
> > in
> > > > >>> mind on
> > > > >>> > >> that
> > > > >>> > >> > > > > > potential
> > > > >>> > >> > > > > > > > > > feature.
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > @Jason
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > You wrote,
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > Similarly, if we have to down-convert
> > anyway
> > > > >>> because
> > > > >>> > >> the
> > > > >>> > >> > > > client
> > > > >>> > >> > > > > > > does
> > > > >>> > >> > > > > > > > > not
> > > > >>> > >> > > > > > > > > > understand the message format, then we
> could
> > > > also
> > > > >>> use
> > > > >>> > a
> > > > >>> > >> > > > different
> > > > >>> > >> > > > > > > > > > compression type.
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> > fail
> > > > >>> with the
> > > > >>> > >> API
> > > > >>> > >> > > > > version,
> > > > >>> > >> > > > > > > so
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > > anyway.
> > > > >>> Isn't
> > > > >>> > >> it?
> > > > >>> > >> > > So, I
> > > > >>> > >> > > > > > think
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I
> am
> > > not
> > > > >>> > >> familiar
> > > > >>> > >> > > with
> > > > >>> > >> > > > > > > > > down-convert
> > > > >>> > >> > > > > > > > > > logic.)
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > Please give more opinions. Thanks!
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > - Dongjin
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason
> > > Gustafson <
> > > > >>> > >> > > > > jason@confluent.io
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > > > wrote:
> > > > >>> > >> > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > Hey Colin,
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > The problem for the fetch API is that
> the
> > > > broker
> > > > >>> > does
> > > > >>> > >> not
> > > > >>> > >> > > > > > generally
> > > > >>> > >> > > > > > > > > know
> > > > >>> > >> > > > > > > > > > if
> > > > >>> > >> > > > > > > > > > > a batch was compressed with zstd unless
> it
> > > > >>> parses
> > > > >>> > it.
> > > > >>> > >> I
> > > > >>> > >> > > think
> > > > >>> > >> > > > > the
> > > > >>> > >> > > > > > > > goal
> > > > >>> > >> > > > > > > > > > here
> > > > >>> > >> > > > > > > > > > > is to avoid the expensive
> down-conversion
> > > that
> > > > >>> is
> > > > >>> > >> needed
> > > > >>> > >> > to
> > > > >>> > >> > > > > > ensure
> > > > >>> > >> > > > > > > > > > > compatibility because it is only
> necessary
> > > if
> > > > >>> zstd
> > > > >>> > is
> > > > >>> > >> > > > actually
> > > > >>> > >> > > > > in
> > > > >>> > >> > > > > > > > use.
> > > > >>> > >> > > > > > > > > > But
> > > > >>> > >> > > > > > > > > > > as long as old clients can parse the
> > message
> > > > >>> format,
> > > > >>> > >> they
> > > > >>> > >> > > > > should
> > > > >>> > >> > > > > > > get
> > > > >>> > >> > > > > > > > a
> > > > >>> > >> > > > > > > > > > > reasonable error if they see an
> > unsupported
> > > > >>> > >> compression
> > > > >>> > >> > > type
> > > > >>> > >> > > > in
> > > > >>> > >> > > > > > the
> > > > >>> > >> > > > > > > > > > > attributes. Basically the onus is on
> users
> > > to
> > > > >>> ensure
> > > > >>> > >> that
> > > > >>> > >> > > > their
> > > > >>> > >> > > > > > > > > consumers
> > > > >>> > >> > > > > > > > > > > have been updated prior to using zstd.
> It
> > > > seems
> > > > >>> > like a
> > > > >>> > >> > > > > reasonable
> > > > >>> > >> > > > > > > > > > tradeoff
> > > > >>> > >> > > > > > > > > > > to me. There are a couple cases that
> might
> > > be
> > > > >>> worth
> > > > >>> > >> > > thinking
> > > > >>> > >> > > > > > > through:
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured
> to
> > > use
> > > > >>> zstd,
> > > > >>> > >> then
> > > > >>> > >> > we
> > > > >>> > >> > > > > don't
> > > > >>> > >> > > > > > > > need
> > > > >>> > >> > > > > > > > > to
> > > > >>> > >> > > > > > > > > > > check the fetched data for the
> compression
> > > > type
> > > > >>> to
> > > > >>> > >> know
> > > > >>> > >> > if
> > > > >>> > >> > > we
> > > > >>> > >> > > > > > need
> > > > >>> > >> > > > > > > > > > > down-conversion. If we did bump the
> Fetch
> > > API
> > > > >>> > version,
> > > > >>> > >> > then
> > > > >>> > >> > > > we
> > > > >>> > >> > > > > > > could
> > > > >>> > >> > > > > > > > > > handle
> > > > >>> > >> > > > > > > > > > > this case by either down-converting
> using
> > a
> > > > >>> > different
> > > > >>> > >> > > > > compression
> > > > >>> > >> > > > > > > > type
> > > > >>> > >> > > > > > > > > or
> > > > >>> > >> > > > > > > > > > > returning an error.
> > > > >>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert
> > > > anyway
> > > > >>> > >> because
> > > > >>> > >> > the
> > > > >>> > >> > > > > > client
> > > > >>> > >> > > > > > > > does
> > > > >>> > >> > > > > > > > > > not
> > > > >>> > >> > > > > > > > > > > understand the message format, then we
> > could
> > > > >>> also
> > > > >>> > use
> > > > >>> > >> a
> > > > >>> > >> > > > > different
> > > > >>> > >> > > > > > > > > > > compression type.
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > For the produce API, I think it's
> > reasonable
> > > > to
> > > > >>> bump
> > > > >>> > >> the
> > > > >>> > >> > > api
> > > > >>> > >> > > > > > > version.
> > > > >>> > >> > > > > > > > > > This
> > > > >>> > >> > > > > > > > > > > can be used by clients to check whether
> a
> > > > broker
> > > > >>> > >> supports
> > > > >>> > >> > > > zstd.
> > > > >>> > >> > > > > > For
> > > > >>> > >> > > > > > > > > > > example, we might support a list of
> > > preferred
> > > > >>> > >> compression
> > > > >>> > >> > > > types
> > > > >>> > >> > > > > > in
> > > > >>> > >> > > > > > > > the
> > > > >>> > >> > > > > > > > > > > producer and we could use the broker to
> > > detect
> > > > >>> which
> > > > >>> > >> > > version
> > > > >>> > >> > > > to
> > > > >>> > >> > > > > > > use.
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > -Jason
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin
> > > McCabe <
> > > > >>> > >> > > > > cmccabe@apache.org
> > > > >>> > >> > > > > > >
> > > > >>> > >> > > > > > > > > wrote:
> > > > >>> > >> > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.
> ZStd
> > > is a
> > > > >>> good
> > > > >>> > >> > > > compression
> > > > >>> > >> > > > > > > codec
> > > > >>> > >> > > > > > > > > > and I
> > > > >>> > >> > > > > > > > > > > > hope we can get this support in soon!
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > I would say we can just bump the API
> > > version
> > > > >>> to
> > > > >>> > >> > indicate
> > > > >>> > >> > > > that
> > > > >>> > >> > > > > > > ZStd
> > > > >>> > >> > > > > > > > > > > support
> > > > >>> > >> > > > > > > > > > > > is expected in new clients.  We
> probably
> > > > need
> > > > >>> some
> > > > >>> > >> way
> > > > >>> > >> > of
> > > > >>> > >> > > > > > > > indicating
> > > > >>> > >> > > > > > > > > to
> > > > >>> > >> > > > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > older clients that they can't consume
> > the
> > > > >>> > >> partitions,
> > > > >>> > >> > as
> > > > >>> > >> > > > > well.
> > > > >>> > >> > > > > > > > > Perhaps
> > > > >>> > >> > > > > > > > > > > we
> > > > >>> > >> > > > > > > > > > > > can use the
> > UNSUPPORTED_FOR_MESSAGE_FORMAT
> > > > >>> error?
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > The license thing seems
> straightforward
> > --
> > > > >>> it's
> > > > >>> > >> just a
> > > > >>> > >> > > > matter
> > > > >>> > >> > > > > > of
> > > > >>> > >> > > > > > > > > adding
> > > > >>> > >> > > > > > > > > > > > the text to the right files as per ASF
> > > > >>> guidelines.
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > With regard to the dictionary support,
> > do
> > > we
> > > > >>> > really
> > > > >>> > >> > need
> > > > >>> > >> > > > that
> > > > >>> > >> > > > > > in
> > > > >>> > >> > > > > > > > the
> > > > >>> > >> > > > > > > > > > > first
> > > > >>> > >> > > > > > > > > > > > version?  Hopefully message batches
> are
> > > big
> > > > >>> enough
> > > > >>> > >> that
> > > > >>> > >> > > > this
> > > > >>> > >> > > > > > > isn't
> > > > >>> > >> > > > > > > > > > > needed.
> > > > >>> > >> > > > > > > > > > > > Some benchmarks might help here.
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > best,
> > > > >>> > >> > > > > > > > > > > > Colin
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin
> > Lee
> > > > >>> wrote:
> > > > >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's
> > > reboot
> > > > >>> this
> > > > >>> > >> issue,
> > > > >>> > >> > > > > KIP-110
> > > > >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
> > > > >>> > >> > > confluence/display/KAFKA/KIP-
> > > > >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+
> > > ZStandard+Compression>
> > > > >>> > >> > > > > > > > > > > > > .
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > For newcomers, Here is some summary
> of
> > > the
> > > > >>> > >> history:
> > > > >>> > >> > > > KIP-110
> > > > >>> > >> > > > > > was
> > > > >>> > >> > > > > > > > > > > > originally
> > > > >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but,
> > it
> > > > >>> lacked
> > > > >>> > >> > > benchmark
> > > > >>> > >> > > > > > > results
> > > > >>> > >> > > > > > > > to
> > > > >>> > >> > > > > > > > > > get
> > > > >>> > >> > > > > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > > agreement of the community. Later,
> > Ivan
> > > > >>> Babrou
> > > > >>> > and
> > > > >>> > >> > some
> > > > >>> > >> > > > > other
> > > > >>> > >> > > > > > > > users
> > > > >>> > >> > > > > > > > > > who
> > > > >>> > >> > > > > > > > > > > > > adopted the patch provided their
> > > excellent
> > > > >>> > >> > performance
> > > > >>> > >> > > > > report
> > > > >>> > >> > > > > > > > which
> > > > >>> > >> > > > > > > > > > is
> > > > >>> > >> > > > > > > > > > > > now
> > > > >>> > >> > > > > > > > > > > > > included in the KIP, but it
> postponed
> > > > again
> > > > >>> > >> because
> > > > >>> > >> > of
> > > > >>> > >> > > > the
> > > > >>> > >> > > > > > > > > community
> > > > >>> > >> > > > > > > > > > > was
> > > > >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I
> > now
> > > > >>> reboot
> > > > >>> > >> this
> > > > >>> > >> > > > issue.
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > The following is the current status
> of
> > > the
> > > > >>> > >> feature:
> > > > >>> > >> > You
> > > > >>> > >> > > > can
> > > > >>> > >> > > > > > > check
> > > > >>> > >> > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > > current draft implementation here
> > > > >>> > >> > > > > > > > > > > > > <
> > > > https://github.com/apache/kafka/pull/2267>.
> > > > >>> It
> > > > >>> > >> is
> > > > >>> > >> > > based
> > > > >>> > >> > > > > on
> > > > >>> > >> > > > > > > zstd
> > > > >>> > >> > > > > > > > > > 1.3.5
> > > > >>> > >> > > > > > > > > > > > and
> > > > >>> > >> > > > > > > > > > > > > periodically rebased onto the latest
> > > > >>> trunk[^1].
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > The issues that should be addressed
> is
> > > > like
> > > > >>> > >> > following:
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > To support old consumers, we need to
> > > take
> > > > a
> > > > >>> > >> strategy
> > > > >>> > >> > to
> > > > >>> > >> > > > > > handle
> > > > >>> > >> > > > > > > > the
> > > > >>> > >> > > > > > > > > > old
> > > > >>> > >> > > > > > > > > > > > > consumers. Current candidates are:
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > - Bump API version
> > > > >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old
> clients
> > > > fail.
> > > > >>> > >> > > > > > > > > > > > > - Improve the error messages:
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > *2. Dictionary Support*
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > To support zstd's dictionary feature
> > in
> > > > the
> > > > >>> > future
> > > > >>> > >> > (if
> > > > >>> > >> > > > > > needed),
> > > > >>> > >> > > > > > > > we
> > > > >>> > >> > > > > > > > > > need
> > > > >>> > >> > > > > > > > > > > > to
> > > > >>> > >> > > > > > > > > > > > > sketch how it should be and leave
> some
> > > > room
> > > > >>> for
> > > > >>> > >> it.
> > > > >>> > >> > As
> > > > >>> > >> > > of
> > > > >>> > >> > > > > > now,
> > > > >>> > >> > > > > > > > > there
> > > > >>> > >> > > > > > > > > > > has
> > > > >>> > >> > > > > > > > > > > > > been no discussion on this topic
> yet.
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > *3. License*
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > To use this feature, we need to add
> > > > license
> > > > >>> of
> > > > >>> > >> zstd
> > > > >>> > >> > and
> > > > >>> > >> > > > > > > zstd-jni
> > > > >>> > >> > > > > > > > to
> > > > >>> > >> > > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi
> for
> > > > >>> raising
> > > > >>> > >> this
> > > > >>> > >> > > > issue!)
> > > > >>> > >> > > > > > It
> > > > >>> > >> > > > > > > > > seems
> > > > >>> > >> > > > > > > > > > > like
> > > > >>> > >> > > > > > > > > > > > > what Apache Spark did would be a
> good
> > > > >>> example
> > > > >>> > but
> > > > >>> > >> > there
> > > > >>> > >> > > > has
> > > > >>> > >> > > > > > > been
> > > > >>> > >> > > > > > > > no
> > > > >>> > >> > > > > > > > > > > > > discussion yet.
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > You can find the details of the
> above
> > > > >>> issues in
> > > > >>> > >> the
> > > > >>> > >> > KIP
> > > > >>> > >> > > > > > > document.
> > > > >>> > >> > > > > > > > > > > Please
> > > > >>> > >> > > > > > > > > > > > > have a look when you are free, and
> > give
> > > me
> > > > >>> > >> feedback.
> > > > >>> > >> > > All
> > > > >>> > >> > > > > > kinds
> > > > >>> > >> > > > > > > of
> > > > >>> > >> > > > > > > > > > > > > participating are welcome.
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > Best,
> > > > >>> > >> > > > > > > > > > > > > Dongjin
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
> > > > >>> 6b4fb8152.
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM
> > Dongjin
> > > > >>> Lee <
> > > > >>> > >> > > > > > > dongjin@apache.org
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > > > wrote:
> > > > >>> > >> > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > > Sorry for the late reply.
> > > > >>> > >> > > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > > In short, I could not submit the
> > > updated
> > > > >>> KIP
> > > > >>> > by
> > > > >>> > >> the
> > > > >>> > >> > > > > feature
> > > > >>> > >> > > > > > > > > freeze
> > > > >>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this
> reason,
> > it
> > > > >>> will
> > > > >>> > not
> > > > >>> > >> be
> > > > >>> > >> > > > > included
> > > > >>> > >> > > > > > > in
> > > > >>> > >> > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > 2.0.0
> > > > >>> > >> > > > > > > > > > > > > > release and all discussion for
> this
> > > > issue
> > > > >>> were
> > > > >>> > >> > > > postponed
> > > > >>> > >> > > > > > > after
> > > > >>> > >> > > > > > > > > the
> > > > >>> > >> > > > > > > > > > > > release
> > > > >>> > >> > > > > > > > > > > > > > of 2.0.0.
> > > > >>> > >> > > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > > I have been updating the PR
> > following
> > > > >>> recent
> > > > >>> > >> > updates.
> > > > >>> > >> > > > > Just
> > > > >>> > >> > > > > > > > now, I
> > > > >>> > >> > > > > > > > > > > > rebased
> > > > >>> > >> > > > > > > > > > > > > > it against the latest trunk and
> > > updated
> > > > >>> the
> > > > >>> > zstd
> > > > >>> > >> > > > version
> > > > >>> > >> > > > > > into
> > > > >>> > >> > > > > > > > > > 1.3.5.
> > > > >>> > >> > > > > > > > > > > > If you
> > > > >>> > >> > > > > > > > > > > > > > need some request, don't hesitate
> to
> > > > >>> notify
> > > > >>> > me.
> > > > >>> > >> > (But
> > > > >>> > >> > > > not
> > > > >>> > >> > > > > > this
> > > > >>> > >> > > > > > > > > > thread
> > > > >>> > >> > > > > > > > > > > -
> > > > >>> > >> > > > > > > > > > > > just
> > > > >>> > >> > > > > > > > > > > > > > send me the message directly.)
> > > > >>> > >> > > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > > Best,
> > > > >>> > >> > > > > > > > > > > > > > Dongjin
> > > > >>> > >> > > > > > > > > > > > > >
> > > > >>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM
> > Bobby
> > > > >>> Evans <
> > > > >>> > >> > > > > > > bobby@apache.org
> > > > >>> > >> > > > > > > > >
> > > > >>> > >> > > > > > > > > > > wrote:
> > > > >>> > >> > > > > > > > > > > > > >
> > > > >>> > >> >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Hi Jason,

You are right. Explicit statements are always better. I updated the
document following your suggestion.

@Magnus

Thanks for the inspection. It seems like a new error code is not a problem.

Thanks,
Dongjin

On Fri, Sep 7, 2018 at 3:23 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Dongjin,
>
> The KIP looks good to me. I'd suggest starting a vote. A couple minor
> points that might be worth calling out explicitly in the compatibility
> section:
>
> 1. Zstd will only be allowed for the bumped produce API. For older
> versions, we return UNSUPPORTED_COMPRESSION_TYPE regardless of the message
> format.
> 2. Down-conversion of zstd-compressed records will not be supported.
> Instead we will return UNSUPPORTED_COMPRESSION_TYPE.
>
> Does that sound right?
>
> Thanks,
> Jason
>
> On Thu, Sep 6, 2018 at 1:45 AM, Magnus Edenhill <ma...@edenhill.se>
> wrote:
>
> > > Ismael wrote:
> > > Jason, that's an interesting point regarding the Java client. Do we
> know
> > > what clients in other languages do in these cases?
> >
> > librdkafka (and its bindings) passes unknown/future errors through to the
> > application, the error code remains intact while
> > the error string will be set to something like "Err-123?", which isn't
> very
> > helpful to the user but it at least
> > preserves the original error code for further troubleshooting.
> > For the producer any unknown error returned in the ProduceResponse will
> be
> > considered a permanent delivery failure (no retries),
> > and for the consumer any unknown FetchResponse errors will propagate
> > directly to the application, trigger a fetch backoff, and then
> > continue fetching past that offset.
> >
> > So, from the client's perspective it is not really a problem if new error
> > codes are added to older API versions.
> >
> > /Magnus
> >
> >
> > Den tors 6 sep. 2018 kl 09:45 skrev Dongjin Lee <do...@apache.org>:
> >
> > > I updated the KIP page
> > > <
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 110%3A+Add+Codec+for+ZStandard+Compression
> > > >
> > > following the discussion here. Please take a look when you are free.
> > > If you have any opinion, don't hesitate to give me a message.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > I just updated the draft implementation[^1], rebasing against the
> > latest
> > > > trunk and implementing the error routine (i.e., Error code 74 for
> > > > UnsupportedCompressionTypeException.) Since we decided to disallow
> all
> > > > fetch requests below version 2.1.0 for the topics specifying
> ZStandard,
> > I
> > > > added only the error logic.
> > > >
> > > > Please have a look when you are free.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > [^1]: Please check the last commit here:
> > > > https://github.com/apache/kafka/pull/2267
> > > >
> > > > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org>
> wrote:
> > > >
> > > >> Jason,
> > > >>
> > > >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
> > > >>
> > > >> Best,
> > > >> Dongjin
> > > >>
> > > >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <jason@confluent.io
> >
> > > >> wrote:
> > > >>
> > > >>> Hey Dongjin,
> > > >>>
> > > >>> Yeah that's right. For what it's worth, librdkafka also appears to
> > > handle
> > > >>> unexpected error codes. I expect that most client implementations
> > would
> > > >>> either pass through the raw type or convert to an enum using
> > something
> > > >>> like
> > > >>> what the java client does. Since we're expecting the client to fail
> > > >>> anyway,
> > > >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE
> error
> > > >>> code.
> > > >>>
> > > >>> -Jason
> > > >>>
> > > >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org>
> > > wrote:
> > > >>>
> > > >>> > Jason and Ismael,
> > > >>> >
> > > >>> > It seems like the only thing we need to consider if we define a new
> > > error
> > > >>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the
> > implementation
> > > >>> of
> > > >>> > the other language clients, right? At least, this strategy causes
> > > >>> > no problem for the Java client. Do I understand correctly?
> > > >>> >
> > > >>> > Thanks,
> > > >>> > Dongjin
> > > >>> >
> > > >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org>
> > > >>> wrote:
> > > >>> >
> > > >>> > > Jason,
> > > >>> > >
> > > >>> > > > I think we would only use this error code when we /know/ that
> > > zstd
> > > >>> was
> > > >>> > > in use and the client doesn't support it? This is true if
> either
> > 1)
> > > >>> the
> > > >>> > > message needs down-conversion and we encounter a zstd
> compressed
> > > >>> message,
> > > >>> > > or 2) if the topic is explicitly configured to use zstd.
> > > >>> > >
> > > >>> > > Yes, it is right. And you know, the case 1 includes 1.a) old
> > > clients'
> > > >>> > > request v0, v1 records or 1.b) implicit zstd, the compression
> > type
> > > of
> > > >>> > > "producer" with Zstd compressed data.
> > > >>> > >
> > > >>> > > > However, if the compression type is set to "producer," then
> the
> > > >>> fetched
> > > >>> > > data may or may not be compressed with zstd. In this case, we
> > > return
> > > >>> the
> > > >>> > > data to the client and expect it to fail parsing. Is that
> > correct?
> > > >>> > >
> > > >>> > > Exactly.
> > > >>> > >
> > > >>> > > Following your message, I reviewed the implementation of
> > > >>> > > `KafkaApis#handleFetchRequest,` which handles the fetch
> request.
> > > And
> > > >>> > found
> > > >>> > > that the information we can use is like the following:
> > > >>> > >
> > > >>> > > 1. Client's fetch request version. (`versionId` variable)
> > > >>> > > 2. Log's compression type. (`logConfig` variable)
> > > >>> > >
> > > >>> > > We can't detect the actual compression type of the data, unless
> > we
> > > >>> > inspect
> > > >>> > > the `RecordBatch` included in the `Records` instance (i.e.,
> > > >>> > > `unconvertedRecords` variable.) Since that would add a performance
> > > >>> > > cost, it is not an option for us - in short, we can't be sure
> > > >>> > > whether a given chunk of data is compressed with zstd or not.
> > > >>> > >
> > > >>> > > So, conclusion: we can return an error in the case of 1.a and 2
> > > >>> easily,
> > > >>> > > with the information above. In the case 1.b (implicit zstd), we
> > can
> > > >>> just
> > > >>> > > return the data without doing anything special and expect the
> > > >>> > > client to fail parsing.
> > > >>> > >
> > > >>> > > Thanks,
> > > >>> > > Dongjin
> > > >>> > >
> > > >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <
> ismaelj@gmail.com>
> > > >>> wrote:
> > > >>> > >
> > > >>> > >> Jason, that's an interesting point regarding the Java client.
> Do
> > > we
> > > >>> know
> > > >>> > >> what clients in other languages do in these cases?
> > > >>> > >>
> > > >>> > >> Ismael
> > > >>> > >>
> > > >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <
> jason@confluent.io
> > >
> > > >>> wrote:
> > > >>> > >>
> > > >>> > >> > Hi Dongjin,
> > > >>> > >> >
> > > >>> > >> > One of the complications is that old versions of the API
> will
> > > not
> > > >>> > >> expect a
> > > >>> > >> > new error code. However, since we expect this to be a fatal
> > > error
> > > >>> > anyway
> > > >>> > >> > for old clients, it may still be more useful to return the
> > > correct
> > > >>> > error
> > > >>> > >> > code. For example, the Kafka clients use the following code
> to
> > > >>> convert
> > > >>> > >> the
> > > >>> > >> > error code:
> > > >>> > >> >
> > > >>> > >> >     public static Errors forCode(short code) {
> > > >>> > >> >         Errors error = codeToError.get(code);
> > > >>> > >> >         if (error != null) {
> > > >>> > >> >             return error;
> > > >>> > >> >         } else {
> > > >>> > >> >             log.warn("Unexpected error code: {}.", code);
> > > >>> > >> >             return UNKNOWN_SERVER_ERROR;
> > > >>> > >> >         }
> > > >>> > >> >     }
> > > >>> > >> >
> > > >>> > >> > If we return an unsupported error code, it will be converted
> > to
> > > an
> > > >>> > >> UNKNOWN
> > > >>> > >> > error, but at least we will get the message in the log with
> > the
> > > >>> > correct
> > > >>> > >> > code. That seems preferable to returning a misleading error
> > > code.
> > > >>> So I
> > > >>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE
> > error
> > > >>> even
> > > >>> > for
> > > >>> > >> > older versions.
> > > >>> > >> >
> > > >>> > >> > Also, one question just to check my understanding. I think
> we
> > > >>> would
> > > >>> > only
> > > >>> > >> > use this error code when we /know/ that zstd was in use and
> > the
> > > >>> client
> > > >>> > >> > doesn't support it? This is true if either 1) the message
> > needs
> > > >>> > >> > down-conversion and we encounter a zstd compressed message,
> or
> > > 2)
> > > >>> if
> > > >>> > the
> > > >>> > >> > topic is explicitly configured to use zstd. However, if the
> > > >>> > compression
> > > >>> > >> > type is set to "producer," then the fetched data may or may
> > not
> > > be
> > > >>> > >> > compressed with zstd. In this case, we return the data to
> the
> > > >>> client
> > > >>> > and
> > > >>> > >> > expect it to fail parsing. Is that correct?
> > > >>> > >> >
> > > >>> > >> > Thanks,
> > > >>> > >> > Jason
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <
> > > dongjin@apache.org>
> > > >>> > >> wrote:
> > > >>> > >> >
> > > >>> > >> > > Ismael, Jason and all,
> > > >>> > >> > >
> > > >>> > >> > > I rewrote the backward compatibility strategy & its
> > > alternatives
> > > >>> > like
> > > >>> > >> > > following, based on Ismael & Jason's comments. Since it is
> > not
> > > >>> > >> updated to
> > > >>> > >> > > the wiki yet, don't hesitate to give me a message if you
> > have
> > > >>> any
> > > >>> > >> opinion
> > > >>> > >> > > on it.
> > > >>> > >> > >
> > > >>> > >> > > ```
> > > >>> > >> > > *Backward Compatibility*
> > > >>> > >> > >
> > > >>> > >> > > We need to establish some backward-compatibility strategy
> > for
> > > >>> the
> > > >>> > >> case an
> > > >>> > >> > > old client subscribes to a topic using ZStandard implicitly
> > > (i.e.,
> > > >>> > >> > > 'compression.type' configuration of given topic is
> > 'producer'
> > > >>> and
> > > >>> > the
> > > >>> > >> > > producer compressed the records with ZStandard). We have
> the
> > > >>> > following
> > > >>> > >> > > options for this situation:
> > > >>> > >> > >
> > > >>> > >> > > *A. Support ZStandard for the old clients which can
> > understand
> > > >>> v0, v1
> > > >>> > >> > > messages only.*
> > > >>> > >> > >
> > > >>> > >> > > This strategy necessarily requires the down-conversion of
> v2
> > > >>> message
> > > >>> > >> > > compressed with Zstandard into v0 or v1 messages, which
> > means
> > > a
> > > >>> > >> > > considerable performance degradation. So we rejected this
> > > >>> strategy.
> > > >>> > >> > >
> > > >>> > >> > > *B. Bump the API version and support only v2-available
> > > clients*
> > > >>> > >> > >
> > > >>> > >> > > With this approach, we can message the old clients that
> they
> > > >>> are old
> > > >>> > >> and
> > > >>> > >> > > should be upgraded. However, there are still several
> options
> > > >>> for the
> > > >>> > >> > Error
> > > >>> > >> > > code.
> > > >>> > >> > >
> > > >>> > >> > > *B.1. INVALID_REQUEST (42)*
> > > >>> > >> > >
> > > >>> > >> > > This option gives the client very little information; the user can
> > > >>> > >> > > be confused about why a client that worked correctly in the past
> > > >>> > >> > > suddenly encounters a problem. So we rejected this strategy.
> > > >>> > >> > >
> > > >>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> > > >>> > >> > >
> > > >>> > >> > > This option gives inaccurate information; the user may be surprised
> > > >>> > >> > > and mistakenly conclude that the log files are broken in some way.
> > > >>> > >> > > So we rejected this strategy.
> > > >>> > >> > >
> > > >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > > >>> > >> > >
> > > >>> > >> > > The advantage of this approach is we don't need to define
> a
> > > new
> > > >>> > error
> > > >>> > >> > code;
> > > >>> > >> > > we can reuse it and that's all.
> > > >>> > >> > >
> > > >>> > >> > > The disadvantage of this approach is that it is also a little bit
> > > >>> > >> > > vague; this error code was defined as part of the work for
> > > >>> > >> > > KIP-98[^1] and is now returned for transaction errors.
> > > >>> > >> > >
> > > >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > > >>> > >> > >
> > > >>> > >> > > The advantage of this approach is that it is clear and
> > > provides
> > > >>> an
> > > >>> > >> exact
> > > >>> > >> > > description. The disadvantage is we need to add a new
> error
> > > >>> code.
> > > >>> > >> > > ```
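As a rough illustration of option B.4, the change amounts to one new entry in the error-code table. The sketch below is simplified, not Kafka's actual Errors enum; the draft PR mentioned elsewhere in the thread uses code 74 for the new entry, and the final value could differ:

    // Sketch only: the numeric codes for the existing entries are the ones discussed above.
    enum SketchErrors {
        NONE(0),
        CORRUPT_MESSAGE(2),
        INVALID_REQUEST(42),
        UNSUPPORTED_FOR_MESSAGE_FORMAT(43),
        UNSUPPORTED_COMPRESSION_TYPE(74);   // the new, explicit error code proposed here

        final short code;
        SketchErrors(int code) { this.code = (short) code; }
    }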
> > > >>> > >> > >
> > > >>> > >> > > *It seems like the choice we now need to make is clear:
> > > >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> > > >>> > >> > > (B.4).*
> > > >>> > >> > > The first one doesn't need a new error code, but the latter is more
> > > >>> > >> > > explicit. Which one do you prefer? Since all of you have much more
> > > >>> > >> > > experience and knowledge than me, I will follow your decision. The
> > > >>> > >> > > wiki page will be updated following the decision as well.
> > > >>> > >> > >
> > > >>> > >> > > Best,
> > > >>> > >> > > Dongjin
> > > >>> > >> > >
> > > >>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> > > >>> > >> > >
> > > >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <
> > > ismaelj@gmail.com>
> > > >>> > >> wrote:
> > > >>> > >> > >
> > > >>> > >> > > > Sounds reasonable to me.
> > > >>> > >> > > >
> > > >>> > >> > > > Ismael
> > > >>> > >> > > >
> > > >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
> > > >>> jason@confluent.io>
> > > >>> > >> > wrote:
> > > >>> > >> > > >
> > > >>> > >> > > > > Hey Ismael,
> > > >>> > >> > > > >
> > > >>> > >> > > > > Your summary looks good to me. I think it might also
> be
> > a
> > > >>> good
> > > >>> > >> idea
> > > >>> > >> > to
> > > >>> > >> > > > add
> > > >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go
> > along
> > > >>> with
> > > >>> > the
> > > >>> > >> > > > version
> > > >>> > >> > > > > bumps. We won't be able to use it for old api versions
> > > >>> since the
> > > >>> > >> > > clients
> > > >>> > >> > > > > will not understand it, but we can use it going
> forward
> > so
> > > >>> that
> > > >>> > >> we're
> > > >>> > >> > > not
> > > >>> > >> > > > > stuck in a similar situation with a new message format
> > and
> > > >>> a new
> > > >>> > >> > codec
> > > >>> > >> > > to
> > > >>> > >> > > > > support. Another option is to use
> > UNSUPPORTED_FOR_MESSAGE_
> > > >>> > FORMAT,
> > > >>> > >> but
> > > >>> > >> > > it
> > > >>> > >> > > > is
> > > >>> > >> > > > > not as explicit.
> > > >>> > >> > > > >
> > > >>> > >> > > > > -Jason
> > > >>> > >> > > > >
> > > >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> > > >>> ismael@juma.me.uk
> > > >>> > >
> > > >>> > >> > > wrote:
> > > >>> > >> > > > >
> > > >>> > >> > > > > > Hi Dongjin and Jason,
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > I would agree. My summary:
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > 1. Support zstd with message format 2 only.
> > > >>> > >> > > > > > 2. Bump produce and fetch request versions.
> > > >>> > >> > > > > > 3. Provide broker errors whenever possible based on
> > the
> > > >>> > request
> > > >>> > >> > > version
> > > >>> > >> > > > > and
> > > >>> > >> > > > > > rely on clients for the cases where the broker can't
> > > >>> validate
> > > >>> > >> > > > efficiently
> > > >>> > >> > > > > > (example message format 2 consumer that supports the
> > > >>> latest
> > > >>> > >> fetch
> > > >>> > >> > > > version
> > > >>> > >> > > > > > but doesn't support zstd).
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > If there's general agreement on this, I suggest we
> > > update
> > > >>> the
> > > >>> > >> KIP
> > > >>> > >> > to
> > > >>> > >> > > > > state
> > > >>> > >> > > > > > the proposal and to move the rejected options to its
> > own
> > > >>> > >> section.
> > > >>> > >> > And
> > > >>> > >> > > > > then
> > > >>> > >> > > > > > start a vote!
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > Ismael
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> > > >>> > >> > jason@confluent.io>
> > > >>> > >> > > > > > wrote:
> > > >>> > >> > > > > >
> > > >>> > >> > > > > > > Hi Dongjin,
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > Yes, that's a good summary. For clients which
> > support
> > > >>> v2,
> > > >>> > the
> > > >>> > >> > > client
> > > >>> > >> > > > > can
> > > >>> > >> > > > > > > parse the message format and hopefully raise a
> > useful
> > > >>> error
> > > >>> > >> > message
> > > >>> > >> > > > > > > indicating the unsupported compression type. For
> > older
> > > >>> > >> clients,
> > > >>> > >> > our
> > > >>> > >> > > > > > options
> > > >>> > >> > > > > > > are probably (1) to down-convert to the old format
> > > >>> using no
> > > >>> > >> > > > compression
> > > >>> > >> > > > > > > type, or (2) to return an error code. I'm leaning
> > > >>> toward the
> > > >>> > >> > latter
> > > >>> > >> > > > as
> > > >>> > >> > > > > > the
> > > >>> > >> > > > > > > simpler solution, but the challenge is finding a
> > good
> > > >>> error
> > > >>> > >> code.
> > > >>> > >> > > Two
> > > >>> > >> > > > > > > possibilities might be INVALID_REQUEST or
> > > >>> CORRUPT_MESSAGE.
> > > >>> > The
> > > >>> > >> > > > downside
> > > >>> > >> > > > > > is
> > > >>> > >> > > > > > > that old clients probably won't get a helpful
> > message.
> > > >>> > >> However,
> > > >>> > >> > at
> > > >>> > >> > > > > least
> > > >>> > >> > > > > > > the behavior will be consistent in the sense that
> > all
> > > >>> > clients
> > > >>> > >> > will
> > > >>> > >> > > > fail
> > > >>> > >> > > > > > if
> > > >>> > >> > > > > > > they do not support zstandard.
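As a sketch of the first point (a v2-capable client parsing the batch and reporting the codec it cannot handle): the low 3 bits of the v2 batch attributes identify the compression codec, and the id assumed for zstd here is 4 (an assumption, not confirmed by this thread).

    // Sketch only: turns an unknown/unsupported codec id into a readable error message
    // instead of an opaque parsing failure.
    final class BatchCodecSketch {
        static String describeCodec(byte attributes) {
            int codecId = attributes & 0x07;   // low 3 bits of the v2 batch attributes
            switch (codecId) {
                case 0: return "none";
                case 1: return "gzip";
                case 2: return "snappy";
                case 3: return "lz4";
                case 4: return "zstd (this client must be upgraded to consume it)";
                default: return "unknown compression codec id " + codecId;
            }
        }
    }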
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > What do you think?
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > Thanks,
> > > >>> > >> > > > > > > Jason
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> > > >>> > >> dongjin@apache.org
> > > >>> > >> > >
> > > >>> > >> > > > > wrote:
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting
> logic
> > > >>> > following
> > > >>> > >> > your
> > > >>> > >> > > > > > > > explanation.[^1] You mean the following
> routines,
> > > >>> right?
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > -
> > > >>> > >> > > > > > > >
> > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > >>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > >>> > >> > > > > > > > -
> > > >>> > >> > > > > > > >
> > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > > >>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
> > > >>> > >> > > LazyDownConversionRecords.
> > > >>> > >> > > > > > > > java#L165
> > > >>> > >> > > > > > > > -
> > > >>> > >> > > > > > > >
> > > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > > >>> > >> > > > > > > >
> > > >>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > It seems like your stance is as follows:
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > 1. In principle, Kafka does not change the compression
> > > >>> > >> > > > > > > > codec when down-converting, since doing so requires
> > > >>> > >> > > > > > > > inspecting the fetched data, which is expensive.
> > > >>> > >> > > > > > > > 2. However, there are some cases where the fetched data
> > > >>> > >> > > > > > > > is inspected anyway. In those cases, we can provide
> > > >>> > >> > > > > > > > compression conversion from ZStandard to the classical
> > > >>> > >> > > > > > > > ones[^2].
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > And from what I understand, the cases where the
> > > client
> > > >>> > >> without
> > > >>> > >> > > > > > ZStandard
> > > >>> > >> > > > > > > > support receives ZStandard compressed records
> can
> > be
> > > >>> > >> organized
> > > >>> > >> > > into
> > > >>> > >> > > > > two
> > > >>> > >> > > > > > > > cases:
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > a. The 'compression.type' configuration of given
> > > >>> topic is
> > > >>> > >> > > > 'producer'
> > > >>> > >> > > > > > and
> > > >>> > >> > > > > > > > the producer compressed the records with
> > ZStandard.
> > > >>> (that
> > > >>> > >> is,
> > > >>> > >> > > using
> > > >>> > >> > > > > > > > ZStandard implicitly.)
> > > >>> > >> > > > > > > > b.  The 'compression.type' configuration of
> given
> > > >>> topic is
> > > >>> > >> > > 'zstd';
> > > >>> > >> > > > > that
> > > >>> > >> > > > > > > is,
> > > >>> > >> > > > > > > > using ZStandard explicitly.
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > As you stated, we don't have to handle the case
> b
> > > >>> > specially.
> > > >>> > >> > So,
> > > >>> > >> > > It
> > > >>> > >> > > > > > seems
> > > >>> > >> > > > > > > > like we can narrow the focus of the problem by
> > > joining
> > > >>> > case
> > > >>> > >> 1
> > > >>> > >> > and
> > > >>> > >> > > > > case
> > > >>> > >> > > > > > b
> > > >>> > >> > > > > > > > like the following:
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > > Given the topic with 'producer' as its
> > > >>> > 'compression.type'
> > > >>> > >> > > > > > > configuration,
> > > >>> > >> > > > > > > > ZStandard compressed records and old client
> > without
> > > >>> > >> ZStandard,
> > > >>> > >> > is
> > > >>> > >> > > > > there
> > > >>> > >> > > > > > > any
> > > >>> > >> > > > > > > > case we need to inspect the records and can
> change
> > > the
> > > >>> > >> > > compression
> > > >>> > >> > > > > > type?
> > > >>> > >> > > > > > > If
> > > >>> > >> > > > > > > > so, can we provide compression type converting?
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > Do I understand correctly?
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > Best,
> > > >>> > >> > > > > > > > Dongjin
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
> > > >>> > >> > misunderstanding
> > > >>> > >> > > > how
> > > >>> > >> > > > > > API
> > > >>> > >> > > > > > > > version works, after reviewing the downconvert
> > logic
> > > >>> & the
> > > >>> > >> > > protocol
> > > >>> > >> > > > > > > > documentation <
> https://kafka.apache.org/protocol
> > >.
> > > >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson
> <
> > > >>> > >> > > > jason@confluent.io>
> > > >>> > >> > > > > > > > wrote:
> > > >>> > >> > > > > > > >
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> fail
> > > >>> with the
> > > >>> > >> API
> > > >>> > >> > > > > version,
> > > >>> > >> > > > > > > so
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > anyway.
> > > >>> Isn't
> > > >>> > >> it?
> > > >>> > >> > > So, I
> > > >>> > >> > > > > > think
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am
> > not
> > > >>> > >> familiar
> > > >>> > >> > > with
> > > >>> > >> > > > > > > > > down-convert
> > > >>> > >> > > > > > > > > > logic.)
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > Currently the broker down-converts
> automatically
> > > >>> when it
> > > >>> > >> > > receives
> > > >>> > >> > > > > an
> > > >>> > >> > > > > > > old
> > > >>> > >> > > > > > > > > version of the fetch request (a version which
> is
> > > >>> known
> > > >>> > to
> > > >>> > >> > > predate
> > > >>> > >> > > > > the
> > > >>> > >> > > > > > > > > message format in use). Typically when
> > > >>> down-converting
> > > >>> > the
> > > >>> > >> > > > message
> > > >>> > >> > > > > > > > format,
> > > >>> > >> > > > > > > > > we use the same compression type, but there is
> > not
> > > >>> much
> > > >>> > >> point
> > > >>> > >> > > in
> > > >>> > >> > > > > > doing
> > > >>> > >> > > > > > > so
> > > >>> > >> > > > > > > > > when we know the client doesn't support it. So
> > if
> > > >>> > >> zstandard
> > > >>> > >> > is
> > > >>> > >> > > in
> > > >>> > >> > > > > > use,
> > > >>> > >> > > > > > > > and
> > > >>> > >> > > > > > > > > we have to down-convert anyway, then we can
> > choose
> > > >>> to
> > > >>> > use
> > > >>> > >> a
> > > >>> > >> > > > > different
> > > >>> > >> > > > > > > > > compression type or no compression type.
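A toy sketch of that idea: when the batches are being rebuilt for an old message format anyway, the records are already decompressed, so the codec can be swapped at the same time. The types below are simplified stand-ins, not Kafka's Records API.

    // Sketch only: a rebuilt batch can be re-compressed with a codec the old client
    // understands (or left uncompressed) at little extra cost in this path.
    enum Codec { NONE, GZIP, SNAPPY, LZ4, ZSTD }

    final class DownConvertSketch {
        static Codec codecForDownConvertedBatch(Codec originalCodec, boolean clientSupportsZstd) {
            if (originalCodec == Codec.ZSTD && !clientSupportsZstd)
                return Codec.NONE;   // or GZIP/SNAPPY/LZ4; anything the old format understands
            return originalCodec;
        }
    }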
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > From my perspective, there is no significant
> > > >>> downside to
> > > >>> > >> > > bumping
> > > >>> > >> > > > > the
> > > >>> > >> > > > > > > > > protocol version and it has several potential
> > > >>> benefits.
> > > >>> > >> > Version
> > > >>> > >> > > > > bumps
> > > >>> > >> > > > > > > are
> > > >>> > >> > > > > > > > > cheap. The main question mark in my mind is
> > about
> > > >>> > >> > > > down-conversion.
> > > >>> > >> > > > > > > > Figuring
> > > >>> > >> > > > > > > > > out whether down-conversion is needed is hard
> > > >>> generally
> > > >>> > >> > without
> > > >>> > >> > > > > > > > inspecting
> > > >>> > >> > > > > > > > > the fetched data, which is expensive. I think
> we
> > > >>> agree
> > > >>> > in
> > > >>> > >> > > > principle
> > > >>> > >> > > > > > > that
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > do not want to have to pay this cost generally
> > and
> > > >>> > prefer
> > > >>> > >> the
> > > >>> > >> > > > > clients
> > > >>> > >> > > > > > > to
> > > >>> > >> > > > > > > > > fail when they see an unhandled compression
> > type.
> > > >>> The
> > > >>> > >> point I
> > > >>> > >> > > was
> > > >>> > >> > > > > > > making
> > > >>> > >> > > > > > > > is
> > > >>> > >> > > > > > > > > that there are some cases where we are either
> > > >>> inspecting
> > > >>> > >> the
> > > >>> > >> > > data
> > > >>> > >> > > > > > > anyway
> > > >>> > >> > > > > > > > > (because we have to down-convert the message
> > > >>> format), or
> > > >>> > >> we
> > > >>> > >> > > have
> > > >>> > >> > > > an
> > > >>> > >> > > > > > > easy
> > > >>> > >> > > > > > > > > way to tell whether zstandard is in use (the
> > topic
> > > >>> has
> > > >>> > it
> > > >>> > >> > > > > configured
> > > >>> > >> > > > > > > > > explicitly). In the latter case, we don't have
> > to
> > > >>> handle
> > > >>> > >> it
> > > >>> > >> > > > > > specially.
> > > >>> > >> > > > > > > > But
> > > >>> > >> > > > > > > > > we do have to decide how we will handle
> > > >>> down-conversion
> > > >>> > to
> > > >>> > >> > > older
> > > >>> > >> > > > > > > formats.
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > -Jason
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> > > >>> > >> > > dongjin@apache.org
> > > >>> > >> > > > >
> > > >>> > >> > > > > > > wrote:
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > > Colin and Jason,
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > Thanks for your opinions. In summarizing,
> the
> > > >>> Pros and
> > > >>> > >> Cons
> > > >>> > >> > > of
> > > >>> > >> > > > > > > bumping
> > > >>> > >> > > > > > > > > > fetch API version are:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > Cons:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > - The Broker can't know whether a given
> > message
> > > >>> batch
> > > >>> > is
> > > >>> > >> > > > > compressed
> > > >>> > >> > > > > > > > with
> > > >>> > >> > > > > > > > > > zstd or not.
> > > >>> > >> > > > > > > > > > - Need some additional logic for the topic
> > > >>> explicitly
> > > >>> > >> > > > configured
> > > >>> > >> > > > > to
> > > >>> > >> > > > > > > use
> > > >>> > >> > > > > > > > > > zstd.
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > Pros:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > - The broker doesn't need to conduct
> expensive
> > > >>> > >> > > down-conversion.
> > > >>> > >> > > > > > > > > > - Can message the users to update their
> > client.
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > So, opinions for the backward-compatibility
> > > >>> policy by
> > > >>> > >> far:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin,
> Jason)
> > > >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > Here are my additional comments:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > @Colin
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > I greatly appreciate your response. In the
> > case
> > > >>> of the
> > > >>> > >> > > > dictionary
> > > >>> > >> > > > > > > > > support,
> > > >>> > >> > > > > > > > > > of course, this issue should be addressed
> > later
> > > >>> so we
> > > >>> > >> don't
> > > >>> > >> > > > need
> > > >>> > >> > > > > it
> > > >>> > >> > > > > > > in
> > > >>> > >> > > > > > > > > the
> > > >>> > >> > > > > > > > > > first version. You are right - it is not
> late
> > to
> > > >>> try
> > > >>> > it
> > > >>> > >> > after
> > > >>> > >> > > > > some
> > > >>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep
> in
> > > >>> mind on
> > > >>> > >> that
> > > >>> > >> > > > > > potential
> > > >>> > >> > > > > > > > > > feature.
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > @Jason
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > You wrote,
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > > Similarly, if we have to down-convert
> anyway
> > > >>> because
> > > >>> > >> the
> > > >>> > >> > > > client
> > > >>> > >> > > > > > > does
> > > >>> > >> > > > > > > > > not
> > > >>> > >> > > > > > > > > > understand the message format, then we could
> > > also
> > > >>> use
> > > >>> > a
> > > >>> > >> > > > different
> > > >>> > >> > > > > > > > > > compression type.
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > But in my opinion, since the client will
> fail
> > > >>> with the
> > > >>> > >> API
> > > >>> > >> > > > > version,
> > > >>> > >> > > > > > > so
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > > don't need to down-convert the messages
> > anyway.
> > > >>> Isn't
> > > >>> > >> it?
> > > >>> > >> > > So, I
> > > >>> > >> > > > > > think
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am
> > not
> > > >>> > >> familiar
> > > >>> > >> > > with
> > > >>> > >> > > > > > > > > down-convert
> > > >>> > >> > > > > > > > > > logic.)
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > Please give more opinions. Thanks!
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > - Dongjin
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason
> > Gustafson <
> > > >>> > >> > > > > jason@confluent.io
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > > Hey Colin,
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > The problem for the fetch API is that the
> > > broker
> > > >>> > does
> > > >>> > >> not
> > > >>> > >> > > > > > generally
> > > >>> > >> > > > > > > > > know
> > > >>> > >> > > > > > > > > > if
> > > >>> > >> > > > > > > > > > > a batch was compressed with zstd unless it
> > > >>> parses
> > > >>> > it.
> > > >>> > >> I
> > > >>> > >> > > think
> > > >>> > >> > > > > the
> > > >>> > >> > > > > > > > goal
> > > >>> > >> > > > > > > > > > here
> > > >>> > >> > > > > > > > > > > is to avoid the expensive down-conversion
> > that
> > > >>> is
> > > >>> > >> needed
> > > >>> > >> > to
> > > >>> > >> > > > > > ensure
> > > >>> > >> > > > > > > > > > > compatibility because it is only necessary
> > if
> > > >>> zstd
> > > >>> > is
> > > >>> > >> > > > actually
> > > >>> > >> > > > > in
> > > >>> > >> > > > > > > > use.
> > > >>> > >> > > > > > > > > > But
> > > >>> > >> > > > > > > > > > > as long as old clients can parse the
> message
> > > >>> format,
> > > >>> > >> they
> > > >>> > >> > > > > should
> > > >>> > >> > > > > > > get
> > > >>> > >> > > > > > > > a
> > > >>> > >> > > > > > > > > > > reasonable error if they see an
> unsupported
> > > >>> > >> compression
> > > >>> > >> > > type
> > > >>> > >> > > > in
> > > >>> > >> > > > > > the
> > > >>> > >> > > > > > > > > > > attributes. Basically the onus is on users
> > to
> > > >>> ensure
> > > >>> > >> that
> > > >>> > >> > > > their
> > > >>> > >> > > > > > > > > consumers
> > > >>> > >> > > > > > > > > > > have been updated prior to using zstd. It
> > > seems
> > > >>> > like a
> > > >>> > >> > > > > reasonable
> > > >>> > >> > > > > > > > > > tradeoff
> > > >>> > >> > > > > > > > > > > to me. There are a couple cases that might
> > be
> > > >>> worth
> > > >>> > >> > > thinking
> > > >>> > >> > > > > > > through:
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to
> > use
> > > >>> zstd,
> > > >>> > >> then
> > > >>> > >> > we
> > > >>> > >> > > > > don't
> > > >>> > >> > > > > > > > need
> > > >>> > >> > > > > > > > > to
> > > >>> > >> > > > > > > > > > > check the fetched data for the compression
> > > type
> > > >>> to
> > > >>> > >> know
> > > >>> > >> > if
> > > >>> > >> > > we
> > > >>> > >> > > > > > need
> > > >>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch
> > API
> > > >>> > version,
> > > >>> > >> > then
> > > >>> > >> > > > we
> > > >>> > >> > > > > > > could
> > > >>> > >> > > > > > > > > > handle
> > > >>> > >> > > > > > > > > > > this case by either down-converting using
> a
> > > >>> > different
> > > >>> > >> > > > > compression
> > > >>> > >> > > > > > > > type
> > > >>> > >> > > > > > > > > or
> > > >>> > >> > > > > > > > > > > returning an error.
> > > >>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert
> > > anyway
> > > >>> > >> because
> > > >>> > >> > the
> > > >>> > >> > > > > > client
> > > >>> > >> > > > > > > > does
> > > >>> > >> > > > > > > > > > not
> > > >>> > >> > > > > > > > > > > understand the message format, then we
> could
> > > >>> also
> > > >>> > use
> > > >>> > >> a
> > > >>> > >> > > > > different
> > > >>> > >> > > > > > > > > > > compression type.
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > For the produce API, I think it's
> reasonable
> > > to
> > > >>> bump
> > > >>> > >> the
> > > >>> > >> > > api
> > > >>> > >> > > > > > > version.
> > > >>> > >> > > > > > > > > > This
> > > >>> > >> > > > > > > > > > > can be used by clients to check whether a
> > > broker
> > > >>> > >> supports
> > > >>> > >> > > > zstd.
> > > >>> > >> > > > > > For
> > > >>> > >> > > > > > > > > > > example, we might support a list of
> > preferred
> > > >>> > >> compression
> > > >>> > >> > > > types
> > > >>> > >> > > > > > in
> > > >>> > >> > > > > > > > the
> > > >>> > >> > > > > > > > > > > producer and we could use the broker to
> > detect
> > > >>> which
> > > >>> > >> > > version
> > > >>> > >> > > > to
> > > >>> > >> > > > > > > use.
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > -Jason
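A sketch of that last idea: a producer holding a preference list and falling back based on the produce version the broker advertises. The names and the version number below are assumptions for illustration only.

    // Sketch only: pick the first preferred codec the connected broker can accept.
    import java.util.List;

    final class CompressionChooserSketch {
        static final short MIN_PRODUCE_VERSION_FOR_ZSTD = 7;  // assumed bumped produce version

        static String choose(List<String> preferredCodecs, short brokerMaxProduceVersion) {
            for (String codec : preferredCodecs) {
                if (!"zstd".equals(codec) || brokerMaxProduceVersion >= MIN_PRODUCE_VERSION_FOR_ZSTD)
                    return codec;   // first preference the broker can accept
            }
            return "none";          // nothing acceptable: fall back to no compression
        }
    }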
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin
> > McCabe <
> > > >>> > >> > > > > cmccabe@apache.org
> > > >>> > >> > > > > > >
> > > >>> > >> > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd
> > is a
> > > >>> good
> > > >>> > >> > > > compression
> > > >>> > >> > > > > > > codec
> > > >>> > >> > > > > > > > > > and I
> > > >>> > >> > > > > > > > > > > > hope we can get this support in soon!
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > I would say we can just bump the API
> > version
> > > >>> to
> > > >>> > >> > indicate
> > > >>> > >> > > > that
> > > >>> > >> > > > > > > ZStd
> > > >>> > >> > > > > > > > > > > support
> > > >>> > >> > > > > > > > > > > > is expected in new clients.  We probably
> > > need
> > > >>> some
> > > >>> > >> way
> > > >>> > >> > of
> > > >>> > >> > > > > > > > indicating
> > > >>> > >> > > > > > > > > to
> > > >>> > >> > > > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > older clients that they can't consume
> the
> > > >>> > >> partitions,
> > > >>> > >> > as
> > > >>> > >> > > > > well.
> > > >>> > >> > > > > > > > > Perhaps
> > > >>> > >> > > > > > > > > > > we
> > > >>> > >> > > > > > > > > > > > can use the
> UNSUPPORTED_FOR_MESSAGE_FORMAT
> > > >>> error?
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > The license thing seems straightforward
> --
> > > >>> it's
> > > >>> > >> just a
> > > >>> > >> > > > matter
> > > >>> > >> > > > > > of
> > > >>> > >> > > > > > > > > adding
> > > >>> > >> > > > > > > > > > > > the text to the right files as per ASF
> > > >>> guidelines.
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > With regard to the dictionary support,
> do
> > we
> > > >>> > really
> > > >>> > >> > need
> > > >>> > >> > > > that
> > > >>> > >> > > > > > in
> > > >>> > >> > > > > > > > the
> > > >>> > >> > > > > > > > > > > first
> > > >>> > >> > > > > > > > > > > > version?  Hopefully message batches are
> > big
> > > >>> enough
> > > >>> > >> that
> > > >>> > >> > > > this
> > > >>> > >> > > > > > > isn't
> > > >>> > >> > > > > > > > > > > needed.
> > > >>> > >> > > > > > > > > > > > Some benchmarks might help here.
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > best,
> > > >>> > >> > > > > > > > > > > > Colin
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin
> Lee
> > > >>> wrote:
> > > >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's
> > reboot
> > > >>> this
> > > >>> > >> issue,
> > > >>> > >> > > > > KIP-110
> > > >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
> > > >>> > >> > > confluence/display/KAFKA/KIP-
> > > >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+
> > ZStandard+Compression>
> > > >>> > >> > > > > > > > > > > > > .
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of
> > the
> > > >>> > >> history:
> > > >>> > >> > > > KIP-110
> > > >>> > >> > > > > > was
> > > >>> > >> > > > > > > > > > > > originally
> > > >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but,
> it
> > > >>> lacked
> > > >>> > >> > > benchmark
> > > >>> > >> > > > > > > results
> > > >>> > >> > > > > > > > to
> > > >>> > >> > > > > > > > > > get
> > > >>> > >> > > > > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > > agreement of the community. Later,
> Ivan
> > > >>> Babrou
> > > >>> > and
> > > >>> > >> > some
> > > >>> > >> > > > > other
> > > >>> > >> > > > > > > > users
> > > >>> > >> > > > > > > > > > who
> > > >>> > >> > > > > > > > > > > > > adopted the patch provided their
> > excellent
> > > >>> > >> > performance
> > > >>> > >> > > > > report
> > > >>> > >> > > > > > > > which
> > > >>> > >> > > > > > > > > > is
> > > >>> > >> > > > > > > > > > > > now
> > > >>> > >> > > > > > > > > > > > > included in the KIP, but it postponed
> > > again
> > > >>> > >> because
> > > >>> > >> > of
> > > >>> > >> > > > the
> > > >>> > >> > > > > > > > > community
> > > >>> > >> > > > > > > > > > > was
> > > >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I
> now
> > > >>> reboot
> > > >>> > >> this
> > > >>> > >> > > > issue.
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > The following is the current status of
> > the
> > > >>> > >> feature:
> > > >>> > >> > You
> > > >>> > >> > > > can
> > > >>> > >> > > > > > > check
> > > >>> > >> > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > > current draft implementation here
> > > >>> > >> > > > > > > > > > > > > <
> > > https://github.com/apache/kafka/pull/2267>.
> > > >>> It
> > > >>> > >> is
> > > >>> > >> > > based
> > > >>> > >> > > > > on
> > > >>> > >> > > > > > > zstd
> > > >>> > >> > > > > > > > > > 1.3.5
> > > >>> > >> > > > > > > > > > > > and
> > > >>> > >> > > > > > > > > > > > > periodically rebased onto the latest
> > > >>> trunk[^1].
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > The issues that should be addressed is
> > > like
> > > >>> > >> > following:
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > To support old consumers, we need to
> > take
> > > a
> > > >>> > >> strategy
> > > >>> > >> > to
> > > >>> > >> > > > > > handle
> > > >>> > >> > > > > > > > the
> > > >>> > >> > > > > > > > > > old
> > > >>> > >> > > > > > > > > > > > > consumers. Current candidates are:
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > - Bump API version
> > > >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients
> > > fail.
> > > >>> > >> > > > > > > > > > > > > - Improve the error messages:
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > *2. Dictionary Support*
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > To support zstd's dictionary feature
> in
> > > the
> > > >>> > future
> > > >>> > >> > (if
> > > >>> > >> > > > > > needed),
> > > >>> > >> > > > > > > > we
> > > >>> > >> > > > > > > > > > need
> > > >>> > >> > > > > > > > > > > > to
> > > >>> > >> > > > > > > > > > > > > sketch how it should be and leave some
> > > room
> > > >>> for
> > > >>> > >> it.
> > > >>> > >> > As
> > > >>> > >> > > of
> > > >>> > >> > > > > > now,
> > > >>> > >> > > > > > > > > there
> > > >>> > >> > > > > > > > > > > has
> > > >>> > >> > > > > > > > > > > > > been no discussion on this topic yet.
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > *3. License*
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > To use this feature, we need to add
> > > license
> > > >>> of
> > > >>> > >> zstd
> > > >>> > >> > and
> > > >>> > >> > > > > > > zstd-jni
> > > >>> > >> > > > > > > > to
> > > >>> > >> > > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for
> > > >>> raising
> > > >>> > >> this
> > > >>> > >> > > > issue!)
> > > >>> > >> > > > > > It
> > > >>> > >> > > > > > > > > seems
> > > >>> > >> > > > > > > > > > > like
> > > >>> > >> > > > > > > > > > > > > what Apache Spark did would be a good
> > > >>> example
> > > >>> > but
> > > >>> > >> > there
> > > >>> > >> > > > has
> > > >>> > >> > > > > > > been
> > > >>> > >> > > > > > > > no
> > > >>> > >> > > > > > > > > > > > > discussion yet.
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > You can find the details of the above
> > > >>> issues in
> > > >>> > >> the
> > > >>> > >> > KIP
> > > >>> > >> > > > > > > document.
> > > >>> > >> > > > > > > > > > > Please
> > > >>> > >> > > > > > > > > > > > > have a look when you are free, and
> give
> > me
> > > >>> > >> feedback.
> > > >>> > >> > > All
> > > >>> > >> > > > > > kinds
> > > >>> > >> > > > > > > of
> > > >>> > >> > > > > > > > > > > > > participating are welcome.
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > Best,
> > > >>> > >> > > > > > > > > > > > > Dongjin
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
> > > >>> 6b4fb8152.
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM
> Dongjin
> > > >>> Lee <
> > > >>> > >> > > > > > > dongjin@apache.org
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > > Sorry for the late reply.
> > > >>> > >> > > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > > In short, I could not submit the
> > updated
> > > >>> KIP
> > > >>> > by
> > > >>> > >> the
> > > >>> > >> > > > > feature
> > > >>> > >> > > > > > > > > freeze
> > > >>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason,
> it
> > > >>> will
> > > >>> > not
> > > >>> > >> be
> > > >>> > >> > > > > included
> > > >>> > >> > > > > > > in
> > > >>> > >> > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > 2.0.0
> > > >>> > >> > > > > > > > > > > > > > release and all discussion for this
> > > issue
> > > >>> were
> > > >>> > >> > > > postponed
> > > >>> > >> > > > > > > after
> > > >>> > >> > > > > > > > > the
> > > >>> > >> > > > > > > > > > > > release
> > > >>> > >> > > > > > > > > > > > > > of 2.0.0.
> > > >>> > >> > > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > > I have been updating the PR
> following
> > > >>> recent
> > > >>> > >> > updates.
> > > >>> > >> > > > > Just
> > > >>> > >> > > > > > > > now, I
> > > >>> > >> > > > > > > > > > > > rebased
> > > >>> > >> > > > > > > > > > > > > > it against the latest trunk and
> > updated
> > > >>> the
> > > >>> > zstd
> > > >>> > >> > > > version
> > > >>> > >> > > > > > into
> > > >>> > >> > > > > > > > > > 1.3.5.
> > > >>> > >> > > > > > > > > > > > If you
> > > >>> > >> > > > > > > > > > > > > > need some request, don't hesitate to
> > > >>> notify
> > > >>> > me.
> > > >>> > >> > (But
> > > >>> > >> > > > not
> > > >>> > >> > > > > > this
> > > >>> > >> > > > > > > > > > thread
> > > >>> > >> > > > > > > > > > > -
> > > >>> > >> > > > > > > > > > > > just
> > > >>> > >> > > > > > > > > > > > > > send me the message directly.)
> > > >>> > >> > > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > > Best,
> > > >>> > >> > > > > > > > > > > > > > Dongjin
> > > >>> > >> > > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM
> Bobby
> > > >>> Evans <
> > > >>> > >> > > > > > > bobby@apache.org
> > > >>> > >> > > > > > > > >
> > > >>> > >> > > > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > > >> I there any update on this.  The
> > > >>> performance
> > > >>> > >> > > > > improvements
> > > >>> > >> > > > > > > are
> > > >>> > >> > > > > > > > > > quite
> > > >>> > >> > > > > > > > > > > > > >> impressive and I really would like
> to
> > > >>> stop
> > > >>> > >> forking
> > > >>> > >> > > > kafka
> > > >>> > >> > > > > > > just
> > > >>> > >> > > > > > > > to
> > > >>> > >> > > > > > > > > > get
> > > >>> > >> > > > > > > > > > > > this
> > > >>> > >> > > > > > > > > > > > > >> in.
> > > >>> > >> > > > > > > > > > > > > >>
> > > >>> > >> > > > > > > > > > > > > >> Thanks,
> > > >>> > >> > > > > > > > > > > > > >>
> > > >>> > >> > > > > > > > > > > > > >> Bobby
> > > >>> > >> > > > > > > > > > > > > >>
> > > >>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM
> > Dongjin
> > > >>> Lee <
> > > >>> > >> > > > > > > > dongjin@apache.org
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > > > > >>
> > > >>> > >> > > > > > > > > > > > > >> > Ismael,
> > > >>> > >> > > > > > > > > > > > > >> >
> > > >>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on
> > > working
> > > >>> > frenzy
> > > >>> > >> > for
> > > >>> > >> > > > 2.0!
> > > >>> > >> > > > > > No
> > > >>> > >> > > > > > > > > > problem,
> > > >>> > >> > > > > > > > > > > > take
> > > >>> > >> > > > > > > > > > > > > >> > your time. I am also working at
> > > another
> > > >>> > issue
> > > >>> > >> > now.
> > > >>> > >> > > > > Thank
> > > >>> > >> > > > > > > you
> > > >>> > >> > > > > > > > > for
> > > >>> > >> > > > > > > > > > > > > >> letting me
> > > >>> > >> > > > > > > > > > > > > >> > know.
> > > >>> > >> > > > > > > > > > > > > >> >
> > > >>> > >> > > > > > > > > > > > > >> > Best,
> > > >>> > >> > > > > > > > > > > > > >> > Dongjin
> > > >>> > >> > > > > > > > > > > > > >> >
> > > >>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM
> > Ismael
> > > >>> Juma
> > > >>> > <
> > > >>> > >> > > > > > > > ismaelj@gmail.com
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > > > > >> >
> > > >>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin.
> > > >>> Everyone is
> > > >>> > >> busy
> > > >>> > >> > > > > > finalising
> > > >>> > >> > > > > > > > > > 2.0.0.
> > > >>> > >> > > > > > > > > > > > This
> > > >>> > >> > > > > > > > > > > > > >> KIP
> > > >>> > >> > > > > > > > > > > > > >> > > seems like a great candidate
> for
> > > >>> 2.1.0
> > > >>> > and
> > > >>> > >> > > > hopefully
> > > >>> > >> > > > > > > there
> > > >>> > >> > > > > > > > > > will
> > > >>> > >> > > > > > > > > > > be
> > > >>> > >> > > > > > > > > > > > > >> more
> > > >>> > >> > > > > > > > > > > > > >> > of
> > > >>> > >> > > > > > > > > > > > > >> > > a discussion next week. :)
> > > >>> > >> > > > > > > > > > > > > >> > >
> > > >>> > >> > > > > > > > > > > > > >> > > Ismael
> > > >>> > >> > > > > > > > > > > > > >> > >
> > > >>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17
> > Dongjin
> > > >>> Lee, <
> > > >>> > >> > > > > > > > dongjin@apache.org
> > > >>> > >> > > > > > > > > >
> > > >>> > >> > > > > > > > > > > > wrote:
> > > >>> > >> > > > > > > > > > > > > >> > >
> > > >>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my
> draft
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Dongjin,

The KIP looks good to me. I'd suggest starting a vote. A couple minor
points that might be worth calling out explicitly in the compatibility
section:

1. Zstd will only be allowed for the bumped produce API. For older
versions, we return UNSUPPORTED_COMPRESSION_TYPE regardless of the message
format.
2. Down-conversion of zstd-compressed records will not be supported.
Instead we will return UNSUPPORTED_COMPRESSION_TYPE.

Does that sound right?

Thanks,
Jason
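For concreteness, a rough sketch of a broker-side check for these two rules might look like the following. The class names, method names and version numbers here are illustrative assumptions, not the actual Kafka code.

    // Sketch only.
    enum SketchCodec { NONE, GZIP, SNAPPY, LZ4, ZSTD }
    enum SketchError { NONE, UNSUPPORTED_COMPRESSION_TYPE }

    final class ZstdGateSketch {
        static final short MIN_PRODUCE_VERSION_FOR_ZSTD = 7;  // assumed bumped produce version
        static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;   // assumed bumped fetch version

        // Rule 1: zstd-compressed produce requests are rejected below the bumped version,
        // regardless of the message format in use.
        static SketchError checkProduce(short requestVersion, SketchCodec batchCodec) {
            if (batchCodec == SketchCodec.ZSTD && requestVersion < MIN_PRODUCE_VERSION_FOR_ZSTD)
                return SketchError.UNSUPPORTED_COMPRESSION_TYPE;
            return SketchError.NONE;
        }

        // Rule 2: zstd is never down-converted; an old fetch version gets an error whenever
        // the broker knows zstd is in use (for example, the topic is configured for it).
        static SketchError checkFetch(short requestVersion, SketchCodec topicCodec) {
            if (topicCodec == SketchCodec.ZSTD && requestVersion < MIN_FETCH_VERSION_FOR_ZSTD)
                return SketchError.UNSUPPORTED_COMPRESSION_TYPE;
            return SketchError.NONE;
        }
    }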

On Thu, Sep 6, 2018 at 1:45 AM, Magnus Edenhill <ma...@edenhill.se> wrote:

> > Ismael wrote:
> > Jason, that's an interesting point regarding the Java client. Do we know
> > what clients in other languages do in these cases?
>
> librdkafka (and its bindings) passes unknown/future errors through to the
> application, the error code remains intact while
> the error string will be set to something like "Err-123?", which isn't very
> helpful to the user but it at least
> preserves the original error code for further troubleshooting.
> For the producer any unknown error returned in the ProduceResponse will be
> considered a permanent delivery failure (no retries),
> and for the consumer any unknown FetchResponse errors will propagate
> directly to the application, trigger a fetch backoff, and then
> continue fetching past that offset.
>
> So, from the client's perspective it is not really a problem if new error
> codes are added to older API versions.
>
> /Magnus
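The behaviour Magnus describes could be sketched roughly as follows, in Java only for consistency with the rest of the thread (librdkafka itself is C) and with made-up names:

    // Sketch only: unknown/future error codes are passed through with the raw code
    // preserved ("Err-123?" style), rather than collapsed into a generic UNKNOWN error.
    import java.util.Map;

    final class PassThroughErrorSketch {
        static String describe(short code, Map<Short, String> knownErrors) {
            String known = knownErrors.get(code);
            return known != null ? known : "Err-" + code + "?";
        }
    }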
>
>
> Den tors 6 sep. 2018 kl 09:45 skrev Dongjin Lee <do...@apache.org>:
>
> > I updated the KIP page
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 110%3A+Add+Codec+for+ZStandard+Compression
> > >
> > following the discussion here. Please take a look when you are free.
> > If you have any opinion, don't hesitate to give me a message.
> >
> > Best,
> > Dongjin
> >
> > On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > I just updated the draft implementation[^1], rebasing against the
> latest
> > > trunk and implementing error routine (i.e., Error code 74 for
> > > UnsupportedCompressionTypeException.) Since we decided to disallow all
> > > fetch request below version 2.1.0 for the topics specifying ZStandard,
> I
> > > added an error logic only.
> > >
> > > Please have a look when you are free.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > [^1]: Please check the last commit here:
> > > https://github.com/apache/kafka/pull/2267
> > >
> > > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org> wrote:
> > >
> > >> Jason,
> > >>
> > >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
> > >>
> > >> Best,
> > >> Dongjin
> > >>
> > >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io>
> > >> wrote:
> > >>
> > >>> Hey Dongjin,
> > >>>
> > >>> Yeah that's right. For what it's worth, librdkafka also appears to
> > handle
> > >>> unexpected error codes. I expect that most client implementations
> would
> > >>> either pass through the raw type or convert to an enum using
> something
> > >>> like
> > >>> what the java client does. Since we're expecting the client to fail
> > >>> anyway,
> > >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error
> > >>> code.
> > >>>
> > >>> -Jason
> > >>>
> > >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org>
> > wrote:
> > >>>
> > >>> > Jason and Ismael,
> > >>> >
> > >>> > It seems like the only thing we need to consider if we define a new error
> > >>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementation of
> > >>> > the other language clients, right? At least, this strategy doesn't cause
> > >>> > any problem for the Java client. Do I understand correctly?
> > >>> >
> > >>> > Thanks,
> > >>> > Dongjin
> > >>> >
> > >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org>
> > >>> wrote:
> > >>> >
> > >>> > > Jason,
> > >>> > >
> > >>> > > > I think we would only use this error code when we /know/ that
> > zstd
> > >>> was
> > >>> > > in use and the client doesn't support it? This is true if either
> 1)
> > >>> the
> > >>> > > message needs down-conversion and we encounter a zstd compressed
> > >>> message,
> > >>> > > or 2) if the topic is explicitly configured to use zstd.
> > >>> > >
> > >>> > > Yes, it is right. And you know, the case 1 includes 1.a) old
> > clients'
> > >>> > > request v0, v1 records or 1.b) implicit zstd, the compression
> type
> > of
> > >>> > > "producer" with Zstd compressed data.
> > >>> > >
> > >>> > > > However, if the compression type is set to "producer," then the
> > >>> fetched
> > >>> > > data may or may not be compressed with zstd. In this case, we
> > return
> > >>> the
> > >>> > > data to the client and expect it to fail parsing. Is that
> correct?
> > >>> > >
> > >>> > > Exactly.
> > >>> > >
> > >>> > > Following your message, I reviewed the implementation of
> > >>> > > `KafkaApis#handleFetchRequest,` which handles the fetch request.
> > And
> > >>> > found
> > >>> > > that the information we can use is like the following:
> > >>> > >
> > >>> > > 1. Client's fetch request version. (`versionId` variable)
> > >>> > > 2. Log's compression type. (`logConfig` variable)
> > >>> > >
> > >>> > > We can't detect the actual compression type of the data unless we inspect
> > >>> > > the `RecordBatch` included in the `Records` instance (i.e., the
> > >>> > > `unconvertedRecords` variable). Since that would introduce a performance
> > >>> > > penalty, it is not an option for us - in short, we can't be sure whether
> > >>> > > the given chunks of data are compressed with zstd or not.
> > >>> > >
> > >>> > > So, in conclusion: we can easily return an error in cases 1.a and 2 with
> > >>> > > the information above. In case 1.b (implicit zstd), we can just return the
> > >>> > > data without doing anything special and expect the client to fail parsing it.
> > >>> > >
> > >>> > > Thanks,
> > >>> > > Dongjin
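For illustration, that decision, which uses only the fetch version and the topic's configured compression type, could be sketched like this. The type names and version cutoffs are assumptions, not the actual KafkaApis code.

    // Sketch only: mirrors the conclusion above without inspecting any RecordBatch up front.
    enum BatchCodec { NONE, GZIP, SNAPPY, LZ4, ZSTD }
    enum FetchDecision { RETURN_ERROR, PASS_THROUGH }

    final class FetchDecisionSketch {
        static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;  // assumed bumped fetch version

        // Case 2 (explicit zstd): decidable from the topic config alone.
        static FetchDecision fromTopicConfig(short fetchVersion, String topicCompressionType) {
            if ("zstd".equals(topicCompressionType) && fetchVersion < MIN_FETCH_VERSION_FOR_ZSTD)
                return FetchDecision.RETURN_ERROR;
            // Case 1.b (compression.type=producer): the data may or may not be zstd; it is
            // returned as-is and an old client that meets zstd is expected to fail parsing it.
            return FetchDecision.PASS_THROUGH;
        }

        // Case 1.a (v0/v1 fetch): the batches are rebuilt for down-conversion anyway, so the
        // codec becomes visible at that point and zstd can be rejected there.
        static FetchDecision duringDownConversion(BatchCodec observedCodec) {
            return observedCodec == BatchCodec.ZSTD ? FetchDecision.RETURN_ERROR : FetchDecision.PASS_THROUGH;
        }
    }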
> > >>> > >
> > >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com>
> > >>> wrote:
> > >>> > >
> > >>> > >> Jason, that's an interesting point regarding the Java client. Do
> > we
> > >>> know
> > >>> > >> what clients in other languages do in these cases?
> > >>> > >>
> > >>> > >> Ismael
> > >>> > >>
> > >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <jason@confluent.io
> >
> > >>> wrote:
> > >>> > >>
> > >>> > >> > Hi Dongjin,
> > >>> > >> >
> > >>> > >> > One of the complications is that old versions of the API will
> > not
> > >>> > >> expect a
> > >>> > >> > new error code. However, since we expect this to be a fatal
> > error
> > >>> > anyway
> > >>> > >> > for old clients, it may still be more useful to return the
> > correct
> > >>> > error
> > >>> > >> > code. For example, the Kafka clients use the following code to
> > >>> convert
> > >>> > >> the
> > >>> > >> > error code:
> > >>> > >> >
> > >>> > >> >     public static Errors forCode(short code) {
> > >>> > >> >         Errors error = codeToError.get(code);
> > >>> > >> >         if (error != null) {
> > >>> > >> >             return error;
> > >>> > >> >         } else {
> > >>> > >> >             log.warn("Unexpected error code: {}.", code);
> > >>> > >> >             return UNKNOWN_SERVER_ERROR;
> > >>> > >> >         }
> > >>> > >> >     }
> > >>> > >> >
> > >>> > >> > If we return an unsupported error code, it will be converted
> to
> > an
> > >>> > >> UNKNOWN
> > >>> > >> > error, but at least we will get the message in the log with
> the
> > >>> > correct
> > >>> > >> > code. That seems preferable to returning a misleading error
> > code.
> > >>> So I
> > >>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE
> error
> > >>> even
> > >>> > for
> > >>> > >> > older versions.
> > >>> > >> >
> > >>> > >> > Also, one question just to check my understanding. I think we
> > >>> would
> > >>> > only
> > >>> > >> > use this error code when we /know/ that zstd was in use and
> the
> > >>> client
> > >>> > >> > doesn't support it? This is true if either 1) the message
> needs
> > >>> > >> > down-conversion and we encounter a zstd compressed message, or
> > 2)
> > >>> if
> > >>> > the
> > >>> > >> > topic is explicitly configured to use zstd. However, if the
> > >>> > compression
> > >>> > >> > type is set to "producer," then the fetched data may or may
> not
> > be
> > >>> > >> > compressed with zstd. In this case, we return the data to the
> > >>> client
> > >>> > and
> > >>> > >> > expect it to fail parsing. Is that correct?
> > >>> > >> >
> > >>> > >> > Thanks,
> > >>> > >> > Jason
> > >>> > >> >
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <
> > dongjin@apache.org>
> > >>> > >> wrote:
> > >>> > >> >
> > >>> > >> > > Ismael, Jason and all,
> > >>> > >> > >
> > >>> > >> > > I rewrote the backward compatibility strategy & its
> > alternatives
> > >>> > like
> > >>> > >> > > following, based on Ismael & Jason's comments. Since it is
> not
> > >>> > >> updated to
> > >>> > >> > > the wiki yet, don't hesitate to give me a message if you
> have
> > >>> any
> > >>> > >> opinion
> > >>> > >> > > on it.
> > >>> > >> > >
> > >>> > >> > > ```
> > >>> > >> > > *Backward Compatibility*
> > >>> > >> > >
> > >>> > >> > > We need to establish some backward-compatibility strategy for the
> > >>> > >> > > case where an old client subscribes to a topic using ZStandard
> > >>> > >> > > implicitly (i.e., the 'compression.type' configuration of the given
> > >>> > >> > > topic is 'producer' and the producer compressed the records with
> > >>> > >> > > ZStandard). We have the following options for this situation:
> > >>> > >> > >
> > >>> > >> > > *A. Support ZStandard to the old clients which can
> understand
> > >>> v0, v1
> > >>> > >> > > messages only.*
> > >>> > >> > >
> > >>> > >> > > This strategy necessarily requires the down-conversion of v2
> > >>> message
> > >>> > >> > > compressed with Zstandard into v0 or v1 messages, which
> means
> > a
> > >>> > >> > > considerable performance degradation. So we rejected this
> > >>> strategy.
> > >>> > >> > >
> > >>> > >> > > *B. Bump the API version and support only v2-available
> > clients*
> > >>> > >> > >
> > >>> > >> > > With this approach, we can message the old clients that they
> > >>> are old
> > >>> > >> and
> > >>> > >> > > should be upgraded. However, there are still several options
> > >>> for the
> > >>> > >> > Error
> > >>> > >> > > code.
> > >>> > >> > >
> > >>> > >> > > *B.1. INVALID_REQUEST (42)*
> > >>> > >> > >
> > >>> > >> > > This option gives the client very little information; the user can
> > >>> > >> > > be confused about why a client that worked correctly in the past
> > >>> > >> > > suddenly encounters a problem. So we rejected this strategy.
> > >>> > >> > >
> > >>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> > >>> > >> > >
> > >>> > >> > > This option gives inaccurate information; the user may be surprised
> > >>> > >> > > and mistakenly conclude that the log files are broken in some way.
> > >>> > >> > > So we rejected this strategy.
> > >>> > >> > >
> > >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > >>> > >> > >
> > >>> > >> > > The advantage of this approach is we don't need to define a
> > new
> > >>> > error
> > >>> > >> > code;
> > >>> > >> > > we can reuse it and that's all.
> > >>> > >> > >
> > >>> > >> > > The disadvantage of this approach is that it is also a little bit
> > >>> > >> > > vague; this error code was defined as part of the work for
> > >>> > >> > > KIP-98[^1] and is now returned for transaction errors.
> > >>> > >> > >
> > >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > >>> > >> > >
> > >>> > >> > > The advantage of this approach is that it is clear and
> > provides
> > >>> an
> > >>> > >> exact
> > >>> > >> > > description. The disadvantage is we need to add a new error
> > >>> code.
> > >>> > >> > > ```
> > >>> > >> > >
> > >>> > >> > > *It seems like the choice we now need to make is clear:
> > >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> > >>> > >> > > (B.4).*
> > >>> > >> > > The first one doesn't need a new error code, but the latter is more
> > >>> > >> > > explicit. Which one do you prefer? Since all of you have much more
> > >>> > >> > > experience and knowledge than me, I will follow your decision. The
> > >>> > >> > > wiki page will be updated following the decision as well.
> > >>> > >> > >
> > >>> > >> > > Best,
> > >>> > >> > > Dongjin
> > >>> > >> > >
> > >>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> > >>> > >> > >
> > >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <
> > ismaelj@gmail.com>
> > >>> > >> wrote:
> > >>> > >> > >
> > >>> > >> > > > Sounds reasonable to me.
> > >>> > >> > > >
> > >>> > >> > > > Ismael
> > >>> > >> > > >
> > >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
> > >>> jason@confluent.io>
> > >>> > >> > wrote:
> > >>> > >> > > >
> > >>> > >> > > > > Hey Ismael,
> > >>> > >> > > > >
> > >>> > >> > > > > Your summary looks good to me. I think it might also be
> a
> > >>> good
> > >>> > >> idea
> > >>> > >> > to
> > >>> > >> > > > add
> > >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go
> along
> > >>> with
> > >>> > the
> > >>> > >> > > > version
> > >>> > >> > > > > bumps. We won't be able to use it for old api versions
> > >>> since the
> > >>> > >> > > clients
> > >>> > >> > > > > will not understand it, but we can use it going forward
> so
> > >>> that
> > >>> > >> we're
> > >>> > >> > > not
> > >>> > >> > > > > stuck in a similar situation with a new message format
> and
> > >>> a new
> > >>> > >> > codec
> > >>> > >> > > to
> > >>> > >> > > > > support. Another option is to use
> UNSUPPORTED_FOR_MESSAGE_
> > >>> > FORMAT,
> > >>> > >> but
> > >>> > >> > > it
> > >>> > >> > > > is
> > >>> > >> > > > > not as explicit.
> > >>> > >> > > > >
> > >>> > >> > > > > -Jason
> > >>> > >> > > > >
> > >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> > >>> ismael@juma.me.uk
> > >>> > >
> > >>> > >> > > wrote:
> > >>> > >> > > > >
> > >>> > >> > > > > > Hi Dongjin and Jason,
> > >>> > >> > > > > >
> > >>> > >> > > > > > I would agree. My summary:
> > >>> > >> > > > > >
> > >>> > >> > > > > > 1. Support zstd with message format 2 only.
> > >>> > >> > > > > > 2. Bump produce and fetch request versions.
> > >>> > >> > > > > > 3. Provide broker errors whenever possible based on
> the
> > >>> > request
> > >>> > >> > > version
> > >>> > >> > > > > and
> > >>> > >> > > > > > rely on clients for the cases where the broker can't
> > >>> validate
> > >>> > >> > > > efficiently
> > >>> > >> > > > > > (example message format 2 consumer that supports the
> > >>> latest
> > >>> > >> fetch
> > >>> > >> > > > version
> > >>> > >> > > > > > but doesn't support zstd).
> > >>> > >> > > > > >
> > >>> > >> > > > > > If there's general agreement on this, I suggest we
> > update
> > >>> the
> > >>> > >> KIP
> > >>> > >> > to
> > >>> > >> > > > > state
> > >>> > >> > > > > > the proposal and to move the rejected options to its
> own
> > >>> > >> section.
> > >>> > >> > And
> > >>> > >> > > > > then
> > >>> > >> > > > > > start a vote!
> > >>> > >> > > > > >
> > >>> > >> > > > > > Ismael
> > >>> > >> > > > > >
> > >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> > >>> > >> > jason@confluent.io>
> > >>> > >> > > > > > wrote:
> > >>> > >> > > > > >
> > >>> > >> > > > > > > Hi Dongjin,
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > Yes, that's a good summary. For clients which
> support
> > >>> v2,
> > >>> > the
> > >>> > >> > > client
> > >>> > >> > > > > can
> > >>> > >> > > > > > > parse the message format and hopefully raise a
> useful
> > >>> error
> > >>> > >> > message
> > >>> > >> > > > > > > indicating the unsupported compression type. For
> older
> > >>> > >> clients,
> > >>> > >> > our
> > >>> > >> > > > > > options
> > >>> > >> > > > > > > are probably (1) to down-convert to the old format
> > >>> using no
> > >>> > >> > > > compression
> > >>> > >> > > > > > > type, or (2) to return an error code. I'm leaning
> > >>> toward the
> > >>> > >> > latter
> > >>> > >> > > > as
> > >>> > >> > > > > > the
> > >>> > >> > > > > > > simpler solution, but the challenge is finding a
> good
> > >>> error
> > >>> > >> code.
> > >>> > >> > > Two
> > >>> > >> > > > > > > possibilities might be INVALID_REQUEST or
> > >>> CORRUPT_MESSAGE.
> > >>> > The
> > >>> > >> > > > downside
> > >>> > >> > > > > > is
> > >>> > >> > > > > > > that old clients probably won't get a helpful
> message.
> > >>> > >> However,
> > >>> > >> > at
> > >>> > >> > > > > least
> > >>> > >> > > > > > > the behavior will be consistent in the sense that
> all
> > >>> > clients
> > >>> > >> > will
> > >>> > >> > > > fail
> > >>> > >> > > > > > if
> > >>> > >> > > > > > > they do not support zstandard.
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > What do you think?
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > Thanks,
> > >>> > >> > > > > > > Jason
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> > >>> > >> dongjin@apache.org
> > >>> > >> > >
> > >>> > >> > > > > wrote:
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic
> > >>> > following
> > >>> > >> > your
> > >>> > >> > > > > > > > explanation.[^1] You mean the following routines,
> > >>> right?
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > -
> > >>> > >> > > > > > > >
> > https://github.com/apache/kafka/blob/trunk/core/src/
> > >>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > >>> > >> > > > > > > > -
> > >>> > >> > > > > > > >
> > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > >>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
> > >>> > >> > > LazyDownConversionRecords.
> > >>> > >> > > > > > > > java#L165
> > >>> > >> > > > > > > > -
> > >>> > >> > > > > > > >
> > >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> > >>> > >> > > > > > > >
> > >>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > It seems like your stance is like following:
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > 1. In principle, Kafka does not change the
> > compression
> > >>> > codec
> > >>> > >> > when
> > >>> > >> > > > > > > > down-converting, since it requires inspecting the
> > >>> fetched
> > >>> > >> data,
> > >>> > >> > > > which
> > >>> > >> > > > > > is
> > >>> > >> > > > > > > > expensive.
> > >>> > >> > > > > > > > 2. However, there are some cases the fetched data
> is
> > >>> > >> inspected
> > >>> > >> > > > > anyway.
> > >>> > >> > > > > > In
> > >>> > >> > > > > > > > this case, we can provide compression conversion
> > from
> > >>> > >> Zstandard
> > >>> > >> > > to
> > >>> > >> > > > > > > > classical ones[^2].
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > And from what I understand, the cases where the
> > client
> > >>> > >> without
> > >>> > >> > > > > > ZStandard
> > >>> > >> > > > > > > > support receives ZStandard compressed records can
> be
> > >>> > >> organized
> > >>> > >> > > into
> > >>> > >> > > > > two
> > >>> > >> > > > > > > > cases:
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > a. The 'compression.type' configuration of given
> > >>> topic is
> > >>> > >> > > > 'producer'
> > >>> > >> > > > > > and
> > >>> > >> > > > > > > > the producer compressed the records with
> ZStandard.
> > >>> (that
> > >>> > >> is,
> > >>> > >> > > using
> > >>> > >> > > > > > > > ZStandard implicitly.)
> > >>> > >> > > > > > > > b.  The 'compression.type' configuration of given
> > >>> topic is
> > >>> > >> > > 'zstd';
> > >>> > >> > > > > that
> > >>> > >> > > > > > > is,
> > >>> > >> > > > > > > > using ZStandard explicitly.
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > As you stated, we don't have to handle the case b
> > >>> > specially.
> > >>> > >> > So,
> > >>> > >> > > It
> > >>> > >> > > > > > seems
> > >>> > >> > > > > > > > like we can narrow the focus of the problem by
> > joining
> > >>> > case
> > >>> > >> 1
> > >>> > >> > and
> > >>> > >> > > > > case
> > >>> > >> > > > > > b
> > >>> > >> > > > > > > > like the following:
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > > Given the topic with 'producer' as its
> > >>> > 'compression.type'
> > >>> > >> > > > > > > configuration,
> > >>> > >> > > > > > > > ZStandard compressed records and old client
> without
> > >>> > >> ZStandard,
> > >>> > >> > is
> > >>> > >> > > > > there
> > >>> > >> > > > > > > any
> > >>> > >> > > > > > > > case we need to inspect the records and can change
> > the
> > >>> > >> > > compression
> > >>> > >> > > > > > type?
> > >>> > >> > > > > > > If
> > >>> > >> > > > > > > > so, can we provide compression type converting?
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > Do I understand correctly?
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > Best,
> > >>> > >> > > > > > > > Dongjin
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
> > >>> > >> > misunderstanding
> > >>> > >> > > > how
> > >>> > >> > > > > > API
> > >>> > >> > > > > > > > version works, after reviewing the downconvert
> logic
> > >>> & the
> > >>> > >> > > protocol
> > >>> > >> > > > > > > > documentation <https://kafka.apache.org/protocol
> >.
> > >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> > >>> > >> > > > jason@confluent.io>
> > >>> > >> > > > > > > > wrote:
> > >>> > >> > > > > > > >
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > But in my opinion, since the client will fail
> > >>> with the
> > >>> > >> API
> > >>> > >> > > > > version,
> > >>> > >> > > > > > > so
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > > don't need to down-convert the messages
> anyway.
> > >>> Isn't
> > >>> > >> it?
> > >>> > >> > > So, I
> > >>> > >> > > > > > think
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am
> not
> > >>> > >> familiar
> > >>> > >> > > with
> > >>> > >> > > > > > > > > down-convert
> > >>> > >> > > > > > > > > > logic.)
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > Currently the broker down-converts automatically
> > >>> when it
> > >>> > >> > > receives
> > >>> > >> > > > > an
> > >>> > >> > > > > > > old
> > >>> > >> > > > > > > > > version of the fetch request (a version which is
> > >>> known
> > >>> > to
> > >>> > >> > > predate
> > >>> > >> > > > > the
> > >>> > >> > > > > > > > > message format in use). Typically when
> > >>> down-converting
> > >>> > the
> > >>> > >> > > > message
> > >>> > >> > > > > > > > format,
> > >>> > >> > > > > > > > > we use the same compression type, but there is
> not
> > >>> much
> > >>> > >> point
> > >>> > >> > > in
> > >>> > >> > > > > > doing
> > >>> > >> > > > > > > so
> > >>> > >> > > > > > > > > when we know the client doesn't support it. So
> if
> > >>> > >> zstandard
> > >>> > >> > is
> > >>> > >> > > in
> > >>> > >> > > > > > use,
> > >>> > >> > > > > > > > and
> > >>> > >> > > > > > > > > we have to down-convert anyway, then we can
> choose
> > >>> to
> > >>> > use
> > >>> > >> a
> > >>> > >> > > > > different
> > >>> > >> > > > > > > > > compression type or no compression type.
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > From my perspective, there is no significant
> > >>> downside to
> > >>> > >> > > bumping
> > >>> > >> > > > > the
> > >>> > >> > > > > > > > > protocol version and it has several potential
> > >>> benefits.
> > >>> > >> > Version
> > >>> > >> > > > > bumps
> > >>> > >> > > > > > > are
> > >>> > >> > > > > > > > > cheap. The main question mark in my mind is
> about
> > >>> > >> > > > down-conversion.
> > >>> > >> > > > > > > > Figuring
> > >>> > >> > > > > > > > > out whether down-conversion is needed is hard
> > >>> generally
> > >>> > >> > without
> > >>> > >> > > > > > > > inspecting
> > >>> > >> > > > > > > > > the fetched data, which is expensive. I think we
> > >>> agree
> > >>> > in
> > >>> > >> > > > principle
> > >>> > >> > > > > > > that
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > do not want to have to pay this cost generally
> and
> > >>> > prefer
> > >>> > >> the
> > >>> > >> > > > > clients
> > >>> > >> > > > > > > to
> > >>> > >> > > > > > > > > fail when they see an unhandled compression
> type.
> > >>> The
> > >>> > >> point I
> > >>> > >> > > was
> > >>> > >> > > > > > > making
> > >>> > >> > > > > > > > is
> > >>> > >> > > > > > > > > that there are some cases where we are either
> > >>> inspecting
> > >>> > >> the
> > >>> > >> > > data
> > >>> > >> > > > > > > anyway
> > >>> > >> > > > > > > > > (because we have to down-convert the message
> > >>> format), or
> > >>> > >> we
> > >>> > >> > > have
> > >>> > >> > > > an
> > >>> > >> > > > > > > easy
> > >>> > >> > > > > > > > > way to tell whether zstandard is in use (the
> topic
> > >>> has
> > >>> > it
> > >>> > >> > > > > configured
> > >>> > >> > > > > > > > > explicitly). In the latter case, we don't have
> to
> > >>> handle
> > >>> > >> it
> > >>> > >> > > > > > specially.
> > >>> > >> > > > > > > > But
> > >>> > >> > > > > > > > > we do have to decide how we will handle
> > >>> down-conversion
> > >>> > to
> > >>> > >> > > older
> > >>> > >> > > > > > > formats.
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > -Jason
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> > >>> > >> > > dongjin@apache.org
> > >>> > >> > > > >
> > >>> > >> > > > > > > wrote:
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > > Colin and Jason,
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the
> > >>> Pros and
> > >>> > >> Cons
> > >>> > >> > > of
> > >>> > >> > > > > > > bumping
> > >>> > >> > > > > > > > > > fetch API version are:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > Cons:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > - The Broker can't know whether a given
> message
> > >>> batch
> > >>> > is
> > >>> > >> > > > > compressed
> > >>> > >> > > > > > > > with
> > >>> > >> > > > > > > > > > zstd or not.
> > >>> > >> > > > > > > > > > - Need some additional logic for the topic
> > >>> explicitly
> > >>> > >> > > > configured
> > >>> > >> > > > > to
> > >>> > >> > > > > > > use
> > >>> > >> > > > > > > > > > zstd.
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > Pros:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > - The broker doesn't need to conduct expensive
> > >>> > >> > > down-conversion.
> > >>> > >> > > > > > > > > > - Can message the users to update their
> client.
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > So, opinions for the backward-compatibility
> > >>> policy by
> > >>> > >> far:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > Here are my additional comments:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > @Colin
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > I greatly appreciate your response. In the
> case
> > >>> of the
> > >>> > >> > > > dictionary
> > >>> > >> > > > > > > > > support,
> > >>> > >> > > > > > > > > > of course, this issue should be addressed
> later
> > >>> so we
> > >>> > >> don't
> > >>> > >> > > > need
> > >>> > >> > > > > it
> > >>> > >> > > > > > > in
> > >>> > >> > > > > > > > > the
> > >>> > >> > > > > > > > > > first version. You are right - it is not late
> to
> > >>> try
> > >>> > it
> > >>> > >> > after
> > >>> > >> > > > > some
> > >>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in
> > >>> mind on
> > >>> > >> that
> > >>> > >> > > > > > potential
> > >>> > >> > > > > > > > > > feature.
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > @Jason
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > You wrote,
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway
> > >>> because
> > >>> > >> the
> > >>> > >> > > > client
> > >>> > >> > > > > > > does
> > >>> > >> > > > > > > > > not
> > >>> > >> > > > > > > > > > understand the message format, then we could
> > also
> > >>> use
> > >>> > a
> > >>> > >> > > > different
> > >>> > >> > > > > > > > > > compression type.
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > But in my opinion, since the client will fail
> > >>> with the
> > >>> > >> API
> > >>> > >> > > > > version,
> > >>> > >> > > > > > > so
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > > don't need to down-convert the messages
> anyway.
> > >>> Isn't
> > >>> > >> it?
> > >>> > >> > > So, I
> > >>> > >> > > > > > think
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am
> not
> > >>> > >> familiar
> > >>> > >> > > with
> > >>> > >> > > > > > > > > down-convert
> > >>> > >> > > > > > > > > > logic.)
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > Please give more opinions. Thanks!
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > - Dongjin
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason
> Gustafson <
> > >>> > >> > > > > jason@confluent.io
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > > > wrote:
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > > Hey Colin,
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > The problem for the fetch API is that the
> > broker
> > >>> > does
> > >>> > >> not
> > >>> > >> > > > > > generally
> > >>> > >> > > > > > > > > know
> > >>> > >> > > > > > > > > > if
> > >>> > >> > > > > > > > > > > a batch was compressed with zstd unless it
> > >>> parses
> > >>> > it.
> > >>> > >> I
> > >>> > >> > > think
> > >>> > >> > > > > the
> > >>> > >> > > > > > > > goal
> > >>> > >> > > > > > > > > > here
> > >>> > >> > > > > > > > > > > is to avoid the expensive down-conversion
> that
> > >>> is
> > >>> > >> needed
> > >>> > >> > to
> > >>> > >> > > > > > ensure
> > >>> > >> > > > > > > > > > > compatibility because it is only necessary
> if
> > >>> zstd
> > >>> > is
> > >>> > >> > > > actually
> > >>> > >> > > > > in
> > >>> > >> > > > > > > > use.
> > >>> > >> > > > > > > > > > But
> > >>> > >> > > > > > > > > > > as long as old clients can parse the message
> > >>> format,
> > >>> > >> they
> > >>> > >> > > > > should
> > >>> > >> > > > > > > get
> > >>> > >> > > > > > > > a
> > >>> > >> > > > > > > > > > > reasonable error if they see an unsupported
> > >>> > >> compression
> > >>> > >> > > type
> > >>> > >> > > > in
> > >>> > >> > > > > > the
> > >>> > >> > > > > > > > > > > attributes. Basically the onus is on users
> to
> > >>> ensure
> > >>> > >> that
> > >>> > >> > > > their
> > >>> > >> > > > > > > > > consumers
> > >>> > >> > > > > > > > > > > have been updated prior to using zstd. It
> > seems
> > >>> > like a
> > >>> > >> > > > > reasonable
> > >>> > >> > > > > > > > > > tradeoff
> > >>> > >> > > > > > > > > > > to me. There are a couple cases that might
> be
> > >>> worth
> > >>> > >> > > thinking
> > >>> > >> > > > > > > through:
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to
> use
> > >>> zstd,
> > >>> > >> then
> > >>> > >> > we
> > >>> > >> > > > > don't
> > >>> > >> > > > > > > > need
> > >>> > >> > > > > > > > > to
> > >>> > >> > > > > > > > > > > check the fetched data for the compression
> > type
> > >>> to
> > >>> > >> know
> > >>> > >> > if
> > >>> > >> > > we
> > >>> > >> > > > > > need
> > >>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch
> API
> > >>> > version,
> > >>> > >> > then
> > >>> > >> > > > we
> > >>> > >> > > > > > > could
> > >>> > >> > > > > > > > > > handle
> > >>> > >> > > > > > > > > > > this case by either down-converting using a
> > >>> > different
> > >>> > >> > > > > compression
> > >>> > >> > > > > > > > type
> > >>> > >> > > > > > > > > or
> > >>> > >> > > > > > > > > > > returning an error.
> > >>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert
> > anyway
> > >>> > >> because
> > >>> > >> > the
> > >>> > >> > > > > > client
> > >>> > >> > > > > > > > does
> > >>> > >> > > > > > > > > > not
> > >>> > >> > > > > > > > > > > understand the message format, then we could
> > >>> also
> > >>> > use
> > >>> > >> a
> > >>> > >> > > > > different
> > >>> > >> > > > > > > > > > > compression type.
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > For the produce API, I think it's reasonable
> > to
> > >>> bump
> > >>> > >> the
> > >>> > >> > > api
> > >>> > >> > > > > > > version.
> > >>> > >> > > > > > > > > > This
> > >>> > >> > > > > > > > > > > can be used by clients to check whether a
> > broker
> > >>> > >> supports
> > >>> > >> > > > zstd.
> > >>> > >> > > > > > For
> > >>> > >> > > > > > > > > > > example, we might support a list of
> preferred
> > >>> > >> compression
> > >>> > >> > > > types
> > >>> > >> > > > > > in
> > >>> > >> > > > > > > > the
> > >>> > >> > > > > > > > > > > producer and we could use the broker to
> detect
> > >>> which
> > >>> > >> > > version
> > >>> > >> > > > to
> > >>> > >> > > > > > > use.
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > -Jason
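The "list of preferred compression types" idea above could, hypothetically, look like the sketch below: the producer walks an ordered preference list and picks the first codec the broker's advertised produce version is assumed to accept. None of these names are real producer APIs, and the minimum produce version for zstd is an assumption for illustration only.

```
import java.util.Arrays;
import java.util.List;

// Purely hypothetical sketch of a producer choosing a compression codec from
// an ordered preference list, based on the broker's supported produce version.
public class PreferredCompressionSketch {

    // Assumed minimum produce request version that accepts each codec;
    // zstd requiring a newer version is the point of the discussion above.
    static short minProduceVersionFor(String codec) {
        return "zstd".equals(codec) ? (short) 7 : (short) 0;
    }

    static String choose(List<String> preferred, short brokerMaxProduceVersion) {
        for (String codec : preferred) {
            if (brokerMaxProduceVersion >= minProduceVersionFor(codec)) {
                return codec; // first preference the broker can accept
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        List<String> prefs = Arrays.asList("zstd", "lz4");
        System.out.println(choose(prefs, (short) 5)); // older broker -> lz4
        System.out.println(choose(prefs, (short) 7)); // newer broker -> zstd
    }
}
```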
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin
> McCabe <
> > >>> > >> > > > > cmccabe@apache.org
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd
> is a
> > >>> good
> > >>> > >> > > > compression
> > >>> > >> > > > > > > codec
> > >>> > >> > > > > > > > > > and I
> > >>> > >> > > > > > > > > > > > hope we can get this support in soon!
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > I would say we can just bump the API
> version
> > >>> to
> > >>> > >> > indicate
> > >>> > >> > > > that
> > >>> > >> > > > > > > ZStd
> > >>> > >> > > > > > > > > > > support
> > >>> > >> > > > > > > > > > > > is expected in new clients.  We probably
> > need
> > >>> some
> > >>> > >> way
> > >>> > >> > of
> > >>> > >> > > > > > > > indicating
> > >>> > >> > > > > > > > > to
> > >>> > >> > > > > > > > > > > the
> > >>> > >> > > > > > > > > > > > older clients that they can't consume the
> > >>> > >> partitions,
> > >>> > >> > as
> > >>> > >> > > > > well.
> > >>> > >> > > > > > > > > Perhaps
> > >>> > >> > > > > > > > > > > we
> > >>> > >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT
> > >>> error?
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > The license thing seems straightforward --
> > >>> it's
> > >>> > >> just a
> > >>> > >> > > > matter
> > >>> > >> > > > > > of
> > >>> > >> > > > > > > > > adding
> > >>> > >> > > > > > > > > > > > the text to the right files as per ASF
> > >>> guidelines.
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > With regard to the dictionary support, do
> we
> > >>> > really
> > >>> > >> > need
> > >>> > >> > > > that
> > >>> > >> > > > > > in
> > >>> > >> > > > > > > > the
> > >>> > >> > > > > > > > > > > first
> > >>> > >> > > > > > > > > > > > version?  Hopefully message batches are
> big
> > >>> enough
> > >>> > >> that
> > >>> > >> > > > this
> > >>> > >> > > > > > > isn't
> > >>> > >> > > > > > > > > > > needed.
> > >>> > >> > > > > > > > > > > > Some benchmarks might help here.
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > best,
> > >>> > >> > > > > > > > > > > > Colin
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee
> > >>> wrote:
> > >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's
> reboot
> > >>> this
> > >>> > >> issue,
> > >>> > >> > > > > KIP-110
> > >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
> > >>> > >> > > confluence/display/KAFKA/KIP-
> > >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+
> ZStandard+Compression>
> > >>> > >> > > > > > > > > > > > > .
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of
> the
> > >>> > >> history:
> > >>> > >> > > > KIP-110
> > >>> > >> > > > > > was
> > >>> > >> > > > > > > > > > > > originally
> > >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it
> > >>> lacked
> > >>> > >> > > benchmark
> > >>> > >> > > > > > > results
> > >>> > >> > > > > > > > to
> > >>> > >> > > > > > > > > > get
> > >>> > >> > > > > > > > > > > > the
> > >>> > >> > > > > > > > > > > > > agreement of the community. Later, Ivan
> > >>> Babrou
> > >>> > and
> > >>> > >> > some
> > >>> > >> > > > > other
> > >>> > >> > > > > > > > users
> > >>> > >> > > > > > > > > > who
> > >>> > >> > > > > > > > > > > > > adopted the patch provided their
> excellent
> > >>> > >> > performance
> > >>> > >> > > > > report
> > >>> > >> > > > > > > > which
> > >>> > >> > > > > > > > > > is
> > >>> > >> > > > > > > > > > > > now
> > >>> > >> > > > > > > > > > > > > included in the KIP, but it postponed
> > again
> > >>> > >> because
> > >>> > >> > of
> > >>> > >> > > > the
> > >>> > >> > > > > > > > > community
> > >>> > >> > > > > > > > > > > was
> > >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now
> > >>> reboot
> > >>> > >> this
> > >>> > >> > > > issue.
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > The following is the current status of
> the
> > >>> > >> feature:
> > >>> > >> > You
> > >>> > >> > > > can
> > >>> > >> > > > > > > check
> > >>> > >> > > > > > > > > the
> > >>> > >> > > > > > > > > > > > > current draft implementation here
> > >>> > >> > > > > > > > > > > > > <
> > https://github.com/apache/kafka/pull/2267>.
> > >>> It
> > >>> > >> is
> > >>> > >> > > based
> > >>> > >> > > > > on
> > >>> > >> > > > > > > zstd
> > >>> > >> > > > > > > > > > 1.3.5
> > >>> > >> > > > > > > > > > > > and
> > >>> > >> > > > > > > > > > > > > periodically rebased onto the latest
> > >>> trunk[^1].
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > The issues that should be addressed is
> > like
> > >>> > >> > following:
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > To support old consumers, we need to
> take
> > a
> > >>> > >> strategy
> > >>> > >> > to
> > >>> > >> > > > > > handle
> > >>> > >> > > > > > > > the
> > >>> > >> > > > > > > > > > old
> > >>> > >> > > > > > > > > > > > > consumers. Current candidates are:
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > - Bump API version
> > >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients
> > fail.
> > >>> > >> > > > > > > > > > > > > - Improve the error messages:
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > *2. Dictionary Support*
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > To support zstd's dictionary feature in
> > the
> > >>> > future
> > >>> > >> > (if
> > >>> > >> > > > > > needed),
> > >>> > >> > > > > > > > we
> > >>> > >> > > > > > > > > > need
> > >>> > >> > > > > > > > > > > > to
> > >>> > >> > > > > > > > > > > > > sketch how it should be and leave some
> > room
> > >>> for
> > >>> > >> it.
> > >>> > >> > As
> > >>> > >> > > of
> > >>> > >> > > > > > now,
> > >>> > >> > > > > > > > > there
> > >>> > >> > > > > > > > > > > has
> > >>> > >> > > > > > > > > > > > > been no discussion on this topic yet.
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > *3. License*
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > To use this feature, we need to add
> > license
> > >>> of
> > >>> > >> zstd
> > >>> > >> > and
> > >>> > >> > > > > > > zstd-jni
> > >>> > >> > > > > > > > to
> > >>> > >> > > > > > > > > > the
> > >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for
> > >>> raising
> > >>> > >> this
> > >>> > >> > > > issue!)
> > >>> > >> > > > > > It
> > >>> > >> > > > > > > > > seems
> > >>> > >> > > > > > > > > > > like
> > >>> > >> > > > > > > > > > > > > what Apache Spark did would be a good
> > >>> example
> > >>> > but
> > >>> > >> > there
> > >>> > >> > > > has
> > >>> > >> > > > > > > been
> > >>> > >> > > > > > > > no
> > >>> > >> > > > > > > > > > > > > discussion yet.
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > You can find the details of the above
> > >>> issues in
> > >>> > >> the
> > >>> > >> > KIP
> > >>> > >> > > > > > > document.
> > >>> > >> > > > > > > > > > > Please
> > >>> > >> > > > > > > > > > > > > have a look when you are free, and give
> me
> > >>> > >> feedback.
> > >>> > >> > > All
> > >>> > >> > > > > > kinds
> > >>> > >> > > > > > > of
> > >>> > >> > > > > > > > > > > > > participating are welcome.
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > Best,
> > >>> > >> > > > > > > > > > > > > Dongjin
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
> > >>> 6b4fb8152.
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin
> > >>> Lee <
> > >>> > >> > > > > > > dongjin@apache.org
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > > Sorry for the late reply.
> > >>> > >> > > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > > In short, I could not submit the
> updated
> > >>> KIP
> > >>> > by
> > >>> > >> the
> > >>> > >> > > > > feature
> > >>> > >> > > > > > > > > freeze
> > >>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it
> > >>> will
> > >>> > not
> > >>> > >> be
> > >>> > >> > > > > included
> > >>> > >> > > > > > > in
> > >>> > >> > > > > > > > > the
> > >>> > >> > > > > > > > > > > > 2.0.0
> > >>> > >> > > > > > > > > > > > > > release and all discussion for this
> > issue
> > >>> were
> > >>> > >> > > > postponed
> > >>> > >> > > > > > > after
> > >>> > >> > > > > > > > > the
> > >>> > >> > > > > > > > > > > > release
> > >>> > >> > > > > > > > > > > > > > of 2.0.0.
> > >>> > >> > > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > > I have been updating the PR following
> > >>> recent
> > >>> > >> > updates.
> > >>> > >> > > > > Just
> > >>> > >> > > > > > > > now, I
> > >>> > >> > > > > > > > > > > > rebased
> > >>> > >> > > > > > > > > > > > > > it against the latest trunk and
> updated
> > >>> the
> > >>> > zstd
> > >>> > >> > > > version
> > >>> > >> > > > > > into
> > >>> > >> > > > > > > > > > 1.3.5.
> > >>> > >> > > > > > > > > > > > If you
> > >>> > >> > > > > > > > > > > > > > need some request, don't hesitate to
> > >>> notify
> > >>> > me.
> > >>> > >> > (But
> > >>> > >> > > > not
> > >>> > >> > > > > > this
> > >>> > >> > > > > > > > > > thread
> > >>> > >> > > > > > > > > > > -
> > >>> > >> > > > > > > > > > > > just
> > >>> > >> > > > > > > > > > > > > > send me the message directly.)
> > >>> > >> > > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > > Best,
> > >>> > >> > > > > > > > > > > > > > Dongjin
> > >>> > >> > > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby
> > >>> Evans <
> > >>> > >> > > > > > > bobby@apache.org
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > >> I there any update on this.  The
> > >>> performance
> > >>> > >> > > > > improvements
> > >>> > >> > > > > > > are
> > >>> > >> > > > > > > > > > quite
> > >>> > >> > > > > > > > > > > > > >> impressive and I really would like to
> > >>> stop
> > >>> > >> forking
> > >>> > >> > > > kafka
> > >>> > >> > > > > > > just
> > >>> > >> > > > > > > > to
> > >>> > >> > > > > > > > > > get
> > >>> > >> > > > > > > > > > > > this
> > >>> > >> > > > > > > > > > > > > >> in.
> > >>> > >> > > > > > > > > > > > > >>
> > >>> > >> > > > > > > > > > > > > >> Thanks,
> > >>> > >> > > > > > > > > > > > > >>
> > >>> > >> > > > > > > > > > > > > >> Bobby
> > >>> > >> > > > > > > > > > > > > >>
> > >>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM
> Dongjin
> > >>> Lee <
> > >>> > >> > > > > > > > dongjin@apache.org
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > > > > >>
> > >>> > >> > > > > > > > > > > > > >> > Ismael,
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on
> > working
> > >>> > frenzy
> > >>> > >> > for
> > >>> > >> > > > 2.0!
> > >>> > >> > > > > > No
> > >>> > >> > > > > > > > > > problem,
> > >>> > >> > > > > > > > > > > > take
> > >>> > >> > > > > > > > > > > > > >> > your time. I am also working at
> > another
> > >>> > issue
> > >>> > >> > now.
> > >>> > >> > > > > Thank
> > >>> > >> > > > > > > you
> > >>> > >> > > > > > > > > for
> > >>> > >> > > > > > > > > > > > > >> letting me
> > >>> > >> > > > > > > > > > > > > >> > know.
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> > Best,
> > >>> > >> > > > > > > > > > > > > >> > Dongjin
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM
> Ismael
> > >>> Juma
> > >>> > <
> > >>> > >> > > > > > > > ismaelj@gmail.com
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin.
> > >>> Everyone is
> > >>> > >> busy
> > >>> > >> > > > > > finalising
> > >>> > >> > > > > > > > > > 2.0.0.
> > >>> > >> > > > > > > > > > > > This
> > >>> > >> > > > > > > > > > > > > >> KIP
> > >>> > >> > > > > > > > > > > > > >> > > seems like a great candidate for
> > >>> 2.1.0
> > >>> > and
> > >>> > >> > > > hopefully
> > >>> > >> > > > > > > there
> > >>> > >> > > > > > > > > > will
> > >>> > >> > > > > > > > > > > be
> > >>> > >> > > > > > > > > > > > > >> more
> > >>> > >> > > > > > > > > > > > > >> > of
> > >>> > >> > > > > > > > > > > > > >> > > a discussion next week. :)
> > >>> > >> > > > > > > > > > > > > >> > >
> > >>> > >> > > > > > > > > > > > > >> > > Ismael
> > >>> > >> > > > > > > > > > > > > >> > >
> > >>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17
> Dongjin
> > >>> Lee, <
> > >>> > >> > > > > > > > dongjin@apache.org
> > >>> > >> > > > > > > > > >
> > >>> > >> > > > > > > > > > > > wrote:
> > >>> > >> > > > > > > > > > > > > >> > >
> > >>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
> > >>> > >> > implementation:
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk
> > (commit
> > >>> > >> 5145d6b)
> > >>> > >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > You can check out the
> > >>> implementation
> > >>> > from
> > >>> > >> > here
> > >>> > >> > > > > > > > > > > > > >> > > > <
> > >>> > >> https://github.com/apache/kafka/pull/2267
> > >>> > >> > >.
> > >>> > >> > > If
> > >>> > >> > > > > you
> > >>> > >> > > > > > > > > > > experience
> > >>> > >> > > > > > > > > > > > any
> > >>> > >> > > > > > > > > > > > > >> > > problem
> > >>> > >> > > > > > > > > > > > > >> > > > running it, don't hesitate to
> > give
> > >>> me a
> > >>> > >> > > mention.
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > Best,
> > >>> > >> > > > > > > > > > > > > >> > > > Dongjin
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM
> > >>> Dongjin
> > >>> > >> Lee
> > >>> > >> > <
> > >>> > >> > > > > > > > > > > dongjin@apache.org
> > >>> > >> > > > > > > > > > > > >
> > >>> > >> > > > > > > > > > > > > >> > wrote:
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > Here is the short conclusion
> > >>> about
> > >>> > the
> > >>> > >> > > license
> > >>> > >> > > > > > > > problem:
> > >>> > >> > > > > > > > > > *We
> > >>> > >> > > > > > > > > > > > can
> > >>> > >> > > > > > > > > > > > > >> use
> > >>> > >> > > > > > > > > > > > > >> > > zstd
> > >>> > >> > > > > > > > > > > > > >> > > > > and zstd-jni without any
> > >>> problem, but
> > >>> > >> we
> > >>> > >> > > need
> > >>> > >> > > > to
> > >>> > >> > > > > > > > include
> > >>> > >> > > > > > > > > > > their
> > >>> > >> > > > > > > > > > > > > >> > license,
> > >>> > >> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License
> &
> > 3
> > >>> > Clause
> > >>> > >> > > > License
> > >>> > >> > > > > > > > requires
> > >>> > >> > > > > > > > > > to
> > >>> > >> > > > > > > > > > > > > >> include
> > >>> > >> > > > > > > > > > > > > >> > the
> > >>> > >> > > > > > > > > > > > > >> > > > > license used, and BSD 3
> Clause
> > >>> > License
> > >>> > >> > > > requires
> > >>> > >> > > > > > that
> > >>> > >> > > > > > > > the
> > >>> > >> > > > > > > > > > > name
> > >>> > >> > > > > > > > > > > > of
> > >>> > >> > > > > > > > > > > > > >> the
> > >>> > >> > > > > > > > > > > > > >> > > > > contributor can't be used to
> > >>> endorse
> > >>> > or
> > >>> > >> > > > promote
> > >>> > >> > > > > > the
> > >>> > >> > > > > > > > > > product.
> > >>> > >> > > > > > > > > > > > > >> That's
> > >>> > >> > > > > > > > > > > > > >> > it
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > >
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >>
> > >>> > >> > http://www.mikestratton.net/2011/12/is-bsd-license-
> > >>> > >> > > > > > > > > > > > compatible-with-apache-2-0-license/
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > - They are not listed in the
> > >>> list of
> > >>> > >> > > > prohibited
> > >>> > >> > > > > > > > licenses
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> https://www.apache.org/legal/
> > >>> > >> > > > > > > > resolved.html#category-x>
> > >>> > >> > > > > > > > > > > also.
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
> > >>> > >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
> > >>> > >> > > > > > jira/browse/SPARK-19112
> > >>> > >> > > > > > > >:
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > - They made a directory
> > >>> dedicated to
> > >>> > >> the
> > >>> > >> > > > > > dependency
> > >>> > >> > > > > > > > > > license
> > >>> > >> > > > > > > > > > > > files
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> > >>> > >> > > > > > > https://github.com/apache/
> spark/tree/master/licenses
> > >>> > >> > > > > > > > >
> > >>> > >> > > > > > > > > > and
> > >>> > >> > > > > > > > > > > > added
> > >>> > >> > > > > > > > > > > > > >> > > > licenses
> > >>> > >> > > > > > > > > > > > > >> > > > > for Zstd
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > https://github.com/apache/spark/blob/master/licenses/
> > >>> > >> > > > > > > > > > > > LICENSE-zstd.txt
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > > > &
> > >>> > >> > > > > > > > > > > > > >> > > > > Zstd-jni
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> > >>> > >> > > > > > > > > > > > > >> > > >
> > >>> > >> > > > > > > > > > > > > >> > >
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> https://github.com/apache/
> > >>> > >> > > spark/blob/master/licenses/
> > >>> > >> > > > > > > > > > > > LICENSE-zstd-jni.txt
> > >>> > >> > > > > > > > > > > > > >> >
> > >>> > >> > > > > > > > > > > > > >> > > > > .
> > >>> > >> > > > > > > > > > > > > >> > > > > - Added a link to the
> original
> > >>> > license
> > >>> > >> > files
> > >>> > >> > > > in
> > >>> > >> > > > > > > > LICENSE.
> > >>> > >> > > > > > > > > > > > > >> > > > > <
> > >>> > >> > > > > https://github.com/apache/spark/pull/18805/files
> > >>> > >> > > > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > If needed, I can make a
> similar
> > >>> > update.
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this
> > >>> problem,
> > >>> > >> > > Viktor!
> > >>> > >> > > > > Nice
> > >>> > >> > > > > > > > > catch!
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > Best,
> > >>> > >> > > > > > > > > > > > > >> > > > > Dongjin
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50
> > PM
> > >>> > >> Dongjin
> > >>> > >> > > Lee <
> > >>> > >> > > > > > > > > > > > dongjin@apache.org>
> > >>> > >> > > > > > > > > > > > > >> > > wrote:
> > >>> > >> > > > > > > > > > > > > >> > > > >
> > >>> > >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your
> > >>> > >> comprehensive
> > >>> > >> > > > > > reasoning.
> > >>> > >> > > > > > > > so:
> > >>> > >> > > > > > > > > +1
> > >>> > >> > > > > > > > > > > > for b
> > >>> > >> > > > > > > > > > > > > >> > until
> > >>> > >> > > > > > > > > > > > > >> > > > now.
> > >>> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Magnus Edenhill <ma...@edenhill.se>.
> Ismael wrote:
> Jason, that's an interesting point regarding the Java client. Do we know
> what clients in other languages do in these cases?

librdkafka (and its bindings) passes unknown/future errors through to the
application, the error code remains intact while
the error string will be set to something like "Err-123?", which isn't very
helpful to the user but it at least
preserves the original error code for further troubleshooting.
For the producer any unknown error returned in the ProduceResponse will be
considered a permanent delivery failure (no retries),
and for the consumer any unknown FetchResponse errors will propagate
directly to the application, trigger a fetch backoff, and then
continue fetching past that offset.

So, from the client's perspective it is not really a problem if new error
codes are added to older API versions.

/Magnus
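To make the fallback Magnus describes concrete, here is a minimal, self-contained Java sketch of how a client might preserve an unrecognized error code instead of dropping it. The specific codes and the "Err-<code>?" string mirror the behavior described above; the class and method names are illustrative, not an actual client API.

```
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a client that does not recognize an error code
// keeps the raw code and surfaces a placeholder description, in the spirit
// of the librdkafka behavior described above. Not real client code.
public class UnknownErrorCodeSketch {

    // Codes this (hypothetical, older) client was built with.
    private static final Map<Short, String> KNOWN = new HashMap<>();
    static {
        KNOWN.put((short) 0, "NONE");
        KNOWN.put((short) 2, "CORRUPT_MESSAGE");
        KNOWN.put((short) 42, "INVALID_REQUEST");
        KNOWN.put((short) 43, "UNSUPPORTED_FOR_MESSAGE_FORMAT");
    }

    static String describe(short code) {
        // Preserve the original code even when it is not understood,
        // e.g. when a newer broker starts returning a brand-new error.
        String name = KNOWN.get(code);
        return name != null ? name : "Err-" + code + "?";
    }

    public static void main(String[] args) {
        System.out.println(describe((short) 43)); // known: UNSUPPORTED_FOR_MESSAGE_FORMAT
        System.out.println(describe((short) 74)); // unknown to this client: Err-74?
    }
}
```

The important property is simply that the original numeric code survives into logs, which is what makes adding new error codes to older API versions tolerable.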


Den tors 6 sep. 2018 kl 09:45 skrev Dongjin Lee <do...@apache.org>:

> I updated the KIP page
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> >
> following the discussion here. Please take a look when you are free.
> If you have any opinion, don't hesitate to give me a message.
>
> Best,
> Dongjin
>
> On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org> wrote:
>
> > I just updated the draft implementation[^1], rebasing against the latest
> > trunk and implementing the error routine (i.e., error code 74 for
> > UnsupportedCompressionTypeException). Since we decided to disallow all
> > fetch requests below version 2.1.0 for topics specifying ZStandard, I
> > added only the error logic.
> >
> > Please have a look when you are free.
> >
> > Thanks,
> > Dongjin
> >
> > [^1]: Please check the last commit here:
> > https://github.com/apache/kafka/pull/2267
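For readers following the draft PR above, a rough sketch of the broker-side gate it implies might look like the following. This is plain Java pseudologic, not the actual KafkaApis code (which is Scala); the error code value 74 comes from the message above, while the minimum fetch version is an assumption for illustration.

```
// Hypothetical sketch of the rule discussed above: a topic explicitly
// configured with zstd rejects fetch requests from clients whose fetch
// version predates zstd support. Not the actual broker implementation.
public class ZstdFetchGate {

    static final short NONE = 0;
    static final short UNSUPPORTED_COMPRESSION_TYPE = 74; // per the draft PR above
    static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;   // assumed threshold

    /** Returns an error code, or NONE if the fetch may proceed. */
    static short check(String topicCompressionType, short fetchRequestVersion) {
        boolean explicitZstd = "zstd".equals(topicCompressionType);
        if (explicitZstd && fetchRequestVersion < MIN_FETCH_VERSION_FOR_ZSTD) {
            return UNSUPPORTED_COMPRESSION_TYPE; // old client, zstd topic: reject
        }
        return NONE; // either no explicit zstd, or the client is new enough
    }

    public static void main(String[] args) {
        System.out.println(check("zstd", (short) 8));     // 74: rejected
        System.out.println(check("producer", (short) 8)); // 0: passed through as-is
    }
}
```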
> >
> > On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org> wrote:
> >
> >> Jason,
> >>
> >> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
> >>
> >> Best,
> >> Dongjin
> >>
> >> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io>
> >> wrote:
> >>
> >>> Hey Dongjin,
> >>>
> >>> Yeah that's right. For what it's worth, librdkafka also appears to
> handle
> >>> unexpected error codes. I expect that most client implementations would
> >>> either pass through the raw type or convert to an enum using something
> >>> like
> >>> what the java client does. Since we're expecting the client to fail
> >>> anyway,
> >>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error
> >>> code.
> >>>
> >>> -Jason
> >>>
> >>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org>
> wrote:
> >>>
> >>> > Jason and Ismael,
> >>> >
> >>> > It seems like the only thing we need to consider, if we define a new
> >>> > error code (i.e., UNSUPPORTED_COMPRESSION_TYPE), would be the
> >>> > implementation of the other language clients, right? At least, this
> >>> > strategy causes no problem for the Java client. Do I understand
> >>> > correctly?
> >>> >
> >>> > Thanks,
> >>> > Dongjin
> >>> >
> >>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org>
> >>> wrote:
> >>> >
> >>> > > Jason,
> >>> > >
> >>> > > > I think we would only use this error code when we /know/ that
> zstd
> >>> was
> >>> > > in use and the client doesn't support it? This is true if either 1)
> >>> the
> >>> > > message needs down-conversion and we encounter a zstd compressed
> >>> message,
> >>> > > or 2) if the topic is explicitly configured to use zstd.
> >>> > >
> >>> > > Yes, that is right. And as you know, case 1 includes 1.a) old clients
> >>> > > requesting v0 or v1 records, and 1.b) implicit zstd, i.e., the
> >>> > > compression type "producer" with zstd-compressed data.
> >>> > >
> >>> > > > However, if the compression type is set to "producer," then the
> >>> fetched
> >>> > > data may or may not be compressed with zstd. In this case, we
> return
> >>> the
> >>> > > data to the client and expect it to fail parsing. Is that correct?
> >>> > >
> >>> > > Exactly.
> >>> > >
> >>> > > Following your message, I reviewed the implementation of
> >>> > > `KafkaApis#handleFetchRequest`, which handles the fetch request, and
> >>> > > found that the information we can use is the following:
> >>> > >
> >>> > > 1. Client's fetch request version. (`versionId` variable)
> >>> > > 2. Log's compression type. (`logConfig` variable)
> >>> > >
> >>> > > We can't detect the actual compression type of the data unless we
> >>> > > inspect the `RecordBatch` included in the `Records` instance (i.e.,
> >>> > > the `unconvertedRecords` variable). Since that would incur a
> >>> > > performance penalty, it is not an option - in short, we can't be sure
> >>> > > whether given chunks of data are compressed with zstd or not.
> >>> > >
> >>> > > So, in conclusion: we can easily return an error in cases 1.a and 2
> >>> > > with the information above. In case 1.b (implicit zstd), we can just
> >>> > > return the data, doing nothing special, and expect the client to fail
> >>> > > parsing it.
> >>> > >
> >>> > > Thanks,
> >>> > > Dongjin
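The conclusion above can be summarized as a small decision table over the only two facts the broker knows cheaply: whether the request version implies zstd support, and the topic's configured compression type. The sketch below is illustrative Java pseudologic under those assumptions, not the real handleFetchRequest code.

```
// Illustrative decision logic for the cases discussed above. The broker
// cannot cheaply tell whether "producer"-typed data is actually zstd, so that
// case (1.b) is passed through and left for the client to fail on.
public class ZstdFetchDecision {

    enum Action { RETURN_ERROR, PASS_THROUGH }

    static Action decide(boolean clientSupportsZstd,
                         boolean needsDownConversionToV0V1,
                         String topicCompressionType) {
        if (!clientSupportsZstd && "zstd".equals(topicCompressionType)) {
            return Action.RETURN_ERROR;   // case 2: topic explicitly uses zstd
        }
        if (!clientSupportsZstd && needsDownConversionToV0V1) {
            return Action.RETURN_ERROR;   // case 1.a: old record format requested
        }
        return Action.PASS_THROUGH;       // case 1.b: "producer" type, possibly zstd;
                                          // the client is expected to fail parsing
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, "zstd"));     // RETURN_ERROR
        System.out.println(decide(false, true, "producer"));  // RETURN_ERROR
        System.out.println(decide(false, false, "producer")); // PASS_THROUGH
    }
}
```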
> >>> > >
> >>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com>
> >>> wrote:
> >>> > >
> >>> > >> Jason, that's an interesting point regarding the Java client. Do
> we
> >>> know
> >>> > >> what clients in other languages do in these cases?
> >>> > >>
> >>> > >> Ismael
> >>> > >>
> >>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io>
> >>> wrote:
> >>> > >>
> >>> > >> > Hi Dongjin,
> >>> > >> >
> >>> > >> > One of the complications is that old versions of the API will
> not
> >>> > >> expect a
> >>> > >> > new error code. However, since we expect this to be a fatal
> error
> >>> > anyway
> >>> > >> > for old clients, it may still be more useful to return the
> correct
> >>> > error
> >>> > >> > code. For example, the Kafka clients use the following code to
> >>> convert
> >>> > >> the
> >>> > >> > error code:
> >>> > >> >
> >>> > >> >     public static Errors forCode(short code) {
> >>> > >> >         Errors error = codeToError.get(code);
> >>> > >> >         if (error != null) {
> >>> > >> >             return error;
> >>> > >> >         } else {
> >>> > >> >             log.warn("Unexpected error code: {}.", code);
> >>> > >> >             return UNKNOWN_SERVER_ERROR;
> >>> > >> >         }
> >>> > >> >     }
> >>> > >> >
> >>> > >> > If we return an unsupported error code, it will be converted to
> an
> >>> > >> UNKNOWN
> >>> > >> > error, but at least we will get the message in the log with the
> >>> > correct
> >>> > >> > code. That seems preferable to returning a misleading error
> code.
> >>> So I
> >>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error
> >>> even
> >>> > for
> >>> > >> > older versions.
> >>> > >> >
> >>> > >> > Also, one question just to check my understanding. I think we
> >>> would
> >>> > only
> >>> > >> > use this error code when we /know/ that zstd was in use and the
> >>> client
> >>> > >> > doesn't support it? This is true if either 1) the message needs
> >>> > >> > down-conversion and we encounter a zstd compressed message, or
> 2)
> >>> if
> >>> > the
> >>> > >> > topic is explicitly configured to use zstd. However, if the
> >>> > compression
> >>> > >> > type is set to "producer," then the fetched data may or may not
> be
> >>> > >> > compressed with zstd. In this case, we return the data to the
> >>> client
> >>> > and
> >>> > >> > expect it to fail parsing. Is that correct?
> >>> > >> >
> >>> > >> > Thanks,
> >>> > >> > Jason
> >>> > >> >
> >>> > >> >
> >>> > >> >
> >>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <
> dongjin@apache.org>
> >>> > >> wrote:
> >>> > >> >
> >>> > >> > > Ismael, Jason and all,
> >>> > >> > >
> >>> > >> > > I rewrote the backward compatibility strategy & its alternatives
> >>> > >> > > as follows, based on Ismael & Jason's comments. Since it is not
> >>> > >> > > yet updated in the wiki, don't hesitate to give me a message if
> >>> > >> > > you have any opinion on it.
> >>> > >> > >
> >>> > >> > > ```
> >>> > >> > > *Backward Compatibility*
> >>> > >> > >
> >>> > >> > > We need to establish a backward-compatibility strategy for the
> >>> > >> > > case where an old client subscribes to a topic using ZStandard
> >>> > >> > > implicitly (i.e., the 'compression.type' configuration of the
> >>> > >> > > given topic is 'producer' and the producer compressed the records
> >>> > >> > > with ZStandard). We have the following options for this situation:
> >>> > >> > >
> >>> > >> > > *A. Support ZStandard for the old clients which can understand
> >>> > >> > > only v0 and v1 messages.*
> >>> > >> > >
> >>> > >> > > This strategy necessarily requires down-conversion of v2 messages
> >>> > >> > > compressed with ZStandard into v0 or v1 messages, which means a
> >>> > >> > > considerable performance degradation. So we rejected this strategy.
> >>> > >> > >
> >>> > >> > > *B. Bump the API version and support only v2-capable clients*
> >>> > >> > >
> >>> > >> > > With this approach, we can signal to the old clients that they
> >>> > >> > > are outdated and should be upgraded. However, there are still
> >>> > >> > > several options for the error code.
> >>> > >> > >
> >>> > >> > > *B.1. INVALID_REQUEST (42)*
> >>> > >> > >
> >>> > >> > > This option gives the client very little information; the user
> >>> > >> > > can be confused about why a client that worked correctly in the
> >>> > >> > > past suddenly encounters a problem. So we rejected this strategy.
> >>> > >> > >
> >>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> >>> > >> > >
> >>> > >> > > This option gives inaccurate information; the user may be
> >>> > >> > > surprised and mistakenly conclude that the log files are broken
> >>> > >> > > in some way. So we rejected this strategy.
> >>> > >> > >
> >>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> >>> > >> > >
> >>> > >> > > The advantage of this approach is that we don't need to define a
> >>> > >> > > new error code; we can simply reuse an existing one.
> >>> > >> > >
> >>> > >> > > The disadvantage of this approach is that it is also a little bit
> >>> > >> > > vague; this error code was defined as part of the work for
> >>> > >> > > KIP-98[^1] and is now returned for transaction errors.
> >>> > >> > >
> >>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> >>> > >> > >
> >>> > >> > > The advantage of this approach is that it is clear and provides
> >>> > >> > > an exact description. The disadvantage is that we need to add a
> >>> > >> > > new error code.
> >>> > >> > > ```
> >>> > >> > >
> >>> > >> > > *It seems like the choice we need to make is now clear:
> >>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or
> >>> > >> > > UNSUPPORTED_COMPRESSION_TYPE (B.4).*
> >>> > >> > > The first one doesn't require a new error code but the latter is
> >>> > >> > > more explicit. Which one do you prefer? Since all of you have much
> >>> > >> > > more experience and knowledge than me, I will follow your
> >>> > >> > > decision. The wiki page will be updated to reflect the decision as
> >>> > >> > > well.
> >>> > >> > >
> >>> > >> > > Best,
> >>> > >> > > Dongjin
> >>> > >> > >
> >>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
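If option B.4 is chosen, the Java-client side of the change would presumably be a new entry in the existing error-code mapping, along the lines of the sketch below. The numeric value 74 follows the draft implementation mentioned above; the description strings are illustrative wording only.

```
import java.util.HashMap;
import java.util.Map;

// Sketch only: registering the proposed error code next to the existing one,
// following the code-to-error mapping pattern used by the Java client.
// Descriptions are illustrative wording, not the final text.
public class CompressionErrorCodes {

    public static final short UNSUPPORTED_FOR_MESSAGE_FORMAT = 43; // existing (B.3)
    public static final short UNSUPPORTED_COMPRESSION_TYPE   = 74; // proposed (B.4)

    private static final Map<Short, String> DESCRIPTIONS = new HashMap<>();
    static {
        DESCRIPTIONS.put(UNSUPPORTED_FOR_MESSAGE_FORMAT,
                "The message format version on the broker does not support the request.");
        DESCRIPTIONS.put(UNSUPPORTED_COMPRESSION_TYPE,
                "The requesting client does not support the compression type of the partition.");
    }

    public static String describe(short code) {
        return DESCRIPTIONS.getOrDefault(code, "Unknown error code: " + code);
    }

    public static void main(String[] args) {
        System.out.println(describe(UNSUPPORTED_COMPRESSION_TYPE));
    }
}
```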
> >>> > >> > >
> >>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <
> ismaelj@gmail.com>
> >>> > >> wrote:
> >>> > >> > >
> >>> > >> > > > Sounds reasonable to me.
> >>> > >> > > >
> >>> > >> > > > Ismael
> >>> > >> > > >
> >>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
> >>> jason@confluent.io>
> >>> > >> > wrote:
> >>> > >> > > >
> >>> > >> > > > > Hey Ismael,
> >>> > >> > > > >
> >>> > >> > > > > Your summary looks good to me. I think it might also be a
> >>> good
> >>> > >> idea
> >>> > >> > to
> >>> > >> > > > add
> >>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along
> >>> with
> >>> > the
> >>> > >> > > > version
> >>> > >> > > > > bumps. We won't be able to use it for old api versions
> >>> since the
> >>> > >> > > clients
> >>> > >> > > > > will not understand it, but we can use it going forward so
> >>> that
> >>> > >> we're
> >>> > >> > > not
> >>> > >> > > > > stuck in a similar situation with a new message format and
> >>> a new
> >>> > >> > codec
> >>> > >> > > to
> >>> > >> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_
> >>> > FORMAT,
> >>> > >> but
> >>> > >> > > it
> >>> > >> > > > is
> >>> > >> > > > > not as explicit.
> >>> > >> > > > >
> >>> > >> > > > > -Jason
> >>> > >> > > > >
> >>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> >>> ismael@juma.me.uk
> >>> > >
> >>> > >> > > wrote:
> >>> > >> > > > >
> >>> > >> > > > > > Hi Dongjin and Jason,
> >>> > >> > > > > >
> >>> > >> > > > > > I would agree. My summary:
> >>> > >> > > > > >
> >>> > >> > > > > > 1. Support zstd with message format 2 only.
> >>> > >> > > > > > 2. Bump produce and fetch request versions.
> >>> > >> > > > > > 3. Provide broker errors whenever possible based on the
> >>> > request
> >>> > >> > > version
> >>> > >> > > > > and
> >>> > >> > > > > > rely on clients for the cases where the broker can't
> >>> validate
> >>> > >> > > > efficiently
> >>> > >> > > > > > (example message format 2 consumer that supports the
> >>> latest
> >>> > >> fetch
> >>> > >> > > > version
> >>> > >> > > > > > but doesn't support zstd).
> >>> > >> > > > > >
> >>> > >> > > > > > If there's general agreement on this, I suggest we
> update
> >>> the
> >>> > >> KIP
> >>> > >> > to
> >>> > >> > > > > state
> >>> > >> > > > > > the proposal and to move the rejected options to its own
> >>> > >> section.
> >>> > >> > And
> >>> > >> > > > > then
> >>> > >> > > > > > start a vote!
> >>> > >> > > > > >
> >>> > >> > > > > > Ismael
> >>> > >> > > > > >
> >>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> >>> > >> > jason@confluent.io>
> >>> > >> > > > > > wrote:
> >>> > >> > > > > >
> >>> > >> > > > > > > Hi Dongjin,
> >>> > >> > > > > > >
> >>> > >> > > > > > > Yes, that's a good summary. For clients which support
> >>> v2,
> >>> > the
> >>> > >> > > client
> >>> > >> > > > > can
> >>> > >> > > > > > > parse the message format and hopefully raise a useful
> >>> error
> >>> > >> > message
> >>> > >> > > > > > > indicating the unsupported compression type. For older
> >>> > >> clients,
> >>> > >> > our
> >>> > >> > > > > > options
> >>> > >> > > > > > > are probably (1) to down-convert to the old format
> >>> using no
> >>> > >> > > > compression
> >>> > >> > > > > > > type, or (2) to return an error code. I'm leaning
> >>> toward the
> >>> > >> > latter
> >>> > >> > > > as
> >>> > >> > > > > > the
> >>> > >> > > > > > > simpler solution, but the challenge is finding a good
> >>> error
> >>> > >> code.
> >>> > >> > > Two
> >>> > >> > > > > > > possibilities might be INVALID_REQUEST or
> >>> CORRUPT_MESSAGE.
> >>> > The
> >>> > >> > > > downside
> >>> > >> > > > > > is
> >>> > >> > > > > > > that old clients probably won't get a helpful message.
> >>> > >> However,
> >>> > >> > at
> >>> > >> > > > > least
> >>> > >> > > > > > > the behavior will be consistent in the sense that all
> >>> > clients
> >>> > >> > will
> >>> > >> > > > fail
> >>> > >> > > > > > if
> >>> > >> > > > > > > they do not support zstandard.
> >>> > >> > > > > > >
> >>> > >> > > > > > > What do you think?
> >>> > >> > > > > > >
> >>> > >> > > > > > > Thanks,
> >>> > >> > > > > > > Jason
> >>> > >> > > > > > >
> >>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> >>> > >> dongjin@apache.org
> >>> > >> > >
> >>> > >> > > > > wrote:
> >>> > >> > > > > > >
> >>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic
> >>> > following
> >>> > >> > your
> >>> > >> > > > > > > > explanation.[^1] You mean the following routines,
> >>> right?
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > -
> >>> > >> > > > > > > >
> https://github.com/apache/kafka/blob/trunk/core/src/
> >>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> >>> > >> > > > > > > > -
> >>> > >> > > > > > > >
> >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> >>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
> >>> > >> > > LazyDownConversionRecords.
> >>> > >> > > > > > > > java#L165
> >>> > >> > > > > > > > -
> >>> > >> > > > > > > >
> >>> https://github.com/apache/kafka/blob/trunk/clients/src/
> >>> > >> > > > > > > >
> >>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > It seems like your stance is like following:
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > 1. In principle, Kafka does not change the
> compression
> >>> > codec
> >>> > >> > when
> >>> > >> > > > > > > > down-converting, since it requires inspecting the
> >>> fetched
> >>> > >> data,
> >>> > >> > > > which
> >>> > >> > > > > > is
> >>> > >> > > > > > > > expensive.
> >>> > >> > > > > > > > 2. However, there are some cases the fetched data is
> >>> > >> inspected
> >>> > >> > > > > anyway.
> >>> > >> > > > > > In
> >>> > >> > > > > > > > this case, we can provide compression conversion
> from
> >>> > >> Zstandard
> >>> > >> > > to
> >>> > >> > > > > > > > classical ones[^2].
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > And from what I understand, the cases where the
> client
> >>> > >> without
> >>> > >> > > > > > ZStandard
> >>> > >> > > > > > > > support receives ZStandard compressed records can be
> >>> > >> organized
> >>> > >> > > into
> >>> > >> > > > > two
> >>> > >> > > > > > > > cases:
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > a. The 'compression.type' configuration of given
> >>> topic is
> >>> > >> > > > 'producer'
> >>> > >> > > > > > and
> >>> > >> > > > > > > > the producer compressed the records with ZStandard.
> >>> (that
> >>> > >> is,
> >>> > >> > > using
> >>> > >> > > > > > > > ZStandard implicitly.)
> >>> > >> > > > > > > > b.  The 'compression.type' configuration of given
> >>> topic is
> >>> > >> > > 'zstd';
> >>> > >> > > > > that
> >>> > >> > > > > > > is,
> >>> > >> > > > > > > > using ZStandard explicitly.
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > As you stated, we don't have to handle the case b
> >>> > specially.
> >>> > >> > So,
> >>> > >> > > It
> >>> > >> > > > > > seems
> >>> > >> > > > > > > > like we can narrow the focus of the problem by
> joining
> >>> > case
> >>> > >> 1
> >>> > >> > and
> >>> > >> > > > > case
> >>> > >> > > > > > b
> >>> > >> > > > > > > > like the following:
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > > Given the topic with 'producer' as its
> >>> > 'compression.type'
> >>> > >> > > > > > > configuration,
> >>> > >> > > > > > > > ZStandard compressed records and old client without
> >>> > >> ZStandard,
> >>> > >> > is
> >>> > >> > > > > there
> >>> > >> > > > > > > any
> >>> > >> > > > > > > > case we need to inspect the records and can change
> the
> >>> > >> > > compression
> >>> > >> > > > > > type?
> >>> > >> > > > > > > If
> >>> > >> > > > > > > > so, can we provide compression type converting?
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > Do I understand correctly?
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > Best,
> >>> > >> > > > > > > > Dongjin
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
> >>> > >> > misunderstanding
> >>> > >> > > > how
> >>> > >> > > > > > API
> >>> > >> > > > > > > > version works, after reviewing the downconvert logic
> >>> & the
> >>> > >> > > protocol
> >>> > >> > > > > > > > documentation <https://kafka.apache.org/protocol>.
> >>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> >>> > >> > > > jason@confluent.io>
> >>> > >> > > > > > > > wrote:
> >>> > >> > > > > > > >
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > But in my opinion, since the client will fail
> >>> with the
> >>> > >> API
> >>> > >> > > > > version,
> >>> > >> > > > > > > so
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
> >>> Isn't
> >>> > >> it?
> >>> > >> > > So, I
> >>> > >> > > > > > think
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> >>> > >> familiar
> >>> > >> > > with
> >>> > >> > > > > > > > > down-convert
> >>> > >> > > > > > > > > > logic.)
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > Currently the broker down-converts automatically
> >>> when it
> >>> > >> > > receives
> >>> > >> > > > > an
> >>> > >> > > > > > > old
> >>> > >> > > > > > > > > version of the fetch request (a version which is
> >>> known
> >>> > to
> >>> > >> > > predate
> >>> > >> > > > > the
> >>> > >> > > > > > > > > message format in use). Typically when
> >>> down-converting
> >>> > the
> >>> > >> > > > message
> >>> > >> > > > > > > > format,
> >>> > >> > > > > > > > > we use the same compression type, but there is not
> >>> much
> >>> > >> point
> >>> > >> > > in
> >>> > >> > > > > > doing
> >>> > >> > > > > > > so
> >>> > >> > > > > > > > > when we know the client doesn't support it. So if
> >>> > >> zstandard
> >>> > >> > is
> >>> > >> > > in
> >>> > >> > > > > > use,
> >>> > >> > > > > > > > and
> >>> > >> > > > > > > > > we have to down-convert anyway, then we can choose
> >>> to
> >>> > use
> >>> > >> a
> >>> > >> > > > > different
> >>> > >> > > > > > > > > compression type or no compression type.
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > From my perspective, there is no significant
> >>> downside to
> >>> > >> > > bumping
> >>> > >> > > > > the
> >>> > >> > > > > > > > > protocol version and it has several potential
> >>> benefits.
> >>> > >> > Version
> >>> > >> > > > > bumps
> >>> > >> > > > > > > are
> >>> > >> > > > > > > > > cheap. The main question mark in my mind is about
> >>> > >> > > > down-conversion.
> >>> > >> > > > > > > > Figuring
> >>> > >> > > > > > > > > out whether down-conversion is needed is hard
> >>> generally
> >>> > >> > without
> >>> > >> > > > > > > > inspecting
> >>> > >> > > > > > > > > the fetched data, which is expensive. I think we
> >>> agree
> >>> > in
> >>> > >> > > > principle
> >>> > >> > > > > > > that
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > do not want to have to pay this cost generally and
> >>> > prefer
> >>> > >> the
> >>> > >> > > > > clients
> >>> > >> > > > > > > to
> >>> > >> > > > > > > > > fail when they see an unhandled compression type.
> >>> The
> >>> > >> point I
> >>> > >> > > was
> >>> > >> > > > > > > making
> >>> > >> > > > > > > > is
> >>> > >> > > > > > > > > that there are some cases where we are either
> >>> inspecting
> >>> > >> the
> >>> > >> > > data
> >>> > >> > > > > > > anyway
> >>> > >> > > > > > > > > (because we have to down-convert the message
> >>> format), or
> >>> > >> we
> >>> > >> > > have
> >>> > >> > > > an
> >>> > >> > > > > > > easy
> >>> > >> > > > > > > > > way to tell whether zstandard is in use (the topic
> >>> has
> >>> > it
> >>> > >> > > > > configured
> >>> > >> > > > > > > > > explicitly). In the latter case, we don't have to
> >>> handle
> >>> > >> it
> >>> > >> > > > > > specially.
> >>> > >> > > > > > > > But
> >>> > >> > > > > > > > > we do have to decide how we will handle
> >>> down-conversion
> >>> > to
> >>> > >> > > older
> >>> > >> > > > > > > formats.
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > -Jason
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> >>> > >> > > dongjin@apache.org
> >>> > >> > > > >
> >>> > >> > > > > > > wrote:
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > > Colin and Jason,
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the
> >>> Pros and
> >>> > >> Cons
> >>> > >> > > of
> >>> > >> > > > > > > bumping
> >>> > >> > > > > > > > > > fetch API version are:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > Cons:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > - The Broker can't know whether a given message
> >>> batch
> >>> > is
> >>> > >> > > > > compressed
> >>> > >> > > > > > > > with
> >>> > >> > > > > > > > > > zstd or not.
> >>> > >> > > > > > > > > > - Need some additional logic for the topic
> >>> explicitly
> >>> > >> > > > configured
> >>> > >> > > > > to
> >>> > >> > > > > > > use
> >>> > >> > > > > > > > > > zstd.
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > Pros:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > - The broker doesn't need to conduct expensive
> >>> > >> > > down-conversion.
> >>> > >> > > > > > > > > > - Can message the users to update their client.
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > So, opinions for the backward-compatibility
> >>> policy by
> >>> > >> far:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> >>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > Here are my additional comments:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > @Colin
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > I greatly appreciate your response. In the case
> >>> of the
> >>> > >> > > > dictionary
> >>> > >> > > > > > > > > support,
> >>> > >> > > > > > > > > > of course, this issue should be addressed later
> >>> so we
> >>> > >> don't
> >>> > >> > > > need
> >>> > >> > > > > it
> >>> > >> > > > > > > in
> >>> > >> > > > > > > > > the
> >>> > >> > > > > > > > > > first version. You are right - it is not late to
> >>> try
> >>> > it
> >>> > >> > after
> >>> > >> > > > > some
> >>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in
> >>> mind on
> >>> > >> that
> >>> > >> > > > > > potential
> >>> > >> > > > > > > > > > feature.
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > @Jason
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > You wrote,
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway
> >>> because
> >>> > >> the
> >>> > >> > > > client
> >>> > >> > > > > > > does
> >>> > >> > > > > > > > > not
> >>> > >> > > > > > > > > > understand the message format, then we could
> also
> >>> use
> >>> > a
> >>> > >> > > > different
> >>> > >> > > > > > > > > > compression type.
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > But in my opinion, since the client will fail
> >>> with the
> >>> > >> API
> >>> > >> > > > > version,
> >>> > >> > > > > > > so
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
> >>> Isn't
> >>> > >> it?
> >>> > >> > > So, I
> >>> > >> > > > > > think
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> >>> > >> familiar
> >>> > >> > > with
> >>> > >> > > > > > > > > down-convert
> >>> > >> > > > > > > > > > logic.)
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > Please give more opinions. Thanks!
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > - Dongjin
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> >>> > >> > > > > jason@confluent.io
> >>> > >> > > > > > >
> >>> > >> > > > > > > > > wrote:
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > > Hey Colin,
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > The problem for the fetch API is that the
> broker
> >>> > does
> >>> > >> not
> >>> > >> > > > > > generally
> >>> > >> > > > > > > > > know
> >>> > >> > > > > > > > > > if
> >>> > >> > > > > > > > > > > a batch was compressed with zstd unless it
> >>> parses
> >>> > it.
> >>> > >> I
> >>> > >> > > think
> >>> > >> > > > > the
> >>> > >> > > > > > > > goal
> >>> > >> > > > > > > > > > here
> >>> > >> > > > > > > > > > > is to avoid the expensive down-conversion that
> >>> is
> >>> > >> needed
> >>> > >> > to
> >>> > >> > > > > > ensure
> >>> > >> > > > > > > > > > > compatibility because it is only necessary if
> >>> zstd
> >>> > is
> >>> > >> > > > actually
> >>> > >> > > > > in
> >>> > >> > > > > > > > use.
> >>> > >> > > > > > > > > > But
> >>> > >> > > > > > > > > > > as long as old clients can parse the message
> >>> format,
> >>> > >> they
> >>> > >> > > > > should
> >>> > >> > > > > > > get
> >>> > >> > > > > > > > a
> >>> > >> > > > > > > > > > > reasonable error if they see an unsupported
> >>> > >> compression
> >>> > >> > > type
> >>> > >> > > > in
> >>> > >> > > > > > the
> >>> > >> > > > > > > > > > > attributes. Basically the onus is on users to
> >>> ensure
> >>> > >> that
> >>> > >> > > > their
> >>> > >> > > > > > > > > consumers
> >>> > >> > > > > > > > > > > have been updated prior to using zstd. It
> seems
> >>> > like a
> >>> > >> > > > > reasonable
> >>> > >> > > > > > > > > > tradeoff
> >>> > >> > > > > > > > > > > to me. There are a couple cases that might be
> >>> worth
> >>> > >> > > thinking
> >>> > >> > > > > > > through:
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to use
> >>> zstd,
> >>> > >> then
> >>> > >> > we
> >>> > >> > > > > don't
> >>> > >> > > > > > > > need
> >>> > >> > > > > > > > > to
> >>> > >> > > > > > > > > > > check the fetched data for the compression
> type
> >>> to
> >>> > >> know
> >>> > >> > if
> >>> > >> > > we
> >>> > >> > > > > > need
> >>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch API
> >>> > version,
> >>> > >> > then
> >>> > >> > > > we
> >>> > >> > > > > > > could
> >>> > >> > > > > > > > > > handle
> >>> > >> > > > > > > > > > > this case by either down-converting using a
> >>> > different
> >>> > >> > > > > compression
> >>> > >> > > > > > > > type
> >>> > >> > > > > > > > > or
> >>> > >> > > > > > > > > > > returning an error.
> >>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert
> anyway
> >>> > >> because
> >>> > >> > the
> >>> > >> > > > > > client
> >>> > >> > > > > > > > does
> >>> > >> > > > > > > > > > not
> >>> > >> > > > > > > > > > > understand the message format, then we could
> >>> also
> >>> > use
> >>> > >> a
> >>> > >> > > > > different
> >>> > >> > > > > > > > > > > compression type.
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > For the produce API, I think it's reasonable
> to
> >>> bump
> >>> > >> the
> >>> > >> > > api
> >>> > >> > > > > > > version.
> >>> > >> > > > > > > > > > This
> >>> > >> > > > > > > > > > > can be used by clients to check whether a
> broker
> >>> > >> supports
> >>> > >> > > > zstd.
> >>> > >> > > > > > For
> >>> > >> > > > > > > > > > > example, we might support a list of preferred
> >>> > >> compression
> >>> > >> > > > types
> >>> > >> > > > > > in
> >>> > >> > > > > > > > the
> >>> > >> > > > > > > > > > > producer and we could use the broker to detect
> >>> which
> >>> > >> > > version
> >>> > >> > > > to
> >>> > >> > > > > > > use.
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > -Jason
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> >>> > >> > > > > cmccabe@apache.org
> >>> > >> > > > > > >
> >>> > >> > > > > > > > > wrote:
> >>> > >> > > > > > > > > > >
> >>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a
> >>> good
> >>> > >> > > > compression
> >>> > >> > > > > > > codec
> >>> > >> > > > > > > > > > and I
> >>> > >> > > > > > > > > > > > hope we can get this support in soon!
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > I would say we can just bump the API version
> >>> to
> >>> > >> > indicate
> >>> > >> > > > that
> >>> > >> > > > > > > ZStd
> >>> > >> > > > > > > > > > > support
> >>> > >> > > > > > > > > > > > is expected in new clients.  We probably
> need
> >>> some
> >>> > >> way
> >>> > >> > of
> >>> > >> > > > > > > > indicating
> >>> > >> > > > > > > > > to
> >>> > >> > > > > > > > > > > the
> >>> > >> > > > > > > > > > > > older clients that they can't consume the
> >>> > >> partitions,
> >>> > >> > as
> >>> > >> > > > > well.
> >>> > >> > > > > > > > > Perhaps
> >>> > >> > > > > > > > > > > we
> >>> > >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT
> >>> error?
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > The license thing seems straightforward --
> >>> it's
> >>> > >> just a
> >>> > >> > > > matter
> >>> > >> > > > > > of
> >>> > >> > > > > > > > > adding
> >>> > >> > > > > > > > > > > > the text to the right files as per ASF
> >>> guidelines.
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > With regard to the dictionary support, do we
> >>> > really
> >>> > >> > need
> >>> > >> > > > that
> >>> > >> > > > > > in
> >>> > >> > > > > > > > the
> >>> > >> > > > > > > > > > > first
> >>> > >> > > > > > > > > > > > version?  Hopefully message batches are big
> >>> enough
> >>> > >> that
> >>> > >> > > > this
> >>> > >> > > > > > > isn't
> >>> > >> > > > > > > > > > > needed.
> >>> > >> > > > > > > > > > > > Some benchmarks might help here.
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > best,
> >>> > >> > > > > > > > > > > > Colin
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee
> >>> wrote:
> >>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot
> >>> this
> >>> > >> issue,
> >>> > >> > > > > KIP-110
> >>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
> >>> > >> > > confluence/display/KAFKA/KIP-
> >>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> >>> > >> > > > > > > > > > > > > .
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of the
> >>> > >> history:
> >>> > >> > > > KIP-110
> >>> > >> > > > > > was
> >>> > >> > > > > > > > > > > > originally
> >>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it
> >>> lacked
> >>> > >> > > benchmark
> >>> > >> > > > > > > results
> >>> > >> > > > > > > > to
> >>> > >> > > > > > > > > > get
> >>> > >> > > > > > > > > > > > the
> >>> > >> > > > > > > > > > > > > agreement of the community. Later, Ivan
> >>> Babrou
> >>> > and
> >>> > >> > some
> >>> > >> > > > > other
> >>> > >> > > > > > > > users
> >>> > >> > > > > > > > > > who
> >>> > >> > > > > > > > > > > > > adopted the patch provided their excellent
> >>> > >> > performance
> >>> > >> > > > > report
> >>> > >> > > > > > > > which
> >>> > >> > > > > > > > > > is
> >>> > >> > > > > > > > > > > > now
> >>> > >> > > > > > > > > > > > > included in the KIP, but it postponed
> again
> >>> > >> because
> >>> > >> > of
> >>> > >> > > > the
> >>> > >> > > > > > > > > community
> >>> > >> > > > > > > > > > > was
> >>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now
> >>> reboot
> >>> > >> this
> >>> > >> > > > issue.
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > The following is the current status of the
> >>> > >> feature:
> >>> > >> > You
> >>> > >> > > > can
> >>> > >> > > > > > > check
> >>> > >> > > > > > > > > the
> >>> > >> > > > > > > > > > > > > current draft implementation here
> >>> > >> > > > > > > > > > > > > <
> https://github.com/apache/kafka/pull/2267>.
> >>> It
> >>> > >> is
> >>> > >> > > based
> >>> > >> > > > > on
> >>> > >> > > > > > > zstd
> >>> > >> > > > > > > > > > 1.3.5
> >>> > >> > > > > > > > > > > > and
> >>> > >> > > > > > > > > > > > > periodically rebased onto the latest
> >>> trunk[^1].
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > The issues that should be addressed is
> like
> >>> > >> > following:
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > To support old consumers, we need to take
> a
> >>> > >> strategy
> >>> > >> > to
> >>> > >> > > > > > handle
> >>> > >> > > > > > > > the
> >>> > >> > > > > > > > > > old
> >>> > >> > > > > > > > > > > > > consumers. Current candidates are:
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > - Bump API version
> >>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients
> fail.
> >>> > >> > > > > > > > > > > > > - Improve the error messages:
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > *2. Dictionary Support*
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > To support zstd's dictionary feature in
> the
> >>> > future
> >>> > >> > (if
> >>> > >> > > > > > needed),
> >>> > >> > > > > > > > we
> >>> > >> > > > > > > > > > need
> >>> > >> > > > > > > > > > > > to
> >>> > >> > > > > > > > > > > > > sketch how it should be and leave some
> room
> >>> for
> >>> > >> it.
> >>> > >> > As
> >>> > >> > > of
> >>> > >> > > > > > now,
> >>> > >> > > > > > > > > there
> >>> > >> > > > > > > > > > > has
> >>> > >> > > > > > > > > > > > > been no discussion on this topic yet.
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > *3. License*
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > To use this feature, we need to add
> license
> >>> of
> >>> > >> zstd
> >>> > >> > and
> >>> > >> > > > > > > zstd-jni
> >>> > >> > > > > > > > to
> >>> > >> > > > > > > > > > the
> >>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for
> >>> raising
> >>> > >> this
> >>> > >> > > > issue!)
> >>> > >> > > > > > It
> >>> > >> > > > > > > > > seems
> >>> > >> > > > > > > > > > > like
> >>> > >> > > > > > > > > > > > > what Apache Spark did would be a good
> >>> example
> >>> > but
> >>> > >> > there
> >>> > >> > > > has
> >>> > >> > > > > > > been
> >>> > >> > > > > > > > no
> >>> > >> > > > > > > > > > > > > discussion yet.
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > You can find the details of the above
> >>> issues in
> >>> > >> the
> >>> > >> > KIP
> >>> > >> > > > > > > document.
> >>> > >> > > > > > > > > > > Please
> >>> > >> > > > > > > > > > > > > have a look when you are free, and give me
> >>> > >> feedback.
> >>> > >> > > All
> >>> > >> > > > > > kinds
> >>> > >> > > > > > > of
> >>> > >> > > > > > > > > > > > > participating are welcome.
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > Best,
> >>> > >> > > > > > > > > > > > > Dongjin
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
> >>> 6b4fb8152.
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin
> >>> Lee <
> >>> > >> > > > > > > dongjin@apache.org
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > > > wrote:
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > > Sorry for the late reply.
> >>> > >> > > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > > In short, I could not submit the updated
> >>> KIP
> >>> > by
> >>> > >> the
> >>> > >> > > > > feature
> >>> > >> > > > > > > > > freeze
> >>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it
> >>> will
> >>> > not
> >>> > >> be
> >>> > >> > > > > included
> >>> > >> > > > > > > in
> >>> > >> > > > > > > > > the
> >>> > >> > > > > > > > > > > > 2.0.0
> >>> > >> > > > > > > > > > > > > > release and all discussion for this
> issue
> >>> were
> >>> > >> > > > postponed
> >>> > >> > > > > > > after
> >>> > >> > > > > > > > > the
> >>> > >> > > > > > > > > > > > release
> >>> > >> > > > > > > > > > > > > > of 2.0.0.
> >>> > >> > > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > > I have been updating the PR following
> >>> recent
> >>> > >> > updates.
> >>> > >> > > > > Just
> >>> > >> > > > > > > > now, I
> >>> > >> > > > > > > > > > > > rebased
> >>> > >> > > > > > > > > > > > > > it against the latest trunk and updated
> >>> the
> >>> > zstd
> >>> > >> > > > version
> >>> > >> > > > > > into
> >>> > >> > > > > > > > > > 1.3.5.
> >>> > >> > > > > > > > > > > > If you
> >>> > >> > > > > > > > > > > > > > need some request, don't hesitate to
> >>> notify
> >>> > me.
> >>> > >> > (But
> >>> > >> > > > not
> >>> > >> > > > > > this
> >>> > >> > > > > > > > > > thread
> >>> > >> > > > > > > > > > > -
> >>> > >> > > > > > > > > > > > just
> >>> > >> > > > > > > > > > > > > > send me the message directly.)
> >>> > >> > > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > > Best,
> >>> > >> > > > > > > > > > > > > > Dongjin
> >>> > >> > > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby
> >>> Evans <
> >>> > >> > > > > > > bobby@apache.org
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > > > wrote:
> >>> > >> > > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > >> I there any update on this.  The
> >>> performance
> >>> > >> > > > > improvements
> >>> > >> > > > > > > are
> >>> > >> > > > > > > > > > quite
> >>> > >> > > > > > > > > > > > > >> impressive and I really would like to
> >>> stop
> >>> > >> forking
> >>> > >> > > > kafka
> >>> > >> > > > > > > just
> >>> > >> > > > > > > > to
> >>> > >> > > > > > > > > > get
> >>> > >> > > > > > > > > > > > this
> >>> > >> > > > > > > > > > > > > >> in.
> >>> > >> > > > > > > > > > > > > >>
> >>> > >> > > > > > > > > > > > > >> Thanks,
> >>> > >> > > > > > > > > > > > > >>
> >>> > >> > > > > > > > > > > > > >> Bobby
> >>> > >> > > > > > > > > > > > > >>
> >>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin
> >>> Lee <
> >>> > >> > > > > > > > dongjin@apache.org
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > > > wrote:
> >>> > >> > > > > > > > > > > > > >>
> >>> > >> > > > > > > > > > > > > >> > Ismael,
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on
> working
> >>> > frenzy
> >>> > >> > for
> >>> > >> > > > 2.0!
> >>> > >> > > > > > No
> >>> > >> > > > > > > > > > problem,
> >>> > >> > > > > > > > > > > > take
> >>> > >> > > > > > > > > > > > > >> > your time. I am also working at
> another
> >>> > issue
> >>> > >> > now.
> >>> > >> > > > > Thank
> >>> > >> > > > > > > you
> >>> > >> > > > > > > > > for
> >>> > >> > > > > > > > > > > > > >> letting me
> >>> > >> > > > > > > > > > > > > >> > know.
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> > Best,
> >>> > >> > > > > > > > > > > > > >> > Dongjin
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael
> >>> Juma
> >>> > <
> >>> > >> > > > > > > > ismaelj@gmail.com
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > > > wrote:
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin.
> >>> Everyone is
> >>> > >> busy
> >>> > >> > > > > > finalising
> >>> > >> > > > > > > > > > 2.0.0.
> >>> > >> > > > > > > > > > > > This
> >>> > >> > > > > > > > > > > > > >> KIP
> >>> > >> > > > > > > > > > > > > >> > > seems like a great candidate for
> >>> 2.1.0
> >>> > and
> >>> > >> > > > hopefully
> >>> > >> > > > > > > there
> >>> > >> > > > > > > > > > will
> >>> > >> > > > > > > > > > > be
> >>> > >> > > > > > > > > > > > > >> more
> >>> > >> > > > > > > > > > > > > >> > of
> >>> > >> > > > > > > > > > > > > >> > > a discussion next week. :)
> >>> > >> > > > > > > > > > > > > >> > >
> >>> > >> > > > > > > > > > > > > >> > > Ismael
> >>> > >> > > > > > > > > > > > > >> > >
> >>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin
> >>> Lee, <
> >>> > >> > > > > > > > dongjin@apache.org
> >>> > >> > > > > > > > > >
> >>> > >> > > > > > > > > > > > wrote:
> >>> > >> > > > > > > > > > > > > >> > >
> >>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
> >>> > >> > implementation:
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk
> (commit
> >>> > >> 5145d6b)
> >>> > >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > You can check out the
> >>> implementation
> >>> > from
> >>> > >> > here
> >>> > >> > > > > > > > > > > > > >> > > > <
> >>> > >> https://github.com/apache/kafka/pull/2267
> >>> > >> > >.
> >>> > >> > > If
> >>> > >> > > > > you
> >>> > >> > > > > > > > > > > experience
> >>> > >> > > > > > > > > > > > any
> >>> > >> > > > > > > > > > > > > >> > > problem
> >>> > >> > > > > > > > > > > > > >> > > > running it, don't hesitate to
> give
> >>> me a
> >>> > >> > > mention.
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > Best,
> >>> > >> > > > > > > > > > > > > >> > > > Dongjin
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM
> >>> Dongjin
> >>> > >> Lee
> >>> > >> > <
> >>> > >> > > > > > > > > > > dongjin@apache.org
> >>> > >> > > > > > > > > > > > >
> >>> > >> > > > > > > > > > > > > >> > wrote:
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > > Here is the short conclusion
> >>> about
> >>> > the
> >>> > >> > > license
> >>> > >> > > > > > > > problem:
> >>> > >> > > > > > > > > > *We
> >>> > >> > > > > > > > > > > > can
> >>> > >> > > > > > > > > > > > > >> use
> >>> > >> > > > > > > > > > > > > >> > > zstd
> >>> > >> > > > > > > > > > > > > >> > > > > and zstd-jni without any
> >>> problem, but
> >>> > >> we
> >>> > >> > > need
> >>> > >> > > > to
> >>> > >> > > > > > > > include
> >>> > >> > > > > > > > > > > their
> >>> > >> > > > > > > > > > > > > >> > license,
> >>> > >> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License &
> 3
> >>> > Clause
> >>> > >> > > > License
> >>> > >> > > > > > > > requires
> >>> > >> > > > > > > > > > to
> >>> > >> > > > > > > > > > > > > >> include
> >>> > >> > > > > > > > > > > > > >> > the
> >>> > >> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause
> >>> > License
> >>> > >> > > > requires
> >>> > >> > > > > > that
> >>> > >> > > > > > > > the
> >>> > >> > > > > > > > > > > name
> >>> > >> > > > > > > > > > > > of
> >>> > >> > > > > > > > > > > > > >> the
> >>> > >> > > > > > > > > > > > > >> > > > > contributor can't be used to
> >>> endorse
> >>> > or
> >>> > >> > > > promote
> >>> > >> > > > > > the
> >>> > >> > > > > > > > > > product.
> >>> > >> > > > > > > > > > > > > >> That's
> >>> > >> > > > > > > > > > > > > >> > it
> >>> > >> > > > > > > > > > > > > >> > > > > <
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > >
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >>
> >>> > >> > http://www.mikestratton.net/2011/12/is-bsd-license-
> >>> > >> > > > > > > > > > > > compatible-with-apache-2-0-license/
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > - They are not listed in the
> >>> list of
> >>> > >> > > > prohibited
> >>> > >> > > > > > > > licenses
> >>> > >> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> >>> > >> > > > > > > > resolved.html#category-x>
> >>> > >> > > > > > > > > > > also.
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
> >>> > >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
> >>> > >> > > > > > jira/browse/SPARK-19112
> >>> > >> > > > > > > >:
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > - They made a directory
> >>> dedicated to
> >>> > >> the
> >>> > >> > > > > > dependency
> >>> > >> > > > > > > > > > license
> >>> > >> > > > > > > > > > > > files
> >>> > >> > > > > > > > > > > > > >> > > > > <
> >>> > >> > > > > > > https://github.com/apache/spark/tree/master/licenses
> >>> > >> > > > > > > > >
> >>> > >> > > > > > > > > > and
> >>> > >> > > > > > > > > > > > added
> >>> > >> > > > > > > > > > > > > >> > > > licenses
> >>> > >> > > > > > > > > > > > > >> > > > > for Zstd
> >>> > >> > > > > > > > > > > > > >> > > > > <
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > https://github.com/apache/spark/blob/master/licenses/
> >>> > >> > > > > > > > > > > > LICENSE-zstd.txt
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > > > &
> >>> > >> > > > > > > > > > > > > >> > > > > Zstd-jni
> >>> > >> > > > > > > > > > > > > >> > > > > <
> >>> > >> > > > > > > > > > > > > >> > > >
> >>> > >> > > > > > > > > > > > > >> > >
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> https://github.com/apache/
> >>> > >> > > spark/blob/master/licenses/
> >>> > >> > > > > > > > > > > > LICENSE-zstd-jni.txt
> >>> > >> > > > > > > > > > > > > >> >
> >>> > >> > > > > > > > > > > > > >> > > > > .
> >>> > >> > > > > > > > > > > > > >> > > > > - Added a link to the original
> >>> > license
> >>> > >> > files
> >>> > >> > > > in
> >>> > >> > > > > > > > LICENSE.
> >>> > >> > > > > > > > > > > > > >> > > > > <
> >>> > >> > > > > https://github.com/apache/spark/pull/18805/files
> >>> > >> > > > > > >
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > If needed, I can make a similar
> >>> > update.
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this
> >>> problem,
> >>> > >> > > Viktor!
> >>> > >> > > > > Nice
> >>> > >> > > > > > > > > catch!
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > Best,
> >>> > >> > > > > > > > > > > > > >> > > > > Dongjin
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50
> PM
> >>> > >> Dongjin
> >>> > >> > > Lee <
> >>> > >> > > > > > > > > > > > dongjin@apache.org>
> >>> > >> > > > > > > > > > > > > >> > > wrote:
> >>> > >> > > > > > > > > > > > > >> > > > >
> >>> > >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your
> >>> > >> comprehensive
> >>> > >> > > > > > reasoning.
> >>> > >> > > > > > > > so:
> >>> > >> > > > > > > > > +1
> >>> > >> > > > > > > > > > > > for b
> >>> > >> > > > > > > > > > > > > >> > until
> >>> > >> > > > > > > > > > > > > >> > > > now.
> >>> >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
I updated the KIP page
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression>
following the discussion here. Please take a look when you are free.
If you have any opinion, don't hesitate to let me know.

Best,
Dongjin

On Fri, Aug 31, 2018 at 11:35 PM Dongjin Lee <do...@apache.org> wrote:

> I just updated the draft implementation[^1], rebasing it against the latest
> trunk and implementing the error routine (i.e., error code 74 for
> UnsupportedCompressionTypeException). Since we decided to disallow all
> fetch requests from clients below version 2.1.0 for topics specifying
> ZStandard, I added only the error logic.
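
For illustration, a rough, self-contained Java sketch of the check described
above. It is not the actual broker change from the PR (that code is Scala, in
KafkaApis); the bumped fetch version and all names here are assumptions of this
sketch, and 74 is simply the error code proposed in this draft.

    // Sketch: reject fetch requests from clients that predate zstd support when
    // the topic is explicitly configured with zstd, instead of down-converting.
    public final class ZstdFetchGate {

        enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }

        // Assumption: the fetch API version bumped together with zstd support.
        static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;
        // Error code proposed in this draft for UnsupportedCompressionTypeException.
        static final short UNSUPPORTED_COMPRESSION_TYPE = 74;
        static final short NO_ERROR = 0;

        // Returns the error code for the partition, or 0 if the fetch can be
        // served as-is.
        static short validate(short fetchVersion, CompressionType topicCompression) {
            if (fetchVersion < MIN_FETCH_VERSION_FOR_ZSTD
                    && topicCompression == CompressionType.ZSTD) {
                // Topic explicitly configured with zstd: fail fast for old clients.
                return UNSUPPORTED_COMPRESSION_TYPE;
            }
            // 'producer' (or a classic codec) configuration: return the data as-is;
            // an old client that meets a zstd batch is expected to fail parsing it.
            return NO_ERROR;
        }

        public static void main(String[] args) {
            System.out.println(validate((short) 8, CompressionType.ZSTD));  // 74
            System.out.println(validate((short) 10, CompressionType.ZSTD)); // 0
        }
    }

The point is that this decision needs nothing more than the request version and
the topic configuration, so no record inspection is required for the
explicitly-configured case.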
>
> Please have a look when you are free.
>
> Thanks,
> Dongjin
>
> [^1]: Please check the last commit here:
> https://github.com/apache/kafka/pull/2267
>
> On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org> wrote:
>
>> Jason,
>>
>> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
>>
>> Best,
>> Dongjin
>>
>> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io>
>> wrote:
>>
>>> Hey Dongjin,
>>>
>>> Yeah that's right. For what it's worth, librdkafka also appears to handle
>>> unexpected error codes. I expect that most client implementations would
>>> either pass through the raw type or convert to an enum using something
>>> like
>>> what the java client does. Since we're expecting the client to fail
>>> anyway,
>>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error
>>> code.
>>>
>>> -Jason
>>>
>>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org> wrote:
>>>
>>> > Jason and Ismael,
>>> >
>>> > It seems like the only thing we need to consider if we define a new error
>>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementations of
>>> > the other language clients, right? At least, this strategy causes no
>>> > problems for the Java client. Do I understand correctly?
>>> >
>>> > Thanks,
>>> > Dongjin
>>> >
>>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org>
>>> wrote:
>>> >
>>> > > Jason,
>>> > >
>>> > > > I think we would only use this error code when we /know/ that zstd
>>> was
>>> > > in use and the client doesn't support it? This is true if either 1)
>>> the
>>> > > message needs down-conversion and we encounter a zstd compressed
>>> message,
>>> > > or 2) if the topic is explicitly configured to use zstd.
>>> > >
>>> > > Yes, that's right. And as you know, case 1 includes 1.a) old clients
>>> > > requesting v0, v1 records, or 1.b) implicit zstd, i.e., a compression
>>> > > type of "producer" with Zstd-compressed data.
>>> > >
>>> > > > However, if the compression type is set to "producer," then the
>>> fetched
>>> > > data may or may not be compressed with zstd. In this case, we return
>>> the
>>> > > data to the client and expect it to fail parsing. Is that correct?
>>> > >
>>> > > Exactly.
>>> > >
>>> > > Following your message, I reviewed the implementation of
>>> > > `KafkaApis#handleFetchRequest`, which handles the fetch request, and
>>> > > found that the information we can use is the following:
>>> > >
>>> > > 1. Client's fetch request version. (`versionId` variable)
>>> > > 2. Log's compression type. (`logConfig` variable)
>>> > >
>>> > > We can't detect the actual compression type of the data unless we
>>> > > inspect the `RecordBatch` included in the `Records` instance (i.e., the
>>> > > `unconvertedRecords` variable). Since that would incur a performance
>>> > > penalty, it is not an option - in short, we can't be sure whether given
>>> > > chunks of data are compressed with zstd or not.
>>> > >
>>> > > So, in conclusion: we can easily return an error in cases 1.a and 2
>>> > > with the information above. In case 1.b (implicit zstd), we can just
>>> > > return the data without doing anything special and expect the client to
>>> > > fail parsing it.
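
As a side note on the "fail parsing" part: a v2 record batch carries its
compression codec in the low three bits of the batch attributes, so even a
zstd-unaware client can detect an unknown codec id and raise a clear error
instead of mis-decoding the payload. A minimal sketch, where the mask, the
known ids and the exception are choices of this sketch only (zstd is assumed to
take codec id 4 as proposed in the KIP):

    // Sketch: how a zstd-unaware consumer can still fail cleanly on a v2 batch.
    final class BatchCodecCheck {
        private static final int COMPRESSION_CODEC_MASK = 0x07; // attribute bits 0-2
        private static final int MAX_KNOWN_CODEC_ID = 3;        // none/gzip/snappy/lz4

        static void ensureSupported(short batchAttributes) {
            int codecId = batchAttributes & COMPRESSION_CODEC_MASK;
            if (codecId > MAX_KNOWN_CODEC_ID) {
                // A zstd batch (codec id 4 in this proposal) ends up here.
                throw new IllegalStateException(
                        "Unknown compression codec id " + codecId + "; upgrade the client");
            }
        }

        public static void main(String[] args) {
            ensureSupported((short) 0b011); // lz4: fine
            ensureSupported((short) 0b100); // zstd on an old client: throws
        }
    }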
>>> > >
>>> > > Thanks,
>>> > > Dongjin
>>> > >
>>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com>
>>> wrote:
>>> > >
>>> > >> Jason, that's an interesting point regarding the Java client. Do we
>>> know
>>> > >> what clients in other languages do in these cases?
>>> > >>
>>> > >> Ismael
>>> > >>
>>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io>
>>> wrote:
>>> > >>
>>> > >> > Hi Dongjin,
>>> > >> >
>>> > >> > One of the complications is that old versions of the API will not
>>> > >> expect a
>>> > >> > new error code. However, since we expect this to be a fatal error
>>> > anyway
>>> > >> > for old clients, it may still be more useful to return the correct
>>> > error
>>> > >> > code. For example, the Kafka clients use the following code to
>>> convert
>>> > >> the
>>> > >> > error code:
>>> > >> >
>>> > >> >     public static Errors forCode(short code) {
>>> > >> >         Errors error = codeToError.get(code);
>>> > >> >         if (error != null) {
>>> > >> >             return error;
>>> > >> >         } else {
>>> > >> >             log.warn("Unexpected error code: {}.", code);
>>> > >> >             return UNKNOWN_SERVER_ERROR;
>>> > >> >         }
>>> > >> >     }
>>> > >> >
>>> > >> > If we return an unsupported error code, it will be converted to an
>>> > >> UNKNOWN
>>> > >> > error, but at least we will get the message in the log with the
>>> > correct
>>> > >> > code. That seems preferable to returning a misleading error code.
>>> So I
>>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error
>>> even
>>> > for
>>> > >> > older versions.
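
Concretely, with the lookup quoted above, a client released before this change
would handle the new code roughly as follows. The value 74 is just the code
proposed in the current draft, and the snippet assumes an old kafka-clients jar
on the classpath:

    import org.apache.kafka.common.protocol.Errors;

    public class OldClientErrorLookup {
        public static void main(String[] args) {
            // On a pre-zstd client, codeToError has no entry for the new code, so
            // forCode() logs "Unexpected error code: 74." and falls back to
            // UNKNOWN_SERVER_ERROR instead of a misleading existing error.
            Errors error = Errors.forCode((short) 74);
            System.out.println(error); // UNKNOWN_SERVER_ERROR
        }
    }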
>>> > >> >
>>> > >> > Also, one question just to check my understanding. I think we
>>> would
>>> > only
>>> > >> > use this error code when we /know/ that zstd was in use and the
>>> client
>>> > >> > doesn't support it? This is true if either 1) the message needs
>>> > >> > down-conversion and we encounter a zstd compressed message, or 2)
>>> if
>>> > the
>>> > >> > topic is explicitly configured to use zstd. However, if the
>>> > compression
>>> > >> > type is set to "producer," then the fetched data may or may not be
>>> > >> > compressed with zstd. In this case, we return the data to the
>>> client
>>> > and
>>> > >> > expect it to fail parsing. Is that correct?
>>> > >> >
>>> > >> > Thanks,
>>> > >> > Jason
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org>
>>> > >> wrote:
>>> > >> >
>>> > >> > > Ismael, Jason and all,
>>> > >> > >
>>> > >> > > I rewrote the backward compatibility strategy & its alternatives
>>> > like
>>> > >> > > following, based on Ismael & Jason's comments. Since it is not
>>> > >> updated to
>>> > >> > > the wiki yet, don't hesitate to give me a message if you have
>>> any
>>> > >> opinion
>>> > >> > > on it.
>>> > >> > >
>>> > >> > > ```
>>> > >> > > *Backward Compatibility*
>>> > >> > >
>>> > >> > > We need to establish some backward-compatibility strategy for
>>> the
>>> > >> case an
>>> > >> > > old client subscribes a topic using ZStandard implicitly (i.e.,
>>> > >> > > 'compression.type' configuration of given topic is 'producer'
>>> and
>>> > the
>>> > >> > > producer compressed the records with ZStandard). We have the
>>> > following
>>> > >> > > options for this situation:
>>> > >> > >
>>> > >> > > *A. Support ZStandard for the old clients which can understand
>>> v0, v1
>>> > >> > > messages only.*
>>> > >> > >
>>> > >> > > This strategy necessarily requires the down-conversion of v2
>>> message
>>> > >> > > compressed with Zstandard into v0 or v1 messages, which means a
>>> > >> > > considerable performance degradation. So we rejected this
>>> strategy.
>>> > >> > >
>>> > >> > > *B. Bump the API version and support only v2-available clients*
>>> > >> > >
>>> > >> > > With this approach, we can message the old clients that they
>>> are old
>>> > >> and
>>> > >> > > should be upgraded. However, there are still several options
>>> for the
>>> > >> > Error
>>> > >> > > code.
>>> > >> > >
>>> > >> > > *B.1. INVALID_REQUEST (42)*
>>> > >> > >
>>> > >> > > This option gives the client so little information; the user
>>> can be
>>> > >> > > confused about why the client that worked correctly in the past
>>> suddenly
>>> > >> > > encounters a problem. So we rejected this strategy.
>>> > >> > >
>>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
>>> > >> > >
>>> > >> > > This option gives inaccurate information; the user can be
>>> surprised
>>> > >> and
>>> > >> > > misunderstand that the log files are broken in some way. So we
>>> > >> rejected
>>> > >> > > this strategy.
>>> > >> > >
>>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
>>> > >> > >
>>> > >> > > The advantage of this approach is we don't need to define a new
>>> > error
>>> > >> > code;
>>> > >> > > we can reuse it and that's all.
>>> > >> > >
>>> > >> > > The disadvantage of this approach is that it is also a little
>>> bit
>>> > >> vague;
>>> > >> > > this error code was defined as part of KIP-98[^1] and is now
>>> returned
>>> > >> in
>>> > >> > the
>>> > >> > > transaction error.
>>> > >> > >
>>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
>>> > >> > >
>>> > >> > > The advantage of this approach is that it is clear and provides
>>> an
>>> > >> exact
>>> > >> > > description. The disadvantage is we need to add a new error
>>> code.
>>> > >> > > ```
>>> > >> > >
>>> > >> > > *It seems like what we need to choose is now so clear:
>>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or
>>> UNSUPPORTED_COMPRESSION_TYPE
>>> > >> > > (B.4).*
>>> > >> > > The first one doesn't need a new error code but the latter is
>>> > more
>>> > >> > > explicit. Which one do you prefer? Since all of you have much
>>> more
>>> > >> > > experience and knowledge than me, I will follow your decision.
>>> The
>>> > >> wiki
>>> > >> > > page will be updated following the decision also.
>>> > >> > >
>>> > >> > > Best,
>>> > >> > > Dongjin
>>> > >> > >
>>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
>>> > >> > >
>>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com>
>>> > >> wrote:
>>> > >> > >
>>> > >> > > > Sounds reasonable to me.
>>> > >> > > >
>>> > >> > > > Ismael
>>> > >> > > >
>>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
>>> jason@confluent.io>
>>> > >> > wrote:
>>> > >> > > >
>>> > >> > > > > Hey Ismael,
>>> > >> > > > >
>>> > >> > > > > Your summary looks good to me. I think it might also be a
>>> good
>>> > >> idea
>>> > >> > to
>>> > >> > > > add
>>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along
>>> with
>>> > the
>>> > >> > > > version
>>> > >> > > > > bumps. We won't be able to use it for old api versions
>>> since the
>>> > >> > > clients
>>> > >> > > > > will not understand it, but we can use it going forward so
>>> that
>>> > >> we're
>>> > >> > > not
>>> > >> > > > > stuck in a similar situation with a new message format and
>>> a new
>>> > >> > codec
>>> > >> > > to
>>> > >> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_
>>> > FORMAT,
>>> > >> but
>>> > >> > > it
>>> > >> > > > is
>>> > >> > > > > not as explicit.
>>> > >> > > > >
>>> > >> > > > > -Jason
>>> > >> > > > >
>>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
>>> ismael@juma.me.uk
>>> > >
>>> > >> > > wrote:
>>> > >> > > > >
>>> > >> > > > > > Hi Dongjin and Jason,
>>> > >> > > > > >
>>> > >> > > > > > I would agree. My summary:
>>> > >> > > > > >
>>> > >> > > > > > 1. Support zstd with message format 2 only.
>>> > >> > > > > > 2. Bump produce and fetch request versions.
>>> > >> > > > > > 3. Provide broker errors whenever possible based on the
>>> > request
>>> > >> > > version
>>> > >> > > > > and
>>> > >> > > > > > rely on clients for the cases where the broker can't
>>> validate
>>> > >> > > > efficiently
>>> > >> > > > > > (example message format 2 consumer that supports the
>>> latest
>>> > >> fetch
>>> > >> > > > version
>>> > >> > > > > > but doesn't support zstd).
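
On the produce side of point 2, the broker validates the incoming batches
anyway, so the check can be strict: a produce request older than the bumped
version is rejected as soon as it carries a zstd batch. A rough sketch, where
the bumped produce version and all names are assumptions of this sketch:

    // Sketch: validate a produce request against the bumped produce version.
    public final class ZstdProduceGate {

        enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }

        // Assumption: the produce API version bumped together with zstd support.
        static final short MIN_PRODUCE_VERSION_FOR_ZSTD = 7;

        static void validate(short produceVersion, CompressionType batchCompression) {
            if (batchCompression == CompressionType.ZSTD
                    && produceVersion < MIN_PRODUCE_VERSION_FOR_ZSTD) {
                // The broker parses produced batches anyway, so this check is cheap
                // and gives the old producer an explicit error instead of storing
                // data that old consumers could never read.
                throw new IllegalArgumentException(
                        "zstd requires produce request version >= " + MIN_PRODUCE_VERSION_FOR_ZSTD);
            }
        }

        public static void main(String[] args) {
            validate((short) 7, CompressionType.ZSTD); // ok
            validate((short) 5, CompressionType.ZSTD); // throws
        }
    }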
>>> > >> > > > > >
>>> > >> > > > > > If there's general agreement on this, I suggest we update
>>> the
>>> > >> KIP
>>> > >> > to
>>> > >> > > > > state
>>> > >> > > > > > the proposal and to move the rejected options to its own
>>> > >> section.
>>> > >> > And
>>> > >> > > > > then
>>> > >> > > > > > start a vote!
>>> > >> > > > > >
>>> > >> > > > > > Ismael
>>> > >> > > > > >
>>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
>>> > >> > jason@confluent.io>
>>> > >> > > > > > wrote:
>>> > >> > > > > >
>>> > >> > > > > > > Hi Dongjin,
>>> > >> > > > > > >
>>> > >> > > > > > > Yes, that's a good summary. For clients which support
>>> v2,
>>> > the
>>> > >> > > client
>>> > >> > > > > can
>>> > >> > > > > > > parse the message format and hopefully raise a useful
>>> error
>>> > >> > message
>>> > >> > > > > > > indicating the unsupported compression type. For older
>>> > >> clients,
>>> > >> > our
>>> > >> > > > > > options
>>> > >> > > > > > > are probably (1) to down-convert to the old format
>>> using no
>>> > >> > > > compression
>>> > >> > > > > > > type, or (2) to return an error code. I'm leaning
>>> toward the
>>> > >> > latter
>>> > >> > > > as
>>> > >> > > > > > the
>>> > >> > > > > > > simpler solution, but the challenge is finding a good
>>> error
>>> > >> code.
>>> > >> > > Two
>>> > >> > > > > > > possibilities might be INVALID_REQUEST or
>>> CORRUPT_MESSAGE.
>>> > The
>>> > >> > > > downside
>>> > >> > > > > > is
>>> > >> > > > > > > that old clients probably won't get a helpful message.
>>> > >> However,
>>> > >> > at
>>> > >> > > > > least
>>> > >> > > > > > > the behavior will be consistent in the sense that all
>>> > clients
>>> > >> > will
>>> > >> > > > fail
>>> > >> > > > > > if
>>> > >> > > > > > > they do not support zstandard.
>>> > >> > > > > > >
>>> > >> > > > > > > What do you think?
>>> > >> > > > > > >
>>> > >> > > > > > > Thanks,
>>> > >> > > > > > > Jason
>>> > >> > > > > > >
>>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
>>> > >> dongjin@apache.org
>>> > >> > >
>>> > >> > > > > wrote:
>>> > >> > > > > > >
>>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic
>>> > following
>>> > >> > your
>>> > >> > > > > > > > explanation.[^1] You mean the following routines,
>>> right?
>>> > >> > > > > > > >
>>> > >> > > > > > > > -
>>> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/
>>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
>>> > >> > > > > > > > -
>>> > >> > > > > > > >
>>> https://github.com/apache/kafka/blob/trunk/clients/src/
>>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
>>> > >> > > LazyDownConversionRecords.
>>> > >> > > > > > > > java#L165
>>> > >> > > > > > > > -
>>> > >> > > > > > > >
>>> https://github.com/apache/kafka/blob/trunk/clients/src/
>>> > >> > > > > > > >
>>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
>>> > >> > > > > > > >
>>> > >> > > > > > > > It seems like your stance is like following:
>>> > >> > > > > > > >
>>> > >> > > > > > > > 1. In principle, Kafka does not change the compression
>>> > codec
>>> > >> > when
>>> > >> > > > > > > > down-converting, since it requires inspecting the
>>> fetched
>>> > >> data,
>>> > >> > > > which
>>> > >> > > > > > is
>>> > >> > > > > > > > expensive.
>>> > >> > > > > > > > 2. However, there are some cases the fetched data is
>>> > >> inspected
>>> > >> > > > > anyway.
>>> > >> > > > > > In
>>> > >> > > > > > > > this case, we can provide compression conversion from
>>> > >> Zstandard
>>> > >> > > to
>>> > >> > > > > > > > classical ones[^2].
>>> > >> > > > > > > >
>>> > >> > > > > > > > And from what I understand, the cases where the client
>>> > >> without
>>> > >> > > > > > ZStandard
>>> > >> > > > > > > > support receives ZStandard compressed records can be
>>> > >> organized
>>> > >> > > into
>>> > >> > > > > two
>>> > >> > > > > > > > cases:
>>> > >> > > > > > > >
>>> > >> > > > > > > > a. The 'compression.type' configuration of given
>>> topic is
>>> > >> > > > 'producer'
>>> > >> > > > > > and
>>> > >> > > > > > > > the producer compressed the records with ZStandard.
>>> (that
>>> > >> is,
>>> > >> > > using
>>> > >> > > > > > > > ZStandard implicitly.)
>>> > >> > > > > > > > b.  The 'compression.type' configuration of given
>>> topic is
>>> > >> > > 'zstd';
>>> > >> > > > > that
>>> > >> > > > > > > is,
>>> > >> > > > > > > > using ZStandard explicitly.
>>> > >> > > > > > > >
>>> > >> > > > > > > > As you stated, we don't have to handle the case b
>>> > specially.
>>> > >> > So,
>>> > >> > > It
>>> > >> > > > > > seems
>>> > >> > > > > > > > like we can narrow the focus of the problem by joining
>>> > case
>>> > >> 1
>>> > >> > and
>>> > >> > > > > case
>>> > >> > > > > > b
>>> > >> > > > > > > > like the following:
>>> > >> > > > > > > >
>>> > >> > > > > > > > > Given the topic with 'producer' as its
>>> > 'compression.type'
>>> > >> > > > > > > configuration,
>>> > >> > > > > > > > ZStandard compressed records and old client without
>>> > >> ZStandard,
>>> > >> > is
>>> > >> > > > > there
>>> > >> > > > > > > any
>>> > >> > > > > > > > case we need to inspect the records and can change the
>>> > >> > > compression
>>> > >> > > > > > type?
>>> > >> > > > > > > If
>>> > >> > > > > > > > so, can we provide compression type converting?
>>> > >> > > > > > > >
>>> > >> > > > > > > > Do I understand correctly?
>>> > >> > > > > > > >
>>> > >> > > > > > > > Best,
>>> > >> > > > > > > > Dongjin
>>> > >> > > > > > > >
>>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
>>> > >> > misunderstanding
>>> > >> > > > how
>>> > >> > > > > > API
>>> > >> > > > > > > > version works, after reviewing the downconvert logic
>>> & the
>>> > >> > > protocol
>>> > >> > > > > > > > documentation <https://kafka.apache.org/protocol>.
>>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
>>> > >> > > > > > > >
>>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
>>> > >> > > > jason@confluent.io>
>>> > >> > > > > > > > wrote:
>>> > >> > > > > > > >
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > But in my opinion, since the client will fail
>>> with the
>>> > >> API
>>> > >> > > > > version,
>>> > >> > > > > > > so
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
>>> Isn't
>>> > >> it?
>>> > >> > > So, I
>>> > >> > > > > > think
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
>>> > >> familiar
>>> > >> > > with
>>> > >> > > > > > > > > down-convert
>>> > >> > > > > > > > > > logic.)
>>> > >> > > > > > > > >
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > Currently the broker down-converts automatically
>>> when it
>>> > >> > > receives
>>> > >> > > > > an
>>> > >> > > > > > > old
>>> > >> > > > > > > > > version of the fetch request (a version which is
>>> known
>>> > to
>>> > >> > > predate
>>> > >> > > > > the
>>> > >> > > > > > > > > message format in use). Typically when
>>> down-converting
>>> > the
>>> > >> > > > message
>>> > >> > > > > > > > format,
>>> > >> > > > > > > > > we use the same compression type, but there is not
>>> much
>>> > >> point
>>> > >> > > in
>>> > >> > > > > > doing
>>> > >> > > > > > > so
>>> > >> > > > > > > > > when we know the client doesn't support it. So if
>>> > >> zstandard
>>> > >> > is
>>> > >> > > in
>>> > >> > > > > > use,
>>> > >> > > > > > > > and
>>> > >> > > > > > > > > we have to down-convert anyway, then we can choose
>>> to
>>> > use
>>> > >> a
>>> > >> > > > > different
>>> > >> > > > > > > > > compression type or no compression type.
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > From my perspective, there is no significant
>>> downside to
>>> > >> > > bumping
>>> > >> > > > > the
>>> > >> > > > > > > > > protocol version and it has several potential
>>> benefits.
>>> > >> > Version
>>> > >> > > > > bumps
>>> > >> > > > > > > are
>>> > >> > > > > > > > > cheap. The main question mark in my mind is about
>>> > >> > > > down-conversion.
>>> > >> > > > > > > > Figuring
>>> > >> > > > > > > > > out whether down-conversion is needed is hard
>>> generally
>>> > >> > without
>>> > >> > > > > > > > inspecting
>>> > >> > > > > > > > > the fetched data, which is expensive. I think we
>>> agree
>>> > in
>>> > >> > > > principle
>>> > >> > > > > > > that
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > do not want to have to pay this cost generally and
>>> > prefer
>>> > >> the
>>> > >> > > > > clients
>>> > >> > > > > > > to
>>> > >> > > > > > > > > fail when they see an unhandled compression type.
>>> The
>>> > >> point I
>>> > >> > > was
>>> > >> > > > > > > making
>>> > >> > > > > > > > is
>>> > >> > > > > > > > > that there are some cases where we are either
>>> inspecting
>>> > >> the
>>> > >> > > data
>>> > >> > > > > > > anyway
>>> > >> > > > > > > > > (because we have to down-convert the message
>>> format), or
>>> > >> we
>>> > >> > > have
>>> > >> > > > an
>>> > >> > > > > > > easy
>>> > >> > > > > > > > > way to tell whether zstandard is in use (the topic
>>> has
>>> > it
>>> > >> > > > > configured
>>> > >> > > > > > > > > explicitly). In the latter case, we don't have to
>>> handle
>>> > >> it
>>> > >> > > > > > specially.
>>> > >> > > > > > > > But
>>> > >> > > > > > > > > we do have to decide how we will handle
>>> down-conversion
>>> > to
>>> > >> > > older
>>> > >> > > > > > > formats.
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > -Jason
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
>>> > >> > > dongjin@apache.org
>>> > >> > > > >
>>> > >> > > > > > > wrote:
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > > Colin and Jason,
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the
>>> Pros and
>>> > >> Cons
>>> > >> > > of
>>> > >> > > > > > > bumping
>>> > >> > > > > > > > > > fetch API version are:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > Cons:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > - The Broker can't know whether a given message
>>> batch
>>> > is
>>> > >> > > > > compressed
>>> > >> > > > > > > > with
>>> > >> > > > > > > > > > zstd or not.
>>> > >> > > > > > > > > > - Need some additional logic for the topic
>>> explicitly
>>> > >> > > > configured
>>> > >> > > > > to
>>> > >> > > > > > > use
>>> > >> > > > > > > > > > zstd.
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > Pros:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > - The broker doesn't need to conduct expensive
>>> > >> > > down-conversion.
>>> > >> > > > > > > > > > - Can message the users to update their client.
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > So, opinions for the backward-compatibility
>>> policy by
>>> > >> far:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
>>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > Here are my additional comments:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > @Colin
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > I greatly appreciate your response. In the case
>>> of the
>>> > >> > > > dictionary
>>> > >> > > > > > > > > support,
>>> > >> > > > > > > > > > of course, this issue should be addressed later
>>> so we
>>> > >> don't
>>> > >> > > > need
>>> > >> > > > > it
>>> > >> > > > > > > in
>>> > >> > > > > > > > > the
>>> > >> > > > > > > > > > first version. You are right - it is not late to
>>> try
>>> > it
>>> > >> > after
>>> > >> > > > > some
>>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in
>>> mind on
>>> > >> that
>>> > >> > > > > > potential
>>> > >> > > > > > > > > > feature.
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > @Jason
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > You wrote,
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway
>>> because
>>> > >> the
>>> > >> > > > client
>>> > >> > > > > > > does
>>> > >> > > > > > > > > not
>>> > >> > > > > > > > > > understand the message format, then we could also
>>> use
>>> > a
>>> > >> > > > different
>>> > >> > > > > > > > > > compression type.
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > But in my opinion, since the client will fail
>>> with the
>>> > >> API
>>> > >> > > > > version,
>>> > >> > > > > > > so
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
>>> Isn't
>>> > >> it?
>>> > >> > > So, I
>>> > >> > > > > > think
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
>>> > >> familiar
>>> > >> > > with
>>> > >> > > > > > > > > down-convert
>>> > >> > > > > > > > > > logic.)
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > Please give more opinions. Thanks!
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > - Dongjin
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
>>> > >> > > > > jason@confluent.io
>>> > >> > > > > > >
>>> > >> > > > > > > > > wrote:
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > > Hey Colin,
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > The problem for the fetch API is that the broker
>>> > does
>>> > >> not
>>> > >> > > > > > generally
>>> > >> > > > > > > > > know
>>> > >> > > > > > > > > > if
>>> > >> > > > > > > > > > > a batch was compressed with zstd unless it
>>> parses
>>> > it.
>>> > >> I
>>> > >> > > think
>>> > >> > > > > the
>>> > >> > > > > > > > goal
>>> > >> > > > > > > > > > here
>>> > >> > > > > > > > > > > is to avoid the expensive down-conversion that
>>> is
>>> > >> needed
>>> > >> > to
>>> > >> > > > > > ensure
>>> > >> > > > > > > > > > > compatibility because it is only necessary if
>>> zstd
>>> > is
>>> > >> > > > actually
>>> > >> > > > > in
>>> > >> > > > > > > > use.
>>> > >> > > > > > > > > > But
>>> > >> > > > > > > > > > > as long as old clients can parse the message
>>> format,
>>> > >> they
>>> > >> > > > > should
>>> > >> > > > > > > get
>>> > >> > > > > > > > a
>>> > >> > > > > > > > > > > reasonable error if they see an unsupported
>>> > >> compression
>>> > >> > > type
>>> > >> > > > in
>>> > >> > > > > > the
>>> > >> > > > > > > > > > > attributes. Basically the onus is on users to
>>> ensure
>>> > >> that
>>> > >> > > > their
>>> > >> > > > > > > > > consumers
>>> > >> > > > > > > > > > > have been updated prior to using zstd. It seems
>>> > like a
>>> > >> > > > > reasonable
>>> > >> > > > > > > > > > tradeoff
>>> > >> > > > > > > > > > > to me. There are a couple cases that might be
>>> worth
>>> > >> > > thinking
>>> > >> > > > > > > through:
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to use
>>> zstd,
>>> > >> then
>>> > >> > we
>>> > >> > > > > don't
>>> > >> > > > > > > > need
>>> > >> > > > > > > > > to
>>> > >> > > > > > > > > > > check the fetched data for the compression type
>>> to
>>> > >> know
>>> > >> > if
>>> > >> > > we
>>> > >> > > > > > need
>>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch API
>>> > version,
>>> > >> > then
>>> > >> > > > we
>>> > >> > > > > > > could
>>> > >> > > > > > > > > > handle
>>> > >> > > > > > > > > > > this case by either down-converting using a
>>> > different
>>> > >> > > > > compression
>>> > >> > > > > > > > type
>>> > >> > > > > > > > > or
>>> > >> > > > > > > > > > > returning an error.
>>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway
>>> > >> because
>>> > >> > the
>>> > >> > > > > > client
>>> > >> > > > > > > > does
>>> > >> > > > > > > > > > not
>>> > >> > > > > > > > > > > understand the message format, then we could
>>> also
>>> > use
>>> > >> a
>>> > >> > > > > different
>>> > >> > > > > > > > > > > compression type.
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > For the produce API, I think it's reasonable to
>>> bump
>>> > >> the
>>> > >> > > api
>>> > >> > > > > > > version.
>>> > >> > > > > > > > > > This
>>> > >> > > > > > > > > > > can be used by clients to check whether a broker
>>> > >> supports
>>> > >> > > > zstd.
>>> > >> > > > > > For
>>> > >> > > > > > > > > > > example, we might support a list of preferred
>>> > >> compression
>>> > >> > > > types
>>> > >> > > > > > in
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > > producer and we could use the broker to detect
>>> which
>>> > >> > > version
>>> > >> > > > to
>>> > >> > > > > > > use.
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > -Jason
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
>>> > >> > > > > cmccabe@apache.org
>>> > >> > > > > > >
>>> > >> > > > > > > > > wrote:
>>> > >> > > > > > > > > > >
>>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a
>>> good
>>> > >> > > > compression
>>> > >> > > > > > > codec
>>> > >> > > > > > > > > > and I
>>> > >> > > > > > > > > > > > hope we can get this support in soon!
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > > I would say we can just bump the API version
>>> to
>>> > >> > indicate
>>> > >> > > > that
>>> > >> > > > > > > ZStd
>>> > >> > > > > > > > > > > support
>>> > >> > > > > > > > > > > > is expected in new clients.  We probably need
>>> some
>>> > >> way
>>> > >> > of
>>> > >> > > > > > > > indicating
>>> > >> > > > > > > > > to
>>> > >> > > > > > > > > > > the
>>> > >> > > > > > > > > > > > older clients that they can't consume the
>>> > >> partitions,
>>> > >> > as
>>> > >> > > > > well.
>>> > >> > > > > > > > > Perhaps
>>> > >> > > > > > > > > > > we
>>> > >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT
>>> error?
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > > The license thing seems straightforward --
>>> it's
>>> > >> just a
>>> > >> > > > matter
>>> > >> > > > > > of
>>> > >> > > > > > > > > adding
>>> > >> > > > > > > > > > > > the text to the right files as per ASF
>>> guidelines.
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > > With regard to the dictionary support, do we
>>> > really
>>> > >> > need
>>> > >> > > > that
>>> > >> > > > > > in
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > > first
>>> > >> > > > > > > > > > > > version?  Hopefully message batches are big
>>> enough
>>> > >> that
>>> > >> > > > this
>>> > >> > > > > > > isn't
>>> > >> > > > > > > > > > > needed.
>>> > >> > > > > > > > > > > > Some benchmarks might help here.
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > > best,
>>> > >> > > > > > > > > > > > Colin
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > >
>>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee
>>> wrote:
>>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot
>>> this
>>> > >> issue,
>>> > >> > > > > KIP-110
>>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
>>> > >> > > confluence/display/KAFKA/KIP-
>>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
>>> > >> > > > > > > > > > > > > .
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of the
>>> > >> history:
>>> > >> > > > KIP-110
>>> > >> > > > > > was
>>> > >> > > > > > > > > > > > originally
>>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it
>>> lacked
>>> > >> > > benchmark
>>> > >> > > > > > > results
>>> > >> > > > > > > > to
>>> > >> > > > > > > > > > get
>>> > >> > > > > > > > > > > > the
>>> > >> > > > > > > > > > > > > agreement of the community. Later, Ivan
>>> Babrou
>>> > and
>>> > >> > some
>>> > >> > > > > other
>>> > >> > > > > > > > users
>>> > >> > > > > > > > > > who
>>> > >> > > > > > > > > > > > > adopted the patch provided their excellent
>>> > >> > performance
>>> > >> > > > > report
>>> > >> > > > > > > > which
>>> > >> > > > > > > > > > is
>>> > >> > > > > > > > > > > > now
>>> > >> > > > > > > > > > > > > included in the KIP, but it postponed again
>>> > >> because
>>> > >> > of
>>> > >> > > > the
>>> > >> > > > > > > > > community
>>> > >> > > > > > > > > > > was
>>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now
>>> reboot
>>> > >> this
>>> > >> > > > issue.
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > The following is the current status of the
>>> > >> feature:
>>> > >> > You
>>> > >> > > > can
>>> > >> > > > > > > check
>>> > >> > > > > > > > > the
>>> > >> > > > > > > > > > > > > current draft implementation here
>>> > >> > > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>.
>>> It
>>> > >> is
>>> > >> > > based
>>> > >> > > > > on
>>> > >> > > > > > > zstd
>>> > >> > > > > > > > > > 1.3.5
>>> > >> > > > > > > > > > > > and
>>> > >> > > > > > > > > > > > > periodically rebased onto the latest
>>> trunk[^1].
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > The issues that should be addressed is like
>>> > >> > following:
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > To support old consumers, we need to take a
>>> > >> strategy
>>> > >> > to
>>> > >> > > > > > handle
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > old
>>> > >> > > > > > > > > > > > > consumers. Current candidates are:
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > - Bump API version
>>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients fail.
>>> > >> > > > > > > > > > > > > - Improve the error messages:
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > *2. Dictionary Support*
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > To support zstd's dictionary feature in the
>>> > future
>>> > >> > (if
>>> > >> > > > > > needed),
>>> > >> > > > > > > > we
>>> > >> > > > > > > > > > need
>>> > >> > > > > > > > > > > > to
>>> > >> > > > > > > > > > > > > sketch how it should be and leave some room
>>> for
>>> > >> it.
>>> > >> > As
>>> > >> > > of
>>> > >> > > > > > now,
>>> > >> > > > > > > > > there
>>> > >> > > > > > > > > > > has
>>> > >> > > > > > > > > > > > > been no discussion on this topic yet.
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > *3. License*
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > To use this feature, we need to add license
>>> of
>>> > >> zstd
>>> > >> > and
>>> > >> > > > > > > zstd-jni
>>> > >> > > > > > > > to
>>> > >> > > > > > > > > > the
>>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for
>>> raising
>>> > >> this
>>> > >> > > > issue!)
>>> > >> > > > > > It
>>> > >> > > > > > > > > seems
>>> > >> > > > > > > > > > > like
>>> > >> > > > > > > > > > > > > what Apache Spark did would be a good
>>> example
>>> > but
>>> > >> > there
>>> > >> > > > has
>>> > >> > > > > > > been
>>> > >> > > > > > > > no
>>> > >> > > > > > > > > > > > > discussion yet.
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > You can find the details of the above
>>> issues in
>>> > >> the
>>> > >> > KIP
>>> > >> > > > > > > document.
>>> > >> > > > > > > > > > > Please
>>> > >> > > > > > > > > > > > > have a look when you are free, and give me
>>> > >> feedback.
>>> > >> > > All
>>> > >> > > > > > kinds
>>> > >> > > > > > > of
>>> > >> > > > > > > > > > > > > participating are welcome.
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > Best,
>>> > >> > > > > > > > > > > > > Dongjin
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
>>> 6b4fb8152.
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin
>>> Lee <
>>> > >> > > > > > > dongjin@apache.org
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > > > wrote:
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > > Sorry for the late reply.
>>> > >> > > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > > In short, I could not submit the updated
>>> KIP
>>> > by
>>> > >> the
>>> > >> > > > > feature
>>> > >> > > > > > > > > freeze
>>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it
>>> will
>>> > not
>>> > >> be
>>> > >> > > > > included
>>> > >> > > > > > > in
>>> > >> > > > > > > > > the
>>> > >> > > > > > > > > > > > 2.0.0
>>> > >> > > > > > > > > > > > > > release and all discussion for this issue
>>> were
>>> > >> > > > postponed
>>> > >> > > > > > > after
>>> > >> > > > > > > > > the
>>> > >> > > > > > > > > > > > release
>>> > >> > > > > > > > > > > > > > of 2.0.0.
>>> > >> > > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > > I have been updating the PR following
>>> recent
>>> > >> > updates.
>>> > >> > > > > Just
>>> > >> > > > > > > > now, I
>>> > >> > > > > > > > > > > > rebased
>>> > >> > > > > > > > > > > > > > it against the latest trunk and updated
>>> the
>>> > zstd
>>> > >> > > > version
>>> > >> > > > > > into
>>> > >> > > > > > > > > > 1.3.5.
>>> > >> > > > > > > > > > > > If you
>>> > >> > > > > > > > > > > > > > need some request, don't hesitate to
>>> notify
>>> > me.
>>> > >> > (But
>>> > >> > > > not
>>> > >> > > > > > this
>>> > >> > > > > > > > > > thread
>>> > >> > > > > > > > > > > -
>>> > >> > > > > > > > > > > > just
>>> > >> > > > > > > > > > > > > > send me the message directly.)
>>> > >> > > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > > Best,
>>> > >> > > > > > > > > > > > > > Dongjin
>>> > >> > > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby
>>> Evans <
>>> > >> > > > > > > bobby@apache.org
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > > > wrote:
>>> > >> > > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > >> I there any update on this.  The
>>> performance
>>> > >> > > > > improvements
>>> > >> > > > > > > are
>>> > >> > > > > > > > > > quite
>>> > >> > > > > > > > > > > > > >> impressive and I really would like to
>>> stop
>>> > >> forking
>>> > >> > > > kafka
>>> > >> > > > > > > just
>>> > >> > > > > > > > to
>>> > >> > > > > > > > > > get
>>> > >> > > > > > > > > > > > this
>>> > >> > > > > > > > > > > > > >> in.
>>> > >> > > > > > > > > > > > > >>
>>> > >> > > > > > > > > > > > > >> Thanks,
>>> > >> > > > > > > > > > > > > >>
>>> > >> > > > > > > > > > > > > >> Bobby
>>> > >> > > > > > > > > > > > > >>
>>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin
>>> Lee <
>>> > >> > > > > > > > dongjin@apache.org
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > > > wrote:
>>> > >> > > > > > > > > > > > > >>
>>> > >> > > > > > > > > > > > > >> > Ismael,
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working
>>> > frenzy
>>> > >> > for
>>> > >> > > > 2.0!
>>> > >> > > > > > No
>>> > >> > > > > > > > > > problem,
>>> > >> > > > > > > > > > > > take
>>> > >> > > > > > > > > > > > > >> > your time. I am also working at another
>>> > issue
>>> > >> > now.
>>> > >> > > > > Thank
>>> > >> > > > > > > you
>>> > >> > > > > > > > > for
>>> > >> > > > > > > > > > > > > >> letting me
>>> > >> > > > > > > > > > > > > >> > know.
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> > Best,
>>> > >> > > > > > > > > > > > > >> > Dongjin
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael
>>> Juma
>>> > <
>>> > >> > > > > > > > ismaelj@gmail.com
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > > > wrote:
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin.
>>> Everyone is
>>> > >> busy
>>> > >> > > > > > finalising
>>> > >> > > > > > > > > > 2.0.0.
>>> > >> > > > > > > > > > > > This
>>> > >> > > > > > > > > > > > > >> KIP
>>> > >> > > > > > > > > > > > > >> > > seems like a great candidate for
>>> 2.1.0
>>> > and
>>> > >> > > > hopefully
>>> > >> > > > > > > there
>>> > >> > > > > > > > > > will
>>> > >> > > > > > > > > > > be
>>> > >> > > > > > > > > > > > > >> more
>>> > >> > > > > > > > > > > > > >> > of
>>> > >> > > > > > > > > > > > > >> > > a discussion next week. :)
>>> > >> > > > > > > > > > > > > >> > >
>>> > >> > > > > > > > > > > > > >> > > Ismael
>>> > >> > > > > > > > > > > > > >> > >
>>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin
>>> Lee, <
>>> > >> > > > > > > > dongjin@apache.org
>>> > >> > > > > > > > > >
>>> > >> > > > > > > > > > > > wrote:
>>> > >> > > > > > > > > > > > > >> > >
>>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
>>> > >> > implementation:
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit
>>> > >> 5145d6b)
>>> > >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > You can check out the
>>> implementation
>>> > from
>>> > >> > here
>>> > >> > > > > > > > > > > > > >> > > > <
>>> > >> https://github.com/apache/kafka/pull/2267
>>> > >> > >.
>>> > >> > > If
>>> > >> > > > > you
>>> > >> > > > > > > > > > > experience
>>> > >> > > > > > > > > > > > any
>>> > >> > > > > > > > > > > > > >> > > problem
>>> > >> > > > > > > > > > > > > >> > > > running it, don't hesitate to give
>>> me a
>>> > >> > > mention.
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > Best,
>>> > >> > > > > > > > > > > > > >> > > > Dongjin
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM
>>> Dongjin
>>> > >> Lee
>>> > >> > <
>>> > >> > > > > > > > > > > dongjin@apache.org
>>> > >> > > > > > > > > > > > >
>>> > >> > > > > > > > > > > > > >> > wrote:
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > > Here is the short conclusion
>>> about
>>> > the
>>> > >> > > license
>>> > >> > > > > > > > problem:
>>> > >> > > > > > > > > > *We
>>> > >> > > > > > > > > > > > can
>>> > >> > > > > > > > > > > > > >> use
>>> > >> > > > > > > > > > > > > >> > > zstd
>>> > >> > > > > > > > > > > > > >> > > > > and zstd-jni without any
>>> problem, but
>>> > >> we
>>> > >> > > need
>>> > >> > > > to
>>> > >> > > > > > > > include
>>> > >> > > > > > > > > > > their
>>> > >> > > > > > > > > > > > > >> > license,
>>> > >> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3
>>> > Clause
>>> > >> > > > License
>>> > >> > > > > > > > requires
>>> > >> > > > > > > > > > to
>>> > >> > > > > > > > > > > > > >> include
>>> > >> > > > > > > > > > > > > >> > the
>>> > >> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause
>>> > License
>>> > >> > > > requires
>>> > >> > > > > > that
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > > name
>>> > >> > > > > > > > > > > > of
>>> > >> > > > > > > > > > > > > >> the
>>> > >> > > > > > > > > > > > > >> > > > > contributor can't be used to
>>> endorse
>>> > or
>>> > >> > > > promote
>>> > >> > > > > > the
>>> > >> > > > > > > > > > product.
>>> > >> > > > > > > > > > > > > >> That's
>>> > >> > > > > > > > > > > > > >> > it
>>> > >> > > > > > > > > > > > > >> > > > > <
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > >
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >>
>>> > >> > http://www.mikestratton.net/2011/12/is-bsd-license-
>>> > >> > > > > > > > > > > > compatible-with-apache-2-0-license/
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > - They are not listed in the
>>> list of
>>> > >> > > > prohibited
>>> > >> > > > > > > > licenses
>>> > >> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
>>> > >> > > > > > > > resolved.html#category-x>
>>> > >> > > > > > > > > > > also.
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
>>> > >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
>>> > >> > > > > > jira/browse/SPARK-19112
>>> > >> > > > > > > >:
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > - They made a directory
>>> dedicated to
>>> > >> the
>>> > >> > > > > > dependency
>>> > >> > > > > > > > > > license
>>> > >> > > > > > > > > > > > files
>>> > >> > > > > > > > > > > > > >> > > > > <
>>> > >> > > > > > > https://github.com/apache/spark/tree/master/licenses
>>> > >> > > > > > > > >
>>> > >> > > > > > > > > > and
>>> > >> > > > > > > > > > > > added
>>> > >> > > > > > > > > > > > > >> > > > licenses
>>> > >> > > > > > > > > > > > > >> > > > > for Zstd
>>> > >> > > > > > > > > > > > > >> > > > > <
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > https://github.com/apache/spark/blob/master/licenses/
>>> > >> > > > > > > > > > > > LICENSE-zstd.txt
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > > > &
>>> > >> > > > > > > > > > > > > >> > > > > Zstd-jni
>>> > >> > > > > > > > > > > > > >> > > > > <
>>> > >> > > > > > > > > > > > > >> > > >
>>> > >> > > > > > > > > > > > > >> > >
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> https://github.com/apache/
>>> > >> > > spark/blob/master/licenses/
>>> > >> > > > > > > > > > > > LICENSE-zstd-jni.txt
>>> > >> > > > > > > > > > > > > >> >
>>> > >> > > > > > > > > > > > > >> > > > > .
>>> > >> > > > > > > > > > > > > >> > > > > - Added a link to the original
>>> > license
>>> > >> > files
>>> > >> > > > in
>>> > >> > > > > > > > LICENSE.
>>> > >> > > > > > > > > > > > > >> > > > > <
>>> > >> > > > > https://github.com/apache/spark/pull/18805/files
>>> > >> > > > > > >
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > If needed, I can make a similar
>>> > update.
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this
>>> problem,
>>> > >> > > Viktor!
>>> > >> > > > > Nice
>>> > >> > > > > > > > > catch!
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > Best,
>>> > >> > > > > > > > > > > > > >> > > > > Dongjin
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM
>>> > >> Dongjin
>>> > >> > > Lee <
>>> > >> > > > > > > > > > > > dongjin@apache.org>
>>> > >> > > > > > > > > > > > > >> > > wrote:
>>> > >> > > > > > > > > > > > > >> > > > >
>>> > >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your
>>> > >> comprehensive
>>> > >> > > > > > reasoning.
>>> > >> > > > > > > > so:
>>> > >> > > > > > > > > +1
>>> > >> > > > > > > > > > > > for b
>>> > >> > > > > > > > > > > > > >> > until
>>> > >> > > > > > > > > > > > > >> > > > now.
>>> > >> > > > > > > > > > > > > >> > > > >>
>>> > >> > > > > > > > > > > > > >> > > > >> For the license issues, I will
>>> have
>>> > a
>>> > >> > check
>>> > >> > > > on
>>> > >> > > > > > how
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > over
>>> > >> > > > > > > > > > > > > >> projects
>>> > >> > > > > > > > > > > > > >> > > are
>>> > >> > > > > > > > > > > > > >> > > > >> doing and share the results.
>>> > >> > > > > > > > > > > > > >> > > > >>
>>> > >> > > > > > > > > > > > > >> > > > >> Best,
>>> > >> > > > > > > > > > > > > >> > > > >> Dongjin
>>> > >> > > > > > > > > > > > > >> > > > >>
>>> > >> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM
>>> > >> Viktor
>>> > >> > > > > Somogyi <
>>> > >> > > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
>>> > >> > > > > > > > > > > > > >> > > > >> wrote:
>>> > >> > > > > > > > > > > > > >> > > > >>
>>> > >> > > > > > > > > > > > > >> > > > >>> Hi Dongjin,
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>> A couple of comments:
>>> > >> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in
>>> the
>>> > >> > > "backward
>>> > >> > > > > > > > > > compatibility"
>>> > >> > > > > > > > > > > > > >> section.
>>> > >> > > > > > > > > > > > > >> > > My
>>> > >> > > > > > > > > > > > > >> > > > >>> reasoning for this is that
>>> users
>>> > >> > upgrading
>>> > >> > > > to
>>> > >> > > > > a
>>> > >> > > > > > > zstd
>>> > >> > > > > > > > > > > > compatible
>>> > >> > > > > > > > > > > > > >> > > version
>>> > >> > > > > > > > > > > > > >> > > > >>> won't start to use it
>>> > automatically,
>>> > >> so
>>> > >> > > > manual
>>> > >> > > > > > > > > > > > reconfiguration
>>> > >> > > > > > > > > > > > > >> is
>>> > >> > > > > > > > > > > > > >> > > > >>> required.
>>> > >> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't
>>> mess up
>>> > >> the
>>> > >> > > > > cluster.
>>> > >> > > > > > If
>>> > >> > > > > > > > not
>>> > >> > > > > > > > > > all
>>> > >> > > > > > > > > > > > the
>>> > >> > > > > > > > > > > > > >> > > clients
>>> > >> > > > > > > > > > > > > >> > > > >>> are
>>> > >> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them
>>> and
>>> > >> > they'd
>>> > >> > > > > start
>>> > >> > > > > > to
>>> > >> > > > > > > > use
>>> > >> > > > > > > > > > > zstd
>>> > >> > > > > > > > > > > > > >> then it
>>> > >> > > > > > > > > > > > > >> > > > would
>>> > >> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster.
>>> I'd
>>> > >> like to
>>> > >> > > > > presume
>>> > >> > > > > > > > > though
>>> > >> > > > > > > > > > > that
>>> > >> > > > > > > > > > > > > >> this
>>> > >> > > > > > > > > > > > > >> > is
>>> > >> > > > > > > > > > > > > >> > > a
>>> > >> > > > > > > > > > > > > >> > > > >>> very
>>> > >> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody
>>> > >> should
>>> > >> > be
>>> > >> > > > > > > surprised
>>> > >> > > > > > > > if
>>> > >> > > > > > > > > > it
>>> > >> > > > > > > > > > > > didn't
>>> > >> > > > > > > > > > > > > >> > > work.
>>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I
>>> think we
>>> > >> > should
>>> > >> > > > bump
>>> > >> > > > > > the
>>> > >> > > > > > > > > fetch
>>> > >> > > > > > > > > > > and
>>> > >> > > > > > > > > > > > > >> > produce
>>> > >> > > > > > > > > > > > > >> > > > >>> requests if it's a change in
>>> the
>>> > >> message
>>> > >> > > > > format.
>>> > >> > > > > > > > > > Moreover
>>> > >> > > > > > > > > > > if
>>> > >> > > > > > > > > > > > > >> some
>>> > >> > > > > > > > > > > > > >> > of
>>> > >> > > > > > > > > > > > > >> > > > the
>>> > >> > > > > > > > > > > > > >> > > > >>> producers and the brokers are
>>> > >> upgraded
>>> > >> > but
>>> > >> > > > > some
>>> > >> > > > > > of
>>> > >> > > > > > > > the
>>> > >> > > > > > > > > > > > consumers
>>> > >> > > > > > > > > > > > > >> > are
>>> > >> > > > > > > > > > > > > >> > > > not,
>>> > >> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the
>>> error
>>> > >> when
>>> > >> > > the
>>> > >> > > > > old
>>> > >> > > > > > > > > consumer
>>> > >> > > > > > > > > > > > tries
>>> > >> > > > > > > > > > > > > >> to
>>> > >> > > > > > > > > > > > > >> > > > consume
>>> > >> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
>>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as
>>> I
>>> > >> think
>>> > >> > > > binding
>>> > >> > > > > > the
>>> > >> > > > > > > > > > > > compression
>>> > >> > > > > > > > > > > > > >> type
>>> > >> > > > > > > > > > > > > >> > > to
>>> > >> > > > > > > > > > > > > >> > > > an
>>> > >> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the
>>> > >> > developer's
>>> > >> > > > > > > > > perspective.
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the
>>> > >> existing
>>> > >> > > > > binding,
>>> > >> > > > > > > > > however
>>> > >> > > > > > > > > > > we
>>> > >> > > > > > > > > > > > must
>>> > >> > > > > > > > > > > > > >> > > > respect
>>> > >> > > > > > > > > > > > > >> > > > >>> the licenses:
>>> > >> > > > > > > > > > > > > >> > > > >>> "The code for these JNI
>>> bindings is
>>> > >> > > licenced
>>> > >> > > > > > under
>>> > >> > > > > > > > > > > 2-clause
>>> > >> > > > > > > > > > > > BSD
>>> > >> > > > > > > > > > > > > >> > > > license.
>>> > >> > > > > > > > > > > > > >> > > > >>> The native Zstd library is
>>> licensed
>>> > >> > under
>>> > >> > > > > > 3-clause
>>> > >> > > > > > > > BSD
>>> > >> > > > > > > > > > > > license
>>> > >> > > > > > > > > > > > > >> and
>>> > >> > > > > > > > > > > > > >> > > > GPL2"
>>> > >> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page
>>> > >> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
>>> > >> > > > > > > > resolved.html#category-a
>>> > >> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD
>>> > >> licenses
>>> > >> > > but
>>> > >> > > > > the
>>> > >> > > > > > > > Apache
>>> > >> > > > > > > > > > > > license
>>> > >> > > > > > > > > > > > > >> is
>>> > >> > > > > > > > > > > > > >> > not
>>> > >> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm
>>> hoping
>>> > that
>>> > >> > the
>>> > >> > > > > > > "3-clause
>>> > >> > > > > > > > > BSD
>>> > >> > > > > > > > > > > > license
>>> > >> > > > > > > > > > > > > >> and
>>> > >> > > > > > > > > > > > > >> > > > GPL2"
>>> > >> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR
>>> in
>>> > >> this
>>> > >> > > case,
>>> > >> > > > > but
>>> > >> > > > > > > I'm
>>> > >> > > > > > > > > no
>>> > >> > > > > > > > > > > > lawyer,
>>> > >> > > > > > > > > > > > > >> > just
>>> > >> > > > > > > > > > > > > >> > > > >>> wanted
>>> > >> > > > > > > > > > > > > >> > > > >>> to make the point that we
>>> should
>>> > >> watch
>>> > >> > out
>>> > >> > > > for
>>> > >> > > > > > > > > licenses.
>>> > >> > > > > > > > > > > :)
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>> Regards,
>>> > >> > > > > > > > > > > > > >> > > > >>> Viktor
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM
>>> > Ivan
>>> > >> > > Babrou
>>> > >> > > > <
>>> > >> > > > > > > > > > > > ibobrik@gmail.com>
>>> > >> > > > > > > > > > > > > >> > > wrote:
>>> > >> > > > > > > > > > > > > >> > > > >>>
>>> > >> > > > > > > > > > > > > >> > > > >>> > Hello,
>>> > >> > > > > > > > > > > > > >> > > > >>> >
>>> > >> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very
>>> > much
>>> > >> > > support
>>> > >> > > > > the
>>> > >> > > > > > > > fact
>>> > >> > > > > > > > > > that
>>> > >> > > > > > > > > > > > zstd
>>> > >> > > > > > > > > > > > > >> > > > >>> compression
>>> > >> > > > > > > > > > > > > >> > > > >>> > should be included out of the
>>> > box.
>>> > >> > > > > > > > > > > > > >> > > > >>> >
>>> > >> > > > > > > > > > > > > >> > > > >>> > Please think about the
>>> > environment,
>>> > >> > you
>>> > >> > > > can
>>> > >> > > > > > save
>>> > >> > > > > > > > > > quite a
>>> > >> > > > > > > > > > > > lot
>>> > >> > > > > > > > > > > > > >> of
>>> > >> > > > > > > > > > > > > >> > > > >>> hardware
>>> > >> > > > > > > > > > > > > >> > > > >>> > with it.
>>> > >> > > > > > > > > > > > > >> > > > >>> >
>>> > >> > > > > > > > > > > > > >> > > > >>> > Thank you.
>>> > >> > > > > > > > > > > > > >> > > > >>> >
>>> > >> > > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14
>>> > >> Dongjin
>>> > >> > > Lee <
>>> > >> > > > > > > > > > > > dongjin@apache.org>
>>> > >> > > > > > > > > > > > > >> > > wrote:
>>> > >> > > > > > > > > > > > > >> > > > >>> >
>>> > >> > > > > > > > > > > > > >> > > > >>> > > Since there are no
>>> responses
>>> > for
>>> > >> a
>>> > >> > > > week, I
>>> > >> > > > > > > > decided
>>> > >> > > > > > > > > > to
>>> > >> > > > > > > > > > > > > >> > reinitiate
>>> > >> > > > > > > > > > > > > >> > > > the
>>> > >> > > > > > > > > > > > > >> > > > >>> > > discussion thread.
>>> > >> > > > > > > > > > > > > >> > > > >>> > >
>>> > >> > > > > >
>>
>>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
I just updated the draft implementation[^1], rebasing it against the latest
trunk and implementing the error routine (i.e., error code 74 for
UnsupportedCompressionTypeException). Since we decided to disallow all
fetch requests below version 2.1.0 for topics specifying ZStandard, I
only added the error logic.

Please have a look when you are free.

Thanks,
Dongjin

[^1]: Please check the last commit here:
https://github.com/apache/kafka/pull/2267
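
For readers skimming the thread: below is a minimal, self-contained sketch of the
check that the error routine boils down to. Everything in it is illustrative - the
class name, the bumped fetch version (assumed to be 10 here) and the final numeric
error code are assumptions, not the actual broker code; only the idea of validating
by request version and topic configuration comes from the discussion in this thread.

```
public class ZstdFetchVersionCheck {

    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD, PRODUCER }

    // Error code used in the draft PR; the final value may change.
    static final short UNSUPPORTED_COMPRESSION_TYPE = 74;
    static final short NONE_ERROR = 0;

    // Assumption: the first fetch version that understands ZStandard.
    static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;

    /**
     * Decides the error code for a fetch request using only the request version
     * and the topic's configured compression type - the record batches themselves
     * are not inspected. Topics configured as "producer" are passed through
     * unchanged, and an old client reading zstd data from such a topic is
     * expected to fail while parsing the batch.
     */
    static short fetchErrorCode(short fetchVersion, CompressionType topicCompression) {
        if (topicCompression == CompressionType.ZSTD
                && fetchVersion < MIN_FETCH_VERSION_FOR_ZSTD)
            return UNSUPPORTED_COMPRESSION_TYPE;
        return NONE_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(fetchErrorCode((short) 8, CompressionType.ZSTD));     // 74
        System.out.println(fetchErrorCode((short) 10, CompressionType.ZSTD));    // 0
        System.out.println(fetchErrorCode((short) 8, CompressionType.PRODUCER)); // 0
    }
}
```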

On Thu, Aug 23, 2018, 8:55 AM Dongjin Lee <do...@apache.org> wrote:

> Jason,
>
> Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.
>
> Best,
> Dongjin
>
> On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
>> Hey Dongjin,
>>
>> Yeah that's right. For what it's worth, librdkafka also appears to handle
>> unexpected error codes. I expect that most client implementations would
>> either pass through the raw type or convert to an enum using something
>> like
>> what the java client does. Since we're expecting the client to fail
>> anyway,
>> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error
>> code.
>>
>> -Jason
>>
>> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org> wrote:
>>
>> > Jason and Ismael,
>> >
>> > It seems like the only thing we need to consider if we define a new error
>> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementations of
>> > the other language clients, right? At least, this strategy doesn't cause
>> > any problem for the Java client. Do I understand correctly?
>> >
>> > Thanks,
>> > Dongjin
>> >
>> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org> wrote:
>> >
>> > > Jason,
>> > >
>> > > > I think we would only use this error code when we /know/ that zstd
>> was
>> > > in use and the client doesn't support it? This is true if either 1)
>> the
>> > > message needs down-conversion and we encounter a zstd compressed
>> message,
>> > > or 2) if the topic is explicitly configured to use zstd.
>> > >
>> > > Yes, that's right. And you know, case 1 includes 1.a) old clients
>> > > requesting v0 or v1 records, and 1.b) implicit zstd, i.e., a compression
>> > > type of "producer" with Zstd-compressed data.
>> > >
>> > > > However, if the compression type is set to "producer," then the
>> fetched
>> > > data may or may not be compressed with zstd. In this case, we return
>> the
>> > > data to the client and expect it to fail parsing. Is that correct?
>> > >
>> > > Exactly.
>> > >
>> > > Following your message, I reviewed the implementation of
>> > > `KafkaApis#handleFetchRequest`, which handles the fetch request, and
>> > > found that the information we can use is the following:
>> > >
>> > > 1. Client's fetch request version. (`versionId` variable)
>> > > 2. Log's compression type. (`logConfig` variable)
>> > >
>> > > We can't detect the actual compression type of the data unless we
>> > > inspect the `RecordBatch` instances included in the `Records` object
>> > > (i.e., the `unconvertedRecords` variable). Since that inspection has a
>> > > performance cost, it is not an option for us - in short, we cannot be
>> > > sure whether a given chunk of data is compressed with zstd or not.
>> > >
>> > > So, in conclusion: with the information above we can easily return an
>> > > error in cases 1.a and 2. In case 1.b (implicit zstd), we can just
>> > > return the data without doing anything special and expect the client to
>> > > fail parsing it.
>> > >
>> > > Thanks,
>> > > Dongjin
>> > >
>> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com>
>> wrote:
>> > >
>> > >> Jason, that's an interesting point regarding the Java client. Do we
>> know
>> > >> what clients in other languages do in these cases?
>> > >>
>> > >> Ismael
>> > >>
>> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io>
>> wrote:
>> > >>
>> > >> > Hi Dongjin,
>> > >> >
>> > >> > One of the complications is that old versions of the API will not
>> > >> expect a
>> > >> > new error code. However, since we expect this to be a fatal error
>> > anyway
>> > >> > for old clients, it may still be more useful to return the correct
>> > error
>> > >> > code. For example, the Kafka clients use the following code to
>> convert
>> > >> the
>> > >> > error code:
>> > >> >
>> > >> >     public static Errors forCode(short code) {
>> > >> >         Errors error = codeToError.get(code);
>> > >> >         if (error != null) {
>> > >> >             return error;
>> > >> >         } else {
>> > >> >             log.warn("Unexpected error code: {}.", code);
>> > >> >             return UNKNOWN_SERVER_ERROR;
>> > >> >         }
>> > >> >     }
>> > >> >
>> > >> > If we return an unsupported error code, it will be converted to an
>> > >> UNKNOWN
>> > >> > error, but at least we will get the message in the log with the
>> > correct
>> > >> > code. That seems preferable to returning a misleading error code.
>> So I
>> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error
>> even
>> > for
>> > >> > older versions.
>> > >> >
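>> > >> > As a rough illustration of that fallback (a toy enum only, not the
>> > >> > real org.apache.kafka.common.protocol.Errors; the numeric value 74 is
>> > >> > taken from the draft PR and may change before release):
>> > >> >
>> > >> >     public enum ErrorsSketch {
>> > >> >         UNKNOWN_SERVER_ERROR((short) -1),
>> > >> >         UNSUPPORTED_COMPRESSION_TYPE((short) 74);
>> > >> >
>> > >> >         private final short code;
>> > >> >
>> > >> >         ErrorsSketch(short code) { this.code = code; }
>> > >> >
>> > >> >         public static ErrorsSketch forCode(short code) {
>> > >> >             for (ErrorsSketch e : values())
>> > >> >                 if (e.code == code)
>> > >> >                     return e;
>> > >> >             // A client built without the new constant falls through to
>> > >> >             // here and reports UNKNOWN, but the raw code stays in the log.
>> > >> >             return UNKNOWN_SERVER_ERROR;
>> > >> >         }
>> > >> >     }
>> > >> >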
>> > >> > Also, one question just to check my understanding. I think we would
>> > only
>> > >> > use this error code when we /know/ that zstd was in use and the
>> client
>> > >> > doesn't support it? This is true if either 1) the message needs
>> > >> > down-conversion and we encounter a zstd compressed message, or 2)
>> if
>> > the
>> > >> > topic is explicitly configured to use zstd. However, if the
>> > compression
>> > >> > type is set to "producer," then the fetched data may or may not be
>> > >> > compressed with zstd. In this case, we return the data to the
>> client
>> > and
>> > >> > expect it to fail parsing. Is that correct?
>> > >> >
>> > >> > Thanks,
>> > >> > Jason
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org>
>> > >> wrote:
>> > >> >
>> > >> > > Ismael, Jason and all,
>> > >> > >
>> > >> > > I rewrote the backward compatibility strategy & its alternatives as
>> > >> > > follows, based on Ismael & Jason's comments. Since it has not been
>> > >> > > updated in the wiki yet, don't hesitate to send me a message if you
>> > >> > > have any opinion on it.
>> > >> > >
>> > >> > > ```
>> > >> > > *Backward Compatibility*
>> > >> > >
>> > >> > > We need to establish some backward-compatibility strategy for the
>> > >> > > case where an old client subscribes to a topic using ZStandard
>> > >> > > implicitly (i.e., the 'compression.type' configuration of the given
>> > >> > > topic is 'producer' and the producer compressed the records with
>> > >> > > ZStandard). We have the following options for this situation:
>> > >> > >
>> > >> > > *A. Support ZStandard for the old clients which can understand v0
>> > >> > > and v1 messages only.*
>> > >> > >
>> > >> > > This strategy necessarily requires the down-conversion of v2 messages
>> > >> > > compressed with ZStandard into v0 or v1 messages, which means a
>> > >> > > considerable performance degradation. So we rejected this strategy.
>> > >> > >
>> > >> > > *B. Bump the API version and support only v2-available clients*
>> > >> > >
>> > >> > > With this approach, we can signal to the old clients that they are
>> > >> > > outdated and should be upgraded. However, there are still several
>> > >> > > options for the error code.
>> > >> > >
>> > >> > > *B.1. INVALID_REQUEST (42)*
>> > >> > >
>> > >> > > This option gives the client very little information; the user can
>> > >> > > be confused about why a client that worked correctly in the past
>> > >> > > suddenly encounters a problem. So we rejected this strategy.
>> > >> > >
>> > >> > > *B.2. CORRUPT_MESSAGE (2)*
>> > >> > >
>> > >> > > This option gives inaccurate information; the user can be surprised
>> > >> > > and mistakenly conclude that the log files are broken in some way.
>> > >> > > So we rejected this strategy.
>> > >> > >
>> > >> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
>> > >> > >
>> > >> > > The advantage of this approach is that we don't need to define a new
>> > >> > > error code; we can simply reuse the existing one.
>> > >> > >
>> > >> > > The disadvantage of this approach is that it is also a little bit
>> > >> > > vague; this error code was introduced as part of KIP-98[^1] and is
>> > >> > > currently returned for transaction errors.
>> > >> > >
>> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
>> > >> > >
>> > >> > > The advantage of this approach is that it is clear and provides
>> an
>> > >> exact
>> > >> > > description. The disadvantage is we need to add a new error code.
>> > >> > > ```
>> > >> > >
>> > >> > > *It seems like what we need to choose is now quite clear:
>> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
>> > >> > > (B.4).*
>> > >> > > The first one doesn't need a new error code, but the latter is more
>> > >> > > explicit. Which one do you prefer? Since all of you have much more
>> > >> > > experience and knowledge than me, I will follow your decision. The
>> > >> > > wiki page will also be updated following the decision.
>> > >> > >
>> > >> > > Best,
>> > >> > > Dongjin
>> > >> > >
>> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
>> > >> > >
>> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com>
>> > >> wrote:
>> > >> > >
>> > >> > > > Sounds reasonable to me.
>> > >> > > >
>> > >> > > > Ismael
>> > >> > > >
>> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <
>> jason@confluent.io>
>> > >> > wrote:
>> > >> > > >
>> > >> > > > > Hey Ismael,
>> > >> > > > >
>> > >> > > > > Your summary looks good to me. I think it might also be a
>> good
>> > >> idea
>> > >> > to
>> > >> > > > add
>> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along
>> with
>> > the
>> > >> > > > version
>> > >> > > > > bumps. We won't be able to use it for old api versions since
>> the
>> > >> > > clients
>> > >> > > > > will not understand it, but we can use it going forward so
>> that
>> > >> we're
>> > >> > > not
>> > >> > > > > stuck in a similar situation with a new message format and a
>> new
>> > >> > codec
>> > >> > > to
>> > >> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_
>> > FORMAT,
>> > >> but
>> > >> > > it
>> > >> > > > is
>> > >> > > > > not as explicit.
>> > >> > > > >
>> > >> > > > > -Jason
>> > >> > > > >
>> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
>> ismael@juma.me.uk
>> > >
>> > >> > > wrote:
>> > >> > > > >
>> > >> > > > > > Hi Dongjin and Jason,
>> > >> > > > > >
>> > >> > > > > > I would agree. My summary:
>> > >> > > > > >
>> > >> > > > > > 1. Support zstd with message format 2 only.
>> > >> > > > > > 2. Bump produce and fetch request versions.
>> > >> > > > > > 3. Provide broker errors whenever possible based on the
>> > request
>> > >> > > version
>> > >> > > > > and
>> > >> > > > > > rely on clients for the cases where the broker can't
>> validate
>> > >> > > > efficiently
>> > >> > > > > > (example message format 2 consumer that supports the latest
>> > >> fetch
>> > >> > > > version
>> > >> > > > > > but doesn't support zstd).
>> > >> > > > > >
>> > >> > > > > > If there's general agreement on this, I suggest we update
>> the
>> > >> KIP
>> > >> > to
>> > >> > > > > state
>> > >> > > > > > the proposal and to move the rejected options to its own
>> > >> section.
>> > >> > And
>> > >> > > > > then
>> > >> > > > > > start a vote!
>> > >> > > > > >
>> > >> > > > > > Ismael
>> > >> > > > > >
>> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
>> > >> > jason@confluent.io>
>> > >> > > > > > wrote:
>> > >> > > > > >
>> > >> > > > > > > Hi Dongjin,
>> > >> > > > > > >
>> > >> > > > > > > Yes, that's a good summary. For clients which support v2,
>> > the
>> > >> > > client
>> > >> > > > > can
>> > >> > > > > > > parse the message format and hopefully raise a useful
>> error
>> > >> > message
>> > >> > > > > > > indicating the unsupported compression type. For older
>> > >> clients,
>> > >> > our
>> > >> > > > > > options
>> > >> > > > > > > are probably (1) to down-convert to the old format using
>> no
>> > >> > > > compression
>> > >> > > > > > > type, or (2) to return an error code. I'm leaning toward
>> the
>> > >> > latter
>> > >> > > > as
>> > >> > > > > > the
>> > >> > > > > > > simpler solution, but the challenge is finding a good
>> error
>> > >> code.
>> > >> > > Two
>> > >> > > > > > > possibilities might be INVALID_REQUEST or
>> CORRUPT_MESSAGE.
>> > The
>> > >> > > > downside
>> > >> > > > > > is
>> > >> > > > > > > that old clients probably won't get a helpful message.
>> > >> However,
>> > >> > at
>> > >> > > > > least
>> > >> > > > > > > the behavior will be consistent in the sense that all
>> > clients
>> > >> > will
>> > >> > > > fail
>> > >> > > > > > if
>> > >> > > > > > > they do not support zstandard.
>> > >> > > > > > >
>> > >> > > > > > > What do you think?
>> > >> > > > > > >
>> > >> > > > > > > Thanks,
>> > >> > > > > > > Jason
>> > >> > > > > > >
>> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
>> > >> dongjin@apache.org
>> > >> > >
>> > >> > > > > wrote:
>> > >> > > > > > >
>> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic
>> > following
>> > >> > your
>> > >> > > > > > > > explanation.[^1] You mean the following routines,
>> right?
>> > >> > > > > > > >
>> > >> > > > > > > > -
>> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/
>> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
>> > >> > > > > > > > -
>> > >> > > > > > > >
>> https://github.com/apache/kafka/blob/trunk/clients/src/
>> > >> > > > > > > > main/java/org/apache/kafka/common/record/
>> > >> > > LazyDownConversionRecords.
>> > >> > > > > > > > java#L165
>> > >> > > > > > > > -
>> > >> > > > > > > >
>> https://github.com/apache/kafka/blob/trunk/clients/src/
>> > >> > > > > > > >
>> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
>> > >> > > > > > > >
>> > >> > > > > > > > It seems like your stance is like following:
>> > >> > > > > > > >
>> > >> > > > > > > > 1. In principle, Kafka does not change the compression
>> > codec
>> > >> > when
>> > >> > > > > > > > down-converting, since it requires inspecting the
>> fetched
>> > >> data,
>> > >> > > > which
>> > >> > > > > > is
>> > >> > > > > > > > expensive.
>> > >> > > > > > > > 2. However, there are some cases the fetched data is
>> > >> inspected
>> > >> > > > > anyway.
>> > >> > > > > > In
>> > >> > > > > > > > this case, we can provide compression conversion from
>> > >> Zstandard
>> > >> > > to
>> > >> > > > > > > > classical ones[^2].
>> > >> > > > > > > >
>> > >> > > > > > > > And from what I understand, the cases where the client
>> > >> without
>> > >> > > > > > ZStandard
>> > >> > > > > > > > support receives ZStandard compressed records can be
>> > >> organized
>> > >> > > into
>> > >> > > > > two
>> > >> > > > > > > > cases:
>> > >> > > > > > > >
>> > >> > > > > > > > a. The 'compression.type' configuration of given topic
>> is
>> > >> > > > 'producer'
>> > >> > > > > > and
>> > >> > > > > > > > the producer compressed the records with ZStandard.
>> (that
>> > >> is,
>> > >> > > using
>> > >> > > > > > > > ZStandard implicitly.)
>> > >> > > > > > > > b.  The 'compression.type' configuration of given
>> topic is
>> > >> > > 'zstd';
>> > >> > > > > that
>> > >> > > > > > > is,
>> > >> > > > > > > > using ZStandard explicitly.
>> > >> > > > > > > >
>> > >> > > > > > > > As you stated, we don't have to handle the case b
>> > specially.
>> > >> > So,
>> > >> > > It
>> > >> > > > > > seems
>> > >> > > > > > > > like we can narrow the focus of the problem by joining
>> > case
>> > >> 1
>> > >> > and
>> > >> > > > > case
>> > >> > > > > > b
>> > >> > > > > > > > like the following:
>> > >> > > > > > > >
>> > >> > > > > > > > > Given the topic with 'producer' as its
>> > 'compression.type'
>> > >> > > > > > > configuration,
>> > >> > > > > > > > ZStandard compressed records and old client without
>> > >> ZStandard,
>> > >> > is
>> > >> > > > > there
>> > >> > > > > > > any
>> > >> > > > > > > > case we need to inspect the records and can change the
>> > >> > > compression
>> > >> > > > > > type?
>> > >> > > > > > > If
>> > >> > > > > > > > so, can we provide compression type converting?
>> > >> > > > > > > >
>> > >> > > > > > > > Do I understand correctly?
>> > >> > > > > > > >
>> > >> > > > > > > > Best,
>> > >> > > > > > > > Dongjin
>> > >> > > > > > > >
>> > >> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
>> > >> > misunderstanding
>> > >> > > > how
>> > >> > > > > > API
>> > >> > > > > > > > version works, after reviewing the downconvert logic &
>> the
>> > >> > > protocol
>> > >> > > > > > > > documentation <https://kafka.apache.org/protocol>.
>> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
>> > >> > > > > > > >
>> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
>> > >> > > > jason@confluent.io>
>> > >> > > > > > > > wrote:
>> > >> > > > > > > >
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > But in my opinion, since the client will fail with
>> the
>> > >> API
>> > >> > > > > version,
>> > >> > > > > > > so
>> > >> > > > > > > > we
>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
>> Isn't
>> > >> it?
>> > >> > > So, I
>> > >> > > > > > think
>> > >> > > > > > > > we
>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
>> > >> familiar
>> > >> > > with
>> > >> > > > > > > > > down-convert
>> > >> > > > > > > > > > logic.)
>> > >> > > > > > > > >
>> > >> > > > > > > > >
>> > >> > > > > > > > > Currently the broker down-converts automatically
>> when it
>> > >> > > receives
>> > >> > > > > an
>> > >> > > > > > > old
>> > >> > > > > > > > > version of the fetch request (a version which is
>> known
>> > to
>> > >> > > predate
>> > >> > > > > the
>> > >> > > > > > > > > message format in use). Typically when
>> down-converting
>> > the
>> > >> > > > message
>> > >> > > > > > > > format,
>> > >> > > > > > > > > we use the same compression type, but there is not
>> much
>> > >> point
>> > >> > > in
>> > >> > > > > > doing
>> > >> > > > > > > so
>> > >> > > > > > > > > when we know the client doesn't support it. So if
>> > >> zstandard
>> > >> > is
>> > >> > > in
>> > >> > > > > > use,
>> > >> > > > > > > > and
>> > >> > > > > > > > > we have to down-convert anyway, then we can choose to
>> > use
>> > >> a
>> > >> > > > > different
>> > >> > > > > > > > > compression type or no compression type.
>> > >> > > > > > > > >
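>> > >> > > > > > > > > A tiny, hedged sketch of that choice (plain Java, not the
>> > >> > > > > > > > > actual down-conversion code; the names are made up):
>> > >> > > > > > > > >
>> > >> > > > > > > > >     class DownConversionSketch {
>> > >> > > > > > > > >         enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }
>> > >> > > > > > > > >
>> > >> > > > > > > > >         // When re-encoding batches for a client that predates
>> > >> > > > > > > > >         // zstd, keep the original codec unless it is zstd; in
>> > >> > > > > > > > >         // that case fall back to an older codec or none at all.
>> > >> > > > > > > > >         static CompressionType targetCompression(CompressionType original) {
>> > >> > > > > > > > >             return original == CompressionType.ZSTD
>> > >> > > > > > > > >                     ? CompressionType.NONE : original;
>> > >> > > > > > > > >         }
>> > >> > > > > > > > >     }
>> > >> > > > > > > > >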
>> > >> > > > > > > > > From my perspective, there is no significant
>> downside to
>> > >> > > bumping
>> > >> > > > > the
>> > >> > > > > > > > > protocol version and it has several potential
>> benefits.
>> > >> > Version
>> > >> > > > > bumps
>> > >> > > > > > > are
>> > >> > > > > > > > > cheap. The main question mark in my mind is about
>> > >> > > > down-conversion.
>> > >> > > > > > > > Figuring
>> > >> > > > > > > > > out whether down-conversion is needed is hard
>> generally
>> > >> > without
>> > >> > > > > > > > inspecting
>> > >> > > > > > > > > the fetched data, which is expensive. I think we
>> agree
>> > in
>> > >> > > > principle
>> > >> > > > > > > that
>> > >> > > > > > > > we
>> > >> > > > > > > > > do not want to have to pay this cost generally and
>> > prefer
>> > >> the
>> > >> > > > > clients
>> > >> > > > > > > to
>> > >> > > > > > > > > fail when they see an unhandled compression type. The
>> > >> point I
>> > >> > > was
>> > >> > > > > > > making
>> > >> > > > > > > > is
>> > >> > > > > > > > > that there are some cases where we are either
>> inspecting
>> > >> the
>> > >> > > data
>> > >> > > > > > > anyway
>> > >> > > > > > > > > (because we have to down-convert the message
>> format), or
>> > >> we
>> > >> > > have
>> > >> > > > an
>> > >> > > > > > > easy
>> > >> > > > > > > > > way to tell whether zstandard is in use (the topic
>> has
>> > it
>> > >> > > > > configured
>> > >> > > > > > > > > explicitly). In the latter case, we don't have to
>> handle
>> > >> it
>> > >> > > > > > specially.
>> > >> > > > > > > > But
>> > >> > > > > > > > > we do have to decide how we will handle
>> down-conversion
>> > to
>> > >> > > older
>> > >> > > > > > > formats.
>> > >> > > > > > > > >
>> > >> > > > > > > > > -Jason
>> > >> > > > > > > > >
>> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
>> > >> > > dongjin@apache.org
>> > >> > > > >
>> > >> > > > > > > wrote:
>> > >> > > > > > > > >
>> > >> > > > > > > > > > Colin and Jason,
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the Pros
>> and
>> > >> Cons
>> > >> > > of
>> > >> > > > > > > bumping
>> > >> > > > > > > > > > fetch API version are:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Cons:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > - The Broker can't know whether a given message
>> batch
>> > is
>> > >> > > > > compressed
>> > >> > > > > > > > with
>> > >> > > > > > > > > > zstd or not.
>> > >> > > > > > > > > > - Need some additional logic for the topic
>> explicitly
>> > >> > > > configured
>> > >> > > > > to
>> > >> > > > > > > use
>> > >> > > > > > > > > > zstd.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Pros:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > - The broker doesn't need to conduct expensive
>> > >> > > down-conversion.
>> > >> > > > > > > > > > - Can message the users to update their client.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > So, opinions for the backward-compatibility policy
>> by
>> > >> far:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
>> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Here are my additional comments:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > @Colin
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > I greatly appreciate your response. In the case of
>> the
>> > >> > > > dictionary
>> > >> > > > > > > > > support,
>> > >> > > > > > > > > > of course, this issue should be addressed later so
>> we
>> > >> don't
>> > >> > > > need
>> > >> > > > > it
>> > >> > > > > > > in
>> > >> > > > > > > > > the
>> > >> > > > > > > > > > first version. You are right - it is not late to
>> try
>> > it
>> > >> > after
>> > >> > > > > some
>> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in mind
>> on
>> > >> that
>> > >> > > > > > potential
>> > >> > > > > > > > > > feature.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > @Jason
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > You wrote,
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway
>> because
>> > >> the
>> > >> > > > client
>> > >> > > > > > > does
>> > >> > > > > > > > > not
>> > >> > > > > > > > > > understand the message format, then we could also
>> use
>> > a
>> > >> > > > different
>> > >> > > > > > > > > > compression type.
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > But in my opinion, since the client will fail with
>> the
>> > >> API
>> > >> > > > > version,
>> > >> > > > > > > so
>> > >> > > > > > > > we
>> > >> > > > > > > > > > don't need to down-convert the messages anyway.
>> Isn't
>> > >> it?
>> > >> > > So, I
>> > >> > > > > > think
>> > >> > > > > > > > we
>> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
>> > >> familiar
>> > >> > > with
>> > >> > > > > > > > > down-convert
>> > >> > > > > > > > > > logic.)
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > Please give more opinions. Thanks!
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > - Dongjin
>> > >> > > > > > > > > >
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
>> > >> > > > > jason@confluent.io
>> > >> > > > > > >
>> > >> > > > > > > > > wrote:
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > Hey Colin,
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > The problem for the fetch API is that the broker
>> > does
>> > >> not
>> > >> > > > > > generally
>> > >> > > > > > > > > know
>> > >> > > > > > > > > > if
>> > >> > > > > > > > > > > a batch was compressed with zstd unless it parses
>> > it.
>> > >> I
>> > >> > > think
>> > >> > > > > the
>> > >> > > > > > > > goal
>> > >> > > > > > > > > > here
>> > >> > > > > > > > > > > is to avoid the expensive down-conversion that is
>> > >> needed
>> > >> > to
>> > >> > > > > > ensure
>> > >> > > > > > > > > > > compatibility because it is only necessary if
>> zstd
>> > is
>> > >> > > > actually
>> > >> > > > > in
>> > >> > > > > > > > use.
>> > >> > > > > > > > > > But
>> > >> > > > > > > > > > > as long as old clients can parse the message
>> format,
>> > >> they
>> > >> > > > > should
>> > >> > > > > > > get
>> > >> > > > > > > > a
>> > >> > > > > > > > > > > reasonable error if they see an unsupported
>> > >> compression
>> > >> > > type
>> > >> > > > in
>> > >> > > > > > the
>> > >> > > > > > > > > > > attributes. Basically the onus is on users to
>> ensure
>> > >> that
>> > >> > > > their
>> > >> > > > > > > > > consumers
>> > >> > > > > > > > > > > have been updated prior to using zstd. It seems
>> > like a
>> > >> > > > > reasonable
>> > >> > > > > > > > > > tradeoff
>> > >> > > > > > > > > > > to me. There are a couple cases that might be
>> worth
>> > >> > > thinking
>> > >> > > > > > > through:
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > 1. If a topic is explicitly configured to use
>> zstd,
>> > >> then
>> > >> > we
>> > >> > > > > don't
>> > >> > > > > > > > need
>> > >> > > > > > > > > to
>> > >> > > > > > > > > > > check the fetched data for the compression type
>> to
>> > >> know
>> > >> > if
>> > >> > > we
>> > >> > > > > > need
>> > >> > > > > > > > > > > down-conversion. If we did bump the Fetch API
>> > version,
>> > >> > then
>> > >> > > > we
>> > >> > > > > > > could
>> > >> > > > > > > > > > handle
>> > >> > > > > > > > > > > this case by either down-converting using a
>> > different
>> > >> > > > > compression
>> > >> > > > > > > > type
>> > >> > > > > > > > > or
>> > >> > > > > > > > > > > returning an error.
>> > >> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway
>> > >> because
>> > >> > the
>> > >> > > > > > client
>> > >> > > > > > > > does
>> > >> > > > > > > > > > not
>> > >> > > > > > > > > > > understand the message format, then we could also
>> > use
>> > >> a
>> > >> > > > > different
>> > >> > > > > > > > > > > compression type.
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > For the produce API, I think it's reasonable to
>> bump
>> > >> the
>> > >> > > api
>> > >> > > > > > > version.
>> > >> > > > > > > > > > This
>> > >> > > > > > > > > > > can be used by clients to check whether a broker
>> > >> supports
>> > >> > > > zstd.
>> > >> > > > > > For
>> > >> > > > > > > > > > > example, we might support a list of preferred
>> > >> compression
>> > >> > > > types
>> > >> > > > > > in
>> > >> > > > > > > > the
>> > >> > > > > > > > > > > producer and we could use the broker to detect
>> which
>> > >> > > version
>> > >> > > > to
>> > >> > > > > > > use.
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > -Jason
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
>> > >> > > > > cmccabe@apache.org
>> > >> > > > > > >
>> > >> > > > > > > > > wrote:
>> > >> > > > > > > > > > >
>> > >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a
>> good
>> > >> > > > compression
>> > >> > > > > > > codec
>> > >> > > > > > > > > > and I
>> > >> > > > > > > > > > > > hope we can get this support in soon!
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > I would say we can just bump the API version to
>> > >> > indicate
>> > >> > > > that
>> > >> > > > > > > ZStd
>> > >> > > > > > > > > > > support
>> > >> > > > > > > > > > > > is expected in new clients.  We probably need
>> some
>> > >> way
>> > >> > of
>> > >> > > > > > > > indicating
>> > >> > > > > > > > > to
>> > >> > > > > > > > > > > the
>> > >> > > > > > > > > > > > older clients that they can't consume the
>> > >> partitions,
>> > >> > as
>> > >> > > > > well.
>> > >> > > > > > > > > Perhaps
>> > >> > > > > > > > > > > we
>> > >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT
>> error?
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > The license thing seems straightforward -- it's
>> > >> just a
>> > >> > > > matter
>> > >> > > > > > of
>> > >> > > > > > > > > adding
>> > >> > > > > > > > > > > > the text to the right files as per ASF
>> guidelines.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > With regard to the dictionary support, do we
>> > really
>> > >> > need
>> > >> > > > that
>> > >> > > > > > in
>> > >> > > > > > > > the
>> > >> > > > > > > > > > > first
>> > >> > > > > > > > > > > > version?  Hopefully message batches are big
>> enough
>> > >> that
>> > >> > > > this
>> > >> > > > > > > isn't
>> > >> > > > > > > > > > > needed.
>> > >> > > > > > > > > > > > Some benchmarks might help here.
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > best,
>> > >> > > > > > > > > > > > Colin
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > >
>> > >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee
>> wrote:
>> > >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot
>> this
>> > >> issue,
>> > >> > > > > KIP-110
>> > >> > > > > > > > > > > > > <https://cwiki.apache.org/
>> > >> > > confluence/display/KAFKA/KIP-
>> > >> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
>> > >> > > > > > > > > > > > > .
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > For newcomers, Here is some summary of the
>> > >> history:
>> > >> > > > KIP-110
>> > >> > > > > > was
>> > >> > > > > > > > > > > > originally
>> > >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it
>> lacked
>> > >> > > benchmark
>> > >> > > > > > > results
>> > >> > > > > > > > to
>> > >> > > > > > > > > > get
>> > >> > > > > > > > > > > > the
>> > >> > > > > > > > > > > > > agreement of the community. Later, Ivan
>> Babrou
>> > and
>> > >> > some
>> > >> > > > > other
>> > >> > > > > > > > users
>> > >> > > > > > > > > > who
>> > >> > > > > > > > > > > > > adopted the patch provided their excellent
>> > >> > performance
>> > >> > > > > report
>> > >> > > > > > > > which
>> > >> > > > > > > > > > is
>> > >> > > > > > > > > > > > now
>> > >> > > > > > > > > > > > > included in the KIP, but it postponed again
>> > >> because
>> > >> > of
>> > >> > > > the
>> > >> > > > > > > > > community
>> > >> > > > > > > > > > > was
>> > >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now
>> reboot
>> > >> this
>> > >> > > > issue.
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > The following is the current status of the
>> > >> feature:
>> > >> > You
>> > >> > > > can
>> > >> > > > > > > check
>> > >> > > > > > > > > the
>> > >> > > > > > > > > > > > > current draft implementation here
>> > >> > > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>.
>> It
>> > >> is
>> > >> > > based
>> > >> > > > > on
>> > >> > > > > > > zstd
>> > >> > > > > > > > > > 1.3.5
>> > >> > > > > > > > > > > > and
>> > >> > > > > > > > > > > > > periodically rebased onto the latest
>> trunk[^1].
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > The issues that should be addressed is like
>> > >> > following:
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > *1. Backward Compatibility*
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > To support old consumers, we need to take a
>> > >> strategy
>> > >> > to
>> > >> > > > > > handle
>> > >> > > > > > > > the
>> > >> > > > > > > > > > old
>> > >> > > > > > > > > > > > > consumers. Current candidates are:
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > - Bump API version
>> > >> > > > > > > > > > > > > - Leave unchanged: let the old clients fail.
>> > >> > > > > > > > > > > > > - Improve the error messages:
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > *2. Dictionary Support*
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > To support zstd's dictionary feature in the
>> > future
>> > >> > (if
>> > >> > > > > > needed),
>> > >> > > > > > > > we
>> > >> > > > > > > > > > need
>> > >> > > > > > > > > > > > to
>> > >> > > > > > > > > > > > > sketch how it should be and leave some room
>> for
>> > >> it.
>> > >> > As
>> > >> > > of
>> > >> > > > > > now,
>> > >> > > > > > > > > there
>> > >> > > > > > > > > > > has
>> > >> > > > > > > > > > > > > been no discussion on this topic yet.
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > *3. License*
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > To use this feature, we need to add license
>> of
>> > >> zstd
>> > >> > and
>> > >> > > > > > > zstd-jni
>> > >> > > > > > > > to
>> > >> > > > > > > > > > the
>> > >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for
>> raising
>> > >> this
>> > >> > > > issue!)
>> > >> > > > > > It
>> > >> > > > > > > > > seems
>> > >> > > > > > > > > > > like
>> > >> > > > > > > > > > > > > what Apache Spark did would be a good example
>> > but
>> > >> > there
>> > >> > > > has
>> > >> > > > > > > been
>> > >> > > > > > > > no
>> > >> > > > > > > > > > > > > discussion yet.
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > You can find the details of the above issues
>> in
>> > >> the
>> > >> > KIP
>> > >> > > > > > > document.
>> > >> > > > > > > > > > > Please
>> > >> > > > > > > > > > > > > have a look when you are free, and give me
>> > >> feedback.
>> > >> > > All
>> > >> > > > > > kinds
>> > >> > > > > > > of
>> > >> > > > > > > > > > > > > participating are welcome.
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > Best,
>> > >> > > > > > > > > > > > > Dongjin
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > [^1]: At the time of writing, commit
>> 6b4fb8152.
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee
>> <
>> > >> > > > > > > dongjin@apache.org
>> > >> > > > > > > > >
>> > >> > > > > > > > > > > wrote:
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > > Sorry for the late reply.
>> > >> > > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > > In short, I could not submit the updated
>> KIP
>> > by
>> > >> the
>> > >> > > > > feature
>> > >> > > > > > > > > freeze
>> > >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it will
>> > not
>> > >> be
>> > >> > > > > included
>> > >> > > > > > > in
>> > >> > > > > > > > > the
>> > >> > > > > > > > > > > > 2.0.0
>> > >> > > > > > > > > > > > > > release and all discussion for this issue
>> were
>> > >> > > > postponed
>> > >> > > > > > > after
>> > >> > > > > > > > > the
>> > >> > > > > > > > > > > > release
>> > >> > > > > > > > > > > > > > of 2.0.0.
>> > >> > > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > > I have been updating the PR following
>> recent
>> > >> > updates.
>> > >> > > > > Just
>> > >> > > > > > > > now, I
>> > >> > > > > > > > > > > > rebased
>> > >> > > > > > > > > > > > > > it against the latest trunk and updated the
>> > zstd
>> > >> > > > version
>> > >> > > > > > into
>> > >> > > > > > > > > > 1.3.5.
>> > >> > > > > > > > > > > > If you
>> > >> > > > > > > > > > > > > > need some request, don't hesitate to notify
>> > me.
>> > >> > (But
>> > >> > > > not
>> > >> > > > > > this
>> > >> > > > > > > > > > thread
>> > >> > > > > > > > > > > -
>> > >> > > > > > > > > > > > just
>> > >> > > > > > > > > > > > > > send me the message directly.)
>> > >> > > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > > Best,
>> > >> > > > > > > > > > > > > > Dongjin
>> > >> > > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby
>> Evans <
>> > >> > > > > > > bobby@apache.org
>> > >> > > > > > > > >
>> > >> > > > > > > > > > > wrote:
>> > >> > > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > >> I there any update on this.  The
>> performance
>> > >> > > > > improvements
>> > >> > > > > > > are
>> > >> > > > > > > > > > quite
>> > >> > > > > > > > > > > > > >> impressive and I really would like to stop
>> > >> forking
>> > >> > > > kafka
>> > >> > > > > > > just
>> > >> > > > > > > > to
>> > >> > > > > > > > > > get
>> > >> > > > > > > > > > > > this
>> > >> > > > > > > > > > > > > >> in.
>> > >> > > > > > > > > > > > > >>
>> > >> > > > > > > > > > > > > >> Thanks,
>> > >> > > > > > > > > > > > > >>
>> > >> > > > > > > > > > > > > >> Bobby
>> > >> > > > > > > > > > > > > >>
>> > >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin
>> Lee <
>> > >> > > > > > > > dongjin@apache.org
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > > wrote:
>> > >> > > > > > > > > > > > > >>
>> > >> > > > > > > > > > > > > >> > Ismael,
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working
>> > frenzy
>> > >> > for
>> > >> > > > 2.0!
>> > >> > > > > > No
>> > >> > > > > > > > > > problem,
>> > >> > > > > > > > > > > > take
>> > >> > > > > > > > > > > > > >> > your time. I am also working at another
>> > issue
>> > >> > now.
>> > >> > > > > Thank
>> > >> > > > > > > you
>> > >> > > > > > > > > for
>> > >> > > > > > > > > > > > > >> letting me
>> > >> > > > > > > > > > > > > >> > know.
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> > Best,
>> > >> > > > > > > > > > > > > >> > Dongjin
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael
>> Juma
>> > <
>> > >> > > > > > > > ismaelj@gmail.com
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > > wrote:
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone
>> is
>> > >> busy
>> > >> > > > > > finalising
>> > >> > > > > > > > > > 2.0.0.
>> > >> > > > > > > > > > > > This
>> > >> > > > > > > > > > > > > >> KIP
>> > >> > > > > > > > > > > > > >> > > seems like a great candidate for 2.1.0
>> > and
>> > >> > > > hopefully
>> > >> > > > > > > there
>> > >> > > > > > > > > > will
>> > >> > > > > > > > > > > be
>> > >> > > > > > > > > > > > > >> more
>> > >> > > > > > > > > > > > > >> > of
>> > >> > > > > > > > > > > > > >> > > a discussion next week. :)
>> > >> > > > > > > > > > > > > >> > >
>> > >> > > > > > > > > > > > > >> > > Ismael
>> > >> > > > > > > > > > > > > >> > >
>> > >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin
>> Lee, <
>> > >> > > > > > > > dongjin@apache.org
>> > >> > > > > > > > > >
>> > >> > > > > > > > > > > > wrote:
>> > >> > > > > > > > > > > > > >> > >
>> > >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
>> > >> > implementation:
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit
>> > >> 5145d6b)
>> > >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > You can check out the implementation
>> > from
>> > >> > here
>> > >> > > > > > > > > > > > > >> > > > <
>> > >> https://github.com/apache/kafka/pull/2267
>> > >> > >.
>> > >> > > If
>> > >> > > > > you
>> > >> > > > > > > > > > > experience
>> > >> > > > > > > > > > > > any
>> > >> > > > > > > > > > > > > >> > > problem
>> > >> > > > > > > > > > > > > >> > > > running it, don't hesitate to give
>> me a
>> > >> > > mention.
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > Best,
>> > >> > > > > > > > > > > > > >> > > > Dongjin
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM
>> Dongjin
>> > >> Lee
>> > >> > <
>> > >> > > > > > > > > > > dongjin@apache.org
>> > >> > > > > > > > > > > > >
>> > >> > > > > > > > > > > > > >> > wrote:
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > > Here is the short conclusion about
>> > the
>> > >> > > license
>> > >> > > > > > > > problem:
>> > >> > > > > > > > > > *We
>> > >> > > > > > > > > > > > can
>> > >> > > > > > > > > > > > > >> use
>> > >> > > > > > > > > > > > > >> > > zstd
>> > >> > > > > > > > > > > > > >> > > > > and zstd-jni without any problem,
>> but
>> > >> we
>> > >> > > need
>> > >> > > > to
>> > >> > > > > > > > include
>> > >> > > > > > > > > > > their
>> > >> > > > > > > > > > > > > >> > license,
>> > >> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3
>> > Clause
>> > >> > > > License
>> > >> > > > > > > > requires
>> > >> > > > > > > > > > to
>> > >> > > > > > > > > > > > > >> include
>> > >> > > > > > > > > > > > > >> > the
>> > >> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause
>> > License
>> > >> > > > requires
>> > >> > > > > > that
>> > >> > > > > > > > the
>> > >> > > > > > > > > > > name
>> > >> > > > > > > > > > > > of
>> > >> > > > > > > > > > > > > >> the
>> > >> > > > > > > > > > > > > >> > > > > contributor can't be used to
>> endorse
>> > or
>> > >> > > > promote
>> > >> > > > > > the
>> > >> > > > > > > > > > product.
>> > >> > > > > > > > > > > > > >> That's
>> > >> > > > > > > > > > > > > >> > it
>> > >> > > > > > > > > > > > > >> > > > > <
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > >
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >>
>> > >> > http://www.mikestratton.net/2011/12/is-bsd-license-
>> > >> > > > > > > > > > > > compatible-with-apache-2-0-license/
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > - They are not listed in the list
>> of
>> > >> > > > prohibited
>> > >> > > > > > > > licenses
>> > >> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
>> > >> > > > > > > > resolved.html#category-x>
>> > >> > > > > > > > > > > also.
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
>> > >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
>> > >> > > > > > jira/browse/SPARK-19112
>> > >> > > > > > > >:
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > - They made a directory dedicated
>> to
>> > >> the
>> > >> > > > > > dependency
>> > >> > > > > > > > > > license
>> > >> > > > > > > > > > > > files
>> > >> > > > > > > > > > > > > >> > > > > <
>> > >> > > > > > > https://github.com/apache/spark/tree/master/licenses
>> > >> > > > > > > > >
>> > >> > > > > > > > > > and
>> > >> > > > > > > > > > > > added
>> > >> > > > > > > > > > > > > >> > > > licenses
>> > >> > > > > > > > > > > > > >> > > > > for Zstd
>> > >> > > > > > > > > > > > > >> > > > > <
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > https://github.com/apache/spark/blob/master/licenses/
>> > >> > > > > > > > > > > > LICENSE-zstd.txt
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > > > &
>> > >> > > > > > > > > > > > > >> > > > > Zstd-jni
>> > >> > > > > > > > > > > > > >> > > > > <
>> > >> > > > > > > > > > > > > >> > > >
>> > >> > > > > > > > > > > > > >> > >
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> https://github.com/apache/
>> > >> > > spark/blob/master/licenses/
>> > >> > > > > > > > > > > > LICENSE-zstd-jni.txt
>> > >> > > > > > > > > > > > > >> >
>> > >> > > > > > > > > > > > > >> > > > > .
>> > >> > > > > > > > > > > > > >> > > > > - Added a link to the original
>> > license
>> > >> > files
>> > >> > > > in
>> > >> > > > > > > > LICENSE.
>> > >> > > > > > > > > > > > > >> > > > > <
>> > >> > > > > https://github.com/apache/spark/pull/18805/files
>> > >> > > > > > >
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > If needed, I can make a similar
>> > update.
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this
>> problem,
>> > >> > > Viktor!
>> > >> > > > > Nice
>> > >> > > > > > > > > catch!
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > Best,
>> > >> > > > > > > > > > > > > >> > > > > Dongjin
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM
>> > >> Dongjin
>> > >> > > Lee <
>> > >> > > > > > > > > > > > dongjin@apache.org>
>> > >> > > > > > > > > > > > > >> > > wrote:
>> > >> > > > > > > > > > > > > >> > > > >
>> > >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your
>> > >> comprehensive
>> > >> > > > > > reasoning.
>> > >> > > > > > > > so:
>> > >> > > > > > > > > +1
>> > >> > > > > > > > > > > > for b
>> > >> > > > > > > > > > > > > >> > until
>> > >> > > > > > > > > > > > > >> > > > now.
>> > >> > > > > > > > > > > > > >> > > > >>
>> > >> > > > > > > > > > > > > >> > > > >> For the license issues, I will
>> have
>> > a
>> > >> > check
>> > >> > > > on
>> > >> > > > > > how
>> > >> > > > > > > > the
>> > >> > > > > > > > > > over
>> > >> > > > > > > > > > > > > >> projects
>> > >> > > > > > > > > > > > > >> > > are
>> > >> > > > > > > > > > > > > >> > > > >> doing and share the results.
>> > >> > > > > > > > > > > > > >> > > > >>
>> > >> > > > > > > > > > > > > >> > > > >> Best,
>> > >> > > > > > > > > > > > > >> > > > >> Dongjin
>> > >> > > > > > > > > > > > > >> > > > >>
>> > >> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM
>> > >> Viktor
>> > >> > > > > Somogyi <
>> > >> > > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
>> > >> > > > > > > > > > > > > >> > > > >> wrote:
>> > >> > > > > > > > > > > > > >> > > > >>
>> > >> > > > > > > > > > > > > >> > > > >>> Hi Dongjin,
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>> A couple of comments:
>> > >> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in
>> the
>> > >> > > "backward
>> > >> > > > > > > > > > compatibility"
>> > >> > > > > > > > > > > > > >> section.
>> > >> > > > > > > > > > > > > >> > > My
>> > >> > > > > > > > > > > > > >> > > > >>> reasoning for this is that users
>> > >> > upgrading
>> > >> > > > to
>> > >> > > > > a
>> > >> > > > > > > zstd
>> > >> > > > > > > > > > > > compatible
>> > >> > > > > > > > > > > > > >> > > version
>> > >> > > > > > > > > > > > > >> > > > >>> won't start to use it
>> > automatically,
>> > >> so
>> > >> > > > manual
>> > >> > > > > > > > > > > > reconfiguration
>> > >> > > > > > > > > > > > > >> is
>> > >> > > > > > > > > > > > > >> > > > >>> required.
>> > >> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess
>> up
>> > >> the
>> > >> > > > > cluster.
>> > >> > > > > > If
>> > >> > > > > > > > not
>> > >> > > > > > > > > > all
>> > >> > > > > > > > > > > > the
>> > >> > > > > > > > > > > > > >> > > clients
>> > >> > > > > > > > > > > > > >> > > > >>> are
>> > >> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them
>> and
>> > >> > they'd
>> > >> > > > > start
>> > >> > > > > > to
>> > >> > > > > > > > use
>> > >> > > > > > > > > > > zstd
>> > >> > > > > > > > > > > > > >> then it
>> > >> > > > > > > > > > > > > >> > > > would
>> > >> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd
>> > >> like to
>> > >> > > > > presume
>> > >> > > > > > > > > though
>> > >> > > > > > > > > > > that
>> > >> > > > > > > > > > > > > >> this
>> > >> > > > > > > > > > > > > >> > is
>> > >> > > > > > > > > > > > > >> > > a
>> > >> > > > > > > > > > > > > >> > > > >>> very
>> > >> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody
>> > >> should
>> > >> > be
>> > >> > > > > > > surprised
>> > >> > > > > > > > if
>> > >> > > > > > > > > > it
>> > >> > > > > > > > > > > > didn't
>> > >> > > > > > > > > > > > > >> > > work.
>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think
>> we
>> > >> > should
>> > >> > > > bump
>> > >> > > > > > the
>> > >> > > > > > > > > fetch
>> > >> > > > > > > > > > > and
>> > >> > > > > > > > > > > > > >> > produce
>> > >> > > > > > > > > > > > > >> > > > >>> requests if it's a change in the
>> > >> message
>> > >> > > > > format.
>> > >> > > > > > > > > > Moreover
>> > >> > > > > > > > > > > if
>> > >> > > > > > > > > > > > > >> some
>> > >> > > > > > > > > > > > > >> > of
>> > >> > > > > > > > > > > > > >> > > > the
>> > >> > > > > > > > > > > > > >> > > > >>> producers and the brokers are
>> > >> upgraded
>> > >> > but
>> > >> > > > > some
>> > >> > > > > > of
>> > >> > > > > > > > the
>> > >> > > > > > > > > > > > consumers
>> > >> > > > > > > > > > > > > >> > are
>> > >> > > > > > > > > > > > > >> > > > not,
>> > >> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the
>> error
>> > >> when
>> > >> > > the
>> > >> > > > > old
>> > >> > > > > > > > > consumer
>> > >> > > > > > > > > > > > tries
>> > >> > > > > > > > > > > > > >> to
>> > >> > > > > > > > > > > > > >> > > > consume
>> > >> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
>> > >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I
>> > >> think
>> > >> > > > binding
>> > >> > > > > > the
>> > >> > > > > > > > > > > > compression
>> > >> > > > > > > > > > > > > >> type
>> > >> > > > > > > > > > > > > >> > > to
>> > >> > > > > > > > > > > > > >> > > > an
>> > >> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the
>> > >> > developer's
>> > >> > > > > > > > > perspective.
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the
>> > >> existing
>> > >> > > > > binding,
>> > >> > > > > > > > > however
>> > >> > > > > > > > > > > we
>> > >> > > > > > > > > > > > must
>> > >> > > > > > > > > > > > > >> > > > respect
>> > >> > > > > > > > > > > > > >> > > > >>> the licenses:
>> > >> > > > > > > > > > > > > >> > > > >>> "The code for these JNI
>> bindings is
>> > >> > > licenced
>> > >> > > > > > under
>> > >> > > > > > > > > > > 2-clause
>> > >> > > > > > > > > > > > BSD
>> > >> > > > > > > > > > > > > >> > > > license.
>> > >> > > > > > > > > > > > > >> > > > >>> The native Zstd library is
>> licensed
>> > >> > under
>> > >> > > > > > 3-clause
>> > >> > > > > > > > BSD
>> > >> > > > > > > > > > > > license
>> > >> > > > > > > > > > > > > >> and
>> > >> > > > > > > > > > > > > >> > > > GPL2"
>> > >> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page
>> > >> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
>> > >> > > > > > > > resolved.html#category-a
>> > >> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD
>> > >> licenses
>> > >> > > but
>> > >> > > > > the
>> > >> > > > > > > > Apache
>> > >> > > > > > > > > > > > license
>> > >> > > > > > > > > > > > > >> is
>> > >> > > > > > > > > > > > > >> > not
>> > >> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping
>> > that
>> > >> > the
>> > >> > > > > > > "3-clause
>> > >> > > > > > > > > BSD
>> > >> > > > > > > > > > > > license
>> > >> > > > > > > > > > > > > >> and
>> > >> > > > > > > > > > > > > >> > > > GPL2"
>> > >> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR
>> in
>> > >> this
>> > >> > > case,
>> > >> > > > > but
>> > >> > > > > > > I'm
>> > >> > > > > > > > > no
>> > >> > > > > > > > > > > > lawyer,
>> > >> > > > > > > > > > > > > >> > just
>> > >> > > > > > > > > > > > > >> > > > >>> wanted
>> > >> > > > > > > > > > > > > >> > > > >>> to make the point that we should
>> > >> watch
>> > >> > out
>> > >> > > > for
>> > >> > > > > > > > > licenses.
>> > >> > > > > > > > > > > :)
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>> Regards,
>> > >> > > > > > > > > > > > > >> > > > >>> Viktor
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM
>> > Ivan
>> > >> > > Babrou
>> > >> > > > <
>> > >> > > > > > > > > > > > ibobrik@gmail.com>
>> > >> > > > > > > > > > > > > >> > > wrote:
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > > >>> > Hello,
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very
>> > much
>> > >> > > support
>> > >> > > > > the
>> > >> > > > > > > > fact
>> > >> > > > > > > > > > that
>> > >> > > > > > > > > > > > zstd
>> > >> > > > > > > > > > > > > >> > > > >>> compression
>> > >> > > > > > > > > > > > > >> > > > >>> > should be included out of the
>> > box.
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>> > Please think about the
>> > environment,
>> > >> > you
>> > >> > > > can
>> > >> > > > > > save
>> > >> > > > > > > > > > quite a
>> > >> > > > > > > > > > > > lot
>> > >> > > > > > > > > > > > > >> of
>> > >> > > > > > > > > > > > > >> > > > >>> hardware
>> > >> > > > > > > > > > > > > >> > > > >>> > with it.
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>> > Thank you.
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14
>> > >> Dongjin
>> > >> > > Lee <
>> > >> > > > > > > > > > > > dongjin@apache.org>
>> > >> > > > > > > > > > > > > >> > > wrote:
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>> > > Since there are no responses
>> > for
>> > >> a
>> > >> > > > week, I
>> > >> > > > > > > > decided
>> > >> > > > > > > > > > to
>> > >> > > > > > > > > > > > > >> > reinitiate
>> > >> > > > > > > > > > > > > >> > > > the
>> > >> > > > > > > > > > > > > >> > > > >>> > > discussion thread.
>> > >> > > > > > > > > > > > > >> > > > >>> > >
>> > >> > > > > > > > > > > > > >> > > > >>> > >
>> > >> > > > > > > > > > > > > >> > > > >>> > >
>> > >> > > > > > > > > > > > > >> > > > >>> >
>> > >> > > > > > > > > > > > > >> > > > >>>
>> > >> > > > > > > > > > > > > >> > > >
>> > >> >
>>
>> --
>> *Dongjin Lee*
>>
>> *A hitchhiker in the mathematical world.*
>>
>> *github:  <http://goog_969573159/>github.com/dongjinleekr
>> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
>> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
>> <http://www.slideshare.net/dongjinleekr>*
>>
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Jason,

Great. +1 for UNSUPPORTED_COMPRESSION_TYPE.

Best,
Dongjin

On Thu, Aug 23, 2018 at 8:19 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Dongjin,
>
> Yeah that's right. For what it's worth, librdkafka also appears to handle
> unexpected error codes. I expect that most client implementations would
> either pass through the raw type or convert to an enum using something like
> what the java client does. Since we're expecting the client to fail anyway,
> I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE error code.
>
> -Jason
>
> On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org> wrote:
>
> > Jason and Ismael,
> >
> > It seems like the only thing we need to consider if we define a new error
> > code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementations of
> > the other language clients, right? At least, this strategy causes no
> > problem for the Java client. Do I understand correctly?
> >
> > Thanks,
> > Dongjin
> >
> > On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Jason,
> > >
> > > > I think we would only use this error code when we /know/ that zstd
> was
> > > in use and the client doesn't support it? This is true if either 1) the
> > > message needs down-conversion and we encounter a zstd compressed
> message,
> > > or 2) if the topic is explicitly configured to use zstd.
> > >
> > > Yes, that's right. And note that case 1 includes both 1.a) old clients
> > > requesting v0 or v1 records, and 1.b) implicit zstd, i.e., a compression
> > > type of "producer" with zstd-compressed data.
> > >
> > > > However, if the compression type is set to "producer," then the
> fetched
> > > data may or may not be compressed with zstd. In this case, we return
> the
> > > data to the client and expect it to fail parsing. Is that correct?
> > >
> > > Exactly.
> > >
> > > Following your message, I reviewed the implementation of
> > > `KafkaApis#handleFetchRequest`, which handles the fetch request, and
> > > found that the information we can use is the following:
> > >
> > > 1. The client's fetch request version. (`versionId` variable)
> > > 2. The log's compression type. (`logConfig` variable)
> > >
> > > We can't detect the actual compression type of the data unless we
> > > inspect the `RecordBatch` instances included in the `Records` object
> > > (i.e., the `unconvertedRecords` variable). Since that inspection would
> > > come with a performance cost, it is not an option for us - in short, we
> > > can't be sure whether a given chunk of data is compressed with zstd or
> > > not.
> > >
> > > So, in conclusion: with the information above, we can easily return an
> > > error in cases 1.a and 2. In case 1.b (implicit zstd), we can just
> > > return the data without doing anything special and expect the client to
> > > fail parsing it.
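> > >
> > > To make that concrete, here is a rough sketch of the kind of check this
> > > allows (illustrative Java-style pseudo-code only; ZSTD_MIN_FETCH_VERSION,
> > > errorResponse and the field accesses are placeholders, and the real
> > > handleFetchRequest code is Scala):
> > >
> > >     // Reject up front whenever versionId plus the topic configuration is
> > >     // enough to tell that an old client would receive zstd data.
> > >     boolean oldClient = versionId < ZSTD_MIN_FETCH_VERSION;
> > >     boolean topicExplicitlyZstd = logConfig.compressionType.equals("zstd");
> > >     if (oldClient && topicExplicitlyZstd) {
> > >         return errorResponse(Errors.UNSUPPORTED_COMPRESSION_TYPE);
> > >     }
> > >     // Case 1.b ('producer' + zstd-compressed data) falls through: the data
> > >     // is returned as-is and the old client fails while parsing the batches.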
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com>
> wrote:
> > >
> > >> Jason, that's an interesting point regarding the Java client. Do we
> know
> > >> what clients in other languages do in these cases?
> > >>
> > >> Ismael
> > >>
> > >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io>
> wrote:
> > >>
> > >> > Hi Dongjin,
> > >> >
> > >> > One of the complications is that old versions of the API will not
> > >> expect a
> > >> > new error code. However, since we expect this to be a fatal error
> > anyway
> > >> > for old clients, it may still be more useful to return the correct
> > error
> > >> > code. For example, the Kafka clients use the following code to
> convert
> > >> the
> > >> > error code:
> > >> >
> > >> >     public static Errors forCode(short code) {
> > >> >         Errors error = codeToError.get(code);
> > >> >         if (error != null) {
> > >> >             return error;
> > >> >         } else {
> > >> >             log.warn("Unexpected error code: {}.", code);
> > >> >             return UNKNOWN_SERVER_ERROR;
> > >> >         }
> > >> >     }
> > >> >
> > >> > If we return an unsupported error code, it will be converted to an
> > >> UNKNOWN
> > >> > error, but at least we will get the message in the log with the
> > correct
> > >> > code. That seems preferable to returning a misleading error code.
> So I
> > >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even
> > for
> > >> > older versions.
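> > >> >
> > >> > For illustration, with a made-up code number (the real value would be
> > >> > whatever the new constant is assigned), an old client would see:
> > >> >
> > >> >     Errors error = Errors.forCode((short) 76);
> > >> >     // logs "Unexpected error code: 76." and falls back to
> > >> >     // UNKNOWN_SERVER_ERROR, so the raw code is still visible in the log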
> > >> >
> > >> > Also, one question just to check my understanding. I think we would
> > only
> > >> > use this error code when we /know/ that zstd was in use and the
> client
> > >> > doesn't support it? This is true if either 1) the message needs
> > >> > down-conversion and we encounter a zstd compressed message, or 2) if
> > the
> > >> > topic is explicitly configured to use zstd. However, if the
> > compression
> > >> > type is set to "producer," then the fetched data may or may not be
> > >> > compressed with zstd. In this case, we return the data to the client
> > and
> > >> > expect it to fail parsing. Is that correct?
> > >> >
> > >> > Thanks,
> > >> > Jason
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org>
> > >> wrote:
> > >> >
> > >> > > Ismael, Jason and all,
> > >> > >
> > >> > > I rewrote the backward-compatibility strategy & its alternatives as
> > >> > > follows, based on Ismael's & Jason's comments. Since it is not yet
> > >> > > updated in the wiki, don't hesitate to send me a message if you have
> > >> > > any opinion on it.
> > >> > >
> > >> > > ```
> > >> > > *Backward Compatibility*
> > >> > >
> > >> > > We need to establish a backward-compatibility strategy for the case
> > >> > > where an old client subscribes to a topic that uses ZStandard
> > >> > > implicitly (i.e., the 'compression.type' configuration of the given
> > >> > > topic is 'producer' and the producer compressed the records with
> > >> > > ZStandard). We have the following options for this situation:
> > >> > >
> > >> > > *A. Support ZStandard for the old clients which can understand only
> > >> > > v0 and v1 messages.*
> > >> > >
> > >> > > This strategy necessarily requires down-converting v2 messages
> > >> > > compressed with ZStandard into v0 or v1 messages, which means a
> > >> > > considerable performance degradation. So we rejected this strategy.
> > >> > >
> > >> > > *B. Bump the API version and support only v2-capable clients*
> > >> > >
> > >> > > With this approach, we can signal to the old clients that they are
> > >> > > outdated and should be upgraded. However, there are still several
> > >> > > options for the error code.
> > >> > >
> > >> > > *B.1. INVALID_REQUEST (42)*
> > >> > >
> > >> > > This option gives the client too little information; the user can be
> > >> > > confused about why a client that worked correctly in the past
> > >> > > suddenly encounters a problem. So we rejected this option.
> > >> > >
> > >> > > *B.2. CORRUPT_MESSAGE (2)*
> > >> > >
> > >> > > This option gives inaccurate information; the user can be surprised
> > >> > > and mistakenly conclude that the log files are broken in some way. So
> > >> > > we rejected this option.
> > >> > >
> > >> > > *B.3. UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > >> > >
> > >> > > The advantage of this approach is that we don't need to define a new
> > >> > > error code; we can simply reuse the existing one.
> > >> > >
> > >> > > The disadvantage is that it is also a little bit vague; this error
> > >> > > code was defined as part of the KIP-98 work[^1] and is currently
> > >> > > returned for transactional errors.
> > >> > >
> > >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > >> > >
> > >> > > The advantage of this approach is that it is clear and provides an
> > >> > > exact description. The disadvantage is that we need to add a new
> > >> > > error code.
> > >> > > ```
> > >> > >
> > >> > > *It seems like what we need to choose between is now clear:
> > >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> > >> > > (B.4).*
> > >> > > The first one doesn't require a new error code, but the latter is more
> > >> > > explicit. Which one do you prefer? Since all of you have much more
> > >> > > experience and knowledge than me, I will follow your decision. The wiki
> > >> > > page will also be updated to reflect the decision.
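> > >> > >
> > >> > > For what it's worth, the B.4 change itself would be small - roughly the
> > >> > > following, mirroring the pattern of the existing entries in Errors.java
> > >> > > (the code number, description text, and exception class below are only
> > >> > > placeholders, not a final proposal):
> > >> > >
> > >> > > ```
> > >> > > // hypothetical new entry in org.apache.kafka.common.protocol.Errors
> > >> > > UNSUPPORTED_COMPRESSION_TYPE(76,  // placeholder code number
> > >> > >     "The requesting client does not support the compression type in use.",
> > >> > >     UnsupportedCompressionTypeException::new),
> > >> > > ```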
> > >> > >
> > >> > > Best,
> > >> > > Dongjin
> > >> > >
> > >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> > >> > >
> > >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > Sounds reasonable to me.
> > >> > > >
> > >> > > > Ismael
> > >> > > >
> > >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <jason@confluent.io
> >
> > >> > wrote:
> > >> > > >
> > >> > > > > Hey Ismael,
> > >> > > > >
> > >> > > > > Your summary looks good to me. I think it might also be a good
> > >> idea
> > >> > to
> > >> > > > add
> > >> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with
> > the
> > >> > > > version
> > >> > > > > bumps. We won't be able to use it for old api versions since
> the
> > >> > > clients
> > >> > > > > will not understand it, but we can use it going forward so
> that
> > >> we're
> > >> > > not
> > >> > > > > stuck in a similar situation with a new message format and a
> new
> > >> > codec
> > >> > > to
> > >> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_
> > FORMAT,
> > >> but
> > >> > > it
> > >> > > > is
> > >> > > > > not as explicit.
> > >> > > > >
> > >> > > > > -Jason
> > >> > > > >
> > >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <
> ismael@juma.me.uk
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > Hi Dongjin and Jason,
> > >> > > > > >
> > >> > > > > > I would agree. My summary:
> > >> > > > > >
> > >> > > > > > 1. Support zstd with message format 2 only.
> > >> > > > > > 2. Bump produce and fetch request versions.
> > >> > > > > > 3. Provide broker errors whenever possible based on the
> > request
> > >> > > version
> > >> > > > > and
> > >> > > > > > rely on clients for the cases where the broker can't
> validate
> > >> > > > efficiently
> > >> > > > > > (example message format 2 consumer that supports the latest
> > >> fetch
> > >> > > > version
> > >> > > > > > but doesn't support zstd).
> > >> > > > > >
> > >> > > > > > If there's general agreement on this, I suggest we update
> the
> > >> KIP
> > >> > to
> > >> > > > > state
> > >> > > > > > the proposal and to move the rejected options to its own
> > >> section.
> > >> > And
> > >> > > > > then
> > >> > > > > > start a vote!
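> > >> > > > > >
> > >> > > > > > As a concrete illustration of what the proposal enables on the
> > >> > > > > > producer side (a sketch - it assumes the new compression.type
> > >> > > > > > value is exposed as "zstd", alongside the existing gzip, snappy
> > >> > > > > > and lz4 values):
> > >> > > > > >
> > >> > > > > >     Properties props = new Properties();
> > >> > > > > >     props.put("bootstrap.servers", "broker:9092");
> > >> > > > > >     props.put("key.serializer",
> > >> > > > > >         "org.apache.kafka.common.serialization.ByteArraySerializer");
> > >> > > > > >     props.put("value.serializer",
> > >> > > > > >         "org.apache.kafka.common.serialization.ByteArraySerializer");
> > >> > > > > >     // the only change needed to opt in to the new codec:
> > >> > > > > >     props.put("compression.type", "zstd");
> > >> > > > > >     Producer<byte[], byte[]> producer = new KafkaProducer<>(props);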
> > >> > > > > >
> > >> > > > > > Ismael
> > >> > > > > >
> > >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> > >> > jason@confluent.io>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Dongjin,
> > >> > > > > > >
> > >> > > > > > > Yes, that's a good summary. For clients which support v2,
> > the
> > >> > > client
> > >> > > > > can
> > >> > > > > > > parse the message format and hopefully raise a useful
> error
> > >> > message
> > >> > > > > > > indicating the unsupported compression type. For older
> > >> clients,
> > >> > our
> > >> > > > > > options
> > >> > > > > > > are probably (1) to down-convert to the old format using
> no
> > >> > > > compression
> > >> > > > > > > type, or (2) to return an error code. I'm leaning toward
> the
> > >> > latter
> > >> > > > as
> > >> > > > > > the
> > >> > > > > > > simpler solution, but the challenge is finding a good
> error
> > >> code.
> > >> > > Two
> > >> > > > > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE.
> > The
> > >> > > > downside
> > >> > > > > > is
> > >> > > > > > > that old clients probably won't get a helpful message.
> > >> However,
> > >> > at
> > >> > > > > least
> > >> > > > > > > the behavior will be consistent in the sense that all
> > clients
> > >> > will
> > >> > > > fail
> > >> > > > > > if
> > >> > > > > > > they do not support zstandard.
> > >> > > > > > >
> > >> > > > > > > What do you think?
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > > Jason
> > >> > > > > > >
> > >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> > >> dongjin@apache.org
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Thanks Jason, I reviewed the down-converting logic
> > following
> > >> > your
> > >> > > > > > > > explanation.[^1] You mean the following routines, right?
> > >> > > > > > > >
> > >> > > > > > > > -
> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > >> > > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > >> > > > > > > > -
> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > >> > > > > > > > main/java/org/apache/kafka/common/record/
> > >> > > LazyDownConversionRecords.
> > >> > > > > > > > java#L165
> > >> > > > > > > > -
> > >> > > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > >> > > > > > > >
> > >> main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > >> > > > > > > >
> > >> > > > > > > > It seems like your stance is as follows:
> > >> > > > > > > >
> > >> > > > > > > > 1. In principle, Kafka does not change the compression codec
> > >> > > > > > > > when down-converting, since doing so requires inspecting the
> > >> > > > > > > > fetched data, which is expensive.
> > >> > > > > > > > 2. However, there are some cases where the fetched data is
> > >> > > > > > > > inspected anyway. In those cases, we can provide compression
> > >> > > > > > > > conversion from ZStandard to the classical codecs[^2].
> > >> > > > > > > >
> > >> > > > > > > > And from what I understand, the situations where a client
> > >> > > > > > > > without ZStandard support receives ZStandard-compressed
> > >> > > > > > > > records can be organized into two cases:
> > >> > > > > > > >
> > >> > > > > > > > a. The 'compression.type' configuration of given topic
> is
> > >> > > > 'producer'
> > >> > > > > > and
> > >> > > > > > > > the producer compressed the records with ZStandard.
> (that
> > >> is,
> > >> > > using
> > >> > > > > > > > ZStandard implicitly.)
> > >> > > > > > > > b.  The 'compression.type' configuration of given topic
> is
> > >> > > 'zstd';
> > >> > > > > that
> > >> > > > > > > is,
> > >> > > > > > > > using ZStandard explicitly.
> > >> > > > > > > >
> > >> > > > > > > > As you stated, we don't have to handle case b specially. So,
> > >> > > > > > > > it seems like we can narrow the focus of the problem by
> > >> > > > > > > > joining case 1 and case b like the following:
> > >> > > > > > > >
> > >> > > > > > > > > Given a topic with 'producer' as its 'compression.type'
> > >> > > > > > > > > configuration, ZStandard-compressed records, and an old
> > >> > > > > > > > > client without ZStandard support, is there any case where we
> > >> > > > > > > > > need to inspect the records and can change the compression
> > >> > > > > > > > > type? If so, can we provide compression-type conversion?
> > >> > > > > > > >
> > >> > > > > > > > Do I understand correctly?
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Dongjin
> > >> > > > > > > >
> > >> > > > > > > > [^1]: I'm sorry - I found that I had slightly misunderstood
> > >> > > > > > > > how the API version works, after reviewing the
> > >> > > > > > > > down-conversion logic & the protocol documentation
> > >> > > > > > > > <https://kafka.apache.org/protocol>.
> > >> > > > > > > > documentation <https://kafka.apache.org/protocol>.
> > >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > >> > > > > > > >
> > >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> > >> > > > jason@confluent.io>
> > >> > > > > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > But in my opinion, since the client will fail with
> the
> > >> API
> > >> > > > > version,
> > >> > > > > > > so
> > >> > > > > > > > we
> > >> > > > > > > > > > don't need to down-convert the messages anyway.
> Isn't
> > >> it?
> > >> > > So, I
> > >> > > > > > think
> > >> > > > > > > > we
> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> > >> familiar
> > >> > > with
> > >> > > > > > > > > down-convert
> > >> > > > > > > > > > logic.)
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > Currently the broker down-converts automatically when
> it
> > >> > > receives
> > >> > > > > an
> > >> > > > > > > old
> > >> > > > > > > > > version of the fetch request (a version which is known
> > to
> > >> > > predate
> > >> > > > > the
> > >> > > > > > > > > message format in use). Typically when down-converting
> > the
> > >> > > > message
> > >> > > > > > > > format,
> > >> > > > > > > > > we use the same compression type, but there is not
> much
> > >> point
> > >> > > in
> > >> > > > > > doing
> > >> > > > > > > so
> > >> > > > > > > > > when we know the client doesn't support it. So if
> > >> zstandard
> > >> > is
> > >> > > in
> > >> > > > > > use,
> > >> > > > > > > > and
> > >> > > > > > > > > we have to down-convert anyway, then we can choose to
> > use
> > >> a
> > >> > > > > different
> > >> > > > > > > > > compression type or no compression type.
> > >> > > > > > > > >
> > >> > > > > > > > > From my perspective, there is no significant downside
> to
> > >> > > bumping
> > >> > > > > the
> > >> > > > > > > > > protocol version and it has several potential
> benefits.
> > >> > Version
> > >> > > > > bumps
> > >> > > > > > > are
> > >> > > > > > > > > cheap. The main question mark in my mind is about
> > >> > > > down-conversion.
> > >> > > > > > > > Figuring
> > >> > > > > > > > > out whether down-conversion is needed is hard
> generally
> > >> > without
> > >> > > > > > > > inspecting
> > >> > > > > > > > > the fetched data, which is expensive. I think we agree
> > in
> > >> > > > principle
> > >> > > > > > > that
> > >> > > > > > > > we
> > >> > > > > > > > > do not want to have to pay this cost generally and
> > prefer
> > >> the
> > >> > > > > clients
> > >> > > > > > > to
> > >> > > > > > > > > fail when they see an unhandled compression type. The
> > >> point I
> > >> > > was
> > >> > > > > > > making
> > >> > > > > > > > is
> > >> > > > > > > > > that there are some cases where we are either
> inspecting
> > >> the
> > >> > > data
> > >> > > > > > > anyway
> > >> > > > > > > > > (because we have to down-convert the message format),
> or
> > >> we
> > >> > > have
> > >> > > > an
> > >> > > > > > > easy
> > >> > > > > > > > > way to tell whether zstandard is in use (the topic has
> > it
> > >> > > > > configured
> > >> > > > > > > > > explicitly). In the latter case, we don't have to
> handle
> > >> it
> > >> > > > > > specially.
> > >> > > > > > > > But
> > >> > > > > > > > > we do have to decide how we will handle
> down-conversion
> > to
> > >> > > older
> > >> > > > > > > formats.
> > >> > > > > > > > >
> > >> > > > > > > > > -Jason
> > >> > > > > > > > >
> > >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> > >> > > dongjin@apache.org
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Colin and Jason,
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks for your opinions. In summarizing, the Pros
> and
> > >> Cons
> > >> > > of
> > >> > > > > > > bumping
> > >> > > > > > > > > > fetch API version are:
> > >> > > > > > > > > >
> > >> > > > > > > > > > Cons:
> > >> > > > > > > > > >
> > >> > > > > > > > > > - The Broker can't know whether a given message
> batch
> > is
> > >> > > > > compressed
> > >> > > > > > > > with
> > >> > > > > > > > > > zstd or not.
> > >> > > > > > > > > > - Need some additional logic for the topic
> explicitly
> > >> > > > configured
> > >> > > > > to
> > >> > > > > > > use
> > >> > > > > > > > > > zstd.
> > >> > > > > > > > > >
> > >> > > > > > > > > > Pros:
> > >> > > > > > > > > >
> > >> > > > > > > > > > - The broker doesn't need to conduct expensive
> > >> > > down-conversion.
> > >> > > > > > > > > > - Can message the users to update their client.
> > >> > > > > > > > > >
> > >> > > > > > > > > > So, opinions for the backward-compatibility policy
> by
> > >> far:
> > >> > > > > > > > > >
> > >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > >> > > > > > > > > >
> > >> > > > > > > > > > Here are my additional comments:
> > >> > > > > > > > > >
> > >> > > > > > > > > > @Colin
> > >> > > > > > > > > >
> > >> > > > > > > > > > I greatly appreciate your response. In the case of
> the
> > >> > > > dictionary
> > >> > > > > > > > > support,
> > >> > > > > > > > > > of course, this issue should be addressed later so
> we
> > >> don't
> > >> > > > need
> > >> > > > > it
> > >> > > > > > > in
> > >> > > > > > > > > the
> > >> > > > > > > > > > first version. You are right - it is not late to try
> > it
> > >> > after
> > >> > > > > some
> > >> > > > > > > > > > benchmarks. What I mean is, we should keep in mind
> on
> > >> that
> > >> > > > > > potential
> > >> > > > > > > > > > feature.
> > >> > > > > > > > > >
> > >> > > > > > > > > > @Jason
> > >> > > > > > > > > >
> > >> > > > > > > > > > You wrote,
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Similarly, if we have to down-convert anyway
> because
> > >> the
> > >> > > > client
> > >> > > > > > > does
> > >> > > > > > > > > not
> > >> > > > > > > > > > understand the message format, then we could also
> use
> > a
> > >> > > > different
> > >> > > > > > > > > > compression type.
> > >> > > > > > > > > >
> > >> > > > > > > > > > But in my opinion, since the client will fail with
> the
> > >> API
> > >> > > > > version,
> > >> > > > > > > so
> > >> > > > > > > > we
> > >> > > > > > > > > > don't need to down-convert the messages anyway.
> Isn't
> > >> it?
> > >> > > So, I
> > >> > > > > > think
> > >> > > > > > > > we
> > >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> > >> familiar
> > >> > > with
> > >> > > > > > > > > down-convert
> > >> > > > > > > > > > logic.)
> > >> > > > > > > > > >
> > >> > > > > > > > > > Please give more opinions. Thanks!
> > >> > > > > > > > > >
> > >> > > > > > > > > > - Dongjin
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> > >> > > > > jason@confluent.io
> > >> > > > > > >
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hey Colin,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > The problem for the fetch API is that the broker
> > does
> > >> not
> > >> > > > > > generally

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Dongjin,

Yeah, that's right. For what it's worth, librdkafka also appears to handle
unexpected error codes. I expect that most client implementations would
either pass through the raw error code or convert it to an enum using
something like what the Java client does. Since we're expecting the client to
fail anyway, I'm probably in favor of using the UNSUPPORTED_COMPRESSION_TYPE
error code.
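
As a concrete illustration, here is a minimal, self-contained sketch of that
fallback behaviour (this is a toy stand-in, not the real client Errors enum,
and the numeric value 76 is only a placeholder for whatever id the KIP ends
up assigning):

    import java.util.HashMap;
    import java.util.Map;

    public class ErrorCodeFallbackSketch {
        // Codes an "old" client knows about; it predates the proposed code.
        private static final Map<Short, String> KNOWN = new HashMap<>();
        static {
            KNOWN.put((short) -1, "UNKNOWN_SERVER_ERROR");
            KNOWN.put((short) 0, "NONE");
            KNOWN.put((short) 2, "CORRUPT_MESSAGE");
        }

        static String forCode(short code) {
            String error = KNOWN.get(code);
            if (error != null)
                return error;
            // Unknown code: log the raw value and fall back, mirroring the
            // forCode() snippet quoted further down in this thread.
            System.out.printf("Unexpected error code: %d.%n", code);
            return "UNKNOWN_SERVER_ERROR";
        }

        public static void main(String[] args) {
            // 76 = assumed id for UNSUPPORTED_COMPRESSION_TYPE (placeholder).
            System.out.println(forCode((short) 76));
        }
    }

So the old client still fails, but the raw code in its log tells the operator
exactly why.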

-Jason

On Wed, Aug 22, 2018 at 1:46 AM, Dongjin Lee <do...@apache.org> wrote:

> Jason and Ismael,
>
> It seems like the only thing we need to consider if we define a new error
> code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementations of
> the other language clients, right? At least, this strategy doesn't cause any
> problem for the Java client. Do I understand correctly?
>
> Thanks,
> Dongjin
>
> On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Jason,
> >
> > > I think we would only use this error code when we /know/ that zstd was
> > > in use and the client doesn't support it? This is true if either 1) the
> > > message needs down-conversion and we encounter a zstd compressed message,
> > > or 2) if the topic is explicitly configured to use zstd.
> >
> > Yes, that's right. And note that case 1 includes 1.a) old clients requesting
> > v0 or v1 records, and 1.b) implicit zstd, i.e., the compression type of the
> > topic is "producer" and the stored data is Zstd-compressed.
> >
> > > However, if the compression type is set to "producer," then the fetched
> > > data may or may not be compressed with zstd. In this case, we return the
> > > data to the client and expect it to fail parsing. Is that correct?
> >
> > Exactly.
> >
> > Following your message, I reviewed the implementation of
> > `KafkaApis#handleFetchRequest`, which handles the fetch request, and found
> > that the information we can use is the following:
> >
> > 1. The client's fetch request version (the `versionId` variable).
> > 2. The log's compression type (the `logConfig` variable).
> >
> > We can't detect the actual compression type of the data unless we inspect
> > the `RecordBatch` included in the `Records` instance (i.e., the
> > `unconvertedRecords` variable). Since that would add a performance penalty,
> > it is not an option - in short, we can't be sure whether given chunks of
> > data are compressed with zstd or not.
> >
> > So, in conclusion: we can return an error in cases 1.a and 2 easily, with
> > the information above. In case 1.b (implicit zstd), we can just return the
> > data, doing nothing special, and expect the client to fail parsing.
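> >
> > A rough sketch of that decision with made-up names (this is not the actual
> > KafkaApis code, and the bumped fetch version below is only an assumption)
> > would be something like:
> >
> >     enum FetchDecision { RETURN_UNSUPPORTED_COMPRESSION_TYPE, RETURN_DATA_AS_IS }
> >
> >     class FetchCompatCheck {
> >         // Placeholder for whatever version this KIP would bump FetchRequest to.
> >         static final short FIRST_ZSTD_FETCH_VERSION = 10;
> >
> >         static FetchDecision decide(short fetchVersion, String topicCompressionType) {
> >             boolean oldClient = fetchVersion < FIRST_ZSTD_FETCH_VERSION;
> >             if (oldClient && "zstd".equals(topicCompressionType))
> >                 // Case 2: the topic is explicitly configured to use zstd.
> >                 return FetchDecision.RETURN_UNSUPPORTED_COMPRESSION_TYPE;
> >             // Case 1.b ("producer"): we can't cheaply tell whether the batches are
> >             // zstd-compressed, so return the data and let the client fail parsing.
> >             // (Case 1.a is caught on the down-conversion path, which inspects the
> >             // batches anyway.)
> >             return FetchDecision.RETURN_DATA_AS_IS;
> >         }
> >     }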
> >
> > Thanks,
> > Dongjin
> >
> > On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com> wrote:
> >
> >> Jason, that's an interesting point regarding the Java client. Do we know
> >> what clients in other languages do in these cases?
> >>
> >> Ismael
> >>
> >> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io> wrote:
> >>
> >> > Hi Dongjin,
> >> >
> >> > One of the complications is that old versions of the API will not expect
> >> > a new error code. However, since we expect this to be a fatal error anyway
> >> > for old clients, it may still be more useful to return the correct error
> >> > code. For example, the Kafka clients use the following code to convert the
> >> > error code:
> >> >
> >> >     public static Errors forCode(short code) {
> >> >         Errors error = codeToError.get(code);
> >> >         if (error != null) {
> >> >             return error;
> >> >         } else {
> >> >             log.warn("Unexpected error code: {}.", code);
> >> >             return UNKNOWN_SERVER_ERROR;
> >> >         }
> >> >     }
> >> >
> >> > If we return an unsupported error code, it will be converted to an UNKNOWN
> >> > error, but at least we will get the message in the log with the correct
> >> > code. That seems preferable to returning a misleading error code. So I
> >> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even for
> >> > older versions.
> >> >
> >> > Also, one question just to check my understanding. I think we would only
> >> > use this error code when we /know/ that zstd was in use and the client
> >> > doesn't support it? This is true if either 1) the message needs
> >> > down-conversion and we encounter a zstd compressed message, or 2) if the
> >> > topic is explicitly configured to use zstd. However, if the compression
> >> > type is set to "producer," then the fetched data may or may not be
> >> > compressed with zstd. In this case, we return the data to the client and
> >> > expect it to fail parsing. Is that correct?
> >> >
> >> > Thanks,
> >> > Jason
> >> >
> >> >
> >> >
> >> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org>
> >> wrote:
> >> >
> >> > > Ismael, Jason and all,
> >> > >
> >> > > I rewrote the backward compatibility strategy & its alternatives as
> >> > > follows, based on Ismael & Jason's comments. Since it is not updated in
> >> > > the wiki yet, don't hesitate to send me a message if you have any opinion
> >> > > on it.
> >> > >
> >> > > ```
> >> > > *Backward Compatibility*
> >> > >
> >> > > We need to establish a backward-compatibility strategy for the case where
> >> > > an old client subscribes to a topic using ZStandard implicitly (i.e., the
> >> > > 'compression.type' configuration of the given topic is 'producer' and the
> >> > > producer compressed the records with ZStandard). We have the following
> >> > > options for this situation:
> >> > >
> >> > > *A. Support ZStandard for the old clients which can understand v0, v1
> >> > > messages only.*
> >> > >
> >> > > This strategy necessarily requires the down-conversion of v2 messages
> >> > > compressed with ZStandard into v0 or v1 messages, which means a
> >> > > considerable performance degradation. So we rejected this strategy.
> >> > >
> >> > > *B. Bump the API version and support only v2-capable clients*
> >> > >
> >> > > With this approach, we can message the old clients that they are old and
> >> > > should be upgraded. However, there are still several options for the error
> >> > > code.
> >> > >
> >> > > *B.1. INVALID_REQUEST (42)*
> >> > >
> >> > > This option gives the client very little information; the user can be
> >> > > confused about why a client that worked correctly in the past suddenly
> >> > > encounters a problem. So we rejected this strategy.
> >> > >
> >> > > *B.2. CORRUPT_MESSAGE (2)*
> >> > >
> >> > > This option gives inaccurate information; the user can be surprised and
> >> > > misunderstand that the log files are broken in some way. So we rejected
> >> > > this strategy.
> >> > >
> >> > > *B.3. UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> >> > >
> >> > > The advantage of this approach is that we don't need to define a new error
> >> > > code; we can reuse it and that's all.
> >> > >
> >> > > The disadvantage of this approach is that it is also a little bit vague;
> >> > > this error code was defined as part of the work for KIP-98[^1] and is now
> >> > > returned for transaction errors.
> >> > >
> >> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> >> > >
> >> > > The advantage of this approach is that it is clear and provides an exact
> >> > > description. The disadvantage is that we need to add a new error code.
> >> > > ```
> >> > >
> >> > > *It seems like what we need to choose between is now clear:
> >> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> >> > > (B.4).*
> >> > > The first one doesn't need a new error code but the latter is more
> >> > > explicit. Which one do you prefer? Since all of you have much more
> >> > > experience and knowledge than me, I will follow your decision. The wiki
> >> > > page will be updated to follow the decision as well.
> >> > >
> >> > > Best,
> >> > > Dongjin
> >> > >
> >> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> >> > >
> >> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Sounds reasonable to me.
> >> > > >
> >> > > > Ismael
> >> > > >
> >> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io>
> >> > wrote:
> >> > > >
> >> > > > > Hey Ismael,
> >> > > > >
> >> > > > > Your summary looks good to me. I think it might also be a good idea to
> >> > > > > add a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
> >> > > > > version bumps. We won't be able to use it for old api versions since
> >> > > > > the clients will not understand it, but we can use it going forward so
> >> > > > > that we're not stuck in a similar situation with a new message format
> >> > > > > and a new codec to support. Another option is to use
> >> > > > > UNSUPPORTED_FOR_MESSAGE_FORMAT, but it is not as explicit.
> >> > > > >
> >> > > > > -Jason
> >> > > > >
> >> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <ismael@juma.me.uk
> >
> >> > > wrote:
> >> > > > >
> >> > > > > > Hi Dongjin and Jason,
> >> > > > > >
> >> > > > > > I would agree. My summary:
> >> > > > > >
> >> > > > > > 1. Support zstd with message format 2 only.
> >> > > > > > 2. Bump produce and fetch request versions.
> >> > > > > > 3. Provide broker errors whenever possible based on the request
> >> > > > > > version, and rely on clients for the cases where the broker can't
> >> > > > > > validate efficiently (example: a message format 2 consumer that
> >> > > > > > supports the latest fetch version but doesn't support zstd).
> >> > > > > >
> >> > > > > > If there's general agreement on this, I suggest we update the KIP to
> >> > > > > > state the proposal and to move the rejected options to its own
> >> > > > > > section. And then start a vote!
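> >> > > > > >
> >> > > > > > For the produce path (points 2 and 3), the broker parses the batches
> >> > > > > > anyway, so the check is cheap. Roughly (the bumped version number and
> >> > > > > > the zstd codec id below are placeholders, not actual code):
> >> > > > > >
> >> > > > > >     class ProduceCompatCheck {
> >> > > > > >         static final short FIRST_ZSTD_PRODUCE_VERSION = 7; // placeholder for the bumped version
> >> > > > > >         static final int ZSTD_ID = 4; // codec id assumed for zstd (lz4 is 3)
> >> > > > > >
> >> > > > > >         // The compression codec id sits in the low 3 bits of the batch attributes.
> >> > > > > >         static boolean produceRequestAllowed(short apiVersion, byte batchAttributes) {
> >> > > > > >             int codec = batchAttributes & 0x07;
> >> > > > > >             // Reject zstd batches from clients on an older produce version.
> >> > > > > >             return codec != ZSTD_ID || apiVersion >= FIRST_ZSTD_PRODUCE_VERSION;
> >> > > > > >         }
> >> > > > > >     }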
> >> > > > > >
> >> > > > > > Ismael
> >> > > > > >
> >> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> >> > jason@confluent.io>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi Dongjin,
> >> > > > > > >
> >> > > > > > > Yes, that's a good summary. For clients which support v2, the
> >> > > > > > > client can parse the message format and hopefully raise a useful
> >> > > > > > > error message indicating the unsupported compression type. For
> >> > > > > > > older clients, our options are probably (1) to down-convert to the
> >> > > > > > > old format using no compression type, or (2) to return an error
> >> > > > > > > code. I'm leaning toward the latter as the simpler solution, but the
> >> > > > > > > challenge is finding a good error code. Two possibilities might be
> >> > > > > > > INVALID_REQUEST or CORRUPT_MESSAGE. The downside is that old clients
> >> > > > > > > probably won't get a helpful message. However, at least the behavior
> >> > > > > > > will be consistent in the sense that all clients will fail if they
> >> > > > > > > do not support zstandard.
> >> > > > > > >
> >> > > > > > > What do you think?
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > Jason
> >> > > > > > >
> >> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> >> dongjin@apache.org
> >> > >
> >> > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Thanks Jason, I reviewed the down-converting logic following your
> >> > > > > > > > explanation.[^1] You mean the following routines, right?
> >> > > > > > > >
> >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
> >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
> >> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> >> > > > > > > >
> >> > > > > > > > It seems like your stance is the following:
> >> > > > > > > >
> >> > > > > > > > 1. In principle, Kafka does not change the compression codec when
> >> > > > > > > > down-converting, since that requires inspecting the fetched data,
> >> > > > > > > > which is expensive.
> >> > > > > > > > 2. However, there are some cases where the fetched data is
> >> > > > > > > > inspected anyway. In these cases, we can provide compression
> >> > > > > > > > conversion from ZStandard to the classical codecs[^2].
> >> > > > > > > >
> >> > > > > > > > And from what I understand, the cases where a client without
> >> > > > > > > > ZStandard support receives ZStandard compressed records can be
> >> > > > > > > > organized into two cases:
> >> > > > > > > >
> >> > > > > > > > a. The 'compression.type' configuration of the given topic is
> >> > > > > > > > 'producer' and the producer compressed the records with ZStandard
> >> > > > > > > > (that is, using ZStandard implicitly).
> >> > > > > > > > b. The 'compression.type' configuration of the given topic is
> >> > > > > > > > 'zstd'; that is, using ZStandard explicitly.
> >> > > > > > > >
> >> > > > > > > > As you stated, we don't have to handle case b specially. So, it
> >> > > > > > > > seems like we can narrow the focus of the problem by joining case 1
> >> > > > > > > > and case b like the following:
> >> > > > > > > >
> >> > > > > > > > > Given a topic with 'producer' as its 'compression.type'
> >> > > > > > > > configuration, ZStandard compressed records and an old client
> >> > > > > > > > without ZStandard, is there any case where we need to inspect the
> >> > > > > > > > records and can change the compression type? If so, can we provide
> >> > > > > > > > compression type conversion?
> >> > > > > > > >
> >> > > > > > > > Do I understand correctly?
> >> > > > > > > >
> >> > > > > > > > Best,
> >> > > > > > > > Dongjin
> >> > > > > > > >
> >> > > > > > > > [^1]: I'm sorry, I found that I was misunderstanding a little bit
> >> > > > > > > > how the API version works, after reviewing the down-convert logic
> >> > > > > > > > & the protocol documentation <https://kafka.apache.org/protocol>.
> >> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> >> > > > > > > >
> >> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> >> > > > jason@confluent.io>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > But in my opinion, since the client will fail with the
> >> API
> >> > > > > version,
> >> > > > > > > so
> >> > > > > > > > we
> >> > > > > > > > > > don't need to down-convert the messages anyway. Isn't
> >> it?
> >> > > So, I
> >> > > > > > think
> >> > > > > > > > we
> >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> >> familiar
> >> > > with
> >> > > > > > > > > down-convert
> >> > > > > > > > > > logic.)
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > Currently the broker down-converts automatically when it receives
> >> > > > > > > > > an old version of the fetch request (a version which is known to
> >> > > > > > > > > predate the message format in use). Typically when down-converting
> >> > > > > > > > > the message format, we use the same compression type, but there is
> >> > > > > > > > > not much point in doing so when we know the client doesn't support
> >> > > > > > > > > it. So if zstandard is in use, and we have to down-convert anyway,
> >> > > > > > > > > then we can choose to use a different compression type or no
> >> > > > > > > > > compression type.
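> >> > > > > > > > >
> >> > > > > > > > > In other words, something along these lines (purely illustrative,
> >> > > > > > > > > not the actual down-conversion code):
> >> > > > > > > > >
> >> > > > > > > > >     class DownConversionCodecChoice {
> >> > > > > > > > >         // Pick the codec to use when down-converting for a pre-zstd client.
> >> > > > > > > > >         static String choose(String batchCodec, boolean clientSupportsZstd) {
> >> > > > > > > > >             if ("zstd".equals(batchCodec) && !clientSupportsZstd)
> >> > > > > > > > >                 return "none"; // or any codec the old client understands
> >> > > > > > > > >             return batchCodec; // otherwise keep the original codec
> >> > > > > > > > >         }
> >> > > > > > > > >     }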
> >> > > > > > > > >
> >> > > > > > > > > From my perspective, there is no significant downside to
> >> > > bumping
> >> > > > > the
> >> > > > > > > > > protocol version and it has several potential benefits.
> >> > Version
> >> > > > > bumps
> >> > > > > > > are
> >> > > > > > > > > cheap. The main question mark in my mind is about
> >> > > > down-conversion.
> >> > > > > > > > Figuring
> >> > > > > > > > > out whether down-conversion is needed is hard generally
> >> > without
> >> > > > > > > > inspecting
> >> > > > > > > > > the fetched data, which is expensive. I think we agree
> in
> >> > > > principle
> >> > > > > > > that
> >> > > > > > > > we
> >> > > > > > > > > do not want to have to pay this cost generally and
> prefer
> >> the
> >> > > > > clients
> >> > > > > > > to
> >> > > > > > > > > fail when they see an unhandled compression type. The
> >> point I
> >> > > was
> >> > > > > > > making
> >> > > > > > > > is
> >> > > > > > > > > that there are some cases where we are either inspecting
> >> the
> >> > > data
> >> > > > > > > anyway
> >> > > > > > > > > (because we have to down-convert the message format), or
> >> we
> >> > > have
> >> > > > an
> >> > > > > > > easy
> >> > > > > > > > > way to tell whether zstandard is in use (the topic has
> it
> >> > > > > configured
> >> > > > > > > > > explicitly). In the latter case, we don't have to handle
> >> it
> >> > > > > > specially.
> >> > > > > > > > But
> >> > > > > > > > > we do have to decide how we will handle down-conversion
> to
> >> > > older
> >> > > > > > > formats.
> >> > > > > > > > >
> >> > > > > > > > > -Jason
> >> > > > > > > > >
> >> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> >> > > dongjin@apache.org
> >> > > > >
> >> > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Colin and Jason,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for your opinions. In summarizing, the Pros and
> >> Cons
> >> > > of
> >> > > > > > > bumping
> >> > > > > > > > > > fetch API version are:
> >> > > > > > > > > >
> >> > > > > > > > > > Cons:
> >> > > > > > > > > >
> >> > > > > > > > > > - The Broker can't know whether a given message batch
> is
> >> > > > > compressed
> >> > > > > > > > with
> >> > > > > > > > > > zstd or not.
> >> > > > > > > > > > - Need some additional logic for the topic explicitly
> >> > > > configured
> >> > > > > to
> >> > > > > > > use
> >> > > > > > > > > > zstd.
> >> > > > > > > > > >
> >> > > > > > > > > > Pros:
> >> > > > > > > > > >
> >> > > > > > > > > > - The broker doesn't need to conduct expensive
> >> > > down-conversion.
> >> > > > > > > > > > - Can message the users to update their client.
> >> > > > > > > > > >
> >> > > > > > > > > > So, opinions for the backward-compatibility policy by
> >> far:
> >> > > > > > > > > >
> >> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> >> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> >> > > > > > > > > >
> >> > > > > > > > > > Here are my additional comments:
> >> > > > > > > > > >
> >> > > > > > > > > > @Colin
> >> > > > > > > > > >
> >> > > > > > > > > > I greatly appreciate your response. In the case of the
> >> > > > dictionary
> >> > > > > > > > > support,
> >> > > > > > > > > > of course, this issue should be addressed later so we
> >> don't
> >> > > > need
> >> > > > > it
> >> > > > > > > in
> >> > > > > > > > > the
> >> > > > > > > > > > first version. You are right - it is not late to try
> it
> >> > after
> >> > > > > some
> >> > > > > > > > > > benchmarks. What I mean is, we should keep in mind on
> >> that
> >> > > > > > potential
> >> > > > > > > > > > feature.
> >> > > > > > > > > >
> >> > > > > > > > > > @Jason
> >> > > > > > > > > >
> >> > > > > > > > > > You wrote,
> >> > > > > > > > > >
> >> > > > > > > > > > > Similarly, if we have to down-convert anyway because
> >> the
> >> > > > client
> >> > > > > > > does
> >> > > > > > > > > not
> >> > > > > > > > > > understand the message format, then we could also use
> a
> >> > > > different
> >> > > > > > > > > > compression type.
> >> > > > > > > > > >
> >> > > > > > > > > > But in my opinion, since the client will fail with the
> >> API
> >> > > > > version,
> >> > > > > > > so
> >> > > > > > > > we
> >> > > > > > > > > > don't need to down-convert the messages anyway. Isn't
> >> it?
> >> > > So, I
> >> > > > > > think
> >> > > > > > > > we
> >> > > > > > > > > > don't care about this case. (I'm sorry, I am not
> >> familiar
> >> > > with
> >> > > > > > > > > down-convert
> >> > > > > > > > > > logic.)
> >> > > > > > > > > >
> >> > > > > > > > > > Please give more opinions. Thanks!
> >> > > > > > > > > >
> >> > > > > > > > > > - Dongjin
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> >> > > > > jason@confluent.io
> >> > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Hey Colin,
> >> > > > > > > > > > >
> >> > > > > > > > > > > The problem for the fetch API is that the broker
> does
> >> not
> >> > > > > > generally
> >> > > > > > > > > know
> >> > > > > > > > > > if
> >> > > > > > > > > > > a batch was compressed with zstd unless it parses
> it.
> >> I
> >> > > think
> >> > > > > the
> >> > > > > > > > goal
> >> > > > > > > > > > here
> >> > > > > > > > > > > is to avoid the expensive down-conversion that is
> >> needed
> >> > to
> >> > > > > > ensure
> >> > > > > > > > > > > compatibility because it is only necessary if zstd
> is
> >> > > > actually
> >> > > > > in
> >> > > > > > > > use.
> >> > > > > > > > > > But
> >> > > > > > > > > > > as long as old clients can parse the message format,
> >> they
> >> > > > > should
> >> > > > > > > get
> >> > > > > > > > a
> >> > > > > > > > > > > reasonable error if they see an unsupported
> >> compression
> >> > > type
> >> > > > in
> >> > > > > > the
> >> > > > > > > > > > > attributes. Basically the onus is on users to ensure
> >> that
> >> > > > their
> >> > > > > > > > > consumers
> >> > > > > > > > > > > have been updated prior to using zstd. It seems
> like a
> >> > > > > reasonable
> >> > > > > > > > > > tradeoff
> >> > > > > > > > > > > to me. There are a couple cases that might be worth
> >> > > thinking
> >> > > > > > > through:
> >> > > > > > > > > > >
> >> > > > > > > > > > > 1. If a topic is explicitly configured to use zstd,
> >> then
> >> > we
> >> > > > > don't
> >> > > > > > > > need
> >> > > > > > > > > to
> >> > > > > > > > > > > check the fetched data for the compression type to
> >> know
> >> > if
> >> > > we
> >> > > > > > need
> >> > > > > > > > > > > down-conversion. If we did bump the Fetch API
> version,
> >> > then
> >> > > > we
> >> > > > > > > could
> >> > > > > > > > > > handle
> >> > > > > > > > > > > this case by either down-converting using a
> different
> >> > > > > compression
> >> > > > > > > > type
> >> > > > > > > > > or
> >> > > > > > > > > > > returning an error.
> >> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway
> >> because
> >> > the
> >> > > > > > client
> >> > > > > > > > does
> >> > > > > > > > > > not
> >> > > > > > > > > > > understand the message format, then we could also
> use
> >> a
> >> > > > > different
> >> > > > > > > > > > > compression type.
> >> > > > > > > > > > >
> >> > > > > > > > > > > For the produce API, I think it's reasonable to bump
> >> the
> >> > > api
> >> > > > > > > version.
> >> > > > > > > > > > This
> >> > > > > > > > > > > can be used by clients to check whether a broker
> >> supports
> >> > > > zstd.
> >> > > > > > For
> >> > > > > > > > > > > example, we might support a list of preferred
> >> compression
> >> > > > types
> >> > > > > > in
> >> > > > > > > > the
> >> > > > > > > > > > > producer and we could use the broker to detect which
> >> > > version
> >> > > > to
> >> > > > > > > use.
> >> > > > > > > > > > >
> >> > > > > > > > > > > -Jason
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> >> > > > > cmccabe@apache.org
> >> > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
> >> > > > compression
> >> > > > > > > codec
> >> > > > > > > > > > and I
> >> > > > > > > > > > > > hope we can get this support in soon!
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > I would say we can just bump the API version to
> >> > indicate
> >> > > > that
> >> > > > > > > ZStd
> >> > > > > > > > > > > support
> >> > > > > > > > > > > > is expected in new clients.  We probably need some
> >> way
> >> > of
> >> > > > > > > > indicating
> >> > > > > > > > > to
> >> > > > > > > > > > > the
> >> > > > > > > > > > > > older clients that they can't consume the
> >> partitions,
> >> > as
> >> > > > > well.
> >> > > > > > > > > Perhaps
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The license thing seems straightforward -- it's
> >> just a
> >> > > > matter
> >> > > > > > of
> >> > > > > > > > > adding
> >> > > > > > > > > > > > the text to the right files as per ASF guidelines.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > With regard to the dictionary support, do we
> really
> >> > need
> >> > > > that
> >> > > > > > in
> >> > > > > > > > the
> >> > > > > > > > > > > first
> >> > > > > > > > > > > > version?  Hopefully message batches are big enough
> >> that
> >> > > > this
> >> > > > > > > isn't
> >> > > > > > > > > > > needed.
> >> > > > > > > > > > > > Some benchmarks might help here.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > best,
> >> > > > > > > > > > > > Colin
> >> > > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> >> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this
> >> issue,
> >> > > > > KIP-110
> >> > > > > > > > > > > > > <https://cwiki.apache.org/
> >> > > confluence/display/KAFKA/KIP-
> >> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> >> > > > > > > > > > > > > .
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > For newcomers, Here is some summary of the
> >> history:
> >> > > > KIP-110
> >> > > > > > was
> >> > > > > > > > > > > > originally
> >> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it lacked
> >> > > benchmark
> >> > > > > > > results
> >> > > > > > > > to
> >> > > > > > > > > > get
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > agreement of the community. Later, Ivan Babrou
> and
> >> > some
> >> > > > > other
> >> > > > > > > > users
> >> > > > > > > > > > who
> >> > > > > > > > > > > > > adopted the patch provided their excellent
> >> > performance
> >> > > > > report
> >> > > > > > > > which
> >> > > > > > > > > > is
> >> > > > > > > > > > > > now
> >> > > > > > > > > > > > > included in the KIP, but it postponed again
> >> because
> >> > of
> >> > > > the
> >> > > > > > > > > community
> >> > > > > > > > > > > was
> >> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now reboot
> >> this
> >> > > > issue.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > The following is the current status of the
> >> feature:
> >> > You
> >> > > > can
> >> > > > > > > check
> >> > > > > > > > > the
> >> > > > > > > > > > > > > current draft implementation here
> >> > > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It
> >> is
> >> > > based
> >> > > > > on
> >> > > > > > > zstd
> >> > > > > > > > > > 1.3.5
> >> > > > > > > > > > > > and
> >> > > > > > > > > > > > > periodically rebased onto the latest trunk[^1].
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > The issues that should be addressed is like
> >> > following:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > *1. Backward Compatibility*
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > To support old consumers, we need to take a
> >> strategy
> >> > to
> >> > > > > > handle
> >> > > > > > > > the
> >> > > > > > > > > > old
> >> > > > > > > > > > > > > consumers. Current candidates are:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > - Bump API version
> >> > > > > > > > > > > > > - Leave unchanged: let the old clients fail.
> >> > > > > > > > > > > > > - Improve the error messages:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > *2. Dictionary Support*
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > To support zstd's dictionary feature in the
> future
> >> > (if
> >> > > > > > needed),
> >> > > > > > > > we
> >> > > > > > > > > > need
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > > sketch how it should be and leave some room for
> >> it.
> >> > As
> >> > > of
> >> > > > > > now,
> >> > > > > > > > > there
> >> > > > > > > > > > > has
> >> > > > > > > > > > > > > been no discussion on this topic yet.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > *3. License*
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > To use this feature, we need to add license of
> >> zstd
> >> > and
> >> > > > > > > zstd-jni
> >> > > > > > > > to
> >> > > > > > > > > > the
> >> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for raising
> >> this
> >> > > > issue!)
> >> > > > > > It
> >> > > > > > > > > seems
> >> > > > > > > > > > > like
> >> > > > > > > > > > > > > what Apache Spark did would be a good example
> but
> >> > there
> >> > > > has
> >> > > > > > > been
> >> > > > > > > > no
> >> > > > > > > > > > > > > discussion yet.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > You can find the details of the above issues in
> >> the
> >> > KIP
> >> > > > > > > document.
> >> > > > > > > > > > > Please
> >> > > > > > > > > > > > > have a look when you are free, and give me
> >> feedback.
> >> > > All
> >> > > > > > kinds
> >> > > > > > > of
> >> > > > > > > > > > > > > participating are welcome.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Best,
> >> > > > > > > > > > > > > Dongjin
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> >> > > > > > > dongjin@apache.org
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Sorry for the late reply.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > In short, I could not submit the updated KIP
> by
> >> the
> >> > > > > feature
> >> > > > > > > > > freeze
> >> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it will
> not
> >> be
> >> > > > > included
> >> > > > > > > in
> >> > > > > > > > > the
> >> > > > > > > > > > > > 2.0.0
> >> > > > > > > > > > > > > > release and all discussion for this issue were
> >> > > > postponed
> >> > > > > > > after
> >> > > > > > > > > the
> >> > > > > > > > > > > > release
> >> > > > > > > > > > > > > > of 2.0.0.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > I have been updating the PR following recent
> >> > updates.
> >> > > > > Just
> >> > > > > > > > now, I
> >> > > > > > > > > > > > rebased
> >> > > > > > > > > > > > > > it against the latest trunk and updated the
> zstd
> >> > > > version
> >> > > > > > into
> >> > > > > > > > > > 1.3.5.
> >> > > > > > > > > > > > If you
> >> > > > > > > > > > > > > > need some request, don't hesitate to notify
> me.
> >> > (But
> >> > > > not
> >> > > > > > this
> >> > > > > > > > > > thread
> >> > > > > > > > > > > -
> >> > > > > > > > > > > > just
> >> > > > > > > > > > > > > > send me the message directly.)
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Best,
> >> > > > > > > > > > > > > > Dongjin
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> >> > > > > > > bobby@apache.org
> >> > > > > > > > >
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >> Is there any update on this? The performance
> >> > > > > improvements
> >> > > > > > > are
> >> > > > > > > > > > quite
> >> > > > > > > > > > > > > >> impressive and I really would like to stop
> >> forking
> >> > > > kafka
> >> > > > > > > just
> >> > > > > > > > to
> >> > > > > > > > > > get
> >> > > > > > > > > > > > this
> >> > > > > > > > > > > > > >> in.
> >> > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > >> Thanks,
> >> > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > >> Bobby
> >> > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> >> > > > > > > > dongjin@apache.org
> >> > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >>
> >> > > > > > > > > > > > > >> > Ismael,
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working
> frenzy
> >> > for
> >> > > > 2.0!
> >> > > > > > No
> >> > > > > > > > > > problem,
> >> > > > > > > > > > > > take
> >> > > > > > > > > > > > > >> > your time. I am also working at another
> issue
> >> > now.
> >> > > > > Thank
> >> > > > > > > you
> >> > > > > > > > > for
> >> > > > > > > > > > > > > >> letting me
> >> > > > > > > > > > > > > >> > know.
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> > Best,
> >> > > > > > > > > > > > > >> > Dongjin
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma
> <
> >> > > > > > > > ismaelj@gmail.com
> >> > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is
> >> busy
> >> > > > > > finalising
> >> > > > > > > > > > 2.0.0.
> >> > > > > > > > > > > > This
> >> > > > > > > > > > > > > >> KIP
> >> > > > > > > > > > > > > >> > > seems like a great candidate for 2.1.0
> and
> >> > > > hopefully
> >> > > > > > > there
> >> > > > > > > > > > will
> >> > > > > > > > > > > be
> >> > > > > > > > > > > > > >> more
> >> > > > > > > > > > > > > >> > of
> >> > > > > > > > > > > > > >> > > a discussion next week. :)
> >> > > > > > > > > > > > > >> > >
> >> > > > > > > > > > > > > >> > > Ismael
> >> > > > > > > > > > > > > >> > >
> >> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> >> > > > > > > > dongjin@apache.org
> >> > > > > > > > > >
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >> > >
> >> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
> >> > implementation:
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit
> >> 5145d6b)
> >> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > You can check out the implementation
> from
> >> > here
> >> > > > > > > > > > > > > >> > > > <
> >> https://github.com/apache/kafka/pull/2267
> >> > >.
> >> > > If
> >> > > > > you
> >> > > > > > > > > > > experience
> >> > > > > > > > > > > > any
> >> > > > > > > > > > > > > >> > > problem
> >> > > > > > > > > > > > > >> > > > running it, don't hesitate to give me a
> >> > > mention.
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > Best,
> >> > > > > > > > > > > > > >> > > > Dongjin
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin
> >> Lee
> >> > <
> >> > > > > > > > > > > dongjin@apache.org
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >> > wrote:
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > > Here is the short conclusion about
> the
> >> > > license
> >> > > > > > > > problem:
> >> > > > > > > > > > *We
> >> > > > > > > > > > > > can
> >> > > > > > > > > > > > > >> use
> >> > > > > > > > > > > > > >> > > zstd
> >> > > > > > > > > > > > > >> > > > > and zstd-jni without any problem, but
> >> we
> >> > > need
> >> > > > to
> >> > > > > > > > include
> >> > > > > > > > > > > their
> >> > > > > > > > > > > > > >> > license,
> >> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3
> Clause
> >> > > > License
> >> > > > > > > > requires
> >> > > > > > > > > > to
> >> > > > > > > > > > > > > >> include
> >> > > > > > > > > > > > > >> > the
> >> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause
> License
> >> > > > requires
> >> > > > > > that
> >> > > > > > > > the
> >> > > > > > > > > > > name
> >> > > > > > > > > > > > of
> >> > > > > > > > > > > > > >> the
> >> > > > > > > > > > > > > >> > > > > contributor can't be used to endorse
> or
> >> > > > promote
> >> > > > > > the
> >> > > > > > > > > > product.
> >> > > > > > > > > > > > > >> That's
> >> > > > > > > > > > > > > >> > it
> >> > > > > > > > > > > > > >> > > > > <
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > >
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >>
> >> > http://www.mikestratton.net/2011/12/is-bsd-license-
> >> > > > > > > > > > > > compatible-with-apache-2-0-license/
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > - They are not listed in the list of
> >> > > > prohibited
> >> > > > > > > > licenses
> >> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> >> > > > > > > > resolved.html#category-x>
> >> > > > > > > > > > > also.
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
> >> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
> >> > > > > > jira/browse/SPARK-19112
> >> > > > > > > >:
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > - They made a directory dedicated to
> >> the
> >> > > > > > dependency
> >> > > > > > > > > > license
> >> > > > > > > > > > > > files
> >> > > > > > > > > > > > > >> > > > > <
> >> > > > > > > https://github.com/apache/spark/tree/master/licenses
> >> > > > > > > > >
> >> > > > > > > > > > and
> >> > > > > > > > > > > > added
> >> > > > > > > > > > > > > >> > > > licenses
> >> > > > > > > > > > > > > >> > > > > for Zstd
> >> > > > > > > > > > > > > >> > > > > <
> >> > > > > > > > > > > > > >> >
> >> > > > https://github.com/apache/spark/blob/master/licenses/
> >> > > > > > > > > > > > LICENSE-zstd.txt
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > > > &
> >> > > > > > > > > > > > > >> > > > > Zstd-jni
> >> > > > > > > > > > > > > >> > > > > <
> >> > > > > > > > > > > > > >> > > >
> >> > > > > > > > > > > > > >> > >
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> https://github.com/apache/
> >> > > spark/blob/master/licenses/
> >> > > > > > > > > > > > LICENSE-zstd-jni.txt
> >> > > > > > > > > > > > > >> >
> >> > > > > > > > > > > > > >> > > > > .
> >> > > > > > > > > > > > > >> > > > > - Added a link to the original
> license
> >> > files
> >> > > > in
> >> > > > > > > > LICENSE.
> >> > > > > > > > > > > > > >> > > > > <
> >> > > > > https://github.com/apache/spark/pull/18805/files
> >> > > > > > >
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > If needed, I can make a similar
> update.
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > Thanks for pointing out this problem,
> >> > > Viktor!
> >> > > > > Nice
> >> > > > > > > > > catch!
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > Best,
> >> > > > > > > > > > > > > >> > > > > Dongjin
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM
> >> Dongjin
> >> > > Lee <
> >> > > > > > > > > > > > dongjin@apache.org>
> >> > > > > > > > > > > > > >> > > wrote:
> >> > > > > > > > > > > > > >> > > > >
> >> > > > > > > > > > > > > >> > > > >> I greatly appreciate your comprehensive reasoning. So: +1 for b until now.
> >> > > > > > > > > > > > > >> > > > >>
> >> > > > > > > > > > > > > >> > > > >> For the license issues, I will have a check on how the other projects are doing and share the results.
> >> > > > > > > > > > > > > >> > > > >>
> >> > > > > > > > > > > > > >> > > > >> Best,
> >> > > > > > > > > > > > > >> > > > >> Dongjin
> >> > > > > > > > > > > > > >> > > > >>
> >> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM
> >> Viktor
> >> > > > > Somogyi <
> >> > > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> >> > > > > > > > > > > > > >> > > > >> wrote:
> >> > > > > > > > > > > > > >> > > > >>
> >> > > > > > > > > > > > > >> > > > >>> Hi Dongjin,
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>> A couple of comments:
> >> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in the
> >> > > "backward
> >> > > > > > > > > > compatibility"
> >> > > > > > > > > > > > > >> section.
> >> > > > > > > > > > > > > >> > > My
> >> > > > > > > > > > > > > >> > > > >>> reasoning for this is that users
> >> > upgrading
> >> > > > to
> >> > > > > a
> >> > > > > > > zstd
> >> > > > > > > > > > > > compatible
> >> > > > > > > > > > > > > >> > > version
> >> > > > > > > > > > > > > >> > > > >>> won't start to use it
> automatically,
> >> so
> >> > > > manual
> >> > > > > > > > > > > > reconfiguration
> >> > > > > > > > > > > > > >> is
> >> > > > > > > > > > > > > >> > > > >>> required.
> >> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up
> >> the
> >> > > > > cluster.
> >> > > > > > If
> >> > > > > > > > not
> >> > > > > > > > > > all
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > > >> > > clients
> >> > > > > > > > > > > > > >> > > > >>> are
> >> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them and
> >> > they'd
> >> > > > > start
> >> > > > > > to
> >> > > > > > > > use
> >> > > > > > > > > > > zstd
> >> > > > > > > > > > > > > >> then it
> >> > > > > > > > > > > > > >> > > > would
> >> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd
> >> like to
> >> > > > > presume
> >> > > > > > > > > though
> >> > > > > > > > > > > that
> >> > > > > > > > > > > > > >> this
> >> > > > > > > > > > > > > >> > is
> >> > > > > > > > > > > > > >> > > a
> >> > > > > > > > > > > > > >> > > > >>> very
> >> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody
> >> should
> >> > be
> >> > > > > > > surprised
> >> > > > > > > > if
> >> > > > > > > > > > it
> >> > > > > > > > > > > > didn't
> >> > > > > > > > > > > > > >> > > work.
> >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we
> >> > should
> >> > > > bump
> >> > > > > > the
> >> > > > > > > > > fetch
> >> > > > > > > > > > > and
> >> > > > > > > > > > > > > >> > produce
> >> > > > > > > > > > > > > >> > > > >>> requests if it's a change in the
> >> message
> >> > > > > format.
> >> > > > > > > > > > Moreover
> >> > > > > > > > > > > if
> >> > > > > > > > > > > > > >> some
> >> > > > > > > > > > > > > >> > of
> >> > > > > > > > > > > > > >> > > > the
> >> > > > > > > > > > > > > >> > > > >>> producers and the brokers are
> >> upgraded
> >> > but
> >> > > > > some
> >> > > > > > of
> >> > > > > > > > the
> >> > > > > > > > > > > > consumers
> >> > > > > > > > > > > > > >> > are
> >> > > > > > > > > > > > > >> > > > not,
> >> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the error
> >> when
> >> > > the
> >> > > > > old
> >> > > > > > > > > consumer
> >> > > > > > > > > > > > tries
> >> > > > > > > > > > > > > >> to
> >> > > > > > > > > > > > > >> > > > consume
> >> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
> >> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I
> >> think
> >> > > > binding
> >> > > > > > the
> >> > > > > > > > > > > > compression
> >> > > > > > > > > > > > > >> type
> >> > > > > > > > > > > > > >> > > to
> >> > > > > > > > > > > > > >> > > > an
> >> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the
> >> > developer's
> >> > > > > > > > > perspective.
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the
> >> existing
> >> > > > > binding,
> >> > > > > > > > > however
> >> > > > > > > > > > > we
> >> > > > > > > > > > > > must
> >> > > > > > > > > > > > > >> > > > respect
> >> > > > > > > > > > > > > >> > > > >>> the licenses:
> >> > > > > > > > > > > > > >> > > > >>> "The code for these JNI bindings is
> >> > > licenced
> >> > > > > > under
> >> > > > > > > > > > > 2-clause
> >> > > > > > > > > > > > BSD
> >> > > > > > > > > > > > > >> > > > license.
> >> > > > > > > > > > > > > >> > > > >>> The native Zstd library is licensed
> >> > under
> >> > > > > > 3-clause
> >> > > > > > > > BSD
> >> > > > > > > > > > > > license
> >> > > > > > > > > > > > > >> and
> >> > > > > > > > > > > > > >> > > > GPL2"
> >> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page
> >> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> >> > > > > > > > resolved.html#category-a
> >> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD
> >> licenses
> >> > > but
> >> > > > > the
> >> > > > > > > > Apache
> >> > > > > > > > > > > > license
> >> > > > > > > > > > > > > >> is
> >> > > > > > > > > > > > > >> > not
> >> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping
> that
> >> > the
> >> > > > > > > "3-clause
> >> > > > > > > > > BSD
> >> > > > > > > > > > > > license
> >> > > > > > > > > > > > > >> and
> >> > > > > > > > > > > > > >> > > > GPL2"
> >> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR in
> >> this
> >> > > case,
> >> > > > > but
> >> > > > > > > I'm
> >> > > > > > > > > no
> >> > > > > > > > > > > > lawyer,
> >> > > > > > > > > > > > > >> > just
> >> > > > > > > > > > > > > >> > > > >>> wanted
> >> > > > > > > > > > > > > >> > > > >>> to make the point that we should
> >> watch
> >> > out
> >> > > > for
> >> > > > > > > > > licenses.
> >> > > > > > > > > > > :)
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>> Regards,
> >> > > > > > > > > > > > > >> > > > >>> Viktor
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM
> Ivan
> >> > > Babrou
> >> > > > <
> >> > > > > > > > > > > > ibobrik@gmail.com>
> >> > > > > > > > > > > > > >> > > wrote:
> >> > > > > > > > > > > > > >> > > > >>>
> >> > > > > > > > > > > > > >> > > > >>> > Hello,
> >> > > > > > > > > > > > > >> > > > >>> >
> >> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very
> much
> >> > > support
> >> > > > > the
> >> > > > > > > > fact
> >> > > > > > > > > > that
> >> > > > > > > > > > > > zstd
> >> > > > > > > > > > > > > >> > > > >>> compression
> >> > > > > > > > > > > > > >> > > > >>> > should be included out of the
> box.
> >> > > > > > > > > > > > > >> > > > >>> >
> >> > > > > > > > > > > > > >> > > > >>> > Please think about the
> environment,
> >> > you
> >> > > > can
> >> > > > > > save
> >> > > > > > > > > > quite a
> >> > > > > > > > > > > > lot
> >> > > > > > > > > > > > > >> of
> >> > > > > > > > > > > > > >> > > > >>> hardware
> >> > > > > > > > > > > > > >> > > > >>> > with it.
> >> > > > > > > > > > > > > >> > > > >>> >
> >> > > > > > > > > > > > > >> > > > >>> > Thank you.
> >> > > > > > > > > > > > > >> > > > >>> >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Jason and Ismael,

It seems like the only thing we need to consider if we define a new error
code (i.e., UNSUPPORTED_COMPRESSION_TYPE) would be the implementations of
the other language clients, right? At least, this strategy doesn't cause
any problem for the Java client. Do I understand correctly?
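
For reference, here is a minimal, self-contained sketch of the Java-side behaviour in question. It is not the actual org.apache.kafka.common.protocol.Errors class, and the numeric value 76 for the new code is only a placeholder; it mirrors the fallback in the forCode() snippet Jason quoted below, where an old client that does not recognize the new code degrades to a generic unknown error instead of crashing.

```
// Hypothetical sketch; names and the code value are illustrative only.
import java.util.HashMap;
import java.util.Map;

public class ErrorCodeSketch {
    static final short UNSUPPORTED_COMPRESSION_TYPE = 76; // placeholder value, not decided

    // What an *old* client knows about; an upgraded client would also register the new code.
    static final Map<Short, String> KNOWN_ERRORS = new HashMap<>();
    static {
        KNOWN_ERRORS.put((short) 0, "NONE");
        KNOWN_ERRORS.put((short) 2, "CORRUPT_MESSAGE");
        KNOWN_ERRORS.put((short) 43, "UNSUPPORTED_FOR_MESSAGE_FORMAT");
    }

    // Mirrors the fallback behaviour quoted below: unknown codes are logged
    // and surfaced as UNKNOWN_SERVER_ERROR instead of failing the parser.
    static String forCode(short code) {
        String name = KNOWN_ERRORS.get(code);
        if (name != null)
            return name;
        System.out.printf("Unexpected error code: %d.%n", code);
        return "UNKNOWN_SERVER_ERROR";
    }

    public static void main(String[] args) {
        // An old client receiving the proposed new code:
        System.out.println(forCode(UNSUPPORTED_COMPRESSION_TYPE)); // prints UNKNOWN_SERVER_ERROR
    }
}
```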

Thanks,
Dongjin

On Wed, Aug 22, 2018 at 5:43 PM Dongjin Lee <do...@apache.org> wrote:

> Jason,
>
> > I think we would only use this error code when we /know/ that zstd was
> in use and the client doesn't support it? This is true if either 1) the
> message needs down-conversion and we encounter a zstd compressed message,
> or 2) if the topic is explicitly configured to use zstd.
>
> Yes, that's right. And note that case 1 includes 1.a) old clients requesting
> v0/v1 records, and 1.b) implicit zstd, i.e., the compression type of
> "producer" with Zstd-compressed data.
>
> > However, if the compression type is set to "producer," then the fetched
> data may or may not be compressed with zstd. In this case, we return the
> data to the client and expect it to fail parsing. Is that correct?
>
> Exactly.
>
> Following your message, I reviewed the implementation of
> `KafkaApis#handleFetchRequest`, which handles the fetch request, and found
> that the information we can use is the following:
>
> 1. The client's fetch request version (the `versionId` variable).
> 2. The log's compression type (the `logConfig` variable).
>
> We can't detect the actual compression type of the data unless we inspect
> the `RecordBatch` included in the `Records` instance (i.e., the
> `unconvertedRecords` variable). Since that would incur a performance
> penalty, it is not an option - in short, we can't be sure whether a given
> chunk of data is compressed with zstd or not.
>
> So, in conclusion: with the information above, we can easily return an error
> in cases 1.a and 2. In case 1.b (implicit zstd), we do nothing special, just
> return the data, and expect the client to fail parsing it.
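
To make that conclusion concrete, here is a minimal, self-contained sketch of the decision. All names, the version cut-off, and the result constant are hypothetical illustrations; the actual logic would live in the Scala KafkaApis code, not in a class like this.

```
// Hypothetical sketch of the broker-side check described above.
enum TopicCompression { NONE, GZIP, SNAPPY, LZ4, ZSTD, PRODUCER }
enum FetchCheck { OK, UNSUPPORTED_COMPRESSION_TYPE }

class FetchCompressionCheck {
    // Hypothetical first fetch version that implies zstd support in the client.
    static final short ZSTD_MIN_FETCH_VERSION = 10;

    static FetchCheck check(short fetchVersion, TopicCompression topicCompression) {
        if (fetchVersion >= ZSTD_MIN_FETCH_VERSION)
            return FetchCheck.OK;                            // new client: nothing to do
        if (topicCompression == TopicCompression.ZSTD)
            return FetchCheck.UNSUPPORTED_COMPRESSION_TYPE;  // case 2: explicit zstd
        // Case 1.a (a request so old that it needs down-conversion) is handled where
        // the batches are rebuilt anyway, so zstd can be detected there.
        // Case 1.b ("producer"): the batches may or may not be zstd; without
        // inspecting them, just return the data and let the client fail parsing.
        return FetchCheck.OK;
    }
}
```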
>
> Thanks,
> Dongjin
>
> On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com> wrote:
>
>> Jason, that's an interesting point regarding the Java client. Do we know
>> what clients in other languages do in these cases?
>>
>> Ismael
>>
>> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io> wrote:
>>
>> > Hi Dongjin,
>> >
>> > One of the complications is that old versions of the API will not
>> expect a
>> > new error code. However, since we expect this to be a fatal error anyway
>> > for old clients, it may still be more useful to return the correct error
>> > code. For example, the Kafka clients use the following code to convert
>> the
>> > error code:
>> >
>> >     public static Errors forCode(short code) {
>> >         Errors error = codeToError.get(code);
>> >         if (error != null) {
>> >             return error;
>> >         } else {
>> >             log.warn("Unexpected error code: {}.", code);
>> >             return UNKNOWN_SERVER_ERROR;
>> >         }
>> >     }
>> >
>> > If we return an unsupported error code, it will be converted to an
>> UNKNOWN
>> > error, but at least we will get the message in the log with the correct
>> > code. That seems preferable to returning a misleading error code. So I
>> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even for
>> > older versions.
>> >
>> > Also, one question just to check my understanding. I think we would only
>> > use this error code when we /know/ that zstd was in use and the client
>> > doesn't support it? This is true if either 1) the message needs
>> > down-conversion and we encounter a zstd compressed message, or 2) if the
>> > topic is explicitly configured to use zstd. However, if the compression
>> > type is set to "producer," then the fetched data may or may not be
>> > compressed with zstd. In this case, we return the data to the client and
>> > expect it to fail parsing. Is that correct?
>> >
>> > Thanks,
>> > Jason
>> >
>> >
>> >
>> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org>
>> wrote:
>> >
>> > > Ismael, Jason and all,
>> > >
>> > > I rewrote the backward compatibility strategy & its alternatives as
>> > > follows, based on Ismael's & Jason's comments. Since it is not yet updated
>> > > in the wiki, don't hesitate to send me a message if you have any opinion on it.
>> > >
>> > > ```
>> > > *Backward Compatibility*
>> > >
>> > > We need to establish some backward-compatibility strategy for the case
>> > > where an old client subscribes to a topic using ZStandard implicitly (i.e.,
>> > > the 'compression.type' configuration of the given topic is 'producer' and
>> > > the producer compressed the records with ZStandard). We have the following
>> > > options for this situation:
>> > >
>> > > *A. Support ZStandard for the old clients, which can understand v0, v1
>> > > messages only.*
>> > >
>> > > This strategy necessarily requires the down-conversion of v2 messages
>> > > compressed with ZStandard into v0 or v1 messages, which means considerable
>> > > performance degradation. So we rejected this strategy.
>> > >
>> > > *B. Bump the API version and support only v2-available clients*
>> > >
>> > > With this approach, we can message the old clients that they are old
>> and
>> > > should be upgraded. However, there are still several options for the
>> > Error
>> > > code.
>> > >
>> > > *B.1. INVALID_REQUEST (42)*
>> > >
>> > > This option gives the client very little information; the user can be
>> > > confused about why a client that worked correctly in the past suddenly
>> > > encounters a problem. So we rejected this strategy.
>> > >
>> > > *B.2. CORRUPT_MESSAGE (2)*
>> > >
>> > > This option gives inaccurate information; the user can be surprised
>> and
>> > > misunderstand that the log files are broken in some way. So we
>> rejected
>> > > this strategy.
>> > >
>> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
>> > >
>> > > The advantage of this approach is that we don't need to define a new error
>> > > code; we can simply reuse the existing one.
>> > >
>> > > The disadvantage is that it is also a little bit vague; this error code was
>> > > defined as part of the KIP-98 work[^1] and is currently returned for
>> > > transaction-related errors.
>> > >
>> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
>> > >
>> > > The advantage of this approach is that it is clear and provides an exact
>> > > description. The disadvantage is that we need to add a new error code.
>> > > ```
>> > >
>> > > *It seems like the choice is now quite clear:
>> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
>> > > (B.4).*
>> > > The first one doesn't need a new error code, but the latter is more
>> > > explicit. Which one do you prefer? Since all of you have much more
>> > > experience and knowledge than me, I will follow your decision. The wiki
>> > > page will also be updated following the decision.
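
As a concrete illustration of the difference, here is a self-contained sketch, not Kafka's actual Errors class; the numeric value and the description of the new code are placeholders.

```
// B.3 reuses an existing, somewhat vague code; B.4 adds a dedicated one.
enum ProposedErrorChoice {
    UNSUPPORTED_FOR_MESSAGE_FORMAT((short) 43,
            "The message format version on the broker does not support the request."),
    UNSUPPORTED_COMPRESSION_TYPE((short) 76,            // placeholder value, not decided
            "The requesting client does not support the compression type of the partition.");

    final short code;
    final String description;

    ProposedErrorChoice(short code, String description) {
        this.code = code;
        this.description = description;
    }
}
```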
>> > >
>> > > Best,
>> > > Dongjin
>> > >
>> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
>> > >
>> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com>
>> wrote:
>> > >
>> > > > Sounds reasonable to me.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io>
>> > wrote:
>> > > >
>> > > > > Hey Ismael,
>> > > > >
>> > > > > Your summary looks good to me. I think it might also be a good
>> idea
>> > to
>> > > > add
>> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
>> > > > version
>> > > > > bumps. We won't be able to use it for old api versions since the
>> > > clients
>> > > > > will not understand it, but we can use it going forward so that
>> we're
>> > > not
>> > > > > stuck in a similar situation with a new message format and a new
>> > codec
>> > > to
>> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT,
>> but
>> > > it
>> > > > is
>> > > > > not as explicit.
>> > > > >
>> > > > > -Jason
>> > > > >
>> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk>
>> > > wrote:
>> > > > >
>> > > > > > Hi Dongjin and Jason,
>> > > > > >
>> > > > > > I would agree. My summary:
>> > > > > >
>> > > > > > 1. Support zstd with message format 2 only.
>> > > > > > 2. Bump produce and fetch request versions.
>> > > > > > 3. Provide broker errors whenever possible based on the request version,
>> > > > > > and rely on clients for the cases where the broker can't validate
>> > > > > > efficiently (for example, a message format 2 consumer that supports the
>> > > > > > latest fetch version but doesn't support zstd).
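
For context, once this lands, opting in on the producer side would presumably look like the following; the codec name "zstd" is what this KIP proposes, while everything else is the existing producer configuration. Topics could alternatively be configured with compression.type=zstd on the broker side, and consumers would need to be upgraded first, as discussed in this thread.

```
// Sketch: a producer opting in to the proposed codec.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ZstdProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd"); // value proposed by KIP-110
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}
```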
>> > > > > >
>> > > > > > If there's general agreement on this, I suggest we update the
>> KIP
>> > to
>> > > > > state
>> > > > > > the proposal and to move the rejected options to its own
>> section.
>> > And
>> > > > > then
>> > > > > > start a vote!
>> > > > > >
>> > > > > > Ismael
>> > > > > >
>> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
>> > jason@confluent.io>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Dongjin,
>> > > > > > >
>> > > > > > > Yes, that's a good summary. For clients which support v2, the
>> > > client
>> > > > > can
>> > > > > > > parse the message format and hopefully raise a useful error
>> > message
>> > > > > > > indicating the unsupported compression type. For older
>> clients,
>> > our
>> > > > > > options
>> > > > > > > are probably (1) to down-convert to the old format using no
>> > > > compression
>> > > > > > > type, or (2) to return an error code. I'm leaning toward the
>> > latter
>> > > > as
>> > > > > > the
>> > > > > > > simpler solution, but the challenge is finding a good error
>> code.
>> > > Two
>> > > > > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The
>> > > > downside
>> > > > > > is
>> > > > > > > that old clients probably won't get a helpful message.
>> However,
>> > at
>> > > > > least
>> > > > > > > the behavior will be consistent in the sense that all clients
>> > will
>> > > > fail
>> > > > > > if
>> > > > > > > they do not support zstandard.
>> > > > > > >
>> > > > > > > What do you think?
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Jason
>> > > > > > >
>> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
>> dongjin@apache.org
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Thanks Jason, I reviewed the down-converting logic following
>> > your
>> > > > > > > > explanation.[^1] You mean the following routines, right?
>> > > > > > > >
>> > > > > > > > - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
>> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
>> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
>> > > > > > > >
>> > > > > > > > It seems like your stance is as follows:
>> > > > > > > >
>> > > > > > > > 1. In principle, Kafka does not change the compression codec when
>> > > > > > > > down-converting, since that requires inspecting the fetched data, which is
>> > > > > > > > expensive.
>> > > > > > > > 2. However, there are some cases where the fetched data is inspected anyway.
>> > > > > > > > In those cases, we can provide compression conversion from ZStandard to the
>> > > > > > > > classical codecs[^2].
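
As a codec-level illustration of point 2, here is a small self-contained sketch of the idea, under the assumption that the broker already has to unpack and rebuild the batches; it uses zstd-jni and the JDK gzip classes and is not the broker's down-conversion code.

```
import com.github.luben.zstd.Zstd;                      // zstd-jni, the binding discussed in this thread
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class RecompressSketch {
    public static void main(String[] args) throws IOException {
        byte[] original = "some record data".getBytes(StandardCharsets.UTF_8);

        // What a zstd-producing client would have written.
        byte[] zstdCompressed = Zstd.compress(original);

        // If the payload has to be unpacked anyway, it can be re-compressed
        // with a codec that a pre-zstd client understands (gzip here).
        byte[] decompressed = Zstd.decompress(zstdCompressed, original.length);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
            gzip.write(decompressed);
        }
        byte[] gzipCompressed = bos.toByteArray();

        System.out.printf("zstd: %d bytes, gzip: %d bytes%n",
                zstdCompressed.length, gzipCompressed.length);
    }
}
```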
>> > > > > > > >
>> > > > > > > > And from what I understand, the cases where a client without ZStandard
>> > > > > > > > support receives ZStandard-compressed records can be organized into two:
>> > > > > > > >
>> > > > > > > > a. The 'compression.type' configuration of the given topic is 'producer' and
>> > > > > > > > the producer compressed the records with ZStandard (that is, using
>> > > > > > > > ZStandard implicitly).
>> > > > > > > > b. The 'compression.type' configuration of the given topic is 'zstd'; that
>> > > > > > > > is, using ZStandard explicitly.
>> > > > > > > >
>> > > > > > > > As you stated, we don't have to handle case b specially. So, it seems like
>> > > > > > > > we can narrow the focus of the problem by combining case 1 and case b as
>> > > > > > > > follows:
>> > > > > > > >
>> > > > > > > > > Given a topic with 'producer' as its 'compression.type' configuration,
>> > > > > > > > ZStandard-compressed records, and an old client without ZStandard support,
>> > > > > > > > is there any case where we need to inspect the records and can change the
>> > > > > > > > compression type? If so, can we provide compression type conversion?
>> > > > > > > >
>> > > > > > > > Do I understand correctly?
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Dongjin
>> > > > > > > >
>> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
>> > misunderstanding
>> > > > how
>> > > > > > API
>> > > > > > > > version works, after reviewing the downconvert logic & the
>> > > protocol
>> > > > > > > > documentation <https://kafka.apache.org/protocol>.
>> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
>> > > > > > > >
>> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
>> > > > jason@confluent.io>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > But in my opinion, since the client will fail with the
>> API
>> > > > > version,
>> > > > > > > so
>> > > > > > > > we
>> > > > > > > > > > don't need to down-convert the messages anyway. Isn't
>> it?
>> > > So, I
>> > > > > > think
>> > > > > > > > we
>> > > > > > > > > > don't care about this case. (I'm sorry, I am not
>> familiar
>> > > with
>> > > > > > > > > down-convert
>> > > > > > > > > > logic.)
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > Currently the broker down-converts automatically when it
>> > > receives
>> > > > > an
>> > > > > > > old
>> > > > > > > > > version of the fetch request (a version which is known to
>> > > predate
>> > > > > the
>> > > > > > > > > message format in use). Typically when down-converting the
>> > > > message
>> > > > > > > > format,
>> > > > > > > > > we use the same compression type, but there is not much
>> point
>> > > in
>> > > > > > doing
>> > > > > > > so
>> > > > > > > > > when we know the client doesn't support it. So if
>> zstandard
>> > is
>> > > in
>> > > > > > use,
>> > > > > > > > and
>> > > > > > > > > we have to down-convert anyway, then we can choose to use
>> a
>> > > > > different
>> > > > > > > > > compression type or no compression type.
>> > > > > > > > >
>> > > > > > > > > From my perspective, there is no significant downside to
>> > > bumping
>> > > > > the
>> > > > > > > > > protocol version and it has several potential benefits.
>> > Version
>> > > > > bumps
>> > > > > > > are
>> > > > > > > > > cheap. The main question mark in my mind is about
>> > > > down-conversion.
>> > > > > > > > Figuring
>> > > > > > > > > out whether down-conversion is needed is hard generally
>> > without
>> > > > > > > > inspecting
>> > > > > > > > > the fetched data, which is expensive. I think we agree in
>> > > > principle
>> > > > > > > that
>> > > > > > > > we
>> > > > > > > > > do not want to have to pay this cost generally and prefer
>> the
>> > > > > clients
>> > > > > > > to
>> > > > > > > > > fail when they see an unhandled compression type. The
>> point I
>> > > was
>> > > > > > > making
>> > > > > > > > is
>> > > > > > > > > that there are some cases where we are either inspecting
>> the
>> > > data
>> > > > > > > anyway
>> > > > > > > > > (because we have to down-convert the message format), or
>> we
>> > > have
>> > > > an
>> > > > > > > easy
>> > > > > > > > > way to tell whether zstandard is in use (the topic has it
>> > > > > configured
>> > > > > > > > > explicitly). In the latter case, we don't have to handle
>> it
>> > > > > > specially.
>> > > > > > > > But
>> > > > > > > > > we do have to decide how we will handle down-conversion to
>> > > older
>> > > > > > > formats.
>> > > > > > > > >
>> > > > > > > > > -Jason
>> > > > > > > > >
>> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
>> > > dongjin@apache.org
>> > > > >
>> > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Colin and Jason,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for your opinions. In summary, the pros and cons of bumping the
>> > > > > > > > > > fetch API version are:
>> > > > > > > > > >
>> > > > > > > > > > Cons:
>> > > > > > > > > >
>> > > > > > > > > > - The broker can't know whether a given message batch is compressed with
>> > > > > > > > > > zstd or not.
>> > > > > > > > > > - It needs some additional logic for topics explicitly configured to use
>> > > > > > > > > > zstd.
>> > > > > > > > > >
>> > > > > > > > > > Pros:
>> > > > > > > > > >
>> > > > > > > > > > - The broker doesn't need to conduct expensive down-conversion.
>> > > > > > > > > > - It can signal to users that they should update their clients.
>> > > > > > > > > >
>> > > > > > > > > > So, the opinions on the backward-compatibility policy so far:
>> > > > > > > > > >
>> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
>> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
>> > > > > > > > > >
>> > > > > > > > > > Here are my additional comments:
>> > > > > > > > > >
>> > > > > > > > > > @Colin
>> > > > > > > > > >
>> > > > > > > > > > I greatly appreciate your response. In the case of the dictionary
>> > > > > > > > > > support, of course, this issue should be addressed later, so we don't
>> > > > > > > > > > need it in the first version. You are right - it is not too late to try
>> > > > > > > > > > it after some benchmarks. What I mean is, we should keep that potential
>> > > > > > > > > > feature in mind.
>> > > > > > > > > >
>> > > > > > > > > > @Jason
>> > > > > > > > > >
>> > > > > > > > > > You wrote,
>> > > > > > > > > >
>> > > > > > > > > > > Similarly, if we have to down-convert anyway because
>> the
>> > > > client
>> > > > > > > does
>> > > > > > > > > not
>> > > > > > > > > > understand the message format, then we could also use a
>> > > > different
>> > > > > > > > > > compression type.
>> > > > > > > > > >
>> > > > > > > > > > But in my opinion, since the client will fail on the API version check,
>> > > > > > > > > > we don't need to down-convert the messages anyway. Isn't that right? So,
>> > > > > > > > > > I think we don't need to care about this case. (I'm sorry, I am not that
>> > > > > > > > > > familiar with the down-conversion logic.)
>> > > > > > > > > >
>> > > > > > > > > > Please give more opinions. Thanks!
>> > > > > > > > > >
>> > > > > > > > > > - Dongjin
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
>> > > > > jason@confluent.io
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hey Colin,
>> > > > > > > > > > >
>> > > > > > > > > > > The problem for the fetch API is that the broker does
>> not
>> > > > > > generally
>> > > > > > > > > know
>> > > > > > > > > > if
>> > > > > > > > > > > a batch was compressed with zstd unless it parses it.
>> I
>> > > think
>> > > > > the
>> > > > > > > > goal
>> > > > > > > > > > here
>> > > > > > > > > > > is to avoid the expensive down-conversion that is
>> needed
>> > to
>> > > > > > ensure
>> > > > > > > > > > > compatibility because it is only necessary if zstd is
>> > > > actually
>> > > > > in
>> > > > > > > > use.
>> > > > > > > > > > But
>> > > > > > > > > > > as long as old clients can parse the message format,
>> they
>> > > > > should
>> > > > > > > get
>> > > > > > > > a
>> > > > > > > > > > > reasonable error if they see an unsupported
>> compression
>> > > type
>> > > > in
>> > > > > > the
>> > > > > > > > > > > attributes. Basically the onus is on users to ensure
>> that
>> > > > their
>> > > > > > > > > consumers
>> > > > > > > > > > > have been updated prior to using zstd. It seems like a
>> > > > > reasonable
>> > > > > > > > > > tradeoff
>> > > > > > > > > > > to me. There are a couple cases that might be worth
>> > > thinking
>> > > > > > > through:
>> > > > > > > > > > >
>> > > > > > > > > > > 1. If a topic is explicitly configured to use zstd,
>> then
>> > we
>> > > > > don't
>> > > > > > > > need
>> > > > > > > > > to
>> > > > > > > > > > > check the fetched data for the compression type to
>> know
>> > if
>> > > we
>> > > > > > need
>> > > > > > > > > > > down-conversion. If we did bump the Fetch API version,
>> > then
>> > > > we
>> > > > > > > could
>> > > > > > > > > > handle
>> > > > > > > > > > > this case by either down-converting using a different
>> > > > > compression
>> > > > > > > > type
>> > > > > > > > > or
>> > > > > > > > > > > returning an error.
>> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway
>> because
>> > the
>> > > > > > client
>> > > > > > > > does
>> > > > > > > > > > not
>> > > > > > > > > > > understand the message format, then we could also use
>> a
>> > > > > different
>> > > > > > > > > > > compression type.
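
For reference, the "reasonable error" for format-v2-capable clients mentioned above comes from the attributes field of the record batch, where the compression codec occupies the low three bits; a client that parses the batch but does not recognize the codec id can fail with a clear message. Below is a small self-contained sketch; the helper itself is hypothetical, and codec id 4 for zstd is assumed to be the next free id, which is what this KIP would assign.

```
// Sketch: extracting the compression codec from a record batch's attributes.
public class BatchAttributesSketch {
    static final int COMPRESSION_CODEC_MASK = 0x07;   // low 3 bits of the attributes field

    static String codecName(short attributes) {
        switch (attributes & COMPRESSION_CODEC_MASK) {
            case 0: return "none";
            case 1: return "gzip";
            case 2: return "snappy";
            case 3: return "lz4";
            case 4: return "zstd";                     // id assumed for KIP-110
            default: throw new IllegalArgumentException(
                    "Unknown compression codec id in batch attributes");
        }
    }

    public static void main(String[] args) {
        short attributes = 0x04;                       // a zstd-compressed batch
        System.out.println(codecName(attributes));     // an old client would not know id 4
    }
}
```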
>> > > > > > > > > > >
>> > > > > > > > > > > For the produce API, I think it's reasonable to bump
>> the
>> > > api
>> > > > > > > version.
>> > > > > > > > > > This
>> > > > > > > > > > > can be used by clients to check whether a broker
>> supports
>> > > > zstd.
>> > > > > > For
>> > > > > > > > > > > example, we might support a list of preferred
>> compression
>> > > > types
>> > > > > > in
>> > > > > > > > the
>> > > > > > > > > > > producer and we could use the broker to detect which
>> > > version
>> > > > to
>> > > > > > > use.
>> > > > > > > > > > >
>> > > > > > > > > > > -Jason
>> > > > > > > > > > >
>> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
>> > > > > cmccabe@apache.org
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
>> > > > compression
>> > > > > > > codec
>> > > > > > > > > > and I
>> > > > > > > > > > > > hope we can get this support in soon!
>> > > > > > > > > > > >
>> > > > > > > > > > > > I would say we can just bump the API version to
>> > indicate
>> > > > that
>> > > > > > > ZStd
>> > > > > > > > > > > support
>> > > > > > > > > > > > is expected in new clients.  We probably need some
>> way
>> > of
>> > > > > > > > indicating
>> > > > > > > > > to
>> > > > > > > > > > > the
>> > > > > > > > > > > > older clients that they can't consume the
>> partitions,
>> > as
>> > > > > well.
>> > > > > > > > > Perhaps
>> > > > > > > > > > > we
>> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
>> > > > > > > > > > > >
>> > > > > > > > > > > > The license thing seems straightforward -- it's
>> just a
>> > > > matter
>> > > > > > of
>> > > > > > > > > adding
>> > > > > > > > > > > > the text to the right files as per ASF guidelines.
>> > > > > > > > > > > >
>> > > > > > > > > > > > With regard to the dictionary support, do we really
>> > need
>> > > > that
>> > > > > > in
>> > > > > > > > the
>> > > > > > > > > > > first
>> > > > > > > > > > > > version?  Hopefully message batches are big enough
>> that
>> > > > this
>> > > > > > > isn't
>> > > > > > > > > > > needed.
>> > > > > > > > > > > > Some benchmarks might help here.
>> > > > > > > > > > > >
>> > > > > > > > > > > > best,
>> > > > > > > > > > > > Colin
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
>> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression>.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > For newcomers, here is a summary of the history: KIP-110 was originally worked on for the issue KAFKA-4514, but it lacked benchmark results to get the agreement of the community. Later, Ivan Babrou and some other users who adopted the patch provided their excellent performance report, which is now included in the KIP, but it was postponed again because the community was busy with the 2.0.0 release. That is why I now reboot this issue.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > The following is the current status of the feature: you can check the current draft implementation here <https://github.com/apache/kafka/pull/2267>. It is based on zstd 1.3.5 and periodically rebased onto the latest trunk[^1].
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > The issues that should be addressed are as follows:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *1. Backward Compatibility*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > To support old consumers, we need to decide on a strategy for handling them. Current candidates are:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > - Bump API version
>> > > > > > > > > > > > > - Leave unchanged: let the old clients fail.
>> > > > > > > > > > > > > - Improve the error messages.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *2. Dictionary Support*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > To support zstd's dictionary feature in the future (if needed), we need to sketch how it should work and leave some room for it. As of now, there has been no discussion on this topic yet.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *3. License*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > To use this feature, we need to add the licenses of zstd and zstd-jni to the project. (Thanks to Viktor Somogyi for raising this issue!) It seems like what Apache Spark did would be a good example, but there has been no discussion yet.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > You can find the details of the above issues in the KIP document. Please have a look when you are free, and give me feedback. All kinds of participation are welcome.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > Dongjin
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
>> > > > > > > dongjin@apache.org
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > > Sorry for the late reply.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > In short, I could not submit the updated KIP by
>> the
>> > > > > feature
>> > > > > > > > > freeze
>> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it will not
>> be
>> > > > > included
>> > > > > > > in
>> > > > > > > > > the
>> > > > > > > > > > > > 2.0.0
>> > > > > > > > > > > > > > release and all discussion for this issue were
>> > > > postponed
>> > > > > > > after
>> > > > > > > > > the
>> > > > > > > > > > > > release
>> > > > > > > > > > > > > > of 2.0.0.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > I have been updating the PR following recent
>> > updates.
>> > > > > Just
>> > > > > > > > now, I
>> > > > > > > > > > > > rebased
>> > > > > > > > > > > > > > it against the latest trunk and updated the zstd
>> > > > version
>> > > > > > into
>> > > > > > > > > > 1.3.5.
>> > > > > > > > > > > > If you
>> > > > > > > > > > > > > > need some request, don't hesitate to notify me.
>> > (But
>> > > > not
>> > > > > > this
>> > > > > > > > > > thread
>> > > > > > > > > > > -
>> > > > > > > > > > > > just
>> > > > > > > > > > > > > > send me the message directly.)
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Best,
>> > > > > > > > > > > > > > Dongjin
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
>> > > > > > > bobby@apache.org
>> > > > > > > > >
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >> I there any update on this.  The performance
>> > > > > improvements
>> > > > > > > are
>> > > > > > > > > > quite
>> > > > > > > > > > > > > >> impressive and I really would like to stop
>> forking
>> > > > kafka
>> > > > > > > just
>> > > > > > > > to
>> > > > > > > > > > get
>> > > > > > > > > > > > this
>> > > > > > > > > > > > > >> in.
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> Thanks,
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> Bobby
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
>> > > > > > > > dongjin@apache.org
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> > Ismael,
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working frenzy
>> > for
>> > > > 2.0!
>> > > > > > No
>> > > > > > > > > > problem,
>> > > > > > > > > > > > take
>> > > > > > > > > > > > > >> > your time. I am also working at another issue
>> > now.
>> > > > > Thank
>> > > > > > > you
>> > > > > > > > > for
>> > > > > > > > > > > > > >> letting me
>> > > > > > > > > > > > > >> > know.
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> > Best,
>> > > > > > > > > > > > > >> > Dongjin
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
>> > > > > > > > ismaelj@gmail.com
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is
>> busy
>> > > > > > finalising
>> > > > > > > > > > 2.0.0.
>> > > > > > > > > > > > This
>> > > > > > > > > > > > > >> KIP
>> > > > > > > > > > > > > >> > > seems like a great candidate for 2.1.0 and
>> > > > hopefully
>> > > > > > > there
>> > > > > > > > > > will
>> > > > > > > > > > > be
>> > > > > > > > > > > > > >> more
>> > > > > > > > > > > > > >> > of
>> > > > > > > > > > > > > >> > > a discussion next week. :)
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> > > Ismael
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
>> > > > > > > > dongjin@apache.org
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
>> > implementation:
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit
>> 5145d6b)
>> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > You can check out the implementation from
>> > here
>> > > > > > > > > > > > > >> > > > <
>> https://github.com/apache/kafka/pull/2267
>> > >.
>> > > If
>> > > > > you
>> > > > > > > > > > > experience
>> > > > > > > > > > > > any
>> > > > > > > > > > > > > >> > > problem
>> > > > > > > > > > > > > >> > > > running it, don't hesitate to give me a
>> > > mention.
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > Best,
>> > > > > > > > > > > > > >> > > > Dongjin
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin
>> Lee
>> > <
>> > > > > > > > > > > dongjin@apache.org
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >> > wrote:
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > > Here is the short conclusion about the
>> > > license
>> > > > > > > > problem:
>> > > > > > > > > > *We
>> > > > > > > > > > > > can
>> > > > > > > > > > > > > >> use
>> > > > > > > > > > > > > >> > > zstd
>> > > > > > > > > > > > > >> > > > > and zstd-jni without any problem, but
>> we
>> > > need
>> > > > to
>> > > > > > > > include
>> > > > > > > > > > > their
>> > > > > > > > > > > > > >> > license,
>> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause
>> > > > License
>> > > > > > > > requires
>> > > > > > > > > > to
>> > > > > > > > > > > > > >> include
>> > > > > > > > > > > > > >> > the
>> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause License
>> > > > requires
>> > > > > > that
>> > > > > > > > the
>> > > > > > > > > > > name
>> > > > > > > > > > > > of
>> > > > > > > > > > > > > >> the
>> > > > > > > > > > > > > >> > > > > contributor can't be used to endorse or
>> > > > promote
>> > > > > > the
>> > > > > > > > > > product.
>> > > > > > > > > > > > > >> That's
>> > > > > > > > > > > > > >> > it
>> > > > > > > > > > > > > >> > > > > <
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >>
>> > http://www.mikestratton.net/2011/12/is-bsd-license-
>> > > > > > > > > > > > compatible-with-apache-2-0-license/
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > - They are not listed in the list of
>> > > > prohibited
>> > > > > > > > licenses
>> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
>> > > > > > > > resolved.html#category-x>
>> > > > > > > > > > > also.
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
>> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
>> > > > > > jira/browse/SPARK-19112
>> > > > > > > >:
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > - They made a directory dedicated to
>> the
>> > > > > > dependency
>> > > > > > > > > > license
>> > > > > > > > > > > > files
>> > > > > > > > > > > > > >> > > > > <
>> > > > > > > https://github.com/apache/spark/tree/master/licenses
>> > > > > > > > >
>> > > > > > > > > > and
>> > > > > > > > > > > > added
>> > > > > > > > > > > > > >> > > > licenses
>> > > > > > > > > > > > > >> > > > > for Zstd
>> > > > > > > > > > > > > >> > > > > <
>> > > > > > > > > > > > > >> >
>> > > > https://github.com/apache/spark/blob/master/licenses/
>> > > > > > > > > > > > LICENSE-zstd.txt
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > &
>> > > > > > > > > > > > > >> > > > > Zstd-jni
>> > > > > > > > > > > > > >> > > > > <
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> https://github.com/apache/
>> > > spark/blob/master/licenses/
>> > > > > > > > > > > > LICENSE-zstd-jni.txt
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >> > > > > .
>> > > > > > > > > > > > > >> > > > > - Added a link to the original license
>> > files
>> > > > in
>> > > > > > > > LICENSE.
>> > > > > > > > > > > > > >> > > > > <
>> > > > > https://github.com/apache/spark/pull/18805/files
>> > > > > > >
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > If needed, I can make a similar update.
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > Thanks for pointing out this problem,
>> > > Viktor!
>> > > > > Nice
>> > > > > > > > > catch!
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > Best,
>> > > > > > > > > > > > > >> > > > > Dongjin
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM
>> Dongjin
>> > > Lee <
>> > > > > > > > > > > > dongjin@apache.org>
>> > > > > > > > > > > > > >> > > wrote:
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > >> I greatly appreciate your
>> comprehensive
>> > > > > > reasoning.
>> > > > > > > > so:
>> > > > > > > > > +1
>> > > > > > > > > > > > for b
>> > > > > > > > > > > > > >> > until
>> > > > > > > > > > > > > >> > > > now.
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >> For the license issues, I will have a
>> > check
>> > > > on
>> > > > > > how
>> > > > > > > > the
>> > > > > > > > > > over
>> > > > > > > > > > > > > >> projects
>> > > > > > > > > > > > > >> > > are
>> > > > > > > > > > > > > >> > > > >> doing and share the results.
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >> Best,
>> > > > > > > > > > > > > >> > > > >> Dongjin
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM
>> Viktor
>> > > > > Somogyi <
>> > > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
>> > > > > > > > > > > > > >> > > > >> wrote:
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >>> Hi Dongjin,
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>> A couple of comments:
>> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in the
>> > > "backward
>> > > > > > > > > > compatibility"
>> > > > > > > > > > > > > >> section.
>> > > > > > > > > > > > > >> > > My
>> > > > > > > > > > > > > >> > > > >>> reasoning for this is that users
>> > upgrading
>> > > > to
>> > > > > a
>> > > > > > > zstd
>> > > > > > > > > > > > compatible
>> > > > > > > > > > > > > >> > > version
>> > > > > > > > > > > > > >> > > > >>> won't start to use it automatically,
>> so
>> > > > manual
>> > > > > > > > > > > > reconfiguration
>> > > > > > > > > > > > > >> is
>> > > > > > > > > > > > > >> > > > >>> required.
>> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up
>> the
>> > > > > cluster.
>> > > > > > If
>> > > > > > > > not
>> > > > > > > > > > all
>> > > > > > > > > > > > the
>> > > > > > > > > > > > > >> > > clients
>> > > > > > > > > > > > > >> > > > >>> are
>> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them and
>> > they'd
>> > > > > start
>> > > > > > to
>> > > > > > > > use
>> > > > > > > > > > > zstd
>> > > > > > > > > > > > > >> then it
>> > > > > > > > > > > > > >> > > > would
>> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd
>> like to
>> > > > > presume
>> > > > > > > > > though
>> > > > > > > > > > > that
>> > > > > > > > > > > > > >> this
>> > > > > > > > > > > > > >> > is
>> > > > > > > > > > > > > >> > > a
>> > > > > > > > > > > > > >> > > > >>> very
>> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody
>> should
>> > be
>> > > > > > > surprised
>> > > > > > > > if
>> > > > > > > > > > it
>> > > > > > > > > > > > didn't
>> > > > > > > > > > > > > >> > > work.
>> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we
>> > should
>> > > > bump
>> > > > > > the
>> > > > > > > > > fetch
>> > > > > > > > > > > and
>> > > > > > > > > > > > > >> > produce
>> > > > > > > > > > > > > >> > > > >>> requests if it's a change in the
>> message
>> > > > > format.
>> > > > > > > > > > Moreover
>> > > > > > > > > > > if
>> > > > > > > > > > > > > >> some
>> > > > > > > > > > > > > >> > of
>> > > > > > > > > > > > > >> > > > the
>> > > > > > > > > > > > > >> > > > >>> producers and the brokers are
>> upgraded
>> > but
>> > > > > some
>> > > > > > of
>> > > > > > > > the
>> > > > > > > > > > > > consumers
>> > > > > > > > > > > > > >> > are
>> > > > > > > > > > > > > >> > > > not,
>> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the error
>> when
>> > > the
>> > > > > old
>> > > > > > > > > consumer
>> > > > > > > > > > > > tries
>> > > > > > > > > > > > > >> to
>> > > > > > > > > > > > > >> > > > consume
>> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
>> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I
>> think
>> > > > binding
>> > > > > > the
>> > > > > > > > > > > > compression
>> > > > > > > > > > > > > >> type
>> > > > > > > > > > > > > >> > > to
>> > > > > > > > > > > > > >> > > > an
>> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the
>> > developer's
>> > > > > > > > > perspective.
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the
>> existing
>> > > > > binding,
>> > > > > > > > > however
>> > > > > > > > > > > we
>> > > > > > > > > > > > must
>> > > > > > > > > > > > > >> > > > respect
>> > > > > > > > > > > > > >> > > > >>> the licenses:
>> > > > > > > > > > > > > >> > > > >>> "The code for these JNI bindings is
>> > > licenced
>> > > > > > under
>> > > > > > > > > > > 2-clause
>> > > > > > > > > > > > BSD
>> > > > > > > > > > > > > >> > > > license.
>> > > > > > > > > > > > > >> > > > >>> The native Zstd library is licensed
>> > under
>> > > > > > 3-clause
>> > > > > > > > BSD
>> > > > > > > > > > > > license
>> > > > > > > > > > > > > >> and
>> > > > > > > > > > > > > >> > > > GPL2"
>> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page
>> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
>> > > > > > > > resolved.html#category-a
>> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD
>> licenses
>> > > but
>> > > > > the
>> > > > > > > > Apache
>> > > > > > > > > > > > license
>> > > > > > > > > > > > > >> is
>> > > > > > > > > > > > > >> > not
>> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that
>> > the
>> > > > > > > "3-clause
>> > > > > > > > > BSD
>> > > > > > > > > > > > license
>> > > > > > > > > > > > > >> and
>> > > > > > > > > > > > > >> > > > GPL2"
>> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR in
>> this
>> > > case,
>> > > > > but
>> > > > > > > I'm
>> > > > > > > > > no
>> > > > > > > > > > > > lawyer,
>> > > > > > > > > > > > > >> > just
>> > > > > > > > > > > > > >> > > > >>> wanted
>> > > > > > > > > > > > > >> > > > >>> to make the point that we should
>> watch
>> > out
>> > > > for
>> > > > > > > > > licenses.
>> > > > > > > > > > > :)
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>> Regards,
>> > > > > > > > > > > > > >> > > > >>> Viktor
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan
>> > > Babrou
>> > > > <
>> > > > > > > > > > > > ibobrik@gmail.com>
>> > > > > > > > > > > > > >> > > wrote:
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >>> > Hello,
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very much
>> > > support
>> > > > > the
>> > > > > > > > fact
>> > > > > > > > > > that
>> > > > > > > > > > > > zstd
>> > > > > > > > > > > > > >> > > > >>> compression
>> > > > > > > > > > > > > >> > > > >>> > should be included out of the box.
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>> > Please think about the environment,
>> > you
>> > > > can
>> > > > > > save
>> > > > > > > > > > quite a
>> > > > > > > > > > > > lot
>> > > > > > > > > > > > > >> of
>> > > > > > > > > > > > > >> > > > >>> hardware
>> > > > > > > > > > > > > >> > > > >>> > with it.
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>> > Thank you.
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14
>> Dongjin
>> > > Lee <
>> > > > > > > > > > > > dongjin@apache.org>
>> > > > > > > > > > > > > >> > > wrote:
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>> > > Since there are no responses for
>> a
>> > > > week, I
>> > > > > > > > decided
>> > > > > > > > > > to
>> > > > > > > > > > > > > >> > reinitiate
>> > > > > > > > > > > > > >> > > > the
>> > > > > > > > > > > > > >> > > > >>> > > discussion thread.
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >>
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > This KIP is about to introduce
>> > > ZStandard
>> > > > > > > > > Compression
>> > > > > > > > > > > > into
>> > > > > > > > > > > > > >> > Apache
>> > > > > > > > > > > > > >> > > > >>> Kafka.
>> > > > > > > > > > > > > >> > > > >>> > > The reason why it is posted again
>> > has
>> > > a
>> > > > > > story:
>> > > > > > > > It
>> > > > > > > > > > was
>> > > > > > > > > > > > > >> > originally
>> > > > > > > > > > > > > >> > > > >>> posted
>> > > > > > > > > > > > > >> > > > >>> > to
>> > > > > > > > > > > > > >> > > > >>> > > the dev mailing list more than
>> one
>> > > year
>> > > > > ago
>> > > > > > > but
>> > > > > > > > > > since
>> > > > > > > > > > > it
>> > > > > > > > > > > > > >> has no
>> > > > > > > > > > > > > >> > > > >>> > performance
>> > > > > > > > > > > > > >> > > > >>> > > report included, it was postponed
>> > > later.
>> > > > > But
>> > > > > > > > Some
>> > > > > > > > > > > people
>> > > > > > > > > > > > > >> > > (including
>> > > > > > > > > > > > > >> > > > >>> Ivan)
>> > > > > > > > > > > > > >> > > > >>> > > reported excellent performance
>> > report
>> > > > with
>> > > > > > the
>> > > > > > > > > draft
>> > > > > > > > > > > PR,
>> > > > > > > > > > > > > >> this
>> > > > > > > > > > > > > >> > > work
>> > > > > > > > > > > > > >> > > > >>> is now
>> > > > > > > > > > > > > >> > > > >>> > > reactivated.
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > The updated KIP document includes
>> > some
>> > > > > > > expected
>> > > > > > > > > > > > problems and
>> > > > > > > > > > > > > >> > > their
>> > > > > > > > > > > > > >> > > > >>> > > candidate alternatives. Please
>> have
>> > a
>> > > > look
>> > > > > > > when
>> > > > > > > > > you
>> > > > > > > > > > > are
>> > > > > > > > > > > > > >> free,
>> > > > > > > > > > > > > >> > and
>> > > > > > > > > > > > > >> > > > >>> give
>> > > > > > > > > > > > > >> > > > >>> > me a
>> > > > > > > > > > > > > >> > > > >>> > > feedback. All kinds of
>> participating
>> > > are
>> > > > > > > > welcome.
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > Best,
>> > > > > > > > > > > > > >> > > > >>> > > Dongjin
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > --
>> > > > > > > > > > > > > >> > > > >>> > > *Dongjin Lee*
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > *A hitchhiker in the mathematical
>> > > > world.*
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> > > *github:  <
>> http://goog_969573159/
>> > > > >github
>> > > > > > > > > > > > .com/dongjinleekr
>> > > > > > > > > > > > > >> > > > >>> > > <http://github.com/dongjinleekr
>> > > > >linkedin:
>> > > > > > > > > > > > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
>> > > > > > > > > > > > > >> > > > >>> > > <http://kr.linkedin.com/in/
>> > > dongjinleekr
>> > > > > > > > > >slideshare:
>> > > > > > > > > > > > > >> > > > >>> > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > >> > > > >>> > > <http://www.slideshare.net/
>> > > dongjinleekr
>> > > > >*
>> > > > > > > > > > > > > >> > > > >>> > >
>> > > > > > > > > > > > > >> > > > >>> >
>> > > > > > > > > > > > > >> > > > >>>
>> > > > > > > > > > > > > >> > > > >> --
>> > > > > > > > > > > > > >> > > > >> *Dongjin Lee*
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >> *A hitchhiker in the mathematical
>> world.*
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >> *github:  <http://goog_969573159/
>> >github
>> > > > > > > > > > .com/dongjinleekr
>> > > > > > > > > > > > > >> > > > >> <http://github.com/dongjinleekr
>> > >linkedin:
>> > > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
>> > > > > > > > > > > > > >> > > > >> <
>> http://kr.linkedin.com/in/dongjinleekr
>> > > > > > > >slideshare:
>> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > >> > > > >> <
>> http://www.slideshare.net/dongjinleekr
>> > >*
>> > > > > > > > > > > > > >> > > > >>
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > --
>> > > > > > > > > > > > > >> > > > > *Dongjin Lee*
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > *A hitchhiker in the mathematical
>> world.*
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > > > *github:  <http://goog_969573159/>
>> > > > > > > > > github.com/dongjinleekr
>> > > > > > > > > > > > > >> > > > > <http://github.com/dongjinleekr
>> >linkedin:
>> > > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
>> > > > > > > > > > > > > >> > > > > <
>> http://kr.linkedin.com/in/dongjinleekr
>> > > > > > >slideshare:
>> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > >> > > > > <
>> http://www.slideshare.net/dongjinleekr>*
>> > > > > > > > > > > > > >> > > > >
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > --
>> > > > > > > > > > > > > >> > > > *Dongjin Lee*
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > *A hitchhiker in the mathematical world.*
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > > > *github:  <http://goog_969573159/>github
>> > > > > > > > .com/dongjinleekr
>> > > > > > > > > > > > > >> > > > <http://github.com/dongjinleekr
>> >linkedin:
>> > > > > > > > > > > > > >> > > kr.linkedin.com/in/dongjinleekr
>> > > > > > > > > > > > > >> > > > <http://kr.linkedin.com/in/dongjinleekr
>> > > > > >slideshare:
>> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > >> > > > <http://www.slideshare.net/dongjinleekr
>> >*
>> > > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > > >> > >
>> > > > > > > > > > > > > >> >
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> --
>> > > > > > > > > > > > > >> *Dongjin Lee*
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> *A hitchhiker in the mathematical world.*
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >> *github:  <http://goog_969573159/>
>> > > > > github.com/dongjinleekr
>> > > > > > > > > > > > > >> <http://github.com/dongjinleekr>linkedin:
>> > > > > > > kr.linkedin.com/in/
>> > > > > > > > > > > > dongjinleekr
>> > > > > > > > > > > > > >> <http://kr.linkedin.com/in/dongjinleekr
>> > >slideshare:
>> > > > > > > > > > > > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > >> <http://www.slideshare.net/dongjinleekr>*
>> > > > > > > > > > > > > >>
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > --
>> > > > > > > > > > > > > *Dongjin Lee*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > *github:  <http://goog_969573159/>
>> > > > github.com/dongjinleekr
>> > > > > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > > > kr.linkedin.com/in/
>> > > > > > > > > > > > dongjinleekr
>> > > > > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
>> >slideshare:
>> > > > > > > > > > > > > www.slideshare.net/dongjinleekr
>> > > > > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > --
>> > > > > > > > > > > *Dongjin Lee*
>> > > > > > > > > > >
>> > > > > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > > > > > > > >
>> > > > > > > > > > > *github:  <http://goog_969573159/>
>> > github.com/dongjinleekr
>> > > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > kr.linkedin.com/in/
>> > > > > > > > > > dongjinleekr
>> > > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > > > > > > www.slideshare.net/
>> > > > > > > > > > dongjinleekr
>> > > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > *Dongjin Lee*
>> > > > > > > > >
>> > > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > > > > > >
>> > > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > kr.linkedin.com/in/
>> > > > > > > > dongjinleekr
>> > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > > > > www.slideshare.net/
>> > > > > > > > dongjinleekr
>> > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > *Dongjin Lee*
>> > > >
>> > > > *A hitchhiker in the mathematical world.*
>> > > >
>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > <http://github.com/dongji <http://github.com/dongjinleekr>
>
>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Jason,

> I think we would only use this error code when we /know/ that zstd was in
use and the client doesn't support it? This is true if either 1) the
message needs down-conversion and we encounter a zstd compressed message,
or 2) if the topic is explicitly configured to use zstd.

Yes, that's right. Note that case 1 covers two sub-cases: 1.a) old clients
requesting v0 or v1 records, and 1.b) implicit zstd, i.e., the topic's
compression type is "producer" and the records are Zstd-compressed.

> However, if the compression type is set to "producer," then the fetched
data may or may not be compressed with zstd. In this case, we return the
data to the client and expect it to fail parsing. Is that correct?

Exactly.

Following your message, I reviewed the implementation of
`KafkaApis#handleFetchRequest`, which handles the fetch request, and found
that the information we can use is the following:

1. The client's fetch request version (the `versionId` variable).
2. The log's configured compression type (the `logConfig` variable).

We can't detect the actual compression type of the data unless we inspect
the `RecordBatch` instances inside the `Records` object (i.e., the
`unconvertedRecords` variable). Since that inspection would cause a
performance problem, it is not an option for us - in short, we can't be sure
whether a given chunk of data is compressed with zstd or not.

So, in conclusion: with the information above, we can easily return an error
in cases 1.a and 2. In case 1.b (implicit zstd), we do nothing special, just
return the data, and expect the client to fail parsing it.
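
To make that concrete, here is a minimal, self-contained sketch of the check
described above. It is only an illustration - the class name, constants, and
version numbers are hypothetical, not Kafka's actual broker code - but it shows
how the broker can decide from the fetch version and the topic configuration
alone when the topic is explicitly configured with zstd, while the implicit
"producer" case (1.b) is left to the client.

```
// Illustrative sketch only: class name, constants, and version numbers are
// hypothetical, not Kafka's actual broker code.
public class ZstdFetchCheck {

    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD, PRODUCER }

    // Assumption: zstd-aware consumers would send a bumped fetch version
    // greater than or equal to this value.
    static final short MIN_ZSTD_FETCH_VERSION = 10;

    // When the topic is explicitly configured with zstd (case 2, and case 1.a
    // for such topics), the broker can reject the fetch up front using only
    // the request version and the topic configuration. When the topic is
    // configured as "producer" (case 1.b), the batches would have to be
    // inspected to know whether zstd is in use, so the data is returned
    // unchanged and an old client fails while parsing it.
    static boolean rejectAsUnsupportedCompression(short fetchVersion,
                                                  CompressionType topicCompression) {
        boolean clientSupportsZstd = fetchVersion >= MIN_ZSTD_FETCH_VERSION;
        return topicCompression == CompressionType.ZSTD && !clientSupportsZstd;
    }

    public static void main(String[] args) {
        System.out.println(rejectAsUnsupportedCompression((short) 8, CompressionType.ZSTD));     // true
        System.out.println(rejectAsUnsupportedCompression((short) 11, CompressionType.ZSTD));    // false
        System.out.println(rejectAsUnsupportedCompression((short) 8, CompressionType.PRODUCER)); // false
    }
}
```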

Thanks,
Dongjin

On Wed, Aug 22, 2018 at 12:02 PM Ismael Juma <is...@gmail.com> wrote:

> Jason, that's an interesting point regarding the Java client. Do we know
> what clients in other languages do in these cases?
>
> Ismael
>
> On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io> wrote:
>
> > Hi Dongjin,
> >
> > One of the complications is that old versions of the API will not expect
> a
> > new error code. However, since we expect this to be a fatal error anyway
> > for old clients, it may still be more useful to return the correct error
> > code. For example, the Kafka clients use the following code to convert
> the
> > error code:
> >
> >     public static Errors forCode(short code) {
> >         Errors error = codeToError.get(code);
> >         if (error != null) {
> >             return error;
> >         } else {
> >             log.warn("Unexpected error code: {}.", code);
> >             return UNKNOWN_SERVER_ERROR;
> >         }
> >     }
> >
> > If we return an unsupported error code, it will be converted to an
> UNKNOWN
> > error, but at least we will get the message in the log with the correct
> > code. That seems preferable to returning a misleading error code. So I
> > wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even for
> > older versions.
> >
> > Also, one question just to check my understanding. I think we would only
> > use this error code when we /know/ that zstd was in use and the client
> > doesn't support it? This is true if either 1) the message needs
> > down-conversion and we encounter a zstd compressed message, or 2) if the
> > topic is explicitly configured to use zstd. However, if the compression
> > type is set to "producer," then the fetched data may or may not be
> > compressed with zstd. In this case, we return the data to the client and
> > expect it to fail parsing. Is that correct?
> >
> > Thanks,
> > Jason
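
To illustrate the tradeoff described above, here is a toy sketch - not Kafka's
real `Errors` class, and the numeric value 76 used for the new code is only a
placeholder - showing how a client whose error table predates
UNSUPPORTED_COMPRESSION_TYPE degrades the new code to a generic error while
still logging the raw value, whereas an upgraded client resolves it precisely.

```
import java.util.HashMap;
import java.util.Map;

// Toy illustration of error-code lookup; not Kafka's real Errors class.
// The value 76 for the new code is a placeholder, not a real assignment.
public class ErrorCodeLookupDemo {

    static final short UNSUPPORTED_COMPRESSION_TYPE = 76; // hypothetical new code

    // Table shipped with an old client: it predates the new code.
    static final Map<Short, String> OLD_CLIENT_CODES = new HashMap<>();
    // Table shipped with an upgraded client: it knows the new code.
    static final Map<Short, String> NEW_CLIENT_CODES = new HashMap<>();

    static {
        OLD_CLIENT_CODES.put((short) 2, "CORRUPT_MESSAGE");
        OLD_CLIENT_CODES.put((short) 42, "INVALID_REQUEST");
        NEW_CLIENT_CODES.putAll(OLD_CLIENT_CODES);
        NEW_CLIENT_CODES.put(UNSUPPORTED_COMPRESSION_TYPE, "UNSUPPORTED_COMPRESSION_TYPE");
    }

    // Mirrors the forCode() behavior quoted above: an unknown code is logged
    // and mapped to a generic error instead of crashing the client.
    static String resolve(Map<Short, String> table, short code) {
        String name = table.get(code);
        if (name != null) {
            return name;
        }
        System.err.println("Unexpected error code: " + code);
        return "UNKNOWN_SERVER_ERROR";
    }

    public static void main(String[] args) {
        short received = UNSUPPORTED_COMPRESSION_TYPE; // what the broker would return
        System.out.println("old client sees: " + resolve(OLD_CLIENT_CODES, received)); // UNKNOWN_SERVER_ERROR
        System.out.println("new client sees: " + resolve(NEW_CLIENT_CODES, received)); // UNSUPPORTED_COMPRESSION_TYPE
    }
}
```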
> >
> >
> >
> > On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org> wrote:
> >
> > > Ismael, Jason and all,
> > >
> > > I rewrote the backward compatibility strategy & its alternatives as
> > > follows, based on Ismael & Jason's comments. Since it is not updated to
> > > the wiki yet, don't hesitate to send me a message if you have any
> > > opinion on it.
> > >
> > > ```
> > > *Backward Compatibility*
> > >
> > > We need to establish a backward-compatibility strategy for the case where
> > > an old client subscribes to a topic using ZStandard implicitly (i.e., the
> > > 'compression.type' configuration of the given topic is 'producer' and the
> > > producer compressed the records with ZStandard). We have the following
> > > options for this situation:
> > >
> > > *A. Support ZStandard to the old clients which can understand v0, v1
> > > messages only.*
> > >
> > > This strategy necessarily requires down-conversion of v2 messages
> > > compressed with ZStandard into v0 or v1 messages, which means considerable
> > > performance degradation. So we rejected this strategy.
> > >
> > > *B. Bump the API version and support only v2-available clients*
> > >
> > > With this approach, we can signal to the old clients that they are out of
> > > date and should be upgraded. However, there are still several options for
> > > the error code.
> > >
> > > *B.1. INVALID_REQUEST (42)*
> > >
> > > This option gives the client too little information; the user can be
> > > confused about why a client that worked correctly in the past suddenly
> > > encounters a problem. So we rejected this strategy.
> > >
> > > *B.2. CORRUPT_MESSAGE (2)*
> > >
> > > This option gives inaccurate information; the user can be surprised and
> > > mistakenly conclude that the log files are broken in some way. So we
> > > rejected this strategy.
> > >
> > > *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> > >
> > > The advantage of this approach is that we don't need to define a new error
> > > code; we can reuse the existing one and that's all.
> > >
> > > The disadvantage of this approach is that it is also a little bit vague;
> > > this error code was defined as part of KIP-98[^1] and is currently
> > > returned for transaction errors.
> > >
> > > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> > >
> > > The advantage of this approach is that it is clear and provides an exact
> > > description. The disadvantage is that we need to add a new error code.
> > > ```
> > >
> > > *It seems like what we need to choose is now quite clear:
> > > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> > > (B.4).*
> > > The first one doesn't need a new error code but the latter is more
> > > explicit. Which one do you prefer? Since all of you have much more
> > > experience and knowledge than me, I will follow your decision. The wiki
> > > page will also be updated following the decision.
> > >
> > > Best,
> > > Dongjin
> > >
> > > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> > >
> > > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com> wrote:
> > >
> > > > Sounds reasonable to me.
> > > >
> > > > Ismael
> > > >
> > > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io>
> > wrote:
> > > >
> > > > > Hey Ismael,
> > > > >
> > > > > Your summary looks good to me. I think it might also be a good idea
> > to
> > > > add
> > > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
> > > > version
> > > > > bumps. We won't be able to use it for old api versions since the
> > > clients
> > > > > will not understand it, but we can use it going forward so that
> we're
> > > not
> > > > > stuck in a similar situation with a new message format and a new
> > codec
> > > to
> > > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT,
> but
> > > it
> > > > is
> > > > > not as explicit.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >
> > > > > > Hi Dongjin and Jason,
> > > > > >
> > > > > > I would agree. My summary:
> > > > > >
> > > > > > 1. Support zstd with message format 2 only.
> > > > > > 2. Bump produce and fetch request versions.
> > > > > > 3. Provide broker errors whenever possible based on the request
> > > version
> > > > > and
> > > > > > rely on clients for the cases where the broker can't validate
> > > > efficiently
> > > > > > (example message format 2 consumer that supports the latest fetch
> > > > version
> > > > > > but doesn't support zstd).
> > > > > >
> > > > > > If there's general agreement on this, I suggest we update the KIP
> > to
> > > > > state
> > > > > > the proposal and to move the rejected options to its own section.
> > And
> > > > > then
> > > > > > start a vote!
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Dongjin,
> > > > > > >
> > > > > > > Yes, that's a good summary. For clients which support v2, the
> > > client
> > > > > can
> > > > > > > parse the message format and hopefully raise a useful error
> > message
> > > > > > > indicating the unsupported compression type. For older clients,
> > our
> > > > > > options
> > > > > > > are probably (1) to down-convert to the old format using no
> > > > compression
> > > > > > > type, or (2) to return an error code. I'm leaning toward the
> > latter
> > > > as
> > > > > > the
> > > > > > > simpler solution, but the challenge is finding a good error
> code.
> > > Two
> > > > > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The
> > > > downside
> > > > > > is
> > > > > > > that old clients probably won't get a helpful message. However,
> > at
> > > > > least
> > > > > > > the behavior will be consistent in the sense that all clients
> > will
> > > > fail
> > > > > > if
> > > > > > > they do not support zstandard.
> > > > > > >
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks Jason, I reviewed the down-converting logic following
> > your
> > > > > > > > explanation.[^1] You mean the following routines, right?
> > > > > > > >
> > > > > > > > - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
> > > > > > > > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > > > > >
> > > > > > > > It seems like your stance is like following:
> > > > > > > >
> > > > > > > > 1. In principle, Kafka does not change the compression codec
> > when
> > > > > > > > down-converting, since it requires inspecting the fetched
> data,
> > > > which
> > > > > > is
> > > > > > > > expensive.
> > > > > > > > 2. However, there are some cases the fetched data is
> inspected
> > > > > anyway.
> > > > > > In
> > > > > > > > this case, we can provide compression conversion from
> Zstandard
> > > to
> > > > > > > > classical ones[^2].
> > > > > > > >
> > > > > > > > And from what I understand, the cases where the client
> without
> > > > > > ZStandard
> > > > > > > > support receives ZStandard compressed records can be
> organized
> > > into
> > > > > two
> > > > > > > > cases:
> > > > > > > >
> > > > > > > > a. The 'compression.type' configuration of given topic is
> > > > 'producer'
> > > > > > and
> > > > > > > > the producer compressed the records with ZStandard. (that is,
> > > using
> > > > > > > > ZStandard implicitly.)
> > > > > > > > b.  The 'compression.type' configuration of given topic is
> > > 'zstd';
> > > > > that
> > > > > > > is,
> > > > > > > > using ZStandard explicitly.
> > > > > > > >
> > > > > > > > As you stated, we don't have to handle the case b specially.
> > So,
> > > It
> > > > > > seems
> > > > > > > > like we can narrow the focus of the problem by joining case 1
> > and
> > > > > case
> > > > > > b
> > > > > > > > like the following:
> > > > > > > >
> > > > > > > > > Given the topic with 'producer' as its 'compression.type'
> > > > > > > configuration,
> > > > > > > > ZStandard compressed records and old client without
> ZStandard,
> > is
> > > > > there
> > > > > > > any
> > > > > > > > case we need to inspect the records and can change the
> > > compression
> > > > > > type?
> > > > > > > If
> > > > > > > > so, can we provide compression type converting?
> > > > > > > >
> > > > > > > > Do I understand correctly?
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Dongjin
> > > > > > > >
> > > > > > > > [^1]: I'm sorry, I found that I was a little bit
> > misunderstanding
> > > > how
> > > > > > API
> > > > > > > > version works, after reviewing the downconvert logic & the
> > > protocol
> > > > > > > > documentation <https://kafka.apache.org/protocol>.
> > > > > > > > [^2]: None, Gzip, Snappy, Lz4.
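
As a side note on the question above: once down-conversion has already
decompressed a batch, re-compressing it with a classical codec is mechanically
simple. The sketch below only illustrates that idea and is not Kafka's
down-conversion code; the zstd decompression step is passed in as a function so
that no particular zstd binding or API is assumed.

```
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.function.UnaryOperator;
import java.util.zip.GZIPOutputStream;

// Illustrative only: shows the "decompress once, re-compress with a classical
// codec" idea discussed above. Not Kafka's actual down-conversion code.
public class RecompressSketch {

    // Re-compress already-decompressed record bytes with gzip, a codec that
    // old clients understand.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    // zstdDecompress stands in for whatever zstd binding the broker uses;
    // it is a parameter here so this sketch does not assume a concrete API.
    static byte[] downConvertBatch(byte[] zstdCompressed,
                                   UnaryOperator<byte[]> zstdDecompress) throws IOException {
        byte[] plain = zstdDecompress.apply(zstdCompressed); // needed for down-conversion anyway
        return gzip(plain);                                  // old clients can read gzip
    }
}
```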
> > > > > > > >
> > > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> > > > jason@confluent.io>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > But in my opinion, since the client will fail with the
> API
> > > > > version,
> > > > > > > so
> > > > > > > > we
> > > > > > > > > > don't need to down-convert the messages anyway. Isn't it?
> > > So, I
> > > > > > think
> > > > > > > > we
> > > > > > > > > > don't care about this case. (I'm sorry, I am not familiar
> > > with
> > > > > > > > > down-convert
> > > > > > > > > > logic.)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Currently the broker down-converts automatically when it
> > > receives
> > > > > an
> > > > > > > old
> > > > > > > > > version of the fetch request (a version which is known to
> > > predate
> > > > > the
> > > > > > > > > message format in use). Typically when down-converting the
> > > > message
> > > > > > > > format,
> > > > > > > > > we use the same compression type, but there is not much
> point
> > > in
> > > > > > doing
> > > > > > > so
> > > > > > > > > when we know the client doesn't support it. So if zstandard
> > is
> > > in
> > > > > > use,
> > > > > > > > and
> > > > > > > > > we have to down-convert anyway, then we can choose to use a
> > > > > different
> > > > > > > > > compression type or no compression type.
> > > > > > > > >
> > > > > > > > > From my perspective, there is no significant downside to
> > > bumping
> > > > > the
> > > > > > > > > protocol version and it has several potential benefits.
> > Version
> > > > > bumps
> > > > > > > are
> > > > > > > > > cheap. The main question mark in my mind is about
> > > > down-conversion.
> > > > > > > > Figuring
> > > > > > > > > out whether down-conversion is needed is hard generally
> > without
> > > > > > > > inspecting
> > > > > > > > > the fetched data, which is expensive. I think we agree in
> > > > principle
> > > > > > > that
> > > > > > > > we
> > > > > > > > > do not want to have to pay this cost generally and prefer
> the
> > > > > clients
> > > > > > > to
> > > > > > > > > fail when they see an unhandled compression type. The
> point I
> > > was
> > > > > > > making
> > > > > > > > is
> > > > > > > > > that there are some cases where we are either inspecting
> the
> > > data
> > > > > > > anyway
> > > > > > > > > (because we have to down-convert the message format), or we
> > > have
> > > > an
> > > > > > > easy
> > > > > > > > > way to tell whether zstandard is in use (the topic has it
> > > > > configured
> > > > > > > > > explicitly). In the latter case, we don't have to handle it
> > > > > > specially.
> > > > > > > > But
> > > > > > > > > we do have to decide how we will handle down-conversion to
> > > older
> > > > > > > formats.
> > > > > > > > >
> > > > > > > > > -Jason
> > > > > > > > >
> > > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Colin and Jason,
> > > > > > > > > >
> > > > > > > > > > Thanks for your opinions. In summarizing, the Pros and
> Cons
> > > of
> > > > > > > bumping
> > > > > > > > > > fetch API version are:
> > > > > > > > > >
> > > > > > > > > > Cons:
> > > > > > > > > >
> > > > > > > > > > - The Broker can't know whether a given message batch is
> > > > > compressed
> > > > > > > > with
> > > > > > > > > > zstd or not.
> > > > > > > > > > - Need some additional logic for the topic explicitly
> > > > configured
> > > > > to
> > > > > > > use
> > > > > > > > > > zstd.
> > > > > > > > > >
> > > > > > > > > > Pros:
> > > > > > > > > >
> > > > > > > > > > - The broker doesn't need to conduct expensive
> > > down-conversion.
> > > > > > > > > > - Can message the users to update their client.
> > > > > > > > > >
> > > > > > > > > > So, opinions for the backward-compatibility policy by
> far:
> > > > > > > > > >
> > > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > > > > > >
> > > > > > > > > > Here are my additional comments:
> > > > > > > > > >
> > > > > > > > > > @Colin
> > > > > > > > > >
> > > > > > > > > > I greatly appreciate your response. In the case of the
> > > > dictionary
> > > > > > > > > support,
> > > > > > > > > > of course, this issue should be addressed later so we
> don't
> > > > need
> > > > > it
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > first version. You are right - it is not late to try it
> > after
> > > > > some
> > > > > > > > > > benchmarks. What I mean is, we should keep in mind on
> that
> > > > > > potential
> > > > > > > > > > feature.
> > > > > > > > > >
> > > > > > > > > > @Jason
> > > > > > > > > >
> > > > > > > > > > You wrote,
> > > > > > > > > >
> > > > > > > > > > > Similarly, if we have to down-convert anyway because
> the
> > > > client
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > understand the message format, then we could also use a
> > > > different
> > > > > > > > > > compression type.
> > > > > > > > > >
> > > > > > > > > > But in my opinion, since the client will fail with the
> API
> > > > > version,
> > > > > > > so
> > > > > > > > we
> > > > > > > > > > don't need to down-convert the messages anyway. Isn't it?
> > > So, I
> > > > > > think
> > > > > > > > we
> > > > > > > > > > don't care about this case. (I'm sorry, I am not familiar
> > > with
> > > > > > > > > down-convert
> > > > > > > > > > logic.)
> > > > > > > > > >
> > > > > > > > > > Please give more opinions. Thanks!
> > > > > > > > > >
> > > > > > > > > > - Dongjin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> > > > > jason@confluent.io
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hey Colin,
> > > > > > > > > > >
> > > > > > > > > > > The problem for the fetch API is that the broker does
> not
> > > > > > generally
> > > > > > > > > know
> > > > > > > > > > if
> > > > > > > > > > > a batch was compressed with zstd unless it parses it. I
> > > think
> > > > > the
> > > > > > > > goal
> > > > > > > > > > here
> > > > > > > > > > > is to avoid the expensive down-conversion that is
> needed
> > to
> > > > > > ensure
> > > > > > > > > > > compatibility because it is only necessary if zstd is
> > > > actually
> > > > > in
> > > > > > > > use.
> > > > > > > > > > But
> > > > > > > > > > > as long as old clients can parse the message format,
> they
> > > > > should
> > > > > > > get
> > > > > > > > a
> > > > > > > > > > > reasonable error if they see an unsupported compression
> > > type
> > > > in
> > > > > > the
> > > > > > > > > > > attributes. Basically the onus is on users to ensure
> that
> > > > their
> > > > > > > > > consumers
> > > > > > > > > > > have been updated prior to using zstd. It seems like a
> > > > > reasonable
> > > > > > > > > > tradeoff
> > > > > > > > > > > to me. There are a couple cases that might be worth
> > > thinking
> > > > > > > through:
> > > > > > > > > > >
> > > > > > > > > > > 1. If a topic is explicitly configured to use zstd,
> then
> > we
> > > > > don't
> > > > > > > > need
> > > > > > > > > to
> > > > > > > > > > > check the fetched data for the compression type to know
> > if
> > > we
> > > > > > need
> > > > > > > > > > > down-conversion. If we did bump the Fetch API version,
> > then
> > > > we
> > > > > > > could
> > > > > > > > > > handle
> > > > > > > > > > > this case by either down-converting using a different
> > > > > compression
> > > > > > > > type
> > > > > > > > > or
> > > > > > > > > > > returning an error.
> > > > > > > > > > > 2. Similarly, if we have to down-convert anyway because
> > the
> > > > > > client
> > > > > > > > does
> > > > > > > > > > not
> > > > > > > > > > > understand the message format, then we could also use a
> > > > > different
> > > > > > > > > > > compression type.
> > > > > > > > > > >
> > > > > > > > > > > For the produce API, I think it's reasonable to bump
> the
> > > api
> > > > > > > version.
> > > > > > > > > > This
> > > > > > > > > > > can be used by clients to check whether a broker
> supports
> > > > zstd.
> > > > > > For
> > > > > > > > > > > example, we might support a list of preferred
> compression
> > > > types
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > producer and we could use the broker to detect which
> > > version
> > > > to
> > > > > > > use.
> > > > > > > > > > >
> > > > > > > > > > > -Jason
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> > > > > cmccabe@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
> > > > compression
> > > > > > > codec
> > > > > > > > > > and I
> > > > > > > > > > > > hope we can get this support in soon!
> > > > > > > > > > > >
> > > > > > > > > > > > I would say we can just bump the API version to
> > indicate
> > > > that
> > > > > > > ZStd
> > > > > > > > > > > support
> > > > > > > > > > > > is expected in new clients.  We probably need some
> way
> > of
> > > > > > > > indicating
> > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > > older clients that they can't consume the partitions,
> > as
> > > > > well.
> > > > > > > > > Perhaps
> > > > > > > > > > > we
> > > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > > > > > > >
> > > > > > > > > > > > The license thing seems straightforward -- it's just
> a
> > > > matter
> > > > > > of
> > > > > > > > > adding
> > > > > > > > > > > > the text to the right files as per ASF guidelines.
> > > > > > > > > > > >
> > > > > > > > > > > > With regard to the dictionary support, do we really
> > need
> > > > that
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > first
> > > > > > > > > > > > version?  Hopefully message batches are big enough
> that
> > > > this
> > > > > > > isn't
> > > > > > > > > > > needed.
> > > > > > > > > > > > Some benchmarks might help here.
> > > > > > > > > > > >
> > > > > > > > > > > > best,
> > > > > > > > > > > > Colin
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this
> issue,
> > > > > KIP-110
> > > > > > > > > > > > > <https://cwiki.apache.org/
> > > confluence/display/KAFKA/KIP-
> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > > > > > > .
> > > > > > > > > > > > >
> > > > > > > > > > > > > For newcomers, Here is some summary of the history:
> > > > KIP-110
> > > > > > was
> > > > > > > > > > > > originally
> > > > > > > > > > > > > worked for the issue KAFKA-4514 but, it lacked
> > > benchmark
> > > > > > > results
> > > > > > > > to
> > > > > > > > > > get
> > > > > > > > > > > > the
> > > > > > > > > > > > > agreement of the community. Later, Ivan Babrou and
> > some
> > > > > other
> > > > > > > > users
> > > > > > > > > > who
> > > > > > > > > > > > > adopted the patch provided their excellent
> > performance
> > > > > report
> > > > > > > > which
> > > > > > > > > > is
> > > > > > > > > > > > now
> > > > > > > > > > > > > included in the KIP, but it postponed again because
> > of
> > > > the
> > > > > > > > > community
> > > > > > > > > > > was
> > > > > > > > > > > > > busy for 2.0.0 release. It is why I now reboot this
> > > > issue.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The following is the current status of the feature:
> > You
> > > > can
> > > > > > > check
> > > > > > > > > the
> > > > > > > > > > > > > current draft implementation here
> > > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is
> > > based
> > > > > on
> > > > > > > zstd
> > > > > > > > > > 1.3.5
> > > > > > > > > > > > and
> > > > > > > > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > The issues that should be addressed is like
> > following:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *1. Backward Compatibility*
> > > > > > > > > > > > >
> > > > > > > > > > > > > To support old consumers, we need to take a
> strategy
> > to
> > > > > > handle
> > > > > > > > the
> > > > > > > > > > old
> > > > > > > > > > > > > consumers. Current candidates are:
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Bump API version
> > > > > > > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > > > > > > - Improve the error messages:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *2. Dictionary Support*
> > > > > > > > > > > > >
> > > > > > > > > > > > > To support zstd's dictionary feature in the future
> > (if
> > > > > > needed),
> > > > > > > > we
> > > > > > > > > > need
> > > > > > > > > > > > to
> > > > > > > > > > > > > sketch how it should be and leave some room for it.
> > As
> > > of
> > > > > > now,
> > > > > > > > > there
> > > > > > > > > > > has
> > > > > > > > > > > > > been no discussion on this topic yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > > *3. License*
> > > > > > > > > > > > >
> > > > > > > > > > > > > To use this feature, we need to add license of zstd
> > and
> > > > > > > zstd-jni
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > > > project. (Thanks to Viktor Somogyi for raising this
> > > > issue!)
> > > > > > It
> > > > > > > > > seems
> > > > > > > > > > > like
> > > > > > > > > > > > > what Apache Spark did would be a good example but
> > there
> > > > has
> > > > > > > been
> > > > > > > > no
> > > > > > > > > > > > > discussion yet.
> > > > > > > > > > > > >
> > > > > > > > > > > > > You can find the details of the above issues in the
> > KIP
> > > > > > > document.
> > > > > > > > > > > Please
> > > > > > > > > > > > > have a look when you are free, and give me
> feedback.
> > > All
> > > > > > kinds
> > > > > > > of
> > > > > > > > > > > > > participating are welcome.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Dongjin
> > > > > > > > > > > > >
> > > > > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > > > > > > dongjin@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry for the late reply.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In short, I could not submit the updated KIP by
> the
> > > > > feature
> > > > > > > > > freeze
> > > > > > > > > > > > > > deadline of 2.0.0. For this reason, it will not
> be
> > > > > included
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > 2.0.0
> > > > > > > > > > > > > > release and all discussion for this issue were
> > > > postponed
> > > > > > > after
> > > > > > > > > the
> > > > > > > > > > > > release
> > > > > > > > > > > > > > of 2.0.0.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have been updating the PR following recent
> > updates.
> > > > > Just
> > > > > > > > now, I
> > > > > > > > > > > > rebased
> > > > > > > > > > > > > > it against the latest trunk and updated the zstd
> > > > version
> > > > > > into
> > > > > > > > > > 1.3.5.
> > > > > > > > > > > > If you
> > > > > > > > > > > > > > need some request, don't hesitate to notify me.
> > (But
> > > > not
> > > > > > this
> > > > > > > > > > thread
> > > > > > > > > > > -
> > > > > > > > > > > > just
> > > > > > > > > > > > > > send me the message directly.)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Dongjin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > > > > > > bobby@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >> I there any update on this.  The performance
> > > > > improvements
> > > > > > > are
> > > > > > > > > > quite
> > > > > > > > > > > > > >> impressive and I really would like to stop
> forking
> > > > kafka
> > > > > > > just
> > > > > > > > to
> > > > > > > > > > get
> > > > > > > > > > > > this
> > > > > > > > > > > > > >> in.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Bobby
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > > > > > > dongjin@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Ismael,
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Oh, I forgot all of you are on working frenzy
> > for
> > > > 2.0!
> > > > > > No
> > > > > > > > > > problem,
> > > > > > > > > > > > take
> > > > > > > > > > > > > >> > your time. I am also working at another issue
> > now.
> > > > > Thank
> > > > > > > you
> > > > > > > > > for
> > > > > > > > > > > > > >> letting me
> > > > > > > > > > > > > >> > know.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Best,
> > > > > > > > > > > > > >> > Dongjin
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > > > > > > ismaelj@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is
> busy
> > > > > > finalising
> > > > > > > > > > 2.0.0.
> > > > > > > > > > > > This
> > > > > > > > > > > > > >> KIP
> > > > > > > > > > > > > >> > > seems like a great candidate for 2.1.0 and
> > > > hopefully
> > > > > > > there
> > > > > > > > > > will
> > > > > > > > > > > be
> > > > > > > > > > > > > >> more
> > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > >> > > a discussion next week. :)
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Ismael
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > > > > > > dongjin@apache.org
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > Hello. I just updated my draft
> > implementation:
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit
> 5145d6b)
> > > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > You can check out the implementation from
> > here
> > > > > > > > > > > > > >> > > > <
> https://github.com/apache/kafka/pull/2267
> > >.
> > > If
> > > > > you
> > > > > > > > > > > experience
> > > > > > > > > > > > any
> > > > > > > > > > > > > >> > > problem
> > > > > > > > > > > > > >> > > > running it, don't hesitate to give me a
> > > mention.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Best,
> > > > > > > > > > > > > >> > > > Dongjin
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin
> Lee
> > <
> > > > > > > > > > > dongjin@apache.org
> > > > > > > > > > > > >
> > > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > > Here is the short conclusion about the
> > > license
> > > > > > > > problem:
> > > > > > > > > > *We
> > > > > > > > > > > > can
> > > > > > > > > > > > > >> use
> > > > > > > > > > > > > >> > > zstd
> > > > > > > > > > > > > >> > > > > and zstd-jni without any problem, but we
> > > need
> > > > to
> > > > > > > > include
> > > > > > > > > > > their
> > > > > > > > > > > > > >> > license,
> > > > > > > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause
> > > > License
> > > > > > > > requires
> > > > > > > > > > to
> > > > > > > > > > > > > >> include
> > > > > > > > > > > > > >> > the
> > > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause License
> > > > requires
> > > > > > that
> > > > > > > > the
> > > > > > > > > > > name
> > > > > > > > > > > > of
> > > > > > > > > > > > > >> the
> > > > > > > > > > > > > >> > > > > contributor can't be used to endorse or
> > > > promote
> > > > > > the
> > > > > > > > > > product.
> > > > > > > > > > > > > >> That's
> > > > > > > > > > > > > >> > it
> > > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > - They are not listed in the list of
> > > > prohibited
> > > > > > > > licenses
> > > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > > > > > > resolved.html#category-x>
> > > > > > > > > > > also.
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > > > > > > >> > > > > <https://issues.apache.org/
> > > > > > jira/browse/SPARK-19112
> > > > > > > >:
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > - They made a directory dedicated to the
> > > > > > dependency
> > > > > > > > > > license
> > > > > > > > > > > > files
> > > > > > > > > > > > > >> > > > > <
> > > > > > > https://github.com/apache/spark/tree/master/licenses
> > > > > > > > >
> > > > > > > > > > and
> > > > > > > > > > > > added
> > > > > > > > > > > > > >> > > > licenses
> > > > > > > > > > > > > >> > > > > for Zstd
> > > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > > >> >
> > > > https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > > > > > LICENSE-zstd.txt
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > &
> > > > > > > > > > > > > >> > > > > Zstd-jni
> > > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> https://github.com/apache/
> > > spark/blob/master/licenses/
> > > > > > > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > > > .
> > > > > > > > > > > > > >> > > > > - Added a link to the original license
> > files
> > > > in
> > > > > > > > LICENSE.
> > > > > > > > > > > > > >> > > > > <
> > > > > https://github.com/apache/spark/pull/18805/files
> > > > > > >
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Thanks for pointing out this problem,
> > > Viktor!
> > > > > Nice
> > > > > > > > > catch!
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > Best,
> > > > > > > > > > > > > >> > > > > Dongjin
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin
> > > Lee <
> > > > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> > > > > > reasoning.
> > > > > > > > so:
> > > > > > > > > +1
> > > > > > > > > > > > for b
> > > > > > > > > > > > > >> > until
> > > > > > > > > > > > > >> > > > now.
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> For the license issues, I will have a
> > check
> > > > on
> > > > > > how
> > > > > > > > the
> > > > > > > > > > over
> > > > > > > > > > > > > >> projects
> > > > > > > > > > > > > >> > > are
> > > > > > > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> Best,
> > > > > > > > > > > > > >> > > > >> Dongjin
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor
> > > > > Somogyi <
> > > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > > > > > > >> > > > >>> I would vote for option b. in the
> > > "backward
> > > > > > > > > > compatibility"
> > > > > > > > > > > > > >> section.
> > > > > > > > > > > > > >> > > My
> > > > > > > > > > > > > >> > > > >>> reasoning for this is that users
> > upgrading
> > > > to
> > > > > a
> > > > > > > zstd
> > > > > > > > > > > > compatible
> > > > > > > > > > > > > >> > > version
> > > > > > > > > > > > > >> > > > >>> won't start to use it automatically,
> so
> > > > manual
> > > > > > > > > > > > reconfiguration
> > > > > > > > > > > > > >> is
> > > > > > > > > > > > > >> > > > >>> required.
> > > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the
> > > > > cluster.
> > > > > > If
> > > > > > > > not
> > > > > > > > > > all
> > > > > > > > > > > > the
> > > > > > > > > > > > > >> > > clients
> > > > > > > > > > > > > >> > > > >>> are
> > > > > > > > > > > > > >> > > > >>> upgraded but just some of them and
> > they'd
> > > > > start
> > > > > > to
> > > > > > > > use
> > > > > > > > > > > zstd
> > > > > > > > > > > > > >> then it
> > > > > > > > > > > > > >> > > > would
> > > > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd like
> to
> > > > > presume
> > > > > > > > > though
> > > > > > > > > > > that
> > > > > > > > > > > > > >> this
> > > > > > > > > > > > > >> > is
> > > > > > > > > > > > > >> > > a
> > > > > > > > > > > > > >> > > > >>> very
> > > > > > > > > > > > > >> > > > >>> obvious failure case and nobody should
> > be
> > > > > > > surprised
> > > > > > > > if
> > > > > > > > > > it
> > > > > > > > > > > > didn't
> > > > > > > > > > > > > >> > > work.
> > > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we
> > should
> > > > bump
> > > > > > the
> > > > > > > > > fetch
> > > > > > > > > > > and
> > > > > > > > > > > > > >> > produce
> > > > > > > > > > > > > >> > > > >>> requests if it's a change in the
> message
> > > > > format.
> > > > > > > > > > Moreover
> > > > > > > > > > > if
> > > > > > > > > > > > > >> some
> > > > > > > > > > > > > >> > of
> > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > >> > > > >>> producers and the brokers are upgraded
> > but
> > > > > some
> > > > > > of
> > > > > > > > the
> > > > > > > > > > > > consumers
> > > > > > > > > > > > > >> > are
> > > > > > > > > > > > > >> > > > not,
> > > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the error
> when
> > > the
> > > > > old
> > > > > > > > > consumer
> > > > > > > > > > > > tries
> > > > > > > > > > > > > >> to
> > > > > > > > > > > > > >> > > > consume
> > > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I think
> > > > binding
> > > > > > the
> > > > > > > > > > > > compression
> > > > > > > > > > > > > >> type
> > > > > > > > > > > > > >> > > to
> > > > > > > > > > > > > >> > > > an
> > > > > > > > > > > > > >> > > > >>> API is not so obvious from the
> > developer's
> > > > > > > > > perspective.
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>> I would also prefer to use the
> existing
> > > > > binding,
> > > > > > > > > however
> > > > > > > > > > > we
> > > > > > > > > > > > must
> > > > > > > > > > > > > >> > > > respect
> > > > > > > > > > > > > >> > > > >>> the licenses:
> > > > > > > > > > > > > >> > > > >>> "The code for these JNI bindings is
> > > licenced
> > > > > > under
> > > > > > > > > > > 2-clause
> > > > > > > > > > > > BSD
> > > > > > > > > > > > > >> > > > license.
> > > > > > > > > > > > > >> > > > >>> The native Zstd library is licensed
> > under
> > > > > > 3-clause
> > > > > > > > BSD
> > > > > > > > > > > > license
> > > > > > > > > > > > > >> and
> > > > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > > > > > > resolved.html#category-a
> > > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD
> licenses
> > > but
> > > > > the
> > > > > > > > Apache
> > > > > > > > > > > > license
> > > > > > > > > > > > > >> is
> > > > > > > > > > > > > >> > not
> > > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that
> > the
> > > > > > > "3-clause
> > > > > > > > > BSD
> > > > > > > > > > > > license
> > > > > > > > > > > > > >> and
> > > > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > > > >> > > > >>> is really not an AND but an OR in this
> > > case,
> > > > > but
> > > > > > > I'm
> > > > > > > > > no
> > > > > > > > > > > > lawyer,
> > > > > > > > > > > > > >> > just
> > > > > > > > > > > > > >> > > > >>> wanted
> > > > > > > > > > > > > >> > > > >>> to make the point that we should watch
> > out
> > > > for
> > > > > > > > > licenses.
> > > > > > > > > > > :)
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>> Regards,
> > > > > > > > > > > > > >> > > > >>> Viktor
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan
> > > Babrou
> > > > <
> > > > > > > > > > > > ibobrik@gmail.com>
> > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >>> > Hello,
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very much
> > > support
> > > > > the
> > > > > > > > fact
> > > > > > > > > > that
> > > > > > > > > > > > zstd
> > > > > > > > > > > > > >> > > > >>> compression
> > > > > > > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>> > Please think about the environment,
> > you
> > > > can
> > > > > > save
> > > > > > > > > > quite a
> > > > > > > > > > > > lot
> > > > > > > > > > > > > >> of
> > > > > > > > > > > > > >> > > > >>> hardware
> > > > > > > > > > > > > >> > > > >>> > with it.
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin
> > > Lee <
> > > > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>> > > Since there are no responses for a
> > > > week, I
> > > > > > > > decided
> > > > > > > > > > to
> > > > > > > > > > > > > >> > reinitiate
> > > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > This KIP is about to introduce
> > > ZStandard
> > > > > > > > > Compression
> > > > > > > > > > > > into
> > > > > > > > > > > > > >> > Apache
> > > > > > > > > > > > > >> > > > >>> Kafka.
> > > > > > > > > > > > > >> > > > >>> > > The reason why it is posted again
> > has
> > > a
> > > > > > story:
> > > > > > > > It
> > > > > > > > > > was
> > > > > > > > > > > > > >> > originally
> > > > > > > > > > > > > >> > > > >>> posted
> > > > > > > > > > > > > >> > > > >>> > to
> > > > > > > > > > > > > >> > > > >>> > > the dev mailing list more than one
> > > year
> > > > > ago
> > > > > > > but
> > > > > > > > > > since
> > > > > > > > > > > it
> > > > > > > > > > > > > >> has no
> > > > > > > > > > > > > >> > > > >>> > performance
> > > > > > > > > > > > > >> > > > >>> > > report included, it was postponed
> > > later.
> > > > > But
> > > > > > > > Some
> > > > > > > > > > > people
> > > > > > > > > > > > > >> > > (including
> > > > > > > > > > > > > >> > > > >>> Ivan)
> > > > > > > > > > > > > >> > > > >>> > > reported excellent performance
> > report
> > > > with
> > > > > > the
> > > > > > > > > draft
> > > > > > > > > > > PR,
> > > > > > > > > > > > > >> this
> > > > > > > > > > > > > >> > > work
> > > > > > > > > > > > > >> > > > >>> is now
> > > > > > > > > > > > > >> > > > >>> > > reactivated.
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > The updated KIP document includes
> > some
> > > > > > > expected
> > > > > > > > > > > > problems and
> > > > > > > > > > > > > >> > > their
> > > > > > > > > > > > > >> > > > >>> > > candidate alternatives. Please
> have
> > a
> > > > look
> > > > > > > when
> > > > > > > > > you
> > > > > > > > > > > are
> > > > > > > > > > > > > >> free,
> > > > > > > > > > > > > >> > and
> > > > > > > > > > > > > >> > > > >>> give
> > > > > > > > > > > > > >> > > > >>> > me a
> > > > > > > > > > > > > >> > > > >>> > > feedback. All kinds of
> participating
> > > are
> > > > > > > > welcome.
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > Best,
> > > > > > > > > > > > > >> > > > >>> > > Dongjin
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > --
> > > > > > > > > > > > > >> > > > >>> > > *Dongjin Lee*
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > *A hitchhiker in the mathematical
> > > > world.*
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> > > *github:  <http://goog_969573159/
> > > > >github
> > > > > > > > > > > > .com/dongjinleekr
> > > > > > > > > > > > > >> > > > >>> > > <http://github.com/dongjinleekr
> > > > >linkedin:
> > > > > > > > > > > > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > > >> > > > >>> > > <http://kr.linkedin.com/in/
> > > dongjinleekr
> > > > > > > > > >slideshare:
> > > > > > > > > > > > > >> > > > >>> > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > >> > > > >>> > > <http://www.slideshare.net/
> > > dongjinleekr
> > > > >*
> > > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > > >> > > > >> --
> > > > > > > > > > > > > >> > > > >> *Dongjin Lee*
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> *A hitchhiker in the mathematical
> world.*
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >> *github:  <http://goog_969573159/
> >github
> > > > > > > > > > .com/dongjinleekr
> > > > > > > > > > > > > >> > > > >> <http://github.com/dongjinleekr
> > >linkedin:
> > > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > > >> > > > >> <
> http://kr.linkedin.com/in/dongjinleekr
> > > > > > > >slideshare:
> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > >> > > > >> <
> http://www.slideshare.net/dongjinleekr
> > >*
> > > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > --
> > > > > > > > > > > > > >> > > > > *Dongjin Lee*
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > *A hitchhiker in the mathematical
> world.*
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > > > *github:  <http://goog_969573159/>
> > > > > > > > > github.com/dongjinleekr
> > > > > > > > > > > > > >> > > > > <http://github.com/dongjinleekr
> >linkedin:
> > > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > > >> > > > > <http://kr.linkedin.com/in/dongjinleekr
> > > > > > >slideshare:
> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > >> > > > > <http://www.slideshare.net/dongjinleekr
> >*
> > > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > >> > > > *Dongjin Lee*
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > *github:  <http://goog_969573159/>github
> > > > > > > > .com/dongjinleekr
> > > > > > > > > > > > > >> > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > > > > > > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > > >> > > > <http://kr.linkedin.com/in/dongjinleekr
> > > > > >slideshare:
> > > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> --
> > > > > > > > > > > > > >> *Dongjin Lee*
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> *github:  <http://goog_969573159/>
> > > > > github.com/dongjinleekr
> > > > > > > > > > > > > >> <http://github.com/dongjinleekr>linkedin:
> > > > > > > kr.linkedin.com/in/
> > > > > > > > > > > > dongjinleekr
> > > > > > > > > > > > > >> <http://kr.linkedin.com/in/dongjinleekr
> > >slideshare:
> > > > > > > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > *Dongjin Lee*
> > > > > > > > > > > > >
> > > > > > > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > >
> > > > > > > > > > > > > *github:  <http://goog_969573159/>
> > > > github.com/dongjinleekr
> > > > > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > kr.linkedin.com/in/
> > > > > > > > > > > > dongjinleekr
> > > > > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> >slideshare:
> > > > > > > > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > *Dongjin Lee*
> > > > > > > > > > >
> > > > > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > >
> > > > > > > > > > > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/
> > > > > > > > > > dongjinleekr
> > > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > > > www.slideshare.net/
> > > > > > > > > > dongjinleekr
> > > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > *Dongjin Lee*
> > > > > > > > >
> > > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > >
> > > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/
> > > > > > > > dongjinleekr
> > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > www.slideshare.net/
> > > > > > > > dongjinleekr
> > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > > dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/
> > > dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ismael Juma <is...@gmail.com>.
Jason, that's an interesting point regarding the Java client. Do we know
what clients in other languages do in these cases?

Ismael

On Tue, 21 Aug 2018, 17:30 Jason Gustafson, <ja...@confluent.io> wrote:

> Hi Dongjin,
>
> One of the complications is that old versions of the API will not expect a
> new error code. However, since we expect this to be a fatal error anyway
> for old clients, it may still be more useful to return the correct error
> code. For example, the Kafka clients use the following code to convert the
> error code:
>
>     public static Errors forCode(short code) {
>         Errors error = codeToError.get(code);
>         if (error != null) {
>             return error;
>         } else {
>             log.warn("Unexpected error code: {}.", code);
>             return UNKNOWN_SERVER_ERROR;
>         }
>     }
>
> If we return an error code the client does not recognize, it will be converted
> to an UNKNOWN error, but at least we will get the message in the log with the
> correct code. That seems preferable to returning a misleading error code. So I
> wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even for
> older versions.
>
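A minimal, self-contained sketch of the fallback behavior described above: an old client that has no entry for a new error code logs the raw code and falls back to a generic unknown error. The value 74 used below for the proposed UNSUPPORTED_COMPRESSION_TYPE is a placeholder, not a real assignment.

```java
import java.util.HashMap;
import java.util.Map;

public class ErrorCodeFallbackSketch {
    // Codes known to a hypothetical *old* client; the new code is deliberately absent.
    private static final Map<Short, String> CODE_TO_ERROR = new HashMap<>();
    static {
        CODE_TO_ERROR.put((short) 0, "NONE");
        CODE_TO_ERROR.put((short) 2, "CORRUPT_MESSAGE");
        CODE_TO_ERROR.put((short) 42, "INVALID_REQUEST");
    }

    static String forCode(short code) {
        String error = CODE_TO_ERROR.get(code);
        if (error != null) {
            return error;
        }
        // The raw code still reaches the log, which is the point made above.
        System.out.printf("Unexpected error code: %d.%n", code);
        return "UNKNOWN_SERVER_ERROR";
    }

    public static void main(String[] args) {
        // 74 stands in for the proposed UNSUPPORTED_COMPRESSION_TYPE code.
        System.out.println(forCode((short) 74));
    }
}
```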
> Also, one question just to check my understanding. I think we would only
> use this error code when we /know/ that zstd was in use and the client
> doesn't support it? This is true if either 1) the message needs
> down-conversion and we encounter a zstd compressed message, or 2) the
> topic is explicitly configured to use zstd. However, if the compression
> type is set to "producer," then the fetched data may or may not be
> compressed with zstd. In this case, we return the data to the client and
> expect it to fail parsing. Is that correct?
>
> Thanks,
> Jason
>
>
>
> On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org> wrote:
>
> > Ismael, Jason and all,
> >
> > I rewrote the backward compatibility strategy & its alternatives as
> > follows, based on Ismael & Jason's comments. Since it has not been updated
> > in the wiki yet, don't hesitate to send me a message if you have any
> > opinion on it.
> >
> > ```
> > *Backward Compatibility*
> >
> > We need to establish a backward-compatibility strategy for the case where an
> > old client subscribes to a topic using ZStandard implicitly (i.e., the
> > 'compression.type' configuration of the topic is 'producer' and the
> > producer compressed the records with ZStandard). We have the following
> > options for this situation:
> >
> > *A. Support ZStandard for the old clients which can understand v0 and v1
> > messages only.*
> >
> > This strategy necessarily requires the down-conversion of v2 messages
> > compressed with ZStandard into v0 or v1 messages, which means a
> > considerable performance degradation. So we rejected this strategy.
> >
> > *B. Bump the API version and support only clients that understand v2*
> >
> > With this approach, we can signal to the old clients that they are outdated
> > and should be upgraded. However, there are still several options for the
> > error code.
> >
> > *B.1. INVALID_REQUEST (42)*
> >
> > This option gives the client very little information; the user can be
> > confused about why a client that worked correctly in the past suddenly
> > encounters a problem. So we rejected this strategy.
> >
> > *B.2. CORRUPT_MESSAGE (2)*
> >
> > This option gives inaccurate information; the user can be surprised and
> > mistakenly conclude that the log files are broken in some way. So we rejected
> > this strategy.
> >
> > *B.3. UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
> >
> > The advantage of this approach is that we don't need to define a new error
> > code; we can simply reuse the existing one.
> >
> > The disadvantage of this approach is that it is also a little bit vague;
> > this error code was defined as part of the work for KIP-98[^1] and is now
> > returned for transaction errors.
> >
> > *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
> >
> > The advantage of this approach is that it is clear and provides an exact
> > description. The disadvantage is that we need to add a new error code.
> > ```
> >
> > *It seems like what we need to choose is now clear:
> > UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> > (B.4).*
> > The first one doesn't need a new error code, but the latter is more
> > explicit. Which one do you prefer? Since all of you have much more
> > experience and knowledge than me, I will follow your decision. The wiki
> > page will be updated to reflect the decision as well.
> >
> > Best,
> > Dongjin
> >
> > [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
> >
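A minimal sketch of how option B.4 could look on the broker side, assuming the fetch request version is bumped; the class, method names, and the minimum version value are illustrative assumptions, not the actual KafkaApis code.

```java
public class ZstdFetchGuardSketch {
    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }
    enum ErrorCode { NONE, UNSUPPORTED_COMPRESSION_TYPE }

    // Hypothetical: the first fetch request version that understands zstd.
    static final short MIN_FETCH_VERSION_FOR_ZSTD = 10;

    static ErrorCode validateFetch(short fetchVersion, CompressionType batchCompression) {
        if (batchCompression == CompressionType.ZSTD && fetchVersion < MIN_FETCH_VERSION_FOR_ZSTD) {
            // Option B.4: answer with an explicit error instead of down-converting
            // or reusing UNSUPPORTED_FOR_MESSAGE_FORMAT / CORRUPT_MESSAGE.
            return ErrorCode.UNSUPPORTED_COMPRESSION_TYPE;
        }
        return ErrorCode.NONE;
    }

    public static void main(String[] args) {
        System.out.println(validateFetch((short) 8, CompressionType.ZSTD));  // old client: error
        System.out.println(validateFetch((short) 11, CompressionType.ZSTD)); // new client: NONE
    }
}
```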
> > On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com> wrote:
> >
> > > Sounds reasonable to me.
> > >
> > > Ismael
> > >
> > > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io>
> wrote:
> > >
> > > > Hey Ismael,
> > > >
> > > > Your summary looks good to me. I think it might also be a good idea
> to
> > > add
> > > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
> > > version
> > > > bumps. We won't be able to use it for old api versions since the
> > clients
> > > > will not understand it, but we can use it going forward so that we're
> > not
> > > > stuck in a similar situation with a new message format and a new
> codec
> > to
> > > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT, but
> > it
> > > is
> > > > not as explicit.
> > > >
> > > > -Jason
> > > >
> > > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >
> > > > > Hi Dongjin and Jason,
> > > > >
> > > > > I would agree. My summary:
> > > > >
> > > > > 1. Support zstd with message format 2 only.
> > > > > 2. Bump produce and fetch request versions.
> > > > > 3. Provide broker errors whenever possible based on the request version
> > > > > and rely on clients for the cases where the broker can't validate
> > > > > efficiently (for example, a message format 2 consumer that supports the
> > > > > latest fetch version but doesn't support zstd).
> > > > >
> > > > > If there's general agreement on this, I suggest we update the KIP to
> > > > > state the proposal and to move the rejected options to their own
> > > > > section. And then start a vote!
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Dongjin,
> > > > > >
> > > > > > Yes, that's a good summary. For clients which support v2, the
> > client
> > > > can
> > > > > > parse the message format and hopefully raise a useful error
> message
> > > > > > indicating the unsupported compression type. For older clients,
> our
> > > > > options
> > > > > > are probably (1) to down-convert to the old format using no
> > > compression
> > > > > > type, or (2) to return an error code. I'm leaning toward the
> latter
> > > as
> > > > > the
> > > > > > simpler solution, but the challenge is finding a good error code.
> > Two
> > > > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The
> > > downside
> > > > > is
> > > > > > that old clients probably won't get a helpful message. However,
> at
> > > > least
> > > > > > the behavior will be consistent in the sense that all clients
> will
> > > fail
> > > > > if
> > > > > > they do not support zstandard.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <dongjin@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks Jason, I reviewed the down-conversion logic following your
> > > > > > > explanation.[^1] You mean the following routines, right?
> > > > > > >
> > > > > > > -
> > > > > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > > > > > -
> > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > > > main/java/org/apache/kafka/common/record/
> > LazyDownConversionRecords.
> > > > > > > java#L165
> > > > > > > -
> > > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > > > main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > > > >
> > > > > > > It seems like your stance is as follows:
> > > > > > >
> > > > > > > 1. In principle, Kafka does not change the compression codec when
> > > > > > > down-converting, since that requires inspecting the fetched data,
> > > > > > > which is expensive.
> > > > > > > 2. However, there are some cases where the fetched data is inspected
> > > > > > > anyway. In those cases, we can provide compression conversion from
> > > > > > > ZStandard to the classical codecs[^2].
> > > > > > >
> > > > > > > And from what I understand, the cases where a client without
> > > > > > > ZStandard support receives ZStandard-compressed records can be
> > > > > > > organized into two cases:
> > > > > > >
> > > > > > > a. The 'compression.type' configuration of the given topic is
> > > > > > > 'producer' and the producer compressed the records with ZStandard
> > > > > > > (that is, ZStandard is used implicitly).
> > > > > > > b. The 'compression.type' configuration of the given topic is
> > > > > > > 'zstd'; that is, ZStandard is used explicitly.
> > > > > > >
> > > > > > > As you stated, we don't have to handle case b specially. So, it
> > > > > > > seems like we can narrow the focus of the problem by joining case 1
> > > > > > > and case b as follows:
> > > > > > >
> > > > > > > > Given a topic with 'producer' as its 'compression.type'
> > > > > > > > configuration, ZStandard-compressed records, and an old client
> > > > > > > > without ZStandard support: is there any case where we need to
> > > > > > > > inspect the records and can change the compression type? If so,
> > > > > > > > can we provide compression-type conversion?
> > > > > > > so, can we provide compression type converting?
> > > > > > >
> > > > > > > Do I understand correctly?
> > > > > > >
> > > > > > > Best,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > [^1]: I'm sorry, I found that I was slightly misunderstanding how
> > > > > > > API versions work, after reviewing the down-conversion logic & the
> > > > > > > protocol documentation <https://kafka.apache.org/protocol>.
> > > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > > > > >
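As a side note on case a versus case b above, a tiny sketch of the only check the broker can make cheaply, i.e. without inspecting the fetched batches; the method name is made up for illustration.

```java
public class ZstdKnowledgeSketch {
    // Case b ('zstd' set on the topic) is known from configuration alone;
    // case a ('producer') would require parsing the fetched batches.
    static boolean knownZstdWithoutInspection(String topicCompressionType) {
        return "zstd".equals(topicCompressionType);
    }

    public static void main(String[] args) {
        System.out.println(knownZstdWithoutInspection("producer")); // false
        System.out.println(knownZstdWithoutInspection("zstd"));     // true
    }
}
```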
> > > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> > > jason@confluent.io>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > But in my opinion, since the client will fail on the API version
> > > > > > > > > anyway, we don't need to down-convert the messages. Isn't that
> > > > > > > > > right? So, I think we don't need to care about this case. (I'm
> > > > > > > > > sorry, I am not familiar with the down-conversion logic.)
> > > > > > > >
> > > > > > > >
> > > > > > > > Currently the broker down-converts automatically when it
> > receives
> > > > an
> > > > > > old
> > > > > > > > version of the fetch request (a version which is known to
> > predate
> > > > the
> > > > > > > > message format in use). Typically when down-converting the
> > > message
> > > > > > > format,
> > > > > > > > we use the same compression type, but there is not much point
> > in
> > > > > doing
> > > > > > so
> > > > > > > > when we know the client doesn't support it. So if zstandard
> is
> > in
> > > > > use,
> > > > > > > and
> > > > > > > > we have to down-convert anyway, then we can choose to use a
> > > > different
> > > > > > > > compression type or no compression type.
> > > > > > > >
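A sketch of the choice described above, assuming the batch is already being re-encoded for an old message format; the fallback to no compression is an illustrative assumption, not what the actual down-conversion code does.

```java
public class DownConvertCompressionSketch {
    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }

    // Pick the codec to use when a batch is re-encoded during down-conversion.
    static CompressionType targetCompression(CompressionType source, boolean clientSupportsZstd) {
        if (source == CompressionType.ZSTD && !clientSupportsZstd) {
            return CompressionType.NONE; // or any codec the old client understands
        }
        return source; // otherwise keep the producer's codec unchanged
    }

    public static void main(String[] args) {
        System.out.println(targetCompression(CompressionType.ZSTD, false)); // NONE
        System.out.println(targetCompression(CompressionType.LZ4, false));  // LZ4
    }
}
```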
> > > > > > > > From my perspective, there is no significant downside to
> > bumping
> > > > the
> > > > > > > > protocol version and it has several potential benefits.
> Version
> > > > bumps
> > > > > > are
> > > > > > > > cheap. The main question mark in my mind is about
> > > down-conversion.
> > > > > > > Figuring
> > > > > > > > out whether down-conversion is needed is hard generally
> without
> > > > > > > inspecting
> > > > > > > > the fetched data, which is expensive. I think we agree in
> > > principle
> > > > > > that
> > > > > > > we
> > > > > > > > do not want to have to pay this cost generally and prefer the
> > > > clients
> > > > > > to
> > > > > > > > fail when they see an unhandled compression type. The point I
> > was
> > > > > > making
> > > > > > > is
> > > > > > > > that there are some cases where we are either inspecting the
> > data
> > > > > > anyway
> > > > > > > > (because we have to down-convert the message format), or we
> > have
> > > an
> > > > > > easy
> > > > > > > > way to tell whether zstandard is in use (the topic has it
> > > > configured
> > > > > > > > explicitly). In the latter case, we don't have to handle it
> > > > > specially.
> > > > > > > But
> > > > > > > > we do have to decide how we will handle down-conversion to
> > older
> > > > > > formats.
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Colin and Jason,
> > > > > > > > >
> > > > > > > > > Thanks for your opinions. To summarize, the pros and cons of
> > > > > > > > > bumping the fetch API version are:
> > > > > > > > >
> > > > > > > > > Cons:
> > > > > > > > >
> > > > > > > > > - The broker can't know whether a given message batch is
> > > > > > > > > compressed with zstd or not.
> > > > > > > > > - Some additional logic is needed for topics explicitly
> > > > > > > > > configured to use zstd.
> > > > > > > > >
> > > > > > > > > Pros:
> > > > > > > > >
> > > > > > > > > - The broker doesn't need to perform expensive down-conversion.
> > > > > > > > > - We can tell users to update their clients.
> > > > > > > > >
> > > > > > > > > So far, the opinions on the backward-compatibility policy are:
> > > > > > > > >
> > > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > > > > >
> > > > > > > > > Here are my additional comments:
> > > > > > > > >
> > > > > > > > > @Colin
> > > > > > > > >
> > > > > > > > > I greatly appreciate your response. In the case of dictionary
> > > > > > > > > support, of course, this issue should be addressed later, so we
> > > > > > > > > don't need it in the first version. You are right - it is not too
> > > > > > > > > late to try it after some benchmarks. What I mean is, we should
> > > > > > > > > keep that potential feature in mind.
> > > > > > > > >
> > > > > > > > > @Jason
> > > > > > > > >
> > > > > > > > > You wrote,
> > > > > > > > >
> > > > > > > > > > Similarly, if we have to down-convert anyway because the
> > > client
> > > > > > does
> > > > > > > > not
> > > > > > > > > understand the message format, then we could also use a
> > > different
> > > > > > > > > compression type.
> > > > > > > > >
> > > > > > > > > But in my opinion, since the client will fail on the API version
> > > > > > > > > anyway, we don't need to down-convert the messages. Isn't that
> > > > > > > > > right? So, I think we don't need to care about this case. (I'm
> > > > > > > > > sorry, I am not familiar with the down-conversion logic.)
> > > > > > > > >
> > > > > > > > > Please give more opinions. Thanks!
> > > > > > > > >
> > > > > > > > > - Dongjin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> > > > jason@confluent.io
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hey Colin,
> > > > > > > > > >
> > > > > > > > > > The problem for the fetch API is that the broker does not
> > > > > generally
> > > > > > > > know
> > > > > > > > > if
> > > > > > > > > > a batch was compressed with zstd unless it parses it. I
> > think
> > > > the
> > > > > > > goal
> > > > > > > > > here
> > > > > > > > > > is to avoid the expensive down-conversion that is needed
> to
> > > > > ensure
> > > > > > > > > > compatibility because it is only necessary if zstd is
> > > actually
> > > > in
> > > > > > > use.
> > > > > > > > > But
> > > > > > > > > > as long as old clients can parse the message format, they
> > > > should
> > > > > > get
> > > > > > > a
> > > > > > > > > > reasonable error if they see an unsupported compression
> > type
> > > in
> > > > > the
> > > > > > > > > > attributes. Basically the onus is on users to ensure that
> > > their
> > > > > > > > consumers
> > > > > > > > > > have been updated prior to using zstd. It seems like a
> > > > reasonable
> > > > > > > > > tradeoff
> > > > > > > > > > to me. There are a couple cases that might be worth
> > thinking
> > > > > > through:
> > > > > > > > > >
> > > > > > > > > > 1. If a topic is explicitly configured to use zstd, then
> we
> > > > don't
> > > > > > > need
> > > > > > > > to
> > > > > > > > > > check the fetched data for the compression type to know
> if
> > we
> > > > > need
> > > > > > > > > > down-conversion. If we did bump the Fetch API version,
> then
> > > we
> > > > > > could
> > > > > > > > > handle
> > > > > > > > > > this case by either down-converting using a different
> > > > compression
> > > > > > > type
> > > > > > > > or
> > > > > > > > > > returning an error.
> > > > > > > > > > 2. Similarly, if we have to down-convert anyway because
> the
> > > > > client
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > understand the message format, then we could also use a
> > > > different
> > > > > > > > > > compression type.
> > > > > > > > > >
> > > > > > > > > > For the produce API, I think it's reasonable to bump the
> > api
> > > > > > version.
> > > > > > > > > This
> > > > > > > > > > can be used by clients to check whether a broker supports
> > > zstd.
> > > > > For
> > > > > > > > > > example, we might support a list of preferred compression
> > > types
> > > > > in
> > > > > > > the
> > > > > > > > > > producer and we could use the broker to detect which
> > version
> > > to
> > > > > > use.
> > > > > > > > > >
> > > > > > > > > > -Jason
> > > > > > > > > >
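A sketch of that idea: the producer walks a preference list and skips zstd when the broker's advertised produce version is too old. The threshold value 7 is an assumption for illustration, not a decided number.

```java
import java.util.List;

public class PreferredCompressionSketch {
    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD }

    static final short MIN_PRODUCE_VERSION_FOR_ZSTD = 7; // hypothetical threshold

    // Return the first preferred codec the broker can accept, based on the
    // maximum produce request version it advertises (e.g. via ApiVersions).
    static CompressionType choose(List<CompressionType> preferred, short maxProduceVersion) {
        for (CompressionType type : preferred) {
            if (type != CompressionType.ZSTD || maxProduceVersion >= MIN_PRODUCE_VERSION_FOR_ZSTD) {
                return type;
            }
        }
        return CompressionType.NONE;
    }

    public static void main(String[] args) {
        List<CompressionType> prefs = List.of(CompressionType.ZSTD, CompressionType.LZ4);
        System.out.println(choose(prefs, (short) 5)); // older broker: LZ4
        System.out.println(choose(prefs, (short) 7)); // newer broker: ZSTD
    }
}
```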
> > > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> > > > cmccabe@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
> > > compression
> > > > > > codec
> > > > > > > > > and I
> > > > > > > > > > > hope we can get this support in soon!
> > > > > > > > > > >
> > > > > > > > > > > I would say we can just bump the API version to
> indicate
> > > that
> > > > > > ZStd
> > > > > > > > > > support
> > > > > > > > > > > is expected in new clients.  We probably need some way
> of
> > > > > > > indicating
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > > older clients that they can't consume the partitions,
> as
> > > > well.
> > > > > > > > Perhaps
> > > > > > > > > > we
> > > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > > > > > >
> > > > > > > > > > > The license thing seems straightforward -- it's just a
> > > matter
> > > > > of
> > > > > > > > adding
> > > > > > > > > > > the text to the right files as per ASF guidelines.
> > > > > > > > > > >
> > > > > > > > > > > With regard to the dictionary support, do we really
> need
> > > that
> > > > > in
> > > > > > > the
> > > > > > > > > > first
> > > > > > > > > > > version?  Hopefully message batches are big enough that
> > > this
> > > > > > isn't
> > > > > > > > > > needed.
> > > > > > > > > > > Some benchmarks might help here.
> > > > > > > > > > >
> > > > > > > > > > > best,
> > > > > > > > > > > Colin
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue,
> > > > KIP-110
> > > > > > > > > > > > <https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-
> > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > > > > > .
> > > > > > > > > > > >
> > > > > > > > > > > > For newcomers, here is a short summary of the history:
> > > > > > > > > > > > KIP-110 was originally created for the issue KAFKA-4514,
> > > > > > > > > > > > but it lacked benchmark results to get the agreement of the
> > > > > > > > > > > > community. Later, Ivan Babrou and some other users who
> > > > > > > > > > > > adopted the patch provided their excellent performance
> > > > > > > > > > > > report, which is now included in the KIP, but it was
> > > > > > > > > > > > postponed again because the community was busy with the
> > > > > > > > > > > > 2.0.0 release. That is why I am now rebooting this issue.
> > > > > > > > > > > >
> > > > > > > > > > > > The following is the current status of the feature:
> You
> > > can
> > > > > > check
> > > > > > > > the
> > > > > > > > > > > > current draft implementation here
> > > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is
> > based
> > > > on
> > > > > > zstd
> > > > > > > > > 1.3.5
> > > > > > > > > > > and
> > > > > > > > > > > > periodically rebased onto the latest trunk[^1].
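For readers who have not used the bindings, a minimal round-trip with zstd-jni (the library the draft PR builds on) might look roughly like this; treat the exact signatures as assumptions and check the javadoc of the bundled version.

```java
import com.github.luben.zstd.Zstd;

import java.nio.charset.StandardCharsets;

public class ZstdRoundTripSketch {
    public static void main(String[] args) {
        byte[] original = "a batch of record bytes".getBytes(StandardCharsets.UTF_8);

        byte[] compressed = Zstd.compress(original, 3);                 // level 3 is an arbitrary choice
        byte[] restored = Zstd.decompress(compressed, original.length); // caller supplies the original size

        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```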
> > > > > > > > > > > >
> > > > > > > > > > > > The issues that should be addressed are as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > *1. Backward Compatibility*
> > > > > > > > > > > >
> > > > > > > > > > > > To support old consumers, we need a strategy for handling
> > > > > > > > > > > > them. The current candidates are:
> > > > > > > > > > > >
> > > > > > > > > > > > - Bump API version
> > > > > > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > > > > > - Improve the error messages.
> > > > > > > > > > > >
> > > > > > > > > > > > *2. Dictionary Support*
> > > > > > > > > > > >
> > > > > > > > > > > > To support zstd's dictionary feature in the future (if
> > > > > > > > > > > > needed), we need to sketch how it should work and leave
> > > > > > > > > > > > some room for it. As of now, there has been no discussion
> > > > > > > > > > > > on this topic yet.
> > > > > > > > > > > >
> > > > > > > > > > > > *3. License*
> > > > > > > > > > > >
> > > > > > > > > > > > To use this feature, we need to add the licenses of zstd
> > > > > > > > > > > > and zstd-jni to the project. (Thanks to Viktor Somogyi for
> > > > > > > > > > > > raising this issue!) It seems like what Apache Spark did
> > > > > > > > > > > > would be a good example, but there has been no discussion
> > > > > > > > > > > > yet.
> > > > > > > > > > > >
> > > > > > > > > > > > You can find the details of the above issues in the KIP
> > > > > > > > > > > > document. Please have a look when you are free, and give me
> > > > > > > > > > > > feedback. All kinds of participation are welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Dongjin
> > > > > > > > > > > >
> > > > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > > > > > dongjin@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Sorry for the late reply.
> > > > > > > > > > > > >
> > > > > > > > > > > > > In short, I could not submit the updated KIP by the
> > > > feature
> > > > > > > > freeze
> > > > > > > > > > > > > deadline of 2.0.0. For this reason, it will not be
> > > > included
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > 2.0.0
> > > > > > > > > > > > > release and all discussion for this issue were
> > > postponed
> > > > > > after
> > > > > > > > the
> > > > > > > > > > > release
> > > > > > > > > > > > > of 2.0.0.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have been updating the PR following recent
> updates.
> > > > Just
> > > > > > > now, I
> > > > > > > > > > > rebased
> > > > > > > > > > > > > it against the latest trunk and updated the zstd
> > > version
> > > > > into
> > > > > > > > > 1.3.5.
> > > > > > > > > > > If you
> > > > > > > > > > > > > need some request, don't hesitate to notify me.
> (But
> > > not
> > > > > this
> > > > > > > > > thread
> > > > > > > > > > -
> > > > > > > > > > > just
> > > > > > > > > > > > > send me the message directly.)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Dongjin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > > > > > bobby@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> I there any update on this.  The performance
> > > > improvements
> > > > > > are
> > > > > > > > > quite
> > > > > > > > > > > > >> impressive and I really would like to stop forking
> > > kafka
> > > > > > just
> > > > > > > to
> > > > > > > > > get
> > > > > > > > > > > this
> > > > > > > > > > > > >> in.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Bobby
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > > > > > dongjin@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> > Ismael,
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Oh, I forgot all of you are on working frenzy
> for
> > > 2.0!
> > > > > No
> > > > > > > > > problem,
> > > > > > > > > > > take
> > > > > > > > > > > > >> > your time. I am also working at another issue
> now.
> > > > Thank
> > > > > > you
> > > > > > > > for
> > > > > > > > > > > > >> letting me
> > > > > > > > > > > > >> > know.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Best,
> > > > > > > > > > > > >> > Dongjin
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > > > > > ismaelj@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy
> > > > > finalising
> > > > > > > > > 2.0.0.
> > > > > > > > > > > This
> > > > > > > > > > > > >> KIP
> > > > > > > > > > > > >> > > seems like a great candidate for 2.1.0 and
> > > hopefully
> > > > > > there
> > > > > > > > > will
> > > > > > > > > > be
> > > > > > > > > > > > >> more
> > > > > > > > > > > > >> > of
> > > > > > > > > > > > >> > > a discussion next week. :)
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Ismael
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > > > > > dongjin@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > > Hello. I just updated my draft
> implementation:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > You can check out the implementation from
> here
> > > > > > > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267
> >.
> > If
> > > > you
> > > > > > > > > > experience
> > > > > > > > > > > any
> > > > > > > > > > > > >> > > problem
> > > > > > > > > > > > >> > > > running it, don't hesitate to give me a
> > mention.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Best,
> > > > > > > > > > > > >> > > > Dongjin
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee
> <
> > > > > > > > > > dongjin@apache.org
> > > > > > > > > > > >
> > > > > > > > > > > > >> > wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > > Here is the short conclusion about the
> > license
> > > > > > > problem:
> > > > > > > > > *We
> > > > > > > > > > > can
> > > > > > > > > > > > >> use
> > > > > > > > > > > > >> > > zstd
> > > > > > > > > > > > >> > > > > and zstd-jni without any problem, but we
> > need
> > > to
> > > > > > > include
> > > > > > > > > > their
> > > > > > > > > > > > >> > license,
> > > > > > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause
> > > License
> > > > > > > requires
> > > > > > > > > to
> > > > > > > > > > > > >> include
> > > > > > > > > > > > >> > the
> > > > > > > > > > > > >> > > > > license used, and BSD 3 Clause License
> > > requires
> > > > > that
> > > > > > > the
> > > > > > > > > > name
> > > > > > > > > > > of
> > > > > > > > > > > > >> the
> > > > > > > > > > > > >> > > > > contributor can't be used to endorse or
> > > promote
> > > > > the
> > > > > > > > > product.
> > > > > > > > > > > > >> That's
> > > > > > > > > > > > >> > it
> > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > - They are not listed in the list of
> > > prohibited
> > > > > > > licenses
> > > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > > > > > resolved.html#category-x>
> > > > > > > > > > also.
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > > > > > >> > > > > <https://issues.apache.org/
> > > > > jira/browse/SPARK-19112
> > > > > > >:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > - They made a directory dedicated to the
> > > > > dependency
> > > > > > > > > license
> > > > > > > > > > > files
> > > > > > > > > > > > >> > > > > <
> > > > > > https://github.com/apache/spark/tree/master/licenses
> > > > > > > >
> > > > > > > > > and
> > > > > > > > > > > added
> > > > > > > > > > > > >> > > > licenses
> > > > > > > > > > > > >> > > > > for Zstd
> > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > >> >
> > > https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > > > > LICENSE-zstd.txt
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > &
> > > > > > > > > > > > >> > > > > Zstd-jni
> > > > > > > > > > > > >> > > > > <
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> https://github.com/apache/
> > spark/blob/master/licenses/
> > > > > > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > > > .
> > > > > > > > > > > > >> > > > > - Added a link to the original license
> files
> > > in
> > > > > > > LICENSE.
> > > > > > > > > > > > >> > > > > <
> > > > https://github.com/apache/spark/pull/18805/files
> > > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Thanks for pointing out this problem,
> > Viktor!
> > > > Nice
> > > > > > > > catch!
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > Best,
> > > > > > > > > > > > >> > > > > Dongjin
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin
> > Lee <
> > > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> > > > > reasoning.
> > > > > > > so:
> > > > > > > > +1
> > > > > > > > > > > for b
> > > > > > > > > > > > >> > until
> > > > > > > > > > > > >> > > > now.
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> For the license issues, I will have a
> check
> > > on
> > > > > how
> > > > > > > the
> > > > > > > > > over
> > > > > > > > > > > > >> projects
> > > > > > > > > > > > >> > > are
> > > > > > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> Best,
> > > > > > > > > > > > >> > > > >> Dongjin
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor
> > > > Somogyi <
> > > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > > > > > >> > > > >>> I would vote for option b. in the
> > "backward
> > > > > > > > > compatibility"
> > > > > > > > > > > > >> section.
> > > > > > > > > > > > >> > > My
> > > > > > > > > > > > >> > > > >>> reasoning for this is that users
> upgrading
> > > to
> > > > a
> > > > > > zstd
> > > > > > > > > > > compatible
> > > > > > > > > > > > >> > > version
> > > > > > > > > > > > >> > > > >>> won't start to use it automatically, so
> > > manual
> > > > > > > > > > > reconfiguration
> > > > > > > > > > > > >> is
> > > > > > > > > > > > >> > > > >>> required.
> > > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the
> > > > cluster.
> > > > > If
> > > > > > > not
> > > > > > > > > all
> > > > > > > > > > > the
> > > > > > > > > > > > >> > > clients
> > > > > > > > > > > > >> > > > >>> are
> > > > > > > > > > > > >> > > > >>> upgraded but just some of them and
> they'd
> > > > start
> > > > > to
> > > > > > > use
> > > > > > > > > > zstd
> > > > > > > > > > > > >> then it
> > > > > > > > > > > > >> > > > would
> > > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd like to
> > > > presume
> > > > > > > > though
> > > > > > > > > > that
> > > > > > > > > > > > >> this
> > > > > > > > > > > > >> > is
> > > > > > > > > > > > >> > > a
> > > > > > > > > > > > >> > > > >>> very
> > > > > > > > > > > > >> > > > >>> obvious failure case and nobody should
> be
> > > > > > surprised
> > > > > > > if
> > > > > > > > > it
> > > > > > > > > > > didn't
> > > > > > > > > > > > >> > > work.
> > > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we
> should
> > > bump
> > > > > the
> > > > > > > > fetch
> > > > > > > > > > and
> > > > > > > > > > > > >> > produce
> > > > > > > > > > > > >> > > > >>> requests if it's a change in the message
> > > > format.
> > > > > > > > > Moreover
> > > > > > > > > > if
> > > > > > > > > > > > >> some
> > > > > > > > > > > > >> > of
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > >>> producers and the brokers are upgraded
> but
> > > > some
> > > > > of
> > > > > > > the
> > > > > > > > > > > consumers
> > > > > > > > > > > > >> > are
> > > > > > > > > > > > >> > > > not,
> > > > > > > > > > > > >> > > > >>> then we wouldn't prevent the error when
> > the
> > > > old
> > > > > > > > consumer
> > > > > > > > > > > tries
> > > > > > > > > > > > >> to
> > > > > > > > > > > > >> > > > consume
> > > > > > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I think
> > > binding
> > > > > the
> > > > > > > > > > > compression
> > > > > > > > > > > > >> type
> > > > > > > > > > > > >> > > to
> > > > > > > > > > > > >> > > > an
> > > > > > > > > > > > >> > > > >>> API is not so obvious from the
> developer's
> > > > > > > > perspective.
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>> I would also prefer to use the existing
> > > > binding,
> > > > > > > > however
> > > > > > > > > > we
> > > > > > > > > > > must
> > > > > > > > > > > > >> > > > respect
> > > > > > > > > > > > >> > > > >>> the licenses:
> > > > > > > > > > > > >> > > > >>> "The code for these JNI bindings is
> > licenced
> > > > > under
> > > > > > > > > > 2-clause
> > > > > > > > > > > BSD
> > > > > > > > > > > > >> > > > license.
> > > > > > > > > > > > >> > > > >>> The native Zstd library is licensed
> under
> > > > > 3-clause
> > > > > > > BSD
> > > > > > > > > > > license
> > > > > > > > > > > > >> and
> > > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > > > > > resolved.html#category-a
> > > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses
> > but
> > > > the
> > > > > > > Apache
> > > > > > > > > > > license
> > > > > > > > > > > > >> is
> > > > > > > > > > > > >> > not
> > > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that
> the
> > > > > > "3-clause
> > > > > > > > BSD
> > > > > > > > > > > license
> > > > > > > > > > > > >> and
> > > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > > >> > > > >>> is really not an AND but an OR in this
> > case,
> > > > but
> > > > > > I'm
> > > > > > > > no
> > > > > > > > > > > lawyer,
> > > > > > > > > > > > >> > just
> > > > > > > > > > > > >> > > > >>> wanted
> > > > > > > > > > > > >> > > > >>> to make the point that we should watch
> out
> > > for
> > > > > > > > licenses.
> > > > > > > > > > :)
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>> Regards,
> > > > > > > > > > > > >> > > > >>> Viktor
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan
> > Babrou
> > > <
> > > > > > > > > > > ibobrik@gmail.com>
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >>> > Hello,
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>> > This is Ivan and I still very much
> > support
> > > > the
> > > > > > > fact
> > > > > > > > > that
> > > > > > > > > > > zstd
> > > > > > > > > > > > >> > > > >>> compression
> > > > > > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>> > Please think about the environment,
> you
> > > can
> > > > > save
> > > > > > > > > quite a
> > > > > > > > > > > lot
> > > > > > > > > > > > >> of
> > > > > > > > > > > > >> > > > >>> hardware
> > > > > > > > > > > > >> > > > >>> > with it.
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin
> > Lee <
> > > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>> > > Since there are no responses for a
> > > week, I
> > > > > > > decided
> > > > > > > > > to
> > > > > > > > > > > > >> > reinitiate
> > > > > > > > > > > > >> > > > the
> > > > > > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > This KIP is about to introduce
> > ZStandard
> > > > > > > > Compression
> > > > > > > > > > > into
> > > > > > > > > > > > >> > Apache
> > > > > > > > > > > > >> > > > >>> Kafka.
> > > > > > > > > > > > >> > > > >>> > > The reason why it is posted again
> has
> > a
> > > > > story:
> > > > > > > It
> > > > > > > > > was
> > > > > > > > > > > > >> > originally
> > > > > > > > > > > > >> > > > >>> posted
> > > > > > > > > > > > >> > > > >>> > to
> > > > > > > > > > > > >> > > > >>> > > the dev mailing list more than one
> > year
> > > > ago
> > > > > > but
> > > > > > > > > since
> > > > > > > > > > it
> > > > > > > > > > > > >> has no
> > > > > > > > > > > > >> > > > >>> > performance
> > > > > > > > > > > > >> > > > >>> > > report included, it was postponed
> > later.
> > > > But
> > > > > > > Some
> > > > > > > > > > people
> > > > > > > > > > > > >> > > (including
> > > > > > > > > > > > >> > > > >>> Ivan)
> > > > > > > > > > > > >> > > > >>> > > reported excellent performance
> report
> > > with
> > > > > the
> > > > > > > > draft
> > > > > > > > > > PR,
> > > > > > > > > > > > >> this
> > > > > > > > > > > > >> > > work
> > > > > > > > > > > > >> > > > >>> is now
> > > > > > > > > > > > >> > > > >>> > > reactivated.
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > The updated KIP document includes
> some
> > > > > > expected
> > > > > > > > > > > problems and
> > > > > > > > > > > > >> > > their
> > > > > > > > > > > > >> > > > >>> > > candidate alternatives. Please have
> a
> > > look
> > > > > > when
> > > > > > > > you
> > > > > > > > > > are
> > > > > > > > > > > > >> free,
> > > > > > > > > > > > >> > and
> > > > > > > > > > > > >> > > > >>> give
> > > > > > > > > > > > >> > > > >>> > me a
> > > > > > > > > > > > >> > > > >>> > > feedback. All kinds of participating
> > are
> > > > > > > welcome.
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > Best,
> > > > > > > > > > > > >> > > > >>> > > Dongjin
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > --
> > > > > > > > > > > > >> > > > >>> > > *Dongjin Lee*
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > *A hitchhiker in the mathematical
> > > world.*
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> > > *github:  <http://goog_969573159/
> > > >github
> > > > > > > > > > > .com/dongjinleekr
> > > > > > > > > > > > >> > > > >>> > > <http://github.com/dongjinleekr
> > > >linkedin:
> > > > > > > > > > > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > >> > > > >>> > > <http://kr.linkedin.com/in/
> > dongjinleekr
> > > > > > > > >slideshare:
> > > > > > > > > > > > >> > > > >>> > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > >> > > > >>> > > <http://www.slideshare.net/
> > dongjinleekr
> > > >*
> > > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > > >> > > > >> --
> > > > > > > > > > > > >> > > > >> *Dongjin Lee*
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >> *github:  <http://goog_969573159/>github
> > > > > > > > > .com/dongjinleekr
> > > > > > > > > > > > >> > > > >> <http://github.com/dongjinleekr
> >linkedin:
> > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > >> > > > >> <http://kr.linkedin.com/in/dongjinleekr
> > > > > > >slideshare:
> > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > >> > > > >> <http://www.slideshare.net/dongjinleekr
> >*
> > > > > > > > > > > > >> > > > >>
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > --
> > > > > > > > > > > > >> > > > > *Dongjin Lee*
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > > > *github:  <http://goog_969573159/>
> > > > > > > > github.com/dongjinleekr
> > > > > > > > > > > > >> > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > > > > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > >> > > > > <http://kr.linkedin.com/in/dongjinleekr
> > > > > >slideshare:
> > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > >> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > > >> > > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > >> > > > *Dongjin Lee*
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > *github:  <http://goog_969573159/>github
> > > > > > > .com/dongjinleekr
> > > > > > > > > > > > >> > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > > > > > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > > > > > >> > > > <http://kr.linkedin.com/in/dongjinleekr
> > > > >slideshare:
> > > > > > > > > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> --
> > > > > > > > > > > > >> *Dongjin Lee*
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> *A hitchhiker in the mathematical world.*
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> *github:  <http://goog_969573159/>
> > > > github.com/dongjinleekr
> > > > > > > > > > > > >> <http://github.com/dongjinleekr>linkedin:
> > > > > > kr.linkedin.com/in/
> > > > > > > > > > > dongjinleekr
> > > > > > > > > > > > >> <http://kr.linkedin.com/in/dongjinleekr
> >slideshare:
> > > > > > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > *Dongjin Lee*
> > > > > > > > > > > >
> > > > > > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > > > >
> > > > > > > > > > > > *github:  <http://goog_969573159/>
> > > github.com/dongjinleekr
> > > > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > kr.linkedin.com/in/
> > > > > > > > > > > dongjinleekr
> > > > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > *Dongjin Lee*
> > > > > > > > > >
> > > > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > > > >
> > > > > > > > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/
> > > > > > > > > dongjinleekr
> > > > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > > www.slideshare.net/
> > > > > > > > > dongjinleekr
> > > > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Dongjin Lee*
> > > > > > > >
> > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > >
> > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/
> > > > > > > dongjinleekr
> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > www.slideshare.net/
> > > > > > > dongjinleekr
> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/
> > dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Dongjin,

One of the complications is that old versions of the API will not expect a
new error code. However, since we expect this to be a fatal error anyway
for old clients, it may still be more useful to return the correct error
code. For example, the Kafka clients use the following code to convert the
error code:

    public static Errors forCode(short code) {
        Errors error = codeToError.get(code);
        if (error != null) {
            return error;
        } else {
            log.warn("Unexpected error code: {}.", code);
            return UNKNOWN_SERVER_ERROR;
        }
    }

If we return an error code the old client does not recognize, it will be
converted to UNKNOWN_SERVER_ERROR, but at least the log message will contain
the correct code. That seems preferable to returning a misleading error code.
So I wonder if we can use the new UNSUPPORTED_COMPRESSION_TYPE error even for
older versions.
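
For example (a rough sketch, not from the actual patch -- the numeric value
below is a placeholder, since the real code would only be assigned when the
new error is added):

    // Hypothetical: suppose UNSUPPORTED_COMPRESSION_TYPE were assigned code 76.
    short unsupportedCompressionType = 76;
    Errors error = Errors.forCode(unsupportedCompressionType);
    // An old client logs "Unexpected error code: 76." and then treats the
    // response as Errors.UNKNOWN_SERVER_ERROR, which is fatal for it anyway.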

Also, one question just to check my understanding. I think we would only
use this error code when we /know/ that zstd was in use and the client
doesn't support it? This is true either when 1) the message needs
down-conversion and we encounter a zstd-compressed message, or 2) the
topic is explicitly configured to use zstd. However, if the compression
type is set to "producer," then the fetched data may or may not be
compressed with zstd. In this case, we return the data to the client and
expect it to fail parsing. Is that correct?
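
If so, the broker-side check could stay cheap. A minimal sketch of that
decision, using made-up helper names and parameters (this is not the actual
KafkaApis logic, and both CompressionType.ZSTD and the new error code are the
proposed additions, just an illustration of the cases above):

    // Sketch only: FIRST_ZSTD_FETCH_VERSION and all parameters are hypothetical.
    Errors checkZstdCompatibility(short fetchVersion,
                                  CompressionType topicCompressionType,
                                  boolean needsDownConversion,
                                  CompressionType batchCompressionType) {
        boolean clientSupportsZstd = fetchVersion >= FIRST_ZSTD_FETCH_VERSION;
        if (!clientSupportsZstd) {
            if (topicCompressionType == CompressionType.ZSTD)
                return Errors.UNSUPPORTED_COMPRESSION_TYPE;  // topic explicitly uses zstd
            if (needsDownConversion && batchCompressionType == CompressionType.ZSTD)
                return Errors.UNSUPPORTED_COMPRESSION_TYPE;  // batches are inspected anyway
        }
        // compression.type = "producer" and no down-conversion needed: return
        // the data unchanged and let the client fail when parsing the batch.
        return Errors.NONE;
    }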

Thanks,
Jason



On Tue, Aug 21, 2018 at 9:08 AM, Dongjin Lee <do...@apache.org> wrote:

> Ismael, Jason and all,
>
> I rewrote the backward compatibility strategy & its alternatives like
> following, based on Ismael & Jason's comments. Since it is not updated to
> the wiki yet, don't hesitate to give me a message if you have any opinion
> on it.
>
> ```
> *Backward Compatibility*
>
> We need to establish a backward-compatibility strategy for the case where an
> old client subscribes to a topic that uses ZStandard implicitly (i.e., the
> 'compression.type' configuration of the given topic is 'producer' and the
> producer compressed the records with ZStandard). We have the following
> options for this situation:
>
> *A. Support ZStandard to the old clients which can understand v0, v1
> messages only.*
>
> This strategy necessarily requires the down-conversion of v2 messages
> compressed with ZStandard into v0 or v1 messages, which means considerable
> performance degradation. So we rejected this strategy.
>
> *B. Bump the API version and support only v2-available clients*
>
> With this approach, we can signal to old clients that they are outdated and
> should be upgraded. However, there are still several options for the error
> code.
>
> *B.1. INVALID_REQUEST (42)*
>
> This option gives the client too little information; the user can be
> confused about why a client that worked correctly in the past suddenly
> encounters a problem. So we rejected this strategy.
>
> *B.2. CORRUPT_MESSAGE (2)*
>
> This option gives inaccurate information; the user can be surprised and
> mistakenly conclude that the log files are corrupted in some way. So we
> rejected this strategy.
>
> *B.3 UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*
>
> The advantage of this approach is that we don't need to define a new error
> code; we can simply reuse an existing one.
>
> The disadvantage of this approach is that it is also somewhat vague; this
> error code was introduced as part of KIP-98[^1] and is currently returned for
> transaction errors.
>
> *B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*
>
> The advantage of this approach is that it is clear and provides an exact
> description. The disadvantage is that we need to add a new error code.
> ```
>
> *It seems like the choice is now clear:
> UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or UNSUPPORTED_COMPRESSION_TYPE
> (B.4).*
> The first one doesn't require a new error code, but the latter is more
> explicit. Which one do you prefer? Since all of you have much more
> experience and knowledge than I do, I will follow your decision. The wiki
> page will also be updated to reflect the decision.
>
> Best,
> Dongjin
>
> [^1]: https://issues.apache.org/jira/browse/KAFKA-4990
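
Regarding B.3 vs B.4 above: either way the protocol-side change is small. A
simplified sketch of what B.4 would add (this is not the actual Errors enum
declaration, and the numeric value for the new code is a placeholder):

    // Illustration only -- the real change would extend
    // org.apache.kafka.common.protocol.Errors with a new constant.
    public enum FetchErrorSketch {
        UNSUPPORTED_FOR_MESSAGE_FORMAT(43),  // B.3: reuse the existing code
        UNSUPPORTED_COMPRESSION_TYPE(76);    // B.4: new explicit code (placeholder value)

        public final short code;

        FetchErrorSketch(int code) {
            this.code = (short) code;
        }
    }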
>
> On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com> wrote:
>
> > Sounds reasonable to me.
> >
> > Ismael
> >
> > On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io> wrote:
> >
> > > Hey Ismael,
> > >
> > > Your summary looks good to me. I think it might also be a good idea to
> > add
> > > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
> > version
> > > bumps. We won't be able to use it for old api versions since the
> clients
> > > will not understand it, but we can use it going forward so that we're
> not
> > > stuck in a similar situation with a new message format and a new codec
> to
> > > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT, but
> it
> > is
> > > not as explicit.
> > >
> > > -Jason
> > >
> > > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk>
> wrote:
> > >
> > > > Hi Dongjin and Jason,
> > > >
> > > > I would agree. My summary:
> > > >
> > > > 1. Support zstd with message format 2 only.
> > > > 2. Bump produce and fetch request versions.
> > > > 3. Provide broker errors whenever possible based on the request
> version
> > > and
> > > > rely on clients for the cases where the broker can't validate
> > efficiently
> > > > (example message format 2 consumer that supports the latest fetch
> > version
> > > > but doesn't support zstd).
> > > >
> > > > If there's general agreement on this, I suggest we update the KIP to
> > > state
> > > > the proposal and to move the rejected options to its own section. And
> > > then
> > > > start a vote!
> > > >
> > > > Ismael
> > > >
> > > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > Hi Dongjin,
> > > > >
> > > > > Yes, that's a good summary. For clients which support v2, the
> client
> > > can
> > > > > parse the message format and hopefully raise a useful error message
> > > > > indicating the unsupported compression type. For older clients, our
> > > > options
> > > > > are probably (1) to down-convert to the old format using no
> > compression
> > > > > type, or (2) to return an error code. I'm leaning toward the latter
> > as
> > > > the
> > > > > simpler solution, but the challenge is finding a good error code.
> Two
> > > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The
> > downside
> > > > is
> > > > > that old clients probably won't get a helpful message. However, at
> > > least
> > > > > the behavior will be consistent in the sense that all clients will
> > fail
> > > > if
> > > > > they do not support zstandard.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thanks,
> > > > > Jason
> > > > >
> > > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > Thanks Jason, I reviewed the down-converting logic following your
> > > > > > explanation.[^1] You mean the following routines, right?
> > > > > >
> > > > > > -
> > > > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > > > > -
> > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > > main/java/org/apache/kafka/common/record/
> LazyDownConversionRecords.
> > > > > > java#L165
> > > > > > -
> > > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > > main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > > >
> > > > > > It seems like your stance is like following:
> > > > > >
> > > > > > 1. In principle, Kafka does not change the compression codec when
> > > > > > down-converting, since it requires inspecting the fetched data,
> > which
> > > > is
> > > > > > expensive.
> > > > > > 2. However, there are some cases the fetched data is inspected
> > > anyway.
> > > > In
> > > > > > this case, we can provide compression conversion from Zstandard
> to
> > > > > > classical ones[^2].
> > > > > >
> > > > > > And from what I understand, the cases where the client without
> > > > ZStandard
> > > > > > support receives ZStandard compressed records can be organized
> into
> > > two
> > > > > > cases:
> > > > > >
> > > > > > a. The 'compression.type' configuration of given topic is
> > 'producer'
> > > > and
> > > > > > the producer compressed the records with ZStandard. (that is,
> using
> > > > > > ZStandard implicitly.)
> > > > > > b.  The 'compression.type' configuration of given topic is
> 'zstd';
> > > that
> > > > > is,
> > > > > > using ZStandard explicitly.
> > > > > >
> > > > > > As you stated, we don't have to handle the case b specially. So,
> It
> > > > seems
> > > > > > like we can narrow the focus of the problem by joining case 1 and
> > > case
> > > > b
> > > > > > like the following:
> > > > > >
> > > > > > > Given the topic with 'producer' as its 'compression.type'
> > > > > configuration,
> > > > > > ZStandard compressed records and old client without ZStandard, is
> > > there
> > > > > any
> > > > > > case we need to inspect the records and can change the
> compression
> > > > type?
> > > > > If
> > > > > > so, can we provide compression type converting?
> > > > > >
> > > > > > Do I understand correctly?
> > > > > >
> > > > > > Best,
> > > > > > Dongjin
> > > > > >
> > > > > > [^1]: I'm sorry, I found that I was a little bit misunderstanding
> > how
> > > > API
> > > > > > version works, after reviewing the downconvert logic & the
> protocol
> > > > > > documentation <https://kafka.apache.org/protocol>.
> > > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > > > >
> > > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> > jason@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > >
> > > > > > > > But in my opinion, since the client will fail with the API
> > > version,
> > > > > so
> > > > > > we
> > > > > > > > don't need to down-convert the messages anyway. Isn't it?
> So, I
> > > > think
> > > > > > we
> > > > > > > > don't care about this case. (I'm sorry, I am not familiar
> with
> > > > > > > down-convert
> > > > > > > > logic.)
> > > > > > >
> > > > > > >
> > > > > > > Currently the broker down-converts automatically when it
> receives
> > > an
> > > > > old
> > > > > > > version of the fetch request (a version which is known to
> predate
> > > the
> > > > > > > message format in use). Typically when down-converting the
> > message
> > > > > > format,
> > > > > > > we use the same compression type, but there is not much point
> in
> > > > doing
> > > > > so
> > > > > > > when we know the client doesn't support it. So if zstandard is
> in
> > > > use,
> > > > > > and
> > > > > > > we have to down-convert anyway, then we can choose to use a
> > > different
> > > > > > > compression type or no compression type.
> > > > > > >
> > > > > > > From my perspective, there is no significant downside to
> bumping
> > > the
> > > > > > > protocol version and it has several potential benefits. Version
> > > bumps
> > > > > are
> > > > > > > cheap. The main question mark in my mind is about
> > down-conversion.
> > > > > > Figuring
> > > > > > > out whether down-conversion is needed is hard generally without
> > > > > > inspecting
> > > > > > > the fetched data, which is expensive. I think we agree in
> > principle
> > > > > that
> > > > > > we
> > > > > > > do not want to have to pay this cost generally and prefer the
> > > clients
> > > > > to
> > > > > > > fail when they see an unhandled compression type. The point I
> was
> > > > > making
> > > > > > is
> > > > > > > that there are some cases where we are either inspecting the
> data
> > > > > anyway
> > > > > > > (because we have to down-convert the message format), or we
> have
> > an
> > > > > easy
> > > > > > > way to tell whether zstandard is in use (the topic has it
> > > configured
> > > > > > > explicitly). In the latter case, we don't have to handle it
> > > > specially.
> > > > > > But
> > > > > > > we do have to decide how we will handle down-conversion to
> older
> > > > > formats.
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Colin and Jason,
> > > > > > > >
> > > > > > > > Thanks for your opinions. In summarizing, the Pros and Cons
> of
> > > > > bumping
> > > > > > > > fetch API version are:
> > > > > > > >
> > > > > > > > Cons:
> > > > > > > >
> > > > > > > > - The Broker can't know whether a given message batch is
> > > compressed
> > > > > > with
> > > > > > > > zstd or not.
> > > > > > > > - Need some additional logic for the topic explicitly
> > configured
> > > to
> > > > > use
> > > > > > > > zstd.
> > > > > > > >
> > > > > > > > Pros:
> > > > > > > >
> > > > > > > > - The broker doesn't need to conduct expensive
> down-conversion.
> > > > > > > > - Can message the users to update their client.
> > > > > > > >
> > > > > > > > So, opinions for the backward-compatibility policy by far:
> > > > > > > >
> > > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > > > >
> > > > > > > > Here are my additional comments:
> > > > > > > >
> > > > > > > > @Colin
> > > > > > > >
> > > > > > > > I greatly appreciate your response. In the case of the
> > dictionary
> > > > > > > support,
> > > > > > > > of course, this issue should be addressed later so we don't
> > need
> > > it
> > > > > in
> > > > > > > the
> > > > > > > > first version. You are right - it is not late to try it after
> > > some
> > > > > > > > benchmarks. What I mean is, we should keep in mind on that
> > > > potential
> > > > > > > > feature.
> > > > > > > >
> > > > > > > > @Jason
> > > > > > > >
> > > > > > > > You wrote,
> > > > > > > >
> > > > > > > > > Similarly, if we have to down-convert anyway because the
> > client
> > > > > does
> > > > > > > not
> > > > > > > > understand the message format, then we could also use a
> > different
> > > > > > > > compression type.
> > > > > > > >
> > > > > > > > But in my opinion, since the client will fail with the API
> > > version,
> > > > > so
> > > > > > we
> > > > > > > > don't need to down-convert the messages anyway. Isn't it?
> So, I
> > > > think
> > > > > > we
> > > > > > > > don't care about this case. (I'm sorry, I am not familiar
> with
> > > > > > > down-convert
> > > > > > > > logic.)
> > > > > > > >
> > > > > > > > Please give more opinions. Thanks!
> > > > > > > >
> > > > > > > > - Dongjin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> > > jason@confluent.io
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey Colin,
> > > > > > > > >
> > > > > > > > > The problem for the fetch API is that the broker does not
> > > > generally
> > > > > > > know
> > > > > > > > if
> > > > > > > > > a batch was compressed with zstd unless it parses it. I
> think
> > > the
> > > > > > goal
> > > > > > > > here
> > > > > > > > > is to avoid the expensive down-conversion that is needed to
> > > > ensure
> > > > > > > > > compatibility because it is only necessary if zstd is
> > actually
> > > in
> > > > > > use.
> > > > > > > > But
> > > > > > > > > as long as old clients can parse the message format, they
> > > should
> > > > > get
> > > > > > a
> > > > > > > > > reasonable error if they see an unsupported compression
> type
> > in
> > > > the
> > > > > > > > > attributes. Basically the onus is on users to ensure that
> > their
> > > > > > > consumers
> > > > > > > > > have been updated prior to using zstd. It seems like a
> > > reasonable
> > > > > > > > tradeoff
> > > > > > > > > to me. There are a couple cases that might be worth
> thinking
> > > > > through:
> > > > > > > > >
> > > > > > > > > 1. If a topic is explicitly configured to use zstd, then we
> > > don't
> > > > > > need
> > > > > > > to
> > > > > > > > > check the fetched data for the compression type to know if
> we
> > > > need
> > > > > > > > > down-conversion. If we did bump the Fetch API version, then
> > we
> > > > > could
> > > > > > > > handle
> > > > > > > > > this case by either down-converting using a different
> > > compression
> > > > > > type
> > > > > > > or
> > > > > > > > > returning an error.
> > > > > > > > > 2. Similarly, if we have to down-convert anyway because the
> > > > client
> > > > > > does
> > > > > > > > not
> > > > > > > > > understand the message format, then we could also use a
> > > different
> > > > > > > > > compression type.
> > > > > > > > >
> > > > > > > > > For the produce API, I think it's reasonable to bump the
> api
> > > > > version.
> > > > > > > > This
> > > > > > > > > can be used by clients to check whether a broker supports
> > zstd.
> > > > For
> > > > > > > > > example, we might support a list of preferred compression
> > types
> > > > in
> > > > > > the
> > > > > > > > > producer and we could use the broker to detect which
> version
> > to
> > > > > use.
> > > > > > > > >
> > > > > > > > > -Jason
> > > > > > > > >
> > > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> > > cmccabe@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
> > compression
> > > > > codec
> > > > > > > > and I
> > > > > > > > > > hope we can get this support in soon!
> > > > > > > > > >
> > > > > > > > > > I would say we can just bump the API version to indicate
> > that
> > > > > ZStd
> > > > > > > > > support
> > > > > > > > > > is expected in new clients.  We probably need some way of
> > > > > > indicating
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > older clients that they can't consume the partitions, as
> > > well.
> > > > > > > Perhaps
> > > > > > > > > we
> > > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > > > > >
> > > > > > > > > > The license thing seems straightforward -- it's just a
> > matter
> > > > of
> > > > > > > adding
> > > > > > > > > > the text to the right files as per ASF guidelines.
> > > > > > > > > >
> > > > > > > > > > With regard to the dictionary support, do we really need
> > that
> > > > in
> > > > > > the
> > > > > > > > > first
> > > > > > > > > > version?  Hopefully message batches are big enough that
> > this
> > > > > isn't
> > > > > > > > > needed.
> > > > > > > > > > Some benchmarks might help here.
> > > > > > > > > >
> > > > > > > > > > best,
> > > > > > > > > > Colin
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue,
> > > KIP-110
> > > > > > > > > > > <https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-
> > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > > > > .
> > > > > > > > > > >
> > > > > > > > > > > For newcomers, Here is some summary of the history:
> > KIP-110
> > > > was
> > > > > > > > > > originally
> > > > > > > > > > > worked for the issue KAFKA-4514 but, it lacked
> benchmark
> > > > > results
> > > > > > to
> > > > > > > > get
> > > > > > > > > > the
> > > > > > > > > > > agreement of the community. Later, Ivan Babrou and some
> > > other
> > > > > > users
> > > > > > > > who
> > > > > > > > > > > adopted the patch provided their excellent performance
> > > report
> > > > > > which
> > > > > > > > is
> > > > > > > > > > now
> > > > > > > > > > > included in the KIP, but it postponed again because of
> > the
> > > > > > > community
> > > > > > > > > was
> > > > > > > > > > > busy for 2.0.0 release. It is why I now reboot this
> > issue.
> > > > > > > > > > >
> > > > > > > > > > > The following is the current status of the feature: You
> > can
> > > > > check
> > > > > > > the
> > > > > > > > > > > current draft implementation here
> > > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is
> based
> > > on
> > > > > zstd
> > > > > > > > 1.3.5
> > > > > > > > > > and
> > > > > > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > > > > > >
> > > > > > > > > > > The issues that should be addressed is like following:
> > > > > > > > > > >
> > > > > > > > > > > *1. Backward Compatibility*
> > > > > > > > > > >
> > > > > > > > > > > To support old consumers, we need to take a strategy to
> > > > handle
> > > > > > the
> > > > > > > > old
> > > > > > > > > > > consumers. Current candidates are:
> > > > > > > > > > >
> > > > > > > > > > > - Bump API version
> > > > > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > > > > - Improve the error messages:
> > > > > > > > > > >
> > > > > > > > > > > *2. Dictionary Support*
> > > > > > > > > > >
> > > > > > > > > > > To support zstd's dictionary feature in the future (if
> > > > needed),
> > > > > > we
> > > > > > > > need
> > > > > > > > > > to
> > > > > > > > > > > sketch how it should be and leave some room for it. As
> of
> > > > now,
> > > > > > > there
> > > > > > > > > has
> > > > > > > > > > > been no discussion on this topic yet.
> > > > > > > > > > >
> > > > > > > > > > > *3. License*
> > > > > > > > > > >
> > > > > > > > > > > To use this feature, we need to add license of zstd and
> > > > > zstd-jni
> > > > > > to
> > > > > > > > the
> > > > > > > > > > > project. (Thanks to Viktor Somogyi for raising this
> > issue!)
> > > > It
> > > > > > > seems
> > > > > > > > > like
> > > > > > > > > > > what Apache Spark did would be a good example but there
> > has
> > > > > been
> > > > > > no
> > > > > > > > > > > discussion yet.
> > > > > > > > > > >
> > > > > > > > > > > You can find the details of the above issues in the KIP
> > > > > document.
> > > > > > > > > Please
> > > > > > > > > > > have a look when you are free, and give me feedback.
> All
> > > > kinds
> > > > > of
> > > > > > > > > > > participating are welcome.
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Dongjin
> > > > > > > > > > >
> > > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > > > > dongjin@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Sorry for the late reply.
> > > > > > > > > > > >
> > > > > > > > > > > > In short, I could not submit the updated KIP by the
> > > feature
> > > > > > > freeze
> > > > > > > > > > > > deadline of 2.0.0. For this reason, it will not be
> > > included
> > > > > in
> > > > > > > the
> > > > > > > > > > 2.0.0
> > > > > > > > > > > > release and all discussion for this issue were
> > postponed
> > > > > after
> > > > > > > the
> > > > > > > > > > release
> > > > > > > > > > > > of 2.0.0.
> > > > > > > > > > > >
> > > > > > > > > > > > I have been updating the PR following recent updates.
> > > Just
> > > > > > now, I
> > > > > > > > > > rebased
> > > > > > > > > > > > it against the latest trunk and updated the zstd
> > version
> > > > into
> > > > > > > > 1.3.5.
> > > > > > > > > > If you
> > > > > > > > > > > > need some request, don't hesitate to notify me. (But
> > not
> > > > this
> > > > > > > > thread
> > > > > > > > > -
> > > > > > > > > > just
> > > > > > > > > > > > send me the message directly.)
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Dongjin
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > > > > bobby@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >> I there any update on this.  The performance
> > > improvements
> > > > > are
> > > > > > > > quite
> > > > > > > > > > > >> impressive and I really would like to stop forking
> > kafka
> > > > > just
> > > > > > to
> > > > > > > > get
> > > > > > > > > > this
> > > > > > > > > > > >> in.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Thanks,
> > > > > > > > > > > >>
> > > > > > > > > > > >> Bobby
> > > > > > > > > > > >>
> > > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > > > > dongjin@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >>
> > > > > > > > > > > >> > Ismael,
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Oh, I forgot all of you are on working frenzy for
> > 2.0!
> > > > No
> > > > > > > > problem,
> > > > > > > > > > take
> > > > > > > > > > > >> > your time. I am also working at another issue now.
> > > Thank
> > > > > you
> > > > > > > for
> > > > > > > > > > > >> letting me
> > > > > > > > > > > >> > know.
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > Best,
> > > > > > > > > > > >> > Dongjin
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > > > > ismaelj@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy
> > > > finalising
> > > > > > > > 2.0.0.
> > > > > > > > > > This
> > > > > > > > > > > >> KIP
> > > > > > > > > > > >> > > seems like a great candidate for 2.1.0 and
> > hopefully
> > > > > there
> > > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > >> more
> > > > > > > > > > > >> > of
> > > > > > > > > > > >> > > a discussion next week. :)
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > Ismael
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > > > > dongjin@apache.org
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > You can check out the implementation from here
> > > > > > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>.
> If
> > > you
> > > > > > > > > experience
> > > > > > > > > > any
> > > > > > > > > > > >> > > problem
> > > > > > > > > > > >> > > > running it, don't hesitate to give me a
> mention.
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > Best,
> > > > > > > > > > > >> > > > Dongjin
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > > > > > > dongjin@apache.org
> > > > > > > > > > >
> > > > > > > > > > > >> > wrote:
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > > Here is the short conclusion about the
> license
> > > > > > problem:
> > > > > > > > *We
> > > > > > > > > > can
> > > > > > > > > > > >> use
> > > > > > > > > > > >> > > zstd
> > > > > > > > > > > >> > > > > and zstd-jni without any problem, but we
> need
> > to
> > > > > > include
> > > > > > > > > their
> > > > > > > > > > > >> > license,
> > > > > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause
> > License
> > > > > > requires
> > > > > > > > to
> > > > > > > > > > > >> include
> > > > > > > > > > > >> > the
> > > > > > > > > > > >> > > > > license used, and BSD 3 Clause License
> > requires
> > > > that
> > > > > > the
> > > > > > > > > name
> > > > > > > > > > of
> > > > > > > > > > > >> the
> > > > > > > > > > > >> > > > > contributor can't be used to endorse or
> > promote
> > > > the
> > > > > > > > product.
> > > > > > > > > > > >> That's
> > > > > > > > > > > >> > it
> > > > > > > > > > > >> > > > > <
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > - They are not listed in the list of
> > prohibited
> > > > > > licenses
> > > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > > > > resolved.html#category-x>
> > > > > > > > > also.
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > > > > >> > > > > <https://issues.apache.org/
> > > > jira/browse/SPARK-19112
> > > > > >:
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > - They made a directory dedicated to the
> > > > dependency
> > > > > > > > license
> > > > > > > > > > files
> > > > > > > > > > > >> > > > > <
> > > > > https://github.com/apache/spark/tree/master/licenses
> > > > > > >
> > > > > > > > and
> > > > > > > > > > added
> > > > > > > > > > > >> > > > licenses
> > > > > > > > > > > >> > > > > for Zstd
> > > > > > > > > > > >> > > > > <
> > > > > > > > > > > >> >
> > https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > > > LICENSE-zstd.txt
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > > > &
> > > > > > > > > > > >> > > > > Zstd-jni
> > > > > > > > > > > >> > > > > <
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >> https://github.com/apache/
> spark/blob/master/licenses/
> > > > > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > > > > >> >
> > > > > > > > > > > >> > > > > .
> > > > > > > > > > > >> > > > > - Added a link to the original license files
> > in
> > > > > > LICENSE.
> > > > > > > > > > > >> > > > > <
> > > https://github.com/apache/spark/pull/18805/files
> > > > >
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Thanks for pointing out this problem,
> Viktor!
> > > Nice
> > > > > > > catch!
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > Best,
> > > > > > > > > > > >> > > > > Dongjin
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin
> Lee <
> > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > >> > > > >
> > > > > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> > > > reasoning.
> > > > > > so:
> > > > > > > +1
> > > > > > > > > > for b
> > > > > > > > > > > >> > until
> > > > > > > > > > > >> > > > now.
> > > > > > > > > > > >> > > > >>
> > > > > > > > > > > >> > > > >> For the license issues, I will have a check
> > on
> > > > how
> > > > > > the
> > > > > > > > over
> > > > > > > > > > > >> projects
> > > > > > > > > > > >> > > are
> > > > > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > > > > >> > > > >>
> > > > > > > > > > > >> > > > >> Best,
> > > > > > > > > > > >> > > > >> Dongjin
> > > > > > > > > > > >> > > > >>
> > > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor
> > > Somogyi <
> > > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > > >> > > > >>
> > > > > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > > > > >> > > > >>> I would vote for option b. in the
> "backward
> > > > > > > > compatibility"
> > > > > > > > > > > >> section.
> > > > > > > > > > > >> > > My
> > > > > > > > > > > >> > > > >>> reasoning for this is that users upgrading
> > to
> > > a
> > > > > zstd
> > > > > > > > > > compatible
> > > > > > > > > > > >> > > version
> > > > > > > > > > > >> > > > >>> won't start to use it automatically, so
> > manual
> > > > > > > > > > reconfiguration
> > > > > > > > > > > >> is
> > > > > > > > > > > >> > > > >>> required.
> > > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the
> > > cluster.
> > > > If
> > > > > > not
> > > > > > > > all
> > > > > > > > > > the
> > > > > > > > > > > >> > > clients
> > > > > > > > > > > >> > > > >>> are
> > > > > > > > > > > >> > > > >>> upgraded but just some of them and they'd
> > > start
> > > > to
> > > > > > use
> > > > > > > > > zstd
> > > > > > > > > > > >> then it
> > > > > > > > > > > >> > > > would
> > > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd like to
> > > presume
> > > > > > > though
> > > > > > > > > that
> > > > > > > > > > > >> this
> > > > > > > > > > > >> > is
> > > > > > > > > > > >> > > a
> > > > > > > > > > > >> > > > >>> very
> > > > > > > > > > > >> > > > >>> obvious failure case and nobody should be
> > > > > surprised
> > > > > > if
> > > > > > > > it
> > > > > > > > > > didn't
> > > > > > > > > > > >> > > work.
> > > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we should
> > bump
> > > > the
> > > > > > > fetch
> > > > > > > > > and
> > > > > > > > > > > >> > produce
> > > > > > > > > > > >> > > > >>> requests if it's a change in the message
> > > format.
> > > > > > > > Moreover
> > > > > > > > > if
> > > > > > > > > > > >> some
> > > > > > > > > > > >> > of
> > > > > > > > > > > >> > > > the
> > > > > > > > > > > >> > > > >>> producers and the brokers are upgraded but
> > > some
> > > > of
> > > > > > the
> > > > > > > > > > consumers
> > > > > > > > > > > >> > are
> > > > > > > > > > > >> > > > not,
> > > > > > > > > > > >> > > > >>> then we wouldn't prevent the error when
> the
> > > old
> > > > > > > consumer
> > > > > > > > > > tries
> > > > > > > > > > > >> to
> > > > > > > > > > > >> > > > consume
> > > > > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I think
> > binding
> > > > the
> > > > > > > > > > compression
> > > > > > > > > > > >> type
> > > > > > > > > > > >> > > to
> > > > > > > > > > > >> > > > an
> > > > > > > > > > > >> > > > >>> API is not so obvious from the developer's
> > > > > > > perspective.
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>> I would also prefer to use the existing
> > > binding,
> > > > > > > however
> > > > > > > > > we
> > > > > > > > > > must
> > > > > > > > > > > >> > > > respect
> > > > > > > > > > > >> > > > >>> the licenses:
> > > > > > > > > > > >> > > > >>> "The code for these JNI bindings is
> licenced
> > > > under
> > > > > > > > > 2-clause
> > > > > > > > > > BSD
> > > > > > > > > > > >> > > > license.
> > > > > > > > > > > >> > > > >>> The native Zstd library is licensed under
> > > > 3-clause
> > > > > > BSD
> > > > > > > > > > license
> > > > > > > > > > > >> and
> > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > > > > resolved.html#category-a
> > > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses
> but
> > > the
> > > > > > Apache
> > > > > > > > > > license
> > > > > > > > > > > >> is
> > > > > > > > > > > >> > not
> > > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that the
> > > > > "3-clause
> > > > > > > BSD
> > > > > > > > > > license
> > > > > > > > > > > >> and
> > > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > > >> > > > >>> is really not an AND but an OR in this
> case,
> > > but
> > > > > I'm
> > > > > > > no
> > > > > > > > > > lawyer,
> > > > > > > > > > > >> > just
> > > > > > > > > > > >> > > > >>> wanted
> > > > > > > > > > > >> > > > >>> to make the point that we should watch out
> > for
> > > > > > > licenses.
> > > > > > > > > :)
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>> Regards,
> > > > > > > > > > > >> > > > >>> Viktor
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan
> Babrou
> > <
> > > > > > > > > > ibobrik@gmail.com>
> > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > > >>> > Hello,
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>> > This is Ivan and I still very much
> support
> > > the
> > > > > > fact
> > > > > > > > that
> > > > > > > > > > zstd
> > > > > > > > > > > >> > > > >>> compression
> > > > > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>> > Please think about the environment, you
> > can
> > > > save
> > > > > > > > quite a
> > > > > > > > > > lot
> > > > > > > > > > > >> of
> > > > > > > > > > > >> > > > >>> hardware
> > > > > > > > > > > >> > > > >>> > with it.
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin
> Lee <
> > > > > > > > > > dongjin@apache.org>
> > > > > > > > > > > >> > > wrote:
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>> > > Since there are no responses for a
> > week, I
> > > > > > decided
> > > > > > > > to
> > > > > > > > > > > >> > reinitiate
> > > > > > > > > > > >> > > > the
> > > > > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > > >> > > > >>> >
> > > > > > > > > > > >> > > > >>>
> > > > > > > > > > > >> > > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> >
> > > > > > > > > > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > > > > >> > > > >>> > >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Ismael, Jason and all,

I rewrote the backward compatibility strategy & its alternatives as
follows, based on Ismael's & Jason's comments. Since it is not yet updated
on the wiki, don't hesitate to send me a message if you have any opinion
on it.

```
*Backward Compatibility*

We need to establish a backward-compatibility strategy for the case where an
old client subscribes to a topic using ZStandard implicitly (i.e., the
'compression.type' configuration of the given topic is 'producer' and the
producer compressed the records with ZStandard). We have the following
options for this situation:

*A. Support ZStandard for the old clients which can understand v0 and v1
messages only.*

This strategy necessarily requires down-converting v2 messages compressed
with ZStandard into v0 or v1 messages, which means considerable
performance degradation. So we rejected this strategy.

*B. Bump the API version and support only v2-capable clients*

With this approach, we can signal to the old clients that they are outdated
and should be upgraded. However, there are still several options for the
error code.

*B.1. INVALID_REQUEST (42)*

This option gives the client too little information; the user can be
confused about why a client that worked correctly in the past suddenly
encounters a problem. So we rejected this strategy.

*B.2. CORRUPT_MESSAGE (2)*

This option gives inaccurate information; the user can be surprised and
misled into thinking that the log files are broken in some way. So we
rejected this strategy.

*B.3. UNSUPPORTED_FOR_MESSAGE_FORMAT (43)*

The advantage of this approach is that we don't need to define a new error
code; we can simply reuse an existing one.

The disadvantage is that it is also a little bit vague; this error code was
defined as part of KIP-98[^1] and is currently returned for transaction
errors.

*B.4. UNSUPPORTED_COMPRESSION_TYPE (new)*

The advantage of this approach is that it is clear and provides an exact
description. The disadvantage is that we need to add a new error code.
```
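
To make the difference between B.3 and B.4 a bit more concrete, here is a
minimal, hypothetical Java sketch of the broker-side decision. It is not
taken from the actual patch: UNSUPPORTED_COMPRESSION_TYPE does not exist yet
(it is the new code proposed in B.4), CompressionType.ZSTD is the constant
this KIP would add, and the fetch-version threshold is only a placeholder.

```java
// Illustrative sketch only -- not taken from the KIP-110 PR.
// Assumes: Errors.UNSUPPORTED_COMPRESSION_TYPE (the new code in B.4),
// CompressionType.ZSTD (the constant this KIP adds), and a placeholder
// fetch version from which zstd support is guaranteed.
import org.apache.kafka.common.protocol.Errors;
import org.apache.kafka.common.record.CompressionType;

public class ZstdFetchCompatibility {

    // Placeholder: first fetch version whose clients must support zstd.
    private static final short MIN_FETCH_VERSION_WITH_ZSTD = 10;

    /**
     * Pick the error to return when a topic is explicitly configured with
     * compression.type=zstd and the fetch request version is too old.
     */
    public static Errors errorForOldFetcher(short fetchVersion,
                                            CompressionType topicCompression,
                                            boolean useNewErrorCode) {
        if (topicCompression != CompressionType.ZSTD
                || fetchVersion >= MIN_FETCH_VERSION_WITH_ZSTD) {
            return Errors.NONE;                      // nothing to reject
        }
        if (useNewErrorCode) {
            // Option B.4: explicit, but needs a brand-new error code.
            return Errors.UNSUPPORTED_COMPRESSION_TYPE;
        }
        // Option B.3: reuse the existing code; its name suggests a message
        // format problem rather than an unsupported codec.
        return Errors.UNSUPPORTED_FOR_MESSAGE_FORMAT;
    }
}
```

Either way, this check only applies when the broker can tell zstd is in use
without inspecting the batch (i.e., the topic sets it explicitly); the
implicit 'producer' case is left to the client, as discussed below.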

*It seems the choice is now clear: UNSUPPORTED_FOR_MESSAGE_FORMAT (B.3) or
UNSUPPORTED_COMPRESSION_TYPE (B.4).* The former doesn't require a new error
code, but the latter is more explicit. Which one do you prefer? Since all of
you have much more experience and knowledge than me, I will follow your
decision. The wiki page will also be updated according to the decision.
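
And just to show what the choice looks like from an upgraded client's point
of view, a rough sketch follows (the switch arm for the new code assumes B.4
is adopted; apart from the Errors enum itself, nothing here is existing API):

```java
// Hypothetical error-to-message mapping in an upgraded Java client.
import org.apache.kafka.common.protocol.Errors;

public final class FetchErrorMessages {

    public static String describe(Errors error) {
        switch (error) {
            case UNSUPPORTED_COMPRESSION_TYPE:       // proposed in B.4
                return "Topic data is compressed with zstd, which this "
                        + "client version does not support; please upgrade.";
            case UNSUPPORTED_FOR_MESSAGE_FORMAT:     // reused in B.3
                return "Request version unsupported for the topic's message "
                        + "format; an unsupported codec is one possible cause.";
            default:
                return error.message();
        }
    }
}
```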

Best,
Dongjin

[^1]: https://issues.apache.org/jira/browse/KAFKA-4990

On Sun, Aug 19, 2018 at 4:58 AM Ismael Juma <is...@gmail.com> wrote:

> Sounds reasonable to me.
>
> Ismael
>
> On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io> wrote:
>
> > Hey Ismael,
> >
> > Your summary looks good to me. I think it might also be a good idea to
> add
> > a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the
> version
> > bumps. We won't be able to use it for old api versions since the clients
> > will not understand it, but we can use it going forward so that we're not
> > stuck in a similar situation with a new message format and a new codec to
> > support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT, but it
> is
> > not as explicit.
> >
> > -Jason
> >
> > On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Hi Dongjin and Jason,
> > >
> > > I would agree. My summary:
> > >
> > > 1. Support zstd with message format 2 only.
> > > 2. Bump produce and fetch request versions.
> > > 3. Provide broker errors whenever possible based on the request version
> > and
> > > rely on clients for the cases where the broker can't validate
> efficiently
> > > (example message format 2 consumer that supports the latest fetch
> version
> > > but doesn't support zstd).
> > >
> > > If there's general agreement on this, I suggest we update the KIP to
> > state
> > > the proposal and to move the rejected options to its own section. And
> > then
> > > start a vote!
> > >
> > > Ismael
> > >
> > > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > Yes, that's a good summary. For clients which support v2, the client
> > can
> > > > parse the message format and hopefully raise a useful error message
> > > > indicating the unsupported compression type. For older clients, our
> > > options
> > > > are probably (1) to down-convert to the old format using no
> compression
> > > > type, or (2) to return an error code. I'm leaning toward the latter
> as
> > > the
> > > > simpler solution, but the challenge is finding a good error code. Two
> > > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The
> downside
> > > is
> > > > that old clients probably won't get a helpful message. However, at
> > least
> > > > the behavior will be consistent in the sense that all clients will
> fail
> > > if
> > > > they do not support zstandard.
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Thanks Jason, I reviewed the down-converting logic following your
> > > > > explanation.[^1] You mean the following routines, right?
> > > > >
> > > > > -
> > > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > > > -
> > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > main/java/org/apache/kafka/common/record/LazyDownConversionRecords.
> > > > > java#L165
> > > > > -
> > > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > > main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > > >
> > > > > It seems like your stance is like following:
> > > > >
> > > > > 1. In principle, Kafka does not change the compression codec when
> > > > > down-converting, since it requires inspecting the fetched data,
> which
> > > is
> > > > > expensive.
> > > > > 2. However, there are some cases the fetched data is inspected
> > anyway.
> > > In
> > > > > this case, we can provide compression conversion from Zstandard to
> > > > > classical ones[^2].
> > > > >
> > > > > And from what I understand, the cases where the client without
> > > ZStandard
> > > > > support receives ZStandard compressed records can be organized into
> > two
> > > > > cases:
> > > > >
> > > > > a. The 'compression.type' configuration of given topic is
> 'producer'
> > > and
> > > > > the producer compressed the records with ZStandard. (that is, using
> > > > > ZStandard implicitly.)
> > > > > b.  The 'compression.type' configuration of given topic is 'zstd';
> > that
> > > > is,
> > > > > using ZStandard explicitly.
> > > > >
> > > > > As you stated, we don't have to handle the case b specially. So, It
> > > seems
> > > > > like we can narrow the focus of the problem by joining case 1 and
> > case
> > > b
> > > > > like the following:
> > > > >
> > > > > > Given the topic with 'producer' as its 'compression.type'
> > > > configuration,
> > > > > ZStandard compressed records and old client without ZStandard, is
> > there
> > > > any
> > > > > case we need to inspect the records and can change the compression
> > > type?
> > > > If
> > > > > so, can we provide compression type converting?
> > > > >
> > > > > Do I understand correctly?
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > [^1]: I'm sorry, I found that I was a little bit misunderstanding
> how
> > > API
> > > > > version works, after reviewing the downconvert logic & the protocol
> > > > > documentation <https://kafka.apache.org/protocol>.
> > > > > [^2]: None, Gzip, Snappy, Lz4.
> > > > >
> > > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <
> jason@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > >
> > > > > > > But in my opinion, since the client will fail with the API
> > version,
> > > > so
> > > > > we
> > > > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> > > think
> > > > > we
> > > > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > > > down-convert
> > > > > > > logic.)
> > > > > >
> > > > > >
> > > > > > Currently the broker down-converts automatically when it receives
> > an
> > > > old
> > > > > > version of the fetch request (a version which is known to predate
> > the
> > > > > > message format in use). Typically when down-converting the
> message
> > > > > format,
> > > > > > we use the same compression type, but there is not much point in
> > > doing
> > > > so
> > > > > > when we know the client doesn't support it. So if zstandard is in
> > > use,
> > > > > and
> > > > > > we have to down-convert anyway, then we can choose to use a
> > different
> > > > > > compression type or no compression type.
> > > > > >
> > > > > > From my perspective, there is no significant downside to bumping
> > the
> > > > > > protocol version and it has several potential benefits. Version
> > bumps
> > > > are
> > > > > > cheap. The main question mark in my mind is about
> down-conversion.
> > > > > Figuring
> > > > > > out whether down-conversion is needed is hard generally without
> > > > > inspecting
> > > > > > the fetched data, which is expensive. I think we agree in
> principle
> > > > that
> > > > > we
> > > > > > do not want to have to pay this cost generally and prefer the
> > clients
> > > > to
> > > > > > fail when they see an unhandled compression type. The point I was
> > > > making
> > > > > is
> > > > > > that there are some cases where we are either inspecting the data
> > > > anyway
> > > > > > (because we have to down-convert the message format), or we have
> an
> > > > easy
> > > > > > way to tell whether zstandard is in use (the topic has it
> > configured
> > > > > > explicitly). In the latter case, we don't have to handle it
> > > specially.
> > > > > But
> > > > > > we do have to decide how we will handle down-conversion to older
> > > > formats.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <dongjin@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > Colin and Jason,
> > > > > > >
> > > > > > > Thanks for your opinions. In summarizing, the Pros and Cons of
> > > > bumping
> > > > > > > fetch API version are:
> > > > > > >
> > > > > > > Cons:
> > > > > > >
> > > > > > > - The Broker can't know whether a given message batch is
> > compressed
> > > > > with
> > > > > > > zstd or not.
> > > > > > > - Need some additional logic for the topic explicitly
> configured
> > to
> > > > use
> > > > > > > zstd.
> > > > > > >
> > > > > > > Pros:
> > > > > > >
> > > > > > > - The broker doesn't need to conduct expensive down-conversion.
> > > > > > > - Can message the users to update their client.
> > > > > > >
> > > > > > > So, opinions for the backward-compatibility policy by far:
> > > > > > >
> > > > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > > >
> > > > > > > Here are my additional comments:
> > > > > > >
> > > > > > > @Colin
> > > > > > >
> > > > > > > I greatly appreciate your response. In the case of the
> dictionary
> > > > > > support,
> > > > > > > of course, this issue should be addressed later so we don't
> need
> > it
> > > > in
> > > > > > the
> > > > > > > first version. You are right - it is not late to try it after
> > some
> > > > > > > benchmarks. What I mean is, we should keep in mind on that
> > > potential
> > > > > > > feature.
> > > > > > >
> > > > > > > @Jason
> > > > > > >
> > > > > > > You wrote,
> > > > > > >
> > > > > > > > Similarly, if we have to down-convert anyway because the
> client
> > > > does
> > > > > > not
> > > > > > > understand the message format, then we could also use a
> different
> > > > > > > compression type.
> > > > > > >
> > > > > > > But in my opinion, since the client will fail with the API
> > version,
> > > > so
> > > > > we
> > > > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> > > think
> > > > > we
> > > > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > > > down-convert
> > > > > > > logic.)
> > > > > > >
> > > > > > > Please give more opinions. Thanks!
> > > > > > >
> > > > > > > - Dongjin
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> > jason@confluent.io
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hey Colin,
> > > > > > > >
> > > > > > > > The problem for the fetch API is that the broker does not
> > > generally
> > > > > > know
> > > > > > > if
> > > > > > > > a batch was compressed with zstd unless it parses it. I think
> > the
> > > > > goal
> > > > > > > here
> > > > > > > > is to avoid the expensive down-conversion that is needed to
> > > ensure
> > > > > > > > compatibility because it is only necessary if zstd is
> actually
> > in
> > > > > use.
> > > > > > > But
> > > > > > > > as long as old clients can parse the message format, they
> > should
> > > > get
> > > > > a
> > > > > > > > reasonable error if they see an unsupported compression type
> in
> > > the
> > > > > > > > attributes. Basically the onus is on users to ensure that
> their
> > > > > > consumers
> > > > > > > > have been updated prior to using zstd. It seems like a
> > reasonable
> > > > > > > tradeoff
> > > > > > > > to me. There are a couple cases that might be worth thinking
> > > > through:
> > > > > > > >
> > > > > > > > 1. If a topic is explicitly configured to use zstd, then we
> > don't
> > > > > need
> > > > > > to
> > > > > > > > check the fetched data for the compression type to know if we
> > > need
> > > > > > > > down-conversion. If we did bump the Fetch API version, then
> we
> > > > could
> > > > > > > handle
> > > > > > > > this case by either down-converting using a different
> > compression
> > > > > type
> > > > > > or
> > > > > > > > returning an error.
> > > > > > > > 2. Similarly, if we have to down-convert anyway because the
> > > client
> > > > > does
> > > > > > > not
> > > > > > > > understand the message format, then we could also use a
> > different
> > > > > > > > compression type.
> > > > > > > >
> > > > > > > > For the produce API, I think it's reasonable to bump the api
> > > > version.
> > > > > > > This
> > > > > > > > can be used by clients to check whether a broker supports
> zstd.
> > > For
> > > > > > > > example, we might support a list of preferred compression
> types
> > > in
> > > > > the
> > > > > > > > producer and we could use the broker to detect which version
> to
> > > > use.
> > > > > > > >
> > > > > > > > -Jason
> > > > > > > >
> > > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> > cmccabe@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good
> compression
> > > > codec
> > > > > > > and I
> > > > > > > > > hope we can get this support in soon!
> > > > > > > > >
> > > > > > > > > I would say we can just bump the API version to indicate
> that
> > > > ZStd
> > > > > > > > support
> > > > > > > > > is expected in new clients.  We probably need some way of
> > > > > indicating
> > > > > > to
> > > > > > > > the
> > > > > > > > > older clients that they can't consume the partitions, as
> > well.
> > > > > > Perhaps
> > > > > > > > we
> > > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > > > >
> > > > > > > > > The license thing seems straightforward -- it's just a
> matter
> > > of
> > > > > > adding
> > > > > > > > > the text to the right files as per ASF guidelines.
> > > > > > > > >
> > > > > > > > > With regard to the dictionary support, do we really need
> that
> > > in
> > > > > the
> > > > > > > > first
> > > > > > > > > version?  Hopefully message batches are big enough that
> this
> > > > isn't
> > > > > > > > needed.
> > > > > > > > > Some benchmarks might help here.
> > > > > > > > >
> > > > > > > > > best,
> > > > > > > > > Colin
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue,
> > KIP-110
> > > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > > > .
> > > > > > > > > >
> > > > > > > > > > For newcomers, Here is some summary of the history:
> KIP-110
> > > was
> > > > > > > > > originally
> > > > > > > > > > worked for the issue KAFKA-4514 but, it lacked benchmark
> > > > results
> > > > > to
> > > > > > > get
> > > > > > > > > the
> > > > > > > > > > agreement of the community. Later, Ivan Babrou and some
> > other
> > > > > users
> > > > > > > who
> > > > > > > > > > adopted the patch provided their excellent performance
> > report
> > > > > which
> > > > > > > is
> > > > > > > > > now
> > > > > > > > > > included in the KIP, but it postponed again because of
> the
> > > > > > community
> > > > > > > > was
> > > > > > > > > > busy for 2.0.0 release. It is why I now reboot this
> issue.
> > > > > > > > > >
> > > > > > > > > > The following is the current status of the feature: You
> can
> > > > check
> > > > > > the
> > > > > > > > > > current draft implementation here
> > > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is based
> > on
> > > > zstd
> > > > > > > 1.3.5
> > > > > > > > > and
> > > > > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > > > > >
> > > > > > > > > > The issues that should be addressed is like following:
> > > > > > > > > >
> > > > > > > > > > *1. Backward Compatibility*
> > > > > > > > > >
> > > > > > > > > > To support old consumers, we need to take a strategy to
> > > handle
> > > > > the
> > > > > > > old
> > > > > > > > > > consumers. Current candidates are:
> > > > > > > > > >
> > > > > > > > > > - Bump API version
> > > > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > > > - Improve the error messages:
> > > > > > > > > >
> > > > > > > > > > *2. Dictionary Support*
> > > > > > > > > >
> > > > > > > > > > To support zstd's dictionary feature in the future (if
> > > needed),
> > > > > we
> > > > > > > need
> > > > > > > > > to
> > > > > > > > > > sketch how it should be and leave some room for it. As of
> > > now,
> > > > > > there
> > > > > > > > has
> > > > > > > > > > been no discussion on this topic yet.
> > > > > > > > > >
> > > > > > > > > > *3. License*
> > > > > > > > > >
> > > > > > > > > > To use this feature, we need to add license of zstd and
> > > > zstd-jni
> > > > > to
> > > > > > > the
> > > > > > > > > > project. (Thanks to Viktor Somogyi for raising this
> issue!)
> > > It
> > > > > > seems
> > > > > > > > like
> > > > > > > > > > what Apache Spark did would be a good example but there
> has
> > > > been
> > > > > no
> > > > > > > > > > discussion yet.
> > > > > > > > > >
> > > > > > > > > > You can find the details of the above issues in the KIP
> > > > document.
> > > > > > > > Please
> > > > > > > > > > have a look when you are free, and give me feedback. All
> > > kinds
> > > > of
> > > > > > > > > > participating are welcome.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Dongjin
> > > > > > > > > >
> > > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > > > >
> > > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > > > dongjin@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Sorry for the late reply.
> > > > > > > > > > >
> > > > > > > > > > > In short, I could not submit the updated KIP by the
> > feature
> > > > > > freeze
> > > > > > > > > > > deadline of 2.0.0. For this reason, it will not be
> > included
> > > > in
> > > > > > the
> > > > > > > > > 2.0.0
> > > > > > > > > > > release and all discussion for this issue were
> postponed
> > > > after
> > > > > > the
> > > > > > > > > release
> > > > > > > > > > > of 2.0.0.
> > > > > > > > > > >
> > > > > > > > > > > I have been updating the PR following recent updates.
> > Just
> > > > > now, I
> > > > > > > > > rebased
> > > > > > > > > > > it against the latest trunk and updated the zstd
> version
> > > into
> > > > > > > 1.3.5.
> > > > > > > > > If you
> > > > > > > > > > > need some request, don't hesitate to notify me. (But
> not
> > > this
> > > > > > > thread
> > > > > > > > -
> > > > > > > > > just
> > > > > > > > > > > send me the message directly.)
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Dongjin
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > > > bobby@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> I there any update on this.  The performance
> > improvements
> > > > are
> > > > > > > quite
> > > > > > > > > > >> impressive and I really would like to stop forking
> kafka
> > > > just
> > > > > to
> > > > > > > get
> > > > > > > > > this
> > > > > > > > > > >> in.
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks,
> > > > > > > > > > >>
> > > > > > > > > > >> Bobby
> > > > > > > > > > >>
> > > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > > > dongjin@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Ismael,
> > > > > > > > > > >> >
> > > > > > > > > > >> > Oh, I forgot all of you are on working frenzy for
> 2.0!
> > > No
> > > > > > > problem,
> > > > > > > > > take
> > > > > > > > > > >> > your time. I am also working at another issue now.
> > Thank
> > > > you
> > > > > > for
> > > > > > > > > > >> letting me
> > > > > > > > > > >> > know.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Best,
> > > > > > > > > > >> > Dongjin
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > > > ismaelj@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy
> > > finalising
> > > > > > > 2.0.0.
> > > > > > > > > This
> > > > > > > > > > >> KIP
> > > > > > > > > > >> > > seems like a great candidate for 2.1.0 and
> hopefully
> > > > there
> > > > > > > will
> > > > > > > > be
> > > > > > > > > > >> more
> > > > > > > > > > >> > of
> > > > > > > > > > >> > > a discussion next week. :)
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Ismael
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > > > dongjin@apache.org
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > You can check out the implementation from here
> > > > > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If
> > you
> > > > > > > > experience
> > > > > > > > > any
> > > > > > > > > > >> > > problem
> > > > > > > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Best,
> > > > > > > > > > >> > > > Dongjin
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > > > > > dongjin@apache.org
> > > > > > > > > >
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > > Here is the short conclusion about the license
> > > > > problem:
> > > > > > > *We
> > > > > > > > > can
> > > > > > > > > > >> use
> > > > > > > > > > >> > > zstd
> > > > > > > > > > >> > > > > and zstd-jni without any problem, but we need
> to
> > > > > include
> > > > > > > > their
> > > > > > > > > > >> > license,
> > > > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause
> License
> > > > > requires
> > > > > > > to
> > > > > > > > > > >> include
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > license used, and BSD 3 Clause License
> requires
> > > that
> > > > > the
> > > > > > > > name
> > > > > > > > > of
> > > > > > > > > > >> the
> > > > > > > > > > >> > > > > contributor can't be used to endorse or
> promote
> > > the
> > > > > > > product.
> > > > > > > > > > >> That's
> > > > > > > > > > >> > it
> > > > > > > > > > >> > > > > <
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > - They are not listed in the list of
> prohibited
> > > > > licenses
> > > > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > > > resolved.html#category-x>
> > > > > > > > also.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > > > >> > > > > <https://issues.apache.org/
> > > jira/browse/SPARK-19112
> > > > >:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > - They made a directory dedicated to the
> > > dependency
> > > > > > > license
> > > > > > > > > files
> > > > > > > > > > >> > > > > <
> > > > https://github.com/apache/spark/tree/master/licenses
> > > > > >
> > > > > > > and
> > > > > > > > > added
> > > > > > > > > > >> > > > licenses
> > > > > > > > > > >> > > > > for Zstd
> > > > > > > > > > >> > > > > <
> > > > > > > > > > >> >
> https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > > LICENSE-zstd.txt
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > &
> > > > > > > > > > >> > > > > Zstd-jni
> > > > > > > > > > >> > > > > <
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > > > >> >
> > > > > > > > > > >> > > > > .
> > > > > > > > > > >> > > > > - Added a link to the original license files
> in
> > > > > LICENSE.
> > > > > > > > > > >> > > > > <
> > https://github.com/apache/spark/pull/18805/files
> > > >
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Thanks for pointing out this problem, Viktor!
> > Nice
> > > > > > catch!
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Best,
> > > > > > > > > > >> > > > > Dongjin
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > > > > > > dongjin@apache.org>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> > > reasoning.
> > > > > so:
> > > > > > +1
> > > > > > > > > for b
> > > > > > > > > > >> > until
> > > > > > > > > > >> > > > now.
> > > > > > > > > > >> > > > >>
> > > > > > > > > > >> > > > >> For the license issues, I will have a check
> on
> > > how
> > > > > the
> > > > > > > over
> > > > > > > > > > >> projects
> > > > > > > > > > >> > > are
> > > > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > > > >> > > > >>
> > > > > > > > > > >> > > > >> Best,
> > > > > > > > > > >> > > > >> Dongjin
> > > > > > > > > > >> > > > >>
> > > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor
> > Somogyi <
> > > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > > > >> > > > >> wrote:
> > > > > > > > > > >> > > > >>
> > > > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > > > >> > > > >>> I would vote for option b. in the "backward
> > > > > > > compatibility"
> > > > > > > > > > >> section.
> > > > > > > > > > >> > > My
> > > > > > > > > > >> > > > >>> reasoning for this is that users upgrading
> to
> > a
> > > > zstd
> > > > > > > > > compatible
> > > > > > > > > > >> > > version
> > > > > > > > > > >> > > > >>> won't start to use it automatically, so
> manual
> > > > > > > > > reconfiguration
> > > > > > > > > > >> is
> > > > > > > > > > >> > > > >>> required.
> > > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the
> > cluster.
> > > If
> > > > > not
> > > > > > > all
> > > > > > > > > the
> > > > > > > > > > >> > > clients
> > > > > > > > > > >> > > > >>> are
> > > > > > > > > > >> > > > >>> upgraded but just some of them and they'd
> > start
> > > to
> > > > > use
> > > > > > > > zstd
> > > > > > > > > > >> then it
> > > > > > > > > > >> > > > would
> > > > > > > > > > >> > > > >>> cause errors in the cluster. I'd like to
> > presume
> > > > > > though
> > > > > > > > that
> > > > > > > > > > >> this
> > > > > > > > > > >> > is
> > > > > > > > > > >> > > a
> > > > > > > > > > >> > > > >>> very
> > > > > > > > > > >> > > > >>> obvious failure case and nobody should be
> > > > surprised
> > > > > if
> > > > > > > it
> > > > > > > > > didn't
> > > > > > > > > > >> > > work.
> > > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we should
> bump
> > > the
> > > > > > fetch
> > > > > > > > and
> > > > > > > > > > >> > produce
> > > > > > > > > > >> > > > >>> requests if it's a change in the message
> > format.
> > > > > > > Moreover
> > > > > > > > if
> > > > > > > > > > >> some
> > > > > > > > > > >> > of
> > > > > > > > > > >> > > > the
> > > > > > > > > > >> > > > >>> producers and the brokers are upgraded but
> > some
> > > of
> > > > > the
> > > > > > > > > consumers
> > > > > > > > > > >> > are
> > > > > > > > > > >> > > > not,
> > > > > > > > > > >> > > > >>> then we wouldn't prevent the error when the
> > old
> > > > > > consumer
> > > > > > > > > tries
> > > > > > > > > > >> to
> > > > > > > > > > >> > > > consume
> > > > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > > > >> > > > >>> I wouldn't choose c. either as I think
> binding
> > > the
> > > > > > > > > compression
> > > > > > > > > > >> type
> > > > > > > > > > >> > > to
> > > > > > > > > > >> > > > an
> > > > > > > > > > >> > > > >>> API is not so obvious from the developer's
> > > > > > perspective.
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>> I would also prefer to use the existing
> > binding,
> > > > > > however
> > > > > > > > we
> > > > > > > > > must
> > > > > > > > > > >> > > > respect
> > > > > > > > > > >> > > > >>> the licenses:
> > > > > > > > > > >> > > > >>> "The code for these JNI bindings is licenced
> > > under
> > > > > > > > 2-clause
> > > > > > > > > BSD
> > > > > > > > > > >> > > > license.
> > > > > > > > > > >> > > > >>> The native Zstd library is licensed under
> > > 3-clause
> > > > > BSD
> > > > > > > > > license
> > > > > > > > > > >> and
> > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > > > resolved.html#category-a
> > > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but
> > the
> > > > > Apache
> > > > > > > > > license
> > > > > > > > > > >> is
> > > > > > > > > > >> > not
> > > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that the
> > > > "3-clause
> > > > > > BSD
> > > > > > > > > license
> > > > > > > > > > >> and
> > > > > > > > > > >> > > > GPL2"
> > > > > > > > > > >> > > > >>> is really not an AND but an OR in this case,
> > but
> > > > I'm
> > > > > > no
> > > > > > > > > lawyer,
> > > > > > > > > > >> > just
> > > > > > > > > > >> > > > >>> wanted
> > > > > > > > > > >> > > > >>> to make the point that we should watch out
> for
> > > > > > licenses.
> > > > > > > > :)
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>> Regards,
> > > > > > > > > > >> > > > >>> Viktor
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou
> <
> > > > > > > > > ibobrik@gmail.com>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > > >>> > Hello,
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>> > This is Ivan and I still very much support
> > the
> > > > > fact
> > > > > > > that
> > > > > > > > > zstd
> > > > > > > > > > >> > > > >>> compression
> > > > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>> > Please think about the environment, you
> can
> > > save
> > > > > > > quite a
> > > > > > > > > lot
> > > > > > > > > > >> of
> > > > > > > > > > >> > > > >>> hardware
> > > > > > > > > > >> > > > >>> > with it.
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > > > > > > > dongjin@apache.org>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>> > > Since there are no responses for a
> week, I
> > > > > decided
> > > > > > > to
> > > > > > > > > > >> > reinitiate
> > > > > > > > > > >> > > > the
> > > > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> >
> > > > > > > > > > >> > > > >>>
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> > > This KIP is about to introduce ZStandard
> > > > > > Compression
> > > > > > > > > into
> > > > > > > > > > >> > Apache
> > > > > > > > > > >> > > > >>> Kafka.
> > > > > > > > > > >> > > > >>> > > The reason why it is posted again has a
> > > story:
> > > > > It
> > > > > > > was
> > > > > > > > > > >> > originally
> > > > > > > > > > >> > > > >>> posted
> > > > > > > > > > >> > > > >>> > to
> > > > > > > > > > >> > > > >>> > > the dev mailing list more than one year
> > ago
> > > > but
> > > > > > > since
> > > > > > > > it
> > > > > > > > > > >> has no
> > > > > > > > > > >> > > > >>> > performance
> > > > > > > > > > >> > > > >>> > > report included, it was postponed later.
> > But
> > > > > Some
> > > > > > > > people
> > > > > > > > > > >> > > (including
> > > > > > > > > > >> > > > >>> Ivan)
> > > > > > > > > > >> > > > >>> > > reported excellent performance report
> with
> > > the
> > > > > > draft
> > > > > > > > PR,
> > > > > > > > > > >> this
> > > > > > > > > > >> > > work
> > > > > > > > > > >> > > > >>> is now
> > > > > > > > > > >> > > > >>> > > reactivated.
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> > > The updated KIP document includes some
> > > > expected
> > > > > > > > > problems and
> > > > > > > > > > >> > > their
> > > > > > > > > > >> > > > >>> > > candidate alternatives. Please have a
> look
> > > > when
> > > > > > you
> > > > > > > > are
> > > > > > > > > > >> free,
> > > > > > > > > > >> > and
> > > > > > > > > > >> > > > >>> give
> > > > > > > > > > >> > > > >>> > me a
> > > > > > > > > > >> > > > >>> > > feedback. All kinds of participating are
> > > > > welcome.
> > > > > > > > > > >> > > > >>> > >
> > > > > > > > > > >> > > > >>> > > Best,
> > > > > > > > > > >> > > > >>> > > Dongjin
> > > > > > > > > > >> > > > >>> > >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ismael Juma <is...@gmail.com>.
Sounds reasonable to me.

Ismael

On Sat, 18 Aug 2018, 12:20 Jason Gustafson, <ja...@confluent.io> wrote:

> Hey Ismael,
>
> Your summary looks good to me. I think it might also be a good idea to add
> a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the version
> bumps. We won't be able to use it for old api versions since the clients
> will not understand it, but we can use it going forward so that we're not
> stuck in a similar situation with a new message format and a new codec to
> support. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT, but it is
> not as explicit.
>
> -Jason
>
> On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk> wrote:
>
> > Hi Dongjin and Jason,
> >
> > I would agree. My summary:
> >
> > 1. Support zstd with message format 2 only.
> > 2. Bump produce and fetch request versions.
> > 3. Provide broker errors whenever possible based on the request version
> and
> > rely on clients for the cases where the broker can't validate efficiently
> > (example message format 2 consumer that supports the latest fetch version
> > but doesn't support zstd).
> >
> > If there's general agreement on this, I suggest we update the KIP to
> state
> > the proposal and to move the rejected options to its own section. And
> then
> > start a vote!
> >
> > Ismael
> >
> > On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Hi Dongjin,
> > >
> > > Yes, that's a good summary. For clients which support v2, the client
> can
> > > parse the message format and hopefully raise a useful error message
> > > indicating the unsupported compression type. For older clients, our
> > options
> > > are probably (1) to down-convert to the old format using no compression
> > > type, or (2) to return an error code. I'm leaning toward the latter as
> > the
> > > simpler solution, but the challenge is finding a good error code. Two
> > > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The downside
> > is
> > > that old clients probably won't get a helpful message. However, at
> least
> > > the behavior will be consistent in the sense that all clients will fail
> > if
> > > they do not support zstandard.
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Thanks Jason, I reviewed the down-converting logic following your
> > > > explanation.[^1] You mean the following routines, right?
> > > >
> > > > -
> > > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > > main/scala/kafka/server/KafkaApis.scala#L534
> > > > -
> > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > main/java/org/apache/kafka/common/record/LazyDownConversionRecords.
> > > > java#L165
> > > > -
> > > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > > main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > > >
> > > > It seems like your stance is like following:
> > > >
> > > > 1. In principle, Kafka does not change the compression codec when
> > > > down-converting, since it requires inspecting the fetched data, which
> > is
> > > > expensive.
> > > > 2. However, there are some cases the fetched data is inspected
> anyway.
> > In
> > > > this case, we can provide compression conversion from Zstandard to
> > > > classical ones[^2].
> > > >
> > > > And from what I understand, the cases where the client without
> > ZStandard
> > > > support receives ZStandard compressed records can be organized into
> two
> > > > cases:
> > > >
> > > > a. The 'compression.type' configuration of given topic is 'producer'
> > and
> > > > the producer compressed the records with ZStandard. (that is, using
> > > > ZStandard implicitly.)
> > > > b.  The 'compression.type' configuration of given topic is 'zstd';
> that
> > > is,
> > > > using ZStandard explicitly.
> > > >
> > > > As you stated, we don't have to handle the case b specially. So, It
> > seems
> > > > like we can narrow the focus of the problem by joining case 1 and
> case
> > b
> > > > like the following:
> > > >
> > > > > Given the topic with 'producer' as its 'compression.type'
> > > configuration,
> > > > ZStandard compressed records and old client without ZStandard, is
> there
> > > any
> > > > case we need to inspect the records and can change the compression
> > type?
> > > If
> > > > so, can we provide compression type converting?
> > > >
> > > > Do I understand correctly?
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > [^1]: I'm sorry, I found that I was a little bit misunderstanding how
> > API
> > > > version works, after reviewing the downconvert logic & the protocol
> > > > documentation <https://kafka.apache.org/protocol>.
> > > > [^2]: None, Gzip, Snappy, Lz4.
> > > >
> > > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <ja...@confluent.io>
> > > > wrote:
> > > >
> > > > > >
> > > > > > But in my opinion, since the client will fail with the API
> version,
> > > so
> > > > we
> > > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> > think
> > > > we
> > > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > > down-convert
> > > > > > logic.)
> > > > >
> > > > >
> > > > > Currently the broker down-converts automatically when it receives
> an
> > > old
> > > > > version of the fetch request (a version which is known to predate
> the
> > > > > message format in use). Typically when down-converting the message
> > > > format,
> > > > > we use the same compression type, but there is not much point in
> > doing
> > > so
> > > > > when we know the client doesn't support it. So if zstandard is in
> > use,
> > > > and
> > > > > we have to down-convert anyway, then we can choose to use a
> different
> > > > > compression type or no compression type.
> > > > >
> > > > > From my perspective, there is no significant downside to bumping
> the
> > > > > protocol version and it has several potential benefits. Version
> bumps
> > > are
> > > > > cheap. The main question mark in my mind is about down-conversion.
> > > > Figuring
> > > > > out whether down-conversion is needed is hard generally without
> > > > inspecting
> > > > > the fetched data, which is expensive. I think we agree in principle
> > > that
> > > > we
> > > > > do not want to have to pay this cost generally and prefer the
> clients
> > > to
> > > > > fail when they see an unhandled compression type. The point I was
> > > making
> > > > is
> > > > > that there are some cases where we are either inspecting the data
> > > anyway
> > > > > (because we have to down-convert the message format), or we have an
> > > easy
> > > > > way to tell whether zstandard is in use (the topic has it
> configured
> > > > > explicitly). In the latter case, we don't have to handle it
> > specially.
> > > > But
> > > > > we do have to decide how we will handle down-conversion to older
> > > formats.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > Colin and Jason,
> > > > > >
> > > > > > Thanks for your opinions. In summarizing, the Pros and Cons of
> > > bumping
> > > > > > fetch API version are:
> > > > > >
> > > > > > Cons:
> > > > > >
> > > > > > - The Broker can't know whether a given message batch is
> compressed
> > > > with
> > > > > > zstd or not.
> > > > > > - Need some additional logic for the topic explicitly configured
> to
> > > use
> > > > > > zstd.
> > > > > >
> > > > > > Pros:
> > > > > >
> > > > > > - The broker doesn't need to conduct expensive down-conversion.
> > > > > > - Can message the users to update their client.
> > > > > >
> > > > > > So, opinions for the backward-compatibility policy by far:
> > > > > >
> > > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > > - B: leave unchanged - +1 (Viktor)
> > > > > >
> > > > > > Here are my additional comments:
> > > > > >
> > > > > > @Colin
> > > > > >
> > > > > > I greatly appreciate your response. In the case of the dictionary
> > > > > support,
> > > > > > of course, this issue should be addressed later so we don't need
> it
> > > in
> > > > > the
> > > > > > first version. You are right - it is not late to try it after
> some
> > > > > > benchmarks. What I mean is, we should keep in mind on that
> > potential
> > > > > > feature.
> > > > > >
> > > > > > @Jason
> > > > > >
> > > > > > You wrote,
> > > > > >
> > > > > > > Similarly, if we have to down-convert anyway because the client
> > > does
> > > > > not
> > > > > > understand the message format, then we could also use a different
> > > > > > compression type.
> > > > > >
> > > > > > But in my opinion, since the client will fail with the API
> version,
> > > so
> > > > we
> > > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> > think
> > > > we
> > > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > > down-convert
> > > > > > logic.)
> > > > > >
> > > > > > Please give more opinions. Thanks!
> > > > > >
> > > > > > - Dongjin
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <
> jason@confluent.io
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hey Colin,
> > > > > > >
> > > > > > > The problem for the fetch API is that the broker does not
> > generally
> > > > > know
> > > > > > if
> > > > > > > a batch was compressed with zstd unless it parses it. I think
> the
> > > > goal
> > > > > > here
> > > > > > > is to avoid the expensive down-conversion that is needed to
> > ensure
> > > > > > > compatibility because it is only necessary if zstd is actually
> in
> > > > use.
> > > > > > But
> > > > > > > as long as old clients can parse the message format, they
> should
> > > get
> > > > a
> > > > > > > reasonable error if they see an unsupported compression type in
> > the
> > > > > > > attributes. Basically the onus is on users to ensure that their
> > > > > consumers
> > > > > > > have been updated prior to using zstd. It seems like a
> reasonable
> > > > > > tradeoff
> > > > > > > to me. There are a couple cases that might be worth thinking
> > > through:
> > > > > > >
> > > > > > > 1. If a topic is explicitly configured to use zstd, then we
> don't
> > > > need
> > > > > to
> > > > > > > check the fetched data for the compression type to know if we
> > need
> > > > > > > down-conversion. If we did bump the Fetch API version, then we
> > > could
> > > > > > handle
> > > > > > > this case by either down-converting using a different
> compression
> > > > type
> > > > > or
> > > > > > > returning an error.
> > > > > > > 2. Similarly, if we have to down-convert anyway because the
> > client
> > > > does
> > > > > > not
> > > > > > > understand the message format, then we could also use a
> different
> > > > > > > compression type.
> > > > > > >
> > > > > > > For the produce API, I think it's reasonable to bump the api
> > > version.
> > > > > > This
> > > > > > > can be used by clients to check whether a broker supports zstd.
> > For
> > > > > > > example, we might support a list of preferred compression types
> > in
> > > > the
> > > > > > > producer and we could use the broker to detect which version to
> > > use.
> > > > > > >
> > > > > > > -Jason
> > > > > > >
> > > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <
> cmccabe@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good compression
> > > codec
> > > > > > and I
> > > > > > > > hope we can get this support in soon!
> > > > > > > >
> > > > > > > > I would say we can just bump the API version to indicate that
> > > ZStd
> > > > > > > support
> > > > > > > > is expected in new clients.  We probably need some way of
> > > > indicating
> > > > > to
> > > > > > > the
> > > > > > > > older clients that they can't consume the partitions, as
> well.
> > > > > Perhaps
> > > > > > > we
> > > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > > >
> > > > > > > > The license thing seems straightforward -- it's just a matter
> > of
> > > > > adding
> > > > > > > > the text to the right files as per ASF guidelines.
> > > > > > > >
> > > > > > > > With regard to the dictionary support, do we really need that
> > in
> > > > the
> > > > > > > first
> > > > > > > > version?  Hopefully message batches are big enough that this
> > > isn't
> > > > > > > needed.
> > > > > > > > Some benchmarks might help here.
> > > > > > > >
> > > > > > > > best,
> > > > > > > > Colin
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue,
> KIP-110
> > > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > > .
> > > > > > > > >
> > > > > > > > > For newcomers, here is a summary of the history: KIP-110
> > was
> > > > > > > > originally
> > > > > > > > > worked for the issue KAFKA-4514 but, it lacked benchmark
> > > results
> > > > to
> > > > > > get
> > > > > > > > the
> > > > > > > > > agreement of the community. Later, Ivan Babrou and some
> other
> > > > users
> > > > > > who
> > > > > > > > > adopted the patch provided their excellent performance
> report
> > > > which
> > > > > > is
> > > > > > > > now
> > > > > > > > > included in the KIP, but it postponed again because of the
> > > > > community
> > > > > > > was
> > > > > > > > > busy for 2.0.0 release. It is why I now reboot this issue.
> > > > > > > > >
> > > > > > > > > The following is the current status of the feature: You can
> > > check
> > > > > the
> > > > > > > > > current draft implementation here
> > > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is based
> on
> > > zstd
> > > > > > 1.3.5
> > > > > > > > and
> > > > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > > > >
> > > > > > > > > The issues that should be addressed are as follows:
> > > > > > > > >
> > > > > > > > > *1. Backward Compatibility*
> > > > > > > > >
> > > > > > > > > To support old consumers, we need to take a strategy to
> > handle
> > > > the
> > > > > > old
> > > > > > > > > consumers. Current candidates are:
> > > > > > > > >
> > > > > > > > > - Bump API version
> > > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > > - Improve the error messages:
> > > > > > > > >
> > > > > > > > > *2. Dictionary Support*
> > > > > > > > >
> > > > > > > > > To support zstd's dictionary feature in the future (if
> > needed),
> > > > we
> > > > > > need
> > > > > > > > to
> > > > > > > > > sketch how it should be and leave some room for it. As of
> > now,
> > > > > there
> > > > > > > has
> > > > > > > > > been no discussion on this topic yet.
> > > > > > > > >
> > > > > > > > > *3. License*
> > > > > > > > >
> > > > > > > > > To use this feature, we need to add license of zstd and
> > > zstd-jni
> > > > to
> > > > > > the
> > > > > > > > > project. (Thanks to Viktor Somogyi for raising this issue!)
> > It
> > > > > seems
> > > > > > > like
> > > > > > > > > what Apache Spark did would be a good example but there has
> > > been
> > > > no
> > > > > > > > > discussion yet.
> > > > > > > > >
> > > > > > > > > You can find the details of the above issues in the KIP
> > > document.
> > > > > > > Please
> > > > > > > > > have a look when you are free, and give me feedback. All
> > kinds
> > > of
> > > > > > > > > participating are welcome.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Dongjin
> > > > > > > > >
> > > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > > >
> > > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Sorry for the late reply.
> > > > > > > > > >
> > > > > > > > > > In short, I could not submit the updated KIP by the
> feature
> > > > > freeze
> > > > > > > > > > deadline of 2.0.0. For this reason, it will not be
> included
> > > in
> > > > > the
> > > > > > > > 2.0.0
> > > > > > > > > > release and all discussion for this issue were postponed
> > > after
> > > > > the
> > > > > > > > release
> > > > > > > > > > of 2.0.0.
> > > > > > > > > >
> > > > > > > > > > I have been updating the PR following recent updates.
> Just
> > > > now, I
> > > > > > > > rebased
> > > > > > > > > > it against the latest trunk and updated the zstd version
> > into
> > > > > > 1.3.5.
> > > > > > > > If you
> > > > > > > > > > need some request, don't hesitate to notify me. (But not
> > this
> > > > > > thread
> > > > > > > -
> > > > > > > > just
> > > > > > > > > > send me the message directly.)
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Dongjin
> > > > > > > > > >
> > > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > > bobby@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Is there any update on this.  The performance
> improvements
> > > are
> > > > > > quite
> > > > > > > > > >> impressive and I really would like to stop forking kafka
> > > just
> > > > to
> > > > > > get
> > > > > > > > this
> > > > > > > > > >> in.
> > > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > > >>
> > > > > > > > > >> Bobby
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > > dongjin@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > Ismael,
> > > > > > > > > >> >
> > > > > > > > > >> > Oh, I forgot all of you are on working frenzy for 2.0!
> > No
> > > > > > problem,
> > > > > > > > take
> > > > > > > > > >> > your time. I am also working at another issue now.
> Thank
> > > you
> > > > > for
> > > > > > > > > >> letting me
> > > > > > > > > >> > know.
> > > > > > > > > >> >
> > > > > > > > > >> > Best,
> > > > > > > > > >> > Dongjin
> > > > > > > > > >> >
> > > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > > ismaelj@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy
> > finalising
> > > > > > 2.0.0.
> > > > > > > > This
> > > > > > > > > >> KIP
> > > > > > > > > >> > > seems like a great candidate for 2.1.0 and hopefully
> > > there
> > > > > > will
> > > > > > > be
> > > > > > > > > >> more
> > > > > > > > > >> > of
> > > > > > > > > >> > > a discussion next week. :)
> > > > > > > > > >> > >
> > > > > > > > > >> > > Ismael
> > > > > > > > > >> > >
> > > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > > dongjin@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >> > >
> > > > > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > You can check out the implementation from here
> > > > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If
> you
> > > > > > > experience
> > > > > > > > any
> > > > > > > > > >> > > problem
> > > > > > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > Best,
> > > > > > > > > >> > > > Dongjin
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > > > > dongjin@apache.org
> > > > > > > > >
> > > > > > > > > >> > wrote:
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > > Here is the short conclusion about the license
> > > > problem:
> > > > > > *We
> > > > > > > > can
> > > > > > > > > >> use
> > > > > > > > > >> > > zstd
> > > > > > > > > >> > > > > and zstd-jni without any problem, but we need to
> > > > include
> > > > > > > their
> > > > > > > > > >> > license,
> > > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License
> > > > requires
> > > > > > to
> > > > > > > > > >> include
> > > > > > > > > >> > the
> > > > > > > > > >> > > > > license used, and BSD 3 Clause License requires
> > that
> > > > the
> > > > > > > name
> > > > > > > > of
> > > > > > > > > >> the
> > > > > > > > > >> > > > > contributor can't be used to endorse or promote
> > the
> > > > > > product.
> > > > > > > > > >> That's
> > > > > > > > > >> > it
> > > > > > > > > >> > > > > <
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > - They are not listed in the list of prohibited
> > > > licenses
> > > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > > resolved.html#category-x>
> > > > > > > also.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > > >> > > > > <https://issues.apache.org/
> > jira/browse/SPARK-19112
> > > >:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > - They made a directory dedicated to the
> > dependency
> > > > > > license
> > > > > > > > files
> > > > > > > > > >> > > > > <
> > > https://github.com/apache/spark/tree/master/licenses
> > > > >
> > > > > > and
> > > > > > > > added
> > > > > > > > > >> > > > licenses
> > > > > > > > > >> > > > > for Zstd
> > > > > > > > > >> > > > > <
> > > > > > > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > LICENSE-zstd.txt
> > > > > > > > > >> > > >
> > > > > > > > > >> > > > &
> > > > > > > > > >> > > > > Zstd-jni
> > > > > > > > > >> > > > > <
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > > >> >
> > > > > > > > > >> > > > > .
> > > > > > > > > >> > > > > - Added a link to the original license files in
> > > > LICENSE.
> > > > > > > > > >> > > > > <
> https://github.com/apache/spark/pull/18805/files
> > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Thanks for pointing out this problem, Viktor!
> Nice
> > > > > catch!
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > Best,
> > > > > > > > > >> > > > > Dongjin
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > > > > > dongjin@apache.org>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > >
> > > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> > reasoning.
> > > > so:
> > > > > +1
> > > > > > > > for b
> > > > > > > > > >> > until
> > > > > > > > > >> > > > now.
> > > > > > > > > >> > > > >>
> > > > > > > > > >> > > > >> For the license issues, I will have a check on
> > how
> > > > the
> > > > > > other
> > > > > > > > > >> projects
> > > > > > > > > >> > > are
> > > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > > >> > > > >>
> > > > > > > > > >> > > > >> Best,
> > > > > > > > > >> > > > >> Dongjin
> > > > > > > > > >> > > > >>
> > > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor
> Somogyi <
> > > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > > >> > > > >> wrote:
> > > > > > > > > >> > > > >>
> > > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > > >> > > > >>> I would vote for option b. in the "backward
> > > > > > compatibility"
> > > > > > > > > >> section.
> > > > > > > > > >> > > My
> > > > > > > > > >> > > > >>> reasoning for this is that users upgrading to
> a
> > > zstd
> > > > > > > > compatible
> > > > > > > > > >> > > version
> > > > > > > > > >> > > > >>> won't start to use it automatically, so manual
> > > > > > > > reconfiguration
> > > > > > > > > >> is
> > > > > > > > > >> > > > >>> required.
> > > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the
> cluster.
> > If
> > > > not
> > > > > > all
> > > > > > > > the
> > > > > > > > > >> > > clients
> > > > > > > > > >> > > > >>> are
> > > > > > > > > >> > > > >>> upgraded but just some of them and they'd
> start
> > to
> > > > use
> > > > > > > zstd
> > > > > > > > > >> then it
> > > > > > > > > >> > > > would
> > > > > > > > > >> > > > >>> cause errors in the cluster. I'd like to
> presume
> > > > > though
> > > > > > > that
> > > > > > > > > >> this
> > > > > > > > > >> > is
> > > > > > > > > >> > > a
> > > > > > > > > >> > > > >>> very
> > > > > > > > > >> > > > >>> obvious failure case and nobody should be
> > > surprised
> > > > if
> > > > > > it
> > > > > > > > didn't
> > > > > > > > > >> > > work.
> > > > > > > > > >> > > > >>> I wouldn't choose a. as I think we should bump
> > the
> > > > > fetch
> > > > > > > and
> > > > > > > > > >> > produce
> > > > > > > > > >> > > > >>> requests if it's a change in the message
> format.
> > > > > > Moreover
> > > > > > > if
> > > > > > > > > >> some
> > > > > > > > > >> > of
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > >>> producers and the brokers are upgraded but
> some
> > of
> > > > the
> > > > > > > > consumers
> > > > > > > > > >> > are
> > > > > > > > > >> > > > not,
> > > > > > > > > >> > > > >>> then we wouldn't prevent the error when the
> old
> > > > > consumer
> > > > > > > > tries
> > > > > > > > > >> to
> > > > > > > > > >> > > > consume
> > > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > > >> > > > >>> I wouldn't choose c. either as I think binding
> > the
> > > > > > > > compression
> > > > > > > > > >> type
> > > > > > > > > >> > > to
> > > > > > > > > >> > > > an
> > > > > > > > > >> > > > >>> API is not so obvious from the developer's
> > > > > perspective.
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>> I would also prefer to use the existing
> binding,
> > > > > however
> > > > > > > we
> > > > > > > > must
> > > > > > > > > >> > > > respect
> > > > > > > > > >> > > > >>> the licenses:
> > > > > > > > > >> > > > >>> "The code for these JNI bindings is licenced
> > under
> > > > > > > 2-clause
> > > > > > > > BSD
> > > > > > > > > >> > > > license.
> > > > > > > > > >> > > > >>> The native Zstd library is licensed under
> > 3-clause
> > > > BSD
> > > > > > > > license
> > > > > > > > > >> and
> > > > > > > > > >> > > > GPL2"
> > > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > > resolved.html#category-a
> > > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but
> the
> > > > Apache
> > > > > > > > license
> > > > > > > > > >> is
> > > > > > > > > >> > not
> > > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that the
> > > "3-clause
> > > > > BSD
> > > > > > > > license
> > > > > > > > > >> and
> > > > > > > > > >> > > > GPL2"
> > > > > > > > > >> > > > >>> is really not an AND but an OR in this case,
> but
> > > I'm
> > > > > no
> > > > > > > > lawyer,
> > > > > > > > > >> > just
> > > > > > > > > >> > > > >>> wanted
> > > > > > > > > >> > > > >>> to make the point that we should watch out for
> > > > > licenses.
> > > > > > > :)
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>> Regards,
> > > > > > > > > >> > > > >>> Viktor
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > > > > > > > ibobrik@gmail.com>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > > >>> > Hello,
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>> > This is Ivan and I still very much support
> the
> > > > fact
> > > > > > that
> > > > > > > > zstd
> > > > > > > > > >> > > > >>> compression
> > > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>> > Please think about the environment, you can
> > save
> > > > > > quite a
> > > > > > > > lot
> > > > > > > > > >> of
> > > > > > > > > >> > > > >>> hardware
> > > > > > > > > >> > > > >>> > with it.
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > > > > > > dongjin@apache.org>
> > > > > > > > > >> > > wrote:
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>> > > Since there are no responses for a week, I
> > > > decided
> > > > > > to
> > > > > > > > > >> > reinitiate
> > > > > > > > > >> > > > the
> > > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> >
> > > > > > > > > >> > > > >>>
> > > > > > > > > >> > > >
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> > > This KIP is about to introduce ZStandard
> > > > > Compression
> > > > > > > > into
> > > > > > > > > >> > Apache
> > > > > > > > > >> > > > >>> Kafka.
> > > > > > > > > >> > > > >>> > > The reason why it is posted again has a
> > story:
> > > > It
> > > > > > was
> > > > > > > > > >> > originally
> > > > > > > > > >> > > > >>> posted
> > > > > > > > > >> > > > >>> > to
> > > > > > > > > >> > > > >>> > > the dev mailing list more than one year
> ago
> > > but
> > > > > > since
> > > > > > > it
> > > > > > > > > >> has no
> > > > > > > > > >> > > > >>> > performance
> > > > > > > > > >> > > > >>> > > report included, it was postponed later.
> But
> > > > Some
> > > > > > > people
> > > > > > > > > >> > > (including
> > > > > > > > > >> > > > >>> Ivan)
> > > > > > > > > >> > > > >>> > > reported excellent performance report with
> > the
> > > > > draft
> > > > > > > PR,
> > > > > > > > > >> this
> > > > > > > > > >> > > work
> > > > > > > > > >> > > > >>> is now
> > > > > > > > > >> > > > >>> > > reactivated.
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> > > The updated KIP document includes some
> > > expected
> > > > > > > > problems and
> > > > > > > > > >> > > their
> > > > > > > > > >> > > > >>> > > candidate alternatives. Please have a look
> > > when
> > > > > you
> > > > > > > are
> > > > > > > > > >> free,
> > > > > > > > > >> > and
> > > > > > > > > >> > > > >>> give
> > > > > > > > > >> > > > >>> > me a
> > > > > > > > > >> > > > >>> > > feedback. All kinds of participating are
> > > > welcome.
> > > > > > > > > >> > > > >>> > >
> > > > > > > > > >> > > > >>> > > Best,
> > > > > > > > > >> > > > >>> > > Dongjin
> > > > > > > > > >> > > > >>> > >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Ismael,

Your summary looks good to me. I think it might also be a good idea to add
a new UNSUPPORTED_COMPRESSION_TYPE error code to go along with the version
bumps. We won't be able to use it for old API versions since those clients
will not understand it, but we can use it going forward so that we're not
stuck in a similar situation the next time a new message format or codec is
added. Another option is to use UNSUPPORTED_FOR_MESSAGE_FORMAT, but it is
not as explicit.

-Jason
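
For illustration only, here is a minimal, self-contained Java sketch of the
error selection described above. It is not Kafka broker code: the version
numbers, the minVersionFor() helper, the SOME_FUTURE_CODEC entry and the
reduced Errors enum are all assumptions made up for this sketch.

public class CompressionErrorSketch {

    // Codecs a broker might see in a batch; SOME_FUTURE_CODEC is purely
    // hypothetical and only exists to show the "going forward" case.
    enum CompressionType { NONE, GZIP, SNAPPY, LZ4, ZSTD, SOME_FUTURE_CODEC }

    // Only the error codes relevant to this sketch.
    enum Errors { NONE, UNSUPPORTED_FOR_MESSAGE_FORMAT, UNSUPPORTED_COMPRESSION_TYPE }

    // Assumed minimum produce/fetch request version required per codec.
    static int minVersionFor(CompressionType type) {
        switch (type) {
            case ZSTD:              return 7;
            case SOME_FUTURE_CODEC: return 9;
            default:                return 0;
        }
    }

    // Assumed first request version whose clients understand the proposed
    // UNSUPPORTED_COMPRESSION_TYPE error code.
    static final int MIN_VERSION_KNOWING_NEW_ERROR = 7;

    static Errors validate(short requestVersion, CompressionType type) {
        if (requestVersion >= minVersionFor(type))
            return Errors.NONE;                         // codec allowed at this version
        if (requestVersion >= MIN_VERSION_KNOWING_NEW_ERROR)
            return Errors.UNSUPPORTED_COMPRESSION_TYPE; // explicit error, client knows it
        return Errors.UNSUPPORTED_FOR_MESSAGE_FORMAT;   // older clients only know this one
    }

    public static void main(String[] args) {
        // UNSUPPORTED_FOR_MESSAGE_FORMAT, NONE, UNSUPPORTED_COMPRESSION_TYPE
        System.out.println(validate((short) 5, CompressionType.ZSTD));
        System.out.println(validate((short) 7, CompressionType.ZSTD));
        System.out.println(validate((short) 7, CompressionType.SOME_FUTURE_CODEC));
    }
}

The idea is that clients new enough to negotiate the bumped request version
also know the new error code and get the explicit
UNSUPPORTED_COMPRESSION_TYPE, while older clients fall back to an error code
they already understand.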

On Fri, Aug 17, 2018 at 5:19 PM, Ismael Juma <is...@juma.me.uk> wrote:

> Hi Dongjin and Jason,
>
> I would agree. My summary:
>
> 1. Support zstd with message format 2 only.
> 2. Bump produce and fetch request versions.
> 3. Provide broker errors whenever possible based on the request version, and
> rely on clients for the cases where the broker can't validate efficiently
> (for example, a message format 2 consumer that supports the latest fetch
> version but doesn't support zstd).
>
> If there's general agreement on this, I suggest we update the KIP to state
> the proposal and to move the rejected options to their own section. And then
> start a vote!
>
> Ismael
>
> On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Dongjin,
> >
> > Yes, that's a good summary. For clients which support v2, the client can
> > parse the message format and hopefully raise a useful error message
> > indicating the unsupported compression type. For older clients, our
> options
> > are probably (1) to down-convert to the old format using no compression
> > type, or (2) to return an error code. I'm leaning toward the latter as
> the
> > simpler solution, but the challenge is finding a good error code. Two
> > possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The downside
> is
> > that old clients probably won't get a helpful message. However, at least
> > the behavior will be consistent in the sense that all clients will fail
> if
> > they do not support zstandard.
> >
> > What do you think?
> >
> > Thanks,
> > Jason
> >
> > On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org> wrote:
> >
> > > Thanks Jason, I reviewed the down-converting logic following your
> > > explanation.[^1] You mean the following routines, right?
> > >
> > > -
> > > https://github.com/apache/kafka/blob/trunk/core/src/
> > > main/scala/kafka/server/KafkaApis.scala#L534
> > > -
> > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > main/java/org/apache/kafka/common/record/LazyDownConversionRecords.
> > > java#L165
> > > -
> > > https://github.com/apache/kafka/blob/trunk/clients/src/
> > > main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> > >
> > > It seems like your stance is like following:
> > >
> > > 1. In principle, Kafka does not change the compression codec when
> > > down-converting, since it requires inspecting the fetched data, which
> is
> > > expensive.
> > > 2. However, there are some cases where the fetched data is inspected anyway.
> In
> > > this case, we can provide compression conversion from Zstandard to
> > > classical ones[^2].
> > >
> > > And from what I understand, the cases where the client without
> ZStandard
> > > support receives ZStandard compressed records can be organized into two
> > > cases:
> > >
> > > a. The 'compression.type' configuration of the given topic is 'producer'
> and
> > > the producer compressed the records with ZStandard. (that is, using
> > > ZStandard implicitly.)
> > > b. The 'compression.type' configuration of the given topic is 'zstd'; that
> > is,
> > > using ZStandard explicitly.
> > >
> > > As you stated, we don't have to handle case b specially. So, it
> seems
> > > like we can narrow the focus of the problem by joining case 1 and case
> b
> > > like the following:
> > >
> > > > Given the topic with 'producer' as its 'compression.type'
> > configuration,
> > > ZStandard compressed records and old client without ZStandard, is there
> > any
> > > case we need to inspect the records and can change the compression
> type?
> > If
> > > so, can we provide compression type conversion?
> > >
> > > Do I understand correctly?
> > >
> > > Best,
> > > Dongjin
> > >
> > > [^1]: I'm sorry, I found that I was a little bit misunderstanding how
> API
> > > version works, after reviewing the downconvert logic & the protocol
> > > documentation <https://kafka.apache.org/protocol>.
> > > [^2]: None, Gzip, Snappy, Lz4.
> > >
> > > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > >
> > > > > But in my opinion, since the client will fail with the API version,
> > so
> > > we
> > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> think
> > > we
> > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > down-convert
> > > > > logic.)
> > > >
> > > >
> > > > Currently the broker down-converts automatically when it receives an
> > old
> > > > version of the fetch request (a version which is known to predate the
> > > > message format in use). Typically when down-converting the message
> > > format,
> > > > we use the same compression type, but there is not much point in
> doing
> > so
> > > > when we know the client doesn't support it. So if zstandard is in
> use,
> > > and
> > > > we have to down-convert anyway, then we can choose to use a different
> > > > compression type or no compression type.
> > > >
> > > > From my perspective, there is no significant downside to bumping the
> > > > protocol version and it has several potential benefits. Version bumps
> > are
> > > > cheap. The main question mark in my mind is about down-conversion.
> > > Figuring
> > > > out whether down-conversion is needed is hard generally without
> > > inspecting
> > > > the fetched data, which is expensive. I think we agree in principle
> > that
> > > we
> > > > do not want to have to pay this cost generally and prefer the clients
> > to
> > > > fail when they see an unhandled compression type. The point I was
> > making
> > > is
> > > > that there are some cases where we are either inspecting the data
> > anyway
> > > > (because we have to down-convert the message format), or we have an
> > easy
> > > > way to tell whether zstandard is in use (the topic has it configured
> > > > explicitly). In the latter case, we don't have to handle it
> specially.
> > > But
> > > > we do have to decide how we will handle down-conversion to older
> > formats.
> > > >
> > > > -Jason
> > > >
> > > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Colin and Jason,
> > > > >
> > > > > Thanks for your opinions. In summary, the Pros and Cons of
> > bumping
> > > > > fetch API version are:
> > > > >
> > > > > Cons:
> > > > >
> > > > > - The Broker can't know whether a given message batch is compressed
> > > with
> > > > > zstd or not.
> > > > > - Need some additional logic for the topic explicitly configured to
> > use
> > > > > zstd.
> > > > >
> > > > > Pros:
> > > > >
> > > > > - The broker doesn't need to conduct expensive down-conversion.
> > > > > - Can tell the users to update their clients.
> > > > >
> > > > > So, opinions for the backward-compatibility policy so far:
> > > > >
> > > > > - A: bump the API version - +2 (Colin, Jason)
> > > > > - B: leave unchanged - +1 (Viktor)
> > > > >
> > > > > Here are my additional comments:
> > > > >
> > > > > @Colin
> > > > >
> > > > > I greatly appreciate your response. In the case of the dictionary
> > > > support,
> > > > > of course, this issue should be addressed later so we don't need it
> > in
> > > > the
> > > > > first version. You are right - it is not late to try it after some
> > > > > benchmarks. What I mean is, we should keep in mind on that
> potential
> > > > > feature.
> > > > >
> > > > > @Jason
> > > > >
> > > > > You wrote,
> > > > >
> > > > > > Similarly, if we have to down-convert anyway because the client
> > does
> > > > not
> > > > > understand the message format, then we could also use a different
> > > > > compression type.
> > > > >
> > > > > But in my opinion, since the client will fail with the API version,
> > so
> > > we
> > > > > don't need to down-convert the messages anyway. Isn't it? So, I
> think
> > > we
> > > > > don't care about this case. (I'm sorry, I am not familiar with
> > > > down-convert
> > > > > logic.)
> > > > >
> > > > > Please give more opinions. Thanks!
> > > > >
> > > > > - Dongjin
> > > > >
> > > > >
> > > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <jason@confluent.io
> >
> > > > wrote:
> > > > >
> > > > > > Hey Colin,
> > > > > >
> > > > > > The problem for the fetch API is that the broker does not
> generally
> > > > know
> > > > > if
> > > > > > a batch was compressed with zstd unless it parses it. I think the
> > > goal
> > > > > here
> > > > > > is to avoid the expensive down-conversion that is needed to
> ensure
> > > > > > compatibility because it is only necessary if zstd is actually in
> > > use.
> > > > > But
> > > > > > as long as old clients can parse the message format, they should
> > get
> > > a
> > > > > > reasonable error if they see an unsupported compression type in
> the
> > > > > > attributes. Basically the onus is on users to ensure that their
> > > > consumers
> > > > > > have been updated prior to using zstd. It seems like a reasonable
> > > > > tradeoff
> > > > > > to me. There are a couple cases that might be worth thinking
> > through:
> > > > > >
> > > > > > 1. If a topic is explicitly configured to use zstd, then we don't
> > > need
> > > > to
> > > > > > check the fetched data for the compression type to know if we
> need
> > > > > > down-conversion. If we did bump the Fetch API version, then we
> > could
> > > > > handle
> > > > > > this case by either down-converting using a different compression
> > > type
> > > > or
> > > > > > returning an error.
> > > > > > 2. Similarly, if we have to down-convert anyway because the
> client
> > > does
> > > > > not
> > > > > > understand the message format, then we could also use a different
> > > > > > compression type.
> > > > > >
> > > > > > For the produce API, I think it's reasonable to bump the api
> > version.
> > > > > This
> > > > > > can be used by clients to check whether a broker supports zstd.
> For
> > > > > > example, we might support a list of preferred compression types
> in
> > > the
> > > > > > producer and we could use the broker to detect which version to
> > use.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cmccabe@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > Thanks for bumping this, Dongjin.  ZStd is a good compression
> > codec
> > > > > and I
> > > > > > > hope we can get this support in soon!
> > > > > > >
> > > > > > > I would say we can just bump the API version to indicate that
> > ZStd
> > > > > > support
> > > > > > > is expected in new clients.  We probably need some way of
> > > indicating
> > > > to
> > > > > > the
> > > > > > > older clients that they can't consume the partitions, as well.
> > > > Perhaps
> > > > > > we
> > > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > > >
> > > > > > > The license thing seems straightforward -- it's just a matter
> of
> > > > adding
> > > > > > > the text to the right files as per ASF guidelines.
> > > > > > >
> > > > > > > With regard to the dictionary support, do we really need that
> in
> > > the
> > > > > > first
> > > > > > > version?  Hopefully message batches are big enough that this
> > isn't
> > > > > > needed.
> > > > > > > Some benchmarks might help here.
> > > > > > >
> > > > > > > best,
> > > > > > > Colin
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > > .
> > > > > > > >
> > > > > > > > For newcomers, here is a summary of the history: KIP-110 was
> was
> > > > > > > originally
> > > > > > > > worked for the issue KAFKA-4514 but, it lacked benchmark
> > results
> > > to
> > > > > get
> > > > > > > the
> > > > > > > > agreement of the community. Later, Ivan Babrou and some other
> > > users
> > > > > who
> > > > > > > > adopted the patch provided their excellent performance report
> > > which
> > > > > is
> > > > > > > now
> > > > > > > > included in the KIP, but it postponed again because of the
> > > > community
> > > > > > was
> > > > > > > > busy for 2.0.0 release. It is why I now reboot this issue.
> > > > > > > >
> > > > > > > > The following is the current status of the feature: You can
> > check
> > > > the
> > > > > > > > current draft implementation here
> > > > > > > > <https://github.com/apache/kafka/pull/2267>. It is based on
> > zstd
> > > > > 1.3.5
> > > > > > > and
> > > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > > >
> > > > > > > > The issues that should be addressed are as follows:
> > > > > > > >
> > > > > > > > *1. Backward Compatibility*
> > > > > > > >
> > > > > > > > To support old consumers, we need to take a strategy to
> handle
> > > the
> > > > > old
> > > > > > > > consumers. Current candidates are:
> > > > > > > >
> > > > > > > > - Bump API version
> > > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > > - Improve the error messages:
> > > > > > > >
> > > > > > > > *2. Dictionary Support*
> > > > > > > >
> > > > > > > > To support zstd's dictionary feature in the future (if
> needed),
> > > we
> > > > > need
> > > > > > > to
> > > > > > > > sketch how it should be and leave some room for it. As of
> now,
> > > > there
> > > > > > has
> > > > > > > > been no discussion on this topic yet.
> > > > > > > >
> > > > > > > > *3. License*
> > > > > > > >
> > > > > > > > To use this feature, we need to add license of zstd and
> > zstd-jni
> > > to
> > > > > the
> > > > > > > > project. (Thanks to Viktor Somogyi for raising this issue!)
> It
> > > > seems
> > > > > > like
> > > > > > > > what Apache Spark did would be a good example but there has
> > been
> > > no
> > > > > > > > discussion yet.
> > > > > > > >
> > > > > > > > You can find the details of the above issues in the KIP
> > document.
> > > > > > Please
> > > > > > > > have a look when you are free, and give me feedback. All
> kinds
> > of
> > > > > > > > participating are welcome.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Dongjin
> > > > > > > >
> > > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > > >
> > > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Sorry for the late reply.
> > > > > > > > >
> > > > > > > > > In short, I could not submit the updated KIP by the feature
> > > > freeze
> > > > > > > > > deadline of 2.0.0. For this reason, it will not be included
> > in
> > > > the
> > > > > > > 2.0.0
> > > > > > > > > release and all discussion for this issue were postponed
> > after
> > > > the
> > > > > > > release
> > > > > > > > > of 2.0.0.
> > > > > > > > >
> > > > > > > > > I have been updating the PR following recent updates. Just
> > > now, I
> > > > > > > rebased
> > > > > > > > > it against the latest trunk and updated the zstd version
> into
> > > > > 1.3.5.
> > > > > > > If you
> > > > > > > > > need some request, don't hesitate to notify me. (But not
> this
> > > > > thread
> > > > > > -
> > > > > > > just
> > > > > > > > > send me the message directly.)
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Dongjin
> > > > > > > > >
> > > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> > bobby@apache.org
> > > >
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Is there any update on this.  The performance improvements
> > are
> > > > > quite
> > > > > > > > >> impressive and I really would like to stop forking kafka
> > just
> > > to
> > > > > get
> > > > > > > this
> > > > > > > > >> in.
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Bobby
> > > > > > > > >>
> > > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > Ismael,
> > > > > > > > >> >
> > > > > > > > >> > Oh, I forgot all of you are on working frenzy for 2.0!
> No
> > > > > problem,
> > > > > > > take
> > > > > > > > >> > your time. I am also working at another issue now. Thank
> > you
> > > > for
> > > > > > > > >> letting me
> > > > > > > > >> > know.
> > > > > > > > >> >
> > > > > > > > >> > Best,
> > > > > > > > >> > Dongjin
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > > ismaelj@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy
> finalising
> > > > > 2.0.0.
> > > > > > > This
> > > > > > > > >> KIP
> > > > > > > > >> > > seems like a great candidate for 2.1.0 and hopefully
> > there
> > > > > will
> > > > > > be
> > > > > > > > >> more
> > > > > > > > >> > of
> > > > > > > > >> > > a discussion next week. :)
> > > > > > > > >> > >
> > > > > > > > >> > > Ismael
> > > > > > > > >> > >
> > > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > > dongjin@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > > > >> > > >
> > > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > > >> > > >
> > > > > > > > >> > > > You can check out the implementation from here
> > > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> > > > > > experience
> > > > > > > any
> > > > > > > > >> > > problem
> > > > > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Best,
> > > > > > > > >> > > > Dongjin
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > > > dongjin@apache.org
> > > > > > > >
> > > > > > > > >> > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Here is the short conclusion about the license
> > > problem:
> > > > > *We
> > > > > > > can
> > > > > > > > >> use
> > > > > > > > >> > > zstd
> > > > > > > > >> > > > > and zstd-jni without any problem, but we need to
> > > include
> > > > > > their
> > > > > > > > >> > license,
> > > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License
> > > requires
> > > > > to
> > > > > > > > >> include
> > > > > > > > >> > the
> > > > > > > > >> > > > > license used, and BSD 3 Clause License requires
> that
> > > the
> > > > > > name
> > > > > > > of
> > > > > > > > >> the
> > > > > > > > >> > > > > contributor can't be used to endorse or promote
> the
> > > > > product.
> > > > > > > > >> That's
> > > > > > > > >> > it
> > > > > > > > >> > > > > <
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > > compatible-with-apache-2-0-license/
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > - They are not listed in the list of prohibited
> > > licenses
> > > > > > > > >> > > > > <https://www.apache.org/legal/
> > > resolved.html#category-x>
> > > > > > also.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > > >> > > > > <https://issues.apache.org/
> jira/browse/SPARK-19112
> > >:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > - They made a directory dedicated to the
> dependency
> > > > > license
> > > > > > > files
> > > > > > > > >> > > > > <
> > https://github.com/apache/spark/tree/master/licenses
> > > >
> > > > > and
> > > > > > > added
> > > > > > > > >> > > > licenses
> > > > > > > > >> > > > > for Zstd
> > > > > > > > >> > > > > <
> > > > > > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > > > > > LICENSE-zstd.txt
> > > > > > > > >> > > >
> > > > > > > > >> > > > &
> > > > > > > > >> > > > > Zstd-jni
> > > > > > > > >> > > > > <
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > > > > LICENSE-zstd-jni.txt
> > > > > > > > >> >
> > > > > > > > >> > > > > .
> > > > > > > > >> > > > > - Added a link to the original license files in
> > > LICENSE.
> > > > > > > > >> > > > > <https://github.com/apache/spark/pull/18805/files
> >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Thanks for pointing out this problem, Viktor! Nice
> > > > catch!
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Best,
> > > > > > > > >> > > > > Dongjin
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > > > > dongjin@apache.org>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >> I greatly appreciate your comprehensive
> reasoning.
> > > so:
> > > > +1
> > > > > > > for b
> > > > > > > > >> > until
> > > > > > > > >> > > > now.
> > > > > > > > >> > > > >>
> > > > > > > > >> > > > >> For the license issues, I will have a check on
> how
> > > the
> > > > > other
> > > > > > > > >> projects
> > > > > > > > >> > > are
> > > > > > > > >> > > > >> doing and share the results.
> > > > > > > > >> > > > >>
> > > > > > > > >> > > > >> Best,
> > > > > > > > >> > > > >> Dongjin
> > > > > > > > >> > > > >>
> > > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > > >> > > > >> wrote:
> > > > > > > > >> > > > >>
> > > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>> A couple of comments:
> > > > > > > > >> > > > >>> I would vote for option b. in the "backward
> > > > > compatibility"
> > > > > > > > >> section.
> > > > > > > > >> > > My
> > > > > > > > >> > > > >>> reasoning for this is that users upgrading to a
> > zstd
> > > > > > > compatible
> > > > > > > > >> > > version
> > > > > > > > >> > > > >>> won't start to use it automatically, so manual
> > > > > > > reconfiguration
> > > > > > > > >> is
> > > > > > > > >> > > > >>> required.
> > > > > > > > >> > > > >>> Therefore an upgrade won't mess up the cluster.
> If
> > > not
> > > > > all
> > > > > > > the
> > > > > > > > >> > > clients
> > > > > > > > >> > > > >>> are
> > > > > > > > >> > > > >>> upgraded but just some of them and they'd start
> to
> > > use
> > > > > > zstd
> > > > > > > > >> then it
> > > > > > > > >> > > > would
> > > > > > > > >> > > > >>> cause errors in the cluster. I'd like to presume
> > > > though
> > > > > > that
> > > > > > > > >> this
> > > > > > > > >> > is
> > > > > > > > >> > > a
> > > > > > > > >> > > > >>> very
> > > > > > > > >> > > > >>> obvious failure case and nobody should be
> > surprised
> > > if
> > > > > it
> > > > > > > didn't
> > > > > > > > >> > > work.
> > > > > > > > >> > > > >>> I wouldn't choose a. as I think we should bump
> the
> > > > fetch
> > > > > > and
> > > > > > > > >> > produce
> > > > > > > > >> > > > >>> requests if it's a change in the message format.
> > > > > Moreover
> > > > > > if
> > > > > > > > >> some
> > > > > > > > >> > of
> > > > > > > > >> > > > the
> > > > > > > > >> > > > >>> producers and the brokers are upgraded but some
> of
> > > the
> > > > > > > consumers
> > > > > > > > >> > are
> > > > > > > > >> > > > not,
> > > > > > > > >> > > > >>> then we wouldn't prevent the error when the old
> > > > consumer
> > > > > > > tries
> > > > > > > > >> to
> > > > > > > > >> > > > consume
> > > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > > >> > > > >>> I wouldn't choose c. either as I think binding
> the
> > > > > > > compression
> > > > > > > > >> type
> > > > > > > > >> > > to
> > > > > > > > >> > > > an
> > > > > > > > >> > > > >>> API is not so obvious from the developer's
> > > > perspective.
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>> I would also prefer to use the existing binding,
> > > > however
> > > > > > we
> > > > > > > must
> > > > > > > > >> > > > respect
> > > > > > > > >> > > > >>> the licenses:
> > > > > > > > >> > > > >>> "The code for these JNI bindings is licenced
> under
> > > > > > 2-clause
> > > > > > > BSD
> > > > > > > > >> > > > license.
> > > > > > > > >> > > > >>> The native Zstd library is licensed under
> 3-clause
> > > BSD
> > > > > > > license
> > > > > > > > >> and
> > > > > > > > >> > > > GPL2"
> > > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > > >> > > > >>> https://www.apache.org/legal/
> > > resolved.html#category-a
> > > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but the
> > > Apache
> > > > > > > license
> > > > > > > > >> is
> > > > > > > > >> > not
> > > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that the
> > "3-clause
> > > > BSD
> > > > > > > license
> > > > > > > > >> and
> > > > > > > > >> > > > GPL2"
> > > > > > > > >> > > > >>> is really not an AND but an OR in this case, but
> > I'm
> > > > no
> > > > > > > lawyer,
> > > > > > > > >> > just
> > > > > > > > >> > > > >>> wanted
> > > > > > > > >> > > > >>> to make the point that we should watch out for
> > > > licenses.
> > > > > > :)
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>> Regards,
> > > > > > > > >> > > > >>> Viktor
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > > > > > > ibobrik@gmail.com>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > > >>> > Hello,
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>> > This is Ivan and I still very much support the
> > > fact
> > > > > that
> > > > > > > zstd
> > > > > > > > >> > > > >>> compression
> > > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>> > Please think about the environment, you can
> save
> > > > > quite a
> > > > > > > lot
> > > > > > > > >> of
> > > > > > > > >> > > > >>> hardware
> > > > > > > > >> > > > >>> > with it.
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>> > Thank you.
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > > > > > dongjin@apache.org>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>> > > Since there are no responses for a week, I
> > > decided
> > > > > to
> > > > > > > > >> > reinitiate
> > > > > > > > >> > > > the
> > > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> >
> > > > > > > > >> > > > >>>
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> > > This KIP is about to introduce ZStandard
> > > > Compression
> > > > > > > into
> > > > > > > > >> > Apache
> > > > > > > > >> > > > >>> Kafka.
> > > > > > > > >> > > > >>> > > The reason why it is posted again has a
> story:
> > > It
> > > > > was
> > > > > > > > >> > originally
> > > > > > > > >> > > > >>> posted
> > > > > > > > >> > > > >>> > to
> > > > > > > > >> > > > >>> > > the dev mailing list more than one year ago
> > but
> > > > > since
> > > > > > it
> > > > > > > > >> has no
> > > > > > > > >> > > > >>> > performance
> > > > > > > > >> > > > >>> > > report included, it was postponed later. But
> > > Some
> > > > > > people
> > > > > > > > >> > > (including
> > > > > > > > >> > > > >>> Ivan)
> > > > > > > > >> > > > >>> > > reported excellent performance report with
> the
> > > > draft
> > > > > > PR,
> > > > > > > > >> this
> > > > > > > > >> > > work
> > > > > > > > >> > > > >>> is now
> > > > > > > > >> > > > >>> > > reactivated.
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> > > The updated KIP document includes some
> > expected
> > > > > > > problems and
> > > > > > > > >> > > their
> > > > > > > > >> > > > >>> > > candidate alternatives. Please have a look
> > when
> > > > you
> > > > > > are
> > > > > > > > >> free,
> > > > > > > > >> > and
> > > > > > > > >> > > > >>> give
> > > > > > > > >> > > > >>> > me a
> > > > > > > > >> > > > >>> > > feedback. All kinds of participating are
> > > welcome.
> > > > > > > > >> > > > >>> > >
> > > > > > > > >> > > > >>> > > Best,
> > > > > > > > >> > > > >>> > > Dongjin
> > > > > > > > >> > > > >>> > >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Dongjin and Jason,

I would agree. My summary:

1. Support zstd with message format 2 only.
2. Bump produce and fetch request versions.
3. Provide broker errors whenever possible based on the request version, and
rely on clients for the cases where the broker can't validate efficiently
(for example, a message format 2 consumer that supports the latest fetch
version but doesn't support zstd).

If there's general agreement on this, I suggest we update the KIP to state
the proposal and to move the rejected options to their own section. And then
start a vote!

Ismael
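
For anyone following along, points (1) and (2) translate to something like
this on the client side once both broker and client are upgraded (a minimal
sketch; it assumes the codec is exposed to producers as
compression.type=zstd, as the KIP proposes):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ZstdProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "zstd" is only accepted once broker and client both ship this KIP;
            // an older broker is expected to reject the produce request rather
            // than silently store data that parts of the fleet cannot read.
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("test-topic", "key", "value"));
            }
        }
    }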

On Fri, Aug 17, 2018 at 4:00 PM Jason Gustafson <ja...@confluent.io> wrote:

> Hi Dongjin,
>
> Yes, that's a good summary. For clients which support v2, the client can
> parse the message format and hopefully raise a useful error message
> indicating the unsupported compression type. For older clients, our options
> are probably (1) to down-convert to the old format using no compression
> type, or (2) to return an error code. I'm leaning toward the latter as the
> simpler solution, but the challenge is finding a good error code. Two
> possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The downside is
> that old clients probably won't get a helpful message. However, at least
> the behavior will be consistent in the sense that all clients will fail if
> they do not support zstandard.
>
> What do you think?
>
> Thanks,
> Jason
>
> On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org> wrote:
>
> > Thanks Jason, I reviewed the down-converting logic following your
> > explanation.[^1] You mean the following routines, right?
> >
> > - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
> > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
> > - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
> >
> > It seems like your stance is as follows:
> >
> > 1. In principle, Kafka does not change the compression codec when
> > down-converting, since it requires inspecting the fetched data, which is
> > expensive.
> > 2. However, there are some cases where the fetched data is inspected anyway. In
> > this case, we can provide compression conversion from Zstandard to
> > classical ones[^2].
> >
> > And from what I understand, the cases where the client without ZStandard
> > support receives ZStandard compressed records can be organized into two
> > cases:
> >
> > a. The 'compression.type' configuration of given topic is 'producer' and
> > the producer compressed the records with ZStandard. (that is, using
> > ZStandard implicitly.)
> > b.  The 'compression.type' configuration of given topic is 'zstd'; that
> is,
> > using ZStandard explicitly.
> >
> > As you stated, we don't have to handle case b specially. So, it seems
> > like we can narrow the focus of the problem by joining case 1 and case b
> > as follows:
> >
> > > Given the topic with 'producer' as its 'compression.type'
> configuration,
> > ZStandard compressed records and old client without ZStandard, is there
> any
> > case we need to inspect the records and can change the compression type?
> If
> > so, can we provide compression type conversion?
> >
> > Do I understand correctly?
> >
> > Best,
> > Dongjin
> >
> > [^1]: I'm sorry, I found that I had slightly misunderstood how the API
> > version works, after reviewing the down-convert logic & the protocol
> > documentation <https://kafka.apache.org/protocol>.
> > [^2]: None, Gzip, Snappy, Lz4.
> >
> > On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > >
> > > > But in my opinion, since the client will fail with the API version,
> so
> > we
> > > > don't need to down-convert the messages anyway. Isn't it? So, I think
> > we
> > > > don't care about this case. (I'm sorry, I am not familiar with
> > > down-convert
> > > > logic.)
> > >
> > >
> > > Currently the broker down-converts automatically when it receives an
> old
> > > version of the fetch request (a version which is known to predate the
> > > message format in use). Typically when down-converting the message
> > format,
> > > we use the same compression type, but there is not much point in doing
> so
> > > when we know the client doesn't support it. So if zstandard is in use,
> > and
> > > we have to down-convert anyway, then we can choose to use a different
> > > compression type or no compression type.
> > >
> > > From my perspective, there is no significant downside to bumping the
> > > protocol version and it has several potential benefits. Version bumps
> are
> > > cheap. The main question mark in my mind is about down-conversion.
> > Figuring
> > > out whether down-conversion is needed is hard generally without
> > inspecting
> > > the fetched data, which is expensive. I think we agree in principle
> that
> > we
> > > do not want to have to pay this cost generally and prefer the clients
> to
> > > fail when they see an unhandled compression type. The point I was
> making
> > is
> > > that there are some cases where we are either inspecting the data
> anyway
> > > (because we have to down-convert the message format), or we have an
> easy
> > > way to tell whether zstandard is in use (the topic has it configured
> > > explicitly). In the latter case, we don't have to handle it specially.
> > But
> > > we do have to decide how we will handle down-conversion to older
> formats.
> > >
> > > -Jason
> > >
> > > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Colin and Jason,
> > > >
> > > > Thanks for your opinions. In summary, the pros and cons of bumping the
> > > > fetch API version are:
> > > >
> > > > Cons:
> > > >
> > > > - The Broker can't know whether a given message batch is compressed
> > with
> > > > zstd or not.
> > > > - Need some additional logic for the topic explicitly configured to
> use
> > > > zstd.
> > > >
> > > > Pros:
> > > >
> > > > - The broker doesn't need to conduct expensive down-conversion.
> > > > - It can signal users to update their clients.
> > > >
> > > > So, opinions on the backward-compatibility policy so far:
> > > >
> > > > - A: bump the API version - +2 (Colin, Jason)
> > > > - B: leave unchanged - +1 (Viktor)
> > > >
> > > > Here are my additional comments:
> > > >
> > > > @Colin
> > > >
> > > > I greatly appreciate your response. In the case of the dictionary support,
> > > > of course, this issue should be addressed later, so we don't need it in the
> > > > first version. You are right - it is not too late to try it after some
> > > > benchmarks. What I mean is, we should keep that potential feature in mind.
> > > >
> > > > @Jason
> > > >
> > > > You wrote,
> > > >
> > > > > Similarly, if we have to down-convert anyway because the client
> does
> > > not
> > > > understand the message format, then we could also use a different
> > > > compression type.
> > > >
> > > > But in my opinion, since the client will fail on the API version check, we
> > > > don't need to down-convert the messages anyway, do we? So, I think we
> > > > don't need to care about this case. (I'm sorry, I am not familiar with the
> > > > down-convert logic.)
> > > >
> > > > Please give more opinions. Thanks!
> > > >
> > > > - Dongjin
> > > >
> > > >
> > > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > > >
> > > > > Hey Colin,
> > > > >
> > > > > The problem for the fetch API is that the broker does not generally
> > > know
> > > > if
> > > > > a batch was compressed with zstd unless it parses it. I think the
> > goal
> > > > here
> > > > > is to avoid the expensive down-conversion that is needed to ensure
> > > > > compatibility because it is only necessary if zstd is actually in
> > use.
> > > > But
> > > > > as long as old clients can parse the message format, they should
> get
> > a
> > > > > reasonable error if they see an unsupported compression type in the
> > > > > attributes. Basically the onus is on users to ensure that their
> > > consumers
> > > > > have been updated prior to using zstd. It seems like a reasonable
> > > > tradeoff
> > > > > to me. There are a couple cases that might be worth thinking
> through:
> > > > >
> > > > > 1. If a topic is explicitly configured to use zstd, then we don't
> > need
> > > to
> > > > > check the fetched data for the compression type to know if we need
> > > > > down-conversion. If we did bump the Fetch API version, then we
> could
> > > > handle
> > > > > this case by either down-converting using a different compression
> > type
> > > or
> > > > > returning an error.
> > > > > 2. Similarly, if we have to down-convert anyway because the client
> > does
> > > > not
> > > > > understand the message format, then we could also use a different
> > > > > compression type.
> > > > >
> > > > > For the produce API, I think it's reasonable to bump the api
> version.
> > > > This
> > > > > can be used by clients to check whether a broker supports zstd. For
> > > > > example, we might support a list of preferred compression types in
> > the
> > > > > producer and we could use the broker to detect which version to
> use.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org>
> > > wrote:
> > > > >
> > > > > > Thanks for bumping this, Dongjin.  ZStd is a good compression
> codec
> > > > and I
> > > > > > hope we can get this support in soon!
> > > > > >
> > > > > > I would say we can just bump the API version to indicate that
> ZStd
> > > > > support
> > > > > > is expected in new clients.  We probably need some way of
> > indicating
> > > to
> > > > > the
> > > > > > older clients that they can't consume the partitions, as well.
> > > Perhaps
> > > > > we
> > > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > > >
> > > > > > The license thing seems straightforward -- it's just a matter of
> > > adding
> > > > > > the text to the right files as per ASF guidelines.
> > > > > >
> > > > > > With regard to the dictionary support, do we really need that in
> > the
> > > > > first
> > > > > > version?  Hopefully message batches are big enough that this
> isn't
> > > > > needed.
> > > > > > Some benchmarks might help here.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > > .
> > > > > > >
> > > > > > > For newcomers, here is a summary of the history: KIP-110 was originally
> > > > > > > created for the issue KAFKA-4514, but it lacked benchmark results to get
> > > > > > > the agreement of the community. Later, Ivan Babrou and some other users
> > > > > > > who adopted the patch provided their excellent performance reports, which
> > > > > > > are now included in the KIP, but it was postponed again because the
> > > > > > > community was busy with the 2.0.0 release. That is why I now reboot this
> > > > > > > issue.
> > > > > > >
> > > > > > > The following is the current status of the feature: You can
> check
> > > the
> > > > > > > current draft implementation here
> > > > > > > <https://github.com/apache/kafka/pull/2267>. It is based on
> zstd
> > > > 1.3.5
> > > > > > and
> > > > > > > periodically rebased onto the latest trunk[^1].
> > > > > > >
> > > > > > > The issues that should be addressed are as follows:
> > > > > > >
> > > > > > > *1. Backward Compatibility*
> > > > > > >
> > > > > > > To support old consumers, we need to take a strategy to handle
> > the
> > > > old
> > > > > > > consumers. Current candidates are:
> > > > > > >
> > > > > > > - Bump API version
> > > > > > > - Leave unchanged: let the old clients fail.
> > > > > > > - Improve the error messages:
> > > > > > >
> > > > > > > *2. Dictionary Support*
> > > > > > >
> > > > > > > To support zstd's dictionary feature in the future (if needed),
> > we
> > > > need
> > > > > > to
> > > > > > > sketch how it should be and leave some room for it. As of now,
> > > there
> > > > > has
> > > > > > > been no discussion on this topic yet.
> > > > > > >
> > > > > > > *3. License*
> > > > > > >
> > > > > > > To use this feature, we need to add license of zstd and
> zstd-jni
> > to
> > > > the
> > > > > > > project. (Thanks to Viktor Somogyi for raising this issue!) It
> > > seems
> > > > > like
> > > > > > > what Apache Spark did would be a good example but there has
> been
> > no
> > > > > > > discussion yet.
> > > > > > >
> > > > > > > You can find the details of the above issues in the KIP
> document.
> > > > > Please
> > > > > > > have a look when you are free, and give me feedback. All kinds
> of
> > > > > > > participating are welcome.
> > > > > > >
> > > > > > > Best,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > > >
> > > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Sorry for the late reply.
> > > > > > > >
> > > > > > > > In short, I could not submit the updated KIP by the feature
> > > freeze
> > > > > > > > deadline of 2.0.0. For this reason, it will not be included
> in
> > > the
> > > > > > 2.0.0
> > > > > > > > release and all discussion for this issue were postponed
> after
> > > the
> > > > > > release
> > > > > > > > of 2.0.0.
> > > > > > > >
> > > > > > > > I have been updating the PR following recent updates. Just
> > now, I
> > > > > > rebased
> > > > > > > > it against the latest trunk and updated the zstd version into
> > > > 1.3.5.
> > > > > > If you
> > > > > > > > need some request, don't hesitate to notify me. (But not this
> > > > thread
> > > > > -
> > > > > > just
> > > > > > > > send me the message directly.)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Dongjin
> > > > > > > >
> > > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <
> bobby@apache.org
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > >> I there any update on this.  The performance improvements
> are
> > > > quite
> > > > > > > >> impressive and I really would like to stop forking kafka
> just
> > to
> > > > get
> > > > > > this
> > > > > > > >> in.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Bobby
> > > > > > > >>
> > > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >>
> > > > > > > >> > Ismael,
> > > > > > > >> >
> > > > > > > >> > Oh, I forgot all of you are on working frenzy for 2.0! No
> > > > problem,
> > > > > > take
> > > > > > > >> > your time. I am also working at another issue now. Thank
> you
> > > for
> > > > > > > >> letting me
> > > > > > > >> > know.
> > > > > > > >> >
> > > > > > > >> > Best,
> > > > > > > >> > Dongjin
> > > > > > > >> >
> > > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> > ismaelj@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >> >
> > > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy finalising
> > > > 2.0.0.
> > > > > > This
> > > > > > > >> KIP
> > > > > > > >> > > seems like a great candidate for 2.1.0 and hopefully
> there
> > > > will
> > > > > be
> > > > > > > >> more
> > > > > > > >> > of
> > > > > > > >> > > a discussion next week. :)
> > > > > > > >> > >
> > > > > > > >> > > Ismael
> > > > > > > >> > >
> > > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> > dongjin@apache.org
> > > >
> > > > > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > > >> > > >
> > > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > > >> > > >
> > > > > > > >> > > > You can check out the implementation from here
> > > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> > > > > experience
> > > > > > any
> > > > > > > >> > > problem
> > > > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > > > >> > > >
> > > > > > > >> > > > Best,
> > > > > > > >> > > > Dongjin
> > > > > > > >> > > >
> > > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > > dongjin@apache.org
> > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Here is the short conclusion about the license
> > problem:
> > > > *We
> > > > > > can
> > > > > > > >> use
> > > > > > > >> > > zstd
> > > > > > > >> > > > > and zstd-jni without any problem, but we need to
> > include
> > > > > their
> > > > > > > >> > license,
> > > > > > > >> > > > > e.g., BSD license.*
> > > > > > > >> > > > >
> > > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License
> > requires
> > > > to
> > > > > > > >> include
> > > > > > > >> > the
> > > > > > > >> > > > > license used, and BSD 3 Clause License requires that
> > the
> > > > > name
> > > > > > of
> > > > > > > >> the
> > > > > > > >> > > > > contributor can't be used to endorse or promote the
> > > > product.
> > > > > > > >> That's
> > > > > > > >> > it
> > > > > > > >> > > > > <
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > > compatible-with-apache-2-0-license/
> > > > > > > >> > > > >
> > > > > > > >> > > > > - They are not listed in the list of prohibited
> > licenses
> > > > > > > >> > > > > <https://www.apache.org/legal/
> > resolved.html#category-x>
> > > > > also.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Here is how Spark did for it
> > > > > > > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112
> >:
> > > > > > > >> > > > >
> > > > > > > >> > > > > - They made a directory dedicated to the dependency
> > > > license
> > > > > > files
> > > > > > > >> > > > > <
> https://github.com/apache/spark/tree/master/licenses
> > >
> > > > and
> > > > > > added
> > > > > > > >> > > > licenses
> > > > > > > >> > > > > for Zstd
> > > > > > > >> > > > > <
> > > > > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > > > > LICENSE-zstd.txt
> > > > > > > >> > > >
> > > > > > > >> > > > &
> > > > > > > >> > > > > Zstd-jni
> > > > > > > >> > > > > <
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > > > LICENSE-zstd-jni.txt
> > > > > > > >> >
> > > > > > > >> > > > > .
> > > > > > > >> > > > > - Added a link to the original license files in
> > LICENSE.
> > > > > > > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > > > > > >> > > > >
> > > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > > >> > > > >
> > > > > > > >> > > > > Thanks for pointing out this problem, Viktor! Nice
> > > catch!
> > > > > > > >> > > > >
> > > > > > > >> > > > > Best,
> > > > > > > >> > > > > Dongjin
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > > > dongjin@apache.org>
> > > > > > > >> > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > >> I greatly appreciate your comprehensive reasoning.
> > so:
> > > +1
> > > > > > for b
> > > > > > > >> > until
> > > > > > > >> > > > now.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> For the license issues, I will have a check on how
> > the
> > > > over
> > > > > > > >> projects
> > > > > > > >> > > are
> > > > > > > >> > > > >> doing and share the results.
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> Best,
> > > > > > > >> > > > >> Dongjin
> > > > > > > >> > > > >>
> > > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > > >> > > > >> wrote:
> > > > > > > >> > > > >>
> > > > > > > >> > > > >>> Hi Dongjin,
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> A couple of comments:
> > > > > > > >> > > > >>> I would vote for option b. in the "backward
> > > > compatibility"
> > > > > > > >> section.
> > > > > > > >> > > My
> > > > > > > >> > > > >>> reasoning for this is that users upgrading to a
> zstd
> > > > > > compatible
> > > > > > > >> > > version
> > > > > > > >> > > > >>> won't start to use it automatically, so manual
> > > > > > reconfiguration
> > > > > > > >> is
> > > > > > > >> > > > >>> required.
> > > > > > > >> > > > >>> Therefore an upgrade won't mess up the cluster. If
> > not
> > > > all
> > > > > > the
> > > > > > > >> > > clients
> > > > > > > >> > > > >>> are
> > > > > > > >> > > > >>> upgraded but just some of them and they'd start to
> > use
> > > > > zstd
> > > > > > > >> then it
> > > > > > > >> > > > would
> > > > > > > >> > > > >>> cause errors in the cluster. I'd like to presume
> > > though
> > > > > that
> > > > > > > >> this
> > > > > > > >> > is
> > > > > > > >> > > a
> > > > > > > >> > > > >>> very
> > > > > > > >> > > > >>> obvious failure case and nobody should be
> surprised
> > if
> > > > it
> > > > > > didn't
> > > > > > > >> > > work.
> > > > > > > >> > > > >>> I wouldn't choose a. as I think we should bump the
> > > fetch
> > > > > and
> > > > > > > >> > produce
> > > > > > > >> > > > >>> requests if it's a change in the message format.
> > > > Moreover
> > > > > if
> > > > > > > >> some
> > > > > > > >> > of
> > > > > > > >> > > > the
> > > > > > > >> > > > >>> producers and the brokers are upgraded but some of
> > the
> > > > > > consumers
> > > > > > > >> > are
> > > > > > > >> > > > not,
> > > > > > > >> > > > >>> then we wouldn't prevent the error when the old
> > > consumer
> > > > > > tries
> > > > > > > >> to
> > > > > > > >> > > > consume
> > > > > > > >> > > > >>> the zstd compressed messages.
> > > > > > > >> > > > >>> I wouldn't choose c. either as I think binding the
> > > > > > compression
> > > > > > > >> type
> > > > > > > >> > > to
> > > > > > > >> > > > an
> > > > > > > >> > > > >>> API is not so obvious from the developer's
> > > perspective.
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> I would also prefer to use the existing binding,
> > > however
> > > > > we
> > > > > > must
> > > > > > > >> > > > respect
> > > > > > > >> > > > >>> the licenses:
> > > > > > > >> > > > >>> "The code for these JNI bindings is licenced under
> > > > > 2-clause
> > > > > > BSD
> > > > > > > >> > > > license.
> > > > > > > >> > > > >>> The native Zstd library is licensed under 3-clause
> > BSD
> > > > > > license
> > > > > > > >> and
> > > > > > > >> > > > GPL2"
> > > > > > > >> > > > >>> Based on the FAQ page
> > > > > > > >> > > > >>> https://www.apache.org/legal/
> > resolved.html#category-a
> > > > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but the
> > Apache
> > > > > > license
> > > > > > > >> is
> > > > > > > >> > not
> > > > > > > >> > > > >>> compatible with GPL2. I'm hoping that the
> "3-clause
> > > BSD
> > > > > > license
> > > > > > > >> and
> > > > > > > >> > > > GPL2"
> > > > > > > >> > > > >>> is really not an AND but an OR in this case, but
> I'm
> > > no
> > > > > > lawyer,
> > > > > > > >> > just
> > > > > > > >> > > > >>> wanted
> > > > > > > >> > > > >>> to make the point that we should watch out for
> > > licenses.
> > > > > :)
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> Regards,
> > > > > > > >> > > > >>> Viktor
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > > > > > ibobrik@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > > > >>>
> > > > > > > >> > > > >>> > Hello,
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > This is Ivan and I still very much support the
> > fact
> > > > that
> > > > > > zstd
> > > > > > > >> > > > >>> compression
> > > > > > > >> > > > >>> > should be included out of the box.
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > Please think about the environment, you can save
> > > > quite a
> > > > > > lot
> > > > > > > >> of
> > > > > > > >> > > > >>> hardware
> > > > > > > >> > > > >>> > with it.
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > Thank you.
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > > > > dongjin@apache.org>
> > > > > > > >> > > wrote:
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>> > > Since there are no responses for a week, I
> > decided
> > > > to
> > > > > > > >> > reinitiate
> > > > > > > >> > > > the
> > > > > > > >> > > > >>> > > discussion thread.
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> >
> > > > > > > >> > > > >>>
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > This KIP is about to introduce ZStandard
> > > Compression
> > > > > > into
> > > > > > > >> > Apache
> > > > > > > >> > > > >>> Kafka.
> > > > > > > >> > > > >>> > > The reason why it is posted again has a story:
> > It
> > > > was
> > > > > > > >> > originally
> > > > > > > >> > > > >>> posted
> > > > > > > >> > > > >>> > to
> > > > > > > >> > > > >>> > > the dev mailing list more than one year ago
> but
> > > > since
> > > > > it
> > > > > > > >> has no
> > > > > > > >> > > > >>> > performance
> > > > > > > >> > > > >>> > > report included, it was postponed later. But
> > Some
> > > > > people
> > > > > > > >> > > (including
> > > > > > > >> > > > >>> Ivan)
> > > > > > > >> > > > >>> > > reported excellent performance report with the
> > > draft
> > > > > PR,
> > > > > > > >> this
> > > > > > > >> > > work
> > > > > > > >> > > > >>> is now
> > > > > > > >> > > > >>> > > reactivated.
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > The updated KIP document includes some
> expected
> > > > > > problems and
> > > > > > > >> > > their
> > > > > > > >> > > > >>> > > candidate alternatives. Please have a look
> when
> > > you
> > > > > are
> > > > > > > >> free,
> > > > > > > >> > and
> > > > > > > >> > > > >>> give
> > > > > > > >> > > > >>> > me a
> > > > > > > >> > > > >>> > > feedback. All kinds of participating are
> > welcome.
> > > > > > > >> > > > >>> > >
> > > > > > > >> > > > >>> > > Best,
> > > > > > > >> > > > >>> > > Dongjin
> > > > > > > >> > > > >>> > >

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hi Dongjin,

Yes, that's a good summary. For clients which support v2, the client can
parse the message format and hopefully raise a useful error message
indicating the unsupported compression type. For older clients, our options
are probably (1) to down-convert to the old format using no compression
type, or (2) to return an error code. I'm leaning toward the latter as the
simpler solution, but the challenge is finding a good error code. Two
possibilities might be INVALID_REQUEST or CORRUPT_MESSAGE. The downside is
that old clients probably won't get a helpful message. However, at least
the behavior will be consistent in the sense that all clients will fail if
they do not support zstandard.

What do you think?

Thanks,
Jason
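
To make the two options concrete, here is a rough sketch of the kind of
broker-side check being discussed. The names (ZSTD_MIN_FETCH_VERSION,
validate) are hypothetical and this is not the actual KafkaApis code path,
just an illustration of option (2):

    // Hypothetical sketch of the broker-side decision, not actual Kafka code.
    // Assumes fetchVersion is the client's Fetch API version and
    // ZSTD_MIN_FETCH_VERSION is the first version that advertises zstd support.
    import org.apache.kafka.common.protocol.Errors;

    final class ZstdFetchCompat {
        static final short ZSTD_MIN_FETCH_VERSION = 10; // placeholder value

        static Errors validate(short fetchVersion, boolean topicUsesZstd) {
            if (topicUsesZstd && fetchVersion < ZSTD_MIN_FETCH_VERSION) {
                // Option (2): fail fast instead of down-converting to uncompressed.
                return Errors.CORRUPT_MESSAGE; // or INVALID_REQUEST / a new error code
            }
            return Errors.NONE;
        }
    }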

On Fri, Aug 17, 2018 at 8:08 AM, Dongjin Lee <do...@apache.org> wrote:

> Thanks Jason, I reviewed the down-converting logic following your
> explanation.[^1] You mean the following routines, right?
>
> - https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
> - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
> - https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
>
> It seems like your stance is as follows:
>
> 1. In principle, Kafka does not change the compression codec when
> down-converting, since it requires inspecting the fetched data, which is
> expensive.
> 2. However, there are some cases where the fetched data is inspected anyway. In
> this case, we can provide compression conversion from Zstandard to
> classical ones[^2].
>
> And from what I understand, the cases where the client without ZStandard
> support receives ZStandard compressed records can be organized into two
> cases:
>
> a. The 'compression.type' configuration of given topic is 'producer' and
> the producer compressed the records with ZStandard. (that is, using
> ZStandard implicitly.)
> b.  The 'compression.type' configuration of given topic is 'zstd'; that is,
> using ZStandard explicitly.
>
> As you stated, we don't have to handle case b specially. So, it seems
> like we can narrow the focus of the problem by joining case 1 and case b
> as follows:
>
> > Given the topic with 'producer' as its 'compression.type' configuration,
> ZStandard compressed records and old client without ZStandard, is there any
> case we need to inspect the records and can change the compression type? If
> so, can we provide compression type conversion?
>
> Do I understand correctly?
>
> Best,
> Dongjin
>
> [^1]: I'm sorry, I found that I had slightly misunderstood how the API
> version works, after reviewing the down-convert logic & the protocol
> documentation <https://kafka.apache.org/protocol>.
> [^2]: None, Gzip, Snappy, Lz4.
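
For reference, case b corresponds to pinning the codec on the topic itself,
e.g. with the admin client's plain alterConfigs call (a sketch; "zstd" only
becomes a valid value for compression.type once this KIP is merged):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetTopicZstd {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                // Case b: the topic is pinned to zstd, so the broker knows the codec
                // without inspecting batches. "producer" would instead be case a.
                Config config = new Config(Collections.singletonList(
                        new ConfigEntry("compression.type", "zstd")));
                admin.alterConfigs(Collections.singletonMap(topic, config)).all().get();
            }
        }
    }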
>
> On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > >
> > > But in my opinion, since the client will fail with the API version, so
> we
> > > don't need to down-convert the messages anyway. Isn't it? So, I think
> we
> > > don't care about this case. (I'm sorry, I am not familiar with
> > down-convert
> > > logic.)
> >
> >
> > Currently the broker down-converts automatically when it receives an old
> > version of the fetch request (a version which is known to predate the
> > message format in use). Typically when down-converting the message
> format,
> > we use the same compression type, but there is not much point in doing so
> > when we know the client doesn't support it. So if zstandard is in use,
> and
> > we have to down-convert anyway, then we can choose to use a different
> > compression type or no compression type.
> >
> > From my perspective, there is no significant downside to bumping the
> > protocol version and it has several potential benefits. Version bumps are
> > cheap. The main question mark in my mind is about down-conversion.
> Figuring
> > out whether down-conversion is needed is hard generally without
> inspecting
> > the fetched data, which is expensive. I think we agree in principle that
> we
> > do not want to have to pay this cost generally and prefer the clients to
> > fail when they see an unhandled compression type. The point I was making
> is
> > that there are some cases where we are either inspecting the data anyway
> > (because we have to down-convert the message format), or we have an easy
> > way to tell whether zstandard is in use (the topic has it configured
> > explicitly). In the latter case, we don't have to handle it specially.
> But
> > we do have to decide how we will handle down-conversion to older formats.
> >
> > -Jason
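
As a toy illustration of what "use a different compression type" means,
decompressing a zstd payload and re-compressing it with gzip looks roughly
like this outside of Kafka's record classes (a sketch using the zstd-jni
binding the draft PR depends on, not the broker's down-conversion code):

    import com.github.luben.zstd.ZstdInputStream;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.GZIPOutputStream;

    final class Transcode {
        // Decompress a zstd-framed payload and re-compress it with gzip, which is
        // roughly what "down-convert to a codec the old client understands" means.
        static byte[] zstdToGzip(byte[] zstdPayload) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = new ZstdInputStream(new ByteArrayInputStream(zstdPayload));
                 GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    gzip.write(buf, 0, n);
                }
            }
            return out.toByteArray();
        }
    }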
> >
> > On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org> wrote:
> >
> > > Colin and Jason,
> > >
> > > Thanks for your opinions. In summary, the pros and cons of bumping the
> > > fetch API version are:
> > >
> > > Cons:
> > >
> > > - The Broker can't know whether a given message batch is compressed
> with
> > > zstd or not.
> > > - Need some additional logic for the topic explicitly configured to use
> > > zstd.
> > >
> > > Pros:
> > >
> > > - The broker doesn't need to conduct expensive down-conversion.
> > > - It can signal users to update their clients.
> > >
> > > So, opinions on the backward-compatibility policy so far:
> > >
> > > - A: bump the API version - +2 (Colin, Jason)
> > > - B: leave unchanged - +1 (Viktor)
> > >
> > > Here are my additional comments:
> > >
> > > @Colin
> > >
> > > I greatly appreciate your response. In the case of the dictionary support,
> > > of course, this issue should be addressed later, so we don't need it in the
> > > first version. You are right - it is not too late to try it after some
> > > benchmarks. What I mean is, we should keep that potential feature in mind.
> > >
> > > @Jason
> > >
> > > You wrote,
> > >
> > > > Similarly, if we have to down-convert anyway because the client does
> > not
> > > understand the message format, then we could also use a different
> > > compression type.
> > >
> > > But in my opinion, since the client will fail on the API version check, we
> > > don't need to down-convert the messages anyway, do we? So, I think we
> > > don't need to care about this case. (I'm sorry, I am not familiar with the
> > > down-convert logic.)
> > >
> > > Please give more opinions. Thanks!
> > >
> > > - Dongjin
> > >
> > >
> > > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <ja...@confluent.io>
> > wrote:
> > >
> > > > Hey Colin,
> > > >
> > > > The problem for the fetch API is that the broker does not generally
> > know
> > > if
> > > > a batch was compressed with zstd unless it parses it. I think the
> goal
> > > here
> > > > is to avoid the expensive down-conversion that is needed to ensure
> > > > compatibility because it is only necessary if zstd is actually in
> use.
> > > But
> > > > as long as old clients can parse the message format, they should get
> a
> > > > reasonable error if they see an unsupported compression type in the
> > > > attributes. Basically the onus is on users to ensure that their
> > consumers
> > > > have been updated prior to using zstd. It seems like a reasonable
> > > tradeoff
> > > > to me. There are a couple cases that might be worth thinking through:
> > > >
> > > > 1. If a topic is explicitly configured to use zstd, then we don't
> need
> > to
> > > > check the fetched data for the compression type to know if we need
> > > > down-conversion. If we did bump the Fetch API version, then we could
> > > handle
> > > > this case by either down-converting using a different compression
> type
> > or
> > > > returning an error.
> > > > 2. Similarly, if we have to down-convert anyway because the client
> does
> > > not
> > > > understand the message format, then we could also use a different
> > > > compression type.
> > > >
> > > > For the produce API, I think it's reasonable to bump the api version.
> > > This
> > > > can be used by clients to check whether a broker supports zstd. For
> > > > example, we might support a list of preferred compression types in
> the
> > > > producer and we could use the broker to detect which version to use.
> > > >
> > > > -Jason
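
A sketch of that preferred-list idea, with a purely hypothetical helper just
to make the negotiation concrete (the version constant is a placeholder, not
the number the KIP will actually use):

    import java.util.List;

    final class CompressionNegotiation {
        // Hypothetical helper, not an existing Kafka API: pick the first codec
        // from the producer's preference list that the broker's advertised max
        // ProduceRequest version can accept.
        static final short ZSTD_MIN_PRODUCE_VERSION = 7; // placeholder

        static String pick(List<String> preferred, short brokerMaxProduceVersion) {
            for (String codec : preferred) {
                if ("zstd".equals(codec) && brokerMaxProduceVersion < ZSTD_MIN_PRODUCE_VERSION) {
                    continue; // broker too old for zstd, fall back to the next choice
                }
                return codec;
            }
            return "none";
        }
        // Example: pick(Arrays.asList("zstd", "lz4"), (short) 5) returns "lz4".
    }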
> > > >
> > > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org>
> > wrote:
> > > >
> > > > > Thanks for bumping this, Dongjin.  ZStd is a good compression codec
> > > and I
> > > > > hope we can get this support in soon!
> > > > >
> > > > > I would say we can just bump the API version to indicate that ZStd
> > > > support
> > > > > is expected in new clients.  We probably need some way of
> indicating
> > to
> > > > the
> > > > > older clients that they can't consume the partitions, as well.
> > Perhaps
> > > > we
> > > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > > >
> > > > > The license thing seems straightforward -- it's just a matter of
> > adding
> > > > > the text to the right files as per ASF guidelines.
> > > > >
> > > > > With regard to the dictionary support, do we really need that in
> the
> > > > first
> > > > > version?  Hopefully message batches are big enough that this isn't
> > > > needed.
> > > > > Some benchmarks might help here.
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
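
On the benchmark point, even a quick size comparison is easy to run locally
against the zstd-jni binding used by the draft PR (a sketch with a synthetic
payload; real record batches will of course behave differently):

    import com.github.luben.zstd.Zstd;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public class QuickSizeCheck {
        public static void main(String[] args) throws IOException {
            // Synthetic, repetitive payload standing in for a record batch.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append("{\"user\":").append(i % 100).append(",\"event\":\"click\"}\n");
            }
            byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

            byte[] zstd = Zstd.compress(raw, 3); // level 3 is the usual default

            ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
                gz.write(raw);
            }

            System.out.printf("raw=%d zstd=%d gzip=%d%n",
                    raw.length, zstd.length, gzBytes.size());
        }
    }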
> > > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > > .
> > > > > >
> > > > > > For newcomers, here is a summary of the history: KIP-110 was originally
> > > > > > created for the issue KAFKA-4514, but it lacked benchmark results to get
> > > > > > the agreement of the community. Later, Ivan Babrou and some other users
> > > > > > who adopted the patch provided their excellent performance reports, which
> > > > > > are now included in the KIP, but it was postponed again because the
> > > > > > community was busy with the 2.0.0 release. That is why I now reboot this
> > > > > > issue.
> > > > > >
> > > > > > The following is the current status of the feature: You can check
> > the
> > > > > > current draft implementation here
> > > > > > <https://github.com/apache/kafka/pull/2267>. It is based on zstd
> > > 1.3.5
> > > > > and
> > > > > > periodically rebased onto the latest trunk[^1].
> > > > > >
> > > > > > The issues that should be addressed are as follows:
> > > > > >
> > > > > > *1. Backward Compatibility*
> > > > > >
> > > > > > To support old consumers, we need to take a strategy to handle
> the
> > > old
> > > > > > consumers. Current candidates are:
> > > > > >
> > > > > > - Bump API version
> > > > > > - Leave unchanged: let the old clients fail.
> > > > > > - Improve the error messages:
> > > > > >
> > > > > > *2. Dictionary Support*
> > > > > >
> > > > > > To support zstd's dictionary feature in the future (if needed),
> we
> > > need
> > > > > to
> > > > > > sketch how it should be and leave some room for it. As of now,
> > there
> > > > has
> > > > > > been no discussion on this topic yet.
> > > > > >
> > > > > > *3. License*
> > > > > >
> > > > > > To use this feature, we need to add license of zstd and zstd-jni
> to
> > > the
> > > > > > project. (Thanks to Viktor Somogyi for raising this issue!) It
> > seems
> > > > like
> > > > > > what Apache Spark did would be a good example but there has been
> no
> > > > > > discussion yet.
> > > > > >
> > > > > > You can find the details of the above issues in the KIP document.
> > > > Please
> > > > > > have a look when you are free, and give me feedback. All kinds of
> > > > > > participating are welcome.
> > > > > >
> > > > > > Best,
> > > > > > Dongjin
> > > > > >
> > > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > > >
> > > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <dongjin@apache.org
> >
> > > > wrote:
> > > > > >
> > > > > > > Sorry for the late reply.
> > > > > > >
> > > > > > > In short, I could not submit the updated KIP by the feature
> > freeze
> > > > > > > deadline of 2.0.0. For this reason, it will not be included in
> > the
> > > > > 2.0.0
> > > > > > > release and all discussion for this issue were postponed after
> > the
> > > > > release
> > > > > > > of 2.0.0.
> > > > > > >
> > > > > > > I have been updating the PR following recent updates. Just
> now, I
> > > > > rebased
> > > > > > > it against the latest trunk and updated the zstd version into
> > > 1.3.5.
> > > > > If you
> > > > > > > need some request, don't hesitate to notify me. (But not this
> > > thread
> > > > -
> > > > > just
> > > > > > > send me the message directly.)
> > > > > > >
> > > > > > > Best,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bobby@apache.org
> >
> > > > wrote:
> > > > > > >
> > > > > > >> Is there any update on this?  The performance improvements are
> > > quite
> > > > > > >> impressive and I really would like to stop forking kafka just
> to
> > > get
> > > > > this
> > > > > > >> in.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Bobby
> > > > > > >>
> > > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > > >>
> > > > > > >> > Ismael,
> > > > > > >> >
> > > > > > >> > Oh, I forgot all of you are on working frenzy for 2.0! No
> > > problem,
> > > > > take
> > > > > > >> > your time. I am also working at another issue now. Thank you
> > for
> > > > > > >> letting me
> > > > > > >> > know.
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Dongjin
> > > > > > >> >
> > > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <
> ismaelj@gmail.com
> > >
> > > > > wrote:
> > > > > > >> >
> > > > > > >> > > Sorry for the delay Dongjin. Everyone is busy finalising
> > > 2.0.0.
> > > > > This
> > > > > > >> KIP
> > > > > > >> > > seems like a great candidate for 2.1.0 and hopefully there
> > > will
> > > > be
> > > > > > >> more
> > > > > > >> > of
> > > > > > >> > > a discussion next week. :)
> > > > > > >> > >
> > > > > > >> > > Ismael
> > > > > > >> > >
> > > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <
> dongjin@apache.org
> > >
> > > > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > > >> > > >
> > > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > > >> > > >
> > > > > > >> > > > You can check out the implementation from here
> > > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> > > > experience
> > > > > any
> > > > > > >> > > problem
> > > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > Dongjin
> > > > > > >> > > >
> > > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > > dongjin@apache.org
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Here is the short conclusion about the license
> problem:
> > > *We
> > > > > can
> > > > > > >> use
> > > > > > >> > > zstd
> > > > > > >> > > > > and zstd-jni without any problem, but we need to
> include
> > > > their
> > > > > > >> > license,
> > > > > > >> > > > > e.g., BSD license.*
> > > > > > >> > > > >
> > > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License
> requires
> > > to
> > > > > > >> include
> > > > > > >> > the
> > > > > > >> > > > > license used, and BSD 3 Clause License requires that
> the
> > > > name
> > > > > of
> > > > > > >> the
> > > > > > >> > > > > contributor can't be used to endorse or promote the
> > > product.
> > > > > > >> That's
> > > > > > >> > it
> > > > > > >> > > > > <
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > > compatible-with-apache-2-0-license/
> > > > > > >> > > > >
> > > > > > >> > > > > - They are not listed in the list of prohibited
> licenses
> > > > > > >> > > > > <https://www.apache.org/legal/
> resolved.html#category-x>
> > > > also.
> > > > > > >> > > > >
> > > > > > >> > > > > Here is how Spark did for it
> > > > > > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > > > > >> > > > >
> > > > > > >> > > > > - They made a directory dedicated to the dependency
> > > license
> > > > > files
> > > > > > >> > > > > <https://github.com/apache/spark/tree/master/licenses
> >
> > > and
> > > > > added
> > > > > > >> > > > licenses
> > > > > > >> > > > > for Zstd
> > > > > > >> > > > > <
> > > > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > > > LICENSE-zstd.txt
> > > > > > >> > > >
> > > > > > >> > > > &
> > > > > > >> > > > > Zstd-jni
> > > > > > >> > > > > <
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > > LICENSE-zstd-jni.txt
> > > > > > >> >
> > > > > > >> > > > > .
> > > > > > >> > > > > - Added a link to the original license files in
> LICENSE.
> > > > > > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > > > > >> > > > >
> > > > > > >> > > > > If needed, I can make a similar update.
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks for pointing out this problem, Viktor! Nice
> > catch!
> > > > > > >> > > > >
> > > > > > >> > > > > Best,
> > > > > > >> > > > > Dongjin
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > > dongjin@apache.org>
> > > > > > >> > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >> I greatly appreciate your comprehensive reasoning.
> so:
> > +1
> > > > > for b
> > > > > > >> > until
> > > > > > >> > > > now.
> > > > > > >> > > > >>
> > > > > > >> > > > >> For the license issues, I will have a check on how
> the
> > > over
> > > > > > >> projects
> > > > > > >> > > are
> > > > > > >> > > > >> doing and share the results.
> > > > > > >> > > > >>
> > > > > > >> > > > >> Best,
> > > > > > >> > > > >> Dongjin
> > > > > > >> > > > >>
> > > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > > >> > > > >> wrote:
> > > > > > >> > > > >>
> > > > > > >> > > > >>> Hi Dongjin,
> > > > > > >> > > > >>>
> > > > > > >> > > > >>> A couple of comments:
> > > > > > >> > > > >>> I would vote for option b. in the "backward
> > > compatibility"
> > > > > > >> section.
> > > > > > >> > > My
> > > > > > >> > > > >>> reasoning for this is that users upgrading to a zstd
> > > > > compatible
> > > > > > >> > > version
> > > > > > >> > > > >>> won't start to use it automatically, so manual
> > > > > reconfiguration
> > > > > > >> is
> > > > > > >> > > > >>> required.
> > > > > > >> > > > >>> Therefore an upgrade won't mess up the cluster. If
> not
> > > all
> > > > > the
> > > > > > >> > > clients

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Thanks Jason. I reviewed the down-conversion logic following your
explanation.[^1] You mean the following routines, right?

-
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaApis.scala#L534
-
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/LazyDownConversionRecords.java#L165
-
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/record/RecordsUtil.java#L40
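
For reference, the reason inspecting a batch reveals its codec is that each
record batch stores the compression codec in the low three bits of its
attributes field. A minimal, self-contained sketch of that decoding (this is
not Kafka's actual code, and the id 4 for zstd is only the value this KIP
would assign):

public class CompressionAttributes {
    // The low 3 bits of the record batch attributes identify the codec.
    private static final int CODEC_MASK = 0x07;

    // Ids 0-3 are the existing codecs; 4 is the id proposed for zstd.
    static String codecName(short attributes) {
        switch (attributes & CODEC_MASK) {
            case 0: return "none";
            case 1: return "gzip";
            case 2: return "snappy";
            case 3: return "lz4";
            case 4: return "zstd"; // proposed by KIP-110
            default:
                throw new IllegalArgumentException(
                    "Unknown compression codec id: " + (attributes & CODEC_MASK));
        }
    }

    public static void main(String[] args) {
        System.out.println(codecName((short) 0x04)); // prints "zstd"
    }
}

An old client hitting the default branch here is essentially the "let the old
clients fail" option from the backward-compatibility discussion.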

It seems like your stance is as follows:

1. In principle, Kafka does not change the compression codec when
down-converting, since that requires inspecting the fetched data, which is
expensive.
2. However, there are some cases where the fetched data is inspected anyway.
In those cases, we can convert the compression from ZStandard to one of the
classical codecs[^2].
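
To make point 2 concrete, here is a minimal sketch of such a codec swap,
using hypothetical names (it only illustrates falling back from ZStandard to
a classical codec, or to no compression, while a batch is already being
rewritten; it is not the behavior prescribed by the KIP):

public class DownConversionCodecChoice {
    enum Codec { NONE, GZIP, SNAPPY, LZ4, ZSTD }

    // If the batch is being rewritten for an old client anyway, keep the
    // original codec when possible; otherwise fall back to a codec that
    // pre-KIP-110 clients understand (NONE here, but GZIP or LZ4 would do).
    static Codec targetCodec(Codec source, boolean clientSupportsZstd) {
        if (source != Codec.ZSTD || clientSupportsZstd) {
            return source;
        }
        return Codec.NONE;
    }

    public static void main(String[] args) {
        System.out.println(targetCodec(Codec.ZSTD, false)); // NONE
        System.out.println(targetCodec(Codec.LZ4, false));  // LZ4
    }
}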

And from what I understand, there are two cases in which a client without
ZStandard support can receive ZStandard-compressed records:

a. The 'compression.type' configuration of the given topic is 'producer' and
the producer compressed the records with ZStandard (that is, ZStandard is
used implicitly).
b. The 'compression.type' configuration of the given topic is 'zstd'; that
is, ZStandard is used explicitly.
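
To make the two cases concrete, this is roughly how each would be configured
once a 'zstd' value exists (note that 'zstd' is only the value this KIP
proposes; released brokers and clients do not accept it yet):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ZstdConfigExamples {
    public static void main(String[] args) throws Exception {
        // Case a: the topic keeps the default compression.type=producer and
        // the producer itself compresses with zstd.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("compression.type", "zstd"); // value proposed by KIP-110
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("logs", "key", "value"));
        }

        // Case b: the topic itself is created with compression.type=zstd, so
        // the broker recompresses whatever codec the producer used.
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            NewTopic topic = new NewTopic("logs-zstd", 3, (short) 3)
                .configs(Collections.singletonMap("compression.type", "zstd"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}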

As you stated, we don't have to handle case b specially. So it seems like we
can narrow the focus of the problem by combining point 2 and case a, as
follows:

> Given a topic with 'producer' as its 'compression.type' configuration,
ZStandard-compressed records, and an old client without ZStandard support,
is there any case where we need to inspect the records and can change the
compression type? If so, can we provide compression type conversion?

Do I understand correctly?

Best,
Dongjin

[^1]: I'm sorry - after reviewing the down-conversion logic & the protocol
documentation <https://kafka.apache.org/protocol>, I found that I had
slightly misunderstood how API versions work.
[^2]: None, Gzip, Snappy, Lz4.

On Tue, Aug 14, 2018 at 2:16 AM Jason Gustafson <ja...@confluent.io> wrote:

> >
> > But in my opinion, since the client will fail with the API version, so we
> > don't need to down-convert the messages anyway. Isn't it? So, I think we
> > don't care about this case. (I'm sorry, I am not familiar with
> down-convert
> > logic.)
>
>
> Currently the broker down-converts automatically when it receives an old
> version of the fetch request (a version which is known to predate the
> message format in use). Typically when down-converting the message format,
> we use the same compression type, but there is not much point in doing so
> when we know the client doesn't support it. So if zstandard is in use, and
> we have to down-convert anyway, then we can choose to use a different
> compression type or no compression type.
>
> From my perspective, there is no significant downside to bumping the
> protocol version and it has several potential benefits. Version bumps are
> cheap. The main question mark in my mind is about down-conversion. Figuring
> out whether down-conversion is needed is hard generally without inspecting
> the fetched data, which is expensive. I think we agree in principle that we
> do not want to have to pay this cost generally and prefer the clients to
> fail when they see an unhandled compression type. The point I was making is
> that there are some cases where we are either inspecting the data anyway
> (because we have to down-convert the message format), or we have an easy
> way to tell whether zstandard is in use (the topic has it configured
> explicitly). In the latter case, we don't have to handle it specially. But
> we do have to decide how we will handle down-conversion to older formats.
>
> -Jason
>
> On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org> wrote:
>
> > Colin and Jason,
> >
> > Thanks for your opinions. In summarizing, the Pros and Cons of bumping
> > fetch API version are:
> >
> > Cons:
> >
> > - The Broker can't know whether a given message batch is compressed with
> > zstd or not.
> > - Need some additional logic for the topic explicitly configured to use
> > zstd.
> >
> > Pros:
> >
> > - The broker doesn't need to conduct expensive down-conversion.
> > - Can message the users to update their client.
> >
> > So, opinions for the backward-compatibility policy by far:
> >
> > - A: bump the API version - +2 (Colin, Jason)
> > - B: leave unchanged - +1 (Viktor)
> >
> > Here are my additional comments:
> >
> > @Colin
> >
> > I greatly appreciate your response. In the case of the dictionary
> support,
> > of course, this issue should be addressed later so we don't need it in
> the
> > first version. You are right - it is not late to try it after some
> > benchmarks. What I mean is, we should keep in mind on that potential
> > feature.
> >
> > @Jason
> >
> > You wrote,
> >
> > > Similarly, if we have to down-convert anyway because the client does
> not
> > understand the message format, then we could also use a different
> > compression type.
> >
> > But in my opinion, since the client will fail with the API version, so we
> > don't need to down-convert the messages anyway. Isn't it? So, I think we
> > don't care about this case. (I'm sorry, I am not familiar with
> down-convert
> > logic.)
> >
> > Please give more opinions. Thanks!
> >
> > - Dongjin
> >
> >
> > On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <ja...@confluent.io>
> wrote:
> >
> > > Hey Colin,
> > >
> > > The problem for the fetch API is that the broker does not generally
> know
> > if
> > > a batch was compressed with zstd unless it parses it. I think the goal
> > here
> > > is to avoid the expensive down-conversion that is needed to ensure
> > > compatibility because it is only necessary if zstd is actually in use.
> > But
> > > as long as old clients can parse the message format, they should get a
> > > reasonable error if they see an unsupported compression type in the
> > > attributes. Basically the onus is on users to ensure that their
> consumers
> > > have been updated prior to using zstd. It seems like a reasonable
> > tradeoff
> > > to me. There are a couple cases that might be worth thinking through:
> > >
> > > 1. If a topic is explicitly configured to use zstd, then we don't need
> to
> > > check the fetched data for the compression type to know if we need
> > > down-conversion. If we did bump the Fetch API version, then we could
> > handle
> > > this case by either down-converting using a different compression type
> or
> > > returning an error.
> > > 2. Similarly, if we have to down-convert anyway because the client does
> > not
> > > understand the message format, then we could also use a different
> > > compression type.
> > >
> > > For the produce API, I think it's reasonable to bump the api version.
> > This
> > > can be used by clients to check whether a broker supports zstd. For
> > > example, we might support a list of preferred compression types in the
> > > producer and we could use the broker to detect which version to use.
> > >
> > > -Jason
> > >
> > > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org>
> wrote:
> > >
> > > > Thanks for bumping this, Dongjin.  ZStd is a good compression codec
> > and I
> > > > hope we can get this support in soon!
> > > >
> > > > I would say we can just bump the API version to indicate that ZStd
> > > support
> > > > is expected in new clients.  We probably need some way of indicating
> to
> > > the
> > > > older clients that they can't consume the partitions, as well.
> Perhaps
> > > we
> > > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > > >
> > > > The license thing seems straightforward -- it's just a matter of
> adding
> > > > the text to the right files as per ASF guidelines.
> > > >
> > > > With regard to the dictionary support, do we really need that in the
> > > first
> > > > version?  Hopefully message batches are big enough that this isn't
> > > needed.
> > > > Some benchmarks might help here.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > > .
> > > > >
> > > > > For newcomers, Here is some summary of the history: KIP-110 was
> > > > originally
> > > > > worked for the issue KAFKA-4514 but, it lacked benchmark results to
> > get
> > > > the
> > > > > agreement of the community. Later, Ivan Babrou and some other users
> > who
> > > > > adopted the patch provided their excellent performance report which
> > is
> > > > now
> > > > > included in the KIP, but it postponed again because of the
> community
> > > was
> > > > > busy for 2.0.0 release. It is why I now reboot this issue.
> > > > >
> > > > > The following is the current status of the feature: You can check
> the
> > > > > current draft implementation here
> > > > > <https://github.com/apache/kafka/pull/2267>. It is based on zstd
> > 1.3.5
> > > > and
> > > > > periodically rebased onto the latest trunk[^1].
> > > > >
> > > > > The issues that should be addressed is like following:
> > > > >
> > > > > *1. Backward Compatibility*
> > > > >
> > > > > To support old consumers, we need to take a strategy to handle the
> > old
> > > > > consumers. Current candidates are:
> > > > >
> > > > > - Bump API version
> > > > > - Leave unchanged: let the old clients fail.
> > > > > - Improve the error messages:
> > > > >
> > > > > *2. Dictionary Support*
> > > > >
> > > > > To support zstd's dictionary feature in the future (if needed), we
> > need
> > > > to
> > > > > sketch how it should be and leave some room for it. As of now,
> there
> > > has
> > > > > been no discussion on this topic yet.
> > > > >
> > > > > *3. License*
> > > > >
> > > > > To use this feature, we need to add license of zstd and zstd-jni to
> > the
> > > > > project. (Thanks to Viktor Somogyi for raising this issue!) It
> seems
> > > like
> > > > > what Apache Spark did would be a good example but there has been no
> > > > > discussion yet.
> > > > >
> > > > > You can find the details of the above issues in the KIP document.
> > > Please
> > > > > have a look when you are free, and give me feedback. All kinds of
> > > > > participating are welcome.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > > >
> > > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > Sorry for the late reply.
> > > > > >
> > > > > > In short, I could not submit the updated KIP by the feature
> freeze
> > > > > > deadline of 2.0.0. For this reason, it will not be included in
> the
> > > > 2.0.0
> > > > > > release and all discussion for this issue were postponed after
> the
> > > > release
> > > > > > of 2.0.0.
> > > > > >
> > > > > > I have been updating the PR following recent updates. Just now, I
> > > > rebased
> > > > > > it against the latest trunk and updated the zstd version into
> > 1.3.5.
> > > > If you
> > > > > > need some request, don't hesitate to notify me. (But not this
> > thread
> > > -
> > > > just
> > > > > > send me the message directly.)
> > > > > >
> > > > > > Best,
> > > > > > Dongjin
> > > > > >
> > > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org>
> > > wrote:
> > > > > >
> > > > > >> I there any update on this.  The performance improvements are
> > quite
> > > > > >> impressive and I really would like to stop forking kafka just to
> > get
> > > > this
> > > > > >> in.
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Bobby
> > > > > >>
> > > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <dongjin@apache.org
> >
> > > > wrote:
> > > > > >>
> > > > > >> > Ismael,
> > > > > >> >
> > > > > >> > Oh, I forgot all of you are on working frenzy for 2.0! No
> > problem,
> > > > take
> > > > > >> > your time. I am also working at another issue now. Thank you
> for
> > > > > >> letting me
> > > > > >> > know.
> > > > > >> >
> > > > > >> > Best,
> > > > > >> > Dongjin
> > > > > >> >
> > > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <ismaelj@gmail.com
> >
> > > > wrote:
> > > > > >> >
> > > > > >> > > Sorry for the delay Dongjin. Everyone is busy finalising
> > 2.0.0.
> > > > This
> > > > > >> KIP
> > > > > >> > > seems like a great candidate for 2.1.0 and hopefully there
> > will
> > > be
> > > > > >> more
> > > > > >> > of
> > > > > >> > > a discussion next week. :)
> > > > > >> > >
> > > > > >> > > Ismael
> > > > > >> > >
> > > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <dongjin@apache.org
> >
> > > > wrote:
> > > > > >> > >
> > > > > >> > > > Hello. I just updated my draft implementation:
> > > > > >> > > >
> > > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > > >> > > > 2. Apply ZStd 1.3.4
> > > > > >> > > >
> > > > > >> > > > You can check out the implementation from here
> > > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> > > experience
> > > > any
> > > > > >> > > problem
> > > > > >> > > > running it, don't hesitate to give me a mention.
> > > > > >> > > >
> > > > > >> > > > Best,
> > > > > >> > > > Dongjin
> > > > > >> > > >
> > > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > Here is the short conclusion about the license problem:
> > *We
> > > > can
> > > > > >> use
> > > > > >> > > zstd
> > > > > >> > > > > and zstd-jni without any problem, but we need to include
> > > their
> > > > > >> > license,
> > > > > >> > > > > e.g., BSD license.*
> > > > > >> > > > >
> > > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License requires
> > to
> > > > > >> include
> > > > > >> > the
> > > > > >> > > > > license used, and BSD 3 Clause License requires that the
> > > name
> > > > of
> > > > > >> the
> > > > > >> > > > > contributor can't be used to endorse or promote the
> > product.
> > > > > >> That's
> > > > > >> > it
> > > > > >> > > > > <
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > > compatible-with-apache-2-0-license/
> > > > > >> > > > >
> > > > > >> > > > > - They are not listed in the list of prohibited licenses
> > > > > >> > > > > <https://www.apache.org/legal/resolved.html#category-x>
> > > also.
> > > > > >> > > > >
> > > > > >> > > > > Here is how Spark did for it
> > > > > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > > > >> > > > >
> > > > > >> > > > > - They made a directory dedicated to the dependency
> > license
> > > > files
> > > > > >> > > > > <https://github.com/apache/spark/tree/master/licenses>
> > and
> > > > added
> > > > > >> > > > licenses
> > > > > >> > > > > for Zstd
> > > > > >> > > > > <
> > > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > > LICENSE-zstd.txt
> > > > > >> > > >
> > > > > >> > > > &
> > > > > >> > > > > Zstd-jni
> > > > > >> > > > > <
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > > LICENSE-zstd-jni.txt
> > > > > >> >
> > > > > >> > > > > .
> > > > > >> > > > > - Added a link to the original license files in LICENSE.
> > > > > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > > > >> > > > >
> > > > > >> > > > > If needed, I can make a similar update.
> > > > > >> > > > >
> > > > > >> > > > > Thanks for pointing out this problem, Viktor! Nice
> catch!
> > > > > >> > > > >
> > > > > >> > > > > Best,
> > > > > >> > > > > Dongjin
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > > dongjin@apache.org>
> > > > > >> > > wrote:
> > > > > >> > > > >
> > > > > >> > > > >> I greatly appreciate your comprehensive reasoning. so:
> +1
> > > > for b
> > > > > >> > until
> > > > > >> > > > now.
> > > > > >> > > > >>
> > > > > >> > > > >> For the license issues, I will have a check on how the
> > over
> > > > > >> projects
> > > > > >> > > are
> > > > > >> > > > >> doing and share the results.
> > > > > >> > > > >>
> > > > > >> > > > >> Best,
> > > > > >> > > > >> Dongjin
> > > > > >> > > > >>
> > > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > > >> > > > viktorsomogyi@gmail.com>
> > > > > >> > > > >> wrote:
> > > > > >> > > > >>
> > > > > >> > > > >>> Hi Dongjin,
> > > > > >> > > > >>>
> > > > > >> > > > >>> A couple of comments:
> > > > > >> > > > >>> I would vote for option b. in the "backward
> > compatibility"
> > > > > >> section.
> > > > > >> > > My
> > > > > >> > > > >>> reasoning for this is that users upgrading to a zstd
> > > > compatible
> > > > > >> > > version
> > > > > >> > > > >>> won't start to use it automatically, so manual
> > > > reconfiguration
> > > > > >> is
> > > > > >> > > > >>> required.
> > > > > >> > > > >>> Therefore an upgrade won't mess up the cluster. If not
> > all
> > > > the
> > > > > >> > > clients
> > > > > >> > > > >>> are
> > > > > >> > > > >>> upgraded but just some of them and they'd start to use
> > > zstd
> > > > > >> then it
> > > > > >> > > > would
> > > > > >> > > > >>> cause errors in the cluster. I'd like to presume
> though
> > > that
> > > > > >> this
> > > > > >> > is
> > > > > >> > > a
> > > > > >> > > > >>> very
> > > > > >> > > > >>> obvious failure case and nobody should be surprised if
> > it
> > > > didn't
> > > > > >> > > work.
> > > > > >> > > > >>> I wouldn't choose a. as I think we should bump the
> fetch
> > > and
> > > > > >> > produce
> > > > > >> > > > >>> requests if it's a change in the message format.
> > Moreover
> > > if
> > > > > >> some
> > > > > >> > of
> > > > > >> > > > the
> > > > > >> > > > >>> producers and the brokers are upgraded but some of the
> > > > consumers
> > > > > >> > are
> > > > > >> > > > not,
> > > > > >> > > > >>> then we wouldn't prevent the error when the old
> consumer
> > > > tries
> > > > > >> to
> > > > > >> > > > consume
> > > > > >> > > > >>> the zstd compressed messages.
> > > > > >> > > > >>> I wouldn't choose c. either as I think binding the
> > > > compression
> > > > > >> type
> > > > > >> > > to
> > > > > >> > > > an
> > > > > >> > > > >>> API is not so obvious from the developer's
> perspective.
> > > > > >> > > > >>>
> > > > > >> > > > >>> I would also prefer to use the existing binding,
> however
> > > we
> > > > must
> > > > > >> > > > respect
> > > > > >> > > > >>> the licenses:
> > > > > >> > > > >>> "The code for these JNI bindings is licenced under
> > > 2-clause
> > > > BSD
> > > > > >> > > > license.
> > > > > >> > > > >>> The native Zstd library is licensed under 3-clause BSD
> > > > license
> > > > > >> and
> > > > > >> > > > GPL2"
> > > > > >> > > > >>> Based on the FAQ page
> > > > > >> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache
> > > > license
> > > > > >> is
> > > > > >> > not
> > > > > >> > > > >>> compatible with GPL2. I'm hoping that the "3-clause
> BSD
> > > > license
> > > > > >> and
> > > > > >> > > > GPL2"
> > > > > >> > > > >>> is really not an AND but an OR in this case, but I'm
> no
> > > > lawyer,
> > > > > >> > just
> > > > > >> > > > >>> wanted
> > > > > >> > > > >>> to make the point that we should watch out for
> licenses.
> > > :)
> > > > > >> > > > >>>
> > > > > >> > > > >>> Regards,
> > > > > >> > > > >>> Viktor
> > > > > >> > > > >>>
> > > > > >> > > > >>>
> > > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > > > ibobrik@gmail.com>
> > > > > >> > > wrote:
> > > > > >> > > > >>>
> > > > > >> > > > >>> > Hello,
> > > > > >> > > > >>> >
> > > > > >> > > > >>> > This is Ivan and I still very much support the fact
> > that
> > > > zstd
> > > > > >> > > > >>> compression
> > > > > >> > > > >>> > should be included out of the box.
> > > > > >> > > > >>> >
> > > > > >> > > > >>> > Please think about the environment, you can save
> > quite a
> > > > lot
> > > > > >> of
> > > > > >> > > > >>> hardware
> > > > > >> > > > >>> > with it.
> > > > > >> > > > >>> >
> > > > > >> > > > >>> > Thank you.
> > > > > >> > > > >>> >
> > > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > > dongjin@apache.org>
> > > > > >> > > wrote:
> > > > > >> > > > >>> >
> > > > > >> > > > >>> > > Since there are no responses for a week, I decided
> > to
> > > > > >> > reinitiate
> > > > > >> > > > the
> > > > > >> > > > >>> > > discussion thread.
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> >
> > > > > >> > > > >>>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > This KIP is about to introduce ZStandard
> Compression
> > > > into
> > > > > >> > Apache
> > > > > >> > > > >>> Kafka.
> > > > > >> > > > >>> > > The reason why it is posted again has a story: It
> > was
> > > > > >> > originally
> > > > > >> > > > >>> posted
> > > > > >> > > > >>> > to
> > > > > >> > > > >>> > > the dev mailing list more than one year ago but
> > since
> > > it
> > > > > >> has no
> > > > > >> > > > >>> > performance
> > > > > >> > > > >>> > > report included, it was postponed later. But Some
> > > people
> > > > > >> > > (including
> > > > > >> > > > >>> Ivan)
> > > > > >> > > > >>> > > reported excellent performance report with the
> draft
> > > PR,
> > > > > >> this
> > > > > >> > > work
> > > > > >> > > > >>> is now
> > > > > >> > > > >>> > > reactivated.
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > The updated KIP document includes some expected
> > > > problems and
> > > > > >> > > their
> > > > > >> > > > >>> > > candidate alternatives. Please have a look when
> you
> > > are
> > > > > >> free,
> > > > > >> > and
> > > > > >> > > > >>> give
> > > > > >> > > > >>> > me a
> > > > > >> > > > >>> > > feedback. All kinds of participating are welcome.
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > Best,
> > > > > >> > > > >>> > > Dongjin
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > --
> > > > > >> > > > >>> > > *Dongjin Lee*
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> > > *github:  <http://goog_969573159/>github
> > > > .com/dongjinleekr
> > > > > >> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > > > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr
> >slideshare:
> > > > > >> > > > >>> > www.slideshare.net/dongjinleekr
> > > > > >> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > > > > >> > > > >>> > >
> > > > > >> > > > >>> >
> > > > > >> > > > >>>
> > > > > >> > > > >> --
> > > > > >> > > > >> *Dongjin Lee*
> > > > > >> > > > >>
> > > > > >> > > > >> *A hitchhiker in the mathematical world.*
> > > > > >> > > > >>
> > > > > >> > > > >> *github:  <http://goog_969573159/>github
> > .com/dongjinleekr
> > > > > >> > > > >> <http://github.com/dongjinleekr>linkedin:
> > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > >> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > *Dongjin Lee*
> > > > > >> > > > >
> > > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >
> > > > > >> > > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > > > >> > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > >> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > *Dongjin Lee*
> > > > > >> > > >
> > > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > >
> > > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > >> > > > <http://github.com/dongjinleekr>linkedin:
> > > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > >> > > > www.slideshare.net/dongjinleekr
> > > > > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >> --
> > > > > >> *Dongjin Lee*
> > > > > >>
> > > > > >> *A hitchhiker in the mathematical world.*
> > > > > >>
> > > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > >> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > > > dongjinleekr
> > > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > > >>
> > > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > >
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > > > dongjinleekr
> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > www.slideshare.net/dongjinleekr
> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/
> > dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
>
> But in my opinion, since the client will fail with the API version, so we
> don't need to down-convert the messages anyway. Isn't it? So, I think we
> don't care about this case. (I'm sorry, I am not familiar with down-convert
> logic.)


Currently the broker down-converts automatically when it receives an old
version of the fetch request (a version which is known to predate the
message format in use). Typically when down-converting the message format,
we use the same compression type, but there is not much point in doing so
when we know the client doesn't support it. So if zstandard is in use, and
we have to down-convert anyway, then we can choose to use a different
compression type or no compression type.
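
As a rough illustration of that decision (hypothetical types and names, not
the actual broker code):

public class FetchResponsePlan {
    enum Codec { NONE, GZIP, SNAPPY, LZ4, ZSTD }
    enum Action { PASS_THROUGH, DOWN_CONVERT_SAME_CODEC, DOWN_CONVERT_WITH_FALLBACK }

    // mustDownConvertFormat: the fetch version predates the on-disk message format.
    // clientSupportsZstd: e.g. inferred from a bumped fetch API version.
    static Action planFor(boolean mustDownConvertFormat, boolean clientSupportsZstd,
                          Codec batchCodec) {
        if (!mustDownConvertFormat) {
            // Without format down-conversion the broker does not inspect batches,
            // so an old client simply fails when it meets an unknown codec.
            return Action.PASS_THROUGH;
        }
        if (batchCodec == Codec.ZSTD && !clientSupportsZstd) {
            // Re-encode with a classical codec or with no compression at all.
            return Action.DOWN_CONVERT_WITH_FALLBACK;
        }
        return Action.DOWN_CONVERT_SAME_CODEC;
    }

    public static void main(String[] args) {
        System.out.println(planFor(true, false, Codec.ZSTD));  // DOWN_CONVERT_WITH_FALLBACK
        System.out.println(planFor(false, false, Codec.ZSTD)); // PASS_THROUGH
    }
}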

From my perspective, there is no significant downside to bumping the
protocol version and it has several potential benefits. Version bumps are
cheap. The main question mark in my mind is about down-conversion. Figuring
out whether down-conversion is needed is generally hard without inspecting
the fetched data, which is expensive. I think we agree in principle that we
do not want to have to pay this cost generally and prefer the clients to
fail when they see an unhandled compression type. The point I was making is
that there are some cases where we are either inspecting the data anyway
(because we have to down-convert the message format), or we have an easy
way to tell whether zstandard is in use (the topic has it configured
explicitly). In the latter case, we don't have to handle it specially. But
we do have to decide how we will handle down-conversion to older formats.

-Jason

On Sun, Aug 12, 2018 at 5:15 PM, Dongjin Lee <do...@apache.org> wrote:

> Colin and Jason,
>
> Thanks for your opinions. In summarizing, the Pros and Cons of bumping
> fetch API version are:
>
> Cons:
>
> - The Broker can't know whether a given message batch is compressed with
> zstd or not.
> - Need some additional logic for the topic explicitly configured to use
> zstd.
>
> Pros:
>
> - The broker doesn't need to conduct expensive down-conversion.
> - Can message the users to update their client.
>
> So, opinions for the backward-compatibility policy by far:
>
> - A: bump the API version - +2 (Colin, Jason)
> - B: leave unchanged - +1 (Viktor)
>
> Here are my additional comments:
>
> @Colin
>
> I greatly appreciate your response. In the case of the dictionary support,
> of course, this issue should be addressed later so we don't need it in the
> first version. You are right - it is not late to try it after some
> benchmarks. What I mean is, we should keep in mind on that potential
> feature.
>
> @Jason
>
> You wrote,
>
> > Similarly, if we have to down-convert anyway because the client does not
> understand the message format, then we could also use a different
> compression type.
>
> But in my opinion, since the client will fail with the API version, so we
> don't need to down-convert the messages anyway. Isn't it? So, I think we
> don't care about this case. (I'm sorry, I am not familiar with down-convert
> logic.)
>
> Please give more opinions. Thanks!
>
> - Dongjin
>
>
> On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <ja...@confluent.io> wrote:
>
> > Hey Colin,
> >
> > The problem for the fetch API is that the broker does not generally know
> if
> > a batch was compressed with zstd unless it parses it. I think the goal
> here
> > is to avoid the expensive down-conversion that is needed to ensure
> > compatibility because it is only necessary if zstd is actually in use.
> But
> > as long as old clients can parse the message format, they should get a
> > reasonable error if they see an unsupported compression type in the
> > attributes. Basically the onus is on users to ensure that their consumers
> > have been updated prior to using zstd. It seems like a reasonable
> tradeoff
> > to me. There are a couple cases that might be worth thinking through:
> >
> > 1. If a topic is explicitly configured to use zstd, then we don't need to
> > check the fetched data for the compression type to know if we need
> > down-conversion. If we did bump the Fetch API version, then we could
> handle
> > this case by either down-converting using a different compression type or
> > returning an error.
> > 2. Similarly, if we have to down-convert anyway because the client does
> not
> > understand the message format, then we could also use a different
> > compression type.
> >
> > For the produce API, I think it's reasonable to bump the api version.
> This
> > can be used by clients to check whether a broker supports zstd. For
> > example, we might support a list of preferred compression types in the
> > producer and we could use the broker to detect which version to use.
> >
> > -Jason
> >
> > On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org> wrote:
> >
> > > Thanks for bumping this, Dongjin.  ZStd is a good compression codec
> and I
> > > hope we can get this support in soon!
> > >
> > > I would say we can just bump the API version to indicate that ZStd
> > support
> > > is expected in new clients.  We probably need some way of indicating to
> > the
> > > older clients that they can't consume the partitions, as well.  Perhaps
> > we
> > > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> > >
> > > The license thing seems straightforward -- it's just a matter of adding
> > > the text to the right files as per ASF guidelines.
> > >
> > > With regard to the dictionary support, do we really need that in the
> > first
> > > version?  Hopefully message batches are big enough that this isn't
> > needed.
> > > Some benchmarks might help here.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > > .
> > > >
> > > > For newcomers, Here is some summary of the history: KIP-110 was
> > > originally
> > > > worked for the issue KAFKA-4514 but, it lacked benchmark results to
> get
> > > the
> > > > agreement of the community. Later, Ivan Babrou and some other users
> who
> > > > adopted the patch provided their excellent performance report which
> is
> > > now
> > > > included in the KIP, but it postponed again because of the community
> > was
> > > > busy for 2.0.0 release. It is why I now reboot this issue.
> > > >
> > > > The following is the current status of the feature: You can check the
> > > > current draft implementation here
> > > > <https://github.com/apache/kafka/pull/2267>. It is based on zstd
> 1.3.5
> > > and
> > > > periodically rebased onto the latest trunk[^1].
> > > >
> > > > The issues that should be addressed is like following:
> > > >
> > > > *1. Backward Compatibility*
> > > >
> > > > To support old consumers, we need to take a strategy to handle the
> old
> > > > consumers. Current candidates are:
> > > >
> > > > - Bump API version
> > > > - Leave unchanged: let the old clients fail.
> > > > - Improve the error messages:
> > > >
> > > > *2. Dictionary Support*
> > > >
> > > > To support zstd's dictionary feature in the future (if needed), we
> need
> > > to
> > > > sketch how it should be and leave some room for it. As of now, there
> > has
> > > > been no discussion on this topic yet.
> > > >
> > > > *3. License*
> > > >
> > > > To use this feature, we need to add license of zstd and zstd-jni to
> the
> > > > project. (Thanks to Viktor Somogyi for raising this issue!) It seems
> > like
> > > > what Apache Spark did would be a good example but there has been no
> > > > discussion yet.
> > > >
> > > > You can find the details of the above issues in the KIP document.
> > Please
> > > > have a look when you are free, and give me feedback. All kinds of
> > > > participating are welcome.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > [^1]: At the time of writing, commit 6b4fb8152.
> > > >
> > > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Sorry for the late reply.
> > > > >
> > > > > In short, I could not submit the updated KIP by the feature freeze
> > > > > deadline of 2.0.0. For this reason, it will not be included in the
> > > 2.0.0
> > > > > release and all discussion for this issue were postponed after the
> > > release
> > > > > of 2.0.0.
> > > > >
> > > > > I have been updating the PR following recent updates. Just now, I
> > > rebased
> > > > > it against the latest trunk and updated the zstd version into
> 1.3.5.
> > > If you
> > > > > need some request, don't hesitate to notify me. (But not this
> thread
> > -
> > > just
> > > > > send me the message directly.)
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org>
> > wrote:
> > > > >
> > > > >> I there any update on this.  The performance improvements are
> quite
> > > > >> impressive and I really would like to stop forking kafka just to
> get
> > > this
> > > > >> in.
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Bobby
> > > > >>
> > > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >>
> > > > >> > Ismael,
> > > > >> >
> > > > >> > Oh, I forgot all of you are on working frenzy for 2.0! No
> problem,
> > > take
> > > > >> > your time. I am also working at another issue now. Thank you for
> > > > >> letting me
> > > > >> > know.
> > > > >> >
> > > > >> > Best,
> > > > >> > Dongjin
> > > > >> >
> > > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com>
> > > wrote:
> > > > >> >
> > > > >> > > Sorry for the delay Dongjin. Everyone is busy finalising
> 2.0.0.
> > > This
> > > > >> KIP
> > > > >> > > seems like a great candidate for 2.1.0 and hopefully there
> will
> > be
> > > > >> more
> > > > >> > of
> > > > >> > > a discussion next week. :)
> > > > >> > >
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org>
> > > wrote:
> > > > >> > >
> > > > >> > > > Hello. I just updated my draft implementation:
> > > > >> > > >
> > > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > >> > > > 2. Apply ZStd 1.3.4
> > > > >> > > >
> > > > >> > > > You can check out the implementation from here
> > > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> > experience
> > > any
> > > > >> > > problem
> > > > >> > > > running it, don't hesitate to give me a mention.
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Dongjin
> > > > >> > > >
> > > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > >> > wrote:
> > > > >> > > >
> > > > >> > > > > Here is the short conclusion about the license problem:
> *We
> > > can
> > > > >> use
> > > > >> > > zstd
> > > > >> > > > > and zstd-jni without any problem, but we need to include
> > their
> > > > >> > license,
> > > > >> > > > > e.g., BSD license.*
> > > > >> > > > >
> > > > >> > > > > Both of BSD 2 Clause License & 3 Clause License requires
> to
> > > > >> include
> > > > >> > the
> > > > >> > > > > license used, and BSD 3 Clause License requires that the
> > name
> > > of
> > > > >> the
> > > > >> > > > > contributor can't be used to endorse or promote the
> product.
> > > > >> That's
> > > > >> > it
> > > > >> > > > > <
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > > compatible-with-apache-2-0-license/
> > > > >> > > > >
> > > > >> > > > > - They are not listed in the list of prohibited licenses
> > > > >> > > > > <https://www.apache.org/legal/resolved.html#category-x>
> > also.
> > > > >> > > > >
> > > > >> > > > > Here is how Spark did for it
> > > > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > > >> > > > >
> > > > >> > > > > - They made a directory dedicated to the dependency
> license
> > > files
> > > > >> > > > > <https://github.com/apache/spark/tree/master/licenses>
> and
> > > added
> > > > >> > > > licenses
> > > > >> > > > > for Zstd
> > > > >> > > > > <
> > > > >> > https://github.com/apache/spark/blob/master/licenses/
> > > LICENSE-zstd.txt
> > > > >> > > >
> > > > >> > > > &
> > > > >> > > > > Zstd-jni
> > > > >> > > > > <
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> https://github.com/apache/spark/blob/master/licenses/
> > > LICENSE-zstd-jni.txt
> > > > >> >
> > > > >> > > > > .
> > > > >> > > > > - Added a link to the original license files in LICENSE.
> > > > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > > >> > > > >
> > > > >> > > > > If needed, I can make a similar update.
> > > > >> > > > >
> > > > >> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > > Dongjin
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > > dongjin@apache.org>
> > > > >> > > wrote:
> > > > >> > > > >
> > > > >> > > > >> I greatly appreciate your comprehensive reasoning. so: +1
> > > for b
> > > > >> > until
> > > > >> > > > now.
> > > > >> > > > >>
> > > > >> > > > >> For the license issues, I will have a check on how the
> over
> > > > >> projects
> > > > >> > > are
> > > > >> > > > >> doing and share the results.
> > > > >> > > > >>
> > > > >> > > > >> Best,
> > > > >> > > > >> Dongjin
> > > > >> > > > >>
> > > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > >> > > > viktorsomogyi@gmail.com>
> > > > >> > > > >> wrote:
> > > > >> > > > >>
> > > > >> > > > >>> Hi Dongjin,
> > > > >> > > > >>>
> > > > >> > > > >>> A couple of comments:
> > > > >> > > > >>> I would vote for option b. in the "backward
> compatibility"
> > > > >> section.
> > > > >> > > My
> > > > >> > > > >>> reasoning for this is that users upgrading to a zstd
> > > compatible
> > > > >> > > version
> > > > >> > > > >>> won't start to use it automatically, so manual
> > > reconfiguration
> > > > >> is
> > > > >> > > > >>> required.
> > > > >> > > > >>> Therefore an upgrade won't mess up the cluster. If not
> all
> > > the
> > > > >> > > clients
> > > > >> > > > >>> are
> > > > >> > > > >>> upgraded but just some of them and they'd start to use
> > zstd
> > > > >> then it
> > > > >> > > > would
> > > > >> > > > >>> cause errors in the cluster. I'd like to presume though
> > that
> > > > >> this
> > > > >> > is
> > > > >> > > a
> > > > >> > > > >>> very
> > > > >> > > > >>> obvious failure case and nobody should be surprised if
> it
> > > didn't
> > > > >> > > work.
> > > > >> > > > >>> I wouldn't choose a. as I think we should bump the fetch
> > and
> > > > >> > produce
> > > > >> > > > >>> requests if it's a change in the message format.
> Moreover
> > if
> > > > >> some
> > > > >> > of
> > > > >> > > > the
> > > > >> > > > >>> producers and the brokers are upgraded but some of the
> > > consumers
> > > > >> > are
> > > > >> > > > not,
> > > > >> > > > >>> then we wouldn't prevent the error when the old consumer
> > > tries
> > > > >> to
> > > > >> > > > consume
> > > > >> > > > >>> the zstd compressed messages.
> > > > >> > > > >>> I wouldn't choose c. either as I think binding the
> > > compression
> > > > >> type
> > > > >> > > to
> > > > >> > > > an
> > > > >> > > > >>> API is not so obvious from the developer's perspective.
> > > > >> > > > >>>
> > > > >> > > > >>> I would also prefer to use the existing binding, however
> > we
> > > must
> > > > >> > > > respect
> > > > >> > > > >>> the licenses:
> > > > >> > > > >>> "The code for these JNI bindings is licenced under
> > 2-clause
> > > BSD
> > > > >> > > > license.
> > > > >> > > > >>> The native Zstd library is licensed under 3-clause BSD
> > > license
> > > > >> and
> > > > >> > > > GPL2"
> > > > >> > > > >>> Based on the FAQ page
> > > > >> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > > >> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache
> > > license
> > > > >> is
> > > > >> > not
> > > > >> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD
> > > license
> > > > >> and
> > > > >> > > > GPL2"
> > > > >> > > > >>> is really not an AND but an OR in this case, but I'm no
> > > lawyer,
> > > > >> > just
> > > > >> > > > >>> wanted
> > > > >> > > > >>> to make the point that we should watch out for licenses.
> > :)
> > > > >> > > > >>>
> > > > >> > > > >>> Regards,
> > > > >> > > > >>> Viktor
> > > > >> > > > >>>
> > > > >> > > > >>>
> > > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > > ibobrik@gmail.com>
> > > > >> > > wrote:
> > > > >> > > > >>>
> > > > >> > > > >>> > Hello,
> > > > >> > > > >>> >
> > > > >> > > > >>> > This is Ivan and I still very much support the fact
> that
> > > zstd
> > > > >> > > > >>> compression
> > > > >> > > > >>> > should be included out of the box.
> > > > >> > > > >>> >
> > > > >> > > > >>> > Please think about the environment, you can save
> quite a
> > > lot
> > > > >> of
> > > > >> > > > >>> hardware
> > > > >> > > > >>> > with it.
> > > > >> > > > >>> >
> > > > >> > > > >>> > Thank you.
> > > > >> > > > >>> >
> > > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > > dongjin@apache.org>
> > > > >> > > wrote:
> > > > >> > > > >>> >
> > > > >> > > > >>> > > Since there are no responses for a week, I decided
> to
> > > > >> > reinitiate
> > > > >> > > > the
> > > > >> > > > >>> > > discussion thread.
> > > > >> > > > >>> > >
> > > > >> > > > >>> > >
> > > > >> > > > >>> > >
> > > > >> > > > >>> >
> > > > >> > > > >>>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 110%3A+Add+Codec+for+ZStandard+Compression
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > This KIP is about to introduce ZStandard Compression
> > > into
> > > > >> > Apache
> > > > >> > > > >>> Kafka.
> > > > >> > > > >>> > > The reason why it is posted again has a story: It
> was
> > > > >> > originally
> > > > >> > > > >>> posted
> > > > >> > > > >>> > to
> > > > >> > > > >>> > > the dev mailing list more than one year ago but
> since
> > it
> > > > >> has no
> > > > >> > > > >>> > performance
> > > > >> > > > >>> > > report included, it was postponed later. But Some
> > people
> > > > >> > > (including
> > > > >> > > > >>> Ivan)
> > > > >> > > > >>> > > reported excellent performance report with the draft
> > PR,
> > > > >> this
> > > > >> > > work
> > > > >> > > > >>> is now
> > > > >> > > > >>> > > reactivated.
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > The updated KIP document includes some expected
> > > problems and
> > > > >> > > their
> > > > >> > > > >>> > > candidate alternatives. Please have a look when you
> > are
> > > > >> free,
> > > > >> > and
> > > > >> > > > >>> give
> > > > >> > > > >>> > me a
> > > > >> > > > >>> > > feedback. All kinds of participating are welcome.
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > Best,
> > > > >> > > > >>> > > Dongjin
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > --
> > > > >> > > > >>> > > *Dongjin Lee*
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >>> > >
> > > > >> > > > >>> > > *github:  <http://goog_969573159/>github
> > > .com/dongjinleekr
> > > > >> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >> > > > >>> > www.slideshare.net/dongjinleekr
> > > > >> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > >>> > >
> > > > >> > > > >>> >
> > > > >> > > > >>>
> > > > >> > > > >> --
> > > > >> > > > >> *Dongjin Lee*
> > > > >> > > > >>
> > > > >> > > > >> *A hitchhiker in the mathematical world.*
> > > > >> > > > >>
> > > > >> > > > >> *github:  <http://goog_969573159/>github
> .com/dongjinleekr
> > > > >> > > > >> <http://github.com/dongjinleekr>linkedin:
> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >> > > > www.slideshare.net/dongjinleekr
> > > > >> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > *Dongjin Lee*
> > > > >> > > > >
> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >
> > > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > > <http://github.com/dongjinleekr>linkedin:
> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >> > > > www.slideshare.net/dongjinleekr
> > > > >> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > *Dongjin Lee*
> > > > >> > > >
> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > >
> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > <http://github.com/dongjinleekr>linkedin:
> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >> > > > www.slideshare.net/dongjinleekr
> > > > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >> --
> > > > >> *Dongjin Lee*
> > > > >>
> > > > >> *A hitchhiker in the mathematical world.*
> > > > >>
> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > > dongjinleekr
> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > >>
> > > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > > dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/
> dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Colin and Jason,

Thanks for your opinions. To summarize, the pros and cons of bumping the
fetch API version are:

Cons:

- The broker can't know whether a given message batch is compressed with
zstd or not.
- Some additional logic is needed for topics explicitly configured to use
zstd.

Pros:

- The broker doesn't need to perform expensive down-conversion.
- We can signal users to update their clients.

So, the opinions on the backward-compatibility policy so far are:

- A: bump the API version - +2 (Colin, Jason)
- B: leave unchanged - +1 (Viktor)
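
As an aside on the "signal users" pro: with a bumped produce API version, a
client could in principle gate its codec choice on what the brokers
advertise. A hedged sketch (the helper, the version numbers, and the
fallback choice are all hypothetical; the real mechanism would be the
ApiVersions handshake the Java client already performs):

import java.util.HashMap;
import java.util.Map;

public class ZstdCapabilityCheck {
    // Hypothetical input: the highest Produce API version each broker advertised.
    static String chooseCompression(Map<String, Integer> maxProduceVersionByBroker,
                                    int minProduceVersionWithZstd) {
        boolean allBrokersSupportZstd = maxProduceVersionByBroker.values().stream()
            .allMatch(v -> v >= minProduceVersionWithZstd);
        // Fall back to a classical codec (or fail fast with a clear message)
        // instead of letting an old broker reject the batches.
        return allBrokersSupportZstd ? "zstd" : "lz4";
    }

    public static void main(String[] args) {
        Map<String, Integer> versions = new HashMap<>();
        versions.put("broker-0", 7);
        versions.put("broker-1", 6);
        System.out.println(chooseCompression(versions, 7)); // lz4

        versions.put("broker-1", 7);
        System.out.println(chooseCompression(versions, 7)); // zstd
    }
}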

Here are my additional comments:

@Colin

I greatly appreciate your response. In the case of dictionary support, of
course, this issue can be addressed later, so we don't need it in the first
version. You are right - it is not too late to try it after some benchmarks.
What I mean is that we should keep that potential feature in mind.

@Jason

You wrote,

> Similarly, if we have to down-convert anyway because the client does not
understand the message format, then we could also use a different
compression type.

But in my opinion, since the old client will already fail at the API-version
check, we don't need to down-convert the messages at all, do we? So I think
we don't need to handle this case. (I'm sorry, I am not very familiar with
the down-conversion logic.)

Please give more opinions. Thanks!

- Dongjin


On Wed, Aug 8, 2018 at 6:41 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Colin,
>
> The problem for the fetch API is that the broker does not generally know if
> a batch was compressed with zstd unless it parses it. I think the goal here
> is to avoid the expensive down-conversion that is needed to ensure
> compatibility because it is only necessary if zstd is actually in use. But
> as long as old clients can parse the message format, they should get a
> reasonable error if they see an unsupported compression type in the
> attributes. Basically the onus is on users to ensure that their consumers
> have been updated prior to using zstd. It seems like a reasonable tradeoff
> to me. There are a couple cases that might be worth thinking through:
>
> 1. If a topic is explicitly configured to use zstd, then we don't need to
> check the fetched data for the compression type to know if we need
> down-conversion. If we did bump the Fetch API version, then we could handle
> this case by either down-converting using a different compression type or
> returning an error.
> 2. Similarly, if we have to down-convert anyway because the client does not
> understand the message format, then we could also use a different
> compression type.
>
> For the produce API, I think it's reasonable to bump the api version. This
> can be used by clients to check whether a broker supports zstd. For
> example, we might support a list of preferred compression types in the
> producer and we could use the broker to detect which version to use.
>
> -Jason
>
> On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org> wrote:
>
> > Thanks for bumping this, Dongjin.  ZStd is a good compression codec and I
> > hope we can get this support in soon!
> >
> > I would say we can just bump the API version to indicate that ZStd
> support
> > is expected in new clients.  We probably need some way of indicating to
> the
> > older clients that they can't consume the partitions, as well.  Perhaps
> we
> > can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
> >
> > The license thing seems straightforward -- it's just a matter of adding
> > the text to the right files as per ASF guidelines.
> >
> > With regard to the dictionary support, do we really need that in the
> first
> > version?  Hopefully message batches are big enough that this isn't
> needed.
> > Some benchmarks might help here.
> >
> > best,
> > Colin
> >
> >
> > On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 110%3A+Add+Codec+for+ZStandard+Compression>
> > > .
> > >
> > > For newcomers, Here is some summary of the history: KIP-110 was
> > originally
> > > worked for the issue KAFKA-4514 but, it lacked benchmark results to get
> > the
> > > agreement of the community. Later, Ivan Babrou and some other users who
> > > adopted the patch provided their excellent performance report which is
> > now
> > > included in the KIP, but it postponed again because of the community
> was
> > > busy for 2.0.0 release. It is why I now reboot this issue.
> > >
> > > The following is the current status of the feature: You can check the
> > > current draft implementation here
> > > <https://github.com/apache/kafka/pull/2267>. It is based on zstd 1.3.5
> > and
> > > periodically rebased onto the latest trunk[^1].
> > >
> > > The issues that should be addressed is like following:
> > >
> > > *1. Backward Compatibility*
> > >
> > > To support old consumers, we need to take a strategy to handle the old
> > > consumers. Current candidates are:
> > >
> > > - Bump API version
> > > - Leave unchanged: let the old clients fail.
> > > - Improve the error messages:
> > >
> > > *2. Dictionary Support*
> > >
> > > To support zstd's dictionary feature in the future (if needed), we need
> > to
> > > sketch how it should be and leave some room for it. As of now, there
> has
> > > been no discussion on this topic yet.
> > >
> > > *3. License*
> > >
> > > To use this feature, we need to add license of zstd and zstd-jni to the
> > > project. (Thanks to Viktor Somogyi for raising this issue!) It seems
> like
> > > what Apache Spark did would be a good example but there has been no
> > > discussion yet.
> > >
> > > You can find the details of the above issues in the KIP document.
> Please
> > > have a look when you are free, and give me feedback. All kinds of
> > > participating are welcome.
> > >
> > > Best,
> > > Dongjin
> > >
> > > [^1]: At the time of writing, commit 6b4fb8152.
> > >
> > > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Sorry for the late reply.
> > > >
> > > > In short, I could not submit the updated KIP by the feature freeze
> > > > deadline of 2.0.0. For this reason, it will not be included in the
> > 2.0.0
> > > > release and all discussion for this issue were postponed after the
> > release
> > > > of 2.0.0.
> > > >
> > > > I have been updating the PR following recent updates. Just now, I
> > rebased
> > > > it against the latest trunk and updated the zstd version into 1.3.5.
> > If you
> > > > need some request, don't hesitate to notify me. (But not this thread
> -
> > just
> > > > send me the message directly.)
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org>
> wrote:
> > > >
> > > >> I there any update on this.  The performance improvements are quite
> > > >> impressive and I really would like to stop forking kafka just to get
> > this
> > > >> in.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Bobby
> > > >>
> > > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >>
> > > >> > Ismael,
> > > >> >
> > > >> > Oh, I forgot all of you are on working frenzy for 2.0! No problem,
> > take
> > > >> > your time. I am also working at another issue now. Thank you for
> > > >> letting me
> > > >> > know.
> > > >> >
> > > >> > Best,
> > > >> > Dongjin
> > > >> >
> > > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com>
> > wrote:
> > > >> >
> > > >> > > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0.
> > This
> > > >> KIP
> > > >> > > seems like a great candidate for 2.1.0 and hopefully there will
> be
> > > >> more
> > > >> > of
> > > >> > > a discussion next week. :)
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org>
> > wrote:
> > > >> > >
> > > >> > > > Hello. I just updated my draft implementation:
> > > >> > > >
> > > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > >> > > > 2. Apply ZStd 1.3.4
> > > >> > > >
> > > >> > > > You can check out the implementation from here
> > > >> > > > <https://github.com/apache/kafka/pull/2267>. If you
> experience
> > any
> > > >> > > problem
> > > >> > > > running it, don't hesitate to give me a mention.
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Dongjin
> > > >> > > >
> > > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <
> dongjin@apache.org
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > Here is the short conclusion about the license problem: *We
> > can
> > > >> use
> > > >> > > zstd
> > > >> > > > > and zstd-jni without any problem, but we need to include
> their
> > > >> > license,
> > > >> > > > > e.g., BSD license.*
> > > >> > > > >
> > > >> > > > > Both of BSD 2 Clause License & 3 Clause License requires to
> > > >> include
> > > >> > the
> > > >> > > > > license used, and BSD 3 Clause License requires that the
> name
> > of
> > > >> the
> > > >> > > > > contributor can't be used to endorse or promote the product.
> > > >> That's
> > > >> > it
> > > >> > > > > <
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> > compatible-with-apache-2-0-license/
> > > >> > > > >
> > > >> > > > > - They are not listed in the list of prohibited licenses
> > > >> > > > > <https://www.apache.org/legal/resolved.html#category-x>
> also.
> > > >> > > > >
> > > >> > > > > Here is how Spark did for it
> > > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > >> > > > >
> > > >> > > > > - They made a directory dedicated to the dependency license
> > files
> > > >> > > > > <https://github.com/apache/spark/tree/master/licenses> and
> > added
> > > >> > > > licenses
> > > >> > > > > for Zstd
> > > >> > > > > <
> > > >> > https://github.com/apache/spark/blob/master/licenses/
> > LICENSE-zstd.txt
> > > >> > > >
> > > >> > > > &
> > > >> > > > > Zstd-jni
> > > >> > > > > <
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> https://github.com/apache/spark/blob/master/licenses/
> > LICENSE-zstd-jni.txt
> > > >> >
> > > >> > > > > .
> > > >> > > > > - Added a link to the original license files in LICENSE.
> > > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > >> > > > >
> > > >> > > > > If needed, I can make a similar update.
> > > >> > > > >
> > > >> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > > >> > > > >
> > > >> > > > > Best,
> > > >> > > > > Dongjin
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> > dongjin@apache.org>
> > > >> > > wrote:
> > > >> > > > >
> > > >> > > > >> I greatly appreciate your comprehensive reasoning. so: +1
> > for b
> > > >> > until
> > > >> > > > now.
> > > >> > > > >>
> > > >> > > > >> For the license issues, I will have a check on how the over
> > > >> projects
> > > >> > > are
> > > >> > > > >> doing and share the results.
> > > >> > > > >>
> > > >> > > > >> Best,
> > > >> > > > >> Dongjin
> > > >> > > > >>
> > > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > >> > > > viktorsomogyi@gmail.com>
> > > >> > > > >> wrote:
> > > >> > > > >>
> > > >> > > > >>> Hi Dongjin,
> > > >> > > > >>>
> > > >> > > > >>> A couple of comments:
> > > >> > > > >>> I would vote for option b. in the "backward compatibility"
> > > >> section.
> > > >> > > My
> > > >> > > > >>> reasoning for this is that users upgrading to a zstd
> > compatible
> > > >> > > version
> > > >> > > > >>> won't start to use it automatically, so manual
> > reconfiguration
> > > >> is
> > > >> > > > >>> required.
> > > >> > > > >>> Therefore an upgrade won't mess up the cluster. If not all
> > the
> > > >> > > clients
> > > >> > > > >>> are
> > > >> > > > >>> upgraded but just some of them and they'd start to use
> zstd
> > > >> then it
> > > >> > > > would
> > > >> > > > >>> cause errors in the cluster. I'd like to presume though
> that
> > > >> this
> > > >> > is
> > > >> > > a
> > > >> > > > >>> very
> > > >> > > > >>> obvious failure case and nobody should be surprised if it
> > didn't
> > > >> > > work.
> > > >> > > > >>> I wouldn't choose a. as I think we should bump the fetch
> and
> > > >> > produce
> > > >> > > > >>> requests if it's a change in the message format. Moreover
> if
> > > >> some
> > > >> > of
> > > >> > > > the
> > > >> > > > >>> producers and the brokers are upgraded but some of the
> > consumers
> > > >> > are
> > > >> > > > not,
> > > >> > > > >>> then we wouldn't prevent the error when the old consumer
> > tries
> > > >> to
> > > >> > > > consume
> > > >> > > > >>> the zstd compressed messages.
> > > >> > > > >>> I wouldn't choose c. either as I think binding the
> > compression
> > > >> type
> > > >> > > to
> > > >> > > > an
> > > >> > > > >>> API is not so obvious from the developer's perspective.
> > > >> > > > >>>
> > > >> > > > >>> I would also prefer to use the existing binding, however
> we
> > must
> > > >> > > > respect
> > > >> > > > >>> the licenses:
> > > >> > > > >>> "The code for these JNI bindings is licenced under
> 2-clause
> > BSD
> > > >> > > > license.
> > > >> > > > >>> The native Zstd library is licensed under 3-clause BSD
> > license
> > > >> and
> > > >> > > > GPL2"
> > > >> > > > >>> Based on the FAQ page
> > > >> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > >> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache
> > license
> > > >> is
> > > >> > not
> > > >> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD
> > license
> > > >> and
> > > >> > > > GPL2"
> > > >> > > > >>> is really not an AND but an OR in this case, but I'm no
> > lawyer,
> > > >> > just
> > > >> > > > >>> wanted
> > > >> > > > >>> to make the point that we should watch out for licenses.
> :)
> > > >> > > > >>>
> > > >> > > > >>> Regards,
> > > >> > > > >>> Viktor
> > > >> > > > >>>
> > > >> > > > >>>
> > > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> > ibobrik@gmail.com>
> > > >> > > wrote:
> > > >> > > > >>>
> > > >> > > > >>> > Hello,
> > > >> > > > >>> >
> > > >> > > > >>> > This is Ivan and I still very much support the fact that
> > zstd
> > > >> > > > >>> compression
> > > >> > > > >>> > should be included out of the box.
> > > >> > > > >>> >
> > > >> > > > >>> > Please think about the environment, you can save quite a
> > lot
> > > >> of
> > > >> > > > >>> hardware
> > > >> > > > >>> > with it.
> > > >> > > > >>> >
> > > >> > > > >>> > Thank you.
> > > >> > > > >>> >
> > > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> > dongjin@apache.org>
> > > >> > > wrote:
> > > >> > > > >>> >
> > > >> > > > >>> > > Since there are no responses for a week, I decided to
> > > >> > reinitiate
> > > >> > > > the
> > > >> > > > >>> > > discussion thread.
> > > >> > > > >>> > >
> > > >> > > > >>> > >
> > > >> > > > >>> > >
> > > >> > > > >>> >
> > > >> > > > >>>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 110%3A+Add+Codec+for+ZStandard+Compression
> > > >> > > > >>> > >
> > > >> > > > >>> > > This KIP is about to introduce ZStandard Compression
> > into
> > > >> > Apache
> > > >> > > > >>> Kafka.
> > > >> > > > >>> > > The reason why it is posted again has a story: It was
> > > >> > originally
> > > >> > > > >>> posted
> > > >> > > > >>> > to
> > > >> > > > >>> > > the dev mailing list more than one year ago but since
> it
> > > >> has no
> > > >> > > > >>> > performance
> > > >> > > > >>> > > report included, it was postponed later. But Some
> people
> > > >> > > (including
> > > >> > > > >>> Ivan)
> > > >> > > > >>> > > reported excellent performance report with the draft
> PR,
> > > >> this
> > > >> > > work
> > > >> > > > >>> is now
> > > >> > > > >>> > > reactivated.
> > > >> > > > >>> > >
> > > >> > > > >>> > > The updated KIP document includes some expected
> > problems and
> > > >> > > their
> > > >> > > > >>> > > candidate alternatives. Please have a look when you
> are
> > > >> free,
> > > >> > and
> > > >> > > > >>> give
> > > >> > > > >>> > me a
> > > >> > > > >>> > > feedback. All kinds of participating are welcome.
> > > >> > > > >>> > >
> > > >> > > > >>> > > Best,
> > > >> > > > >>> > > Dongjin
> > > >> > > > >>> > >
> > > >> > > > >>> > > --
> > > >> > > > >>> > > *Dongjin Lee*
> > > >> > > > >>> > >
> > > >> > > > >>> > > *A hitchhiker in the mathematical world.*
> > > >> > > > >>> > >
> > > >> > > > >>> > > *github:  <http://goog_969573159/>github
> > .com/dongjinleekr
> > > >> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >> > > > >>> > www.slideshare.net/dongjinleekr
> > > >> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > > >> > > > >>> > >
> > > >> > > > >>> >
> > > >> > > > >>>
> > > >> > > > >> --
> > > >> > > > >> *Dongjin Lee*
> > > >> > > > >>
> > > >> > > > >> *A hitchhiker in the mathematical world.*
> > > >> > > > >>
> > > >> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > >> <http://github.com/dongjinleekr>linkedin:
> > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >> > > > www.slideshare.net/dongjinleekr
> > > >> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > >> > > > >>
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > *Dongjin Lee*
> > > >> > > > >
> > > >> > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >
> > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > > <http://github.com/dongjinleekr>linkedin:
> > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >> > > > www.slideshare.net/dongjinleekr
> > > >> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > *Dongjin Lee*
> > > >> > > >
> > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > >
> > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > <http://github.com/dongjinleekr>linkedin:
> > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >> > > > www.slideshare.net/dongjinleekr
> > > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >> --
> > > >> *Dongjin Lee*
> > > >>
> > > >> *A hitchhiker in the mathematical world.*
> > > >>
> > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > dongjinleekr
> > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > > >> <http://www.slideshare.net/dongjinleekr>*
> > > >>
> > > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> > dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Jason Gustafson <ja...@confluent.io>.
Hey Colin,

The problem for the fetch API is that the broker does not generally know if
a batch was compressed with zstd unless it parses it. I think the goal here
is to avoid the expensive down-conversion that is needed to ensure
compatibility because it is only necessary if zstd is actually in use. But
as long as old clients can parse the message format, they should get a
reasonable error if they see an unsupported compression type in the
attributes. Basically the onus is on users to ensure that their consumers
have been updated prior to using zstd. It seems like a reasonable tradeoff
to me. There are a couple cases that might be worth thinking through:

1. If a topic is explicitly configured to use zstd, then we don't need to
check the fetched data for the compression type to know if we need
down-conversion. If we did bump the Fetch API version, then we could handle
this case by either down-converting using a different compression type or
returning an error.
2. Similarly, if we have to down-convert anyway because the client does not
understand the message format, then we could also use a different
compression type.

For the produce API, I think it's reasonable to bump the api version. This
can be used by clients to check whether a broker supports zstd. For
example, we might support a list of preferred compression types in the
producer and we could use the broker to detect which version to use.
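
As a rough illustration of that check (hypothetical helper, not actual client
code): the ApiVersions response lists, per API key, the version range the
broker supports, so the producer only needs to compare the broker's max
Produce version against the first version that permits zstd. The version
number used below is an assumption for illustration:

    import java.util.Map;

    public class ZstdProduceCheck {
        static final short PRODUCE_API_KEY = 0;                  // Produce has API key 0
        static final short FIRST_PRODUCE_VERSION_WITH_ZSTD = 7;  // assumed value for illustration

        // maxVersionByApiKey: API key -> max request version from the ApiVersions response.
        static boolean brokerSupportsZstd(Map<Short, Short> maxVersionByApiKey) {
            Short max = maxVersionByApiKey.get(PRODUCE_API_KEY);
            return max != null && max >= FIRST_PRODUCE_VERSION_WITH_ZSTD;
        }
    }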

-Jason

On Tue, Aug 7, 2018 at 1:32 PM, Colin McCabe <cm...@apache.org> wrote:

> Thanks for bumping this, Dongjin.  ZStd is a good compression codec and I
> hope we can get this support in soon!
>
> I would say we can just bump the API version to indicate that ZStd support
> is expected in new clients.  We probably need some way of indicating to the
> older clients that they can't consume the partitions, as well.  Perhaps we
> can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?
>
> The license thing seems straightforward -- it's just a matter of adding
> the text to the right files as per ASF guidelines.
>
> With regard to the dictionary support, do we really need that in the first
> version?  Hopefully message batches are big enough that this isn't needed.
> Some benchmarks might help here.
>
> best,
> Colin
>
>
> On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> > As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 110%3A+Add+Codec+for+ZStandard+Compression>
> > .
> >
> > For newcomers, Here is some summary of the history: KIP-110 was
> originally
> > worked for the issue KAFKA-4514 but, it lacked benchmark results to get
> the
> > agreement of the community. Later, Ivan Babrou and some other users who
> > adopted the patch provided their excellent performance report which is
> now
> > included in the KIP, but it postponed again because of the community was
> > busy for 2.0.0 release. It is why I now reboot this issue.
> >
> > The following is the current status of the feature: You can check the
> > current draft implementation here
> > <https://github.com/apache/kafka/pull/2267>. It is based on zstd 1.3.5
> and
> > periodically rebased onto the latest trunk[^1].
> >
> > The issues that should be addressed is like following:
> >
> > *1. Backward Compatibility*
> >
> > To support old consumers, we need to take a strategy to handle the old
> > consumers. Current candidates are:
> >
> > - Bump API version
> > - Leave unchanged: let the old clients fail.
> > - Improve the error messages:
> >
> > *2. Dictionary Support*
> >
> > To support zstd's dictionary feature in the future (if needed), we need
> to
> > sketch how it should be and leave some room for it. As of now, there has
> > been no discussion on this topic yet.
> >
> > *3. License*
> >
> > To use this feature, we need to add license of zstd and zstd-jni to the
> > project. (Thanks to Viktor Somogyi for raising this issue!) It seems like
> > what Apache Spark did would be a good example but there has been no
> > discussion yet.
> >
> > You can find the details of the above issues in the KIP document. Please
> > have a look when you are free, and give me feedback. All kinds of
> > participating are welcome.
> >
> > Best,
> > Dongjin
> >
> > [^1]: At the time of writing, commit 6b4fb8152.
> >
> > On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Sorry for the late reply.
> > >
> > > In short, I could not submit the updated KIP by the feature freeze
> > > deadline of 2.0.0. For this reason, it will not be included in the
> 2.0.0
> > > release and all discussion for this issue were postponed after the
> release
> > > of 2.0.0.
> > >
> > > I have been updating the PR following recent updates. Just now, I
> rebased
> > > it against the latest trunk and updated the zstd version into 1.3.5.
> If you
> > > need some request, don't hesitate to notify me. (But not this thread -
> just
> > > send me the message directly.)
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org> wrote:
> > >
> > >> I there any update on this.  The performance improvements are quite
> > >> impressive and I really would like to stop forking kafka just to get
> this
> > >> in.
> > >>
> > >> Thanks,
> > >>
> > >> Bobby
> > >>
> > >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >>
> > >> > Ismael,
> > >> >
> > >> > Oh, I forgot all of you are on working frenzy for 2.0! No problem,
> take
> > >> > your time. I am also working at another issue now. Thank you for
> > >> letting me
> > >> > know.
> > >> >
> > >> > Best,
> > >> > Dongjin
> > >> >
> > >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com>
> wrote:
> > >> >
> > >> > > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0.
> This
> > >> KIP
> > >> > > seems like a great candidate for 2.1.0 and hopefully there will be
> > >> more
> > >> > of
> > >> > > a discussion next week. :)
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org>
> wrote:
> > >> > >
> > >> > > > Hello. I just updated my draft implementation:
> > >> > > >
> > >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > >> > > > 2. Apply ZStd 1.3.4
> > >> > > >
> > >> > > > You can check out the implementation from here
> > >> > > > <https://github.com/apache/kafka/pull/2267>. If you experience
> any
> > >> > > problem
> > >> > > > running it, don't hesitate to give me a mention.
> > >> > > >
> > >> > > > Best,
> > >> > > > Dongjin
> > >> > > >
> > >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <dongjin@apache.org
> >
> > >> > wrote:
> > >> > > >
> > >> > > > > Here is the short conclusion about the license problem: *We
> can
> > >> use
> > >> > > zstd
> > >> > > > > and zstd-jni without any problem, but we need to include their
> > >> > license,
> > >> > > > > e.g., BSD license.*
> > >> > > > >
> > >> > > > > Both of BSD 2 Clause License & 3 Clause License requires to
> > >> include
> > >> > the
> > >> > > > > license used, and BSD 3 Clause License requires that the name
> of
> > >> the
> > >> > > > > contributor can't be used to endorse or promote the product.
> > >> That's
> > >> > it
> > >> > > > > <
> > >> > > >
> > >> > >
> > >> >
> > >> http://www.mikestratton.net/2011/12/is-bsd-license-
> compatible-with-apache-2-0-license/
> > >> > > > >
> > >> > > > > - They are not listed in the list of prohibited licenses
> > >> > > > > <https://www.apache.org/legal/resolved.html#category-x> also.
> > >> > > > >
> > >> > > > > Here is how Spark did for it
> > >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > >> > > > >
> > >> > > > > - They made a directory dedicated to the dependency license
> files
> > >> > > > > <https://github.com/apache/spark/tree/master/licenses> and
> added
> > >> > > > licenses
> > >> > > > > for Zstd
> > >> > > > > <
> > >> > https://github.com/apache/spark/blob/master/licenses/
> LICENSE-zstd.txt
> > >> > > >
> > >> > > > &
> > >> > > > > Zstd-jni
> > >> > > > > <
> > >> > > >
> > >> > >
> > >> >
> > >> https://github.com/apache/spark/blob/master/licenses/
> LICENSE-zstd-jni.txt
> > >> >
> > >> > > > > .
> > >> > > > > - Added a link to the original license files in LICENSE.
> > >> > > > > <https://github.com/apache/spark/pull/18805/files>
> > >> > > > >
> > >> > > > > If needed, I can make a similar update.
> > >> > > > >
> > >> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Dongjin
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <
> dongjin@apache.org>
> > >> > > wrote:
> > >> > > > >
> > >> > > > >> I greatly appreciate your comprehensive reasoning. so: +1
> for b
> > >> > until
> > >> > > > now.
> > >> > > > >>
> > >> > > > >> For the license issues, I will have a check on how the over
> > >> projects
> > >> > > are
> > >> > > > >> doing and share the results.
> > >> > > > >>
> > >> > > > >> Best,
> > >> > > > >> Dongjin
> > >> > > > >>
> > >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > >> > > > viktorsomogyi@gmail.com>
> > >> > > > >> wrote:
> > >> > > > >>
> > >> > > > >>> Hi Dongjin,
> > >> > > > >>>
> > >> > > > >>> A couple of comments:
> > >> > > > >>> I would vote for option b. in the "backward compatibility"
> > >> section.
> > >> > > My
> > >> > > > >>> reasoning for this is that users upgrading to a zstd
> compatible
> > >> > > version
> > >> > > > >>> won't start to use it automatically, so manual
> reconfiguration
> > >> is
> > >> > > > >>> required.
> > >> > > > >>> Therefore an upgrade won't mess up the cluster. If not all
> the
> > >> > > clients
> > >> > > > >>> are
> > >> > > > >>> upgraded but just some of them and they'd start to use zstd
> > >> then it
> > >> > > > would
> > >> > > > >>> cause errors in the cluster. I'd like to presume though that
> > >> this
> > >> > is
> > >> > > a
> > >> > > > >>> very
> > >> > > > >>> obvious failure case and nobody should be surprised if it
> didn't
> > >> > > work.
> > >> > > > >>> I wouldn't choose a. as I think we should bump the fetch and
> > >> > produce
> > >> > > > >>> requests if it's a change in the message format. Moreover if
> > >> some
> > >> > of
> > >> > > > the
> > >> > > > >>> producers and the brokers are upgraded but some of the
> consumers
> > >> > are
> > >> > > > not,
> > >> > > > >>> then we wouldn't prevent the error when the old consumer
> tries
> > >> to
> > >> > > > consume
> > >> > > > >>> the zstd compressed messages.
> > >> > > > >>> I wouldn't choose c. either as I think binding the
> compression
> > >> type
> > >> > > to
> > >> > > > an
> > >> > > > >>> API is not so obvious from the developer's perspective.
> > >> > > > >>>
> > >> > > > >>> I would also prefer to use the existing binding, however we
> must
> > >> > > > respect
> > >> > > > >>> the licenses:
> > >> > > > >>> "The code for these JNI bindings is licenced under 2-clause
> BSD
> > >> > > > license.
> > >> > > > >>> The native Zstd library is licensed under 3-clause BSD
> license
> > >> and
> > >> > > > GPL2"
> > >> > > > >>> Based on the FAQ page
> > >> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> > >> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache
> license
> > >> is
> > >> > not
> > >> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD
> license
> > >> and
> > >> > > > GPL2"
> > >> > > > >>> is really not an AND but an OR in this case, but I'm no
> lawyer,
> > >> > just
> > >> > > > >>> wanted
> > >> > > > >>> to make the point that we should watch out for licenses. :)
> > >> > > > >>>
> > >> > > > >>> Regards,
> > >> > > > >>> Viktor
> > >> > > > >>>
> > >> > > > >>>
> > >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <
> ibobrik@gmail.com>
> > >> > > wrote:
> > >> > > > >>>
> > >> > > > >>> > Hello,
> > >> > > > >>> >
> > >> > > > >>> > This is Ivan and I still very much support the fact that
> zstd
> > >> > > > >>> compression
> > >> > > > >>> > should be included out of the box.
> > >> > > > >>> >
> > >> > > > >>> > Please think about the environment, you can save quite a
> lot
> > >> of
> > >> > > > >>> hardware
> > >> > > > >>> > with it.
> > >> > > > >>> >
> > >> > > > >>> > Thank you.
> > >> > > > >>> >
> > >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <
> dongjin@apache.org>
> > >> > > wrote:
> > >> > > > >>> >
> > >> > > > >>> > > Since there are no responses for a week, I decided to
> > >> > reinitiate
> > >> > > > the
> > >> > > > >>> > > discussion thread.
> > >> > > > >>> > >
> > >> > > > >>> > >
> > >> > > > >>> > >
> > >> > > > >>> >
> > >> > > > >>>
> > >> > > >
> > >> > >
> > >> >
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 110%3A+Add+Codec+for+ZStandard+Compression
> > >> > > > >>> > >
> > >> > > > >>> > > This KIP is about to introduce ZStandard Compression
> into
> > >> > Apache
> > >> > > > >>> Kafka.
> > >> > > > >>> > > The reason why it is posted again has a story: It was
> > >> > originally
> > >> > > > >>> posted
> > >> > > > >>> > to
> > >> > > > >>> > > the dev mailing list more than one year ago but since it
> > >> has no
> > >> > > > >>> > performance
> > >> > > > >>> > > report included, it was postponed later. But Some people
> > >> > > (including
> > >> > > > >>> Ivan)
> > >> > > > >>> > > reported excellent performance report with the draft PR,
> > >> this
> > >> > > work
> > >> > > > >>> is now
> > >> > > > >>> > > reactivated.
> > >> > > > >>> > >
> > >> > > > >>> > > The updated KIP document includes some expected
> problems and
> > >> > > their
> > >> > > > >>> > > candidate alternatives. Please have a look when you are
> > >> free,
> > >> > and
> > >> > > > >>> give
> > >> > > > >>> > me a
> > >> > > > >>> > > feedback. All kinds of participating are welcome.
> > >> > > > >>> > >
> > >> > > > >>> > > Best,
> > >> > > > >>> > > Dongjin
> > >> > > > >>> > >
> > >> > > > >>> > > --
> > >> > > > >>> > > *Dongjin Lee*
> > >> > > > >>> > >
> > >> > > > >>> > > *A hitchhiker in the mathematical world.*
> > >> > > > >>> > >
> > >> > > > >>> > > *github:  <http://goog_969573159/>github
> .com/dongjinleekr
> > >> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > >> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >> > > > >>> > www.slideshare.net/dongjinleekr
> > >> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > >> > > > >>> > >
> > >> > > > >>> >
> > >> > > > >>>
> > >> > > > >> --
> > >> > > > >> *Dongjin Lee*
> > >> > > > >>
> > >> > > > >> *A hitchhiker in the mathematical world.*
> > >> > > > >>
> > >> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > >> <http://github.com/dongjinleekr>linkedin:
> > >> > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >> > > > www.slideshare.net/dongjinleekr
> > >> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > *Dongjin Lee*
> > >> > > > >
> > >> > > > > *A hitchhiker in the mathematical world.*
> > >> > > > >
> > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > > <http://github.com/dongjinleekr>linkedin:
> > >> > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >> > > > www.slideshare.net/dongjinleekr
> > >> > > > > <http://www.slideshare.net/dongjinleekr>*
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Dongjin Lee*
> > >> > > >
> > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > >
> > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > <http://github.com/dongjinleekr>linkedin:
> > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >> > > > www.slideshare.net/dongjinleekr
> > >> > > > <http://www.slideshare.net/dongjinleekr>*
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >> --
> > >> *Dongjin Lee*
> > >>
> > >> *A hitchhiker in the mathematical world.*
> > >>
> > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> dongjinleekr
> > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> > >> <http://www.slideshare.net/dongjinleekr>*
> > >>
> > >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/
> dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Colin McCabe <cm...@apache.org>.
Thanks for bumping this, Dongjin.  ZStd is a good compression codec and I hope we can get this support in soon!

I would say we can just bump the API version to indicate that ZStd support is expected in new clients.  We probably need some way of indicating to the older clients that they can't consume the partitions, as well.  Perhaps we can use the UNSUPPORTED_FOR_MESSAGE_FORMAT error?

The license thing seems straightforward -- it's just a matter of adding the text to the right files as per ASF guidelines.

With regard to the dictionary support, do we really need that in the first version?  Hopefully message batches are big enough that this isn't needed.  Some benchmarks might help here.

best,
Colin


On Tue, Aug 7, 2018, at 08:02, Dongjin Lee wrote:
> As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression>
> .
> 
> For newcomers, Here is some summary of the history: KIP-110 was originally
> worked for the issue KAFKA-4514 but, it lacked benchmark results to get the
> agreement of the community. Later, Ivan Babrou and some other users who
> adopted the patch provided their excellent performance report which is now
> included in the KIP, but it postponed again because of the community was
> busy for 2.0.0 release. It is why I now reboot this issue.
> 
> The following is the current status of the feature: You can check the
> current draft implementation here
> <https://github.com/apache/kafka/pull/2267>. It is based on zstd 1.3.5 and
> periodically rebased onto the latest trunk[^1].
> 
> The issues that should be addressed is like following:
> 
> *1. Backward Compatibility*
> 
> To support old consumers, we need to take a strategy to handle the old
> consumers. Current candidates are:
> 
> - Bump API version
> - Leave unchanged: let the old clients fail.
> - Improve the error messages:
> 
> *2. Dictionary Support*
> 
> To support zstd's dictionary feature in the future (if needed), we need to
> sketch how it should be and leave some room for it. As of now, there has
> been no discussion on this topic yet.
> 
> *3. License*
> 
> To use this feature, we need to add license of zstd and zstd-jni to the
> project. (Thanks to Viktor Somogyi for raising this issue!) It seems like
> what Apache Spark did would be a good example but there has been no
> discussion yet.
> 
> You can find the details of the above issues in the KIP document. Please
> have a look when you are free, and give me feedback. All kinds of
> participating are welcome.
> 
> Best,
> Dongjin
> 
> [^1]: At the time of writing, commit 6b4fb8152.
> 
> On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org> wrote:
> 
> > Sorry for the late reply.
> >
> > In short, I could not submit the updated KIP by the feature freeze
> > deadline of 2.0.0. For this reason, it will not be included in the 2.0.0
> > release and all discussion for this issue were postponed after the release
> > of 2.0.0.
> >
> > I have been updating the PR following recent updates. Just now, I rebased
> > it against the latest trunk and updated the zstd version into 1.3.5. If you
> > need some request, don't hesitate to notify me. (But not this thread - just
> > send me the message directly.)
> >
> > Best,
> > Dongjin
> >
> > On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org> wrote:
> >
> >> I there any update on this.  The performance improvements are quite
> >> impressive and I really would like to stop forking kafka just to get this
> >> in.
> >>
> >> Thanks,
> >>
> >> Bobby
> >>
> >> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org> wrote:
> >>
> >> > Ismael,
> >> >
> >> > Oh, I forgot all of you are on working frenzy for 2.0! No problem, take
> >> > your time. I am also working at another issue now. Thank you for
> >> letting me
> >> > know.
> >> >
> >> > Best,
> >> > Dongjin
> >> >
> >> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com> wrote:
> >> >
> >> > > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This
> >> KIP
> >> > > seems like a great candidate for 2.1.0 and hopefully there will be
> >> more
> >> > of
> >> > > a discussion next week. :)
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:
> >> > >
> >> > > > Hello. I just updated my draft implementation:
> >> > > >
> >> > > > 1. Rebased to latest trunk (commit 5145d6b)
> >> > > > 2. Apply ZStd 1.3.4
> >> > > >
> >> > > > You can check out the implementation from here
> >> > > > <https://github.com/apache/kafka/pull/2267>. If you experience any
> >> > > problem
> >> > > > running it, don't hesitate to give me a mention.
> >> > > >
> >> > > > Best,
> >> > > > Dongjin
> >> > > >
> >> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org>
> >> > wrote:
> >> > > >
> >> > > > > Here is the short conclusion about the license problem: *We can
> >> use
> >> > > zstd
> >> > > > > and zstd-jni without any problem, but we need to include their
> >> > license,
> >> > > > > e.g., BSD license.*
> >> > > > >
> >> > > > > Both of BSD 2 Clause License & 3 Clause License requires to
> >> include
> >> > the
> >> > > > > license used, and BSD 3 Clause License requires that the name of
> >> the
> >> > > > > contributor can't be used to endorse or promote the product.
> >> That's
> >> > it
> >> > > > > <
> >> > > >
> >> > >
> >> >
> >> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> >> > > > >
> >> > > > > - They are not listed in the list of prohibited licenses
> >> > > > > <https://www.apache.org/legal/resolved.html#category-x> also.
> >> > > > >
> >> > > > > Here is how Spark did for it
> >> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> >> > > > >
> >> > > > > - They made a directory dedicated to the dependency license files
> >> > > > > <https://github.com/apache/spark/tree/master/licenses> and added
> >> > > > licenses
> >> > > > > for Zstd
> >> > > > > <
> >> > https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
> >> > > >
> >> > > > &
> >> > > > > Zstd-jni
> >> > > > > <
> >> > > >
> >> > >
> >> >
> >> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt
> >> >
> >> > > > > .
> >> > > > > - Added a link to the original license files in LICENSE.
> >> > > > > <https://github.com/apache/spark/pull/18805/files>
> >> > > > >
> >> > > > > If needed, I can make a similar update.
> >> > > > >
> >> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
> >> > > > >
> >> > > > > Best,
> >> > > > > Dongjin
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org>
> >> > > wrote:
> >> > > > >
> >> > > > >> I greatly appreciate your comprehensive reasoning. so: +1 for b
> >> > until
> >> > > > now.
> >> > > > >>
> >> > > > >> For the license issues, I will have a check on how the over
> >> projects
> >> > > are
> >> > > > >> doing and share the results.
> >> > > > >>
> >> > > > >> Best,
> >> > > > >> Dongjin
> >> > > > >>
> >> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> >> > > > viktorsomogyi@gmail.com>
> >> > > > >> wrote:
> >> > > > >>
> >> > > > >>> Hi Dongjin,
> >> > > > >>>
> >> > > > >>> A couple of comments:
> >> > > > >>> I would vote for option b. in the "backward compatibility"
> >> section.
> >> > > My
> >> > > > >>> reasoning for this is that users upgrading to a zstd compatible
> >> > > version
> >> > > > >>> won't start to use it automatically, so manual reconfiguration
> >> is
> >> > > > >>> required.
> >> > > > >>> Therefore an upgrade won't mess up the cluster. If not all the
> >> > > clients
> >> > > > >>> are
> >> > > > >>> upgraded but just some of them and they'd start to use zstd
> >> then it
> >> > > > would
> >> > > > >>> cause errors in the cluster. I'd like to presume though that
> >> this
> >> > is
> >> > > a
> >> > > > >>> very
> >> > > > >>> obvious failure case and nobody should be surprised if it didn't
> >> > > work.
> >> > > > >>> I wouldn't choose a. as I think we should bump the fetch and
> >> > produce
> >> > > > >>> requests if it's a change in the message format. Moreover if
> >> some
> >> > of
> >> > > > the
> >> > > > >>> producers and the brokers are upgraded but some of the consumers
> >> > are
> >> > > > not,
> >> > > > >>> then we wouldn't prevent the error when the old consumer tries
> >> to
> >> > > > consume
> >> > > > >>> the zstd compressed messages.
> >> > > > >>> I wouldn't choose c. either as I think binding the compression
> >> type
> >> > > to
> >> > > > an
> >> > > > >>> API is not so obvious from the developer's perspective.
> >> > > > >>>
> >> > > > >>> I would also prefer to use the existing binding, however we must
> >> > > > respect
> >> > > > >>> the licenses:
> >> > > > >>> "The code for these JNI bindings is licenced under 2-clause BSD
> >> > > > license.
> >> > > > >>> The native Zstd library is licensed under 3-clause BSD license
> >> and
> >> > > > GPL2"
> >> > > > >>> Based on the FAQ page
> >> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> >> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache license
> >> is
> >> > not
> >> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license
> >> and
> >> > > > GPL2"
> >> > > > >>> is really not an AND but an OR in this case, but I'm no lawyer,
> >> > just
> >> > > > >>> wanted
> >> > > > >>> to make the point that we should watch out for licenses. :)
> >> > > > >>>
> >> > > > >>> Regards,
> >> > > > >>> Viktor
> >> > > > >>>
> >> > > > >>>
> >> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com>
> >> > > wrote:
> >> > > > >>>
> >> > > > >>> > Hello,
> >> > > > >>> >
> >> > > > >>> > This is Ivan and I still very much support the fact that zstd
> >> > > > >>> compression
> >> > > > >>> > should be included out of the box.
> >> > > > >>> >
> >> > > > >>> > Please think about the environment, you can save quite a lot
> >> of
> >> > > > >>> hardware
> >> > > > >>> > with it.
> >> > > > >>> >
> >> > > > >>> > Thank you.
> >> > > > >>> >
> >> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org>
> >> > > wrote:
> >> > > > >>> >
> >> > > > >>> > > Since there are no responses for a week, I decided to
> >> > reinitiate
> >> > > > the
> >> > > > >>> > > discussion thread.
> >> > > > >>> > >
> >> > > > >>> > >
> >> > > > >>> > >
> >> > > > >>> >
> >> > > > >>>
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> >> > > > >>> > >
> >> > > > >>> > > This KIP is about to introduce ZStandard Compression into
> >> > Apache
> >> > > > >>> Kafka.
> >> > > > >>> > > The reason why it is posted again has a story: It was
> >> > originally
> >> > > > >>> posted
> >> > > > >>> > to
> >> > > > >>> > > the dev mailing list more than one year ago but since it
> >> has no
> >> > > > >>> > performance
> >> > > > >>> > > report included, it was postponed later. But Some people
> >> > > (including
> >> > > > >>> Ivan)
> >> > > > >>> > > reported excellent performance report with the draft PR,
> >> this
> >> > > work
> >> > > > >>> is now
> >> > > > >>> > > reactivated.
> >> > > > >>> > >
> >> > > > >>> > > The updated KIP document includes some expected problems and
> >> > > their
> >> > > > >>> > > candidate alternatives. Please have a look when you are
> >> free,
> >> > and
> >> > > > >>> give
> >> > > > >>> > me a
> >> > > > >>> > > feedback. All kinds of participating are welcome.
> >> > > > >>> > >
> >> > > > >>> > > Best,
> >> > > > >>> > > Dongjin
> >> > > > >>> > >
> >> > > > >>> > > --
> >> > > > >>> > > *Dongjin Lee*
> >> > > > >>> > >
> >> > > > >>> > > *A hitchhiker in the mathematical world.*
> >> > > > >>> > >
> >> > > > >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> >> > > > >>> > kr.linkedin.com/in/dongjinleekr
> >> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >> > > > >>> > www.slideshare.net/dongjinleekr
> >> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> >> > > > >>> > >
> >> > > > >>> >
> >> > > > >>>
> >> > > > >> --
> >> > > > >> *Dongjin Lee*
> >> > > > >>
> >> > > > >> *A hitchhiker in the mathematical world.*
> >> > > > >>
> >> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > >> <http://github.com/dongjinleekr>linkedin:
> >> > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >> > > > www.slideshare.net/dongjinleekr
> >> > > > >> <http://www.slideshare.net/dongjinleekr>*
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > *Dongjin Lee*
> >> > > > >
> >> > > > > *A hitchhiker in the mathematical world.*
> >> > > > >
> >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > > <http://github.com/dongjinleekr>linkedin:
> >> > > > kr.linkedin.com/in/dongjinleekr
> >> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >> > > > www.slideshare.net/dongjinleekr
> >> > > > > <http://www.slideshare.net/dongjinleekr>*
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Dongjin Lee*
> >> > > >
> >> > > > *A hitchhiker in the mathematical world.*
> >> > > >
> >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > <http://github.com/dongjinleekr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >> > > > www.slideshare.net/dongjinleekr
> >> > > > <http://www.slideshare.net/dongjinleekr>*
> >> > > >
> >> > >
> >> >
> >>
> >> --
> >> *Dongjin Lee*
> >>
> >> *A hitchhiker in the mathematical world.*
> >>
> >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> >> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> >> <http://www.slideshare.net/dongjinleekr>*
> >>
> >
> 
> -- 
> *Dongjin Lee*
> 
> *A hitchhiker in the mathematical world.*
> 
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
As Kafka 2.0.0 was released, let's reboot this issue, KIP-110
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression>
.

For newcomers, here is a short summary of the history: KIP-110 originally
addressed the issue KAFKA-4514, but it lacked the benchmark results needed to
get the agreement of the community. Later, Ivan Babrou and some other users
who adopted the patch provided excellent performance reports, which are now
included in the KIP, but it was postponed again because the community was
busy with the 2.0.0 release. That is why I am now rebooting this issue.

The following is the current status of the feature: You can check the
current draft implementation here
<https://github.com/apache/kafka/pull/2267>. It is based on zstd 1.3.5 and
periodically rebased onto the latest trunk[^1].

The issues that should be addressed are the following:

*1. Backward Compatibility*

To support old consumers, we need a strategy for handling them. The current
candidates are:

- Bump the API version.
- Leave it unchanged: let the old clients fail.
- Improve the error messages.

*2. Dictionary Support*

To support zstd's dictionary feature in the future (if needed), we need to
sketch how it should work and leave some room for it. As of now, there has
been no discussion on this topic.
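
To make the discussion concrete, here is a minimal sketch of what the codec
amounts to with the zstd-jni binding discussed in this thread, using its
one-shot helpers (Kafka itself wires compression through streams instead); a
dictionary mode would essentially thread one extra, pre-trained dictionary
argument through calls like these:

    // Minimal illustration using zstd-jni's one-shot compress/decompress helpers.
    import com.github.luben.zstd.Zstd;

    public class ZstdRoundTrip {
        public static void main(String[] args) {
            byte[] original = "hello, zstd".getBytes();
            byte[] compressed = Zstd.compress(original, 3);            // compression level 3
            byte[] restored = Zstd.decompress(compressed, original.length);
            System.out.println(new String(restored));                  // prints "hello, zstd"
        }
    }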

*3. License*

To use this feature, we need to add the licenses of zstd and zstd-jni to the
project. (Thanks to Viktor Somogyi for raising this issue!) What Apache Spark
did seems like a good example, but there has been no discussion yet.

You can find the details of the above issues in the KIP document. Please
have a look when you are free, and give me feedback. All kinds of
participation are welcome.

Best,
Dongjin

[^1]: At the time of writing, commit 6b4fb8152.

On Sat, Jul 14, 2018 at 10:45 PM Dongjin Lee <do...@apache.org> wrote:

> Sorry for the late reply.
>
> In short, I could not submit the updated KIP by the feature freeze
> deadline of 2.0.0. For this reason, it will not be included in the 2.0.0
> release and all discussion for this issue were postponed after the release
> of 2.0.0.
>
> I have been updating the PR following recent updates. Just now, I rebased
> it against the latest trunk and updated the zstd version into 1.3.5. If you
> need some request, don't hesitate to notify me. (But not this thread - just
> send me the message directly.)
>
> Best,
> Dongjin
>
> On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org> wrote:
>
>> I there any update on this.  The performance improvements are quite
>> impressive and I really would like to stop forking kafka just to get this
>> in.
>>
>> Thanks,
>>
>> Bobby
>>
>> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org> wrote:
>>
>> > Ismael,
>> >
>> > Oh, I forgot all of you are on working frenzy for 2.0! No problem, take
>> > your time. I am also working at another issue now. Thank you for
>> letting me
>> > know.
>> >
>> > Best,
>> > Dongjin
>> >
>> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com> wrote:
>> >
>> > > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This
>> KIP
>> > > seems like a great candidate for 2.1.0 and hopefully there will be
>> more
>> > of
>> > > a discussion next week. :)
>> > >
>> > > Ismael
>> > >
>> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:
>> > >
>> > > > Hello. I just updated my draft implementation:
>> > > >
>> > > > 1. Rebased to latest trunk (commit 5145d6b)
>> > > > 2. Apply ZStd 1.3.4
>> > > >
>> > > > You can check out the implementation from here
>> > > > <https://github.com/apache/kafka/pull/2267>. If you experience any
>> > > problem
>> > > > running it, don't hesitate to give me a mention.
>> > > >
>> > > > Best,
>> > > > Dongjin
>> > > >
>> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org>
>> > wrote:
>> > > >
>> > > > > Here is the short conclusion about the license problem: *We can
>> use
>> > > zstd
>> > > > > and zstd-jni without any problem, but we need to include their
>> > license,
>> > > > > e.g., BSD license.*
>> > > > >
>> > > > > Both of BSD 2 Clause License & 3 Clause License requires to
>> include
>> > the
>> > > > > license used, and BSD 3 Clause License requires that the name of
>> the
>> > > > > contributor can't be used to endorse or promote the product.
>> That's
>> > it
>> > > > > <
>> > > >
>> > >
>> >
>> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
>> > > > >
>> > > > > - They are not listed in the list of prohibited licenses
>> > > > > <https://www.apache.org/legal/resolved.html#category-x> also.
>> > > > >
>> > > > > Here is how Spark did for it
>> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
>> > > > >
>> > > > > - They made a directory dedicated to the dependency license files
>> > > > > <https://github.com/apache/spark/tree/master/licenses> and added
>> > > > licenses
>> > > > > for Zstd
>> > > > > <
>> > https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
>> > > >
>> > > > &
>> > > > > Zstd-jni
>> > > > > <
>> > > >
>> > >
>> >
>> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt
>> >
>> > > > > .
>> > > > > - Added a link to the original license files in LICENSE.
>> > > > > <https://github.com/apache/spark/pull/18805/files>
>> > > > >
>> > > > > If needed, I can make a similar update.
>> > > > >
>> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
>> > > > >
>> > > > > Best,
>> > > > > Dongjin
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org>
>> > > wrote:
>> > > > >
>> > > > >> I greatly appreciate your comprehensive reasoning. so: +1 for b
>> > until
>> > > > now.
>> > > > >>
>> > > > >> For the license issues, I will have a check on how the over
>> projects
>> > > are
>> > > > >> doing and share the results.
>> > > > >>
>> > > > >> Best,
>> > > > >> Dongjin
>> > > > >>
>> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
>> > > > viktorsomogyi@gmail.com>
>> > > > >> wrote:
>> > > > >>
>> > > > >>> Hi Dongjin,
>> > > > >>>
>> > > > >>> A couple of comments:
>> > > > >>> I would vote for option b. in the "backward compatibility"
>> section.
>> > > My
>> > > > >>> reasoning for this is that users upgrading to a zstd compatible
>> > > version
>> > > > >>> won't start to use it automatically, so manual reconfiguration
>> is
>> > > > >>> required.
>> > > > >>> Therefore an upgrade won't mess up the cluster. If not all the
>> > > clients
>> > > > >>> are
>> > > > >>> upgraded but just some of them and they'd start to use zstd
>> then it
>> > > > would
>> > > > >>> cause errors in the cluster. I'd like to presume though that
>> this
>> > is
>> > > a
>> > > > >>> very
>> > > > >>> obvious failure case and nobody should be surprised if it didn't
>> > > work.
>> > > > >>> I wouldn't choose a. as I think we should bump the fetch and
>> > produce
>> > > > >>> requests if it's a change in the message format. Moreover if
>> some
>> > of
>> > > > the
>> > > > >>> producers and the brokers are upgraded but some of the consumers
>> > are
>> > > > not,
>> > > > >>> then we wouldn't prevent the error when the old consumer tries
>> to
>> > > > consume
>> > > > >>> the zstd compressed messages.
>> > > > >>> I wouldn't choose c. either as I think binding the compression
>> type
>> > > to
>> > > > an
>> > > > >>> API is not so obvious from the developer's perspective.
>> > > > >>>
>> > > > >>> I would also prefer to use the existing binding, however we must
>> > > > respect
>> > > > >>> the licenses:
>> > > > >>> "The code for these JNI bindings is licenced under 2-clause BSD
>> > > > license.
>> > > > >>> The native Zstd library is licensed under 3-clause BSD license
>> and
>> > > > GPL2"
>> > > > >>> Based on the FAQ page
>> > > > >>> https://www.apache.org/legal/resolved.html#category-a
>> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache license
>> is
>> > not
>> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license
>> and
>> > > > GPL2"
>> > > > >>> is really not an AND but an OR in this case, but I'm no lawyer,
>> > just
>> > > > >>> wanted
>> > > > >>> to make the point that we should watch out for licenses. :)
>> > > > >>>
>> > > > >>> Regards,
>> > > > >>> Viktor
>> > > > >>>
>> > > > >>>
>> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com>
>> > > wrote:
>> > > > >>>
>> > > > >>> > Hello,
>> > > > >>> >
>> > > > >>> > This is Ivan and I still very much support the fact that zstd
>> > > > >>> compression
>> > > > >>> > should be included out of the box.
>> > > > >>> >
>> > > > >>> > Please think about the environment, you can save quite a lot
>> of
>> > > > >>> hardware
>> > > > >>> > with it.
>> > > > >>> >
>> > > > >>> > Thank you.
>> > > > >>> >
>> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org>
>> > > wrote:
>> > > > >>> >
>> > > > >>> > > Since there are no responses for a week, I decided to
>> > reinitiate
>> > > > the
>> > > > >>> > > discussion thread.
>> > > > >>> > >
>> > > > >>> > >
>> > > > >>> > >
>> > > > >>> >
>> > > > >>>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>> > > > >>> > >
>> > > > >>> > > This KIP is about to introduce ZStandard Compression into
>> > Apache
>> > > > >>> Kafka.
>> > > > >>> > > The reason why it is posted again has a story: It was
>> > originally
>> > > > >>> posted
>> > > > >>> > to
>> > > > >>> > > the dev mailing list more than one year ago but since it
>> has no
>> > > > >>> > performance
>> > > > >>> > > report included, it was postponed later. But Some people
>> > > (including
>> > > > >>> Ivan)
>> > > > >>> > > reported excellent performance report with the draft PR,
>> this
>> > > work
>> > > > >>> is now
>> > > > >>> > > reactivated.
>> > > > >>> > >
>> > > > >>> > > The updated KIP document includes some expected problems and
>> > > their
>> > > > >>> > > candidate alternatives. Please have a look when you are
>> free,
>> > and
>> > > > >>> give
>> > > > >>> > me a
>> > > > >>> > > feedback. All kinds of participating are welcome.
>> > > > >>> > >
>> > > > >>> > > Best,
>> > > > >>> > > Dongjin
>> > > > >>> > >
>> > > > >>> > > --
>> > > > >>> > > *Dongjin Lee*
>> > > > >>> > >
>> > > > >>> > > *A hitchhiker in the mathematical world.*
>> > > > >>> > >
>> > > > >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
>> > > > >>> > kr.linkedin.com/in/dongjinleekr
>> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > >>> > www.slideshare.net/dongjinleekr
>> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
>> > > > >>> > >
>> > > > >>> >
>> > > > >>>
>> > > > >> --
>> > > > >> *Dongjin Lee*
>> > > > >>
>> > > > >> *A hitchhiker in the mathematical world.*
>> > > > >>
>> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >> <http://github.com/dongjinleekr>linkedin:
>> > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > www.slideshare.net/dongjinleekr
>> > > > >> <http://www.slideshare.net/dongjinleekr>*
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > *Dongjin Lee*
>> > > > >
>> > > > > *A hitchhiker in the mathematical world.*
>> > > > >
>> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > kr.linkedin.com/in/dongjinleekr
>> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > www.slideshare.net/dongjinleekr
>> > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > *Dongjin Lee*
>> > > >
>> > > > *A hitchhiker in the mathematical world.*
>> > > >
>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > <http://github.com/dongjinleekr>linkedin:
>> > > kr.linkedin.com/in/dongjinleekr
>> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > www.slideshare.net/dongjinleekr
>> > > > <http://www.slideshare.net/dongjinleekr>*
>> > > >
>> > >
>> >
>>
>> --
>> *Dongjin Lee*
>>
>> *A hitchhiker in the mathematical world.*
>>
>> *github:  <http://goog_969573159/>github.com/dongjinleekr
>> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
>> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
>> <http://www.slideshare.net/dongjinleekr>*
>>
>

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Sorry for the late reply.

In short, I could not submit the updated KIP by the 2.0.0 feature freeze
deadline. For this reason, it will not be included in the 2.0.0 release, and
all discussion of this issue has been postponed until after 2.0.0 ships.

I have been keeping the PR up to date with recent changes. Just now, I rebased
it against the latest trunk and updated the zstd version to 1.3.5. If you
have any requests, don't hesitate to notify me. (But not in this thread - just
send me a message directly.)
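
In case it helps anyone trying out the PR: the opt-in is just a producer
config change. Below is a minimal sketch - the "zstd" config value is what
this KIP proposes, while the broker address, topic and class names are made
up for illustration; see the PR for the actual code.

// Minimal smoke test for zstd compression against a broker built from the PR.
// Sketch only: "zstd" as a compression.type value assumes this KIP's proposal;
// the broker address and topic name are placeholders.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ZstdSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("compression.type", "zstd");  // the explicit opt-in
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // One synchronous send is enough to exercise the codec end to end.
            producer.send(new ProducerRecord<>("zstd-test", "key", "value")).get();
        }
    }
}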

Best,
Dongjin

On Tue, Jul 10, 2018 at 11:57 PM Bobby Evans <bo...@apache.org> wrote:

> I there any update on this.  The performance improvements are quite
> impressive and I really would like to stop forking kafka just to get this
> in.
>
> Thanks,
>
> Bobby
>
> On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Ismael,
> >
> > Oh, I forgot all of you are on working frenzy for 2.0! No problem, take
> > your time. I am also working at another issue now. Thank you for letting
> me
> > know.
> >
> > Best,
> > Dongjin
> >
> > On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com> wrote:
> >
> > > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This
> KIP
> > > seems like a great candidate for 2.1.0 and hopefully there will be more
> > of
> > > a discussion next week. :)
> > >
> > > Ismael
> > >
> > > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:
> > >
> > > > Hello. I just updated my draft implementation:
> > > >
> > > > 1. Rebased to latest trunk (commit 5145d6b)
> > > > 2. Apply ZStd 1.3.4
> > > >
> > > > You can check out the implementation from here
> > > > <https://github.com/apache/kafka/pull/2267>. If you experience any
> > > problem
> > > > running it, don't hesitate to give me a mention.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Here is the short conclusion about the license problem: *We can use
> > > zstd
> > > > > and zstd-jni without any problem, but we need to include their
> > license,
> > > > > e.g., BSD license.*
> > > > >
> > > > > Both of BSD 2 Clause License & 3 Clause License requires to include
> > the
> > > > > license used, and BSD 3 Clause License requires that the name of
> the
> > > > > contributor can't be used to endorse or promote the product. That's
> > it
> > > > > <
> > > >
> > >
> >
> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> > > > >
> > > > > - They are not listed in the list of prohibited licenses
> > > > > <https://www.apache.org/legal/resolved.html#category-x> also.
> > > > >
> > > > > Here is how Spark did for it
> > > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > > >
> > > > > - They made a directory dedicated to the dependency license files
> > > > > <https://github.com/apache/spark/tree/master/licenses> and added
> > > > licenses
> > > > > for Zstd
> > > > > <
> > https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
> > > >
> > > > &
> > > > > Zstd-jni
> > > > > <
> > > >
> > >
> >
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> > > > > .
> > > > > - Added a link to the original license files in LICENSE.
> > > > > <https://github.com/apache/spark/pull/18805/files>
> > > > >
> > > > > If needed, I can make a similar update.
> > > > >
> > > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > >> I greatly appreciate your comprehensive reasoning. so: +1 for b
> > until
> > > > now.
> > > > >>
> > > > >> For the license issues, I will have a check on how the over
> projects
> > > are
> > > > >> doing and share the results.
> > > > >>
> > > > >> Best,
> > > > >> Dongjin
> > > > >>
> > > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > > viktorsomogyi@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Dongjin,
> > > > >>>
> > > > >>> A couple of comments:
> > > > >>> I would vote for option b. in the "backward compatibility"
> section.
> > > My
> > > > >>> reasoning for this is that users upgrading to a zstd compatible
> > > version
> > > > >>> won't start to use it automatically, so manual reconfiguration is
> > > > >>> required.
> > > > >>> Therefore an upgrade won't mess up the cluster. If not all the
> > > clients
> > > > >>> are
> > > > >>> upgraded but just some of them and they'd start to use zstd then
> it
> > > > would
> > > > >>> cause errors in the cluster. I'd like to presume though that this
> > is
> > > a
> > > > >>> very
> > > > >>> obvious failure case and nobody should be surprised if it didn't
> > > work.
> > > > >>> I wouldn't choose a. as I think we should bump the fetch and
> > produce
> > > > >>> requests if it's a change in the message format. Moreover if some
> > of
> > > > the
> > > > >>> producers and the brokers are upgraded but some of the consumers
> > are
> > > > not,
> > > > >>> then we wouldn't prevent the error when the old consumer tries to
> > > > consume
> > > > >>> the zstd compressed messages.
> > > > >>> I wouldn't choose c. either as I think binding the compression
> type
> > > to
> > > > an
> > > > >>> API is not so obvious from the developer's perspective.
> > > > >>>
> > > > >>> I would also prefer to use the existing binding, however we must
> > > > respect
> > > > >>> the licenses:
> > > > >>> "The code for these JNI bindings is licenced under 2-clause BSD
> > > > license.
> > > > >>> The native Zstd library is licensed under 3-clause BSD license
> and
> > > > GPL2"
> > > > >>> Based on the FAQ page
> > > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > > >>> we may use 2- and 3-clause BSD licenses but the Apache license is
> > not
> > > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license
> and
> > > > GPL2"
> > > > >>> is really not an AND but an OR in this case, but I'm no lawyer,
> > just
> > > > >>> wanted
> > > > >>> to make the point that we should watch out for licenses. :)
> > > > >>>
> > > > >>> Regards,
> > > > >>> Viktor
> > > > >>>
> > > > >>>
> > > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com>
> > > wrote:
> > > > >>>
> > > > >>> > Hello,
> > > > >>> >
> > > > >>> > This is Ivan and I still very much support the fact that zstd
> > > > >>> compression
> > > > >>> > should be included out of the box.
> > > > >>> >
> > > > >>> > Please think about the environment, you can save quite a lot of
> > > > >>> hardware
> > > > >>> > with it.
> > > > >>> >
> > > > >>> > Thank you.
> > > > >>> >
> > > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >>> >
> > > > >>> > > Since there are no responses for a week, I decided to
> > reinitiate
> > > > the
> > > > >>> > > discussion thread.
> > > > >>> > >
> > > > >>> > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > > > >>> > >
> > > > >>> > > This KIP is about to introduce ZStandard Compression into
> > Apache
> > > > >>> Kafka.
> > > > >>> > > The reason why it is posted again has a story: It was
> > originally
> > > > >>> posted
> > > > >>> > to
> > > > >>> > > the dev mailing list more than one year ago but since it has
> no
> > > > >>> > performance
> > > > >>> > > report included, it was postponed later. But Some people
> > > (including
> > > > >>> Ivan)
> > > > >>> > > reported excellent performance report with the draft PR, this
> > > work
> > > > >>> is now
> > > > >>> > > reactivated.
> > > > >>> > >
> > > > >>> > > The updated KIP document includes some expected problems and
> > > their
> > > > >>> > > candidate alternatives. Please have a look when you are free,
> > and
> > > > >>> give
> > > > >>> > me a
> > > > >>> > > feedback. All kinds of participating are welcome.
> > > > >>> > >
> > > > >>> > > Best,
> > > > >>> > > Dongjin
> > > > >>> > >
> > > > >>> > > --
> > > > >>> > > *Dongjin Lee*
> > > > >>> > >
> > > > >>> > > *A hitchhiker in the mathematical world.*
> > > > >>> > >
> > > > >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > > > >>> > kr.linkedin.com/in/dongjinleekr
> > > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >>> > www.slideshare.net/dongjinleekr
> > > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >> --
> > > > >> *Dongjin Lee*
> > > > >>
> > > > >> *A hitchhiker in the mathematical world.*
> > > > >>
> > > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> <http://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > >> <http://www.slideshare.net/dongjinleekr>*
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > >
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <http://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> >
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Bobby Evans <bo...@apache.org>.
Is there any update on this? The performance improvements are quite
impressive, and I really would like to stop forking Kafka just to get this
in.

Thanks,

Bobby

On Wed, Jun 13, 2018 at 8:56 PM Dongjin Lee <do...@apache.org> wrote:

> Ismael,
>
> Oh, I forgot all of you are on working frenzy for 2.0! No problem, take
> your time. I am also working at another issue now. Thank you for letting me
> know.
>
> Best,
> Dongjin
>
> On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com> wrote:
>
> > Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This KIP
> > seems like a great candidate for 2.1.0 and hopefully there will be more
> of
> > a discussion next week. :)
> >
> > Ismael
> >
> > On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:
> >
> > > Hello. I just updated my draft implementation:
> > >
> > > 1. Rebased to latest trunk (commit 5145d6b)
> > > 2. Apply ZStd 1.3.4
> > >
> > > You can check out the implementation from here
> > > <https://github.com/apache/kafka/pull/2267>. If you experience any
> > problem
> > > running it, don't hesitate to give me a mention.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Here is the short conclusion about the license problem: *We can use
> > zstd
> > > > and zstd-jni without any problem, but we need to include their
> license,
> > > > e.g., BSD license.*
> > > >
> > > > Both of BSD 2 Clause License & 3 Clause License requires to include
> the
> > > > license used, and BSD 3 Clause License requires that the name of the
> > > > contributor can't be used to endorse or promote the product. That's
> it
> > > > <
> > >
> >
> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> > > >
> > > > - They are not listed in the list of prohibited licenses
> > > > <https://www.apache.org/legal/resolved.html#category-x> also.
> > > >
> > > > Here is how Spark did for it
> > > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > > >
> > > > - They made a directory dedicated to the dependency license files
> > > > <https://github.com/apache/spark/tree/master/licenses> and added
> > > licenses
> > > > for Zstd
> > > > <
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
> > >
> > > &
> > > > Zstd-jni
> > > > <
> > >
> >
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> > > > .
> > > > - Added a link to the original license files in LICENSE.
> > > > <https://github.com/apache/spark/pull/18805/files>
> > > >
> > > > If needed, I can make a similar update.
> > > >
> > > > Thanks for pointing out this problem, Viktor! Nice catch!
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > >
> > > >
> > > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > >> I greatly appreciate your comprehensive reasoning. so: +1 for b
> until
> > > now.
> > > >>
> > > >> For the license issues, I will have a check on how the over projects
> > are
> > > >> doing and share the results.
> > > >>
> > > >> Best,
> > > >> Dongjin
> > > >>
> > > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > > viktorsomogyi@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> Hi Dongjin,
> > > >>>
> > > >>> A couple of comments:
> > > >>> I would vote for option b. in the "backward compatibility" section.
> > My
> > > >>> reasoning for this is that users upgrading to a zstd compatible
> > version
> > > >>> won't start to use it automatically, so manual reconfiguration is
> > > >>> required.
> > > >>> Therefore an upgrade won't mess up the cluster. If not all the
> > clients
> > > >>> are
> > > >>> upgraded but just some of them and they'd start to use zstd then it
> > > would
> > > >>> cause errors in the cluster. I'd like to presume though that this
> is
> > a
> > > >>> very
> > > >>> obvious failure case and nobody should be surprised if it didn't
> > work.
> > > >>> I wouldn't choose a. as I think we should bump the fetch and
> produce
> > > >>> requests if it's a change in the message format. Moreover if some
> of
> > > the
> > > >>> producers and the brokers are upgraded but some of the consumers
> are
> > > not,
> > > >>> then we wouldn't prevent the error when the old consumer tries to
> > > consume
> > > >>> the zstd compressed messages.
> > > >>> I wouldn't choose c. either as I think binding the compression type
> > to
> > > an
> > > >>> API is not so obvious from the developer's perspective.
> > > >>>
> > > >>> I would also prefer to use the existing binding, however we must
> > > respect
> > > >>> the licenses:
> > > >>> "The code for these JNI bindings is licenced under 2-clause BSD
> > > license.
> > > >>> The native Zstd library is licensed under 3-clause BSD license and
> > > GPL2"
> > > >>> Based on the FAQ page
> > > >>> https://www.apache.org/legal/resolved.html#category-a
> > > >>> we may use 2- and 3-clause BSD licenses but the Apache license is
> not
> > > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license and
> > > GPL2"
> > > >>> is really not an AND but an OR in this case, but I'm no lawyer,
> just
> > > >>> wanted
> > > >>> to make the point that we should watch out for licenses. :)
> > > >>>
> > > >>> Regards,
> > > >>> Viktor
> > > >>>
> > > >>>
> > > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com>
> > wrote:
> > > >>>
> > > >>> > Hello,
> > > >>> >
> > > >>> > This is Ivan and I still very much support the fact that zstd
> > > >>> compression
> > > >>> > should be included out of the box.
> > > >>> >
> > > >>> > Please think about the environment, you can save quite a lot of
> > > >>> hardware
> > > >>> > with it.
> > > >>> >
> > > >>> > Thank you.
> > > >>> >
> > > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org>
> > wrote:
> > > >>> >
> > > >>> > > Since there are no responses for a week, I decided to
> reinitiate
> > > the
> > > >>> > > discussion thread.
> > > >>> > >
> > > >>> > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > > >>> > >
> > > >>> > > This KIP is about to introduce ZStandard Compression into
> Apache
> > > >>> Kafka.
> > > >>> > > The reason why it is posted again has a story: It was
> originally
> > > >>> posted
> > > >>> > to
> > > >>> > > the dev mailing list more than one year ago but since it has no
> > > >>> > performance
> > > >>> > > report included, it was postponed later. But Some people
> > (including
> > > >>> Ivan)
> > > >>> > > reported excellent performance report with the draft PR, this
> > work
> > > >>> is now
> > > >>> > > reactivated.
> > > >>> > >
> > > >>> > > The updated KIP document includes some expected problems and
> > their
> > > >>> > > candidate alternatives. Please have a look when you are free,
> and
> > > >>> give
> > > >>> > me a
> > > >>> > > feedback. All kinds of participating are welcome.
> > > >>> > >
> > > >>> > > Best,
> > > >>> > > Dongjin
> > > >>> > >
> > > >>> > > --
> > > >>> > > *Dongjin Lee*
> > > >>> > >
> > > >>> > > *A hitchhiker in the mathematical world.*
> > > >>> > >
> > > >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >>> > > <http://github.com/dongjinleekr>linkedin:
> > > >>> > kr.linkedin.com/in/dongjinleekr
> > > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >>> > www.slideshare.net/dongjinleekr
> > > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > > >>> > >
> > > >>> >
> > > >>>
> > > >> --
> > > >> *Dongjin Lee*
> > > >>
> > > >> *A hitchhiker in the mathematical world.*
> > > >>
> > > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > >> <http://www.slideshare.net/dongjinleekr>*
> > > >>
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Ismael,

Oh, I forgot all of you are in a working frenzy for 2.0! No problem, take
your time. I am also working on another issue now. Thank you for letting me
know.

Best,
Dongjin

On Wed, Jun 13, 2018, 11:44 PM Ismael Juma <is...@gmail.com> wrote:

> Sorry for the delay Dongjin. Everyone is busy finalising 2.0.0. This KIP
> seems like a great candidate for 2.1.0 and hopefully there will be more of
> a discussion next week. :)
>
> Ismael
>
> On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:
>
> > Hello. I just updated my draft implementation:
> >
> > 1. Rebased to latest trunk (commit 5145d6b)
> > 2. Apply ZStd 1.3.4
> >
> > You can check out the implementation from here
> > <https://github.com/apache/kafka/pull/2267>. If you experience any
> problem
> > running it, don't hesitate to give me a mention.
> >
> > Best,
> > Dongjin
> >
> > On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Here is the short conclusion about the license problem: *We can use
> zstd
> > > and zstd-jni without any problem, but we need to include their license,
> > > e.g., BSD license.*
> > >
> > > Both of BSD 2 Clause License & 3 Clause License requires to include the
> > > license used, and BSD 3 Clause License requires that the name of the
> > > contributor can't be used to endorse or promote the product. That's it
> > > <
> >
> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> > >
> > > - They are not listed in the list of prohibited licenses
> > > <https://www.apache.org/legal/resolved.html#category-x> also.
> > >
> > > Here is how Spark did for it
> > > <https://issues.apache.org/jira/browse/SPARK-19112>:
> > >
> > > - They made a directory dedicated to the dependency license files
> > > <https://github.com/apache/spark/tree/master/licenses> and added
> > licenses
> > > for Zstd
> > > <https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt
> >
> > &
> > > Zstd-jni
> > > <
> >
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> > > .
> > > - Added a link to the original license files in LICENSE.
> > > <https://github.com/apache/spark/pull/18805/files>
> > >
> > > If needed, I can make a similar update.
> > >
> > > Thanks for pointing out this problem, Viktor! Nice catch!
> > >
> > > Best,
> > > Dongjin
> > >
> > >
> > >
> > > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > >> I greatly appreciate your comprehensive reasoning. so: +1 for b until
> > now.
> > >>
> > >> For the license issues, I will have a check on how the over projects
> are
> > >> doing and share the results.
> > >>
> > >> Best,
> > >> Dongjin
> > >>
> > >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> > viktorsomogyi@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Dongjin,
> > >>>
> > >>> A couple of comments:
> > >>> I would vote for option b. in the "backward compatibility" section.
> My
> > >>> reasoning for this is that users upgrading to a zstd compatible
> version
> > >>> won't start to use it automatically, so manual reconfiguration is
> > >>> required.
> > >>> Therefore an upgrade won't mess up the cluster. If not all the
> clients
> > >>> are
> > >>> upgraded but just some of them and they'd start to use zstd then it
> > would
> > >>> cause errors in the cluster. I'd like to presume though that this is
> a
> > >>> very
> > >>> obvious failure case and nobody should be surprised if it didn't
> work.
> > >>> I wouldn't choose a. as I think we should bump the fetch and produce
> > >>> requests if it's a change in the message format. Moreover if some of
> > the
> > >>> producers and the brokers are upgraded but some of the consumers are
> > not,
> > >>> then we wouldn't prevent the error when the old consumer tries to
> > consume
> > >>> the zstd compressed messages.
> > >>> I wouldn't choose c. either as I think binding the compression type
> to
> > an
> > >>> API is not so obvious from the developer's perspective.
> > >>>
> > >>> I would also prefer to use the existing binding, however we must
> > respect
> > >>> the licenses:
> > >>> "The code for these JNI bindings is licenced under 2-clause BSD
> > license.
> > >>> The native Zstd library is licensed under 3-clause BSD license and
> > GPL2"
> > >>> Based on the FAQ page
> > >>> https://www.apache.org/legal/resolved.html#category-a
> > >>> we may use 2- and 3-clause BSD licenses but the Apache license is not
> > >>> compatible with GPL2. I'm hoping that the "3-clause BSD license and
> > GPL2"
> > >>> is really not an AND but an OR in this case, but I'm no lawyer, just
> > >>> wanted
> > >>> to make the point that we should watch out for licenses. :)
> > >>>
> > >>> Regards,
> > >>> Viktor
> > >>>
> > >>>
> > >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com>
> wrote:
> > >>>
> > >>> > Hello,
> > >>> >
> > >>> > This is Ivan and I still very much support the fact that zstd
> > >>> compression
> > >>> > should be included out of the box.
> > >>> >
> > >>> > Please think about the environment, you can save quite a lot of
> > >>> hardware
> > >>> > with it.
> > >>> >
> > >>> > Thank you.
> > >>> >
> > >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org>
> wrote:
> > >>> >
> > >>> > > Since there are no responses for a week, I decided to reinitiate
> > the
> > >>> > > discussion thread.
> > >>> > >
> > >>> > >
> > >>> > >
> > >>> >
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > >>> > >
> > >>> > > This KIP is about to introduce ZStandard Compression into Apache
> > >>> Kafka.
> > >>> > > The reason why it is posted again has a story: It was originally
> > >>> posted
> > >>> > to
> > >>> > > the dev mailing list more than one year ago but since it has no
> > >>> > performance
> > >>> > > report included, it was postponed later. But Some people
> (including
> > >>> Ivan)
> > >>> > > reported excellent performance report with the draft PR, this
> work
> > >>> is now
> > >>> > > reactivated.
> > >>> > >
> > >>> > > The updated KIP document includes some expected problems and
> their
> > >>> > > candidate alternatives. Please have a look when you are free, and
> > >>> give
> > >>> > me a
> > >>> > > feedback. All kinds of participating are welcome.
> > >>> > >
> > >>> > > Best,
> > >>> > > Dongjin
> > >>> > >
> > >>> > > --
> > >>> > > *Dongjin Lee*
> > >>> > >
> > >>> > > *A hitchhiker in the mathematical world.*
> > >>> > >
> > >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >>> > > <http://github.com/dongjinleekr>linkedin:
> > >>> > kr.linkedin.com/in/dongjinleekr
> > >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >>> > www.slideshare.net/dongjinleekr
> > >>> > > <http://www.slideshare.net/dongjinleekr>*
> > >>> > >
> > >>> >
> > >>>
> > >> --
> > >> *Dongjin Lee*
> > >>
> > >> *A hitchhiker in the mathematical world.*
> > >>
> > >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > >> <http://www.slideshare.net/dongjinleekr>*
> > >>
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ismael Juma <is...@gmail.com>.
Sorry for the delay, Dongjin. Everyone is busy finalising 2.0.0. This KIP
seems like a great candidate for 2.1.0 and hopefully there will be more of
a discussion next week. :)

Ismael

On Wed, 13 Jun 2018, 05:17 Dongjin Lee, <do...@apache.org> wrote:

> Hello. I just updated my draft implementation:
>
> 1. Rebased to latest trunk (commit 5145d6b)
> 2. Apply ZStd 1.3.4
>
> You can check out the implementation from here
> <https://github.com/apache/kafka/pull/2267>. If you experience any problem
> running it, don't hesitate to give me a mention.
>
> Best,
> Dongjin
>
> On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Here is the short conclusion about the license problem: *We can use zstd
> > and zstd-jni without any problem, but we need to include their license,
> > e.g., BSD license.*
> >
> > Both of BSD 2 Clause License & 3 Clause License requires to include the
> > license used, and BSD 3 Clause License requires that the name of the
> > contributor can't be used to endorse or promote the product. That's it
> > <
> http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/
> >
> > - They are not listed in the list of prohibited licenses
> > <https://www.apache.org/legal/resolved.html#category-x> also.
> >
> > Here is how Spark did for it
> > <https://issues.apache.org/jira/browse/SPARK-19112>:
> >
> > - They made a directory dedicated to the dependency license files
> > <https://github.com/apache/spark/tree/master/licenses> and added
> licenses
> > for Zstd
> > <https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt>
> &
> > Zstd-jni
> > <
> https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> > .
> > - Added a link to the original license files in LICENSE.
> > <https://github.com/apache/spark/pull/18805/files>
> >
> > If needed, I can make a similar update.
> >
> > Thanks for pointing out this problem, Viktor! Nice catch!
> >
> > Best,
> > Dongjin
> >
> >
> >
> > On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org> wrote:
> >
> >> I greatly appreciate your comprehensive reasoning. so: +1 for b until
> now.
> >>
> >> For the license issues, I will have a check on how the over projects are
> >> doing and share the results.
> >>
> >> Best,
> >> Dongjin
> >>
> >> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <
> viktorsomogyi@gmail.com>
> >> wrote:
> >>
> >>> Hi Dongjin,
> >>>
> >>> A couple of comments:
> >>> I would vote for option b. in the "backward compatibility" section. My
> >>> reasoning for this is that users upgrading to a zstd compatible version
> >>> won't start to use it automatically, so manual reconfiguration is
> >>> required.
> >>> Therefore an upgrade won't mess up the cluster. If not all the clients
> >>> are
> >>> upgraded but just some of them and they'd start to use zstd then it
> would
> >>> cause errors in the cluster. I'd like to presume though that this is a
> >>> very
> >>> obvious failure case and nobody should be surprised if it didn't work.
> >>> I wouldn't choose a. as I think we should bump the fetch and produce
> >>> requests if it's a change in the message format. Moreover if some of
> the
> >>> producers and the brokers are upgraded but some of the consumers are
> not,
> >>> then we wouldn't prevent the error when the old consumer tries to
> consume
> >>> the zstd compressed messages.
> >>> I wouldn't choose c. either as I think binding the compression type to
> an
> >>> API is not so obvious from the developer's perspective.
> >>>
> >>> I would also prefer to use the existing binding, however we must
> respect
> >>> the licenses:
> >>> "The code for these JNI bindings is licenced under 2-clause BSD
> license.
> >>> The native Zstd library is licensed under 3-clause BSD license and
> GPL2"
> >>> Based on the FAQ page
> >>> https://www.apache.org/legal/resolved.html#category-a
> >>> we may use 2- and 3-clause BSD licenses but the Apache license is not
> >>> compatible with GPL2. I'm hoping that the "3-clause BSD license and
> GPL2"
> >>> is really not an AND but an OR in this case, but I'm no lawyer, just
> >>> wanted
> >>> to make the point that we should watch out for licenses. :)
> >>>
> >>> Regards,
> >>> Viktor
> >>>
> >>>
> >>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com> wrote:
> >>>
> >>> > Hello,
> >>> >
> >>> > This is Ivan and I still very much support the fact that zstd
> >>> compression
> >>> > should be included out of the box.
> >>> >
> >>> > Please think about the environment, you can save quite a lot of
> >>> hardware
> >>> > with it.
> >>> >
> >>> > Thank you.
> >>> >
> >>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:
> >>> >
> >>> > > Since there are no responses for a week, I decided to reinitiate
> the
> >>> > > discussion thread.
> >>> > >
> >>> > >
> >>> > >
> >>> >
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> >>> > >
> >>> > > This KIP is about to introduce ZStandard Compression into Apache
> >>> Kafka.
> >>> > > The reason why it is posted again has a story: It was originally
> >>> posted
> >>> > to
> >>> > > the dev mailing list more than one year ago but since it has no
> >>> > performance
> >>> > > report included, it was postponed later. But Some people (including
> >>> Ivan)
> >>> > > reported excellent performance report with the draft PR, this work
> >>> is now
> >>> > > reactivated.
> >>> > >
> >>> > > The updated KIP document includes some expected problems and their
> >>> > > candidate alternatives. Please have a look when you are free, and
> >>> give
> >>> > me a
> >>> > > feedback. All kinds of participating are welcome.
> >>> > >
> >>> > > Best,
> >>> > > Dongjin
> >>> > >
> >>> > > --
> >>> > > *Dongjin Lee*
> >>> > >
> >>> > > *A hitchhiker in the mathematical world.*
> >>> > >
> >>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >>> > > <http://github.com/dongjinleekr>linkedin:
> >>> > kr.linkedin.com/in/dongjinleekr
> >>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >>> > www.slideshare.net/dongjinleekr
> >>> > > <http://www.slideshare.net/dongjinleekr>*
> >>> > >
> >>> >
> >>>
> >> --
> >> *Dongjin Lee*
> >>
> >> *A hitchhiker in the mathematical world.*
> >>
> >> *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> >> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> >> <http://www.slideshare.net/dongjinleekr>*
> >>
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Hello. I just updated my draft implementation:

1. Rebased it to the latest trunk (commit 5145d6b)
2. Applied ZStd 1.3.4

You can check out the implementation from here
<https://github.com/apache/kafka/pull/2267>. If you experience any problems
running it, don't hesitate to give me a mention.

Best,
Dongjin

On Tue, Jun 12, 2018 at 6:50 PM Dongjin Lee <do...@apache.org> wrote:

> Here is the short conclusion about the license problem: *We can use zstd
> and zstd-jni without any problem, but we need to include their license,
> e.g., BSD license.*
>
> Both of BSD 2 Clause License & 3 Clause License requires to include the
> license used, and BSD 3 Clause License requires that the name of the
> contributor can't be used to endorse or promote the product. That's it
> <http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/>
> - They are not listed in the list of prohibited licenses
> <https://www.apache.org/legal/resolved.html#category-x> also.
>
> Here is how Spark did for it
> <https://issues.apache.org/jira/browse/SPARK-19112>:
>
> - They made a directory dedicated to the dependency license files
> <https://github.com/apache/spark/tree/master/licenses> and added licenses
> for Zstd
> <https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt> &
> Zstd-jni
> <https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>
> .
> - Added a link to the original license files in LICENSE.
> <https://github.com/apache/spark/pull/18805/files>
>
> If needed, I can make a similar update.
>
> Thanks for pointing out this problem, Viktor! Nice catch!
>
> Best,
> Dongjin
>
>
>
> On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org> wrote:
>
>> I greatly appreciate your comprehensive reasoning. so: +1 for b until now.
>>
>> For the license issues, I will have a check on how the over projects are
>> doing and share the results.
>>
>> Best,
>> Dongjin
>>
>> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <vi...@gmail.com>
>> wrote:
>>
>>> Hi Dongjin,
>>>
>>> A couple of comments:
>>> I would vote for option b. in the "backward compatibility" section. My
>>> reasoning for this is that users upgrading to a zstd compatible version
>>> won't start to use it automatically, so manual reconfiguration is
>>> required.
>>> Therefore an upgrade won't mess up the cluster. If not all the clients
>>> are
>>> upgraded but just some of them and they'd start to use zstd then it would
>>> cause errors in the cluster. I'd like to presume though that this is a
>>> very
>>> obvious failure case and nobody should be surprised if it didn't work.
>>> I wouldn't choose a. as I think we should bump the fetch and produce
>>> requests if it's a change in the message format. Moreover if some of the
>>> producers and the brokers are upgraded but some of the consumers are not,
>>> then we wouldn't prevent the error when the old consumer tries to consume
>>> the zstd compressed messages.
>>> I wouldn't choose c. either as I think binding the compression type to an
>>> API is not so obvious from the developer's perspective.
>>>
>>> I would also prefer to use the existing binding, however we must respect
>>> the licenses:
>>> "The code for these JNI bindings is licenced under 2-clause BSD license.
>>> The native Zstd library is licensed under 3-clause BSD license and GPL2"
>>> Based on the FAQ page
>>> https://www.apache.org/legal/resolved.html#category-a
>>> we may use 2- and 3-clause BSD licenses but the Apache license is not
>>> compatible with GPL2. I'm hoping that the "3-clause BSD license and GPL2"
>>> is really not an AND but an OR in this case, but I'm no lawyer, just
>>> wanted
>>> to make the point that we should watch out for licenses. :)
>>>
>>> Regards,
>>> Viktor
>>>
>>>
>>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com> wrote:
>>>
>>> > Hello,
>>> >
>>> > This is Ivan and I still very much support the fact that zstd
>>> compression
>>> > should be included out of the box.
>>> >
>>> > Please think about the environment, you can save quite a lot of
>>> hardware
>>> > with it.
>>> >
>>> > Thank you.
>>> >
>>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:
>>> >
>>> > > Since there are no responses for a week, I decided to reinitiate the
>>> > > discussion thread.
>>> > >
>>> > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>>> > >
>>> > > This KIP is about to introduce ZStandard Compression into Apache
>>> Kafka.
>>> > > The reason why it is posted again has a story: It was originally
>>> posted
>>> > to
>>> > > the dev mailing list more than one year ago but since it has no
>>> > performance
>>> > > report included, it was postponed later. But Some people (including
>>> Ivan)
>>> > > reported excellent performance report with the draft PR, this work
>>> is now
>>> > > reactivated.
>>> > >
>>> > > The updated KIP document includes some expected problems and their
>>> > > candidate alternatives. Please have a look when you are free, and
>>> give
>>> > me a
>>> > > feedback. All kinds of participating are welcome.
>>> > >
>>> > > Best,
>>> > > Dongjin
>>> > >
>>> > > --
>>> > > *Dongjin Lee*
>>> > >
>>> > > *A hitchhiker in the mathematical world.*
>>> > >
>>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>>> > > <http://github.com/dongjinleekr>linkedin:
>>> > kr.linkedin.com/in/dongjinleekr
>>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>>> > www.slideshare.net/dongjinleekr
>>> > > <http://www.slideshare.net/dongjinleekr>*
>>> > >
>>> >
>>>
>> --
>> *Dongjin Lee*
>>
>> *A hitchhiker in the mathematical world.*
>>
>> *github:  <http://goog_969573159/>github.com/dongjinleekr
>> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
>> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
>> <http://www.slideshare.net/dongjinleekr>*
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
Here is the short conclusion about the license problem: *We can use zstd
and zstd-jni without any problem, but we need to include their licenses,
i.e., the BSD license texts.*

Both the BSD 2-Clause and the BSD 3-Clause License require including the
license text, and the BSD 3-Clause License additionally requires that the
contributors' names not be used to endorse or promote the product. That's it
<http://www.mikestratton.net/2011/12/is-bsd-license-compatible-with-apache-2-0-license/>
- and neither of them is listed in the list of prohibited licenses
<https://www.apache.org/legal/resolved.html#category-x>.

Here is how Spark handled it
<https://issues.apache.org/jira/browse/SPARK-19112>:

- They made a directory dedicated to the dependency license files
<https://github.com/apache/spark/tree/master/licenses> and added licenses
for Zstd
<https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd.txt> &
Zstd-jni
<https://github.com/apache/spark/blob/master/licenses/LICENSE-zstd-jni.txt>.
- Added a link to the original license files in LICENSE.
<https://github.com/apache/spark/pull/18805/files>

If needed, I can make a similar update.
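
Concretely, if we mirrored Spark's layout in Kafka, it would look roughly
like this (file names are illustrative, not taken from an actual patch):

licenses/
  LICENSE-zstd.txt       <- BSD 3-Clause text from the native zstd library
  LICENSE-zstd-jni.txt   <- BSD 2-Clause text from the zstd-jni binding
LICENSE                  <- gains a short pointer to the two files above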

Thanks for pointing out this problem, Viktor! Nice catch!

Best,
Dongjin



On Mon, Jun 11, 2018 at 11:50 PM Dongjin Lee <do...@apache.org> wrote:

> I greatly appreciate your comprehensive reasoning. so: +1 for b until now.
>
> For the license issues, I will have a check on how the over projects are
> doing and share the results.
>
> Best,
> Dongjin
>
> On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <vi...@gmail.com>
> wrote:
>
>> Hi Dongjin,
>>
>> A couple of comments:
>> I would vote for option b. in the "backward compatibility" section. My
>> reasoning for this is that users upgrading to a zstd compatible version
>> won't start to use it automatically, so manual reconfiguration is
>> required.
>> Therefore an upgrade won't mess up the cluster. If not all the clients are
>> upgraded but just some of them and they'd start to use zstd then it would
>> cause errors in the cluster. I'd like to presume though that this is a
>> very
>> obvious failure case and nobody should be surprised if it didn't work.
>> I wouldn't choose a. as I think we should bump the fetch and produce
>> requests if it's a change in the message format. Moreover if some of the
>> producers and the brokers are upgraded but some of the consumers are not,
>> then we wouldn't prevent the error when the old consumer tries to consume
>> the zstd compressed messages.
>> I wouldn't choose c. either as I think binding the compression type to an
>> API is not so obvious from the developer's perspective.
>>
>> I would also prefer to use the existing binding, however we must respect
>> the licenses:
>> "The code for these JNI bindings is licenced under 2-clause BSD license.
>> The native Zstd library is licensed under 3-clause BSD license and GPL2"
>> Based on the FAQ page
>> https://www.apache.org/legal/resolved.html#category-a
>> we may use 2- and 3-clause BSD licenses but the Apache license is not
>> compatible with GPL2. I'm hoping that the "3-clause BSD license and GPL2"
>> is really not an AND but an OR in this case, but I'm no lawyer, just
>> wanted
>> to make the point that we should watch out for licenses. :)
>>
>> Regards,
>> Viktor
>>
>>
>> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com> wrote:
>>
>> > Hello,
>> >
>> > This is Ivan and I still very much support the fact that zstd
>> compression
>> > should be included out of the box.
>> >
>> > Please think about the environment, you can save quite a lot of hardware
>> > with it.
>> >
>> > Thank you.
>> >
>> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:
>> >
>> > > Since there are no responses for a week, I decided to reinitiate the
>> > > discussion thread.
>> > >
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>> > >
>> > > This KIP is about to introduce ZStandard Compression into Apache
>> Kafka.
>> > > The reason why it is posted again has a story: It was originally
>> posted
>> > to
>> > > the dev mailing list more than one year ago but since it has no
>> > performance
>> > > report included, it was postponed later. But Some people (including
>> Ivan)
>> > > reported excellent performance report with the draft PR, this work is
>> now
>> > > reactivated.
>> > >
>> > > The updated KIP document includes some expected problems and their
>> > > candidate alternatives. Please have a look when you are free, and give
>> > me a
>> > > feedback. All kinds of participating are welcome.
>> > >
>> > > Best,
>> > > Dongjin
>> > >
>> > > --
>> > > *Dongjin Lee*
>> > >
>> > > *A hitchhiker in the mathematical world.*
>> > >
>> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > <http://github.com/dongjinleekr>linkedin:
>> > kr.linkedin.com/in/dongjinleekr
>> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > www.slideshare.net/dongjinleekr
>> > > <http://www.slideshare.net/dongjinleekr>*
>> > >
>> >
>>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Dongjin Lee <do...@apache.org>.
I greatly appreciate your comprehensive reasoning, so: +1 for b for now.

For the license issues, I will check how other projects are handling them
and share the results.

Best,
Dongjin

On Mon, Jun 11, 2018 at 10:08 PM Viktor Somogyi <vi...@gmail.com>
wrote:

> Hi Dongjin,
>
> A couple of comments:
> I would vote for option b. in the "backward compatibility" section. My
> reasoning for this is that users upgrading to a zstd compatible version
> won't start to use it automatically, so manual reconfiguration is required.
> Therefore an upgrade won't mess up the cluster. If not all the clients are
> upgraded but just some of them and they'd start to use zstd then it would
> cause errors in the cluster. I'd like to presume though that this is a very
> obvious failure case and nobody should be surprised if it didn't work.
> I wouldn't choose a. as I think we should bump the fetch and produce
> requests if it's a change in the message format. Moreover if some of the
> producers and the brokers are upgraded but some of the consumers are not,
> then we wouldn't prevent the error when the old consumer tries to consume
> the zstd compressed messages.
> I wouldn't choose c. either as I think binding the compression type to an
> API is not so obvious from the developer's perspective.
>
> I would also prefer to use the existing binding, however we must respect
> the licenses:
> "The code for these JNI bindings is licenced under 2-clause BSD license.
> The native Zstd library is licensed under 3-clause BSD license and GPL2"
> Based on the FAQ page
> https://www.apache.org/legal/resolved.html#category-a
> we may use 2- and 3-clause BSD licenses but the Apache license is not
> compatible with GPL2. I'm hoping that the "3-clause BSD license and GPL2"
> is really not an AND but an OR in this case, but I'm no lawyer, just wanted
> to make the point that we should watch out for licenses. :)
>
> Regards,
> Viktor
>
>
> On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com> wrote:
>
> > Hello,
> >
> > This is Ivan and I still very much support the fact that zstd compression
> > should be included out of the box.
> >
> > Please think about the environment, you can save quite a lot of hardware
> > with it.
> >
> > Thank you.
> >
> > On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:
> >
> > > Since there are no responses for a week, I decided to reinitiate the
> > > discussion thread.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> > >
> > > This KIP is about to introduce ZStandard Compression into Apache Kafka.
> > > The reason why it is posted again has a story: It was originally posted
> > to
> > > the dev mailing list more than one year ago but since it has no
> > performance
> > > report included, it was postponed later. But Some people (including
> Ivan)
> > > reported excellent performance report with the draft PR, this work is
> now
> > > reactivated.
> > >
> > > The updated KIP document includes some expected problems and their
> > > candidate alternatives. Please have a look when you are free, and give
> > me a
> > > feedback. All kinds of participating are welcome.
> > >
> > > Best,
> > > Dongjin
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
>
-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Viktor Somogyi <vi...@gmail.com>.
Hi Dongjin,

A couple of comments:
I would vote for option b. in the "backward compatibility" section. My
reasoning is that users upgrading to a zstd-compatible version won't start
using it automatically, so manual reconfiguration is required and an upgrade
alone won't mess up the cluster. If only some of the clients are upgraded
and they start to use zstd, it would cause errors in the cluster, but I'd
presume this is a very obvious failure case and nobody should be surprised
if it didn't work.
I wouldn't choose a., as I think we should bump the fetch and produce
request versions if the message format changes. Moreover, if some of the
producers and the brokers are upgraded but some of the consumers are not,
we wouldn't prevent the error when an old consumer tries to consume
zstd-compressed messages.
I wouldn't choose c. either, as binding the compression type to an API is
not so obvious from the developer's perspective.
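
To illustrate what b. would mean on the broker side, here is a rough
pseudocode sketch (not the actual broker code path; the error name, the
ZSTD constant and the minimum-version constant are placeholders that the
KIP still has to pin down):

    // Pseudocode sketch only: validate the codec against the request version.
    // ZSTD_MIN_PRODUCE_VERSION stands for whatever version the produce
    // request gets bumped to once zstd support is added.
    Errors validateCompression(CompressionType type, short produceRequestVersion) {
        if (type == CompressionType.ZSTD
                && produceRequestVersion < ZSTD_MIN_PRODUCE_VERSION) {
            // Fail fast for old clients instead of letting un-upgraded
            // consumers choke on zstd-compressed batches later.
            return Errors.UNSUPPORTED_COMPRESSION_TYPE;
        }
        return Errors.NONE;
    }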

I would also prefer to use the existing binding, however we must respect
the licenses:
"The code for these JNI bindings is licenced under 2-clause BSD license.
The native Zstd library is licensed under 3-clause BSD license and GPL2"
Based on the FAQ page https://www.apache.org/legal/resolved.html#category-a
we may use 2- and 3-clause BSD licenses but the Apache license is not
compatible with GPL2. I'm hoping that the "3-clause BSD license and GPL2"
is really not an AND but an OR in this case, but I'm no lawyer, just wanted
to make the point that we should watch out for licenses. :)

Regards,
Viktor


On Sun, Jun 10, 2018 at 3:02 AM Ivan Babrou <ib...@gmail.com> wrote:

> Hello,
>
> This is Ivan and I still very much support including zstd compression out
> of the box.
>
> Please think about the environment: you can save quite a lot of hardware
> with it.
>
> Thank you.
>
> On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:
>
> > Since there are no responses for a week, I decided to reinitiate the
> > discussion thread.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
> >
> > This KIP is about to introduce ZStandard Compression into Apache Kafka.
> > The reason why it is posted again has a story: It was originally posted
> to
> > the dev mailing list more than one year ago but since it has no
> performance
> > report included, it was postponed later. But Some people (including Ivan)
> > reported excellent performance report with the draft PR, this work is now
> > reactivated.
> >
> > The updated KIP document includes some expected problems and their
> > candidate alternatives. Please have a look when you are free, and give
> me a
> > feedback. All kinds of participating are welcome.
> >
> > Best,
> > Dongjin
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
> >
>

Re: [DISCUSS] KIP-110: Add Codec for ZStandard Compression (Updated)

Posted by Ivan Babrou <ib...@gmail.com>.
Hello,

This is Ivan and I still very much support including zstd compression out
of the box.

Please think about the environment: you can save quite a lot of hardware
with it.

Thank you.

On Sat, Jun 9, 2018 at 14:14 Dongjin Lee <do...@apache.org> wrote:

> Since there are no responses for a week, I decided to reinitiate the
> discussion thread.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-110%3A+Add+Codec+for+ZStandard+Compression
>
> This KIP is about to introduce ZStandard Compression into Apache Kafka.
> The reason why it is posted again has a story: It was originally posted to
> the dev mailing list more than one year ago but since it has no performance
> report included, it was postponed later. But Some people (including Ivan)
> reported excellent performance report with the draft PR, this work is now
> reactivated.
>
> The updated KIP document includes some expected problems and their
> candidate alternatives. Please have a look when you are free, and give me a
> feedback. All kinds of participating are welcome.
>
> Best,
> Dongjin
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>