You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Dongjin Lee <do...@apache.org> on 2018/11/18 22:06:40 UTC

[DISCUSS] KIP-390: Add producer option to adjust compression level

Hello dev,

I hope to initiate the discussion of KIP-390: Add producer option to adjust
compression level
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level>.
All feedbacks will be highly appreciated.

Best,
Dongjin

-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  <http://goog_969573159/>github.com/dongjinleekr
<http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<http://kr.linkedin.com/in/dongjinleekr>slideshare:
www.slideshare.net/dongjinleekr
<http://www.slideshare.net/dongjinleekr>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Is there anyone who is interested in this KIP?

Thanks,
Dongjin

On Mon, Nov 19, 2018 at 7:06 AM Dongjin Lee <do...@apache.org> wrote:

> Hello dev,
>
> I hope to initiate the discussion of KIP-390: Add producer option to
> adjust compression level
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level>.
> All feedbacks will be highly appreciated.
>
> Best,
> Dongjin
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare: www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Becket Qin <be...@gmail.com>.
This thread has been quiet for some time now as there are a lot of things
going on concurrently.

@Ismael Juma <is...@juma.me.uk> @Jason Gustafson <ja...@confluent.io> what
do you think about this? Just a brief recap, we are considering exposing
the configs for each compression type in the following k-v pairs format:

compression.configs="gzip.compression.level:5, lz4.compression.level:17,
zstd.compression.level:22"

Thanks,

Jiangjie (Becket) Qin

On Thu, Feb 7, 2019 at 11:28 PM Dongjin Lee <do...@apache.org> wrote:

> Hi Becket,
>
> Understood. You are right, introducing map type is just a clearer way, but
> not a strong prerequisite. It can be discussed later.
>
> @Committers, PMC members:
>
> Could you please give us some valuable feedbacks and suggestions for this
> topic?
>
>
> Thanks,
>
> Dongjin
>
> On Fri, Feb 1, 2019 at 11:37 AM Becket Qin <be...@gmail.com> wrote:
>
> > Hi Dongjin,
> >
> > Thanks for the reply. I agree that implementation wise, it is clearer to
> > add a new "Map" config type. However, practically speaking I don't think
> > this KIP strongly depends on that. As long as we follow the same string
> > format in the configuration for all those map-like configurations,
> whether
> > to have a Map config type or not seems not that important.
> >
> > Given that the existing configuration uses the format of
> > "KEY:VALUE[,KEY:VALUE]", we could just use this string format.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Thu, Jan 31, 2019 at 7:47 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Mickael,
> > >
> > > It seems like the majority of the community agrees that the new config
> > > scheme proposed by Becket is much better. However, we still need
> another
> > > KIP to support this kind of config; it is different from the case of
> > > `listener.security.protocol.map,` which is just a concatenation of
> > > available settings (i.e., string type.) For Becket's idea, we need to
> add
> > > support for the key-value style.
> > >
> > > It seems like we need advice from a committer, whether to add a new
> > > prerequisite KIP for the map type configuration to implement Becket's
> > idea.
> > >
> > > Committers:
> > >
> > > Could you give us some advice on this problem?
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Wed, Jan 30, 2019 at 11:30 PM Mickael Maison <
> > mickael.maison@gmail.com>
> > > wrote:
> > >
> > > > Thanks, that's a very interesting KIP!
> > > >
> > > > I agree with Becket that a clearer config format is likely to help.
> > > > Have you considered using a config format similar to listeners (as
> > > > described in the "Updating SSL Keystore of an Existing Listener" in
> > > > the Kafka docs)?
> > > >
> > > > Also worth noting that we already have configs that effectively take
> a
> > > > map, for example: listener.security.protocol.map so not sure if we
> > > > need an additional KIP
> > > >
> > > > On Wed, Jan 30, 2019 at 5:22 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > Do you have any idea on Becket's Idea of new config format (example
> > > > below)?
> > > > >
> > > > > ```
> > > > > compression.config="gzip.compression.level=5,
> > lz4.compression.level=17,
> > > > > zstd.compression.level=22"
> > > > > ```
> > > > >
> > > > > It requires some additional KIP for supporting new config format
> > (map),
> > > > but
> > > > > it can significantly simplify the configuration with flexibility
> and
> > > > > extensibility. If you prefer this way, I hope to carry the ball.
> > > > >
> > > > > If not, please give me an opinion here or the voting thread.
> > > > >
> > > > > Thanks,
> > > > > Dongjin
> > > > >
> > > > >
> > > > > On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > Hi Becket,
> > > > > >
> > > > > > Thank you for your opinion. Frankly, I have no strong opinion on
> > > > > > configuration name. In this problem, I will follow the
> community's
> > > > choice.
> > > > > > (I like your idea in that it has a notion of 'scope' per
> > compression
> > > > codec.
> > > > > > However, it should be implemented on top of new config type like
> > Map;
> > > > It
> > > > > > will require another KIP as a prerequisite, but if the community
> > > prefer
> > > > > > this way, I will take the task.)
> > > > > >
> > > > > > (One minor correction: the one who thought 'producer' compression
> > > > config
> > > > > > would cause a problem at broker was me, not Ismael - and Ismael
> > > > reassured
> > > > > > me there will be no problem with it.)
> > > > > >
> > > > > > To All,
> > > > > >
> > > > > > How about Becket's idea of 'compression.config' option?
> > > > > >
> > > > > > Best,
> > > > > > Dongjin
> > > > > >
> > > > > > On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <becket.qin@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> Hi Dongjin,
> > > > > >>
> > > > > >> Thanks for the KIP and sorry for being a bit late on the
> > discussion.
> > > > > >>
> > > > > >> It makes sense to expose the configuration for compression
> types.
> > > But
> > > > I am
> > > > > >> wondering if there is a better way to do that than what proposed
> > in
> > > > the
> > > > > >> KIP. What I feel confusing is that we are effectively sharing
> the
> > > > > >> configuration across different compression types, the meaning of
> > the
> > > > > >> configuration are actually kind of different depending on the
> > > > compression
> > > > > >> type. This will end up with issues like what Ismael has brought
> up
> > > > > >> earlier.
> > > > > >> Say if the broker has compression type of producer (this may
> > result
> > > in
> > > > > >> mixed compression type in the same topic), and for some reason
> the
> > > > broker
> > > > > >> needs to re-compress the topic (e.g. due to log compaction), a
> > > single
> > > > > >> topic
> > > > > >> level compression config may not work, because a valid
> compression
> > > > level
> > > > > >> for lz4 maybe invalid for gzip.
> > > > > >>
> > > > > >> One alternative I am thinking is to provide a
> "compression.config"
> > > > > >> configuration, inside which it specifies configuration used by
> > each
> > > > > >> specific compression type as k-v pairs. The format could use
> some
> > > name
> > > > > >> space as well. For example,
> > > > > >>
> > > > > >>
> > > > > >>
> > > >
> > >
> >
> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
> > > > > >>
> > > > > >> Each compression type will just pick whatever configuration they
> > > need
> > > > from
> > > > > >> the k-v pairs defined in this config.
> > > > > >>
> > > > > >> Besides clarity, some other benefits are:
> > > > > >> 1. Extensibility, we don't have to add more configuration when
> we
> > > add
> > > > new
> > > > > >> compression types, or expose new config for a particular
> > compression
> > > > type.
> > > > > >> 2. Even for the same config, different compression type may have
> > > > different
> > > > > >> terminologies. With the approach we can honor those
> terminologies
> > > > instead
> > > > > >> of shoehorning them into the same configuration name.
> > > > > >>
> > > > > >> What do you think?
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Jiangjie (Becket) Qin
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <
> dongjin@apache.org>
> > > > wrote:
> > > > > >>
> > > > > >> > Hello. I just fixed the draft implementation, with rebasing
> onto
> > > the
> > > > > >> latest
> > > > > >> > trunk. The KIP was also restored.
> > > > > >> >
> > > > > >> > Please have a look, and if there is no major problem, please
> > vote
> > > > to the
> > > > > >> > voting thread. You know, KIP freeze for 2.2.0 is almost
> > imminent.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Dongjin
> > > > > >> >
> > > > > >> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <
> ismael@juma.me.uk>
> > > > wrote:
> > > > > >> >
> > > > > >> > > Thanks!
> > > > > >> > >
> > > > > >> > > Ismael
> > > > > >> > >
> > > > > >> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > > >> wrote:
> > > > > >> > >
> > > > > >> > > > Hi Ismael,
> > > > > >> > > >
> > > > > >> > > > After reviewing
> > > > > >> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > > > > >> > > > yes, you are right. If source codec and target codec is
> > > > identical
> > > > > >> and
> > > > > >> > the
> > > > > >> > > > magic is above 0, the broker can do an in-place
> assignment,
> > > > without
> > > > > >> > > > recompressing. Sorry for my misunderstanding.
> > > > > >> > > >
> > > > > >> > > > Since we don't need `compression.[gzip,lz4,zstd].level`
> and
> > > > > >> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I
> will
> > > > revert
> > > > > >> > those
> > > > > >> > > > changes and update the KIP with the recent discussions. I
> > will
> > > > > >> complete
> > > > > >> > > it
> > > > > >> > > > in 48 hours from now.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dongjin
> > > > > >> > > >
> > > > > >> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > I see. Let me have a check. If not needed, of course, we
> > > don't
> > > > > >> have
> > > > > >> > to
> > > > > >> > > > > waste on configuration options.
> > > > > >> > > > >
> > > > > >> > > > > Since the KIP deadline is imminent, I just opened the
> > voting
> > > > > >> thread.
> > > > > >> > > > Let's
> > > > > >> > > > > continue the discussion here.
> > > > > >> > > > >
> > > > > >> > > > > Best,
> > > > > >> > > > > Dongjin
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <
> > > > ismael@juma.me.uk>
> > > > > >> > wrote:
> > > > > >> > > > >
> > > > > >> > > > >> Hi Dongjin,
> > > > > >> > > > >>
> > > > > >> > > > >> When the compression type is "producer", then the
> broker
> > > > doesn't
> > > > > >> > > > >> recompress
> > > > > >> > > > >> though. Thinking about it some more, there are some
> > > uncommon
> > > > > >> cases
> > > > > >> > > where
> > > > > >> > > > >> recompression does happen (the old (and hopefully
> hardly
> > > > used by
> > > > > >> > now)
> > > > > >> > > > >> message format == 0 and some edge cases), so it is a
> good
> > > > point
> > > > > >> you
> > > > > >> > > > >> raised.
> > > > > >> > > > >>
> > > > > >> > > > >> It's a bit unfortunate to add so many topic configs for
> > > cases
> > > > > >> that
> > > > > >> > > > >> probably
> > > > > >> > > > >> don't matter. That is, if you are using "producer"
> > > > compression,
> > > > > >> you
> > > > > >> > > > >> probably don't need to configure these settings and can
> > > live
> > > > with
> > > > > >> > the
> > > > > >> > > > >> defaults. Perhaps we should only support the topic
> config
> > > > for the
> > > > > >> > > cases
> > > > > >> > > > >> where you are actually recompressing in the broker.
> > > > > >> > > > >>
> > > > > >> > > > >> What do you think? I'd be interested in other people's
> > > > thoughts
> > > > > >> too.
> > > > > >> > > > >>
> > > > > >> > > > >> Ismael
> > > > > >> > > > >>
> > > > > >> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <
> > > > dongjin@apache.org>
> > > > > >> > > wrote:
> > > > > >> > > > >>
> > > > > >> > > > >> > Hi Ismael,
> > > > > >> > > > >> >
> > > > > >> > > > >> > It seems like it needs more explanation. Here is the
> > > > detailed
> > > > > >> > > > reasoning.
> > > > > >> > > > >> >
> > > > > >> > > > >> > You know, topic and broker's 'compression.type'
> allows
> > > > > >> > > 'uncompressed',
> > > > > >> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy,
> > lz4,
> > > > > >> zstd.)
> > > > > >> > And
> > > > > >> > > > >> this
> > > > > >> > > > >> > configuration is used by the broker in the
> > re-compressing
> > > > > >> process
> > > > > >> > > > after
> > > > > >> > > > >> > offset assignment. After this feature, the new
> configs,
> > > > > >> > > > >> 'compression.level'
> > > > > >> > > > >> > and 'compression.buffer.size', also will be used in
> > this
> > > > > >> process.
> > > > > >> > > > >> >
> > > > > >> > > > >> > The problem arises when given topic's compression
> type
> > > > > >> (whether it
> > > > > >> > > was
> > > > > >> > > > >> > inherited from broker's configuration or explicitly
> > set)
> > > is
> > > > > >> > > > 'producer.'
> > > > > >> > > > >> > With this setting, the compression codec to be used
> is
> > > > decided
> > > > > >> by
> > > > > >> > > the
> > > > > >> > > > >> > producer client. Since there is no way to restore the
> > > > > >> compression
> > > > > >> > > > level
> > > > > >> > > > >> and
> > > > > >> > > > >> > buffer size from the message, we can take the
> following
> > > > > >> > strategies:
> > > > > >> > > > >> >
> > > > > >> > > > >> > 1. Just use given 'compression.level' and
> > > > > >> > 'compression.buffer.size'
> > > > > >> > > > >> > settings.
> > > > > >> > > > >> >
> > > > > >> > > > >> > It will cause so many errors. Let's imagine the case
> of
> > > > topic's
> > > > > >> > > > >> > configuration is { compression.type=producer,
> > > > > >> > compression.level=10,
> > > > > >> > > > >> > compression.buffer.size=8192 }. In this case, all
> > > producers
> > > > > >> with
> > > > > >> > > gzip
> > > > > >> > > > or
> > > > > >> > > > >> > lz4 compressed messages will result in an error.
> (gzip
> > > > doesn't
> > > > > >> > allow
> > > > > >> > > > >> > compression level 10, and lz4 also for a buffer size
> of
> > > > 8192.)
> > > > > >> > > > >> >
> > > > > >> > > > >> > 2. Extend the message format to include compression
> > > > > >> > configurations.
> > > > > >> > > > >> >
> > > > > >> > > > >> > With this strategy, we need to change the message
> > format
> > > -
> > > > > >> it's a
> > > > > >> > > too
> > > > > >> > > > >> big
> > > > > >> > > > >> > change.
> > > > > >> > > > >> >
> > > > > >> > > > >> > 3. If topic's compression.type is 'producer', use the
> > > > default
> > > > > >> > > > >> configuration
> > > > > >> > > > >> > for the given codec.
> > > > > >> > > > >> >
> > > > > >> > > > >> > With this strategy, allowing fine-grained compression
> > > > > >> > configuration
> > > > > >> > > is
> > > > > >> > > > >> > meaningless.
> > > > > >> > > > >> >
> > > > > >> > > > >> > For the above reasons, I think the only alternative
> is
> > > > > >> providing
> > > > > >> > > > options
> > > > > >> > > > >> > that can be used when the topic's 'compression.type'
> is
> > > > > >> > 'producer.'
> > > > > >> > > In
> > > > > >> > > > >> > other words, adding compression.[gzip, lz4,
> zstd].level
> > > and
> > > > > >> > > > >> > compression.[gzip.snappy.lz4].buffer.size options -
> and
> > > it
> > > > is
> > > > > >> > what I
> > > > > >> > > > >> did in
> > > > > >> > > > >> > the last modification.
> > > > > >> > > > >> >
> > > > > >> > > > >> > (wait, the reasoning above should be included in the
> > KIP
> > > > in the
> > > > > >> > > > rejected
> > > > > >> > > > >> > alternatives section, isn't it?)
> > > > > >> > > > >> >
> > > > > >> > > > >> > Thanks,
> > > > > >> > > > >> > Dongjin
> > > > > >> > > > >> >
> > > > > >> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <
> > > > ismael@juma.me.uk
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > > >> >
> > > > > >> > > > >> > > Hi Dongjin,
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > For topic level, you can only have a single
> > compression
> > > > type
> > > > > >> so
> > > > > >> > > the
> > > > > >> > > > >> way
> > > > > >> > > > >> > it
> > > > > >> > > > >> > > was before was fine, right? The point you raise is
> > how
> > > > to set
> > > > > >> > > broker
> > > > > >> > > > >> > > defaults that vary depending on the compression
> type,
> > > > > >> correct?
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > Ismael
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> > > > > >> > dongjin@apache.org>
> > > > > >> > > > >> wrote:
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > > I just realized that there was a missing hole in
> > the
> > > > KIP,
> > > > > >> so I
> > > > > >> > > > fixed
> > > > > >> > > > >> > it.
> > > > > >> > > > >> > > > The draft implementation will be updated soon.
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > In short, the proposed change did not regard the
> > case
> > > > of
> > > > > >> the
> > > > > >> > > topic
> > > > > >> > > > >> or
> > > > > >> > > > >> > > > broker's 'compression.type' is 'producer'; in
> this
> > > > case,
> > > > > >> the
> > > > > >> > > > broker
> > > > > >> > > > >> has
> > > > > >> > > > >> > > to
> > > > > >> > > > >> > > > handle all kinds of the supported codec. So I
> added
> > > > > >> additional
> > > > > >> > > > >> options
> > > > > >> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > > > > >> > > > >> > compression.[gzip,snappy,lz4,
> > > > > >> > > > >> > > > zstd].buffer.size) with handling routines.
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > Please have a look when you are free.
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > Thanks,
> > > > > >> > > > >> > > > Dongjin
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> > > > > >> > dongjin@apache.org>
> > > > > >> > > > >> wrote:
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > > Thanks for pointing out Ismael. It's now
> updated.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Best,
> > > > > >> > > > >> > > > > Dongjin
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> > > > > >> > ismaelj@gmail.com
> > > > > >> > > >
> > > > > >> > > > >> > wrote:
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we
> should
> > > > mention
> > > > > >> > that
> > > > > >> > > > the
> > > > > >> > > > >> > > broker
> > > > > >> > > > >> > > > >> side configs are also topic configs (i.e. can
> be
> > > set
> > > > > >> for a
> > > > > >> > > > given
> > > > > >> > > > >> > > topic).
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > > >> Ismael
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > > > > >> > > dongjin@apache.org
> > > > > >> > > > >> > wrote:
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > > >> > Happy new year.
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > I just updated the title and contents of KIP
> > and
> > > > Jira
> > > > > >> > > issue,
> > > > > >> > > > >> with
> > > > > >> > > > >> > > > >> updated
> > > > > >> > > > >> > > > >> > draft implementation. Now both of
> compression
> > > > level
> > > > > >> and
> > > > > >> > > > buffer
> > > > > >> > > > >> > size
> > > > > >> > > > >> > > > >> options
> > > > > >> > > > >> > > > >> > are available to producer and broker
> > > > configuration.
> > > > > >> You
> > > > > >> > can
> > > > > >> > > > >> check
> > > > > >> > > > >> > > the
> > > > > >> > > > >> > > > >> > updated KIP from modified URL:
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > >
> > > > > >> > > > >> >
> > > > > >> > > > >>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > Please have a look when you are free.
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > Thanks,
> > > > > >> > > > >> > > > >> > Dongjin
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma
> <
> > > > > >> > > > ismaelj@gmail.com
> > > > > >> > > > >> >
> > > > > >> > > > >> > > > wrote:
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > > The updated title sounds fine to me.
> > > > > >> > > > >> > > > >> > >
> > > > > >> > > > >> > > > >> > > Ismael
> > > > > >> > > > >> > > > >> > >
> > > > > >> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > > > > >> > > > dongjin@apache.org
> > > > > >> > > > >> > > wrote:
> > > > > >> > > > >> > > > >> > >
> > > > > >> > > > >> > > > >> > > > Hi Ismael,
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > Got it. Your direction is perfectly
> > > > reasonable. I
> > > > > >> am
> > > > > >> > > now
> > > > > >> > > > >> > > updating
> > > > > >> > > > >> > > > >> the
> > > > > >> > > > >> > > > >> > KIP
> > > > > >> > > > >> > > > >> > > > document and the implementation.
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > By allowing the buffer/block size to be
> > > > > >> configurable,
> > > > > >> > > it
> > > > > >> > > > >> would
> > > > > >> > > > >> > > be
> > > > > >> > > > >> > > > >> > better
> > > > > >> > > > >> > > > >> > > to
> > > > > >> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> > > > > >> fine-grained
> > > > > >> > > > >> > > configuration
> > > > > >> > > > >> > > > >> for
> > > > > >> > > > >> > > > >> > > > compression'. Is that right?
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > @Other committers:
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > Is there any other opinion on allowing
> the
> > > > > >> > buffer/block
> > > > > >> > > > >> size
> > > > > >> > > > >> > to
> > > > > >> > > > >> > > be
> > > > > >> > > > >> > > > >> > > > configurable?
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > Thanks,
> > > > > >> > > > >> > > > >> > > > Dongjin
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael
> > Juma
> > > <
> > > > > >> > > > >> > ismael@juma.me.uk>
> > > > > >> > > > >> > > > >> wrote:
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > > Hi Dongjin,
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > To clarify, I mean a broker topic
> config
> > > > with
> > > > > >> > regards
> > > > > >> > > > to
> > > > > >> > > > >> > point
> > > > > >> > > > >> > > > 1.
> > > > > >> > > > >> > > > >> As
> > > > > >> > > > >> > > > >> > > you
> > > > > >> > > > >> > > > >> > > > > know, compression can be done by the
> > > > producer
> > > > > >> > and/or
> > > > > >> > > by
> > > > > >> > > > >> the
> > > > > >> > > > >> > > > >> broker.
> > > > > >> > > > >> > > > >> > The
> > > > > >> > > > >> > > > >> > > > > default is for the broker to just use
> > > > whatever
> > > > > >> > > > >> compression
> > > > > >> > > > >> > was
> > > > > >> > > > >> > > > >> used
> > > > > >> > > > >> > > > >> > by
> > > > > >> > > > >> > > > >> > > > the
> > > > > >> > > > >> > > > >> > > > > producer, but this can be changed by
> the
> > > > user
> > > > > >> on a
> > > > > >> > > per
> > > > > >> > > > >> topic
> > > > > >> > > > >> > > > >> basis.
> > > > > >> > > > >> > > > >> > It
> > > > > >> > > > >> > > > >> > > > > seems like it would make sense for the
> > > > configs
> > > > > >> to
> > > > > >> > be
> > > > > >> > > .
> > > > > >> > > > >> > > > consistent
> > > > > >> > > > >> > > > >> > > between
> > > > > >> > > > >> > > > >> > > > > producer and broker.
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> > > > > >> > implementation,
> > > > > >> > > > but
> > > > > >> > > > >> we
> > > > > >> > > > >> > > > could
> > > > > >> > > > >> > > > >> do
> > > > > >> > > > >> > > > >> > it
> > > > > >> > > > >> > > > >> > > > in
> > > > > >> > > > >> > > > >> > > > > the `CompressionType` enum by invoking
> > the
> > > > right
> > > > > >> > > > >> constructor
> > > > > >> > > > >> > > or
> > > > > >> > > > >> > > > >> > > > retrieving
> > > > > >> > > > >> > > > >> > > > > the default value via a constant (if
> > > > defined).
> > > > > >> > That's
> > > > > >> > > > an
> > > > > >> > > > >> > > > >> > implementation
> > > > > >> > > > >> > > > >> > > > > detail and can be discussed in the PR.
> > The
> > > > more
> > > > > >> > > general
> > > > > >> > > > >> > point
> > > > > >> > > > >> > > is
> > > > > >> > > > >> > > > >> to
> > > > > >> > > > >> > > > >> > > rely
> > > > > >> > > > >> > > > >> > > > on
> > > > > >> > > > >> > > > >> > > > > the library defaults instead of
> choosing
> > > one
> > > > > >> > > ourselves.
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > For point 3, I'm in favour of doing
> that
> > > in
> > > > this
> > > > > >> > KIP.
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > Ismael
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM
> Dongjin
> > > Lee
> > > > <
> > > > > >> > > > >> > > dongjin@apache.org
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > wrote:
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > > > > Thank you Ismael, here are the
> > answers:
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > *1. About topic config*
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > After some consideration, I
> concluded
> > > that
> > > > > >> topic
> > > > > >> > > > config
> > > > > >> > > > >> > > > doesn't
> > > > > >> > > > >> > > > >> > need
> > > > > >> > > > >> > > > >> > > to
> > > > > >> > > > >> > > > >> > > > > > support compression.level. Here is
> > why:
> > > > since
> > > > > >> the
> > > > > >> > > > >> > > compression
> > > > > >> > > > >> > > > is
> > > > > >> > > > >> > > > >> > > > > conducted
> > > > > >> > > > >> > > > >> > > > > > by the client, the one who can
> select
> > > the
> > > > best
> > > > > >> > > > >> compression
> > > > > >> > > > >> > > > >> level is
> > > > > >> > > > >> > > > >> > > the
> > > > > >> > > > >> > > > >> > > > > > client itself. Let us assume that
> the
> > > > > >> compression
> > > > > >> > > > >> level is
> > > > > >> > > > >> > > set
> > > > > >> > > > >> > > > >> at
> > > > > >> > > > >> > > > >> > the
> > > > > >> > > > >> > > > >> > > > > topic
> > > > > >> > > > >> > > > >> > > > > > config level. In that case, there
> is a
> > > > > >> > possibility
> > > > > >> > > > that
> > > > > >> > > > >> > the
> > > > > >> > > > >> > > > >> > > compression
> > > > > >> > > > >> > > > >> > > > > > level is not optimal for some
> > producers.
> > > > > >> > Actually,
> > > > > >> > > > >> Kafka's
> > > > > >> > > > >> > > go
> > > > > >> > > > >> > > > >> > client
> > > > > >> > > > >> > > > >> > > > also
> > > > > >> > > > >> > > > >> > > > > > supports compression level
> > functionality
> > > > for
> > > > > >> the
> > > > > >> > > > >> producer
> > > > > >> > > > >> > > > config
> > > > > >> > > > >> > > > >> > > only.
> > > > > >> > > > >> > > > >> > > > > > <
> > > > > >> > > > >>
> https://github.com/Shopify/sarama/blob/master/config.go>
> > > > > >> > > > >> > > > >> (wait,
> > > > > >> > > > >> > > > >> > do
> > > > > >> > > > >> > > > >> > > we
> > > > > >> > > > >> > > > >> > > > > > need
> > > > > >> > > > >> > > > >> > > > > > to add this reasoning in the KIP,
> > > rejected
> > > > > >> > > > alternatives
> > > > > >> > > > >> > > > >> section?)
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > *2. About default level*
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > As of current draft implementation,
> > the
> > > > > >> default
> > > > > >> > > > >> > compression
> > > > > >> > > > >> > > is
> > > > > >> > > > >> > > > >> set
> > > > > >> > > > >> > > > >> > on
> > > > > >> > > > >> > > > >> > > > the
> > > > > >> > > > >> > > > >> > > > > > CompressionType enum. Of course,
> > > changing
> > > > this
> > > > > >> > > > strategy
> > > > > >> > > > >> > into
> > > > > >> > > > >> > > > >> > relying
> > > > > >> > > > >> > > > >> > > > on a
> > > > > >> > > > >> > > > >> > > > > > method from the library to pick the
> > > > default
> > > > > >> > > > compression
> > > > > >> > > > >> > > level
> > > > > >> > > > >> > > > >> seems
> > > > > >> > > > >> > > > >> > > > > > possible, like
> `GZIPBlockOutputStream`
> > > > does.
> > > > > >> In
> > > > > >> > > this
> > > > > >> > > > >> case,
> > > > > >> > > > >> > > we
> > > > > >> > > > >> > > > >> need
> > > > > >> > > > >> > > > >> > to
> > > > > >> > > > >> > > > >> > > > add
> > > > > >> > > > >> > > > >> > > > > > similar wrapper class for zstd and
> > > modify
> > > > lz4
> > > > > >> the
> > > > > >> > > > >> wrapper
> > > > > >> > > > >> > > > also.
> > > > > >> > > > >> > > > >> Add
> > > > > >> > > > >> > > > >> > > to
> > > > > >> > > > >> > > > >> > > > > > this, it seems like we need to
> > > explicitly
> > > > > >> state
> > > > > >> > > that
> > > > > >> > > > we
> > > > > >> > > > >> > > follow
> > > > > >> > > > >> > > > >> the
> > > > > >> > > > >> > > > >> > > > > default
> > > > > >> > > > >> > > > >> > > > > > compression level of the codec in
> the
> > > > > >> > > documentation.
> > > > > >> > > > Is
> > > > > >> > > > >> > this
> > > > > >> > > > >> > > > >> what
> > > > > >> > > > >> > > > >> > you
> > > > > >> > > > >> > > > >> > > > > > intended?
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > *3. Whether to allow the
> buffer/block
> > > > size to
> > > > > >> be
> > > > > >> > > > >> > > configurable*
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > Well, As of current draft
> > > implementation,
> > > > the
> > > > > >> lz4
> > > > > >> > > > >> level is
> > > > > >> > > > >> > > > >> > > implemented
> > > > > >> > > > >> > > > >> > > > as
> > > > > >> > > > >> > > > >> > > > > > block size; this is caused by my
> > > > > >> misunderstanding
> > > > > >> > > on
> > > > > >> > > > >> lz4.
> > > > > >> > > > >> > > > After
> > > > > >> > > > >> > > > >> > > > reviewing
> > > > > >> > > > >> > > > >> > > > > > lz4 today, I found that it also
> > supports
> > > > > >> > > compression
> > > > > >> > > > >> level
> > > > > >> > > > >> > > of
> > > > > >> > > > >> > > > >> 1~16
> > > > > >> > > > >> > > > >> > > > > > (default: 1), not block size. I will
> > fix
> > > > it in
> > > > > >> > this
> > > > > >> > > > >> > weekend
> > > > > >> > > > >> > > by
> > > > > >> > > > >> > > > >> > > updating
> > > > > >> > > > >> > > > >> > > > > the
> > > > > >> > > > >> > > > >> > > > > > wrapper class.
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > For the problem of the buffer/block
> > > size,
> > > > I
> > > > > >> have
> > > > > >> > no
> > > > > >> > > > >> strong
> > > > > >> > > > >> > > > >> opinion.
> > > > > >> > > > >> > > > >> > > If
> > > > > >> > > > >> > > > >> > > > > the
> > > > > >> > > > >> > > > >> > > > > > community needs it, I will do it all
> > > > together.
> > > > > >> > How
> > > > > >> > > do
> > > > > >> > > > >> you
> > > > > >> > > > >> > > > think?
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > In short, it seems like I need to
> > update
> > > > the
> > > > > >> KIP
> > > > > >> > > > >> document
> > > > > >> > > > >> > > for
> > > > > >> > > > >> > > > >> issue
> > > > > >> > > > >> > > > >> > > #1
> > > > > >> > > > >> > > > >> > > > > and
> > > > > >> > > > >> > > > >> > > > > > update the compression wrapper for
> > issue
> > > > #2,
> > > > > >> #3.
> > > > > >> > Is
> > > > > >> > > > >> this
> > > > > >> > > > >> > > okay?
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > Thanks,
> > > > > >> > > > >> > > > >> > > > > > Dongjin
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM
> > Ismael
> > > > Juma <
> > > > > >> > > > >> > > > ismaelj@gmail.com
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > > > wrote:
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is
> > helpful.
> > > A
> > > > few
> > > > > >> > > > >> questions:
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > 1. Have we considered whether we
> > want
> > > to
> > > > > >> allow
> > > > > >> > a
> > > > > >> > > > >> similar
> > > > > >> > > > >> > > > topic
> > > > > >> > > > >> > > > >> > > > config?
> > > > > >> > > > >> > > > >> > > > > > > 2. Can we rely on a method from
> the
> > > > library
> > > > > >> to
> > > > > >> > > pick
> > > > > >> > > > >> the
> > > > > >> > > > >> > > > >> default
> > > > > >> > > > >> > > > >> > > > > > compression
> > > > > >> > > > >> > > > >> > > > > > > level if compression.level is not
> > set?
> > > > We
> > > > > >> do it
> > > > > >> > > for
> > > > > >> > > > >> gzip
> > > > > >> > > > >> > > and
> > > > > >> > > > >> > > > >> it
> > > > > >> > > > >> > > > >> > > would
> > > > > >> > > > >> > > > >> > > > > > seem
> > > > > >> > > > >> > > > >> > > > > > > reasonable to do something similar
> > for
> > > > the
> > > > > >> > other
> > > > > >> > > > >> > > compression
> > > > > >> > > > >> > > > >> > > > libraries.
> > > > > >> > > > >> > > > >> > > > > > > 3. Do we want to allow the
> > > buffer/block
> > > > > >> size to
> > > > > >> > > be
> > > > > >> > > > >> > > > >> configurable?
> > > > > >> > > > >> > > > >> > > This
> > > > > >> > > > >> > > > >> > > > > has
> > > > > >> > > > >> > > > >> > > > > > > an impact on memory usage and
> people
> > > may
> > > > > >> want
> > > > > >> > to
> > > > > >> > > > >> trade
> > > > > >> > > > >> > > > >> > compression
> > > > > >> > > > >> > > > >> > > > for
> > > > > >> > > > >> > > > >> > > > > > > less/more memory in some cases.
> For
> > > > example,
> > > > > >> > the
> > > > > >> > > > >> default
> > > > > >> > > > >> > > for
> > > > > >> > > > >> > > > >> LZ4
> > > > > >> > > > >> > > > >> > is
> > > > > >> > > > >> > > > >> > > > > 64KB
> > > > > >> > > > >> > > > >> > > > > > > which is a bit high.
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > Ismael
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM
> > Dongjin
> > > > Lee <
> > > > > >> > > > >> > > > dongjin@apache.org
> > > > > >> > > > >> > > > >> > > wrote:
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > Hello dev,
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > I hope to initiate the
> discussion
> > of
> > > > > >> KIP-390:
> > > > > >> > > Add
> > > > > >> > > > >> > > producer
> > > > > >> > > > >> > > > >> > option
> > > > > >> > > > >> > > > >> > > > to
> > > > > >> > > > >> > > > >> > > > > > > adjust
> > > > > >> > > > >> > > > >> > > > > > > > compression level
> > > > > >> > > > >> > > > >> > > > > > > > <
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > >
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > >
> > > > > >> > > > >> >
> > > > > >> > > > >>
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > > >> > > > >> > > > >> > > > > > > > >.
> > > > > >> > > > >> > > > >> > > > > > > > All feedbacks will be highly
> > > > appreciated.
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > Best,
> > > > > >> > > > >> > > > >> > > > > > > > Dongjin
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > --
> > > > > >> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > *A hitchhiker in the
> mathematical
> > > > world.*
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > > > *github:  <
> http://goog_969573159/
> > >
> > > > > >> > > > >> > > github.com/dongjinleekr
> > > > > >> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr
> > > > >linkedin:
> > > > > >> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > >> > > > > > > > <
> > > > http://kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> >slideshare:
> > > > > >> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > > > >> > > > >> > > > >> > > > > > > > <
> > > > http://www.slideshare.net/dongjinleekr>*
> > > > > >> > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > --
> > > > > >> > > > >> > > > >> > > > > > *Dongjin Lee*
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical
> > > world.*
> > > > > >> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > > > > >> > > > >> github.com/dongjinleekr
> > > > > >> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr
> > > > >linkedin:
> > > > > >> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > >> > > > > > <
> > > https://kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >speakerdeck:
> > > > > >> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > > > > >> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > --
> > > > > >> > > > >> > > > >> > > > *Dongjin Lee*
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > > > *A hitchhiker in the mathematical
> world.*
> > > > > >> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > > > >> > > > github.com/dongjinleekr
> > > > > >> > > > >> > > > >> > > > <https://github.com/dongjinleekr
> > >linkedin:
> > > > > >> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > >> > > > <
> https://kr.linkedin.com/in/dongjinleekr
> > > > > >> > >speakerdeck:
> > > > > >> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > > >> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> > > > >> > > >
> > > > > >> > > > >> > > > >> > >
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > --
> > > > > >> > > > >> > > > >> > *Dongjin Lee*
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > > > > >> > github.com/dongjinleekr
> > > > > >> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr
> > > > >speakerdeck:
> > > > > >> > > > >> > > > >> > speakerdeck.com/dongjin
> > > > > >> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> > > > >> >
> > > > > >> > > > >> > > > >>
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > --
> > > > > >> > > > >> > > > > *Dongjin Lee*
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >> > > > > *github:  <http://goog_969573159/>
> > > > > >> github.com/dongjinleekr
> > > > > >> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr
> > > > >speakerdeck:
> > > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > > >> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > --
> > > > > >> > > > >> > > > *Dongjin Lee*
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > > github.com/dongjinleekr
> > > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > > >speakerdeck:
> > > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > >
> > > > > >> > > > >> >
> > > > > >> > > > >> >
> > > > > >> > > > >> > --
> > > > > >> > > > >> > *Dongjin Lee*
> > > > > >> > > > >> >
> > > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > > > >> > > > >> > *github:  <http://goog_969573159/>
> > > github.com/dongjinleekr
> > > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> > > > > >> > > > >> > speakerdeck.com/dongjin
> > > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >> >
> > > > > >> > > > >>
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > *Dongjin Lee*
> > > > > >> > > > >
> > > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > > >> > > > speakerdeck.com/dongjin
> > > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > > *Dongjin Lee*
> > > > > >> > > >
> > > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > > >> > > > speakerdeck.com/dongjin
> > > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > > >> > > >
> > > > > >> > >
> > > > > >> >
> > > > > >> >
> > > > > >> > --
> > > > > >> > *Dongjin Lee*
> > > > > >> >
> > > > > >> > *A hitchhiker in the mathematical world.*
> > > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > > >> kr.linkedin.com/in/dongjinleekr
> > > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > > >> > speakerdeck.com/dongjin
> > > > > >> > <https://speakerdeck.com/dongjin>*
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *Dongjin Lee*
> > > > > >
> > > > > > *A hitchhiker in the mathematical world.*
> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > > > <https://speakerdeck.com/dongjin>*
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hi Becket,

Understood. You are right, introducing map type is just a clearer way, but
not a strong prerequisite. It can be discussed later.

@Committers, PMC members:

Could you please give us some valuable feedbacks and suggestions for this
topic?


Thanks,

Dongjin

On Fri, Feb 1, 2019 at 11:37 AM Becket Qin <be...@gmail.com> wrote:

> Hi Dongjin,
>
> Thanks for the reply. I agree that implementation wise, it is clearer to
> add a new "Map" config type. However, practically speaking I don't think
> this KIP strongly depends on that. As long as we follow the same string
> format in the configuration for all those map-like configurations, whether
> to have a Map config type or not seems not that important.
>
> Given that the existing configuration uses the format of
> "KEY:VALUE[,KEY:VALUE]", we could just use this string format.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Jan 31, 2019 at 7:47 PM Dongjin Lee <do...@apache.org> wrote:
>
> > Mickael,
> >
> > It seems like the majority of the community agrees that the new config
> > scheme proposed by Becket is much better. However, we still need another
> > KIP to support this kind of config; it is different from the case of
> > `listener.security.protocol.map,` which is just a concatenation of
> > available settings (i.e., string type.) For Becket's idea, we need to add
> > support for the key-value style.
> >
> > It seems like we need advice from a committer, whether to add a new
> > prerequisite KIP for the map type configuration to implement Becket's
> idea.
> >
> > Committers:
> >
> > Could you give us some advice on this problem?
> >
> > Best,
> > Dongjin
> >
> > On Wed, Jan 30, 2019 at 11:30 PM Mickael Maison <
> mickael.maison@gmail.com>
> > wrote:
> >
> > > Thanks, that's a very interesting KIP!
> > >
> > > I agree with Becket that a clearer config format is likely to help.
> > > Have you considered using a config format similar to listeners (as
> > > described in the "Updating SSL Keystore of an Existing Listener" in
> > > the Kafka docs)?
> > >
> > > Also worth noting that we already have configs that effectively take a
> > > map, for example: listener.security.protocol.map so not sure if we
> > > need an additional KIP
> > >
> > > On Wed, Jan 30, 2019 at 5:22 AM Dongjin Lee <do...@apache.org>
> wrote:
> > > >
> > > > Hello.
> > > >
> > > > Do you have any idea on Becket's Idea of new config format (example
> > > below)?
> > > >
> > > > ```
> > > > compression.config="gzip.compression.level=5,
> lz4.compression.level=17,
> > > > zstd.compression.level=22"
> > > > ```
> > > >
> > > > It requires some additional KIP for supporting new config format
> (map),
> > > but
> > > > it can significantly simplify the configuration with flexibility and
> > > > extensibility. If you prefer this way, I hope to carry the ball.
> > > >
> > > > If not, please give me an opinion here or the voting thread.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > >
> > > > On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Hi Becket,
> > > > >
> > > > > Thank you for your opinion. Frankly, I have no strong opinion on
> > > > > configuration name. In this problem, I will follow the community's
> > > choice.
> > > > > (I like your idea in that it has a notion of 'scope' per
> compression
> > > codec.
> > > > > However, it should be implemented on top of new config type like
> Map;
> > > It
> > > > > will require another KIP as a prerequisite, but if the community
> > prefer
> > > > > this way, I will take the task.)
> > > > >
> > > > > (One minor correction: the one who thought 'producer' compression
> > > config
> > > > > would cause a problem at broker was me, not Ismael - and Ismael
> > > reassured
> > > > > me there will be no problem with it.)
> > > > >
> > > > > To All,
> > > > >
> > > > > How about Becket's idea of 'compression.config' option?
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Hi Dongjin,
> > > > >>
> > > > >> Thanks for the KIP and sorry for being a bit late on the
> discussion.
> > > > >>
> > > > >> It makes sense to expose the configuration for compression types.
> > But
> > > I am
> > > > >> wondering if there is a better way to do that than what proposed
> in
> > > the
> > > > >> KIP. What I feel confusing is that we are effectively sharing the
> > > > >> configuration across different compression types, the meaning of
> the
> > > > >> configuration are actually kind of different depending on the
> > > compression
> > > > >> type. This will end up with issues like what Ismael has brought up
> > > > >> earlier.
> > > > >> Say if the broker has compression type of producer (this may
> result
> > in
> > > > >> mixed compression type in the same topic), and for some reason the
> > > broker
> > > > >> needs to re-compress the topic (e.g. due to log compaction), a
> > single
> > > > >> topic
> > > > >> level compression config may not work, because a valid compression
> > > level
> > > > >> for lz4 maybe invalid for gzip.
> > > > >>
> > > > >> One alternative I am thinking is to provide a "compression.config"
> > > > >> configuration, inside which it specifies configuration used by
> each
> > > > >> specific compression type as k-v pairs. The format could use some
> > name
> > > > >> space as well. For example,
> > > > >>
> > > > >>
> > > > >>
> > >
> >
> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
> > > > >>
> > > > >> Each compression type will just pick whatever configuration they
> > need
> > > from
> > > > >> the k-v pairs defined in this config.
> > > > >>
> > > > >> Besides clarity, some other benefits are:
> > > > >> 1. Extensibility, we don't have to add more configuration when we
> > add
> > > new
> > > > >> compression types, or expose new config for a particular
> compression
> > > type.
> > > > >> 2. Even for the same config, different compression type may have
> > > different
> > > > >> terminologies. With the approach we can honor those terminologies
> > > instead
> > > > >> of shoehorning them into the same configuration name.
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Jiangjie (Becket) Qin
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >>
> > > > >> > Hello. I just fixed the draft implementation, with rebasing onto
> > the
> > > > >> latest
> > > > >> > trunk. The KIP was also restored.
> > > > >> >
> > > > >> > Please have a look, and if there is no major problem, please
> vote
> > > to the
> > > > >> > voting thread. You know, KIP freeze for 2.2.0 is almost
> imminent.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Dongjin
> > > > >> >
> > > > >> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > > >> >
> > > > >> > > Thanks!
> > > > >> > >
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <
> dongjin@apache.org
> > >
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Hi Ismael,
> > > > >> > > >
> > > > >> > > > After reviewing
> > > > >> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > > > >> > > > yes, you are right. If source codec and target codec is
> > > identical
> > > > >> and
> > > > >> > the
> > > > >> > > > magic is above 0, the broker can do an in-place assignment,
> > > without
> > > > >> > > > recompressing. Sorry for my misunderstanding.
> > > > >> > > >
> > > > >> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > > > >> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will
> > > revert
> > > > >> > those
> > > > >> > > > changes and update the KIP with the recent discussions. I
> will
> > > > >> complete
> > > > >> > > it
> > > > >> > > > in 48 hours from now.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dongjin
> > > > >> > > >
> > > > >> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > > >> > wrote:
> > > > >> > > >
> > > > >> > > > > I see. Let me have a check. If not needed, of course, we
> > don't
> > > > >> have
> > > > >> > to
> > > > >> > > > > waste on configuration options.
> > > > >> > > > >
> > > > >> > > > > Since the KIP deadline is imminent, I just opened the
> voting
> > > > >> thread.
> > > > >> > > > Let's
> > > > >> > > > > continue the discussion here.
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > > Dongjin
> > > > >> > > > >
> > > > >> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <
> > > ismael@juma.me.uk>
> > > > >> > wrote:
> > > > >> > > > >
> > > > >> > > > >> Hi Dongjin,
> > > > >> > > > >>
> > > > >> > > > >> When the compression type is "producer", then the broker
> > > doesn't
> > > > >> > > > >> recompress
> > > > >> > > > >> though. Thinking about it some more, there are some
> > uncommon
> > > > >> cases
> > > > >> > > where
> > > > >> > > > >> recompression does happen (the old (and hopefully hardly
> > > used by
> > > > >> > now)
> > > > >> > > > >> message format == 0 and some edge cases), so it is a good
> > > point
> > > > >> you
> > > > >> > > > >> raised.
> > > > >> > > > >>
> > > > >> > > > >> It's a bit unfortunate to add so many topic configs for
> > cases
> > > > >> that
> > > > >> > > > >> probably
> > > > >> > > > >> don't matter. That is, if you are using "producer"
> > > compression,
> > > > >> you
> > > > >> > > > >> probably don't need to configure these settings and can
> > live
> > > with
> > > > >> > the
> > > > >> > > > >> defaults. Perhaps we should only support the topic config
> > > for the
> > > > >> > > cases
> > > > >> > > > >> where you are actually recompressing in the broker.
> > > > >> > > > >>
> > > > >> > > > >> What do you think? I'd be interested in other people's
> > > thoughts
> > > > >> too.
> > > > >> > > > >>
> > > > >> > > > >> Ismael
> > > > >> > > > >>
> > > > >> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <
> > > dongjin@apache.org>
> > > > >> > > wrote:
> > > > >> > > > >>
> > > > >> > > > >> > Hi Ismael,
> > > > >> > > > >> >
> > > > >> > > > >> > It seems like it needs more explanation. Here is the
> > > detailed
> > > > >> > > > reasoning.
> > > > >> > > > >> >
> > > > >> > > > >> > You know, topic and broker's 'compression.type' allows
> > > > >> > > 'uncompressed',
> > > > >> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy,
> lz4,
> > > > >> zstd.)
> > > > >> > And
> > > > >> > > > >> this
> > > > >> > > > >> > configuration is used by the broker in the
> re-compressing
> > > > >> process
> > > > >> > > > after
> > > > >> > > > >> > offset assignment. After this feature, the new configs,
> > > > >> > > > >> 'compression.level'
> > > > >> > > > >> > and 'compression.buffer.size', also will be used in
> this
> > > > >> process.
> > > > >> > > > >> >
> > > > >> > > > >> > The problem arises when given topic's compression type
> > > > >> (whether it
> > > > >> > > was
> > > > >> > > > >> > inherited from broker's configuration or explicitly
> set)
> > is
> > > > >> > > > 'producer.'
> > > > >> > > > >> > With this setting, the compression codec to be used is
> > > decided
> > > > >> by
> > > > >> > > the
> > > > >> > > > >> > producer client. Since there is no way to restore the
> > > > >> compression
> > > > >> > > > level
> > > > >> > > > >> and
> > > > >> > > > >> > buffer size from the message, we can take the following
> > > > >> > strategies:
> > > > >> > > > >> >
> > > > >> > > > >> > 1. Just use given 'compression.level' and
> > > > >> > 'compression.buffer.size'
> > > > >> > > > >> > settings.
> > > > >> > > > >> >
> > > > >> > > > >> > It will cause so many errors. Let's imagine the case of
> > > topic's
> > > > >> > > > >> > configuration is { compression.type=producer,
> > > > >> > compression.level=10,
> > > > >> > > > >> > compression.buffer.size=8192 }. In this case, all
> > producers
> > > > >> with
> > > > >> > > gzip
> > > > >> > > > or
> > > > >> > > > >> > lz4 compressed messages will result in an error. (gzip
> > > doesn't
> > > > >> > allow
> > > > >> > > > >> > compression level 10, and lz4 also for a buffer size of
> > > 8192.)
> > > > >> > > > >> >
> > > > >> > > > >> > 2. Extend the message format to include compression
> > > > >> > configurations.
> > > > >> > > > >> >
> > > > >> > > > >> > With this strategy, we need to change the message
> format
> > -
> > > > >> it's a
> > > > >> > > too
> > > > >> > > > >> big
> > > > >> > > > >> > change.
> > > > >> > > > >> >
> > > > >> > > > >> > 3. If topic's compression.type is 'producer', use the
> > > default
> > > > >> > > > >> configuration
> > > > >> > > > >> > for the given codec.
> > > > >> > > > >> >
> > > > >> > > > >> > With this strategy, allowing fine-grained compression
> > > > >> > configuration
> > > > >> > > is
> > > > >> > > > >> > meaningless.
> > > > >> > > > >> >
> > > > >> > > > >> > For the above reasons, I think the only alternative is
> > > > >> providing
> > > > >> > > > options
> > > > >> > > > >> > that can be used when the topic's 'compression.type' is
> > > > >> > 'producer.'
> > > > >> > > In
> > > > >> > > > >> > other words, adding compression.[gzip, lz4, zstd].level
> > and
> > > > >> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and
> > it
> > > is
> > > > >> > what I
> > > > >> > > > >> did in
> > > > >> > > > >> > the last modification.
> > > > >> > > > >> >
> > > > >> > > > >> > (wait, the reasoning above should be included in the
> KIP
> > > in the
> > > > >> > > > rejected
> > > > >> > > > >> > alternatives section, isn't it?)
> > > > >> > > > >> >
> > > > >> > > > >> > Thanks,
> > > > >> > > > >> > Dongjin
> > > > >> > > > >> >
> > > > >> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <
> > > ismael@juma.me.uk
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > >> >
> > > > >> > > > >> > > Hi Dongjin,
> > > > >> > > > >> > >
> > > > >> > > > >> > > For topic level, you can only have a single
> compression
> > > type
> > > > >> so
> > > > >> > > the
> > > > >> > > > >> way
> > > > >> > > > >> > it
> > > > >> > > > >> > > was before was fine, right? The point you raise is
> how
> > > to set
> > > > >> > > broker
> > > > >> > > > >> > > defaults that vary depending on the compression type,
> > > > >> correct?
> > > > >> > > > >> > >
> > > > >> > > > >> > > Ismael
> > > > >> > > > >> > >
> > > > >> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> > > > >> > dongjin@apache.org>
> > > > >> > > > >> wrote:
> > > > >> > > > >> > >
> > > > >> > > > >> > > > I just realized that there was a missing hole in
> the
> > > KIP,
> > > > >> so I
> > > > >> > > > fixed
> > > > >> > > > >> > it.
> > > > >> > > > >> > > > The draft implementation will be updated soon.
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > In short, the proposed change did not regard the
> case
> > > of
> > > > >> the
> > > > >> > > topic
> > > > >> > > > >> or
> > > > >> > > > >> > > > broker's 'compression.type' is 'producer'; in this
> > > case,
> > > > >> the
> > > > >> > > > broker
> > > > >> > > > >> has
> > > > >> > > > >> > > to
> > > > >> > > > >> > > > handle all kinds of the supported codec. So I added
> > > > >> additional
> > > > >> > > > >> options
> > > > >> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > > > >> > > > >> > compression.[gzip,snappy,lz4,
> > > > >> > > > >> > > > zstd].buffer.size) with handling routines.
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > Please have a look when you are free.
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > Thanks,
> > > > >> > > > >> > > > Dongjin
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> > > > >> > dongjin@apache.org>
> > > > >> > > > >> wrote:
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > Best,
> > > > >> > > > >> > > > > Dongjin
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> > > > >> > ismaelj@gmail.com
> > > > >> > > >
> > > > >> > > > >> > wrote:
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should
> > > mention
> > > > >> > that
> > > > >> > > > the
> > > > >> > > > >> > > broker
> > > > >> > > > >> > > > >> side configs are also topic configs (i.e. can be
> > set
> > > > >> for a
> > > > >> > > > given
> > > > >> > > > >> > > topic).
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > > >> Ismael
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > > > >> > > dongjin@apache.org
> > > > >> > > > >> > wrote:
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > > >> > Happy new year.
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > I just updated the title and contents of KIP
> and
> > > Jira
> > > > >> > > issue,
> > > > >> > > > >> with
> > > > >> > > > >> > > > >> updated
> > > > >> > > > >> > > > >> > draft implementation. Now both of compression
> > > level
> > > > >> and
> > > > >> > > > buffer
> > > > >> > > > >> > size
> > > > >> > > > >> > > > >> options
> > > > >> > > > >> > > > >> > are available to producer and broker
> > > configuration.
> > > > >> You
> > > > >> > can
> > > > >> > > > >> check
> > > > >> > > > >> > > the
> > > > >> > > > >> > > > >> > updated KIP from modified URL:
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > >
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > Please have a look when you are free.
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > Thanks,
> > > > >> > > > >> > > > >> > Dongjin
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > > > >> > > > ismaelj@gmail.com
> > > > >> > > > >> >
> > > > >> > > > >> > > > wrote:
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > > The updated title sounds fine to me.
> > > > >> > > > >> > > > >> > >
> > > > >> > > > >> > > > >> > > Ismael
> > > > >> > > > >> > > > >> > >
> > > > >> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > > > >> > > > dongjin@apache.org
> > > > >> > > > >> > > wrote:
> > > > >> > > > >> > > > >> > >
> > > > >> > > > >> > > > >> > > > Hi Ismael,
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > Got it. Your direction is perfectly
> > > reasonable. I
> > > > >> am
> > > > >> > > now
> > > > >> > > > >> > > updating
> > > > >> > > > >> > > > >> the
> > > > >> > > > >> > > > >> > KIP
> > > > >> > > > >> > > > >> > > > document and the implementation.
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > By allowing the buffer/block size to be
> > > > >> configurable,
> > > > >> > > it
> > > > >> > > > >> would
> > > > >> > > > >> > > be
> > > > >> > > > >> > > > >> > better
> > > > >> > > > >> > > > >> > > to
> > > > >> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> > > > >> fine-grained
> > > > >> > > > >> > > configuration
> > > > >> > > > >> > > > >> for
> > > > >> > > > >> > > > >> > > > compression'. Is that right?
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > @Other committers:
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > Is there any other opinion on allowing the
> > > > >> > buffer/block
> > > > >> > > > >> size
> > > > >> > > > >> > to
> > > > >> > > > >> > > be
> > > > >> > > > >> > > > >> > > > configurable?
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > Thanks,
> > > > >> > > > >> > > > >> > > > Dongjin
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael
> Juma
> > <
> > > > >> > > > >> > ismael@juma.me.uk>
> > > > >> > > > >> > > > >> wrote:
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > > Hi Dongjin,
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > To clarify, I mean a broker topic config
> > > with
> > > > >> > regards
> > > > >> > > > to
> > > > >> > > > >> > point
> > > > >> > > > >> > > > 1.
> > > > >> > > > >> > > > >> As
> > > > >> > > > >> > > > >> > > you
> > > > >> > > > >> > > > >> > > > > know, compression can be done by the
> > > producer
> > > > >> > and/or
> > > > >> > > by
> > > > >> > > > >> the
> > > > >> > > > >> > > > >> broker.
> > > > >> > > > >> > > > >> > The
> > > > >> > > > >> > > > >> > > > > default is for the broker to just use
> > > whatever
> > > > >> > > > >> compression
> > > > >> > > > >> > was
> > > > >> > > > >> > > > >> used
> > > > >> > > > >> > > > >> > by
> > > > >> > > > >> > > > >> > > > the
> > > > >> > > > >> > > > >> > > > > producer, but this can be changed by the
> > > user
> > > > >> on a
> > > > >> > > per
> > > > >> > > > >> topic
> > > > >> > > > >> > > > >> basis.
> > > > >> > > > >> > > > >> > It
> > > > >> > > > >> > > > >> > > > > seems like it would make sense for the
> > > configs
> > > > >> to
> > > > >> > be
> > > > >> > > .
> > > > >> > > > >> > > > consistent
> > > > >> > > > >> > > > >> > > between
> > > > >> > > > >> > > > >> > > > > producer and broker.
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> > > > >> > implementation,
> > > > >> > > > but
> > > > >> > > > >> we
> > > > >> > > > >> > > > could
> > > > >> > > > >> > > > >> do
> > > > >> > > > >> > > > >> > it
> > > > >> > > > >> > > > >> > > > in
> > > > >> > > > >> > > > >> > > > > the `CompressionType` enum by invoking
> the
> > > right
> > > > >> > > > >> constructor
> > > > >> > > > >> > > or
> > > > >> > > > >> > > > >> > > > retrieving
> > > > >> > > > >> > > > >> > > > > the default value via a constant (if
> > > defined).
> > > > >> > That's
> > > > >> > > > an
> > > > >> > > > >> > > > >> > implementation
> > > > >> > > > >> > > > >> > > > > detail and can be discussed in the PR.
> The
> > > more
> > > > >> > > general
> > > > >> > > > >> > point
> > > > >> > > > >> > > is
> > > > >> > > > >> > > > >> to
> > > > >> > > > >> > > > >> > > rely
> > > > >> > > > >> > > > >> > > > on
> > > > >> > > > >> > > > >> > > > > the library defaults instead of choosing
> > one
> > > > >> > > ourselves.
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that
> > in
> > > this
> > > > >> > KIP.
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > Ismael
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin
> > Lee
> > > <
> > > > >> > > > >> > > dongjin@apache.org
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > wrote:
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > > > > Thank you Ismael, here are the
> answers:
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > *1. About topic config*
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > After some consideration, I concluded
> > that
> > > > >> topic
> > > > >> > > > config
> > > > >> > > > >> > > > doesn't
> > > > >> > > > >> > > > >> > need
> > > > >> > > > >> > > > >> > > to
> > > > >> > > > >> > > > >> > > > > > support compression.level. Here is
> why:
> > > since
> > > > >> the
> > > > >> > > > >> > > compression
> > > > >> > > > >> > > > is
> > > > >> > > > >> > > > >> > > > > conducted
> > > > >> > > > >> > > > >> > > > > > by the client, the one who can select
> > the
> > > best
> > > > >> > > > >> compression
> > > > >> > > > >> > > > >> level is
> > > > >> > > > >> > > > >> > > the
> > > > >> > > > >> > > > >> > > > > > client itself. Let us assume that the
> > > > >> compression
> > > > >> > > > >> level is
> > > > >> > > > >> > > set
> > > > >> > > > >> > > > >> at
> > > > >> > > > >> > > > >> > the
> > > > >> > > > >> > > > >> > > > > topic
> > > > >> > > > >> > > > >> > > > > > config level. In that case, there is a
> > > > >> > possibility
> > > > >> > > > that
> > > > >> > > > >> > the
> > > > >> > > > >> > > > >> > > compression
> > > > >> > > > >> > > > >> > > > > > level is not optimal for some
> producers.
> > > > >> > Actually,
> > > > >> > > > >> Kafka's
> > > > >> > > > >> > > go
> > > > >> > > > >> > > > >> > client
> > > > >> > > > >> > > > >> > > > also
> > > > >> > > > >> > > > >> > > > > > supports compression level
> functionality
> > > for
> > > > >> the
> > > > >> > > > >> producer
> > > > >> > > > >> > > > config
> > > > >> > > > >> > > > >> > > only.
> > > > >> > > > >> > > > >> > > > > > <
> > > > >> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > > > >> > > > >> > > > >> (wait,
> > > > >> > > > >> > > > >> > do
> > > > >> > > > >> > > > >> > > we
> > > > >> > > > >> > > > >> > > > > > need
> > > > >> > > > >> > > > >> > > > > > to add this reasoning in the KIP,
> > rejected
> > > > >> > > > alternatives
> > > > >> > > > >> > > > >> section?)
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > *2. About default level*
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > As of current draft implementation,
> the
> > > > >> default
> > > > >> > > > >> > compression
> > > > >> > > > >> > > is
> > > > >> > > > >> > > > >> set
> > > > >> > > > >> > > > >> > on
> > > > >> > > > >> > > > >> > > > the
> > > > >> > > > >> > > > >> > > > > > CompressionType enum. Of course,
> > changing
> > > this
> > > > >> > > > strategy
> > > > >> > > > >> > into
> > > > >> > > > >> > > > >> > relying
> > > > >> > > > >> > > > >> > > > on a
> > > > >> > > > >> > > > >> > > > > > method from the library to pick the
> > > default
> > > > >> > > > compression
> > > > >> > > > >> > > level
> > > > >> > > > >> > > > >> seems
> > > > >> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream`
> > > does.
> > > > >> In
> > > > >> > > this
> > > > >> > > > >> case,
> > > > >> > > > >> > > we
> > > > >> > > > >> > > > >> need
> > > > >> > > > >> > > > >> > to
> > > > >> > > > >> > > > >> > > > add
> > > > >> > > > >> > > > >> > > > > > similar wrapper class for zstd and
> > modify
> > > lz4
> > > > >> the
> > > > >> > > > >> wrapper
> > > > >> > > > >> > > > also.
> > > > >> > > > >> > > > >> Add
> > > > >> > > > >> > > > >> > > to
> > > > >> > > > >> > > > >> > > > > > this, it seems like we need to
> > explicitly
> > > > >> state
> > > > >> > > that
> > > > >> > > > we
> > > > >> > > > >> > > follow
> > > > >> > > > >> > > > >> the
> > > > >> > > > >> > > > >> > > > > default
> > > > >> > > > >> > > > >> > > > > > compression level of the codec in the
> > > > >> > > documentation.
> > > > >> > > > Is
> > > > >> > > > >> > this
> > > > >> > > > >> > > > >> what
> > > > >> > > > >> > > > >> > you
> > > > >> > > > >> > > > >> > > > > > intended?
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block
> > > size to
> > > > >> be
> > > > >> > > > >> > > configurable*
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > Well, As of current draft
> > implementation,
> > > the
> > > > >> lz4
> > > > >> > > > >> level is
> > > > >> > > > >> > > > >> > > implemented
> > > > >> > > > >> > > > >> > > > as
> > > > >> > > > >> > > > >> > > > > > block size; this is caused by my
> > > > >> misunderstanding
> > > > >> > > on
> > > > >> > > > >> lz4.
> > > > >> > > > >> > > > After
> > > > >> > > > >> > > > >> > > > reviewing
> > > > >> > > > >> > > > >> > > > > > lz4 today, I found that it also
> supports
> > > > >> > > compression
> > > > >> > > > >> level
> > > > >> > > > >> > > of
> > > > >> > > > >> > > > >> 1~16
> > > > >> > > > >> > > > >> > > > > > (default: 1), not block size. I will
> fix
> > > it in
> > > > >> > this
> > > > >> > > > >> > weekend
> > > > >> > > > >> > > by
> > > > >> > > > >> > > > >> > > updating
> > > > >> > > > >> > > > >> > > > > the
> > > > >> > > > >> > > > >> > > > > > wrapper class.
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > For the problem of the buffer/block
> > size,
> > > I
> > > > >> have
> > > > >> > no
> > > > >> > > > >> strong
> > > > >> > > > >> > > > >> opinion.
> > > > >> > > > >> > > > >> > > If
> > > > >> > > > >> > > > >> > > > > the
> > > > >> > > > >> > > > >> > > > > > community needs it, I will do it all
> > > together.
> > > > >> > How
> > > > >> > > do
> > > > >> > > > >> you
> > > > >> > > > >> > > > think?
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > In short, it seems like I need to
> update
> > > the
> > > > >> KIP
> > > > >> > > > >> document
> > > > >> > > > >> > > for
> > > > >> > > > >> > > > >> issue
> > > > >> > > > >> > > > >> > > #1
> > > > >> > > > >> > > > >> > > > > and
> > > > >> > > > >> > > > >> > > > > > update the compression wrapper for
> issue
> > > #2,
> > > > >> #3.
> > > > >> > Is
> > > > >> > > > >> this
> > > > >> > > > >> > > okay?
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > Thanks,
> > > > >> > > > >> > > > >> > > > > > Dongjin
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM
> Ismael
> > > Juma <
> > > > >> > > > >> > > > ismaelj@gmail.com
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > > > wrote:
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is
> helpful.
> > A
> > > few
> > > > >> > > > >> questions:
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > > > 1. Have we considered whether we
> want
> > to
> > > > >> allow
> > > > >> > a
> > > > >> > > > >> similar
> > > > >> > > > >> > > > topic
> > > > >> > > > >> > > > >> > > > config?
> > > > >> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the
> > > library
> > > > >> to
> > > > >> > > pick
> > > > >> > > > >> the
> > > > >> > > > >> > > > >> default
> > > > >> > > > >> > > > >> > > > > > compression
> > > > >> > > > >> > > > >> > > > > > > level if compression.level is not
> set?
> > > We
> > > > >> do it
> > > > >> > > for
> > > > >> > > > >> gzip
> > > > >> > > > >> > > and
> > > > >> > > > >> > > > >> it
> > > > >> > > > >> > > > >> > > would
> > > > >> > > > >> > > > >> > > > > > seem
> > > > >> > > > >> > > > >> > > > > > > reasonable to do something similar
> for
> > > the
> > > > >> > other
> > > > >> > > > >> > > compression
> > > > >> > > > >> > > > >> > > > libraries.
> > > > >> > > > >> > > > >> > > > > > > 3. Do we want to allow the
> > buffer/block
> > > > >> size to
> > > > >> > > be
> > > > >> > > > >> > > > >> configurable?
> > > > >> > > > >> > > > >> > > This
> > > > >> > > > >> > > > >> > > > > has
> > > > >> > > > >> > > > >> > > > > > > an impact on memory usage and people
> > may
> > > > >> want
> > > > >> > to
> > > > >> > > > >> trade
> > > > >> > > > >> > > > >> > compression
> > > > >> > > > >> > > > >> > > > for
> > > > >> > > > >> > > > >> > > > > > > less/more memory in some cases. For
> > > example,
> > > > >> > the
> > > > >> > > > >> default
> > > > >> > > > >> > > for
> > > > >> > > > >> > > > >> LZ4
> > > > >> > > > >> > > > >> > is
> > > > >> > > > >> > > > >> > > > > 64KB
> > > > >> > > > >> > > > >> > > > > > > which is a bit high.
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > > > Ismael
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM
> Dongjin
> > > Lee <
> > > > >> > > > >> > > > dongjin@apache.org
> > > > >> > > > >> > > > >> > > wrote:
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > Hello dev,
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > I hope to initiate the discussion
> of
> > > > >> KIP-390:
> > > > >> > > Add
> > > > >> > > > >> > > producer
> > > > >> > > > >> > > > >> > option
> > > > >> > > > >> > > > >> > > > to
> > > > >> > > > >> > > > >> > > > > > > adjust
> > > > >> > > > >> > > > >> > > > > > > > compression level
> > > > >> > > > >> > > > >> > > > > > > > <
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > >
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > >
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > >> > > > >> > > > >> > > > > > > > >.
> > > > >> > > > >> > > > >> > > > > > > > All feedbacks will be highly
> > > appreciated.
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > Best,
> > > > >> > > > >> > > > >> > > > > > > > Dongjin
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > --
> > > > >> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical
> > > world.*
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/
> >
> > > > >> > > > >> > > github.com/dongjinleekr
> > > > >> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr
> > > >linkedin:
> > > > >> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > >> > > > > > > > <
> > > http://kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> >slideshare:
> > > > >> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > > >> > > > >> > > > >> > > > > > > > <
> > > http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > --
> > > > >> > > > >> > > > >> > > > > > *Dongjin Lee*
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical
> > world.*
> > > > >> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > > > >> > > > >> github.com/dongjinleekr
> > > > >> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr
> > > >linkedin:
> > > > >> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > >> > > > > > <
> > https://kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >speakerdeck:
> > > > >> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > --
> > > > >> > > > >> > > > >> > > > *Dongjin Lee*
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > > >> > > > github.com/dongjinleekr
> > > > >> > > > >> > > > >> > > > <https://github.com/dongjinleekr
> >linkedin:
> > > > >> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > > > >> > >speakerdeck:
> > > > >> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > > >> > > >
> > > > >> > > > >> > > > >> > >
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > --
> > > > >> > > > >> > > > >> > *Dongjin Lee*
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > > > >> > github.com/dongjinleekr
> > > > >> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr
> > > >speakerdeck:
> > > > >> > > > >> > > > >> > speakerdeck.com/dongjin
> > > > >> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > > >> >
> > > > >> > > > >> > > > >>
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > --
> > > > >> > > > >> > > > > *Dongjin Lee*
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > > *github:  <http://goog_969573159/>
> > > > >> github.com/dongjinleekr
> > > > >> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr
> > > >speakerdeck:
> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > >
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > --
> > > > >> > > > >> > > > *Dongjin Lee*
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > github.com/dongjinleekr
> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > >speakerdeck:
> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > >
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > >> > > > >> > --
> > > > >> > > > >> > *Dongjin Lee*
> > > > >> > > > >> >
> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > >> > speakerdeck.com/dongjin
> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > *Dongjin Lee*
> > > > >> > > > >
> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > *Dongjin Lee*
> > > > >> > > >
> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > *Dongjin Lee*
> > > > >> >
> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > speakerdeck.com/dongjin
> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > > > <https://speakerdeck.com/dongjin>*
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>
-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Becket Qin <be...@gmail.com>.
Hi Dongjin,

Thanks for the reply. I agree that implementation wise, it is clearer to
add a new "Map" config type. However, practically speaking I don't think
this KIP strongly depends on that. As long as we follow the same string
format in the configuration for all those map-like configurations, whether
to have a Map config type or not seems not that important.

Given that the existing configuration uses the format of
"KEY:VALUE[,KEY:VALUE]", we could just use this string format.

Thanks,

Jiangjie (Becket) Qin

On Thu, Jan 31, 2019 at 7:47 PM Dongjin Lee <do...@apache.org> wrote:

> Mickael,
>
> It seems like the majority of the community agrees that the new config
> scheme proposed by Becket is much better. However, we still need another
> KIP to support this kind of config; it is different from the case of
> `listener.security.protocol.map,` which is just a concatenation of
> available settings (i.e., string type.) For Becket's idea, we need to add
> support for the key-value style.
>
> It seems like we need advice from a committer, whether to add a new
> prerequisite KIP for the map type configuration to implement Becket's idea.
>
> Committers:
>
> Could you give us some advice on this problem?
>
> Best,
> Dongjin
>
> On Wed, Jan 30, 2019 at 11:30 PM Mickael Maison <mi...@gmail.com>
> wrote:
>
> > Thanks, that's a very interesting KIP!
> >
> > I agree with Becket that a clearer config format is likely to help.
> > Have you considered using a config format similar to listeners (as
> > described in the "Updating SSL Keystore of an Existing Listener" in
> > the Kafka docs)?
> >
> > Also worth noting that we already have configs that effectively take a
> > map, for example: listener.security.protocol.map so not sure if we
> > need an additional KIP
> >
> > On Wed, Jan 30, 2019 at 5:22 AM Dongjin Lee <do...@apache.org> wrote:
> > >
> > > Hello.
> > >
> > > Do you have any idea on Becket's Idea of new config format (example
> > below)?
> > >
> > > ```
> > > compression.config="gzip.compression.level=5, lz4.compression.level=17,
> > > zstd.compression.level=22"
> > > ```
> > >
> > > It requires some additional KIP for supporting new config format (map),
> > but
> > > it can significantly simplify the configuration with flexibility and
> > > extensibility. If you prefer this way, I hope to carry the ball.
> > >
> > > If not, please give me an opinion here or the voting thread.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > >
> > > On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > Thank you for your opinion. Frankly, I have no strong opinion on
> > > > configuration name. In this problem, I will follow the community's
> > choice.
> > > > (I like your idea in that it has a notion of 'scope' per compression
> > codec.
> > > > However, it should be implemented on top of new config type like Map;
> > It
> > > > will require another KIP as a prerequisite, but if the community
> prefer
> > > > this way, I will take the task.)
> > > >
> > > > (One minor correction: the one who thought 'producer' compression
> > config
> > > > would cause a problem at broker was me, not Ismael - and Ismael
> > reassured
> > > > me there will be no problem with it.)
> > > >
> > > > To All,
> > > >
> > > > How about Becket's idea of 'compression.config' option?
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com>
> > wrote:
> > > >
> > > >> Hi Dongjin,
> > > >>
> > > >> Thanks for the KIP and sorry for being a bit late on the discussion.
> > > >>
> > > >> It makes sense to expose the configuration for compression types.
> But
> > I am
> > > >> wondering if there is a better way to do that than what proposed in
> > the
> > > >> KIP. What I feel confusing is that we are effectively sharing the
> > > >> configuration across different compression types, the meaning of the
> > > >> configuration are actually kind of different depending on the
> > compression
> > > >> type. This will end up with issues like what Ismael has brought up
> > > >> earlier.
> > > >> Say if the broker has compression type of producer (this may result
> in
> > > >> mixed compression type in the same topic), and for some reason the
> > broker
> > > >> needs to re-compress the topic (e.g. due to log compaction), a
> single
> > > >> topic
> > > >> level compression config may not work, because a valid compression
> > level
> > > >> for lz4 maybe invalid for gzip.
> > > >>
> > > >> One alternative I am thinking is to provide a "compression.config"
> > > >> configuration, inside which it specifies configuration used by each
> > > >> specific compression type as k-v pairs. The format could use some
> name
> > > >> space as well. For example,
> > > >>
> > > >>
> > > >>
> >
> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
> > > >>
> > > >> Each compression type will just pick whatever configuration they
> need
> > from
> > > >> the k-v pairs defined in this config.
> > > >>
> > > >> Besides clarity, some other benefits are:
> > > >> 1. Extensibility, we don't have to add more configuration when we
> add
> > new
> > > >> compression types, or expose new config for a particular compression
> > type.
> > > >> 2. Even for the same config, different compression type may have
> > different
> > > >> terminologies. With the approach we can honor those terminologies
> > instead
> > > >> of shoehorning them into the same configuration name.
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >>
> > > >> > Hello. I just fixed the draft implementation, with rebasing onto
> the
> > > >> latest
> > > >> > trunk. The KIP was also restored.
> > > >> >
> > > >> > Please have a look, and if there is no major problem, please vote
> > to the
> > > >> > voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
> > > >> >
> > > >> > Thanks,
> > > >> > Dongjin
> > > >> >
> > > >> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > >> >
> > > >> > > Thanks!
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <dongjin@apache.org
> >
> > > >> wrote:
> > > >> > >
> > > >> > > > Hi Ismael,
> > > >> > > >
> > > >> > > > After reviewing
> > > >> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > > >> > > > yes, you are right. If source codec and target codec is
> > identical
> > > >> and
> > > >> > the
> > > >> > > > magic is above 0, the broker can do an in-place assignment,
> > without
> > > >> > > > recompressing. Sorry for my misunderstanding.
> > > >> > > >
> > > >> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > > >> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will
> > revert
> > > >> > those
> > > >> > > > changes and update the KIP with the recent discussions. I will
> > > >> complete
> > > >> > > it
> > > >> > > > in 48 hours from now.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dongjin
> > > >> > > >
> > > >> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <
> dongjin@apache.org
> > >
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > I see. Let me have a check. If not needed, of course, we
> don't
> > > >> have
> > > >> > to
> > > >> > > > > waste on configuration options.
> > > >> > > > >
> > > >> > > > > Since the KIP deadline is imminent, I just opened the voting
> > > >> thread.
> > > >> > > > Let's
> > > >> > > > > continue the discussion here.
> > > >> > > > >
> > > >> > > > > Best,
> > > >> > > > > Dongjin
> > > >> > > > >
> > > >> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <
> > ismael@juma.me.uk>
> > > >> > wrote:
> > > >> > > > >
> > > >> > > > >> Hi Dongjin,
> > > >> > > > >>
> > > >> > > > >> When the compression type is "producer", then the broker
> > doesn't
> > > >> > > > >> recompress
> > > >> > > > >> though. Thinking about it some more, there are some
> uncommon
> > > >> cases
> > > >> > > where
> > > >> > > > >> recompression does happen (the old (and hopefully hardly
> > used by
> > > >> > now)
> > > >> > > > >> message format == 0 and some edge cases), so it is a good
> > point
> > > >> you
> > > >> > > > >> raised.
> > > >> > > > >>
> > > >> > > > >> It's a bit unfortunate to add so many topic configs for
> cases
> > > >> that
> > > >> > > > >> probably
> > > >> > > > >> don't matter. That is, if you are using "producer"
> > compression,
> > > >> you
> > > >> > > > >> probably don't need to configure these settings and can
> live
> > with
> > > >> > the
> > > >> > > > >> defaults. Perhaps we should only support the topic config
> > for the
> > > >> > > cases
> > > >> > > > >> where you are actually recompressing in the broker.
> > > >> > > > >>
> > > >> > > > >> What do you think? I'd be interested in other people's
> > thoughts
> > > >> too.
> > > >> > > > >>
> > > >> > > > >> Ismael
> > > >> > > > >>
> > > >> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <
> > dongjin@apache.org>
> > > >> > > wrote:
> > > >> > > > >>
> > > >> > > > >> > Hi Ismael,
> > > >> > > > >> >
> > > >> > > > >> > It seems like it needs more explanation. Here is the
> > detailed
> > > >> > > > reasoning.
> > > >> > > > >> >
> > > >> > > > >> > You know, topic and broker's 'compression.type' allows
> > > >> > > 'uncompressed',
> > > >> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4,
> > > >> zstd.)
> > > >> > And
> > > >> > > > >> this
> > > >> > > > >> > configuration is used by the broker in the re-compressing
> > > >> process
> > > >> > > > after
> > > >> > > > >> > offset assignment. After this feature, the new configs,
> > > >> > > > >> 'compression.level'
> > > >> > > > >> > and 'compression.buffer.size', also will be used in this
> > > >> process.
> > > >> > > > >> >
> > > >> > > > >> > The problem arises when given topic's compression type
> > > >> (whether it
> > > >> > > was
> > > >> > > > >> > inherited from broker's configuration or explicitly set)
> is
> > > >> > > > 'producer.'
> > > >> > > > >> > With this setting, the compression codec to be used is
> > decided
> > > >> by
> > > >> > > the
> > > >> > > > >> > producer client. Since there is no way to restore the
> > > >> compression
> > > >> > > > level
> > > >> > > > >> and
> > > >> > > > >> > buffer size from the message, we can take the following
> > > >> > strategies:
> > > >> > > > >> >
> > > >> > > > >> > 1. Just use given 'compression.level' and
> > > >> > 'compression.buffer.size'
> > > >> > > > >> > settings.
> > > >> > > > >> >
> > > >> > > > >> > It will cause so many errors. Let's imagine the case of
> > topic's
> > > >> > > > >> > configuration is { compression.type=producer,
> > > >> > compression.level=10,
> > > >> > > > >> > compression.buffer.size=8192 }. In this case, all
> producers
> > > >> with
> > > >> > > gzip
> > > >> > > > or
> > > >> > > > >> > lz4 compressed messages will result in an error. (gzip
> > doesn't
> > > >> > allow
> > > >> > > > >> > compression level 10, and lz4 also for a buffer size of
> > 8192.)
> > > >> > > > >> >
> > > >> > > > >> > 2. Extend the message format to include compression
> > > >> > configurations.
> > > >> > > > >> >
> > > >> > > > >> > With this strategy, we need to change the message format
> -
> > > >> it's a
> > > >> > > too
> > > >> > > > >> big
> > > >> > > > >> > change.
> > > >> > > > >> >
> > > >> > > > >> > 3. If topic's compression.type is 'producer', use the
> > default
> > > >> > > > >> configuration
> > > >> > > > >> > for the given codec.
> > > >> > > > >> >
> > > >> > > > >> > With this strategy, allowing fine-grained compression
> > > >> > configuration
> > > >> > > is
> > > >> > > > >> > meaningless.
> > > >> > > > >> >
> > > >> > > > >> > For the above reasons, I think the only alternative is
> > > >> providing
> > > >> > > > options
> > > >> > > > >> > that can be used when the topic's 'compression.type' is
> > > >> > 'producer.'
> > > >> > > In
> > > >> > > > >> > other words, adding compression.[gzip, lz4, zstd].level
> and
> > > >> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and
> it
> > is
> > > >> > what I
> > > >> > > > >> did in
> > > >> > > > >> > the last modification.
> > > >> > > > >> >
> > > >> > > > >> > (wait, the reasoning above should be included in the KIP
> > in the
> > > >> > > > rejected
> > > >> > > > >> > alternatives section, isn't it?)
> > > >> > > > >> >
> > > >> > > > >> > Thanks,
> > > >> > > > >> > Dongjin
> > > >> > > > >> >
> > > >> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <
> > ismael@juma.me.uk
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >> >
> > > >> > > > >> > > Hi Dongjin,
> > > >> > > > >> > >
> > > >> > > > >> > > For topic level, you can only have a single compression
> > type
> > > >> so
> > > >> > > the
> > > >> > > > >> way
> > > >> > > > >> > it
> > > >> > > > >> > > was before was fine, right? The point you raise is how
> > to set
> > > >> > > broker
> > > >> > > > >> > > defaults that vary depending on the compression type,
> > > >> correct?
> > > >> > > > >> > >
> > > >> > > > >> > > Ismael
> > > >> > > > >> > >
> > > >> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> > > >> > dongjin@apache.org>
> > > >> > > > >> wrote:
> > > >> > > > >> > >
> > > >> > > > >> > > > I just realized that there was a missing hole in the
> > KIP,
> > > >> so I
> > > >> > > > fixed
> > > >> > > > >> > it.
> > > >> > > > >> > > > The draft implementation will be updated soon.
> > > >> > > > >> > > >
> > > >> > > > >> > > > In short, the proposed change did not regard the case
> > of
> > > >> the
> > > >> > > topic
> > > >> > > > >> or
> > > >> > > > >> > > > broker's 'compression.type' is 'producer'; in this
> > case,
> > > >> the
> > > >> > > > broker
> > > >> > > > >> has
> > > >> > > > >> > > to
> > > >> > > > >> > > > handle all kinds of the supported codec. So I added
> > > >> additional
> > > >> > > > >> options
> > > >> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > > >> > > > >> > compression.[gzip,snappy,lz4,
> > > >> > > > >> > > > zstd].buffer.size) with handling routines.
> > > >> > > > >> > > >
> > > >> > > > >> > > > Please have a look when you are free.
> > > >> > > > >> > > >
> > > >> > > > >> > > > Thanks,
> > > >> > > > >> > > > Dongjin
> > > >> > > > >> > > >
> > > >> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> > > >> > dongjin@apache.org>
> > > >> > > > >> wrote:
> > > >> > > > >> > > >
> > > >> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Best,
> > > >> > > > >> > > > > Dongjin
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> > > >> > ismaelj@gmail.com
> > > >> > > >
> > > >> > > > >> > wrote:
> > > >> > > > >> > > > >
> > > >> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should
> > mention
> > > >> > that
> > > >> > > > the
> > > >> > > > >> > > broker
> > > >> > > > >> > > > >> side configs are also topic configs (i.e. can be
> set
> > > >> for a
> > > >> > > > given
> > > >> > > > >> > > topic).
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> Ismael
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > > >> > > dongjin@apache.org
> > > >> > > > >> > wrote:
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >> > Happy new year.
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > I just updated the title and contents of KIP and
> > Jira
> > > >> > > issue,
> > > >> > > > >> with
> > > >> > > > >> > > > >> updated
> > > >> > > > >> > > > >> > draft implementation. Now both of compression
> > level
> > > >> and
> > > >> > > > buffer
> > > >> > > > >> > size
> > > >> > > > >> > > > >> options
> > > >> > > > >> > > > >> > are available to producer and broker
> > configuration.
> > > >> You
> > > >> > can
> > > >> > > > >> check
> > > >> > > > >> > > the
> > > >> > > > >> > > > >> > updated KIP from modified URL:
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >>
> > > >> > > > >> > > >
> > > >> > > > >> > >
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > Please have a look when you are free.
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > Thanks,
> > > >> > > > >> > > > >> > Dongjin
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > > >> > > > ismaelj@gmail.com
> > > >> > > > >> >
> > > >> > > > >> > > > wrote:
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > > The updated title sounds fine to me.
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > Ismael
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > > >> > > > dongjin@apache.org
> > > >> > > > >> > > wrote:
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> > > > Hi Ismael,
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > Got it. Your direction is perfectly
> > reasonable. I
> > > >> am
> > > >> > > now
> > > >> > > > >> > > updating
> > > >> > > > >> > > > >> the
> > > >> > > > >> > > > >> > KIP
> > > >> > > > >> > > > >> > > > document and the implementation.
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > By allowing the buffer/block size to be
> > > >> configurable,
> > > >> > > it
> > > >> > > > >> would
> > > >> > > > >> > > be
> > > >> > > > >> > > > >> > better
> > > >> > > > >> > > > >> > > to
> > > >> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> > > >> fine-grained
> > > >> > > > >> > > configuration
> > > >> > > > >> > > > >> for
> > > >> > > > >> > > > >> > > > compression'. Is that right?
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > @Other committers:
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > Is there any other opinion on allowing the
> > > >> > buffer/block
> > > >> > > > >> size
> > > >> > > > >> > to
> > > >> > > > >> > > be
> > > >> > > > >> > > > >> > > > configurable?
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > Thanks,
> > > >> > > > >> > > > >> > > > Dongjin
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma
> <
> > > >> > > > >> > ismael@juma.me.uk>
> > > >> > > > >> > > > >> wrote:
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > > Hi Dongjin,
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > To clarify, I mean a broker topic config
> > with
> > > >> > regards
> > > >> > > > to
> > > >> > > > >> > point
> > > >> > > > >> > > > 1.
> > > >> > > > >> > > > >> As
> > > >> > > > >> > > > >> > > you
> > > >> > > > >> > > > >> > > > > know, compression can be done by the
> > producer
> > > >> > and/or
> > > >> > > by
> > > >> > > > >> the
> > > >> > > > >> > > > >> broker.
> > > >> > > > >> > > > >> > The
> > > >> > > > >> > > > >> > > > > default is for the broker to just use
> > whatever
> > > >> > > > >> compression
> > > >> > > > >> > was
> > > >> > > > >> > > > >> used
> > > >> > > > >> > > > >> > by
> > > >> > > > >> > > > >> > > > the
> > > >> > > > >> > > > >> > > > > producer, but this can be changed by the
> > user
> > > >> on a
> > > >> > > per
> > > >> > > > >> topic
> > > >> > > > >> > > > >> basis.
> > > >> > > > >> > > > >> > It
> > > >> > > > >> > > > >> > > > > seems like it would make sense for the
> > configs
> > > >> to
> > > >> > be
> > > >> > > .
> > > >> > > > >> > > > consistent
> > > >> > > > >> > > > >> > > between
> > > >> > > > >> > > > >> > > > > producer and broker.
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> > > >> > implementation,
> > > >> > > > but
> > > >> > > > >> we
> > > >> > > > >> > > > could
> > > >> > > > >> > > > >> do
> > > >> > > > >> > > > >> > it
> > > >> > > > >> > > > >> > > > in
> > > >> > > > >> > > > >> > > > > the `CompressionType` enum by invoking the
> > right
> > > >> > > > >> constructor
> > > >> > > > >> > > or
> > > >> > > > >> > > > >> > > > retrieving
> > > >> > > > >> > > > >> > > > > the default value via a constant (if
> > defined).
> > > >> > That's
> > > >> > > > an
> > > >> > > > >> > > > >> > implementation
> > > >> > > > >> > > > >> > > > > detail and can be discussed in the PR. The
> > more
> > > >> > > general
> > > >> > > > >> > point
> > > >> > > > >> > > is
> > > >> > > > >> > > > >> to
> > > >> > > > >> > > > >> > > rely
> > > >> > > > >> > > > >> > > > on
> > > >> > > > >> > > > >> > > > > the library defaults instead of choosing
> one
> > > >> > > ourselves.
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that
> in
> > this
> > > >> > KIP.
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > Ismael
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin
> Lee
> > <
> > > >> > > > >> > > dongjin@apache.org
> > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > wrote:
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > *1. About topic config*
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > After some consideration, I concluded
> that
> > > >> topic
> > > >> > > > config
> > > >> > > > >> > > > doesn't
> > > >> > > > >> > > > >> > need
> > > >> > > > >> > > > >> > > to
> > > >> > > > >> > > > >> > > > > > support compression.level. Here is why:
> > since
> > > >> the
> > > >> > > > >> > > compression
> > > >> > > > >> > > > is
> > > >> > > > >> > > > >> > > > > conducted
> > > >> > > > >> > > > >> > > > > > by the client, the one who can select
> the
> > best
> > > >> > > > >> compression
> > > >> > > > >> > > > >> level is
> > > >> > > > >> > > > >> > > the
> > > >> > > > >> > > > >> > > > > > client itself. Let us assume that the
> > > >> compression
> > > >> > > > >> level is
> > > >> > > > >> > > set
> > > >> > > > >> > > > >> at
> > > >> > > > >> > > > >> > the
> > > >> > > > >> > > > >> > > > > topic
> > > >> > > > >> > > > >> > > > > > config level. In that case, there is a
> > > >> > possibility
> > > >> > > > that
> > > >> > > > >> > the
> > > >> > > > >> > > > >> > > compression
> > > >> > > > >> > > > >> > > > > > level is not optimal for some producers.
> > > >> > Actually,
> > > >> > > > >> Kafka's
> > > >> > > > >> > > go
> > > >> > > > >> > > > >> > client
> > > >> > > > >> > > > >> > > > also
> > > >> > > > >> > > > >> > > > > > supports compression level functionality
> > for
> > > >> the
> > > >> > > > >> producer
> > > >> > > > >> > > > config
> > > >> > > > >> > > > >> > > only.
> > > >> > > > >> > > > >> > > > > > <
> > > >> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > > >> > > > >> > > > >> (wait,
> > > >> > > > >> > > > >> > do
> > > >> > > > >> > > > >> > > we
> > > >> > > > >> > > > >> > > > > > need
> > > >> > > > >> > > > >> > > > > > to add this reasoning in the KIP,
> rejected
> > > >> > > > alternatives
> > > >> > > > >> > > > >> section?)
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > *2. About default level*
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > As of current draft implementation, the
> > > >> default
> > > >> > > > >> > compression
> > > >> > > > >> > > is
> > > >> > > > >> > > > >> set
> > > >> > > > >> > > > >> > on
> > > >> > > > >> > > > >> > > > the
> > > >> > > > >> > > > >> > > > > > CompressionType enum. Of course,
> changing
> > this
> > > >> > > > strategy
> > > >> > > > >> > into
> > > >> > > > >> > > > >> > relying
> > > >> > > > >> > > > >> > > > on a
> > > >> > > > >> > > > >> > > > > > method from the library to pick the
> > default
> > > >> > > > compression
> > > >> > > > >> > > level
> > > >> > > > >> > > > >> seems
> > > >> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream`
> > does.
> > > >> In
> > > >> > > this
> > > >> > > > >> case,
> > > >> > > > >> > > we
> > > >> > > > >> > > > >> need
> > > >> > > > >> > > > >> > to
> > > >> > > > >> > > > >> > > > add
> > > >> > > > >> > > > >> > > > > > similar wrapper class for zstd and
> modify
> > lz4
> > > >> the
> > > >> > > > >> wrapper
> > > >> > > > >> > > > also.
> > > >> > > > >> > > > >> Add
> > > >> > > > >> > > > >> > > to
> > > >> > > > >> > > > >> > > > > > this, it seems like we need to
> explicitly
> > > >> state
> > > >> > > that
> > > >> > > > we
> > > >> > > > >> > > follow
> > > >> > > > >> > > > >> the
> > > >> > > > >> > > > >> > > > > default
> > > >> > > > >> > > > >> > > > > > compression level of the codec in the
> > > >> > > documentation.
> > > >> > > > Is
> > > >> > > > >> > this
> > > >> > > > >> > > > >> what
> > > >> > > > >> > > > >> > you
> > > >> > > > >> > > > >> > > > > > intended?
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block
> > size to
> > > >> be
> > > >> > > > >> > > configurable*
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > Well, As of current draft
> implementation,
> > the
> > > >> lz4
> > > >> > > > >> level is
> > > >> > > > >> > > > >> > > implemented
> > > >> > > > >> > > > >> > > > as
> > > >> > > > >> > > > >> > > > > > block size; this is caused by my
> > > >> misunderstanding
> > > >> > > on
> > > >> > > > >> lz4.
> > > >> > > > >> > > > After
> > > >> > > > >> > > > >> > > > reviewing
> > > >> > > > >> > > > >> > > > > > lz4 today, I found that it also supports
> > > >> > > compression
> > > >> > > > >> level
> > > >> > > > >> > > of
> > > >> > > > >> > > > >> 1~16
> > > >> > > > >> > > > >> > > > > > (default: 1), not block size. I will fix
> > it in
> > > >> > this
> > > >> > > > >> > weekend
> > > >> > > > >> > > by
> > > >> > > > >> > > > >> > > updating
> > > >> > > > >> > > > >> > > > > the
> > > >> > > > >> > > > >> > > > > > wrapper class.
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > For the problem of the buffer/block
> size,
> > I
> > > >> have
> > > >> > no
> > > >> > > > >> strong
> > > >> > > > >> > > > >> opinion.
> > > >> > > > >> > > > >> > > If
> > > >> > > > >> > > > >> > > > > the
> > > >> > > > >> > > > >> > > > > > community needs it, I will do it all
> > together.
> > > >> > How
> > > >> > > do
> > > >> > > > >> you
> > > >> > > > >> > > > think?
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > In short, it seems like I need to update
> > the
> > > >> KIP
> > > >> > > > >> document
> > > >> > > > >> > > for
> > > >> > > > >> > > > >> issue
> > > >> > > > >> > > > >> > > #1
> > > >> > > > >> > > > >> > > > > and
> > > >> > > > >> > > > >> > > > > > update the compression wrapper for issue
> > #2,
> > > >> #3.
> > > >> > Is
> > > >> > > > >> this
> > > >> > > > >> > > okay?
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > Thanks,
> > > >> > > > >> > > > >> > > > > > Dongjin
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael
> > Juma <
> > > >> > > > >> > > > ismaelj@gmail.com
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > > > wrote:
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful.
> A
> > few
> > > >> > > > >> questions:
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > > > 1. Have we considered whether we want
> to
> > > >> allow
> > > >> > a
> > > >> > > > >> similar
> > > >> > > > >> > > > topic
> > > >> > > > >> > > > >> > > > config?
> > > >> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the
> > library
> > > >> to
> > > >> > > pick
> > > >> > > > >> the
> > > >> > > > >> > > > >> default
> > > >> > > > >> > > > >> > > > > > compression
> > > >> > > > >> > > > >> > > > > > > level if compression.level is not set?
> > We
> > > >> do it
> > > >> > > for
> > > >> > > > >> gzip
> > > >> > > > >> > > and
> > > >> > > > >> > > > >> it
> > > >> > > > >> > > > >> > > would
> > > >> > > > >> > > > >> > > > > > seem
> > > >> > > > >> > > > >> > > > > > > reasonable to do something similar for
> > the
> > > >> > other
> > > >> > > > >> > > compression
> > > >> > > > >> > > > >> > > > libraries.
> > > >> > > > >> > > > >> > > > > > > 3. Do we want to allow the
> buffer/block
> > > >> size to
> > > >> > > be
> > > >> > > > >> > > > >> configurable?
> > > >> > > > >> > > > >> > > This
> > > >> > > > >> > > > >> > > > > has
> > > >> > > > >> > > > >> > > > > > > an impact on memory usage and people
> may
> > > >> want
> > > >> > to
> > > >> > > > >> trade
> > > >> > > > >> > > > >> > compression
> > > >> > > > >> > > > >> > > > for
> > > >> > > > >> > > > >> > > > > > > less/more memory in some cases. For
> > example,
> > > >> > the
> > > >> > > > >> default
> > > >> > > > >> > > for
> > > >> > > > >> > > > >> LZ4
> > > >> > > > >> > > > >> > is
> > > >> > > > >> > > > >> > > > > 64KB
> > > >> > > > >> > > > >> > > > > > > which is a bit high.
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > > > Ismael
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin
> > Lee <
> > > >> > > > >> > > > dongjin@apache.org
> > > >> > > > >> > > > >> > > wrote:
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > > > > Hello dev,
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > > > I hope to initiate the discussion of
> > > >> KIP-390:
> > > >> > > Add
> > > >> > > > >> > > producer
> > > >> > > > >> > > > >> > option
> > > >> > > > >> > > > >> > > > to
> > > >> > > > >> > > > >> > > > > > > adjust
> > > >> > > > >> > > > >> > > > > > > > compression level
> > > >> > > > >> > > > >> > > > > > > > <
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >>
> > > >> > > > >> > > >
> > > >> > > > >> > >
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > >> > > > >> > > > >> > > > > > > > >.
> > > >> > > > >> > > > >> > > > > > > > All feedbacks will be highly
> > appreciated.
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > > > Best,
> > > >> > > > >> > > > >> > > > > > > > Dongjin
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > > > --
> > > >> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical
> > world.*
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > > >> > > > >> > > github.com/dongjinleekr
> > > >> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr
> > >linkedin:
> > > >> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > >> > > > > > > > <
> > http://kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> >slideshare:
> > > >> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > >> > > > >> > > > >> > > > > > > > <
> > http://www.slideshare.net/dongjinleekr>*
> > > >> > > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > >> > > > > > >
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > --
> > > >> > > > >> > > > >> > > > > > *Dongjin Lee*
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical
> world.*
> > > >> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > > >> > > > >> github.com/dongjinleekr
> > > >> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr
> > >linkedin:
> > > >> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > >> > > > > > <
> https://kr.linkedin.com/in/dongjinleekr
> > > >> > > > >speakerdeck:
> > > >> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > > >> > > > > >
> > > >> > > > >> > > > >> > > > >
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > --
> > > >> > > > >> > > > >> > > > *Dongjin Lee*
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > >> > > > github.com/dongjinleekr
> > > >> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > > >> > >speakerdeck:
> > > >> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > > >> > > >
> > > >> > > > >> > > > >> > >
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > --
> > > >> > > > >> > > > >> > *Dongjin Lee*
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > > >> > github.com/dongjinleekr
> > > >> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr
> > >speakerdeck:
> > > >> > > > >> > > > >> > speakerdeck.com/dongjin
> > > >> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > > >> >
> > > >> > > > >> > > > >>
> > > >> > > > >> > > > >
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > --
> > > >> > > > >> > > > > *Dongjin Lee*
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > > *github:  <http://goog_969573159/>
> > > >> github.com/dongjinleekr
> > > >> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr
> > >speakerdeck:
> > > >> > > > >> > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > > >
> > > >> > > > >> > > >
> > > >> > > > >> > > >
> > > >> > > > >> > > > --
> > > >> > > > >> > > > *Dongjin Lee*
> > > >> > > > >> > > >
> > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> > > >> > > > >> > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > >
> > > >> > > > >> > >
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > --
> > > >> > > > >> > *Dongjin Lee*
> > > >> > > > >> >
> > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > >> > speakerdeck.com/dongjin
> > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > *Dongjin Lee*
> > > >> > > > >
> > > >> > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > speakerdeck.com/dongjin
> > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > *Dongjin Lee*
> > > >> > > >
> > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > speakerdeck.com/dongjin
> > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > *Dongjin Lee*
> > > >> >
> > > >> > *A hitchhiker in the mathematical world.*
> > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > speakerdeck.com/dongjin
> > > >> > <https://speakerdeck.com/dongjin>*
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Mickael,

It seems like the majority of the community agrees that the new config
scheme proposed by Becket is much better. However, we still need another
KIP to support this kind of config; it is different from the case of
`listener.security.protocol.map,` which is just a concatenation of
available settings (i.e., string type.) For Becket's idea, we need to add
support for the key-value style.

It seems like we need advice from a committer, whether to add a new
prerequisite KIP for the map type configuration to implement Becket's idea.

Committers:

Could you give us some advice on this problem?

Best,
Dongjin

On Wed, Jan 30, 2019 at 11:30 PM Mickael Maison <mi...@gmail.com>
wrote:

> Thanks, that's a very interesting KIP!
>
> I agree with Becket that a clearer config format is likely to help.
> Have you considered using a config format similar to listeners (as
> described in the "Updating SSL Keystore of an Existing Listener" in
> the Kafka docs)?
>
> Also worth noting that we already have configs that effectively take a
> map, for example: listener.security.protocol.map so not sure if we
> need an additional KIP
>
> On Wed, Jan 30, 2019 at 5:22 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > Hello.
> >
> > Do you have any idea on Becket's Idea of new config format (example
> below)?
> >
> > ```
> > compression.config="gzip.compression.level=5, lz4.compression.level=17,
> > zstd.compression.level=22"
> > ```
> >
> > It requires some additional KIP for supporting new config format (map),
> but
> > it can significantly simplify the configuration with flexibility and
> > extensibility. If you prefer this way, I hope to carry the ball.
> >
> > If not, please give me an opinion here or the voting thread.
> >
> > Thanks,
> > Dongjin
> >
> >
> > On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Hi Becket,
> > >
> > > Thank you for your opinion. Frankly, I have no strong opinion on
> > > configuration name. In this problem, I will follow the community's
> choice.
> > > (I like your idea in that it has a notion of 'scope' per compression
> codec.
> > > However, it should be implemented on top of new config type like Map;
> It
> > > will require another KIP as a prerequisite, but if the community prefer
> > > this way, I will take the task.)
> > >
> > > (One minor correction: the one who thought 'producer' compression
> config
> > > would cause a problem at broker was me, not Ismael - and Ismael
> reassured
> > > me there will be no problem with it.)
> > >
> > > To All,
> > >
> > > How about Becket's idea of 'compression.config' option?
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > >> Hi Dongjin,
> > >>
> > >> Thanks for the KIP and sorry for being a bit late on the discussion.
> > >>
> > >> It makes sense to expose the configuration for compression types. But
> I am
> > >> wondering if there is a better way to do that than what proposed in
> the
> > >> KIP. What I feel confusing is that we are effectively sharing the
> > >> configuration across different compression types, the meaning of the
> > >> configuration are actually kind of different depending on the
> compression
> > >> type. This will end up with issues like what Ismael has brought up
> > >> earlier.
> > >> Say if the broker has compression type of producer (this may result in
> > >> mixed compression type in the same topic), and for some reason the
> broker
> > >> needs to re-compress the topic (e.g. due to log compaction), a single
> > >> topic
> > >> level compression config may not work, because a valid compression
> level
> > >> for lz4 maybe invalid for gzip.
> > >>
> > >> One alternative I am thinking is to provide a "compression.config"
> > >> configuration, inside which it specifies configuration used by each
> > >> specific compression type as k-v pairs. The format could use some name
> > >> space as well. For example,
> > >>
> > >>
> > >>
> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
> > >>
> > >> Each compression type will just pick whatever configuration they need
> from
> > >> the k-v pairs defined in this config.
> > >>
> > >> Besides clarity, some other benefits are:
> > >> 1. Extensibility, we don't have to add more configuration when we add
> new
> > >> compression types, or expose new config for a particular compression
> type.
> > >> 2. Even for the same config, different compression type may have
> different
> > >> terminologies. With the approach we can honor those terminologies
> instead
> > >> of shoehorning them into the same configuration name.
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >>
> > >> > Hello. I just fixed the draft implementation, with rebasing onto the
> > >> latest
> > >> > trunk. The KIP was also restored.
> > >> >
> > >> > Please have a look, and if there is no major problem, please vote
> to the
> > >> > voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
> > >> >
> > >> > Thanks,
> > >> > Dongjin
> > >> >
> > >> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > >> >
> > >> > > Thanks!
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org>
> > >> wrote:
> > >> > >
> > >> > > > Hi Ismael,
> > >> > > >
> > >> > > > After reviewing
> > >> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > >> > > > yes, you are right. If source codec and target codec is
> identical
> > >> and
> > >> > the
> > >> > > > magic is above 0, the broker can do an in-place assignment,
> without
> > >> > > > recompressing. Sorry for my misunderstanding.
> > >> > > >
> > >> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > >> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will
> revert
> > >> > those
> > >> > > > changes and update the KIP with the recent discussions. I will
> > >> complete
> > >> > > it
> > >> > > > in 48 hours from now.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Dongjin
> > >> > > >
> > >> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <dongjin@apache.org
> >
> > >> > wrote:
> > >> > > >
> > >> > > > > I see. Let me have a check. If not needed, of course, we don't
> > >> have
> > >> > to
> > >> > > > > waste on configuration options.
> > >> > > > >
> > >> > > > > Since the KIP deadline is imminent, I just opened the voting
> > >> thread.
> > >> > > > Let's
> > >> > > > > continue the discussion here.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Dongjin
> > >> > > > >
> > >> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <
> ismael@juma.me.uk>
> > >> > wrote:
> > >> > > > >
> > >> > > > >> Hi Dongjin,
> > >> > > > >>
> > >> > > > >> When the compression type is "producer", then the broker
> doesn't
> > >> > > > >> recompress
> > >> > > > >> though. Thinking about it some more, there are some uncommon
> > >> cases
> > >> > > where
> > >> > > > >> recompression does happen (the old (and hopefully hardly
> used by
> > >> > now)
> > >> > > > >> message format == 0 and some edge cases), so it is a good
> point
> > >> you
> > >> > > > >> raised.
> > >> > > > >>
> > >> > > > >> It's a bit unfortunate to add so many topic configs for cases
> > >> that
> > >> > > > >> probably
> > >> > > > >> don't matter. That is, if you are using "producer"
> compression,
> > >> you
> > >> > > > >> probably don't need to configure these settings and can live
> with
> > >> > the
> > >> > > > >> defaults. Perhaps we should only support the topic config
> for the
> > >> > > cases
> > >> > > > >> where you are actually recompressing in the broker.
> > >> > > > >>
> > >> > > > >> What do you think? I'd be interested in other people's
> thoughts
> > >> too.
> > >> > > > >>
> > >> > > > >> Ismael
> > >> > > > >>
> > >> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <
> dongjin@apache.org>
> > >> > > wrote:
> > >> > > > >>
> > >> > > > >> > Hi Ismael,
> > >> > > > >> >
> > >> > > > >> > It seems like it needs more explanation. Here is the
> detailed
> > >> > > > reasoning.
> > >> > > > >> >
> > >> > > > >> > You know, topic and broker's 'compression.type' allows
> > >> > > 'uncompressed',
> > >> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4,
> > >> zstd.)
> > >> > And
> > >> > > > >> this
> > >> > > > >> > configuration is used by the broker in the re-compressing
> > >> process
> > >> > > > after
> > >> > > > >> > offset assignment. After this feature, the new configs,
> > >> > > > >> 'compression.level'
> > >> > > > >> > and 'compression.buffer.size', also will be used in this
> > >> process.
> > >> > > > >> >
> > >> > > > >> > The problem arises when given topic's compression type
> > >> (whether it
> > >> > > was
> > >> > > > >> > inherited from broker's configuration or explicitly set) is
> > >> > > > 'producer.'
> > >> > > > >> > With this setting, the compression codec to be used is
> decided
> > >> by
> > >> > > the
> > >> > > > >> > producer client. Since there is no way to restore the
> > >> compression
> > >> > > > level
> > >> > > > >> and
> > >> > > > >> > buffer size from the message, we can take the following
> > >> > strategies:
> > >> > > > >> >
> > >> > > > >> > 1. Just use given 'compression.level' and
> > >> > 'compression.buffer.size'
> > >> > > > >> > settings.
> > >> > > > >> >
> > >> > > > >> > It will cause so many errors. Let's imagine the case of
> topic's
> > >> > > > >> > configuration is { compression.type=producer,
> > >> > compression.level=10,
> > >> > > > >> > compression.buffer.size=8192 }. In this case, all producers
> > >> with
> > >> > > gzip
> > >> > > > or
> > >> > > > >> > lz4 compressed messages will result in an error. (gzip
> doesn't
> > >> > allow
> > >> > > > >> > compression level 10, and lz4 also for a buffer size of
> 8192.)
> > >> > > > >> >
> > >> > > > >> > 2. Extend the message format to include compression
> > >> > configurations.
> > >> > > > >> >
> > >> > > > >> > With this strategy, we need to change the message format -
> > >> it's a
> > >> > > too
> > >> > > > >> big
> > >> > > > >> > change.
> > >> > > > >> >
> > >> > > > >> > 3. If topic's compression.type is 'producer', use the
> default
> > >> > > > >> configuration
> > >> > > > >> > for the given codec.
> > >> > > > >> >
> > >> > > > >> > With this strategy, allowing fine-grained compression
> > >> > configuration
> > >> > > is
> > >> > > > >> > meaningless.
> > >> > > > >> >
> > >> > > > >> > For the above reasons, I think the only alternative is
> > >> providing
> > >> > > > options
> > >> > > > >> > that can be used when the topic's 'compression.type' is
> > >> > 'producer.'
> > >> > > In
> > >> > > > >> > other words, adding compression.[gzip, lz4, zstd].level and
> > >> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and it
> is
> > >> > what I
> > >> > > > >> did in
> > >> > > > >> > the last modification.
> > >> > > > >> >
> > >> > > > >> > (wait, the reasoning above should be included in the KIP
> in the
> > >> > > > rejected
> > >> > > > >> > alternatives section, isn't it?)
> > >> > > > >> >
> > >> > > > >> > Thanks,
> > >> > > > >> > Dongjin
> > >> > > > >> >
> > >> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <
> ismael@juma.me.uk
> > >> >
> > >> > > > wrote:
> > >> > > > >> >
> > >> > > > >> > > Hi Dongjin,
> > >> > > > >> > >
> > >> > > > >> > > For topic level, you can only have a single compression
> type
> > >> so
> > >> > > the
> > >> > > > >> way
> > >> > > > >> > it
> > >> > > > >> > > was before was fine, right? The point you raise is how
> to set
> > >> > > broker
> > >> > > > >> > > defaults that vary depending on the compression type,
> > >> correct?
> > >> > > > >> > >
> > >> > > > >> > > Ismael
> > >> > > > >> > >
> > >> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> > >> > dongjin@apache.org>
> > >> > > > >> wrote:
> > >> > > > >> > >
> > >> > > > >> > > > I just realized that there was a missing hole in the
> KIP,
> > >> so I
> > >> > > > fixed
> > >> > > > >> > it.
> > >> > > > >> > > > The draft implementation will be updated soon.
> > >> > > > >> > > >
> > >> > > > >> > > > In short, the proposed change did not regard the case
> of
> > >> the
> > >> > > topic
> > >> > > > >> or
> > >> > > > >> > > > broker's 'compression.type' is 'producer'; in this
> case,
> > >> the
> > >> > > > broker
> > >> > > > >> has
> > >> > > > >> > > to
> > >> > > > >> > > > handle all kinds of the supported codec. So I added
> > >> additional
> > >> > > > >> options
> > >> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > >> > > > >> > compression.[gzip,snappy,lz4,
> > >> > > > >> > > > zstd].buffer.size) with handling routines.
> > >> > > > >> > > >
> > >> > > > >> > > > Please have a look when you are free.
> > >> > > > >> > > >
> > >> > > > >> > > > Thanks,
> > >> > > > >> > > > Dongjin
> > >> > > > >> > > >
> > >> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> > >> > dongjin@apache.org>
> > >> > > > >> wrote:
> > >> > > > >> > > >
> > >> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > >> > > > >> > > > >
> > >> > > > >> > > > > Best,
> > >> > > > >> > > > > Dongjin
> > >> > > > >> > > > >
> > >> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> > >> > ismaelj@gmail.com
> > >> > > >
> > >> > > > >> > wrote:
> > >> > > > >> > > > >
> > >> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should
> mention
> > >> > that
> > >> > > > the
> > >> > > > >> > > broker
> > >> > > > >> > > > >> side configs are also topic configs (i.e. can be set
> > >> for a
> > >> > > > given
> > >> > > > >> > > topic).
> > >> > > > >> > > > >>
> > >> > > > >> > > > >> Ismael
> > >> > > > >> > > > >>
> > >> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > >> > > dongjin@apache.org
> > >> > > > >> > wrote:
> > >> > > > >> > > > >>
> > >> > > > >> > > > >> > Happy new year.
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > I just updated the title and contents of KIP and
> Jira
> > >> > > issue,
> > >> > > > >> with
> > >> > > > >> > > > >> updated
> > >> > > > >> > > > >> > draft implementation. Now both of compression
> level
> > >> and
> > >> > > > buffer
> > >> > > > >> > size
> > >> > > > >> > > > >> options
> > >> > > > >> > > > >> > are available to producer and broker
> configuration.
> > >> You
> > >> > can
> > >> > > > >> check
> > >> > > > >> > > the
> > >> > > > >> > > > >> > updated KIP from modified URL:
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >>
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > Please have a look when you are free.
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > Thanks,
> > >> > > > >> > > > >> > Dongjin
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > >> > > > ismaelj@gmail.com
> > >> > > > >> >
> > >> > > > >> > > > wrote:
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > > The updated title sounds fine to me.
> > >> > > > >> > > > >> > >
> > >> > > > >> > > > >> > > Ismael
> > >> > > > >> > > > >> > >
> > >> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > >> > > > dongjin@apache.org
> > >> > > > >> > > wrote:
> > >> > > > >> > > > >> > >
> > >> > > > >> > > > >> > > > Hi Ismael,
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > Got it. Your direction is perfectly
> reasonable. I
> > >> am
> > >> > > now
> > >> > > > >> > > updating
> > >> > > > >> > > > >> the
> > >> > > > >> > > > >> > KIP
> > >> > > > >> > > > >> > > > document and the implementation.
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > By allowing the buffer/block size to be
> > >> configurable,
> > >> > > it
> > >> > > > >> would
> > >> > > > >> > > be
> > >> > > > >> > > > >> > better
> > >> > > > >> > > > >> > > to
> > >> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> > >> fine-grained
> > >> > > > >> > > configuration
> > >> > > > >> > > > >> for
> > >> > > > >> > > > >> > > > compression'. Is that right?
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > @Other committers:
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > Is there any other opinion on allowing the
> > >> > buffer/block
> > >> > > > >> size
> > >> > > > >> > to
> > >> > > > >> > > be
> > >> > > > >> > > > >> > > > configurable?
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > Thanks,
> > >> > > > >> > > > >> > > > Dongjin
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> > >> > > > >> > ismael@juma.me.uk>
> > >> > > > >> > > > >> wrote:
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > > Hi Dongjin,
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > To clarify, I mean a broker topic config
> with
> > >> > regards
> > >> > > > to
> > >> > > > >> > point
> > >> > > > >> > > > 1.
> > >> > > > >> > > > >> As
> > >> > > > >> > > > >> > > you
> > >> > > > >> > > > >> > > > > know, compression can be done by the
> producer
> > >> > and/or
> > >> > > by
> > >> > > > >> the
> > >> > > > >> > > > >> broker.
> > >> > > > >> > > > >> > The
> > >> > > > >> > > > >> > > > > default is for the broker to just use
> whatever
> > >> > > > >> compression
> > >> > > > >> > was
> > >> > > > >> > > > >> used
> > >> > > > >> > > > >> > by
> > >> > > > >> > > > >> > > > the
> > >> > > > >> > > > >> > > > > producer, but this can be changed by the
> user
> > >> on a
> > >> > > per
> > >> > > > >> topic
> > >> > > > >> > > > >> basis.
> > >> > > > >> > > > >> > It
> > >> > > > >> > > > >> > > > > seems like it would make sense for the
> configs
> > >> to
> > >> > be
> > >> > > .
> > >> > > > >> > > > consistent
> > >> > > > >> > > > >> > > between
> > >> > > > >> > > > >> > > > > producer and broker.
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> > >> > implementation,
> > >> > > > but
> > >> > > > >> we
> > >> > > > >> > > > could
> > >> > > > >> > > > >> do
> > >> > > > >> > > > >> > it
> > >> > > > >> > > > >> > > > in
> > >> > > > >> > > > >> > > > > the `CompressionType` enum by invoking the
> right
> > >> > > > >> constructor
> > >> > > > >> > > or
> > >> > > > >> > > > >> > > > retrieving
> > >> > > > >> > > > >> > > > > the default value via a constant (if
> defined).
> > >> > That's
> > >> > > > an
> > >> > > > >> > > > >> > implementation
> > >> > > > >> > > > >> > > > > detail and can be discussed in the PR. The
> more
> > >> > > general
> > >> > > > >> > point
> > >> > > > >> > > is
> > >> > > > >> > > > >> to
> > >> > > > >> > > > >> > > rely
> > >> > > > >> > > > >> > > > on
> > >> > > > >> > > > >> > > > > the library defaults instead of choosing one
> > >> > > ourselves.
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that in
> this
> > >> > KIP.
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > Ismael
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee
> <
> > >> > > > >> > > dongjin@apache.org
> > >> > > > >> > > > >
> > >> > > > >> > > > >> > > wrote:
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > *1. About topic config*
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > After some consideration, I concluded that
> > >> topic
> > >> > > > config
> > >> > > > >> > > > doesn't
> > >> > > > >> > > > >> > need
> > >> > > > >> > > > >> > > to
> > >> > > > >> > > > >> > > > > > support compression.level. Here is why:
> since
> > >> the
> > >> > > > >> > > compression
> > >> > > > >> > > > is
> > >> > > > >> > > > >> > > > > conducted
> > >> > > > >> > > > >> > > > > > by the client, the one who can select the
> best
> > >> > > > >> compression
> > >> > > > >> > > > >> level is
> > >> > > > >> > > > >> > > the
> > >> > > > >> > > > >> > > > > > client itself. Let us assume that the
> > >> compression
> > >> > > > >> level is
> > >> > > > >> > > set
> > >> > > > >> > > > >> at
> > >> > > > >> > > > >> > the
> > >> > > > >> > > > >> > > > > topic
> > >> > > > >> > > > >> > > > > > config level. In that case, there is a
> > >> > possibility
> > >> > > > that
> > >> > > > >> > the
> > >> > > > >> > > > >> > > compression
> > >> > > > >> > > > >> > > > > > level is not optimal for some producers.
> > >> > Actually,
> > >> > > > >> Kafka's
> > >> > > > >> > > go
> > >> > > > >> > > > >> > client
> > >> > > > >> > > > >> > > > also
> > >> > > > >> > > > >> > > > > > supports compression level functionality
> for
> > >> the
> > >> > > > >> producer
> > >> > > > >> > > > config
> > >> > > > >> > > > >> > > only.
> > >> > > > >> > > > >> > > > > > <
> > >> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > >> > > > >> > > > >> (wait,
> > >> > > > >> > > > >> > do
> > >> > > > >> > > > >> > > we
> > >> > > > >> > > > >> > > > > > need
> > >> > > > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> > >> > > > alternatives
> > >> > > > >> > > > >> section?)
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > *2. About default level*
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > As of current draft implementation, the
> > >> default
> > >> > > > >> > compression
> > >> > > > >> > > is
> > >> > > > >> > > > >> set
> > >> > > > >> > > > >> > on
> > >> > > > >> > > > >> > > > the
> > >> > > > >> > > > >> > > > > > CompressionType enum. Of course, changing
> this
> > >> > > > strategy
> > >> > > > >> > into
> > >> > > > >> > > > >> > relying
> > >> > > > >> > > > >> > > > on a
> > >> > > > >> > > > >> > > > > > method from the library to pick the
> default
> > >> > > > compression
> > >> > > > >> > > level
> > >> > > > >> > > > >> seems
> > >> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream`
> does.
> > >> In
> > >> > > this
> > >> > > > >> case,
> > >> > > > >> > > we
> > >> > > > >> > > > >> need
> > >> > > > >> > > > >> > to
> > >> > > > >> > > > >> > > > add
> > >> > > > >> > > > >> > > > > > similar wrapper class for zstd and modify
> lz4
> > >> the
> > >> > > > >> wrapper
> > >> > > > >> > > > also.
> > >> > > > >> > > > >> Add
> > >> > > > >> > > > >> > > to
> > >> > > > >> > > > >> > > > > > this, it seems like we need to explicitly
> > >> state
> > >> > > that
> > >> > > > we
> > >> > > > >> > > follow
> > >> > > > >> > > > >> the
> > >> > > > >> > > > >> > > > > default
> > >> > > > >> > > > >> > > > > > compression level of the codec in the
> > >> > > documentation.
> > >> > > > Is
> > >> > > > >> > this
> > >> > > > >> > > > >> what
> > >> > > > >> > > > >> > you
> > >> > > > >> > > > >> > > > > > intended?
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block
> size to
> > >> be
> > >> > > > >> > > configurable*
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > Well, As of current draft implementation,
> the
> > >> lz4
> > >> > > > >> level is
> > >> > > > >> > > > >> > > implemented
> > >> > > > >> > > > >> > > > as
> > >> > > > >> > > > >> > > > > > block size; this is caused by my
> > >> misunderstanding
> > >> > > on
> > >> > > > >> lz4.
> > >> > > > >> > > > After
> > >> > > > >> > > > >> > > > reviewing
> > >> > > > >> > > > >> > > > > > lz4 today, I found that it also supports
> > >> > > compression
> > >> > > > >> level
> > >> > > > >> > > of
> > >> > > > >> > > > >> 1~16
> > >> > > > >> > > > >> > > > > > (default: 1), not block size. I will fix
> it in
> > >> > this
> > >> > > > >> > weekend
> > >> > > > >> > > by
> > >> > > > >> > > > >> > > updating
> > >> > > > >> > > > >> > > > > the
> > >> > > > >> > > > >> > > > > > wrapper class.
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > For the problem of the buffer/block size,
> I
> > >> have
> > >> > no
> > >> > > > >> strong
> > >> > > > >> > > > >> opinion.
> > >> > > > >> > > > >> > > If
> > >> > > > >> > > > >> > > > > the
> > >> > > > >> > > > >> > > > > > community needs it, I will do it all
> together.
> > >> > How
> > >> > > do
> > >> > > > >> you
> > >> > > > >> > > > think?
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > In short, it seems like I need to update
> the
> > >> KIP
> > >> > > > >> document
> > >> > > > >> > > for
> > >> > > > >> > > > >> issue
> > >> > > > >> > > > >> > > #1
> > >> > > > >> > > > >> > > > > and
> > >> > > > >> > > > >> > > > > > update the compression wrapper for issue
> #2,
> > >> #3.
> > >> > Is
> > >> > > > >> this
> > >> > > > >> > > okay?
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > Thanks,
> > >> > > > >> > > > >> > > > > > Dongjin
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael
> Juma <
> > >> > > > >> > > > ismaelj@gmail.com
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > > > wrote:
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A
> few
> > >> > > > >> questions:
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > > > 1. Have we considered whether we want to
> > >> allow
> > >> > a
> > >> > > > >> similar
> > >> > > > >> > > > topic
> > >> > > > >> > > > >> > > > config?
> > >> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the
> library
> > >> to
> > >> > > pick
> > >> > > > >> the
> > >> > > > >> > > > >> default
> > >> > > > >> > > > >> > > > > > compression
> > >> > > > >> > > > >> > > > > > > level if compression.level is not set?
> We
> > >> do it
> > >> > > for
> > >> > > > >> gzip
> > >> > > > >> > > and
> > >> > > > >> > > > >> it
> > >> > > > >> > > > >> > > would
> > >> > > > >> > > > >> > > > > > seem
> > >> > > > >> > > > >> > > > > > > reasonable to do something similar for
> the
> > >> > other
> > >> > > > >> > > compression
> > >> > > > >> > > > >> > > > libraries.
> > >> > > > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block
> > >> size to
> > >> > > be
> > >> > > > >> > > > >> configurable?
> > >> > > > >> > > > >> > > This
> > >> > > > >> > > > >> > > > > has
> > >> > > > >> > > > >> > > > > > > an impact on memory usage and people may
> > >> want
> > >> > to
> > >> > > > >> trade
> > >> > > > >> > > > >> > compression
> > >> > > > >> > > > >> > > > for
> > >> > > > >> > > > >> > > > > > > less/more memory in some cases. For
> example,
> > >> > the
> > >> > > > >> default
> > >> > > > >> > > for
> > >> > > > >> > > > >> LZ4
> > >> > > > >> > > > >> > is
> > >> > > > >> > > > >> > > > > 64KB
> > >> > > > >> > > > >> > > > > > > which is a bit high.
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > > > Ismael
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin
> Lee <
> > >> > > > >> > > > dongjin@apache.org
> > >> > > > >> > > > >> > > wrote:
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > > > > Hello dev,
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > > > I hope to initiate the discussion of
> > >> KIP-390:
> > >> > > Add
> > >> > > > >> > > producer
> > >> > > > >> > > > >> > option
> > >> > > > >> > > > >> > > > to
> > >> > > > >> > > > >> > > > > > > adjust
> > >> > > > >> > > > >> > > > > > > > compression level
> > >> > > > >> > > > >> > > > > > > > <
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > >
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >>
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > >> > > > >> > > > >> > > > > > > > >.
> > >> > > > >> > > > >> > > > > > > > All feedbacks will be highly
> appreciated.
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > > > Best,
> > >> > > > >> > > > >> > > > > > > > Dongjin
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > > > --
> > >> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical
> world.*
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > >> > > > >> > > github.com/dongjinleekr
> > >> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr
> >linkedin:
> > >> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > >> > > > > > > > <
> http://kr.linkedin.com/in/dongjinleekr
> > >> > > > >> >slideshare:
> > >> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > >> > > > >> > > > >> > > > > > > > <
> http://www.slideshare.net/dongjinleekr>*
> > >> > > > >> > > > >> > > > > > > >
> > >> > > > >> > > > >> > > > > > >
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > --
> > >> > > > >> > > > >> > > > > > *Dongjin Lee*
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > >> > > > >> github.com/dongjinleekr
> > >> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr
> >linkedin:
> > >> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> > >> > > > >speakerdeck:
> > >> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > >> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > > >> > > > > >
> > >> > > > >> > > > >> > > > >
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > --
> > >> > > > >> > > > >> > > > *Dongjin Lee*
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > >> > > > github.com/dongjinleekr
> > >> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > >> > >speakerdeck:
> > >> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > >> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > > >> > > >
> > >> > > > >> > > > >> > >
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > --
> > >> > > > >> > > > >> > *Dongjin Lee*
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > >> > github.com/dongjinleekr
> > >> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> > >> > > > >> > > > >> > speakerdeck.com/dongjin
> > >> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > > >> >
> > >> > > > >> > > > >>
> > >> > > > >> > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > > > --
> > >> > > > >> > > > > *Dongjin Lee*
> > >> > > > >> > > > >
> > >> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > > *github:  <http://goog_969573159/>
> > >> github.com/dongjinleekr
> > >> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> > >> > > > >> > > > speakerdeck.com/dongjin
> > >> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > > >
> > >> > > > >> > > >
> > >> > > > >> > > >
> > >> > > > >> > > > --
> > >> > > > >> > > > *Dongjin Lee*
> > >> > > > >> > > >
> > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > >> > > > speakerdeck.com/dongjin
> > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > --
> > >> > > > >> > *Dongjin Lee*
> > >> > > > >> >
> > >> > > > >> > *A hitchhiker in the mathematical world.*
> > >> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > >> > speakerdeck.com/dongjin
> > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > *Dongjin Lee*
> > >> > > > >
> > >> > > > > *A hitchhiker in the mathematical world.*
> > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > speakerdeck.com/dongjin
> > >> > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Dongjin Lee*
> > >> > > >
> > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > speakerdeck.com/dongjin
> > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > *Dongjin Lee*
> > >> >
> > >> > *A hitchhiker in the mathematical world.*
> > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > <https://github.com/dongjinleekr>linkedin:
> > >> kr.linkedin.com/in/dongjinleekr
> > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > speakerdeck.com/dongjin
> > >> > <https://speakerdeck.com/dongjin>*
> > >> >
> > >>
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Mickael Maison <mi...@gmail.com>.
Thanks, that's a very interesting KIP!

I agree with Becket that a clearer config format is likely to help.
Have you considered using a config format similar to listeners (as
described in the "Updating SSL Keystore of an Existing Listener" in
the Kafka docs)?

Also worth noting that we already have configs that effectively take a
map, for example: listener.security.protocol.map so not sure if we
need an additional KIP

On Wed, Jan 30, 2019 at 5:22 AM Dongjin Lee <do...@apache.org> wrote:
>
> Hello.
>
> Do you have any idea on Becket's Idea of new config format (example below)?
>
> ```
> compression.config="gzip.compression.level=5, lz4.compression.level=17,
> zstd.compression.level=22"
> ```
>
> It requires some additional KIP for supporting new config format (map), but
> it can significantly simplify the configuration with flexibility and
> extensibility. If you prefer this way, I hope to carry the ball.
>
> If not, please give me an opinion here or the voting thread.
>
> Thanks,
> Dongjin
>
>
> On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Hi Becket,
> >
> > Thank you for your opinion. Frankly, I have no strong opinion on
> > configuration name. In this problem, I will follow the community's choice.
> > (I like your idea in that it has a notion of 'scope' per compression codec.
> > However, it should be implemented on top of new config type like Map; It
> > will require another KIP as a prerequisite, but if the community prefer
> > this way, I will take the task.)
> >
> > (One minor correction: the one who thought 'producer' compression config
> > would cause a problem at broker was me, not Ismael - and Ismael reassured
> > me there will be no problem with it.)
> >
> > To All,
> >
> > How about Becket's idea of 'compression.config' option?
> >
> > Best,
> > Dongjin
> >
> > On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com> wrote:
> >
> >> Hi Dongjin,
> >>
> >> Thanks for the KIP and sorry for being a bit late on the discussion.
> >>
> >> It makes sense to expose the configuration for compression types. But I am
> >> wondering if there is a better way to do that than what proposed in the
> >> KIP. What I feel confusing is that we are effectively sharing the
> >> configuration across different compression types, the meaning of the
> >> configuration are actually kind of different depending on the compression
> >> type. This will end up with issues like what Ismael has brought up
> >> earlier.
> >> Say if the broker has compression type of producer (this may result in
> >> mixed compression type in the same topic), and for some reason the broker
> >> needs to re-compress the topic (e.g. due to log compaction), a single
> >> topic
> >> level compression config may not work, because a valid compression level
> >> for lz4 maybe invalid for gzip.
> >>
> >> One alternative I am thinking is to provide a "compression.config"
> >> configuration, inside which it specifies configuration used by each
> >> specific compression type as k-v pairs. The format could use some name
> >> space as well. For example,
> >>
> >>
> >> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
> >>
> >> Each compression type will just pick whatever configuration they need from
> >> the k-v pairs defined in this config.
> >>
> >> Besides clarity, some other benefits are:
> >> 1. Extensibility, we don't have to add more configuration when we add new
> >> compression types, or expose new config for a particular compression type.
> >> 2. Even for the same config, different compression type may have different
> >> terminologies. With the approach we can honor those terminologies instead
> >> of shoehorning them into the same configuration name.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org> wrote:
> >>
> >> > Hello. I just fixed the draft implementation, with rebasing onto the
> >> latest
> >> > trunk. The KIP was also restored.
> >> >
> >> > Please have a look, and if there is no major problem, please vote to the
> >> > voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
> >> >
> >> > Thanks,
> >> > Dongjin
> >> >
> >> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk> wrote:
> >> >
> >> > > Thanks!
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org>
> >> wrote:
> >> > >
> >> > > > Hi Ismael,
> >> > > >
> >> > > > After reviewing
> >> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> >> > > > yes, you are right. If source codec and target codec is identical
> >> and
> >> > the
> >> > > > magic is above 0, the broker can do an in-place assignment, without
> >> > > > recompressing. Sorry for my misunderstanding.
> >> > > >
> >> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> >> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert
> >> > those
> >> > > > changes and update the KIP with the recent discussions. I will
> >> complete
> >> > > it
> >> > > > in 48 hours from now.
> >> > > >
> >> > > > Thanks,
> >> > > > Dongjin
> >> > > >
> >> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org>
> >> > wrote:
> >> > > >
> >> > > > > I see. Let me have a check. If not needed, of course, we don't
> >> have
> >> > to
> >> > > > > waste on configuration options.
> >> > > > >
> >> > > > > Since the KIP deadline is imminent, I just opened the voting
> >> thread.
> >> > > > Let's
> >> > > > > continue the discussion here.
> >> > > > >
> >> > > > > Best,
> >> > > > > Dongjin
> >> > > > >
> >> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk>
> >> > wrote:
> >> > > > >
> >> > > > >> Hi Dongjin,
> >> > > > >>
> >> > > > >> When the compression type is "producer", then the broker doesn't
> >> > > > >> recompress
> >> > > > >> though. Thinking about it some more, there are some uncommon
> >> cases
> >> > > where
> >> > > > >> recompression does happen (the old (and hopefully hardly used by
> >> > now)
> >> > > > >> message format == 0 and some edge cases), so it is a good point
> >> you
> >> > > > >> raised.
> >> > > > >>
> >> > > > >> It's a bit unfortunate to add so many topic configs for cases
> >> that
> >> > > > >> probably
> >> > > > >> don't matter. That is, if you are using "producer" compression,
> >> you
> >> > > > >> probably don't need to configure these settings and can live with
> >> > the
> >> > > > >> defaults. Perhaps we should only support the topic config for the
> >> > > cases
> >> > > > >> where you are actually recompressing in the broker.
> >> > > > >>
> >> > > > >> What do you think? I'd be interested in other people's thoughts
> >> too.
> >> > > > >>
> >> > > > >> Ismael
> >> > > > >>
> >> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org>
> >> > > wrote:
> >> > > > >>
> >> > > > >> > Hi Ismael,
> >> > > > >> >
> >> > > > >> > It seems like it needs more explanation. Here is the detailed
> >> > > > reasoning.
> >> > > > >> >
> >> > > > >> > You know, topic and broker's 'compression.type' allows
> >> > > 'uncompressed',
> >> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4,
> >> zstd.)
> >> > And
> >> > > > >> this
> >> > > > >> > configuration is used by the broker in the re-compressing
> >> process
> >> > > > after
> >> > > > >> > offset assignment. After this feature, the new configs,
> >> > > > >> 'compression.level'
> >> > > > >> > and 'compression.buffer.size', also will be used in this
> >> process.
> >> > > > >> >
> >> > > > >> > The problem arises when given topic's compression type
> >> (whether it
> >> > > was
> >> > > > >> > inherited from broker's configuration or explicitly set) is
> >> > > > 'producer.'
> >> > > > >> > With this setting, the compression codec to be used is decided
> >> by
> >> > > the
> >> > > > >> > producer client. Since there is no way to restore the
> >> compression
> >> > > > level
> >> > > > >> and
> >> > > > >> > buffer size from the message, we can take the following
> >> > strategies:
> >> > > > >> >
> >> > > > >> > 1. Just use given 'compression.level' and
> >> > 'compression.buffer.size'
> >> > > > >> > settings.
> >> > > > >> >
> >> > > > >> > It will cause so many errors. Let's imagine the case of topic's
> >> > > > >> > configuration is { compression.type=producer,
> >> > compression.level=10,
> >> > > > >> > compression.buffer.size=8192 }. In this case, all producers
> >> with
> >> > > gzip
> >> > > > or
> >> > > > >> > lz4 compressed messages will result in an error. (gzip doesn't
> >> > allow
> >> > > > >> > compression level 10, and lz4 also for a buffer size of 8192.)
> >> > > > >> >
> >> > > > >> > 2. Extend the message format to include compression
> >> > configurations.
> >> > > > >> >
> >> > > > >> > With this strategy, we need to change the message format -
> >> it's a
> >> > > too
> >> > > > >> big
> >> > > > >> > change.
> >> > > > >> >
> >> > > > >> > 3. If topic's compression.type is 'producer', use the default
> >> > > > >> configuration
> >> > > > >> > for the given codec.
> >> > > > >> >
> >> > > > >> > With this strategy, allowing fine-grained compression
> >> > configuration
> >> > > is
> >> > > > >> > meaningless.
> >> > > > >> >
> >> > > > >> > For the above reasons, I think the only alternative is
> >> providing
> >> > > > options
> >> > > > >> > that can be used when the topic's 'compression.type' is
> >> > 'producer.'
> >> > > In
> >> > > > >> > other words, adding compression.[gzip, lz4, zstd].level and
> >> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and it is
> >> > what I
> >> > > > >> did in
> >> > > > >> > the last modification.
> >> > > > >> >
> >> > > > >> > (wait, the reasoning above should be included in the KIP in the
> >> > > > rejected
> >> > > > >> > alternatives section, isn't it?)
> >> > > > >> >
> >> > > > >> > Thanks,
> >> > > > >> > Dongjin
> >> > > > >> >
> >> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <ismael@juma.me.uk
> >> >
> >> > > > wrote:
> >> > > > >> >
> >> > > > >> > > Hi Dongjin,
> >> > > > >> > >
> >> > > > >> > > For topic level, you can only have a single compression type
> >> so
> >> > > the
> >> > > > >> way
> >> > > > >> > it
> >> > > > >> > > was before was fine, right? The point you raise is how to set
> >> > > broker
> >> > > > >> > > defaults that vary depending on the compression type,
> >> correct?
> >> > > > >> > >
> >> > > > >> > > Ismael
> >> > > > >> > >
> >> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> >> > dongjin@apache.org>
> >> > > > >> wrote:
> >> > > > >> > >
> >> > > > >> > > > I just realized that there was a missing hole in the KIP,
> >> so I
> >> > > > fixed
> >> > > > >> > it.
> >> > > > >> > > > The draft implementation will be updated soon.
> >> > > > >> > > >
> >> > > > >> > > > In short, the proposed change did not regard the case of
> >> the
> >> > > topic
> >> > > > >> or
> >> > > > >> > > > broker's 'compression.type' is 'producer'; in this case,
> >> the
> >> > > > broker
> >> > > > >> has
> >> > > > >> > > to
> >> > > > >> > > > handle all kinds of the supported codec. So I added
> >> additional
> >> > > > >> options
> >> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> >> > > > >> > compression.[gzip,snappy,lz4,
> >> > > > >> > > > zstd].buffer.size) with handling routines.
> >> > > > >> > > >
> >> > > > >> > > > Please have a look when you are free.
> >> > > > >> > > >
> >> > > > >> > > > Thanks,
> >> > > > >> > > > Dongjin
> >> > > > >> > > >
> >> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> >> > dongjin@apache.org>
> >> > > > >> wrote:
> >> > > > >> > > >
> >> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> >> > > > >> > > > >
> >> > > > >> > > > > Best,
> >> > > > >> > > > > Dongjin
> >> > > > >> > > > >
> >> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> >> > ismaelj@gmail.com
> >> > > >
> >> > > > >> > wrote:
> >> > > > >> > > > >
> >> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should mention
> >> > that
> >> > > > the
> >> > > > >> > > broker
> >> > > > >> > > > >> side configs are also topic configs (i.e. can be set
> >> for a
> >> > > > given
> >> > > > >> > > topic).
> >> > > > >> > > > >>
> >> > > > >> > > > >> Ismael
> >> > > > >> > > > >>
> >> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> >> > > dongjin@apache.org
> >> > > > >> > wrote:
> >> > > > >> > > > >>
> >> > > > >> > > > >> > Happy new year.
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > I just updated the title and contents of KIP and Jira
> >> > > issue,
> >> > > > >> with
> >> > > > >> > > > >> updated
> >> > > > >> > > > >> > draft implementation. Now both of compression level
> >> and
> >> > > > buffer
> >> > > > >> > size
> >> > > > >> > > > >> options
> >> > > > >> > > > >> > are available to producer and broker configuration.
> >> You
> >> > can
> >> > > > >> check
> >> > > > >> > > the
> >> > > > >> > > > >> > updated KIP from modified URL:
> >> > > > >> > > > >> >
> >> > > > >> > > > >> >
> >> > > > >> > > > >> >
> >> > > > >> > > > >>
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > Please have a look when you are free.
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > Thanks,
> >> > > > >> > > > >> > Dongjin
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> >> > > > ismaelj@gmail.com
> >> > > > >> >
> >> > > > >> > > > wrote:
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > > The updated title sounds fine to me.
> >> > > > >> > > > >> > >
> >> > > > >> > > > >> > > Ismael
> >> > > > >> > > > >> > >
> >> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> >> > > > dongjin@apache.org
> >> > > > >> > > wrote:
> >> > > > >> > > > >> > >
> >> > > > >> > > > >> > > > Hi Ismael,
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > Got it. Your direction is perfectly reasonable. I
> >> am
> >> > > now
> >> > > > >> > > updating
> >> > > > >> > > > >> the
> >> > > > >> > > > >> > KIP
> >> > > > >> > > > >> > > > document and the implementation.
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > By allowing the buffer/block size to be
> >> configurable,
> >> > > it
> >> > > > >> would
> >> > > > >> > > be
> >> > > > >> > > > >> > better
> >> > > > >> > > > >> > > to
> >> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> >> fine-grained
> >> > > > >> > > configuration
> >> > > > >> > > > >> for
> >> > > > >> > > > >> > > > compression'. Is that right?
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > @Other committers:
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > Is there any other opinion on allowing the
> >> > buffer/block
> >> > > > >> size
> >> > > > >> > to
> >> > > > >> > > be
> >> > > > >> > > > >> > > > configurable?
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > Thanks,
> >> > > > >> > > > >> > > > Dongjin
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> >> > > > >> > ismael@juma.me.uk>
> >> > > > >> > > > >> wrote:
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > > Hi Dongjin,
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > To clarify, I mean a broker topic config with
> >> > regards
> >> > > > to
> >> > > > >> > point
> >> > > > >> > > > 1.
> >> > > > >> > > > >> As
> >> > > > >> > > > >> > > you
> >> > > > >> > > > >> > > > > know, compression can be done by the producer
> >> > and/or
> >> > > by
> >> > > > >> the
> >> > > > >> > > > >> broker.
> >> > > > >> > > > >> > The
> >> > > > >> > > > >> > > > > default is for the broker to just use whatever
> >> > > > >> compression
> >> > > > >> > was
> >> > > > >> > > > >> used
> >> > > > >> > > > >> > by
> >> > > > >> > > > >> > > > the
> >> > > > >> > > > >> > > > > producer, but this can be changed by the user
> >> on a
> >> > > per
> >> > > > >> topic
> >> > > > >> > > > >> basis.
> >> > > > >> > > > >> > It
> >> > > > >> > > > >> > > > > seems like it would make sense for the configs
> >> to
> >> > be
> >> > > .
> >> > > > >> > > > consistent
> >> > > > >> > > > >> > > between
> >> > > > >> > > > >> > > > > producer and broker.
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> >> > implementation,
> >> > > > but
> >> > > > >> we
> >> > > > >> > > > could
> >> > > > >> > > > >> do
> >> > > > >> > > > >> > it
> >> > > > >> > > > >> > > > in
> >> > > > >> > > > >> > > > > the `CompressionType` enum by invoking the right
> >> > > > >> constructor
> >> > > > >> > > or
> >> > > > >> > > > >> > > > retrieving
> >> > > > >> > > > >> > > > > the default value via a constant (if defined).
> >> > That's
> >> > > > an
> >> > > > >> > > > >> > implementation
> >> > > > >> > > > >> > > > > detail and can be discussed in the PR. The more
> >> > > general
> >> > > > >> > point
> >> > > > >> > > is
> >> > > > >> > > > >> to
> >> > > > >> > > > >> > > rely
> >> > > > >> > > > >> > > > on
> >> > > > >> > > > >> > > > > the library defaults instead of choosing one
> >> > > ourselves.
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that in this
> >> > KIP.
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > Ismael
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> >> > > > >> > > dongjin@apache.org
> >> > > > >> > > > >
> >> > > > >> > > > >> > > wrote:
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > *1. About topic config*
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > After some consideration, I concluded that
> >> topic
> >> > > > config
> >> > > > >> > > > doesn't
> >> > > > >> > > > >> > need
> >> > > > >> > > > >> > > to
> >> > > > >> > > > >> > > > > > support compression.level. Here is why: since
> >> the
> >> > > > >> > > compression
> >> > > > >> > > > is
> >> > > > >> > > > >> > > > > conducted
> >> > > > >> > > > >> > > > > > by the client, the one who can select the best
> >> > > > >> compression
> >> > > > >> > > > >> level is
> >> > > > >> > > > >> > > the
> >> > > > >> > > > >> > > > > > client itself. Let us assume that the
> >> compression
> >> > > > >> level is
> >> > > > >> > > set
> >> > > > >> > > > >> at
> >> > > > >> > > > >> > the
> >> > > > >> > > > >> > > > > topic
> >> > > > >> > > > >> > > > > > config level. In that case, there is a
> >> > possibility
> >> > > > that
> >> > > > >> > the
> >> > > > >> > > > >> > > compression
> >> > > > >> > > > >> > > > > > level is not optimal for some producers.
> >> > Actually,
> >> > > > >> Kafka's
> >> > > > >> > > go
> >> > > > >> > > > >> > client
> >> > > > >> > > > >> > > > also
> >> > > > >> > > > >> > > > > > supports compression level functionality for
> >> the
> >> > > > >> producer
> >> > > > >> > > > config
> >> > > > >> > > > >> > > only.
> >> > > > >> > > > >> > > > > > <
> >> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> >> > > > >> > > > >> (wait,
> >> > > > >> > > > >> > do
> >> > > > >> > > > >> > > we
> >> > > > >> > > > >> > > > > > need
> >> > > > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> >> > > > alternatives
> >> > > > >> > > > >> section?)
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > *2. About default level*
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > As of current draft implementation, the
> >> default
> >> > > > >> > compression
> >> > > > >> > > is
> >> > > > >> > > > >> set
> >> > > > >> > > > >> > on
> >> > > > >> > > > >> > > > the
> >> > > > >> > > > >> > > > > > CompressionType enum. Of course, changing this
> >> > > > strategy
> >> > > > >> > into
> >> > > > >> > > > >> > relying
> >> > > > >> > > > >> > > > on a
> >> > > > >> > > > >> > > > > > method from the library to pick the default
> >> > > > compression
> >> > > > >> > > level
> >> > > > >> > > > >> seems
> >> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does.
> >> In
> >> > > this
> >> > > > >> case,
> >> > > > >> > > we
> >> > > > >> > > > >> need
> >> > > > >> > > > >> > to
> >> > > > >> > > > >> > > > add
> >> > > > >> > > > >> > > > > > similar wrapper class for zstd and modify lz4
> >> the
> >> > > > >> wrapper
> >> > > > >> > > > also.
> >> > > > >> > > > >> Add
> >> > > > >> > > > >> > > to
> >> > > > >> > > > >> > > > > > this, it seems like we need to explicitly
> >> state
> >> > > that
> >> > > > we
> >> > > > >> > > follow
> >> > > > >> > > > >> the
> >> > > > >> > > > >> > > > > default
> >> > > > >> > > > >> > > > > > compression level of the codec in the
> >> > > documentation.
> >> > > > Is
> >> > > > >> > this
> >> > > > >> > > > >> what
> >> > > > >> > > > >> > you
> >> > > > >> > > > >> > > > > > intended?
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block size to
> >> be
> >> > > > >> > > configurable*
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > Well, As of current draft implementation, the
> >> lz4
> >> > > > >> level is
> >> > > > >> > > > >> > > implemented
> >> > > > >> > > > >> > > > as
> >> > > > >> > > > >> > > > > > block size; this is caused by my
> >> misunderstanding
> >> > > on
> >> > > > >> lz4.
> >> > > > >> > > > After
> >> > > > >> > > > >> > > > reviewing
> >> > > > >> > > > >> > > > > > lz4 today, I found that it also supports
> >> > > compression
> >> > > > >> level
> >> > > > >> > > of
> >> > > > >> > > > >> 1~16
> >> > > > >> > > > >> > > > > > (default: 1), not block size. I will fix it in
> >> > this
> >> > > > >> > weekend
> >> > > > >> > > by
> >> > > > >> > > > >> > > updating
> >> > > > >> > > > >> > > > > the
> >> > > > >> > > > >> > > > > > wrapper class.
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > For the problem of the buffer/block size, I
> >> have
> >> > no
> >> > > > >> strong
> >> > > > >> > > > >> opinion.
> >> > > > >> > > > >> > > If
> >> > > > >> > > > >> > > > > the
> >> > > > >> > > > >> > > > > > community needs it, I will do it all together.
> >> > How
> >> > > do
> >> > > > >> you
> >> > > > >> > > > think?
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > In short, it seems like I need to update the
> >> KIP
> >> > > > >> document
> >> > > > >> > > for
> >> > > > >> > > > >> issue
> >> > > > >> > > > >> > > #1
> >> > > > >> > > > >> > > > > and
> >> > > > >> > > > >> > > > > > update the compression wrapper for issue #2,
> >> #3.
> >> > Is
> >> > > > >> this
> >> > > > >> > > okay?
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > Thanks,
> >> > > > >> > > > >> > > > > > Dongjin
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> >> > > > >> > > > ismaelj@gmail.com
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > > > wrote:
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
> >> > > > >> questions:
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > > > 1. Have we considered whether we want to
> >> allow
> >> > a
> >> > > > >> similar
> >> > > > >> > > > topic
> >> > > > >> > > > >> > > > config?
> >> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the library
> >> to
> >> > > pick
> >> > > > >> the
> >> > > > >> > > > >> default
> >> > > > >> > > > >> > > > > > compression
> >> > > > >> > > > >> > > > > > > level if compression.level is not set? We
> >> do it
> >> > > for
> >> > > > >> gzip
> >> > > > >> > > and
> >> > > > >> > > > >> it
> >> > > > >> > > > >> > > would
> >> > > > >> > > > >> > > > > > seem
> >> > > > >> > > > >> > > > > > > reasonable to do something similar for the
> >> > other
> >> > > > >> > > compression
> >> > > > >> > > > >> > > > libraries.
> >> > > > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block
> >> size to
> >> > > be
> >> > > > >> > > > >> configurable?
> >> > > > >> > > > >> > > This
> >> > > > >> > > > >> > > > > has
> >> > > > >> > > > >> > > > > > > an impact on memory usage and people may
> >> want
> >> > to
> >> > > > >> trade
> >> > > > >> > > > >> > compression
> >> > > > >> > > > >> > > > for
> >> > > > >> > > > >> > > > > > > less/more memory in some cases. For example,
> >> > the
> >> > > > >> default
> >> > > > >> > > for
> >> > > > >> > > > >> LZ4
> >> > > > >> > > > >> > is
> >> > > > >> > > > >> > > > > 64KB
> >> > > > >> > > > >> > > > > > > which is a bit high.
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > > > Ismael
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> >> > > > >> > > > dongjin@apache.org
> >> > > > >> > > > >> > > wrote:
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > > > > Hello dev,
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > > > I hope to initiate the discussion of
> >> KIP-390:
> >> > > Add
> >> > > > >> > > producer
> >> > > > >> > > > >> > option
> >> > > > >> > > > >> > > > to
> >> > > > >> > > > >> > > > > > > adjust
> >> > > > >> > > > >> > > > > > > > compression level
> >> > > > >> > > > >> > > > > > > > <
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > >
> >> > > > >> > > > >> >
> >> > > > >> > > > >>
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> >> > > > >> > > > >> > > > > > > > >.
> >> > > > >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > > > Best,
> >> > > > >> > > > >> > > > > > > > Dongjin
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > > > --
> >> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> >> > > > >> > > github.com/dongjinleekr
> >> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> >> > > > >> >slideshare:
> >> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> >> > > > >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> >> > > > >> > > > >> > > > > > > >
> >> > > > >> > > > >> > > > > > >
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > --
> >> > > > >> > > > >> > > > > > *Dongjin Lee*
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> >> > > > >> github.com/dongjinleekr
> >> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> >> > > > >speakerdeck:
> >> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> >> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > > >> > > > > >
> >> > > > >> > > > >> > > > >
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > --
> >> > > > >> > > > >> > > > *Dongjin Lee*
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> >> > > > github.com/dongjinleekr
> >> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> >> > >speakerdeck:
> >> > > > >> > > > >> > > > speakerdeck.com/dongjin
> >> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > > >> > > >
> >> > > > >> > > > >> > >
> >> > > > >> > > > >> >
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > --
> >> > > > >> > > > >> > *Dongjin Lee*
> >> > > > >> > > > >> >
> >> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > >> > *github:  <http://goog_969573159/>
> >> > github.com/dongjinleekr
> >> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > > > >> > speakerdeck.com/dongjin
> >> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > > >> >
> >> > > > >> > > > >>
> >> > > > >> > > > >
> >> > > > >> > > > >
> >> > > > >> > > > > --
> >> > > > >> > > > > *Dongjin Lee*
> >> > > > >> > > > >
> >> > > > >> > > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > > *github:  <http://goog_969573159/>
> >> github.com/dongjinleekr
> >> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > > > speakerdeck.com/dongjin
> >> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > > >
> >> > > > >> > > >
> >> > > > >> > > >
> >> > > > >> > > > --
> >> > > > >> > > > *Dongjin Lee*
> >> > > > >> > > >
> >> > > > >> > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > > > speakerdeck.com/dongjin
> >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > --
> >> > > > >> > *Dongjin Lee*
> >> > > > >> >
> >> > > > >> > *A hitchhiker in the mathematical world.*
> >> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> kr.linkedin.com/in/dongjinleekr
> >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > speakerdeck.com/dongjin
> >> > > > >> > <https://speakerdeck.com/dongjin>*
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > *Dongjin Lee*
> >> > > > >
> >> > > > > *A hitchhiker in the mathematical world.*
> >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > kr.linkedin.com/in/dongjinleekr
> >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > speakerdeck.com/dongjin
> >> > > > > <https://speakerdeck.com/dongjin>*
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Dongjin Lee*
> >> > > >
> >> > > > *A hitchhiker in the mathematical world.*
> >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > speakerdeck.com/dongjin
> >> > > > <https://speakerdeck.com/dongjin>*
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > *Dongjin Lee*
> >> >
> >> > *A hitchhiker in the mathematical world.*
> >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > <https://github.com/dongjinleekr>linkedin:
> >> kr.linkedin.com/in/dongjinleekr
> >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > speakerdeck.com/dongjin
> >> > <https://speakerdeck.com/dongjin>*
> >> >
> >>
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hello.

Do you have any idea on Becket's Idea of new config format (example below)?

```
compression.config="gzip.compression.level=5, lz4.compression.level=17,
zstd.compression.level=22"
```

It requires some additional KIP for supporting new config format (map), but
it can significantly simplify the configuration with flexibility and
extensibility. If you prefer this way, I hope to carry the ball.

If not, please give me an opinion here or the voting thread.

Thanks,
Dongjin


On Fri, Jan 25, 2019 at 1:25 AM Dongjin Lee <do...@apache.org> wrote:

> Hi Becket,
>
> Thank you for your opinion. Frankly, I have no strong opinion on
> configuration name. In this problem, I will follow the community's choice.
> (I like your idea in that it has a notion of 'scope' per compression codec.
> However, it should be implemented on top of new config type like Map; It
> will require another KIP as a prerequisite, but if the community prefer
> this way, I will take the task.)
>
> (One minor correction: the one who thought 'producer' compression config
> would cause a problem at broker was me, not Ismael - and Ismael reassured
> me there will be no problem with it.)
>
> To All,
>
> How about Becket's idea of 'compression.config' option?
>
> Best,
> Dongjin
>
> On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com> wrote:
>
>> Hi Dongjin,
>>
>> Thanks for the KIP and sorry for being a bit late on the discussion.
>>
>> It makes sense to expose the configuration for compression types. But I am
>> wondering if there is a better way to do that than what proposed in the
>> KIP. What I feel confusing is that we are effectively sharing the
>> configuration across different compression types, the meaning of the
>> configuration are actually kind of different depending on the compression
>> type. This will end up with issues like what Ismael has brought up
>> earlier.
>> Say if the broker has compression type of producer (this may result in
>> mixed compression type in the same topic), and for some reason the broker
>> needs to re-compress the topic (e.g. due to log compaction), a single
>> topic
>> level compression config may not work, because a valid compression level
>> for lz4 maybe invalid for gzip.
>>
>> One alternative I am thinking is to provide a "compression.config"
>> configuration, inside which it specifies configuration used by each
>> specific compression type as k-v pairs. The format could use some name
>> space as well. For example,
>>
>>
>> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
>>
>> Each compression type will just pick whatever configuration they need from
>> the k-v pairs defined in this config.
>>
>> Besides clarity, some other benefits are:
>> 1. Extensibility, we don't have to add more configuration when we add new
>> compression types, or expose new config for a particular compression type.
>> 2. Even for the same config, different compression type may have different
>> terminologies. With the approach we can honor those terminologies instead
>> of shoehorning them into the same configuration name.
>>
>> What do you think?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>>
>>
>>
>> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org> wrote:
>>
>> > Hello. I just fixed the draft implementation, with rebasing onto the
>> latest
>> > trunk. The KIP was also restored.
>> >
>> > Please have a look, and if there is no major problem, please vote to the
>> > voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
>> >
>> > Thanks,
>> > Dongjin
>> >
>> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > > Thanks!
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org>
>> wrote:
>> > >
>> > > > Hi Ismael,
>> > > >
>> > > > After reviewing
>> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
>> > > > yes, you are right. If source codec and target codec is identical
>> and
>> > the
>> > > > magic is above 0, the broker can do an in-place assignment, without
>> > > > recompressing. Sorry for my misunderstanding.
>> > > >
>> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
>> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert
>> > those
>> > > > changes and update the KIP with the recent discussions. I will
>> complete
>> > > it
>> > > > in 48 hours from now.
>> > > >
>> > > > Thanks,
>> > > > Dongjin
>> > > >
>> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org>
>> > wrote:
>> > > >
>> > > > > I see. Let me have a check. If not needed, of course, we don't
>> have
>> > to
>> > > > > waste on configuration options.
>> > > > >
>> > > > > Since the KIP deadline is imminent, I just opened the voting
>> thread.
>> > > > Let's
>> > > > > continue the discussion here.
>> > > > >
>> > > > > Best,
>> > > > > Dongjin
>> > > > >
>> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk>
>> > wrote:
>> > > > >
>> > > > >> Hi Dongjin,
>> > > > >>
>> > > > >> When the compression type is "producer", then the broker doesn't
>> > > > >> recompress
>> > > > >> though. Thinking about it some more, there are some uncommon
>> cases
>> > > where
>> > > > >> recompression does happen (the old (and hopefully hardly used by
>> > now)
>> > > > >> message format == 0 and some edge cases), so it is a good point
>> you
>> > > > >> raised.
>> > > > >>
>> > > > >> It's a bit unfortunate to add so many topic configs for cases
>> that
>> > > > >> probably
>> > > > >> don't matter. That is, if you are using "producer" compression,
>> you
>> > > > >> probably don't need to configure these settings and can live with
>> > the
>> > > > >> defaults. Perhaps we should only support the topic config for the
>> > > cases
>> > > > >> where you are actually recompressing in the broker.
>> > > > >>
>> > > > >> What do you think? I'd be interested in other people's thoughts
>> too.
>> > > > >>
>> > > > >> Ismael
>> > > > >>
>> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org>
>> > > wrote:
>> > > > >>
>> > > > >> > Hi Ismael,
>> > > > >> >
>> > > > >> > It seems like it needs more explanation. Here is the detailed
>> > > > reasoning.
>> > > > >> >
>> > > > >> > You know, topic and broker's 'compression.type' allows
>> > > 'uncompressed',
>> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4,
>> zstd.)
>> > And
>> > > > >> this
>> > > > >> > configuration is used by the broker in the re-compressing
>> process
>> > > > after
>> > > > >> > offset assignment. After this feature, the new configs,
>> > > > >> 'compression.level'
>> > > > >> > and 'compression.buffer.size', also will be used in this
>> process.
>> > > > >> >
>> > > > >> > The problem arises when given topic's compression type
>> (whether it
>> > > was
>> > > > >> > inherited from broker's configuration or explicitly set) is
>> > > > 'producer.'
>> > > > >> > With this setting, the compression codec to be used is decided
>> by
>> > > the
>> > > > >> > producer client. Since there is no way to restore the
>> compression
>> > > > level
>> > > > >> and
>> > > > >> > buffer size from the message, we can take the following
>> > strategies:
>> > > > >> >
>> > > > >> > 1. Just use given 'compression.level' and
>> > 'compression.buffer.size'
>> > > > >> > settings.
>> > > > >> >
>> > > > >> > It will cause so many errors. Let's imagine the case of topic's
>> > > > >> > configuration is { compression.type=producer,
>> > compression.level=10,
>> > > > >> > compression.buffer.size=8192 }. In this case, all producers
>> with
>> > > gzip
>> > > > or
>> > > > >> > lz4 compressed messages will result in an error. (gzip doesn't
>> > allow
>> > > > >> > compression level 10, and lz4 also for a buffer size of 8192.)
>> > > > >> >
>> > > > >> > 2. Extend the message format to include compression
>> > configurations.
>> > > > >> >
>> > > > >> > With this strategy, we need to change the message format -
>> it's a
>> > > too
>> > > > >> big
>> > > > >> > change.
>> > > > >> >
>> > > > >> > 3. If topic's compression.type is 'producer', use the default
>> > > > >> configuration
>> > > > >> > for the given codec.
>> > > > >> >
>> > > > >> > With this strategy, allowing fine-grained compression
>> > configuration
>> > > is
>> > > > >> > meaningless.
>> > > > >> >
>> > > > >> > For the above reasons, I think the only alternative is
>> providing
>> > > > options
>> > > > >> > that can be used when the topic's 'compression.type' is
>> > 'producer.'
>> > > In
>> > > > >> > other words, adding compression.[gzip, lz4, zstd].level and
>> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and it is
>> > what I
>> > > > >> did in
>> > > > >> > the last modification.
>> > > > >> >
>> > > > >> > (wait, the reasoning above should be included in the KIP in the
>> > > > rejected
>> > > > >> > alternatives section, isn't it?)
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> > Dongjin
>> > > > >> >
>> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <ismael@juma.me.uk
>> >
>> > > > wrote:
>> > > > >> >
>> > > > >> > > Hi Dongjin,
>> > > > >> > >
>> > > > >> > > For topic level, you can only have a single compression type
>> so
>> > > the
>> > > > >> way
>> > > > >> > it
>> > > > >> > > was before was fine, right? The point you raise is how to set
>> > > broker
>> > > > >> > > defaults that vary depending on the compression type,
>> correct?
>> > > > >> > >
>> > > > >> > > Ismael
>> > > > >> > >
>> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
>> > dongjin@apache.org>
>> > > > >> wrote:
>> > > > >> > >
>> > > > >> > > > I just realized that there was a missing hole in the KIP,
>> so I
>> > > > fixed
>> > > > >> > it.
>> > > > >> > > > The draft implementation will be updated soon.
>> > > > >> > > >
>> > > > >> > > > In short, the proposed change did not regard the case of
>> the
>> > > topic
>> > > > >> or
>> > > > >> > > > broker's 'compression.type' is 'producer'; in this case,
>> the
>> > > > broker
>> > > > >> has
>> > > > >> > > to
>> > > > >> > > > handle all kinds of the supported codec. So I added
>> additional
>> > > > >> options
>> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
>> > > > >> > compression.[gzip,snappy,lz4,
>> > > > >> > > > zstd].buffer.size) with handling routines.
>> > > > >> > > >
>> > > > >> > > > Please have a look when you are free.
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > > Dongjin
>> > > > >> > > >
>> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
>> > dongjin@apache.org>
>> > > > >> wrote:
>> > > > >> > > >
>> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
>> > > > >> > > > >
>> > > > >> > > > > Best,
>> > > > >> > > > > Dongjin
>> > > > >> > > > >
>> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
>> > ismaelj@gmail.com
>> > > >
>> > > > >> > wrote:
>> > > > >> > > > >
>> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should mention
>> > that
>> > > > the
>> > > > >> > > broker
>> > > > >> > > > >> side configs are also topic configs (i.e. can be set
>> for a
>> > > > given
>> > > > >> > > topic).
>> > > > >> > > > >>
>> > > > >> > > > >> Ismael
>> > > > >> > > > >>
>> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
>> > > dongjin@apache.org
>> > > > >> > wrote:
>> > > > >> > > > >>
>> > > > >> > > > >> > Happy new year.
>> > > > >> > > > >> >
>> > > > >> > > > >> > I just updated the title and contents of KIP and Jira
>> > > issue,
>> > > > >> with
>> > > > >> > > > >> updated
>> > > > >> > > > >> > draft implementation. Now both of compression level
>> and
>> > > > buffer
>> > > > >> > size
>> > > > >> > > > >> options
>> > > > >> > > > >> > are available to producer and broker configuration.
>> You
>> > can
>> > > > >> check
>> > > > >> > > the
>> > > > >> > > > >> > updated KIP from modified URL:
>> > > > >> > > > >> >
>> > > > >> > > > >> >
>> > > > >> > > > >> >
>> > > > >> > > > >>
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
>> > > > >> > > > >> >
>> > > > >> > > > >> > Please have a look when you are free.
>> > > > >> > > > >> >
>> > > > >> > > > >> > Thanks,
>> > > > >> > > > >> > Dongjin
>> > > > >> > > > >> >
>> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
>> > > > ismaelj@gmail.com
>> > > > >> >
>> > > > >> > > > wrote:
>> > > > >> > > > >> >
>> > > > >> > > > >> > > The updated title sounds fine to me.
>> > > > >> > > > >> > >
>> > > > >> > > > >> > > Ismael
>> > > > >> > > > >> > >
>> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
>> > > > dongjin@apache.org
>> > > > >> > > wrote:
>> > > > >> > > > >> > >
>> > > > >> > > > >> > > > Hi Ismael,
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > Got it. Your direction is perfectly reasonable. I
>> am
>> > > now
>> > > > >> > > updating
>> > > > >> > > > >> the
>> > > > >> > > > >> > KIP
>> > > > >> > > > >> > > > document and the implementation.
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > By allowing the buffer/block size to be
>> configurable,
>> > > it
>> > > > >> would
>> > > > >> > > be
>> > > > >> > > > >> > better
>> > > > >> > > > >> > > to
>> > > > >> > > > >> > > > update the title of the KIP like 'Allow
>> fine-grained
>> > > > >> > > configuration
>> > > > >> > > > >> for
>> > > > >> > > > >> > > > compression'. Is that right?
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > @Other committers:
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > Is there any other opinion on allowing the
>> > buffer/block
>> > > > >> size
>> > > > >> > to
>> > > > >> > > be
>> > > > >> > > > >> > > > configurable?
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > Thanks,
>> > > > >> > > > >> > > > Dongjin
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
>> > > > >> > ismael@juma.me.uk>
>> > > > >> > > > >> wrote:
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > > Hi Dongjin,
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > To clarify, I mean a broker topic config with
>> > regards
>> > > > to
>> > > > >> > point
>> > > > >> > > > 1.
>> > > > >> > > > >> As
>> > > > >> > > > >> > > you
>> > > > >> > > > >> > > > > know, compression can be done by the producer
>> > and/or
>> > > by
>> > > > >> the
>> > > > >> > > > >> broker.
>> > > > >> > > > >> > The
>> > > > >> > > > >> > > > > default is for the broker to just use whatever
>> > > > >> compression
>> > > > >> > was
>> > > > >> > > > >> used
>> > > > >> > > > >> > by
>> > > > >> > > > >> > > > the
>> > > > >> > > > >> > > > > producer, but this can be changed by the user
>> on a
>> > > per
>> > > > >> topic
>> > > > >> > > > >> basis.
>> > > > >> > > > >> > It
>> > > > >> > > > >> > > > > seems like it would make sense for the configs
>> to
>> > be
>> > > .
>> > > > >> > > > consistent
>> > > > >> > > > >> > > between
>> > > > >> > > > >> > > > > producer and broker.
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > For point 2, I haven't looked at the
>> > implementation,
>> > > > but
>> > > > >> we
>> > > > >> > > > could
>> > > > >> > > > >> do
>> > > > >> > > > >> > it
>> > > > >> > > > >> > > > in
>> > > > >> > > > >> > > > > the `CompressionType` enum by invoking the right
>> > > > >> constructor
>> > > > >> > > or
>> > > > >> > > > >> > > > retrieving
>> > > > >> > > > >> > > > > the default value via a constant (if defined).
>> > That's
>> > > > an
>> > > > >> > > > >> > implementation
>> > > > >> > > > >> > > > > detail and can be discussed in the PR. The more
>> > > general
>> > > > >> > point
>> > > > >> > > is
>> > > > >> > > > >> to
>> > > > >> > > > >> > > rely
>> > > > >> > > > >> > > > on
>> > > > >> > > > >> > > > > the library defaults instead of choosing one
>> > > ourselves.
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that in this
>> > KIP.
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > Ismael
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
>> > > > >> > > dongjin@apache.org
>> > > > >> > > > >
>> > > > >> > > > >> > > wrote:
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > *1. About topic config*
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > After some consideration, I concluded that
>> topic
>> > > > config
>> > > > >> > > > doesn't
>> > > > >> > > > >> > need
>> > > > >> > > > >> > > to
>> > > > >> > > > >> > > > > > support compression.level. Here is why: since
>> the
>> > > > >> > > compression
>> > > > >> > > > is
>> > > > >> > > > >> > > > > conducted
>> > > > >> > > > >> > > > > > by the client, the one who can select the best
>> > > > >> compression
>> > > > >> > > > >> level is
>> > > > >> > > > >> > > the
>> > > > >> > > > >> > > > > > client itself. Let us assume that the
>> compression
>> > > > >> level is
>> > > > >> > > set
>> > > > >> > > > >> at
>> > > > >> > > > >> > the
>> > > > >> > > > >> > > > > topic
>> > > > >> > > > >> > > > > > config level. In that case, there is a
>> > possibility
>> > > > that
>> > > > >> > the
>> > > > >> > > > >> > > compression
>> > > > >> > > > >> > > > > > level is not optimal for some producers.
>> > Actually,
>> > > > >> Kafka's
>> > > > >> > > go
>> > > > >> > > > >> > client
>> > > > >> > > > >> > > > also
>> > > > >> > > > >> > > > > > supports compression level functionality for
>> the
>> > > > >> producer
>> > > > >> > > > config
>> > > > >> > > > >> > > only.
>> > > > >> > > > >> > > > > > <
>> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
>> > > > >> > > > >> (wait,
>> > > > >> > > > >> > do
>> > > > >> > > > >> > > we
>> > > > >> > > > >> > > > > > need
>> > > > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
>> > > > alternatives
>> > > > >> > > > >> section?)
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > *2. About default level*
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > As of current draft implementation, the
>> default
>> > > > >> > compression
>> > > > >> > > is
>> > > > >> > > > >> set
>> > > > >> > > > >> > on
>> > > > >> > > > >> > > > the
>> > > > >> > > > >> > > > > > CompressionType enum. Of course, changing this
>> > > > strategy
>> > > > >> > into
>> > > > >> > > > >> > relying
>> > > > >> > > > >> > > > on a
>> > > > >> > > > >> > > > > > method from the library to pick the default
>> > > > compression
>> > > > >> > > level
>> > > > >> > > > >> seems
>> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does.
>> In
>> > > this
>> > > > >> case,
>> > > > >> > > we
>> > > > >> > > > >> need
>> > > > >> > > > >> > to
>> > > > >> > > > >> > > > add
>> > > > >> > > > >> > > > > > similar wrapper class for zstd and modify lz4
>> the
>> > > > >> wrapper
>> > > > >> > > > also.
>> > > > >> > > > >> Add
>> > > > >> > > > >> > > to
>> > > > >> > > > >> > > > > > this, it seems like we need to explicitly
>> state
>> > > that
>> > > > we
>> > > > >> > > follow
>> > > > >> > > > >> the
>> > > > >> > > > >> > > > > default
>> > > > >> > > > >> > > > > > compression level of the codec in the
>> > > documentation.
>> > > > Is
>> > > > >> > this
>> > > > >> > > > >> what
>> > > > >> > > > >> > you
>> > > > >> > > > >> > > > > > intended?
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block size to
>> be
>> > > > >> > > configurable*
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > Well, As of current draft implementation, the
>> lz4
>> > > > >> level is
>> > > > >> > > > >> > > implemented
>> > > > >> > > > >> > > > as
>> > > > >> > > > >> > > > > > block size; this is caused by my
>> misunderstanding
>> > > on
>> > > > >> lz4.
>> > > > >> > > > After
>> > > > >> > > > >> > > > reviewing
>> > > > >> > > > >> > > > > > lz4 today, I found that it also supports
>> > > compression
>> > > > >> level
>> > > > >> > > of
>> > > > >> > > > >> 1~16
>> > > > >> > > > >> > > > > > (default: 1), not block size. I will fix it in
>> > this
>> > > > >> > weekend
>> > > > >> > > by
>> > > > >> > > > >> > > updating
>> > > > >> > > > >> > > > > the
>> > > > >> > > > >> > > > > > wrapper class.
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > For the problem of the buffer/block size, I
>> have
>> > no
>> > > > >> strong
>> > > > >> > > > >> opinion.
>> > > > >> > > > >> > > If
>> > > > >> > > > >> > > > > the
>> > > > >> > > > >> > > > > > community needs it, I will do it all together.
>> > How
>> > > do
>> > > > >> you
>> > > > >> > > > think?
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > In short, it seems like I need to update the
>> KIP
>> > > > >> document
>> > > > >> > > for
>> > > > >> > > > >> issue
>> > > > >> > > > >> > > #1
>> > > > >> > > > >> > > > > and
>> > > > >> > > > >> > > > > > update the compression wrapper for issue #2,
>> #3.
>> > Is
>> > > > >> this
>> > > > >> > > okay?
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > Thanks,
>> > > > >> > > > >> > > > > > Dongjin
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
>> > > > >> > > > ismaelj@gmail.com
>> > > > >> > > > >> >
>> > > > >> > > > >> > > > wrote:
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
>> > > > >> questions:
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > > > 1. Have we considered whether we want to
>> allow
>> > a
>> > > > >> similar
>> > > > >> > > > topic
>> > > > >> > > > >> > > > config?
>> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the library
>> to
>> > > pick
>> > > > >> the
>> > > > >> > > > >> default
>> > > > >> > > > >> > > > > > compression
>> > > > >> > > > >> > > > > > > level if compression.level is not set? We
>> do it
>> > > for
>> > > > >> gzip
>> > > > >> > > and
>> > > > >> > > > >> it
>> > > > >> > > > >> > > would
>> > > > >> > > > >> > > > > > seem
>> > > > >> > > > >> > > > > > > reasonable to do something similar for the
>> > other
>> > > > >> > > compression
>> > > > >> > > > >> > > > libraries.
>> > > > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block
>> size to
>> > > be
>> > > > >> > > > >> configurable?
>> > > > >> > > > >> > > This
>> > > > >> > > > >> > > > > has
>> > > > >> > > > >> > > > > > > an impact on memory usage and people may
>> want
>> > to
>> > > > >> trade
>> > > > >> > > > >> > compression
>> > > > >> > > > >> > > > for
>> > > > >> > > > >> > > > > > > less/more memory in some cases. For example,
>> > the
>> > > > >> default
>> > > > >> > > for
>> > > > >> > > > >> LZ4
>> > > > >> > > > >> > is
>> > > > >> > > > >> > > > > 64KB
>> > > > >> > > > >> > > > > > > which is a bit high.
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > > > Ismael
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
>> > > > >> > > > dongjin@apache.org
>> > > > >> > > > >> > > wrote:
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > > > > Hello dev,
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > > > I hope to initiate the discussion of
>> KIP-390:
>> > > Add
>> > > > >> > > producer
>> > > > >> > > > >> > option
>> > > > >> > > > >> > > > to
>> > > > >> > > > >> > > > > > > adjust
>> > > > >> > > > >> > > > > > > > compression level
>> > > > >> > > > >> > > > > > > > <
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > >
>> > > > >> > > > >> >
>> > > > >> > > > >>
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
>> > > > >> > > > >> > > > > > > > >.
>> > > > >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > > > Best,
>> > > > >> > > > >> > > > > > > > Dongjin
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > > > --
>> > > > >> > > > >> > > > > > > > *Dongjin Lee*
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
>> > > > >> > > github.com/dongjinleekr
>> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
>> > > > >> >slideshare:
>> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
>> > > > >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > >> > > > >> > > > > > > >
>> > > > >> > > > >> > > > > > >
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > --
>> > > > >> > > > >> > > > > > *Dongjin Lee*
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
>> > > > >> github.com/dongjinleekr
>> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
>> > > > >speakerdeck:
>> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
>> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > > >> > > > > >
>> > > > >> > > > >> > > > >
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > --
>> > > > >> > > > >> > > > *Dongjin Lee*
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
>> > > > github.com/dongjinleekr
>> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
>> > >speakerdeck:
>> > > > >> > > > >> > > > speakerdeck.com/dongjin
>> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > > >> > > >
>> > > > >> > > > >> > >
>> > > > >> > > > >> >
>> > > > >> > > > >> >
>> > > > >> > > > >> > --
>> > > > >> > > > >> > *Dongjin Lee*
>> > > > >> > > > >> >
>> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
>> > > > >> > > > >> > *github:  <http://goog_969573159/>
>> > github.com/dongjinleekr
>> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > > > >> > speakerdeck.com/dongjin
>> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
>> > > > >> > > > >> >
>> > > > >> > > > >>
>> > > > >> > > > >
>> > > > >> > > > >
>> > > > >> > > > > --
>> > > > >> > > > > *Dongjin Lee*
>> > > > >> > > > >
>> > > > >> > > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > > *github:  <http://goog_969573159/>
>> github.com/dongjinleekr
>> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > > > speakerdeck.com/dongjin
>> > > > >> > > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > --
>> > > > >> > > > *Dongjin Lee*
>> > > > >> > > >
>> > > > >> > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > > > speakerdeck.com/dongjin
>> > > > >> > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >> >
>> > > > >> > --
>> > > > >> > *Dongjin Lee*
>> > > > >> >
>> > > > >> > *A hitchhiker in the mathematical world.*
>> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >> > <https://github.com/dongjinleekr>linkedin:
>> > > > >> kr.linkedin.com/in/dongjinleekr
>> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > speakerdeck.com/dongjin
>> > > > >> > <https://speakerdeck.com/dongjin>*
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > *Dongjin Lee*
>> > > > >
>> > > > > *A hitchhiker in the mathematical world.*
>> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > kr.linkedin.com/in/dongjinleekr
>> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > speakerdeck.com/dongjin
>> > > > > <https://speakerdeck.com/dongjin>*
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > *Dongjin Lee*
>> > > >
>> > > > *A hitchhiker in the mathematical world.*
>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > <https://github.com/dongjinleekr>linkedin:
>> > > kr.linkedin.com/in/dongjinleekr
>> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > speakerdeck.com/dongjin
>> > > > <https://speakerdeck.com/dongjin>*
>> > > >
>> > >
>> >
>> >
>> > --
>> > *Dongjin Lee*
>> >
>> > *A hitchhiker in the mathematical world.*
>> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > <https://github.com/dongjinleekr>linkedin:
>> kr.linkedin.com/in/dongjinleekr
>> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > speakerdeck.com/dongjin
>> > <https://speakerdeck.com/dongjin>*
>> >
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hi Becket,

Thank you for your opinion. Frankly, I have no strong opinion on
configuration name. In this problem, I will follow the community's choice.
(I like your idea in that it has a notion of 'scope' per compression codec.
However, it should be implemented on top of new config type like Map; It
will require another KIP as a prerequisite, but if the community prefer
this way, I will take the task.)

(One minor correction: the one who thought 'producer' compression config
would cause a problem at broker was me, not Ismael - and Ismael reassured
me there will be no problem with it.)

To All,

How about Becket's idea of 'compression.config' option?

Best,
Dongjin

On Wed, Jan 23, 2019 at 1:16 PM Becket Qin <be...@gmail.com> wrote:

> Hi Dongjin,
>
> Thanks for the KIP and sorry for being a bit late on the discussion.
>
> It makes sense to expose the configuration for compression types. But I am
> wondering if there is a better way to do that than what proposed in the
> KIP. What I feel confusing is that we are effectively sharing the
> configuration across different compression types, the meaning of the
> configuration are actually kind of different depending on the compression
> type. This will end up with issues like what Ismael has brought up earlier.
> Say if the broker has compression type of producer (this may result in
> mixed compression type in the same topic), and for some reason the broker
> needs to re-compress the topic (e.g. due to log compaction), a single topic
> level compression config may not work, because a valid compression level
> for lz4 maybe invalid for gzip.
>
> One alternative I am thinking is to provide a "compression.config"
> configuration, inside which it specifies configuration used by each
> specific compression type as k-v pairs. The format could use some name
> space as well. For example,
>
>
> compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".
>
> Each compression type will just pick whatever configuration they need from
> the k-v pairs defined in this config.
>
> Besides clarity, some other benefits are:
> 1. Extensibility, we don't have to add more configuration when we add new
> compression types, or expose new config for a particular compression type.
> 2. Even for the same config, different compression type may have different
> terminologies. With the approach we can honor those terminologies instead
> of shoehorning them into the same configuration name.
>
> What do you think?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
>
> On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Hello. I just fixed the draft implementation, with rebasing onto the
> latest
> > trunk. The KIP was also restored.
> >
> > Please have a look, and if there is no major problem, please vote to the
> > voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
> >
> > Thanks,
> > Dongjin
> >
> > On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Thanks!
> > >
> > > Ismael
> > >
> > > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > After reviewing
> > > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > > > yes, you are right. If source codec and target codec is identical and
> > the
> > > > magic is above 0, the broker can do an in-place assignment, without
> > > > recompressing. Sorry for my misunderstanding.
> > > >
> > > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert
> > those
> > > > changes and update the KIP with the recent discussions. I will
> complete
> > > it
> > > > in 48 hours from now.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > I see. Let me have a check. If not needed, of course, we don't have
> > to
> > > > > waste on configuration options.
> > > > >
> > > > > Since the KIP deadline is imminent, I just opened the voting
> thread.
> > > > Let's
> > > > > continue the discussion here.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > > > >
> > > > >> Hi Dongjin,
> > > > >>
> > > > >> When the compression type is "producer", then the broker doesn't
> > > > >> recompress
> > > > >> though. Thinking about it some more, there are some uncommon cases
> > > where
> > > > >> recompression does happen (the old (and hopefully hardly used by
> > now)
> > > > >> message format == 0 and some edge cases), so it is a good point
> you
> > > > >> raised.
> > > > >>
> > > > >> It's a bit unfortunate to add so many topic configs for cases that
> > > > >> probably
> > > > >> don't matter. That is, if you are using "producer" compression,
> you
> > > > >> probably don't need to configure these settings and can live with
> > the
> > > > >> defaults. Perhaps we should only support the topic config for the
> > > cases
> > > > >> where you are actually recompressing in the broker.
> > > > >>
> > > > >> What do you think? I'd be interested in other people's thoughts
> too.
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >>
> > > > >> > Hi Ismael,
> > > > >> >
> > > > >> > It seems like it needs more explanation. Here is the detailed
> > > > reasoning.
> > > > >> >
> > > > >> > You know, topic and broker's 'compression.type' allows
> > > 'uncompressed',
> > > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.)
> > And
> > > > >> this
> > > > >> > configuration is used by the broker in the re-compressing
> process
> > > > after
> > > > >> > offset assignment. After this feature, the new configs,
> > > > >> 'compression.level'
> > > > >> > and 'compression.buffer.size', also will be used in this
> process.
> > > > >> >
> > > > >> > The problem arises when given topic's compression type (whether
> it
> > > was
> > > > >> > inherited from broker's configuration or explicitly set) is
> > > > 'producer.'
> > > > >> > With this setting, the compression codec to be used is decided
> by
> > > the
> > > > >> > producer client. Since there is no way to restore the
> compression
> > > > level
> > > > >> and
> > > > >> > buffer size from the message, we can take the following
> > strategies:
> > > > >> >
> > > > >> > 1. Just use given 'compression.level' and
> > 'compression.buffer.size'
> > > > >> > settings.
> > > > >> >
> > > > >> > It will cause so many errors. Let's imagine the case of topic's
> > > > >> > configuration is { compression.type=producer,
> > compression.level=10,
> > > > >> > compression.buffer.size=8192 }. In this case, all producers with
> > > gzip
> > > > or
> > > > >> > lz4 compressed messages will result in an error. (gzip doesn't
> > allow
> > > > >> > compression level 10, and lz4 also for a buffer size of 8192.)
> > > > >> >
> > > > >> > 2. Extend the message format to include compression
> > configurations.
> > > > >> >
> > > > >> > With this strategy, we need to change the message format - it's
> a
> > > too
> > > > >> big
> > > > >> > change.
> > > > >> >
> > > > >> > 3. If topic's compression.type is 'producer', use the default
> > > > >> configuration
> > > > >> > for the given codec.
> > > > >> >
> > > > >> > With this strategy, allowing fine-grained compression
> > configuration
> > > is
> > > > >> > meaningless.
> > > > >> >
> > > > >> > For the above reasons, I think the only alternative is providing
> > > > options
> > > > >> > that can be used when the topic's 'compression.type' is
> > 'producer.'
> > > In
> > > > >> > other words, adding compression.[gzip, lz4, zstd].level and
> > > > >> > compression.[gzip.snappy.lz4].buffer.size options - and it is
> > what I
> > > > >> did in
> > > > >> > the last modification.
> > > > >> >
> > > > >> > (wait, the reasoning above should be included in the KIP in the
> > > > rejected
> > > > >> > alternatives section, isn't it?)
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Dongjin
> > > > >> >
> > > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk>
> > > > wrote:
> > > > >> >
> > > > >> > > Hi Dongjin,
> > > > >> > >
> > > > >> > > For topic level, you can only have a single compression type
> so
> > > the
> > > > >> way
> > > > >> > it
> > > > >> > > was before was fine, right? The point you raise is how to set
> > > broker
> > > > >> > > defaults that vary depending on the compression type, correct?
> > > > >> > >
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> > dongjin@apache.org>
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > I just realized that there was a missing hole in the KIP,
> so I
> > > > fixed
> > > > >> > it.
> > > > >> > > > The draft implementation will be updated soon.
> > > > >> > > >
> > > > >> > > > In short, the proposed change did not regard the case of the
> > > topic
> > > > >> or
> > > > >> > > > broker's 'compression.type' is 'producer'; in this case, the
> > > > broker
> > > > >> has
> > > > >> > > to
> > > > >> > > > handle all kinds of the supported codec. So I added
> additional
> > > > >> options
> > > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > > > >> > compression.[gzip,snappy,lz4,
> > > > >> > > > zstd].buffer.size) with handling routines.
> > > > >> > > >
> > > > >> > > > Please have a look when you are free.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dongjin
> > > > >> > > >
> > > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> > dongjin@apache.org>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > > > >> > > > >
> > > > >> > > > > Best,
> > > > >> > > > > Dongjin
> > > > >> > > > >
> > > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> > ismaelj@gmail.com
> > > >
> > > > >> > wrote:
> > > > >> > > > >
> > > > >> > > > >> Thanks Dongjin. One minor suggestion: we should mention
> > that
> > > > the
> > > > >> > > broker
> > > > >> > > > >> side configs are also topic configs (i.e. can be set for
> a
> > > > given
> > > > >> > > topic).
> > > > >> > > > >>
> > > > >> > > > >> Ismael
> > > > >> > > > >>
> > > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > > dongjin@apache.org
> > > > >> > wrote:
> > > > >> > > > >>
> > > > >> > > > >> > Happy new year.
> > > > >> > > > >> >
> > > > >> > > > >> > I just updated the title and contents of KIP and Jira
> > > issue,
> > > > >> with
> > > > >> > > > >> updated
> > > > >> > > > >> > draft implementation. Now both of compression level and
> > > > buffer
> > > > >> > size
> > > > >> > > > >> options
> > > > >> > > > >> > are available to producer and broker configuration. You
> > can
> > > > >> check
> > > > >> > > the
> > > > >> > > > >> > updated KIP from modified URL:
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > > >> > > > >> >
> > > > >> > > > >> > Please have a look when you are free.
> > > > >> > > > >> >
> > > > >> > > > >> > Thanks,
> > > > >> > > > >> > Dongjin
> > > > >> > > > >> >
> > > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > > > ismaelj@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > >> >
> > > > >> > > > >> > > The updated title sounds fine to me.
> > > > >> > > > >> > >
> > > > >> > > > >> > > Ismael
> > > > >> > > > >> > >
> > > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > > > dongjin@apache.org
> > > > >> > > wrote:
> > > > >> > > > >> > >
> > > > >> > > > >> > > > Hi Ismael,
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > Got it. Your direction is perfectly reasonable. I
> am
> > > now
> > > > >> > > updating
> > > > >> > > > >> the
> > > > >> > > > >> > KIP
> > > > >> > > > >> > > > document and the implementation.
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > By allowing the buffer/block size to be
> configurable,
> > > it
> > > > >> would
> > > > >> > > be
> > > > >> > > > >> > better
> > > > >> > > > >> > > to
> > > > >> > > > >> > > > update the title of the KIP like 'Allow
> fine-grained
> > > > >> > > configuration
> > > > >> > > > >> for
> > > > >> > > > >> > > > compression'. Is that right?
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > @Other committers:
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > Is there any other opinion on allowing the
> > buffer/block
> > > > >> size
> > > > >> > to
> > > > >> > > be
> > > > >> > > > >> > > > configurable?
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > Thanks,
> > > > >> > > > >> > > > Dongjin
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> > > > >> > ismael@juma.me.uk>
> > > > >> > > > >> wrote:
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > > Hi Dongjin,
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > To clarify, I mean a broker topic config with
> > regards
> > > > to
> > > > >> > point
> > > > >> > > > 1.
> > > > >> > > > >> As
> > > > >> > > > >> > > you
> > > > >> > > > >> > > > > know, compression can be done by the producer
> > and/or
> > > by
> > > > >> the
> > > > >> > > > >> broker.
> > > > >> > > > >> > The
> > > > >> > > > >> > > > > default is for the broker to just use whatever
> > > > >> compression
> > > > >> > was
> > > > >> > > > >> used
> > > > >> > > > >> > by
> > > > >> > > > >> > > > the
> > > > >> > > > >> > > > > producer, but this can be changed by the user on
> a
> > > per
> > > > >> topic
> > > > >> > > > >> basis.
> > > > >> > > > >> > It
> > > > >> > > > >> > > > > seems like it would make sense for the configs to
> > be
> > > .
> > > > >> > > > consistent
> > > > >> > > > >> > > between
> > > > >> > > > >> > > > > producer and broker.
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > For point 2, I haven't looked at the
> > implementation,
> > > > but
> > > > >> we
> > > > >> > > > could
> > > > >> > > > >> do
> > > > >> > > > >> > it
> > > > >> > > > >> > > > in
> > > > >> > > > >> > > > > the `CompressionType` enum by invoking the right
> > > > >> constructor
> > > > >> > > or
> > > > >> > > > >> > > > retrieving
> > > > >> > > > >> > > > > the default value via a constant (if defined).
> > That's
> > > > an
> > > > >> > > > >> > implementation
> > > > >> > > > >> > > > > detail and can be discussed in the PR. The more
> > > general
> > > > >> > point
> > > > >> > > is
> > > > >> > > > >> to
> > > > >> > > > >> > > rely
> > > > >> > > > >> > > > on
> > > > >> > > > >> > > > > the library defaults instead of choosing one
> > > ourselves.
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > For point 3, I'm in favour of doing that in this
> > KIP.
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > Ismael
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> > > > >> > > dongjin@apache.org
> > > > >> > > > >
> > > > >> > > > >> > > wrote:
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > *1. About topic config*
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > After some consideration, I concluded that
> topic
> > > > config
> > > > >> > > > doesn't
> > > > >> > > > >> > need
> > > > >> > > > >> > > to
> > > > >> > > > >> > > > > > support compression.level. Here is why: since
> the
> > > > >> > > compression
> > > > >> > > > is
> > > > >> > > > >> > > > > conducted
> > > > >> > > > >> > > > > > by the client, the one who can select the best
> > > > >> compression
> > > > >> > > > >> level is
> > > > >> > > > >> > > the
> > > > >> > > > >> > > > > > client itself. Let us assume that the
> compression
> > > > >> level is
> > > > >> > > set
> > > > >> > > > >> at
> > > > >> > > > >> > the
> > > > >> > > > >> > > > > topic
> > > > >> > > > >> > > > > > config level. In that case, there is a
> > possibility
> > > > that
> > > > >> > the
> > > > >> > > > >> > > compression
> > > > >> > > > >> > > > > > level is not optimal for some producers.
> > Actually,
> > > > >> Kafka's
> > > > >> > > go
> > > > >> > > > >> > client
> > > > >> > > > >> > > > also
> > > > >> > > > >> > > > > > supports compression level functionality for
> the
> > > > >> producer
> > > > >> > > > config
> > > > >> > > > >> > > only.
> > > > >> > > > >> > > > > > <
> > > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > > > >> > > > >> (wait,
> > > > >> > > > >> > do
> > > > >> > > > >> > > we
> > > > >> > > > >> > > > > > need
> > > > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> > > > alternatives
> > > > >> > > > >> section?)
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > *2. About default level*
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > As of current draft implementation, the default
> > > > >> > compression
> > > > >> > > is
> > > > >> > > > >> set
> > > > >> > > > >> > on
> > > > >> > > > >> > > > the
> > > > >> > > > >> > > > > > CompressionType enum. Of course, changing this
> > > > strategy
> > > > >> > into
> > > > >> > > > >> > relying
> > > > >> > > > >> > > > on a
> > > > >> > > > >> > > > > > method from the library to pick the default
> > > > compression
> > > > >> > > level
> > > > >> > > > >> seems
> > > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In
> > > this
> > > > >> case,
> > > > >> > > we
> > > > >> > > > >> need
> > > > >> > > > >> > to
> > > > >> > > > >> > > > add
> > > > >> > > > >> > > > > > similar wrapper class for zstd and modify lz4
> the
> > > > >> wrapper
> > > > >> > > > also.
> > > > >> > > > >> Add
> > > > >> > > > >> > > to
> > > > >> > > > >> > > > > > this, it seems like we need to explicitly state
> > > that
> > > > we
> > > > >> > > follow
> > > > >> > > > >> the
> > > > >> > > > >> > > > > default
> > > > >> > > > >> > > > > > compression level of the codec in the
> > > documentation.
> > > > Is
> > > > >> > this
> > > > >> > > > >> what
> > > > >> > > > >> > you
> > > > >> > > > >> > > > > > intended?
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > *3. Whether to allow the buffer/block size to
> be
> > > > >> > > configurable*
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > Well, As of current draft implementation, the
> lz4
> > > > >> level is
> > > > >> > > > >> > > implemented
> > > > >> > > > >> > > > as
> > > > >> > > > >> > > > > > block size; this is caused by my
> misunderstanding
> > > on
> > > > >> lz4.
> > > > >> > > > After
> > > > >> > > > >> > > > reviewing
> > > > >> > > > >> > > > > > lz4 today, I found that it also supports
> > > compression
> > > > >> level
> > > > >> > > of
> > > > >> > > > >> 1~16
> > > > >> > > > >> > > > > > (default: 1), not block size. I will fix it in
> > this
> > > > >> > weekend
> > > > >> > > by
> > > > >> > > > >> > > updating
> > > > >> > > > >> > > > > the
> > > > >> > > > >> > > > > > wrapper class.
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > For the problem of the buffer/block size, I
> have
> > no
> > > > >> strong
> > > > >> > > > >> opinion.
> > > > >> > > > >> > > If
> > > > >> > > > >> > > > > the
> > > > >> > > > >> > > > > > community needs it, I will do it all together.
> > How
> > > do
> > > > >> you
> > > > >> > > > think?
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > In short, it seems like I need to update the
> KIP
> > > > >> document
> > > > >> > > for
> > > > >> > > > >> issue
> > > > >> > > > >> > > #1
> > > > >> > > > >> > > > > and
> > > > >> > > > >> > > > > > update the compression wrapper for issue #2,
> #3.
> > Is
> > > > >> this
> > > > >> > > okay?
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > Thanks,
> > > > >> > > > >> > > > > > Dongjin
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > > > >> > > > ismaelj@gmail.com
> > > > >> > > > >> >
> > > > >> > > > >> > > > wrote:
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
> > > > >> questions:
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > > > 1. Have we considered whether we want to
> allow
> > a
> > > > >> similar
> > > > >> > > > topic
> > > > >> > > > >> > > > config?
> > > > >> > > > >> > > > > > > 2. Can we rely on a method from the library
> to
> > > pick
> > > > >> the
> > > > >> > > > >> default
> > > > >> > > > >> > > > > > compression
> > > > >> > > > >> > > > > > > level if compression.level is not set? We do
> it
> > > for
> > > > >> gzip
> > > > >> > > and
> > > > >> > > > >> it
> > > > >> > > > >> > > would
> > > > >> > > > >> > > > > > seem
> > > > >> > > > >> > > > > > > reasonable to do something similar for the
> > other
> > > > >> > > compression
> > > > >> > > > >> > > > libraries.
> > > > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block size
> to
> > > be
> > > > >> > > > >> configurable?
> > > > >> > > > >> > > This
> > > > >> > > > >> > > > > has
> > > > >> > > > >> > > > > > > an impact on memory usage and people may want
> > to
> > > > >> trade
> > > > >> > > > >> > compression
> > > > >> > > > >> > > > for
> > > > >> > > > >> > > > > > > less/more memory in some cases. For example,
> > the
> > > > >> default
> > > > >> > > for
> > > > >> > > > >> LZ4
> > > > >> > > > >> > is
> > > > >> > > > >> > > > > 64KB
> > > > >> > > > >> > > > > > > which is a bit high.
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > > > Ismael
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > > > >> > > > dongjin@apache.org
> > > > >> > > > >> > > wrote:
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > > > > Hello dev,
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > > > I hope to initiate the discussion of
> KIP-390:
> > > Add
> > > > >> > > producer
> > > > >> > > > >> > option
> > > > >> > > > >> > > > to
> > > > >> > > > >> > > > > > > adjust
> > > > >> > > > >> > > > > > > > compression level
> > > > >> > > > >> > > > > > > > <
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > >
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > >> > > > >> > > > > > > > >.
> > > > >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > > > Best,
> > > > >> > > > >> > > > > > > > Dongjin
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > > > --
> > > > >> > > > >> > > > > > > > *Dongjin Lee*
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > > > >> > > github.com/dongjinleekr
> > > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> > > > >> >slideshare:
> > > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > > >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > >> > > > > > > >
> > > > >> > > > >> > > > > > >
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > --
> > > > >> > > > >> > > > > > *Dongjin Lee*
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > > > >> github.com/dongjinleekr
> > > > >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> > > > >speakerdeck:
> > > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > > > >
> > > > >> > > > >> > > > >
> > > > >> > > > >> > > >
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > --
> > > > >> > > > >> > > > *Dongjin Lee*
> > > > >> > > > >> > > >
> > > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > > github.com/dongjinleekr
> > > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> > >speakerdeck:
> > > > >> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> > > >
> > > > >> > > > >> > >
> > > > >> > > > >> >
> > > > >> > > > >> >
> > > > >> > > > >> > --
> > > > >> > > > >> > *Dongjin Lee*
> > > > >> > > > >> >
> > > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > > > >> > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > >> > speakerdeck.com/dongjin
> > > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >> >
> > > > >> > > > >>
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > *Dongjin Lee*
> > > > >> > > > >
> > > > >> > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > *Dongjin Lee*
> > > > >> > > >
> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > *Dongjin Lee*
> > > > >> >
> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > speakerdeck.com/dongjin
> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > > <https://speakerdeck.com/dongjin>*
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Becket Qin <be...@gmail.com>.
Hi Dongjin,

Thanks for the KIP and sorry for being a bit late on the discussion.

It makes sense to expose the configuration for compression types. But I am
wondering if there is a better way to do that than what proposed in the
KIP. What I feel confusing is that we are effectively sharing the
configuration across different compression types, the meaning of the
configuration are actually kind of different depending on the compression
type. This will end up with issues like what Ismael has brought up earlier.
Say if the broker has compression type of producer (this may result in
mixed compression type in the same topic), and for some reason the broker
needs to re-compress the topic (e.g. due to log compaction), a single topic
level compression config may not work, because a valid compression level
for lz4 maybe invalid for gzip.

One alternative I am thinking is to provide a "compression.config"
configuration, inside which it specifies configuration used by each
specific compression type as k-v pairs. The format could use some name
space as well. For example,

compression.config="gzip.compression.level=5,lz4.compression.level=17,zstd.compression.level=22".

Each compression type will just pick whatever configuration they need from
the k-v pairs defined in this config.

Besides clarity, some other benefits are:
1. Extensibility, we don't have to add more configuration when we add new
compression types, or expose new config for a particular compression type.
2. Even for the same config, different compression type may have different
terminologies. With the approach we can honor those terminologies instead
of shoehorning them into the same configuration name.

What do you think?

Thanks,

Jiangjie (Becket) Qin





On Wed, Jan 23, 2019 at 12:07 AM Dongjin Lee <do...@apache.org> wrote:

> Hello. I just fixed the draft implementation, with rebasing onto the latest
> trunk. The KIP was also restored.
>
> Please have a look, and if there is no major problem, please vote to the
> voting thread. You know, KIP freeze for 2.2.0 is almost imminent.
>
> Thanks,
> Dongjin
>
> On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Thanks!
> >
> > Ismael
> >
> > On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Hi Ismael,
> > >
> > > After reviewing
> > `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > > yes, you are right. If source codec and target codec is identical and
> the
> > > magic is above 0, the broker can do an in-place assignment, without
> > > recompressing. Sorry for my misunderstanding.
> > >
> > > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert
> those
> > > changes and update the KIP with the recent discussions. I will complete
> > it
> > > in 48 hours from now.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > I see. Let me have a check. If not needed, of course, we don't have
> to
> > > > waste on configuration options.
> > > >
> > > > Since the KIP deadline is imminent, I just opened the voting thread.
> > > Let's
> > > > continue the discussion here.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > > >
> > > >> Hi Dongjin,
> > > >>
> > > >> When the compression type is "producer", then the broker doesn't
> > > >> recompress
> > > >> though. Thinking about it some more, there are some uncommon cases
> > where
> > > >> recompression does happen (the old (and hopefully hardly used by
> now)
> > > >> message format == 0 and some edge cases), so it is a good point you
> > > >> raised.
> > > >>
> > > >> It's a bit unfortunate to add so many topic configs for cases that
> > > >> probably
> > > >> don't matter. That is, if you are using "producer" compression, you
> > > >> probably don't need to configure these settings and can live with
> the
> > > >> defaults. Perhaps we should only support the topic config for the
> > cases
> > > >> where you are actually recompressing in the broker.
> > > >>
> > > >> What do you think? I'd be interested in other people's thoughts too.
> > > >>
> > > >> Ismael
> > > >>
> > > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >>
> > > >> > Hi Ismael,
> > > >> >
> > > >> > It seems like it needs more explanation. Here is the detailed
> > > reasoning.
> > > >> >
> > > >> > You know, topic and broker's 'compression.type' allows
> > 'uncompressed',
> > > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.)
> And
> > > >> this
> > > >> > configuration is used by the broker in the re-compressing process
> > > after
> > > >> > offset assignment. After this feature, the new configs,
> > > >> 'compression.level'
> > > >> > and 'compression.buffer.size', also will be used in this process.
> > > >> >
> > > >> > The problem arises when given topic's compression type (whether it
> > was
> > > >> > inherited from broker's configuration or explicitly set) is
> > > 'producer.'
> > > >> > With this setting, the compression codec to be used is decided by
> > the
> > > >> > producer client. Since there is no way to restore the compression
> > > level
> > > >> and
> > > >> > buffer size from the message, we can take the following
> strategies:
> > > >> >
> > > >> > 1. Just use given 'compression.level' and
> 'compression.buffer.size'
> > > >> > settings.
> > > >> >
> > > >> > It will cause so many errors. Let's imagine the case of topic's
> > > >> > configuration is { compression.type=producer,
> compression.level=10,
> > > >> > compression.buffer.size=8192 }. In this case, all producers with
> > gzip
> > > or
> > > >> > lz4 compressed messages will result in an error. (gzip doesn't
> allow
> > > >> > compression level 10, and lz4 also for a buffer size of 8192.)
> > > >> >
> > > >> > 2. Extend the message format to include compression
> configurations.
> > > >> >
> > > >> > With this strategy, we need to change the message format - it's a
> > too
> > > >> big
> > > >> > change.
> > > >> >
> > > >> > 3. If topic's compression.type is 'producer', use the default
> > > >> configuration
> > > >> > for the given codec.
> > > >> >
> > > >> > With this strategy, allowing fine-grained compression
> configuration
> > is
> > > >> > meaningless.
> > > >> >
> > > >> > For the above reasons, I think the only alternative is providing
> > > options
> > > >> > that can be used when the topic's 'compression.type' is
> 'producer.'
> > In
> > > >> > other words, adding compression.[gzip, lz4, zstd].level and
> > > >> > compression.[gzip.snappy.lz4].buffer.size options - and it is
> what I
> > > >> did in
> > > >> > the last modification.
> > > >> >
> > > >> > (wait, the reasoning above should be included in the KIP in the
> > > rejected
> > > >> > alternatives section, isn't it?)
> > > >> >
> > > >> > Thanks,
> > > >> > Dongjin
> > > >> >
> > > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk>
> > > wrote:
> > > >> >
> > > >> > > Hi Dongjin,
> > > >> > >
> > > >> > > For topic level, you can only have a single compression type so
> > the
> > > >> way
> > > >> > it
> > > >> > > was before was fine, right? The point you raise is how to set
> > broker
> > > >> > > defaults that vary depending on the compression type, correct?
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <
> dongjin@apache.org>
> > > >> wrote:
> > > >> > >
> > > >> > > > I just realized that there was a missing hole in the KIP, so I
> > > fixed
> > > >> > it.
> > > >> > > > The draft implementation will be updated soon.
> > > >> > > >
> > > >> > > > In short, the proposed change did not regard the case of the
> > topic
> > > >> or
> > > >> > > > broker's 'compression.type' is 'producer'; in this case, the
> > > broker
> > > >> has
> > > >> > > to
> > > >> > > > handle all kinds of the supported codec. So I added additional
> > > >> options
> > > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > > >> > compression.[gzip,snappy,lz4,
> > > >> > > > zstd].buffer.size) with handling routines.
> > > >> > > >
> > > >> > > > Please have a look when you are free.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dongjin
> > > >> > > >
> > > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <
> dongjin@apache.org>
> > > >> wrote:
> > > >> > > >
> > > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > > >> > > > >
> > > >> > > > > Best,
> > > >> > > > > Dongjin
> > > >> > > > >
> > > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <
> ismaelj@gmail.com
> > >
> > > >> > wrote:
> > > >> > > > >
> > > >> > > > >> Thanks Dongjin. One minor suggestion: we should mention
> that
> > > the
> > > >> > > broker
> > > >> > > > >> side configs are also topic configs (i.e. can be set for a
> > > given
> > > >> > > topic).
> > > >> > > > >>
> > > >> > > > >> Ismael
> > > >> > > > >>
> > > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> > dongjin@apache.org
> > > >> > wrote:
> > > >> > > > >>
> > > >> > > > >> > Happy new year.
> > > >> > > > >> >
> > > >> > > > >> > I just updated the title and contents of KIP and Jira
> > issue,
> > > >> with
> > > >> > > > >> updated
> > > >> > > > >> > draft implementation. Now both of compression level and
> > > buffer
> > > >> > size
> > > >> > > > >> options
> > > >> > > > >> > are available to producer and broker configuration. You
> can
> > > >> check
> > > >> > > the
> > > >> > > > >> > updated KIP from modified URL:
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > >> > > > >> >
> > > >> > > > >> > Please have a look when you are free.
> > > >> > > > >> >
> > > >> > > > >> > Thanks,
> > > >> > > > >> > Dongjin
> > > >> > > > >> >
> > > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > > ismaelj@gmail.com
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >> >
> > > >> > > > >> > > The updated title sounds fine to me.
> > > >> > > > >> > >
> > > >> > > > >> > > Ismael
> > > >> > > > >> > >
> > > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > > dongjin@apache.org
> > > >> > > wrote:
> > > >> > > > >> > >
> > > >> > > > >> > > > Hi Ismael,
> > > >> > > > >> > > >
> > > >> > > > >> > > > Got it. Your direction is perfectly reasonable. I am
> > now
> > > >> > > updating
> > > >> > > > >> the
> > > >> > > > >> > KIP
> > > >> > > > >> > > > document and the implementation.
> > > >> > > > >> > > >
> > > >> > > > >> > > > By allowing the buffer/block size to be configurable,
> > it
> > > >> would
> > > >> > > be
> > > >> > > > >> > better
> > > >> > > > >> > > to
> > > >> > > > >> > > > update the title of the KIP like 'Allow fine-grained
> > > >> > > configuration
> > > >> > > > >> for
> > > >> > > > >> > > > compression'. Is that right?
> > > >> > > > >> > > >
> > > >> > > > >> > > > @Other committers:
> > > >> > > > >> > > >
> > > >> > > > >> > > > Is there any other opinion on allowing the
> buffer/block
> > > >> size
> > > >> > to
> > > >> > > be
> > > >> > > > >> > > > configurable?
> > > >> > > > >> > > >
> > > >> > > > >> > > > Thanks,
> > > >> > > > >> > > > Dongjin
> > > >> > > > >> > > >
> > > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> > > >> > ismael@juma.me.uk>
> > > >> > > > >> wrote:
> > > >> > > > >> > > >
> > > >> > > > >> > > > > Hi Dongjin,
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > To clarify, I mean a broker topic config with
> regards
> > > to
> > > >> > point
> > > >> > > > 1.
> > > >> > > > >> As
> > > >> > > > >> > > you
> > > >> > > > >> > > > > know, compression can be done by the producer
> and/or
> > by
> > > >> the
> > > >> > > > >> broker.
> > > >> > > > >> > The
> > > >> > > > >> > > > > default is for the broker to just use whatever
> > > >> compression
> > > >> > was
> > > >> > > > >> used
> > > >> > > > >> > by
> > > >> > > > >> > > > the
> > > >> > > > >> > > > > producer, but this can be changed by the user on a
> > per
> > > >> topic
> > > >> > > > >> basis.
> > > >> > > > >> > It
> > > >> > > > >> > > > > seems like it would make sense for the configs to
> be
> > .
> > > >> > > > consistent
> > > >> > > > >> > > between
> > > >> > > > >> > > > > producer and broker.
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > For point 2, I haven't looked at the
> implementation,
> > > but
> > > >> we
> > > >> > > > could
> > > >> > > > >> do
> > > >> > > > >> > it
> > > >> > > > >> > > > in
> > > >> > > > >> > > > > the `CompressionType` enum by invoking the right
> > > >> constructor
> > > >> > > or
> > > >> > > > >> > > > retrieving
> > > >> > > > >> > > > > the default value via a constant (if defined).
> That's
> > > an
> > > >> > > > >> > implementation
> > > >> > > > >> > > > > detail and can be discussed in the PR. The more
> > general
> > > >> > point
> > > >> > > is
> > > >> > > > >> to
> > > >> > > > >> > > rely
> > > >> > > > >> > > > on
> > > >> > > > >> > > > > the library defaults instead of choosing one
> > ourselves.
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > For point 3, I'm in favour of doing that in this
> KIP.
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > Ismael
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> > > >> > > dongjin@apache.org
> > > >> > > > >
> > > >> > > > >> > > wrote:
> > > >> > > > >> > > > >
> > > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > *1. About topic config*
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > After some consideration, I concluded that topic
> > > config
> > > >> > > > doesn't
> > > >> > > > >> > need
> > > >> > > > >> > > to
> > > >> > > > >> > > > > > support compression.level. Here is why: since the
> > > >> > > compression
> > > >> > > > is
> > > >> > > > >> > > > > conducted
> > > >> > > > >> > > > > > by the client, the one who can select the best
> > > >> compression
> > > >> > > > >> level is
> > > >> > > > >> > > the
> > > >> > > > >> > > > > > client itself. Let us assume that the compression
> > > >> level is
> > > >> > > set
> > > >> > > > >> at
> > > >> > > > >> > the
> > > >> > > > >> > > > > topic
> > > >> > > > >> > > > > > config level. In that case, there is a
> possibility
> > > that
> > > >> > the
> > > >> > > > >> > > compression
> > > >> > > > >> > > > > > level is not optimal for some producers.
> Actually,
> > > >> Kafka's
> > > >> > > go
> > > >> > > > >> > client
> > > >> > > > >> > > > also
> > > >> > > > >> > > > > > supports compression level functionality for the
> > > >> producer
> > > >> > > > config
> > > >> > > > >> > > only.
> > > >> > > > >> > > > > > <
> > > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > > >> > > > >> (wait,
> > > >> > > > >> > do
> > > >> > > > >> > > we
> > > >> > > > >> > > > > > need
> > > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> > > alternatives
> > > >> > > > >> section?)
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > *2. About default level*
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > As of current draft implementation, the default
> > > >> > compression
> > > >> > > is
> > > >> > > > >> set
> > > >> > > > >> > on
> > > >> > > > >> > > > the
> > > >> > > > >> > > > > > CompressionType enum. Of course, changing this
> > > strategy
> > > >> > into
> > > >> > > > >> > relying
> > > >> > > > >> > > > on a
> > > >> > > > >> > > > > > method from the library to pick the default
> > > compression
> > > >> > > level
> > > >> > > > >> seems
> > > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In
> > this
> > > >> case,
> > > >> > > we
> > > >> > > > >> need
> > > >> > > > >> > to
> > > >> > > > >> > > > add
> > > >> > > > >> > > > > > similar wrapper class for zstd and modify lz4 the
> > > >> wrapper
> > > >> > > > also.
> > > >> > > > >> Add
> > > >> > > > >> > > to
> > > >> > > > >> > > > > > this, it seems like we need to explicitly state
> > that
> > > we
> > > >> > > follow
> > > >> > > > >> the
> > > >> > > > >> > > > > default
> > > >> > > > >> > > > > > compression level of the codec in the
> > documentation.
> > > Is
> > > >> > this
> > > >> > > > >> what
> > > >> > > > >> > you
> > > >> > > > >> > > > > > intended?
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > *3. Whether to allow the buffer/block size to be
> > > >> > > configurable*
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > Well, As of current draft implementation, the lz4
> > > >> level is
> > > >> > > > >> > > implemented
> > > >> > > > >> > > > as
> > > >> > > > >> > > > > > block size; this is caused by my misunderstanding
> > on
> > > >> lz4.
> > > >> > > > After
> > > >> > > > >> > > > reviewing
> > > >> > > > >> > > > > > lz4 today, I found that it also supports
> > compression
> > > >> level
> > > >> > > of
> > > >> > > > >> 1~16
> > > >> > > > >> > > > > > (default: 1), not block size. I will fix it in
> this
> > > >> > weekend
> > > >> > > by
> > > >> > > > >> > > updating
> > > >> > > > >> > > > > the
> > > >> > > > >> > > > > > wrapper class.
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > For the problem of the buffer/block size, I have
> no
> > > >> strong
> > > >> > > > >> opinion.
> > > >> > > > >> > > If
> > > >> > > > >> > > > > the
> > > >> > > > >> > > > > > community needs it, I will do it all together.
> How
> > do
> > > >> you
> > > >> > > > think?
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > In short, it seems like I need to update the KIP
> > > >> document
> > > >> > > for
> > > >> > > > >> issue
> > > >> > > > >> > > #1
> > > >> > > > >> > > > > and
> > > >> > > > >> > > > > > update the compression wrapper for issue #2, #3.
> Is
> > > >> this
> > > >> > > okay?
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > Thanks,
> > > >> > > > >> > > > > > Dongjin
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > > >> > > > ismaelj@gmail.com
> > > >> > > > >> >
> > > >> > > > >> > > > wrote:
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
> > > >> questions:
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > > > 1. Have we considered whether we want to allow
> a
> > > >> similar
> > > >> > > > topic
> > > >> > > > >> > > > config?
> > > >> > > > >> > > > > > > 2. Can we rely on a method from the library to
> > pick
> > > >> the
> > > >> > > > >> default
> > > >> > > > >> > > > > > compression
> > > >> > > > >> > > > > > > level if compression.level is not set? We do it
> > for
> > > >> gzip
> > > >> > > and
> > > >> > > > >> it
> > > >> > > > >> > > would
> > > >> > > > >> > > > > > seem
> > > >> > > > >> > > > > > > reasonable to do something similar for the
> other
> > > >> > > compression
> > > >> > > > >> > > > libraries.
> > > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block size to
> > be
> > > >> > > > >> configurable?
> > > >> > > > >> > > This
> > > >> > > > >> > > > > has
> > > >> > > > >> > > > > > > an impact on memory usage and people may want
> to
> > > >> trade
> > > >> > > > >> > compression
> > > >> > > > >> > > > for
> > > >> > > > >> > > > > > > less/more memory in some cases. For example,
> the
> > > >> default
> > > >> > > for
> > > >> > > > >> LZ4
> > > >> > > > >> > is
> > > >> > > > >> > > > > 64KB
> > > >> > > > >> > > > > > > which is a bit high.
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > > > Ismael
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > > >> > > > dongjin@apache.org
> > > >> > > > >> > > wrote:
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > > > > Hello dev,
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > > > I hope to initiate the discussion of KIP-390:
> > Add
> > > >> > > producer
> > > >> > > > >> > option
> > > >> > > > >> > > > to
> > > >> > > > >> > > > > > > adjust
> > > >> > > > >> > > > > > > > compression level
> > > >> > > > >> > > > > > > > <
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > >
> > > >> > > > >> > > >
> > > >> > > > >> > >
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > >> > > > >> > > > > > > > >.
> > > >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > > > Best,
> > > >> > > > >> > > > > > > > Dongjin
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > > > --
> > > >> > > > >> > > > > > > > *Dongjin Lee*
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > > >> > > github.com/dongjinleekr
> > > >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> > > >> >slideshare:
> > > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > >> > > > >> > > > > > > >
> > > >> > > > >> > > > > > >
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > --
> > > >> > > > >> > > > > > *Dongjin Lee*
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > > >> github.com/dongjinleekr
> > > >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> > > >speakerdeck:
> > > >> > > > >> > > > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > > > >
> > > >> > > > >> > > > >
> > > >> > > > >> > > >
> > > >> > > > >> > > >
> > > >> > > > >> > > > --
> > > >> > > > >> > > > *Dongjin Lee*
> > > >> > > > >> > > >
> > > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > > > *github:  <http://goog_969573159/>
> > > github.com/dongjinleekr
> > > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> > > >> > > > >> > > > speakerdeck.com/dongjin
> > > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> > > >
> > > >> > > > >> > >
> > > >> > > > >> >
> > > >> > > > >> >
> > > >> > > > >> > --
> > > >> > > > >> > *Dongjin Lee*
> > > >> > > > >> >
> > > >> > > > >> > *A hitchhiker in the mathematical world.*
> > > >> > > > >> > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > >> > speakerdeck.com/dongjin
> > > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > > >> > > > >> >
> > > >> > > > >>
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > --
> > > >> > > > > *Dongjin Lee*
> > > >> > > > >
> > > >> > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > speakerdeck.com/dongjin
> > > >> > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > *Dongjin Lee*
> > > >> > > >
> > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > speakerdeck.com/dongjin
> > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > *Dongjin Lee*
> > > >> >
> > > >> > *A hitchhiker in the mathematical world.*
> > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > speakerdeck.com/dongjin
> > > >> > <https://speakerdeck.com/dongjin>*
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hello. I just fixed the draft implementation, with rebasing onto the latest
trunk. The KIP was also restored.

Please have a look, and if there is no major problem, please vote to the
voting thread. You know, KIP freeze for 2.2.0 is almost imminent.

Thanks,
Dongjin

On Tue, Jan 22, 2019 at 1:04 AM Ismael Juma <is...@juma.me.uk> wrote:

> Thanks!
>
> Ismael
>
> On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Hi Ismael,
> >
> > After reviewing
> `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> > yes, you are right. If source codec and target codec is identical and the
> > magic is above 0, the broker can do an in-place assignment, without
> > recompressing. Sorry for my misunderstanding.
> >
> > Since we don't need `compression.[gzip,lz4,zstd].level` and
> > `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert those
> > changes and update the KIP with the recent discussions. I will complete
> it
> > in 48 hours from now.
> >
> > Thanks,
> > Dongjin
> >
> > On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org> wrote:
> >
> > > I see. Let me have a check. If not needed, of course, we don't have to
> > > waste on configuration options.
> > >
> > > Since the KIP deadline is imminent, I just opened the voting thread.
> > Let's
> > > continue the discussion here.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk> wrote:
> > >
> > >> Hi Dongjin,
> > >>
> > >> When the compression type is "producer", then the broker doesn't
> > >> recompress
> > >> though. Thinking about it some more, there are some uncommon cases
> where
> > >> recompression does happen (the old (and hopefully hardly used by now)
> > >> message format == 0 and some edge cases), so it is a good point you
> > >> raised.
> > >>
> > >> It's a bit unfortunate to add so many topic configs for cases that
> > >> probably
> > >> don't matter. That is, if you are using "producer" compression, you
> > >> probably don't need to configure these settings and can live with the
> > >> defaults. Perhaps we should only support the topic config for the
> cases
> > >> where you are actually recompressing in the broker.
> > >>
> > >> What do you think? I'd be interested in other people's thoughts too.
> > >>
> > >> Ismael
> > >>
> > >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >>
> > >> > Hi Ismael,
> > >> >
> > >> > It seems like it needs more explanation. Here is the detailed
> > reasoning.
> > >> >
> > >> > You know, topic and broker's 'compression.type' allows
> 'uncompressed',
> > >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And
> > >> this
> > >> > configuration is used by the broker in the re-compressing process
> > after
> > >> > offset assignment. After this feature, the new configs,
> > >> 'compression.level'
> > >> > and 'compression.buffer.size', also will be used in this process.
> > >> >
> > >> > The problem arises when given topic's compression type (whether it
> was
> > >> > inherited from broker's configuration or explicitly set) is
> > 'producer.'
> > >> > With this setting, the compression codec to be used is decided by
> the
> > >> > producer client. Since there is no way to restore the compression
> > level
> > >> and
> > >> > buffer size from the message, we can take the following strategies:
> > >> >
> > >> > 1. Just use given 'compression.level' and 'compression.buffer.size'
> > >> > settings.
> > >> >
> > >> > It will cause so many errors. Let's imagine the case of topic's
> > >> > configuration is { compression.type=producer, compression.level=10,
> > >> > compression.buffer.size=8192 }. In this case, all producers with
> gzip
> > or
> > >> > lz4 compressed messages will result in an error. (gzip doesn't allow
> > >> > compression level 10, and lz4 also for a buffer size of 8192.)
> > >> >
> > >> > 2. Extend the message format to include compression configurations.
> > >> >
> > >> > With this strategy, we need to change the message format - it's a
> too
> > >> big
> > >> > change.
> > >> >
> > >> > 3. If topic's compression.type is 'producer', use the default
> > >> configuration
> > >> > for the given codec.
> > >> >
> > >> > With this strategy, allowing fine-grained compression configuration
> is
> > >> > meaningless.
> > >> >
> > >> > For the above reasons, I think the only alternative is providing
> > options
> > >> > that can be used when the topic's 'compression.type' is 'producer.'
> In
> > >> > other words, adding compression.[gzip, lz4, zstd].level and
> > >> > compression.[gzip.snappy.lz4].buffer.size options - and it is what I
> > >> did in
> > >> > the last modification.
> > >> >
> > >> > (wait, the reasoning above should be included in the KIP in the
> > rejected
> > >> > alternatives section, isn't it?)
> > >> >
> > >> > Thanks,
> > >> > Dongjin
> > >> >
> > >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk>
> > wrote:
> > >> >
> > >> > > Hi Dongjin,
> > >> > >
> > >> > > For topic level, you can only have a single compression type so
> the
> > >> way
> > >> > it
> > >> > > was before was fine, right? The point you raise is how to set
> broker
> > >> > > defaults that vary depending on the compression type, correct?
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org>
> > >> wrote:
> > >> > >
> > >> > > > I just realized that there was a missing hole in the KIP, so I
> > fixed
> > >> > it.
> > >> > > > The draft implementation will be updated soon.
> > >> > > >
> > >> > > > In short, the proposed change did not regard the case of the
> topic
> > >> or
> > >> > > > broker's 'compression.type' is 'producer'; in this case, the
> > broker
> > >> has
> > >> > > to
> > >> > > > handle all kinds of the supported codec. So I added additional
> > >> options
> > >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > >> > compression.[gzip,snappy,lz4,
> > >> > > > zstd].buffer.size) with handling routines.
> > >> > > >
> > >> > > > Please have a look when you are free.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Dongjin
> > >> > > >
> > >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org>
> > >> wrote:
> > >> > > >
> > >> > > > > Thanks for pointing out Ismael. It's now updated.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > Dongjin
> > >> > > > >
> > >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <ismaelj@gmail.com
> >
> > >> > wrote:
> > >> > > > >
> > >> > > > >> Thanks Dongjin. One minor suggestion: we should mention that
> > the
> > >> > > broker
> > >> > > > >> side configs are also topic configs (i.e. can be set for a
> > given
> > >> > > topic).
> > >> > > > >>
> > >> > > > >> Ismael
> > >> > > > >>
> > >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <
> dongjin@apache.org
> > >> > wrote:
> > >> > > > >>
> > >> > > > >> > Happy new year.
> > >> > > > >> >
> > >> > > > >> > I just updated the title and contents of KIP and Jira
> issue,
> > >> with
> > >> > > > >> updated
> > >> > > > >> > draft implementation. Now both of compression level and
> > buffer
> > >> > size
> > >> > > > >> options
> > >> > > > >> > are available to producer and broker configuration. You can
> > >> check
> > >> > > the
> > >> > > > >> > updated KIP from modified URL:
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > >> > > > >> >
> > >> > > > >> > Please have a look when you are free.
> > >> > > > >> >
> > >> > > > >> > Thanks,
> > >> > > > >> > Dongjin
> > >> > > > >> >
> > >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> > ismaelj@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > > >> >
> > >> > > > >> > > The updated title sounds fine to me.
> > >> > > > >> > >
> > >> > > > >> > > Ismael
> > >> > > > >> > >
> > >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> > dongjin@apache.org
> > >> > > wrote:
> > >> > > > >> > >
> > >> > > > >> > > > Hi Ismael,
> > >> > > > >> > > >
> > >> > > > >> > > > Got it. Your direction is perfectly reasonable. I am
> now
> > >> > > updating
> > >> > > > >> the
> > >> > > > >> > KIP
> > >> > > > >> > > > document and the implementation.
> > >> > > > >> > > >
> > >> > > > >> > > > By allowing the buffer/block size to be configurable,
> it
> > >> would
> > >> > > be
> > >> > > > >> > better
> > >> > > > >> > > to
> > >> > > > >> > > > update the title of the KIP like 'Allow fine-grained
> > >> > > configuration
> > >> > > > >> for
> > >> > > > >> > > > compression'. Is that right?
> > >> > > > >> > > >
> > >> > > > >> > > > @Other committers:
> > >> > > > >> > > >
> > >> > > > >> > > > Is there any other opinion on allowing the buffer/block
> > >> size
> > >> > to
> > >> > > be
> > >> > > > >> > > > configurable?
> > >> > > > >> > > >
> > >> > > > >> > > > Thanks,
> > >> > > > >> > > > Dongjin
> > >> > > > >> > > >
> > >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> > >> > ismael@juma.me.uk>
> > >> > > > >> wrote:
> > >> > > > >> > > >
> > >> > > > >> > > > > Hi Dongjin,
> > >> > > > >> > > > >
> > >> > > > >> > > > > To clarify, I mean a broker topic config with regards
> > to
> > >> > point
> > >> > > > 1.
> > >> > > > >> As
> > >> > > > >> > > you
> > >> > > > >> > > > > know, compression can be done by the producer and/or
> by
> > >> the
> > >> > > > >> broker.
> > >> > > > >> > The
> > >> > > > >> > > > > default is for the broker to just use whatever
> > >> compression
> > >> > was
> > >> > > > >> used
> > >> > > > >> > by
> > >> > > > >> > > > the
> > >> > > > >> > > > > producer, but this can be changed by the user on a
> per
> > >> topic
> > >> > > > >> basis.
> > >> > > > >> > It
> > >> > > > >> > > > > seems like it would make sense for the configs to be
> .
> > >> > > > consistent
> > >> > > > >> > > between
> > >> > > > >> > > > > producer and broker.
> > >> > > > >> > > > >
> > >> > > > >> > > > > For point 2, I haven't looked at the implementation,
> > but
> > >> we
> > >> > > > could
> > >> > > > >> do
> > >> > > > >> > it
> > >> > > > >> > > > in
> > >> > > > >> > > > > the `CompressionType` enum by invoking the right
> > >> constructor
> > >> > > or
> > >> > > > >> > > > retrieving
> > >> > > > >> > > > > the default value via a constant (if defined). That's
> > an
> > >> > > > >> > implementation
> > >> > > > >> > > > > detail and can be discussed in the PR. The more
> general
> > >> > point
> > >> > > is
> > >> > > > >> to
> > >> > > > >> > > rely
> > >> > > > >> > > > on
> > >> > > > >> > > > > the library defaults instead of choosing one
> ourselves.
> > >> > > > >> > > > >
> > >> > > > >> > > > > For point 3, I'm in favour of doing that in this KIP.
> > >> > > > >> > > > >
> > >> > > > >> > > > > Ismael
> > >> > > > >> > > > >
> > >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> > >> > > dongjin@apache.org
> > >> > > > >
> > >> > > > >> > > wrote:
> > >> > > > >> > > > >
> > >> > > > >> > > > > > Thank you Ismael, here are the answers:
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > *1. About topic config*
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > After some consideration, I concluded that topic
> > config
> > >> > > > doesn't
> > >> > > > >> > need
> > >> > > > >> > > to
> > >> > > > >> > > > > > support compression.level. Here is why: since the
> > >> > > compression
> > >> > > > is
> > >> > > > >> > > > > conducted
> > >> > > > >> > > > > > by the client, the one who can select the best
> > >> compression
> > >> > > > >> level is
> > >> > > > >> > > the
> > >> > > > >> > > > > > client itself. Let us assume that the compression
> > >> level is
> > >> > > set
> > >> > > > >> at
> > >> > > > >> > the
> > >> > > > >> > > > > topic
> > >> > > > >> > > > > > config level. In that case, there is a possibility
> > that
> > >> > the
> > >> > > > >> > > compression
> > >> > > > >> > > > > > level is not optimal for some producers. Actually,
> > >> Kafka's
> > >> > > go
> > >> > > > >> > client
> > >> > > > >> > > > also
> > >> > > > >> > > > > > supports compression level functionality for the
> > >> producer
> > >> > > > config
> > >> > > > >> > > only.
> > >> > > > >> > > > > > <
> > >> https://github.com/Shopify/sarama/blob/master/config.go>
> > >> > > > >> (wait,
> > >> > > > >> > do
> > >> > > > >> > > we
> > >> > > > >> > > > > > need
> > >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> > alternatives
> > >> > > > >> section?)
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > *2. About default level*
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > As of current draft implementation, the default
> > >> > compression
> > >> > > is
> > >> > > > >> set
> > >> > > > >> > on
> > >> > > > >> > > > the
> > >> > > > >> > > > > > CompressionType enum. Of course, changing this
> > strategy
> > >> > into
> > >> > > > >> > relying
> > >> > > > >> > > > on a
> > >> > > > >> > > > > > method from the library to pick the default
> > compression
> > >> > > level
> > >> > > > >> seems
> > >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In
> this
> > >> case,
> > >> > > we
> > >> > > > >> need
> > >> > > > >> > to
> > >> > > > >> > > > add
> > >> > > > >> > > > > > similar wrapper class for zstd and modify lz4 the
> > >> wrapper
> > >> > > > also.
> > >> > > > >> Add
> > >> > > > >> > > to
> > >> > > > >> > > > > > this, it seems like we need to explicitly state
> that
> > we
> > >> > > follow
> > >> > > > >> the
> > >> > > > >> > > > > default
> > >> > > > >> > > > > > compression level of the codec in the
> documentation.
> > Is
> > >> > this
> > >> > > > >> what
> > >> > > > >> > you
> > >> > > > >> > > > > > intended?
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > *3. Whether to allow the buffer/block size to be
> > >> > > configurable*
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > Well, As of current draft implementation, the lz4
> > >> level is
> > >> > > > >> > > implemented
> > >> > > > >> > > > as
> > >> > > > >> > > > > > block size; this is caused by my misunderstanding
> on
> > >> lz4.
> > >> > > > After
> > >> > > > >> > > > reviewing
> > >> > > > >> > > > > > lz4 today, I found that it also supports
> compression
> > >> level
> > >> > > of
> > >> > > > >> 1~16
> > >> > > > >> > > > > > (default: 1), not block size. I will fix it in this
> > >> > weekend
> > >> > > by
> > >> > > > >> > > updating
> > >> > > > >> > > > > the
> > >> > > > >> > > > > > wrapper class.
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > For the problem of the buffer/block size, I have no
> > >> strong
> > >> > > > >> opinion.
> > >> > > > >> > > If
> > >> > > > >> > > > > the
> > >> > > > >> > > > > > community needs it, I will do it all together. How
> do
> > >> you
> > >> > > > think?
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > In short, it seems like I need to update the KIP
> > >> document
> > >> > > for
> > >> > > > >> issue
> > >> > > > >> > > #1
> > >> > > > >> > > > > and
> > >> > > > >> > > > > > update the compression wrapper for issue #2, #3. Is
> > >> this
> > >> > > okay?
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > Thanks,
> > >> > > > >> > > > > > Dongjin
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > >> > > > ismaelj@gmail.com
> > >> > > > >> >
> > >> > > > >> > > > wrote:
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
> > >> questions:
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > 1. Have we considered whether we want to allow a
> > >> similar
> > >> > > > topic
> > >> > > > >> > > > config?
> > >> > > > >> > > > > > > 2. Can we rely on a method from the library to
> pick
> > >> the
> > >> > > > >> default
> > >> > > > >> > > > > > compression
> > >> > > > >> > > > > > > level if compression.level is not set? We do it
> for
> > >> gzip
> > >> > > and
> > >> > > > >> it
> > >> > > > >> > > would
> > >> > > > >> > > > > > seem
> > >> > > > >> > > > > > > reasonable to do something similar for the other
> > >> > > compression
> > >> > > > >> > > > libraries.
> > >> > > > >> > > > > > > 3. Do we want to allow the buffer/block size to
> be
> > >> > > > >> configurable?
> > >> > > > >> > > This
> > >> > > > >> > > > > has
> > >> > > > >> > > > > > > an impact on memory usage and people may want to
> > >> trade
> > >> > > > >> > compression
> > >> > > > >> > > > for
> > >> > > > >> > > > > > > less/more memory in some cases. For example, the
> > >> default
> > >> > > for
> > >> > > > >> LZ4
> > >> > > > >> > is
> > >> > > > >> > > > > 64KB
> > >> > > > >> > > > > > > which is a bit high.
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > Ismael
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > >> > > > dongjin@apache.org
> > >> > > > >> > > wrote:
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > > > > Hello dev,
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > > > I hope to initiate the discussion of KIP-390:
> Add
> > >> > > producer
> > >> > > > >> > option
> > >> > > > >> > > > to
> > >> > > > >> > > > > > > adjust
> > >> > > > >> > > > > > > > compression level
> > >> > > > >> > > > > > > > <
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >>
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > >> > > > >> > > > > > > > >.
> > >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > > > Best,
> > >> > > > >> > > > > > > > Dongjin
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > > > --
> > >> > > > >> > > > > > > > *Dongjin Lee*
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > >> > > github.com/dongjinleekr
> > >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> > >> >slideshare:
> > >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > >> > > > >> > > > > > > >
> > >> > > > >> > > > > > >
> > >> > > > >> > > > > >
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > --
> > >> > > > >> > > > > > *Dongjin Lee*
> > >> > > > >> > > > > >
> > >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > > > *github:  <http://goog_969573159/>
> > >> github.com/dongjinleekr
> > >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> > >speakerdeck:
> > >> > > > >> > > > > > speakerdeck.com/dongjin
> > >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > > > >
> > >> > > > >> > > > >
> > >> > > > >> > > >
> > >> > > > >> > > >
> > >> > > > >> > > > --
> > >> > > > >> > > > *Dongjin Lee*
> > >> > > > >> > > >
> > >> > > > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > >> > > > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > >> > > > speakerdeck.com/dongjin
> > >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >> > > >
> > >> > > > >> > >
> > >> > > > >> >
> > >> > > > >> >
> > >> > > > >> > --
> > >> > > > >> > *Dongjin Lee*
> > >> > > > >> >
> > >> > > > >> > *A hitchhiker in the mathematical world.*
> > >> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > >> > > > >> kr.linkedin.com/in/dongjinleekr
> > >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > >> > speakerdeck.com/dongjin
> > >> > > > >> > <https://speakerdeck.com/dongjin>*
> > >> > > > >> >
> > >> > > > >>
> > >> > > > >
> > >> > > > >
> > >> > > > > --
> > >> > > > > *Dongjin Lee*
> > >> > > > >
> > >> > > > > *A hitchhiker in the mathematical world.*
> > >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > speakerdeck.com/dongjin
> > >> > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Dongjin Lee*
> > >> > > >
> > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > speakerdeck.com/dongjin
> > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > *Dongjin Lee*
> > >> >
> > >> > *A hitchhiker in the mathematical world.*
> > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > <https://github.com/dongjinleekr>linkedin:
> > >> kr.linkedin.com/in/dongjinleekr
> > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > speakerdeck.com/dongjin
> > >> > <https://speakerdeck.com/dongjin>*
> > >> >
> > >>
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@juma.me.uk>.
Thanks!

Ismael

On Mon, Jan 21, 2019 at 6:02 AM Dongjin Lee <do...@apache.org> wrote:

> Hi Ismael,
>
> After reviewing `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
> yes, you are right. If source codec and target codec is identical and the
> magic is above 0, the broker can do an in-place assignment, without
> recompressing. Sorry for my misunderstanding.
>
> Since we don't need `compression.[gzip,lz4,zstd].level` and
> `compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert those
> changes and update the KIP with the recent discussions. I will complete it
> in 48 hours from now.
>
> Thanks,
> Dongjin
>
> On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org> wrote:
>
> > I see. Let me have a check. If not needed, of course, we don't have to
> > waste on configuration options.
> >
> > Since the KIP deadline is imminent, I just opened the voting thread.
> Let's
> > continue the discussion here.
> >
> > Best,
> > Dongjin
> >
> > On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> >> Hi Dongjin,
> >>
> >> When the compression type is "producer", then the broker doesn't
> >> recompress
> >> though. Thinking about it some more, there are some uncommon cases where
> >> recompression does happen (the old (and hopefully hardly used by now)
> >> message format == 0 and some edge cases), so it is a good point you
> >> raised.
> >>
> >> It's a bit unfortunate to add so many topic configs for cases that
> >> probably
> >> don't matter. That is, if you are using "producer" compression, you
> >> probably don't need to configure these settings and can live with the
> >> defaults. Perhaps we should only support the topic config for the cases
> >> where you are actually recompressing in the broker.
> >>
> >> What do you think? I'd be interested in other people's thoughts too.
> >>
> >> Ismael
> >>
> >> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org> wrote:
> >>
> >> > Hi Ismael,
> >> >
> >> > It seems like it needs more explanation. Here is the detailed
> reasoning.
> >> >
> >> > You know, topic and broker's 'compression.type' allows 'uncompressed',
> >> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And
> >> this
> >> > configuration is used by the broker in the re-compressing process
> after
> >> > offset assignment. After this feature, the new configs,
> >> 'compression.level'
> >> > and 'compression.buffer.size', also will be used in this process.
> >> >
> >> > The problem arises when given topic's compression type (whether it was
> >> > inherited from broker's configuration or explicitly set) is
> 'producer.'
> >> > With this setting, the compression codec to be used is decided by the
> >> > producer client. Since there is no way to restore the compression
> level
> >> and
> >> > buffer size from the message, we can take the following strategies:
> >> >
> >> > 1. Just use given 'compression.level' and 'compression.buffer.size'
> >> > settings.
> >> >
> >> > It will cause so many errors. Let's imagine the case of topic's
> >> > configuration is { compression.type=producer, compression.level=10,
> >> > compression.buffer.size=8192 }. In this case, all producers with gzip
> or
> >> > lz4 compressed messages will result in an error. (gzip doesn't allow
> >> > compression level 10, and lz4 also for a buffer size of 8192.)
> >> >
> >> > 2. Extend the message format to include compression configurations.
> >> >
> >> > With this strategy, we need to change the message format - it's a too
> >> big
> >> > change.
> >> >
> >> > 3. If topic's compression.type is 'producer', use the default
> >> configuration
> >> > for the given codec.
> >> >
> >> > With this strategy, allowing fine-grained compression configuration is
> >> > meaningless.
> >> >
> >> > For the above reasons, I think the only alternative is providing
> options
> >> > that can be used when the topic's 'compression.type' is 'producer.' In
> >> > other words, adding compression.[gzip, lz4, zstd].level and
> >> > compression.[gzip.snappy.lz4].buffer.size options - and it is what I
> >> did in
> >> > the last modification.
> >> >
> >> > (wait, the reasoning above should be included in the KIP in the
> rejected
> >> > alternatives section, isn't it?)
> >> >
> >> > Thanks,
> >> > Dongjin
> >> >
> >> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> >> >
> >> > > Hi Dongjin,
> >> > >
> >> > > For topic level, you can only have a single compression type so the
> >> way
> >> > it
> >> > > was before was fine, right? The point you raise is how to set broker
> >> > > defaults that vary depending on the compression type, correct?
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org>
> >> wrote:
> >> > >
> >> > > > I just realized that there was a missing hole in the KIP, so I
> fixed
> >> > it.
> >> > > > The draft implementation will be updated soon.
> >> > > >
> >> > > > In short, the proposed change did not regard the case of the topic
> >> or
> >> > > > broker's 'compression.type' is 'producer'; in this case, the
> broker
> >> has
> >> > > to
> >> > > > handle all kinds of the supported codec. So I added additional
> >> options
> >> > > > (compression.[gzip,snappy,lz4, zstd].level,
> >> > compression.[gzip,snappy,lz4,
> >> > > > zstd].buffer.size) with handling routines.
> >> > > >
> >> > > > Please have a look when you are free.
> >> > > >
> >> > > > Thanks,
> >> > > > Dongjin
> >> > > >
> >> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org>
> >> wrote:
> >> > > >
> >> > > > > Thanks for pointing out Ismael. It's now updated.
> >> > > > >
> >> > > > > Best,
> >> > > > > Dongjin
> >> > > > >
> >> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > >> Thanks Dongjin. One minor suggestion: we should mention that
> the
> >> > > broker
> >> > > > >> side configs are also topic configs (i.e. can be set for a
> given
> >> > > topic).
> >> > > > >>
> >> > > > >> Ismael
> >> > > > >>
> >> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org
> >> > wrote:
> >> > > > >>
> >> > > > >> > Happy new year.
> >> > > > >> >
> >> > > > >> > I just updated the title and contents of KIP and Jira issue,
> >> with
> >> > > > >> updated
> >> > > > >> > draft implementation. Now both of compression level and
> buffer
> >> > size
> >> > > > >> options
> >> > > > >> > are available to producer and broker configuration. You can
> >> check
> >> > > the
> >> > > > >> > updated KIP from modified URL:
> >> > > > >> >
> >> > > > >> >
> >> > > > >> >
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> >> > > > >> >
> >> > > > >> > Please have a look when you are free.
> >> > > > >> >
> >> > > > >> > Thanks,
> >> > > > >> > Dongjin
> >> > > > >> >
> >> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <
> ismaelj@gmail.com
> >> >
> >> > > > wrote:
> >> > > > >> >
> >> > > > >> > > The updated title sounds fine to me.
> >> > > > >> > >
> >> > > > >> > > Ismael
> >> > > > >> > >
> >> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <
> dongjin@apache.org
> >> > > wrote:
> >> > > > >> > >
> >> > > > >> > > > Hi Ismael,
> >> > > > >> > > >
> >> > > > >> > > > Got it. Your direction is perfectly reasonable. I am now
> >> > > updating
> >> > > > >> the
> >> > > > >> > KIP
> >> > > > >> > > > document and the implementation.
> >> > > > >> > > >
> >> > > > >> > > > By allowing the buffer/block size to be configurable, it
> >> would
> >> > > be
> >> > > > >> > better
> >> > > > >> > > to
> >> > > > >> > > > update the title of the KIP like 'Allow fine-grained
> >> > > configuration
> >> > > > >> for
> >> > > > >> > > > compression'. Is that right?
> >> > > > >> > > >
> >> > > > >> > > > @Other committers:
> >> > > > >> > > >
> >> > > > >> > > > Is there any other opinion on allowing the buffer/block
> >> size
> >> > to
> >> > > be
> >> > > > >> > > > configurable?
> >> > > > >> > > >
> >> > > > >> > > > Thanks,
> >> > > > >> > > > Dongjin
> >> > > > >> > > >
> >> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> >> > ismael@juma.me.uk>
> >> > > > >> wrote:
> >> > > > >> > > >
> >> > > > >> > > > > Hi Dongjin,
> >> > > > >> > > > >
> >> > > > >> > > > > To clarify, I mean a broker topic config with regards
> to
> >> > point
> >> > > > 1.
> >> > > > >> As
> >> > > > >> > > you
> >> > > > >> > > > > know, compression can be done by the producer and/or by
> >> the
> >> > > > >> broker.
> >> > > > >> > The
> >> > > > >> > > > > default is for the broker to just use whatever
> >> compression
> >> > was
> >> > > > >> used
> >> > > > >> > by
> >> > > > >> > > > the
> >> > > > >> > > > > producer, but this can be changed by the user on a per
> >> topic
> >> > > > >> basis.
> >> > > > >> > It
> >> > > > >> > > > > seems like it would make sense for the configs to be .
> >> > > > consistent
> >> > > > >> > > between
> >> > > > >> > > > > producer and broker.
> >> > > > >> > > > >
> >> > > > >> > > > > For point 2, I haven't looked at the implementation,
> but
> >> we
> >> > > > could
> >> > > > >> do
> >> > > > >> > it
> >> > > > >> > > > in
> >> > > > >> > > > > the `CompressionType` enum by invoking the right
> >> constructor
> >> > > or
> >> > > > >> > > > retrieving
> >> > > > >> > > > > the default value via a constant (if defined). That's
> an
> >> > > > >> > implementation
> >> > > > >> > > > > detail and can be discussed in the PR. The more general
> >> > point
> >> > > is
> >> > > > >> to
> >> > > > >> > > rely
> >> > > > >> > > > on
> >> > > > >> > > > > the library defaults instead of choosing one ourselves.
> >> > > > >> > > > >
> >> > > > >> > > > > For point 3, I'm in favour of doing that in this KIP.
> >> > > > >> > > > >
> >> > > > >> > > > > Ismael
> >> > > > >> > > > >
> >> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> >> > > dongjin@apache.org
> >> > > > >
> >> > > > >> > > wrote:
> >> > > > >> > > > >
> >> > > > >> > > > > > Thank you Ismael, here are the answers:
> >> > > > >> > > > > >
> >> > > > >> > > > > > *1. About topic config*
> >> > > > >> > > > > >
> >> > > > >> > > > > > After some consideration, I concluded that topic
> config
> >> > > > doesn't
> >> > > > >> > need
> >> > > > >> > > to
> >> > > > >> > > > > > support compression.level. Here is why: since the
> >> > > compression
> >> > > > is
> >> > > > >> > > > > conducted
> >> > > > >> > > > > > by the client, the one who can select the best
> >> compression
> >> > > > >> level is
> >> > > > >> > > the
> >> > > > >> > > > > > client itself. Let us assume that the compression
> >> level is
> >> > > set
> >> > > > >> at
> >> > > > >> > the
> >> > > > >> > > > > topic
> >> > > > >> > > > > > config level. In that case, there is a possibility
> that
> >> > the
> >> > > > >> > > compression
> >> > > > >> > > > > > level is not optimal for some producers. Actually,
> >> Kafka's
> >> > > go
> >> > > > >> > client
> >> > > > >> > > > also
> >> > > > >> > > > > > supports compression level functionality for the
> >> producer
> >> > > > config
> >> > > > >> > > only.
> >> > > > >> > > > > > <
> >> https://github.com/Shopify/sarama/blob/master/config.go>
> >> > > > >> (wait,
> >> > > > >> > do
> >> > > > >> > > we
> >> > > > >> > > > > > need
> >> > > > >> > > > > > to add this reasoning in the KIP, rejected
> alternatives
> >> > > > >> section?)
> >> > > > >> > > > > >
> >> > > > >> > > > > > *2. About default level*
> >> > > > >> > > > > >
> >> > > > >> > > > > > As of current draft implementation, the default
> >> > compression
> >> > > is
> >> > > > >> set
> >> > > > >> > on
> >> > > > >> > > > the
> >> > > > >> > > > > > CompressionType enum. Of course, changing this
> strategy
> >> > into
> >> > > > >> > relying
> >> > > > >> > > > on a
> >> > > > >> > > > > > method from the library to pick the default
> compression
> >> > > level
> >> > > > >> seems
> >> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In this
> >> case,
> >> > > we
> >> > > > >> need
> >> > > > >> > to
> >> > > > >> > > > add
> >> > > > >> > > > > > similar wrapper class for zstd and modify lz4 the
> >> wrapper
> >> > > > also.
> >> > > > >> Add
> >> > > > >> > > to
> >> > > > >> > > > > > this, it seems like we need to explicitly state that
> we
> >> > > follow
> >> > > > >> the
> >> > > > >> > > > > default
> >> > > > >> > > > > > compression level of the codec in the documentation.
> Is
> >> > this
> >> > > > >> what
> >> > > > >> > you
> >> > > > >> > > > > > intended?
> >> > > > >> > > > > >
> >> > > > >> > > > > > *3. Whether to allow the buffer/block size to be
> >> > > configurable*
> >> > > > >> > > > > >
> >> > > > >> > > > > > Well, As of current draft implementation, the lz4
> >> level is
> >> > > > >> > > implemented
> >> > > > >> > > > as
> >> > > > >> > > > > > block size; this is caused by my misunderstanding on
> >> lz4.
> >> > > > After
> >> > > > >> > > > reviewing
> >> > > > >> > > > > > lz4 today, I found that it also supports compression
> >> level
> >> > > of
> >> > > > >> 1~16
> >> > > > >> > > > > > (default: 1), not block size. I will fix it in this
> >> > weekend
> >> > > by
> >> > > > >> > > updating
> >> > > > >> > > > > the
> >> > > > >> > > > > > wrapper class.
> >> > > > >> > > > > >
> >> > > > >> > > > > > For the problem of the buffer/block size, I have no
> >> strong
> >> > > > >> opinion.
> >> > > > >> > > If
> >> > > > >> > > > > the
> >> > > > >> > > > > > community needs it, I will do it all together. How do
> >> you
> >> > > > think?
> >> > > > >> > > > > >
> >> > > > >> > > > > > In short, it seems like I need to update the KIP
> >> document
> >> > > for
> >> > > > >> issue
> >> > > > >> > > #1
> >> > > > >> > > > > and
> >> > > > >> > > > > > update the compression wrapper for issue #2, #3. Is
> >> this
> >> > > okay?
> >> > > > >> > > > > >
> >> > > > >> > > > > > Thanks,
> >> > > > >> > > > > > Dongjin
> >> > > > >> > > > > >
> >> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> >> > > > ismaelj@gmail.com
> >> > > > >> >
> >> > > > >> > > > wrote:
> >> > > > >> > > > > >
> >> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
> >> questions:
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > 1. Have we considered whether we want to allow a
> >> similar
> >> > > > topic
> >> > > > >> > > > config?
> >> > > > >> > > > > > > 2. Can we rely on a method from the library to pick
> >> the
> >> > > > >> default
> >> > > > >> > > > > > compression
> >> > > > >> > > > > > > level if compression.level is not set? We do it for
> >> gzip
> >> > > and
> >> > > > >> it
> >> > > > >> > > would
> >> > > > >> > > > > > seem
> >> > > > >> > > > > > > reasonable to do something similar for the other
> >> > > compression
> >> > > > >> > > > libraries.
> >> > > > >> > > > > > > 3. Do we want to allow the buffer/block size to be
> >> > > > >> configurable?
> >> > > > >> > > This
> >> > > > >> > > > > has
> >> > > > >> > > > > > > an impact on memory usage and people may want to
> >> trade
> >> > > > >> > compression
> >> > > > >> > > > for
> >> > > > >> > > > > > > less/more memory in some cases. For example, the
> >> default
> >> > > for
> >> > > > >> LZ4
> >> > > > >> > is
> >> > > > >> > > > > 64KB
> >> > > > >> > > > > > > which is a bit high.
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > Ismael
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> >> > > > dongjin@apache.org
> >> > > > >> > > wrote:
> >> > > > >> > > > > > >
> >> > > > >> > > > > > > > Hello dev,
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > > > I hope to initiate the discussion of KIP-390: Add
> >> > > producer
> >> > > > >> > option
> >> > > > >> > > > to
> >> > > > >> > > > > > > adjust
> >> > > > >> > > > > > > > compression level
> >> > > > >> > > > > > > > <
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > >
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >>
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> >> > > > >> > > > > > > > >.
> >> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > > > Best,
> >> > > > >> > > > > > > > Dongjin
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > > > --
> >> > > > >> > > > > > > > *Dongjin Lee*
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> >> > > github.com/dongjinleekr
> >> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
> >> >slideshare:
> >> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> >> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> >> > > > >> > > > > > > >
> >> > > > >> > > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > > >
> >> > > > >> > > > > > --
> >> > > > >> > > > > > *Dongjin Lee*
> >> > > > >> > > > > >
> >> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > > > *github:  <http://goog_969573159/>
> >> github.com/dongjinleekr
> >> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr
> >speakerdeck:
> >> > > > >> > > > > > speakerdeck.com/dongjin
> >> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > > > >
> >> > > > >> > > > >
> >> > > > >> > > >
> >> > > > >> > > >
> >> > > > >> > > > --
> >> > > > >> > > > *Dongjin Lee*
> >> > > > >> > > >
> >> > > > >> > > > *A hitchhiker in the mathematical world.*
> >> > > > >> > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> >> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > > > speakerdeck.com/dongjin
> >> > > > >> > > > <https://speakerdeck.com/dongjin>*
> >> > > > >> > > >
> >> > > > >> > >
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > --
> >> > > > >> > *Dongjin Lee*
> >> > > > >> >
> >> > > > >> > *A hitchhiker in the mathematical world.*
> >> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > >> > <https://github.com/dongjinleekr>linkedin:
> >> > > > >> kr.linkedin.com/in/dongjinleekr
> >> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > >> > speakerdeck.com/dongjin
> >> > > > >> > <https://speakerdeck.com/dongjin>*
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > *Dongjin Lee*
> >> > > > >
> >> > > > > *A hitchhiker in the mathematical world.*
> >> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > kr.linkedin.com/in/dongjinleekr
> >> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > speakerdeck.com/dongjin
> >> > > > > <https://speakerdeck.com/dongjin>*
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Dongjin Lee*
> >> > > >
> >> > > > *A hitchhiker in the mathematical world.*
> >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > speakerdeck.com/dongjin
> >> > > > <https://speakerdeck.com/dongjin>*
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > *Dongjin Lee*
> >> >
> >> > *A hitchhiker in the mathematical world.*
> >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > <https://github.com/dongjinleekr>linkedin:
> >> kr.linkedin.com/in/dongjinleekr
> >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > speakerdeck.com/dongjin
> >> > <https://speakerdeck.com/dongjin>*
> >> >
> >>
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hi Ismael,

After reviewing `LogValidator#validateMessagesAndAssignOffsetsCompressed`,
yes, you are right. If source codec and target codec is identical and the
magic is above 0, the broker can do an in-place assignment, without
recompressing. Sorry for my misunderstanding.

Since we don't need `compression.[gzip,lz4,zstd].level` and
`compression.[gzip,snappy,lz4].buffer.size` anymore, I will revert those
changes and update the KIP with the recent discussions. I will complete it
in 48 hours from now.

Thanks,
Dongjin

On Mon, Jan 21, 2019 at 4:18 PM Dongjin Lee <do...@apache.org> wrote:

> I see. Let me have a check. If not needed, of course, we don't have to
> waste on configuration options.
>
> Since the KIP deadline is imminent, I just opened the voting thread. Let's
> continue the discussion here.
>
> Best,
> Dongjin
>
> On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk> wrote:
>
>> Hi Dongjin,
>>
>> When the compression type is "producer", then the broker doesn't
>> recompress
>> though. Thinking about it some more, there are some uncommon cases where
>> recompression does happen (the old (and hopefully hardly used by now)
>> message format == 0 and some edge cases), so it is a good point you
>> raised.
>>
>> It's a bit unfortunate to add so many topic configs for cases that
>> probably
>> don't matter. That is, if you are using "producer" compression, you
>> probably don't need to configure these settings and can live with the
>> defaults. Perhaps we should only support the topic config for the cases
>> where you are actually recompressing in the broker.
>>
>> What do you think? I'd be interested in other people's thoughts too.
>>
>> Ismael
>>
>> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org> wrote:
>>
>> > Hi Ismael,
>> >
>> > It seems like it needs more explanation. Here is the detailed reasoning.
>> >
>> > You know, topic and broker's 'compression.type' allows 'uncompressed',
>> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And
>> this
>> > configuration is used by the broker in the re-compressing process after
>> > offset assignment. After this feature, the new configs,
>> 'compression.level'
>> > and 'compression.buffer.size', also will be used in this process.
>> >
>> > The problem arises when given topic's compression type (whether it was
>> > inherited from broker's configuration or explicitly set) is 'producer.'
>> > With this setting, the compression codec to be used is decided by the
>> > producer client. Since there is no way to restore the compression level
>> and
>> > buffer size from the message, we can take the following strategies:
>> >
>> > 1. Just use given 'compression.level' and 'compression.buffer.size'
>> > settings.
>> >
>> > It will cause so many errors. Let's imagine the case of topic's
>> > configuration is { compression.type=producer, compression.level=10,
>> > compression.buffer.size=8192 }. In this case, all producers with gzip or
>> > lz4 compressed messages will result in an error. (gzip doesn't allow
>> > compression level 10, and lz4 also for a buffer size of 8192.)
>> >
>> > 2. Extend the message format to include compression configurations.
>> >
>> > With this strategy, we need to change the message format - it's a too
>> big
>> > change.
>> >
>> > 3. If topic's compression.type is 'producer', use the default
>> configuration
>> > for the given codec.
>> >
>> > With this strategy, allowing fine-grained compression configuration is
>> > meaningless.
>> >
>> > For the above reasons, I think the only alternative is providing options
>> > that can be used when the topic's 'compression.type' is 'producer.' In
>> > other words, adding compression.[gzip, lz4, zstd].level and
>> > compression.[gzip.snappy.lz4].buffer.size options - and it is what I
>> did in
>> > the last modification.
>> >
>> > (wait, the reasoning above should be included in the KIP in the rejected
>> > alternatives section, isn't it?)
>> >
>> > Thanks,
>> > Dongjin
>> >
>> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk> wrote:
>> >
>> > > Hi Dongjin,
>> > >
>> > > For topic level, you can only have a single compression type so the
>> way
>> > it
>> > > was before was fine, right? The point you raise is how to set broker
>> > > defaults that vary depending on the compression type, correct?
>> > >
>> > > Ismael
>> > >
>> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org>
>> wrote:
>> > >
>> > > > I just realized that there was a missing hole in the KIP, so I fixed
>> > it.
>> > > > The draft implementation will be updated soon.
>> > > >
>> > > > In short, the proposed change did not regard the case of the topic
>> or
>> > > > broker's 'compression.type' is 'producer'; in this case, the broker
>> has
>> > > to
>> > > > handle all kinds of the supported codec. So I added additional
>> options
>> > > > (compression.[gzip,snappy,lz4, zstd].level,
>> > compression.[gzip,snappy,lz4,
>> > > > zstd].buffer.size) with handling routines.
>> > > >
>> > > > Please have a look when you are free.
>> > > >
>> > > > Thanks,
>> > > > Dongjin
>> > > >
>> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org>
>> wrote:
>> > > >
>> > > > > Thanks for pointing out Ismael. It's now updated.
>> > > > >
>> > > > > Best,
>> > > > > Dongjin
>> > > > >
>> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com>
>> > wrote:
>> > > > >
>> > > > >> Thanks Dongjin. One minor suggestion: we should mention that the
>> > > broker
>> > > > >> side configs are also topic configs (i.e. can be set for a given
>> > > topic).
>> > > > >>
>> > > > >> Ismael
>> > > > >>
>> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org
>> > wrote:
>> > > > >>
>> > > > >> > Happy new year.
>> > > > >> >
>> > > > >> > I just updated the title and contents of KIP and Jira issue,
>> with
>> > > > >> updated
>> > > > >> > draft implementation. Now both of compression level and buffer
>> > size
>> > > > >> options
>> > > > >> > are available to producer and broker configuration. You can
>> check
>> > > the
>> > > > >> > updated KIP from modified URL:
>> > > > >> >
>> > > > >> >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
>> > > > >> >
>> > > > >> > Please have a look when you are free.
>> > > > >> >
>> > > > >> > Thanks,
>> > > > >> > Dongjin
>> > > > >> >
>> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <ismaelj@gmail.com
>> >
>> > > > wrote:
>> > > > >> >
>> > > > >> > > The updated title sounds fine to me.
>> > > > >> > >
>> > > > >> > > Ismael
>> > > > >> > >
>> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org
>> > > wrote:
>> > > > >> > >
>> > > > >> > > > Hi Ismael,
>> > > > >> > > >
>> > > > >> > > > Got it. Your direction is perfectly reasonable. I am now
>> > > updating
>> > > > >> the
>> > > > >> > KIP
>> > > > >> > > > document and the implementation.
>> > > > >> > > >
>> > > > >> > > > By allowing the buffer/block size to be configurable, it
>> would
>> > > be
>> > > > >> > better
>> > > > >> > > to
>> > > > >> > > > update the title of the KIP like 'Allow fine-grained
>> > > configuration
>> > > > >> for
>> > > > >> > > > compression'. Is that right?
>> > > > >> > > >
>> > > > >> > > > @Other committers:
>> > > > >> > > >
>> > > > >> > > > Is there any other opinion on allowing the buffer/block
>> size
>> > to
>> > > be
>> > > > >> > > > configurable?
>> > > > >> > > >
>> > > > >> > > > Thanks,
>> > > > >> > > > Dongjin
>> > > > >> > > >
>> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
>> > ismael@juma.me.uk>
>> > > > >> wrote:
>> > > > >> > > >
>> > > > >> > > > > Hi Dongjin,
>> > > > >> > > > >
>> > > > >> > > > > To clarify, I mean a broker topic config with regards to
>> > point
>> > > > 1.
>> > > > >> As
>> > > > >> > > you
>> > > > >> > > > > know, compression can be done by the producer and/or by
>> the
>> > > > >> broker.
>> > > > >> > The
>> > > > >> > > > > default is for the broker to just use whatever
>> compression
>> > was
>> > > > >> used
>> > > > >> > by
>> > > > >> > > > the
>> > > > >> > > > > producer, but this can be changed by the user on a per
>> topic
>> > > > >> basis.
>> > > > >> > It
>> > > > >> > > > > seems like it would make sense for the configs to be .
>> > > > consistent
>> > > > >> > > between
>> > > > >> > > > > producer and broker.
>> > > > >> > > > >
>> > > > >> > > > > For point 2, I haven't looked at the implementation, but
>> we
>> > > > could
>> > > > >> do
>> > > > >> > it
>> > > > >> > > > in
>> > > > >> > > > > the `CompressionType` enum by invoking the right
>> constructor
>> > > or
>> > > > >> > > > retrieving
>> > > > >> > > > > the default value via a constant (if defined). That's an
>> > > > >> > implementation
>> > > > >> > > > > detail and can be discussed in the PR. The more general
>> > point
>> > > is
>> > > > >> to
>> > > > >> > > rely
>> > > > >> > > > on
>> > > > >> > > > > the library defaults instead of choosing one ourselves.
>> > > > >> > > > >
>> > > > >> > > > > For point 3, I'm in favour of doing that in this KIP.
>> > > > >> > > > >
>> > > > >> > > > > Ismael
>> > > > >> > > > >
>> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
>> > > dongjin@apache.org
>> > > > >
>> > > > >> > > wrote:
>> > > > >> > > > >
>> > > > >> > > > > > Thank you Ismael, here are the answers:
>> > > > >> > > > > >
>> > > > >> > > > > > *1. About topic config*
>> > > > >> > > > > >
>> > > > >> > > > > > After some consideration, I concluded that topic config
>> > > > doesn't
>> > > > >> > need
>> > > > >> > > to
>> > > > >> > > > > > support compression.level. Here is why: since the
>> > > compression
>> > > > is
>> > > > >> > > > > conducted
>> > > > >> > > > > > by the client, the one who can select the best
>> compression
>> > > > >> level is
>> > > > >> > > the
>> > > > >> > > > > > client itself. Let us assume that the compression
>> level is
>> > > set
>> > > > >> at
>> > > > >> > the
>> > > > >> > > > > topic
>> > > > >> > > > > > config level. In that case, there is a possibility that
>> > the
>> > > > >> > > compression
>> > > > >> > > > > > level is not optimal for some producers. Actually,
>> Kafka's
>> > > go
>> > > > >> > client
>> > > > >> > > > also
>> > > > >> > > > > > supports compression level functionality for the
>> producer
>> > > > config
>> > > > >> > > only.
>> > > > >> > > > > > <
>> https://github.com/Shopify/sarama/blob/master/config.go>
>> > > > >> (wait,
>> > > > >> > do
>> > > > >> > > we
>> > > > >> > > > > > need
>> > > > >> > > > > > to add this reasoning in the KIP, rejected alternatives
>> > > > >> section?)
>> > > > >> > > > > >
>> > > > >> > > > > > *2. About default level*
>> > > > >> > > > > >
>> > > > >> > > > > > As of current draft implementation, the default
>> > compression
>> > > is
>> > > > >> set
>> > > > >> > on
>> > > > >> > > > the
>> > > > >> > > > > > CompressionType enum. Of course, changing this strategy
>> > into
>> > > > >> > relying
>> > > > >> > > > on a
>> > > > >> > > > > > method from the library to pick the default compression
>> > > level
>> > > > >> seems
>> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In this
>> case,
>> > > we
>> > > > >> need
>> > > > >> > to
>> > > > >> > > > add
>> > > > >> > > > > > similar wrapper class for zstd and modify lz4 the
>> wrapper
>> > > > also.
>> > > > >> Add
>> > > > >> > > to
>> > > > >> > > > > > this, it seems like we need to explicitly state that we
>> > > follow
>> > > > >> the
>> > > > >> > > > > default
>> > > > >> > > > > > compression level of the codec in the documentation. Is
>> > this
>> > > > >> what
>> > > > >> > you
>> > > > >> > > > > > intended?
>> > > > >> > > > > >
>> > > > >> > > > > > *3. Whether to allow the buffer/block size to be
>> > > configurable*
>> > > > >> > > > > >
>> > > > >> > > > > > Well, As of current draft implementation, the lz4
>> level is
>> > > > >> > > implemented
>> > > > >> > > > as
>> > > > >> > > > > > block size; this is caused by my misunderstanding on
>> lz4.
>> > > > After
>> > > > >> > > > reviewing
>> > > > >> > > > > > lz4 today, I found that it also supports compression
>> level
>> > > of
>> > > > >> 1~16
>> > > > >> > > > > > (default: 1), not block size. I will fix it in this
>> > weekend
>> > > by
>> > > > >> > > updating
>> > > > >> > > > > the
>> > > > >> > > > > > wrapper class.
>> > > > >> > > > > >
>> > > > >> > > > > > For the problem of the buffer/block size, I have no
>> strong
>> > > > >> opinion.
>> > > > >> > > If
>> > > > >> > > > > the
>> > > > >> > > > > > community needs it, I will do it all together. How do
>> you
>> > > > think?
>> > > > >> > > > > >
>> > > > >> > > > > > In short, it seems like I need to update the KIP
>> document
>> > > for
>> > > > >> issue
>> > > > >> > > #1
>> > > > >> > > > > and
>> > > > >> > > > > > update the compression wrapper for issue #2, #3. Is
>> this
>> > > okay?
>> > > > >> > > > > >
>> > > > >> > > > > > Thanks,
>> > > > >> > > > > > Dongjin
>> > > > >> > > > > >
>> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
>> > > > ismaelj@gmail.com
>> > > > >> >
>> > > > >> > > > wrote:
>> > > > >> > > > > >
>> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few
>> questions:
>> > > > >> > > > > > >
>> > > > >> > > > > > > 1. Have we considered whether we want to allow a
>> similar
>> > > > topic
>> > > > >> > > > config?
>> > > > >> > > > > > > 2. Can we rely on a method from the library to pick
>> the
>> > > > >> default
>> > > > >> > > > > > compression
>> > > > >> > > > > > > level if compression.level is not set? We do it for
>> gzip
>> > > and
>> > > > >> it
>> > > > >> > > would
>> > > > >> > > > > > seem
>> > > > >> > > > > > > reasonable to do something similar for the other
>> > > compression
>> > > > >> > > > libraries.
>> > > > >> > > > > > > 3. Do we want to allow the buffer/block size to be
>> > > > >> configurable?
>> > > > >> > > This
>> > > > >> > > > > has
>> > > > >> > > > > > > an impact on memory usage and people may want to
>> trade
>> > > > >> > compression
>> > > > >> > > > for
>> > > > >> > > > > > > less/more memory in some cases. For example, the
>> default
>> > > for
>> > > > >> LZ4
>> > > > >> > is
>> > > > >> > > > > 64KB
>> > > > >> > > > > > > which is a bit high.
>> > > > >> > > > > > >
>> > > > >> > > > > > > Ismael
>> > > > >> > > > > > >
>> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
>> > > > dongjin@apache.org
>> > > > >> > > wrote:
>> > > > >> > > > > > >
>> > > > >> > > > > > > > Hello dev,
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > I hope to initiate the discussion of KIP-390: Add
>> > > producer
>> > > > >> > option
>> > > > >> > > > to
>> > > > >> > > > > > > adjust
>> > > > >> > > > > > > > compression level
>> > > > >> > > > > > > > <
>> > > > >> > > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
>> > > > >> > > > > > > > >.
>> > > > >> > > > > > > > All feedbacks will be highly appreciated.
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > Best,
>> > > > >> > > > > > > > Dongjin
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > --
>> > > > >> > > > > > > > *Dongjin Lee*
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > > > > >
>> > > > >> > > > > > > > *github:  <http://goog_969573159/>
>> > > github.com/dongjinleekr
>> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr
>> >slideshare:
>> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
>> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > >> > > > > > > >
>> > > > >> > > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > >
>> > > > >> > > > > > --
>> > > > >> > > > > > *Dongjin Lee*
>> > > > >> > > > > >
>> > > > >> > > > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > > > *github:  <http://goog_969573159/>
>> github.com/dongjinleekr
>> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > > > > > speakerdeck.com/dongjin
>> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > > > >
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > > >
>> > > > >> > > > --
>> > > > >> > > > *Dongjin Lee*
>> > > > >> > > >
>> > > > >> > > > *A hitchhiker in the mathematical world.*
>> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
>> > > > >> > > kr.linkedin.com/in/dongjinleekr
>> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > > > speakerdeck.com/dongjin
>> > > > >> > > > <https://speakerdeck.com/dongjin>*
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >> >
>> > > > >> > --
>> > > > >> > *Dongjin Lee*
>> > > > >> >
>> > > > >> > *A hitchhiker in the mathematical world.*
>> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > >> > <https://github.com/dongjinleekr>linkedin:
>> > > > >> kr.linkedin.com/in/dongjinleekr
>> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > >> > speakerdeck.com/dongjin
>> > > > >> > <https://speakerdeck.com/dongjin>*
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > *Dongjin Lee*
>> > > > >
>> > > > > *A hitchhiker in the mathematical world.*
>> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > kr.linkedin.com/in/dongjinleekr
>> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > speakerdeck.com/dongjin
>> > > > > <https://speakerdeck.com/dongjin>*
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > *Dongjin Lee*
>> > > >
>> > > > *A hitchhiker in the mathematical world.*
>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > <https://github.com/dongjinleekr>linkedin:
>> > > kr.linkedin.com/in/dongjinleekr
>> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > speakerdeck.com/dongjin
>> > > > <https://speakerdeck.com/dongjin>*
>> > > >
>> > >
>> >
>> >
>> > --
>> > *Dongjin Lee*
>> >
>> > *A hitchhiker in the mathematical world.*
>> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > <https://github.com/dongjinleekr>linkedin:
>> kr.linkedin.com/in/dongjinleekr
>> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > speakerdeck.com/dongjin
>> > <https://speakerdeck.com/dongjin>*
>> >
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
I see. Let me have a check. If not needed, of course, we don't have to
waste on configuration options.

Since the KIP deadline is imminent, I just opened the voting thread. Let's
continue the discussion here.

Best,
Dongjin

On Mon, Jan 21, 2019 at 1:30 AM Ismael Juma <is...@juma.me.uk> wrote:

> Hi Dongjin,
>
> When the compression type is "producer", then the broker doesn't recompress
> though. Thinking about it some more, there are some uncommon cases where
> recompression does happen (the old (and hopefully hardly used by now)
> message format == 0 and some edge cases), so it is a good point you raised.
>
> It's a bit unfortunate to add so many topic configs for cases that probably
> don't matter. That is, if you are using "producer" compression, you
> probably don't need to configure these settings and can live with the
> defaults. Perhaps we should only support the topic config for the cases
> where you are actually recompressing in the broker.
>
> What do you think? I'd be interested in other people's thoughts too.
>
> Ismael
>
> On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Hi Ismael,
> >
> > It seems like it needs more explanation. Here is the detailed reasoning.
> >
> > You know, topic and broker's 'compression.type' allows 'uncompressed',
> > 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And this
> > configuration is used by the broker in the re-compressing process after
> > offset assignment. After this feature, the new configs,
> 'compression.level'
> > and 'compression.buffer.size', also will be used in this process.
> >
> > The problem arises when given topic's compression type (whether it was
> > inherited from broker's configuration or explicitly set) is 'producer.'
> > With this setting, the compression codec to be used is decided by the
> > producer client. Since there is no way to restore the compression level
> and
> > buffer size from the message, we can take the following strategies:
> >
> > 1. Just use given 'compression.level' and 'compression.buffer.size'
> > settings.
> >
> > It will cause so many errors. Let's imagine the case of topic's
> > configuration is { compression.type=producer, compression.level=10,
> > compression.buffer.size=8192 }. In this case, all producers with gzip or
> > lz4 compressed messages will result in an error. (gzip doesn't allow
> > compression level 10, and lz4 also for a buffer size of 8192.)
> >
> > 2. Extend the message format to include compression configurations.
> >
> > With this strategy, we need to change the message format - it's a too big
> > change.
> >
> > 3. If topic's compression.type is 'producer', use the default
> configuration
> > for the given codec.
> >
> > With this strategy, allowing fine-grained compression configuration is
> > meaningless.
> >
> > For the above reasons, I think the only alternative is providing options
> > that can be used when the topic's 'compression.type' is 'producer.' In
> > other words, adding compression.[gzip, lz4, zstd].level and
> > compression.[gzip.snappy.lz4].buffer.size options - and it is what I did
> in
> > the last modification.
> >
> > (wait, the reasoning above should be included in the KIP in the rejected
> > alternatives section, isn't it?)
> >
> > Thanks,
> > Dongjin
> >
> > On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Hi Dongjin,
> > >
> > > For topic level, you can only have a single compression type so the way
> > it
> > > was before was fine, right? The point you raise is how to set broker
> > > defaults that vary depending on the compression type, correct?
> > >
> > > Ismael
> > >
> > > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > I just realized that there was a missing hole in the KIP, so I fixed
> > it.
> > > > The draft implementation will be updated soon.
> > > >
> > > > In short, the proposed change did not regard the case of the topic or
> > > > broker's 'compression.type' is 'producer'; in this case, the broker
> has
> > > to
> > > > handle all kinds of the supported codec. So I added additional
> options
> > > > (compression.[gzip,snappy,lz4, zstd].level,
> > compression.[gzip,snappy,lz4,
> > > > zstd].buffer.size) with handling routines.
> > > >
> > > > Please have a look when you are free.
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org>
> wrote:
> > > >
> > > > > Thanks for pointing out Ismael. It's now updated.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com>
> > wrote:
> > > > >
> > > > >> Thanks Dongjin. One minor suggestion: we should mention that the
> > > broker
> > > > >> side configs are also topic configs (i.e. can be set for a given
> > > topic).
> > > > >>
> > > > >> Ismael
> > > > >>
> > > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org
> > wrote:
> > > > >>
> > > > >> > Happy new year.
> > > > >> >
> > > > >> > I just updated the title and contents of KIP and Jira issue,
> with
> > > > >> updated
> > > > >> > draft implementation. Now both of compression level and buffer
> > size
> > > > >> options
> > > > >> > are available to producer and broker configuration. You can
> check
> > > the
> > > > >> > updated KIP from modified URL:
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > > >> >
> > > > >> > Please have a look when you are free.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Dongjin
> > > > >> >
> > > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com>
> > > > wrote:
> > > > >> >
> > > > >> > > The updated title sounds fine to me.
> > > > >> > >
> > > > >> > > Ismael
> > > > >> > >
> > > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org
> > > wrote:
> > > > >> > >
> > > > >> > > > Hi Ismael,
> > > > >> > > >
> > > > >> > > > Got it. Your direction is perfectly reasonable. I am now
> > > updating
> > > > >> the
> > > > >> > KIP
> > > > >> > > > document and the implementation.
> > > > >> > > >
> > > > >> > > > By allowing the buffer/block size to be configurable, it
> would
> > > be
> > > > >> > better
> > > > >> > > to
> > > > >> > > > update the title of the KIP like 'Allow fine-grained
> > > configuration
> > > > >> for
> > > > >> > > > compression'. Is that right?
> > > > >> > > >
> > > > >> > > > @Other committers:
> > > > >> > > >
> > > > >> > > > Is there any other opinion on allowing the buffer/block size
> > to
> > > be
> > > > >> > > > configurable?
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dongjin
> > > > >> > > >
> > > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> > ismael@juma.me.uk>
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > > > Hi Dongjin,
> > > > >> > > > >
> > > > >> > > > > To clarify, I mean a broker topic config with regards to
> > point
> > > > 1.
> > > > >> As
> > > > >> > > you
> > > > >> > > > > know, compression can be done by the producer and/or by
> the
> > > > >> broker.
> > > > >> > The
> > > > >> > > > > default is for the broker to just use whatever compression
> > was
> > > > >> used
> > > > >> > by
> > > > >> > > > the
> > > > >> > > > > producer, but this can be changed by the user on a per
> topic
> > > > >> basis.
> > > > >> > It
> > > > >> > > > > seems like it would make sense for the configs to be .
> > > > consistent
> > > > >> > > between
> > > > >> > > > > producer and broker.
> > > > >> > > > >
> > > > >> > > > > For point 2, I haven't looked at the implementation, but
> we
> > > > could
> > > > >> do
> > > > >> > it
> > > > >> > > > in
> > > > >> > > > > the `CompressionType` enum by invoking the right
> constructor
> > > or
> > > > >> > > > retrieving
> > > > >> > > > > the default value via a constant (if defined). That's an
> > > > >> > implementation
> > > > >> > > > > detail and can be discussed in the PR. The more general
> > point
> > > is
> > > > >> to
> > > > >> > > rely
> > > > >> > > > on
> > > > >> > > > > the library defaults instead of choosing one ourselves.
> > > > >> > > > >
> > > > >> > > > > For point 3, I'm in favour of doing that in this KIP.
> > > > >> > > > >
> > > > >> > > > > Ismael
> > > > >> > > > >
> > > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> > > dongjin@apache.org
> > > > >
> > > > >> > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Thank you Ismael, here are the answers:
> > > > >> > > > > >
> > > > >> > > > > > *1. About topic config*
> > > > >> > > > > >
> > > > >> > > > > > After some consideration, I concluded that topic config
> > > > doesn't
> > > > >> > need
> > > > >> > > to
> > > > >> > > > > > support compression.level. Here is why: since the
> > > compression
> > > > is
> > > > >> > > > > conducted
> > > > >> > > > > > by the client, the one who can select the best
> compression
> > > > >> level is
> > > > >> > > the
> > > > >> > > > > > client itself. Let us assume that the compression level
> is
> > > set
> > > > >> at
> > > > >> > the
> > > > >> > > > > topic
> > > > >> > > > > > config level. In that case, there is a possibility that
> > the
> > > > >> > > compression
> > > > >> > > > > > level is not optimal for some producers. Actually,
> Kafka's
> > > go
> > > > >> > client
> > > > >> > > > also
> > > > >> > > > > > supports compression level functionality for the
> producer
> > > > config
> > > > >> > > only.
> > > > >> > > > > > <
> https://github.com/Shopify/sarama/blob/master/config.go>
> > > > >> (wait,
> > > > >> > do
> > > > >> > > we
> > > > >> > > > > > need
> > > > >> > > > > > to add this reasoning in the KIP, rejected alternatives
> > > > >> section?)
> > > > >> > > > > >
> > > > >> > > > > > *2. About default level*
> > > > >> > > > > >
> > > > >> > > > > > As of current draft implementation, the default
> > compression
> > > is
> > > > >> set
> > > > >> > on
> > > > >> > > > the
> > > > >> > > > > > CompressionType enum. Of course, changing this strategy
> > into
> > > > >> > relying
> > > > >> > > > on a
> > > > >> > > > > > method from the library to pick the default compression
> > > level
> > > > >> seems
> > > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In this
> case,
> > > we
> > > > >> need
> > > > >> > to
> > > > >> > > > add
> > > > >> > > > > > similar wrapper class for zstd and modify lz4 the
> wrapper
> > > > also.
> > > > >> Add
> > > > >> > > to
> > > > >> > > > > > this, it seems like we need to explicitly state that we
> > > follow
> > > > >> the
> > > > >> > > > > default
> > > > >> > > > > > compression level of the codec in the documentation. Is
> > this
> > > > >> what
> > > > >> > you
> > > > >> > > > > > intended?
> > > > >> > > > > >
> > > > >> > > > > > *3. Whether to allow the buffer/block size to be
> > > configurable*
> > > > >> > > > > >
> > > > >> > > > > > Well, As of current draft implementation, the lz4 level
> is
> > > > >> > > implemented
> > > > >> > > > as
> > > > >> > > > > > block size; this is caused by my misunderstanding on
> lz4.
> > > > After
> > > > >> > > > reviewing
> > > > >> > > > > > lz4 today, I found that it also supports compression
> level
> > > of
> > > > >> 1~16
> > > > >> > > > > > (default: 1), not block size. I will fix it in this
> > weekend
> > > by
> > > > >> > > updating
> > > > >> > > > > the
> > > > >> > > > > > wrapper class.
> > > > >> > > > > >
> > > > >> > > > > > For the problem of the buffer/block size, I have no
> strong
> > > > >> opinion.
> > > > >> > > If
> > > > >> > > > > the
> > > > >> > > > > > community needs it, I will do it all together. How do
> you
> > > > think?
> > > > >> > > > > >
> > > > >> > > > > > In short, it seems like I need to update the KIP
> document
> > > for
> > > > >> issue
> > > > >> > > #1
> > > > >> > > > > and
> > > > >> > > > > > update the compression wrapper for issue #2, #3. Is this
> > > okay?
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Dongjin
> > > > >> > > > > >
> > > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > > > ismaelj@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
> > > > >> > > > > > >
> > > > >> > > > > > > 1. Have we considered whether we want to allow a
> similar
> > > > topic
> > > > >> > > > config?
> > > > >> > > > > > > 2. Can we rely on a method from the library to pick
> the
> > > > >> default
> > > > >> > > > > > compression
> > > > >> > > > > > > level if compression.level is not set? We do it for
> gzip
> > > and
> > > > >> it
> > > > >> > > would
> > > > >> > > > > > seem
> > > > >> > > > > > > reasonable to do something similar for the other
> > > compression
> > > > >> > > > libraries.
> > > > >> > > > > > > 3. Do we want to allow the buffer/block size to be
> > > > >> configurable?
> > > > >> > > This
> > > > >> > > > > has
> > > > >> > > > > > > an impact on memory usage and people may want to trade
> > > > >> > compression
> > > > >> > > > for
> > > > >> > > > > > > less/more memory in some cases. For example, the
> default
> > > for
> > > > >> LZ4
> > > > >> > is
> > > > >> > > > > 64KB
> > > > >> > > > > > > which is a bit high.
> > > > >> > > > > > >
> > > > >> > > > > > > Ismael
> > > > >> > > > > > >
> > > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > > > dongjin@apache.org
> > > > >> > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hello dev,
> > > > >> > > > > > > >
> > > > >> > > > > > > > I hope to initiate the discussion of KIP-390: Add
> > > producer
> > > > >> > option
> > > > >> > > > to
> > > > >> > > > > > > adjust
> > > > >> > > > > > > > compression level
> > > > >> > > > > > > > <
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > >> > > > > > > > >.
> > > > >> > > > > > > > All feedbacks will be highly appreciated.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Best,
> > > > >> > > > > > > > Dongjin
> > > > >> > > > > > > >
> > > > >> > > > > > > > --
> > > > >> > > > > > > > *Dongjin Lee*
> > > > >> > > > > > > >
> > > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > > > > >
> > > > >> > > > > > > > *github:  <http://goog_969573159/>
> > > github.com/dongjinleekr
> > > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > --
> > > > >> > > > > > *Dongjin Lee*
> > > > >> > > > > >
> > > > >> > > > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > > > speakerdeck.com/dongjin
> > > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > *Dongjin Lee*
> > > > >> > > >
> > > > >> > > > *A hitchhiker in the mathematical world.*
> > > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > > >> > > kr.linkedin.com/in/dongjinleekr
> > > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > > > speakerdeck.com/dongjin
> > > > >> > > > <https://speakerdeck.com/dongjin>*
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > *Dongjin Lee*
> > > > >> >
> > > > >> > *A hitchhiker in the mathematical world.*
> > > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > >> > <https://github.com/dongjinleekr>linkedin:
> > > > >> kr.linkedin.com/in/dongjinleekr
> > > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > >> > speakerdeck.com/dongjin
> > > > >> > <https://speakerdeck.com/dongjin>*
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > > <https://speakerdeck.com/dongjin>*
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Dongjin,

When the compression type is "producer", then the broker doesn't recompress
though. Thinking about it some more, there are some uncommon cases where
recompression does happen (the old (and hopefully hardly used by now)
message format == 0 and some edge cases), so it is a good point you raised.

It's a bit unfortunate to add so many topic configs for cases that probably
don't matter. That is, if you are using "producer" compression, you
probably don't need to configure these settings and can live with the
defaults. Perhaps we should only support the topic config for the cases
where you are actually recompressing in the broker.

What do you think? I'd be interested in other people's thoughts too.

Ismael

On Sun, Jan 20, 2019 at 2:14 AM Dongjin Lee <do...@apache.org> wrote:

> Hi Ismael,
>
> It seems like it needs more explanation. Here is the detailed reasoning.
>
> You know, topic and broker's 'compression.type' allows 'uncompressed',
> 'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And this
> configuration is used by the broker in the re-compressing process after
> offset assignment. After this feature, the new configs, 'compression.level'
> and 'compression.buffer.size', also will be used in this process.
>
> The problem arises when given topic's compression type (whether it was
> inherited from broker's configuration or explicitly set) is 'producer.'
> With this setting, the compression codec to be used is decided by the
> producer client. Since there is no way to restore the compression level and
> buffer size from the message, we can take the following strategies:
>
> 1. Just use given 'compression.level' and 'compression.buffer.size'
> settings.
>
> It will cause so many errors. Let's imagine the case of topic's
> configuration is { compression.type=producer, compression.level=10,
> compression.buffer.size=8192 }. In this case, all producers with gzip or
> lz4 compressed messages will result in an error. (gzip doesn't allow
> compression level 10, and lz4 also for a buffer size of 8192.)
>
> 2. Extend the message format to include compression configurations.
>
> With this strategy, we need to change the message format - it's a too big
> change.
>
> 3. If topic's compression.type is 'producer', use the default configuration
> for the given codec.
>
> With this strategy, allowing fine-grained compression configuration is
> meaningless.
>
> For the above reasons, I think the only alternative is providing options
> that can be used when the topic's 'compression.type' is 'producer.' In
> other words, adding compression.[gzip, lz4, zstd].level and
> compression.[gzip.snappy.lz4].buffer.size options - and it is what I did in
> the last modification.
>
> (wait, the reasoning above should be included in the KIP in the rejected
> alternatives section, isn't it?)
>
> Thanks,
> Dongjin
>
> On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Hi Dongjin,
> >
> > For topic level, you can only have a single compression type so the way
> it
> > was before was fine, right? The point you raise is how to set broker
> > defaults that vary depending on the compression type, correct?
> >
> > Ismael
> >
> > On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > > I just realized that there was a missing hole in the KIP, so I fixed
> it.
> > > The draft implementation will be updated soon.
> > >
> > > In short, the proposed change did not regard the case of the topic or
> > > broker's 'compression.type' is 'producer'; in this case, the broker has
> > to
> > > handle all kinds of the supported codec. So I added additional options
> > > (compression.[gzip,snappy,lz4, zstd].level,
> compression.[gzip,snappy,lz4,
> > > zstd].buffer.size) with handling routines.
> > >
> > > Please have a look when you are free.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org> wrote:
> > >
> > > > Thanks for pointing out Ismael. It's now updated.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com>
> wrote:
> > > >
> > > >> Thanks Dongjin. One minor suggestion: we should mention that the
> > broker
> > > >> side configs are also topic configs (i.e. can be set for a given
> > topic).
> > > >>
> > > >> Ismael
> > > >>
> > > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org
> wrote:
> > > >>
> > > >> > Happy new year.
> > > >> >
> > > >> > I just updated the title and contents of KIP and Jira issue, with
> > > >> updated
> > > >> > draft implementation. Now both of compression level and buffer
> size
> > > >> options
> > > >> > are available to producer and broker configuration. You can check
> > the
> > > >> > updated KIP from modified URL:
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > > >> >
> > > >> > Please have a look when you are free.
> > > >> >
> > > >> > Thanks,
> > > >> > Dongjin
> > > >> >
> > > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com>
> > > wrote:
> > > >> >
> > > >> > > The updated title sounds fine to me.
> > > >> > >
> > > >> > > Ismael
> > > >> > >
> > > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org
> > wrote:
> > > >> > >
> > > >> > > > Hi Ismael,
> > > >> > > >
> > > >> > > > Got it. Your direction is perfectly reasonable. I am now
> > updating
> > > >> the
> > > >> > KIP
> > > >> > > > document and the implementation.
> > > >> > > >
> > > >> > > > By allowing the buffer/block size to be configurable, it would
> > be
> > > >> > better
> > > >> > > to
> > > >> > > > update the title of the KIP like 'Allow fine-grained
> > configuration
> > > >> for
> > > >> > > > compression'. Is that right?
> > > >> > > >
> > > >> > > > @Other committers:
> > > >> > > >
> > > >> > > > Is there any other opinion on allowing the buffer/block size
> to
> > be
> > > >> > > > configurable?
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Dongjin
> > > >> > > >
> > > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <
> ismael@juma.me.uk>
> > > >> wrote:
> > > >> > > >
> > > >> > > > > Hi Dongjin,
> > > >> > > > >
> > > >> > > > > To clarify, I mean a broker topic config with regards to
> point
> > > 1.
> > > >> As
> > > >> > > you
> > > >> > > > > know, compression can be done by the producer and/or by the
> > > >> broker.
> > > >> > The
> > > >> > > > > default is for the broker to just use whatever compression
> was
> > > >> used
> > > >> > by
> > > >> > > > the
> > > >> > > > > producer, but this can be changed by the user on a per topic
> > > >> basis.
> > > >> > It
> > > >> > > > > seems like it would make sense for the configs to be .
> > > consistent
> > > >> > > between
> > > >> > > > > producer and broker.
> > > >> > > > >
> > > >> > > > > For point 2, I haven't looked at the implementation, but we
> > > could
> > > >> do
> > > >> > it
> > > >> > > > in
> > > >> > > > > the `CompressionType` enum by invoking the right constructor
> > or
> > > >> > > > retrieving
> > > >> > > > > the default value via a constant (if defined). That's an
> > > >> > implementation
> > > >> > > > > detail and can be discussed in the PR. The more general
> point
> > is
> > > >> to
> > > >> > > rely
> > > >> > > > on
> > > >> > > > > the library defaults instead of choosing one ourselves.
> > > >> > > > >
> > > >> > > > > For point 3, I'm in favour of doing that in this KIP.
> > > >> > > > >
> > > >> > > > > Ismael
> > > >> > > > >
> > > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> > dongjin@apache.org
> > > >
> > > >> > > wrote:
> > > >> > > > >
> > > >> > > > > > Thank you Ismael, here are the answers:
> > > >> > > > > >
> > > >> > > > > > *1. About topic config*
> > > >> > > > > >
> > > >> > > > > > After some consideration, I concluded that topic config
> > > doesn't
> > > >> > need
> > > >> > > to
> > > >> > > > > > support compression.level. Here is why: since the
> > compression
> > > is
> > > >> > > > > conducted
> > > >> > > > > > by the client, the one who can select the best compression
> > > >> level is
> > > >> > > the
> > > >> > > > > > client itself. Let us assume that the compression level is
> > set
> > > >> at
> > > >> > the
> > > >> > > > > topic
> > > >> > > > > > config level. In that case, there is a possibility that
> the
> > > >> > > compression
> > > >> > > > > > level is not optimal for some producers. Actually, Kafka's
> > go
> > > >> > client
> > > >> > > > also
> > > >> > > > > > supports compression level functionality for the producer
> > > config
> > > >> > > only.
> > > >> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go>
> > > >> (wait,
> > > >> > do
> > > >> > > we
> > > >> > > > > > need
> > > >> > > > > > to add this reasoning in the KIP, rejected alternatives
> > > >> section?)
> > > >> > > > > >
> > > >> > > > > > *2. About default level*
> > > >> > > > > >
> > > >> > > > > > As of current draft implementation, the default
> compression
> > is
> > > >> set
> > > >> > on
> > > >> > > > the
> > > >> > > > > > CompressionType enum. Of course, changing this strategy
> into
> > > >> > relying
> > > >> > > > on a
> > > >> > > > > > method from the library to pick the default compression
> > level
> > > >> seems
> > > >> > > > > > possible, like `GZIPBlockOutputStream` does. In this case,
> > we
> > > >> need
> > > >> > to
> > > >> > > > add
> > > >> > > > > > similar wrapper class for zstd and modify lz4 the wrapper
> > > also.
> > > >> Add
> > > >> > > to
> > > >> > > > > > this, it seems like we need to explicitly state that we
> > follow
> > > >> the
> > > >> > > > > default
> > > >> > > > > > compression level of the codec in the documentation. Is
> this
> > > >> what
> > > >> > you
> > > >> > > > > > intended?
> > > >> > > > > >
> > > >> > > > > > *3. Whether to allow the buffer/block size to be
> > configurable*
> > > >> > > > > >
> > > >> > > > > > Well, As of current draft implementation, the lz4 level is
> > > >> > > implemented
> > > >> > > > as
> > > >> > > > > > block size; this is caused by my misunderstanding on lz4.
> > > After
> > > >> > > > reviewing
> > > >> > > > > > lz4 today, I found that it also supports compression level
> > of
> > > >> 1~16
> > > >> > > > > > (default: 1), not block size. I will fix it in this
> weekend
> > by
> > > >> > > updating
> > > >> > > > > the
> > > >> > > > > > wrapper class.
> > > >> > > > > >
> > > >> > > > > > For the problem of the buffer/block size, I have no strong
> > > >> opinion.
> > > >> > > If
> > > >> > > > > the
> > > >> > > > > > community needs it, I will do it all together. How do you
> > > think?
> > > >> > > > > >
> > > >> > > > > > In short, it seems like I need to update the KIP document
> > for
> > > >> issue
> > > >> > > #1
> > > >> > > > > and
> > > >> > > > > > update the compression wrapper for issue #2, #3. Is this
> > okay?
> > > >> > > > > >
> > > >> > > > > > Thanks,
> > > >> > > > > > Dongjin
> > > >> > > > > >
> > > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > > ismaelj@gmail.com
> > > >> >
> > > >> > > > wrote:
> > > >> > > > > >
> > > >> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
> > > >> > > > > > >
> > > >> > > > > > > 1. Have we considered whether we want to allow a similar
> > > topic
> > > >> > > > config?
> > > >> > > > > > > 2. Can we rely on a method from the library to pick the
> > > >> default
> > > >> > > > > > compression
> > > >> > > > > > > level if compression.level is not set? We do it for gzip
> > and
> > > >> it
> > > >> > > would
> > > >> > > > > > seem
> > > >> > > > > > > reasonable to do something similar for the other
> > compression
> > > >> > > > libraries.
> > > >> > > > > > > 3. Do we want to allow the buffer/block size to be
> > > >> configurable?
> > > >> > > This
> > > >> > > > > has
> > > >> > > > > > > an impact on memory usage and people may want to trade
> > > >> > compression
> > > >> > > > for
> > > >> > > > > > > less/more memory in some cases. For example, the default
> > for
> > > >> LZ4
> > > >> > is
> > > >> > > > > 64KB
> > > >> > > > > > > which is a bit high.
> > > >> > > > > > >
> > > >> > > > > > > Ismael
> > > >> > > > > > >
> > > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > > dongjin@apache.org
> > > >> > > wrote:
> > > >> > > > > > >
> > > >> > > > > > > > Hello dev,
> > > >> > > > > > > >
> > > >> > > > > > > > I hope to initiate the discussion of KIP-390: Add
> > producer
> > > >> > option
> > > >> > > > to
> > > >> > > > > > > adjust
> > > >> > > > > > > > compression level
> > > >> > > > > > > > <
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > >> > > > > > > > >.
> > > >> > > > > > > > All feedbacks will be highly appreciated.
> > > >> > > > > > > >
> > > >> > > > > > > > Best,
> > > >> > > > > > > > Dongjin
> > > >> > > > > > > >
> > > >> > > > > > > > --
> > > >> > > > > > > > *Dongjin Lee*
> > > >> > > > > > > >
> > > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > > > > >
> > > >> > > > > > > > *github:  <http://goog_969573159/>
> > github.com/dongjinleekr
> > > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > >> > > > > > > > www.slideshare.net/dongjinleekr
> > > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > >> > > > > > > >
> > > >> > > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > --
> > > >> > > > > > *Dongjin Lee*
> > > >> > > > > >
> > > >> > > > > > *A hitchhiker in the mathematical world.*
> > > >> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > > > speakerdeck.com/dongjin
> > > >> > > > > > <https://speakerdeck.com/dongjin>*
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > --
> > > >> > > > *Dongjin Lee*
> > > >> > > >
> > > >> > > > *A hitchhiker in the mathematical world.*
> > > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > > > <https://github.com/dongjinleekr>linkedin:
> > > >> > > kr.linkedin.com/in/dongjinleekr
> > > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > > > speakerdeck.com/dongjin
> > > >> > > > <https://speakerdeck.com/dongjin>*
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > *Dongjin Lee*
> > > >> >
> > > >> > *A hitchhiker in the mathematical world.*
> > > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > >> > <https://github.com/dongjinleekr>linkedin:
> > > >> kr.linkedin.com/in/dongjinleekr
> > > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > >> > speakerdeck.com/dongjin
> > > >> > <https://speakerdeck.com/dongjin>*
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hi Ismael,

It seems like it needs more explanation. Here is the detailed reasoning.

You know, topic and broker's 'compression.type' allows 'uncompressed',
'producer' with standard codecs (i.e., gzip, snappy, lz4, zstd.) And this
configuration is used by the broker in the re-compressing process after
offset assignment. After this feature, the new configs, 'compression.level'
and 'compression.buffer.size', also will be used in this process.

The problem arises when given topic's compression type (whether it was
inherited from broker's configuration or explicitly set) is 'producer.'
With this setting, the compression codec to be used is decided by the
producer client. Since there is no way to restore the compression level and
buffer size from the message, we can take the following strategies:

1. Just use given 'compression.level' and 'compression.buffer.size'
settings.

It will cause so many errors. Let's imagine the case of topic's
configuration is { compression.type=producer, compression.level=10,
compression.buffer.size=8192 }. In this case, all producers with gzip or
lz4 compressed messages will result in an error. (gzip doesn't allow
compression level 10, and lz4 also for a buffer size of 8192.)

2. Extend the message format to include compression configurations.

With this strategy, we need to change the message format - it's a too big
change.

3. If topic's compression.type is 'producer', use the default configuration
for the given codec.

With this strategy, allowing fine-grained compression configuration is
meaningless.

For the above reasons, I think the only alternative is providing options
that can be used when the topic's 'compression.type' is 'producer.' In
other words, adding compression.[gzip, lz4, zstd].level and
compression.[gzip.snappy.lz4].buffer.size options - and it is what I did in
the last modification.

(wait, the reasoning above should be included in the KIP in the rejected
alternatives section, isn't it?)

Thanks,
Dongjin

On Sun, Jan 20, 2019 at 2:33 AM Ismael Juma <is...@juma.me.uk> wrote:

> Hi Dongjin,
>
> For topic level, you can only have a single compression type so the way it
> was before was fine, right? The point you raise is how to set broker
> defaults that vary depending on the compression type, correct?
>
> Ismael
>
> On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org> wrote:
>
> > I just realized that there was a missing hole in the KIP, so I fixed it.
> > The draft implementation will be updated soon.
> >
> > In short, the proposed change did not regard the case of the topic or
> > broker's 'compression.type' is 'producer'; in this case, the broker has
> to
> > handle all kinds of the supported codec. So I added additional options
> > (compression.[gzip,snappy,lz4, zstd].level, compression.[gzip,snappy,lz4,
> > zstd].buffer.size) with handling routines.
> >
> > Please have a look when you are free.
> >
> > Thanks,
> > Dongjin
> >
> > On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Thanks for pointing out Ismael. It's now updated.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com> wrote:
> > >
> > >> Thanks Dongjin. One minor suggestion: we should mention that the
> broker
> > >> side configs are also topic configs (i.e. can be set for a given
> topic).
> > >>
> > >> Ismael
> > >>
> > >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:
> > >>
> > >> > Happy new year.
> > >> >
> > >> > I just updated the title and contents of KIP and Jira issue, with
> > >> updated
> > >> > draft implementation. Now both of compression level and buffer size
> > >> options
> > >> > are available to producer and broker configuration. You can check
> the
> > >> > updated KIP from modified URL:
> > >> >
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> > >> >
> > >> > Please have a look when you are free.
> > >> >
> > >> > Thanks,
> > >> > Dongjin
> > >> >
> > >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com>
> > wrote:
> > >> >
> > >> > > The updated title sounds fine to me.
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org
> wrote:
> > >> > >
> > >> > > > Hi Ismael,
> > >> > > >
> > >> > > > Got it. Your direction is perfectly reasonable. I am now
> updating
> > >> the
> > >> > KIP
> > >> > > > document and the implementation.
> > >> > > >
> > >> > > > By allowing the buffer/block size to be configurable, it would
> be
> > >> > better
> > >> > > to
> > >> > > > update the title of the KIP like 'Allow fine-grained
> configuration
> > >> for
> > >> > > > compression'. Is that right?
> > >> > > >
> > >> > > > @Other committers:
> > >> > > >
> > >> > > > Is there any other opinion on allowing the buffer/block size to
> be
> > >> > > > configurable?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Dongjin
> > >> > > >
> > >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi Dongjin,
> > >> > > > >
> > >> > > > > To clarify, I mean a broker topic config with regards to point
> > 1.
> > >> As
> > >> > > you
> > >> > > > > know, compression can be done by the producer and/or by the
> > >> broker.
> > >> > The
> > >> > > > > default is for the broker to just use whatever compression was
> > >> used
> > >> > by
> > >> > > > the
> > >> > > > > producer, but this can be changed by the user on a per topic
> > >> basis.
> > >> > It
> > >> > > > > seems like it would make sense for the configs to be .
> > consistent
> > >> > > between
> > >> > > > > producer and broker.
> > >> > > > >
> > >> > > > > For point 2, I haven't looked at the implementation, but we
> > could
> > >> do
> > >> > it
> > >> > > > in
> > >> > > > > the `CompressionType` enum by invoking the right constructor
> or
> > >> > > > retrieving
> > >> > > > > the default value via a constant (if defined). That's an
> > >> > implementation
> > >> > > > > detail and can be discussed in the PR. The more general point
> is
> > >> to
> > >> > > rely
> > >> > > > on
> > >> > > > > the library defaults instead of choosing one ourselves.
> > >> > > > >
> > >> > > > > For point 3, I'm in favour of doing that in this KIP.
> > >> > > > >
> > >> > > > > Ismael
> > >> > > > >
> > >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <
> dongjin@apache.org
> > >
> > >> > > wrote:
> > >> > > > >
> > >> > > > > > Thank you Ismael, here are the answers:
> > >> > > > > >
> > >> > > > > > *1. About topic config*
> > >> > > > > >
> > >> > > > > > After some consideration, I concluded that topic config
> > doesn't
> > >> > need
> > >> > > to
> > >> > > > > > support compression.level. Here is why: since the
> compression
> > is
> > >> > > > > conducted
> > >> > > > > > by the client, the one who can select the best compression
> > >> level is
> > >> > > the
> > >> > > > > > client itself. Let us assume that the compression level is
> set
> > >> at
> > >> > the
> > >> > > > > topic
> > >> > > > > > config level. In that case, there is a possibility that the
> > >> > > compression
> > >> > > > > > level is not optimal for some producers. Actually, Kafka's
> go
> > >> > client
> > >> > > > also
> > >> > > > > > supports compression level functionality for the producer
> > config
> > >> > > only.
> > >> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go>
> > >> (wait,
> > >> > do
> > >> > > we
> > >> > > > > > need
> > >> > > > > > to add this reasoning in the KIP, rejected alternatives
> > >> section?)
> > >> > > > > >
> > >> > > > > > *2. About default level*
> > >> > > > > >
> > >> > > > > > As of current draft implementation, the default compression
> is
> > >> set
> > >> > on
> > >> > > > the
> > >> > > > > > CompressionType enum. Of course, changing this strategy into
> > >> > relying
> > >> > > > on a
> > >> > > > > > method from the library to pick the default compression
> level
> > >> seems
> > >> > > > > > possible, like `GZIPBlockOutputStream` does. In this case,
> we
> > >> need
> > >> > to
> > >> > > > add
> > >> > > > > > similar wrapper class for zstd and modify lz4 the wrapper
> > also.
> > >> Add
> > >> > > to
> > >> > > > > > this, it seems like we need to explicitly state that we
> follow
> > >> the
> > >> > > > > default
> > >> > > > > > compression level of the codec in the documentation. Is this
> > >> what
> > >> > you
> > >> > > > > > intended?
> > >> > > > > >
> > >> > > > > > *3. Whether to allow the buffer/block size to be
> configurable*
> > >> > > > > >
> > >> > > > > > Well, As of current draft implementation, the lz4 level is
> > >> > > implemented
> > >> > > > as
> > >> > > > > > block size; this is caused by my misunderstanding on lz4.
> > After
> > >> > > > reviewing
> > >> > > > > > lz4 today, I found that it also supports compression level
> of
> > >> 1~16
> > >> > > > > > (default: 1), not block size. I will fix it in this weekend
> by
> > >> > > updating
> > >> > > > > the
> > >> > > > > > wrapper class.
> > >> > > > > >
> > >> > > > > > For the problem of the buffer/block size, I have no strong
> > >> opinion.
> > >> > > If
> > >> > > > > the
> > >> > > > > > community needs it, I will do it all together. How do you
> > think?
> > >> > > > > >
> > >> > > > > > In short, it seems like I need to update the KIP document
> for
> > >> issue
> > >> > > #1
> > >> > > > > and
> > >> > > > > > update the compression wrapper for issue #2, #3. Is this
> okay?
> > >> > > > > >
> > >> > > > > > Thanks,
> > >> > > > > > Dongjin
> > >> > > > > >
> > >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> > ismaelj@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > > > >
> > >> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
> > >> > > > > > >
> > >> > > > > > > 1. Have we considered whether we want to allow a similar
> > topic
> > >> > > > config?
> > >> > > > > > > 2. Can we rely on a method from the library to pick the
> > >> default
> > >> > > > > > compression
> > >> > > > > > > level if compression.level is not set? We do it for gzip
> and
> > >> it
> > >> > > would
> > >> > > > > > seem
> > >> > > > > > > reasonable to do something similar for the other
> compression
> > >> > > > libraries.
> > >> > > > > > > 3. Do we want to allow the buffer/block size to be
> > >> configurable?
> > >> > > This
> > >> > > > > has
> > >> > > > > > > an impact on memory usage and people may want to trade
> > >> > compression
> > >> > > > for
> > >> > > > > > > less/more memory in some cases. For example, the default
> for
> > >> LZ4
> > >> > is
> > >> > > > > 64KB
> > >> > > > > > > which is a bit high.
> > >> > > > > > >
> > >> > > > > > > Ismael
> > >> > > > > > >
> > >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> > dongjin@apache.org
> > >> > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hello dev,
> > >> > > > > > > >
> > >> > > > > > > > I hope to initiate the discussion of KIP-390: Add
> producer
> > >> > option
> > >> > > > to
> > >> > > > > > > adjust
> > >> > > > > > > > compression level
> > >> > > > > > > > <
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > >> > > > > > > > >.
> > >> > > > > > > > All feedbacks will be highly appreciated.
> > >> > > > > > > >
> > >> > > > > > > > Best,
> > >> > > > > > > > Dongjin
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > *Dongjin Lee*
> > >> > > > > > > >
> > >> > > > > > > > *A hitchhiker in the mathematical world.*
> > >> > > > > > > >
> > >> > > > > > > > *github:  <http://goog_969573159/>
> github.com/dongjinleekr
> > >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > >> > > > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > >> > > > > > > > www.slideshare.net/dongjinleekr
> > >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > *Dongjin Lee*
> > >> > > > > >
> > >> > > > > > *A hitchhiker in the mathematical world.*
> > >> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > > > kr.linkedin.com/in/dongjinleekr
> > >> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > > > speakerdeck.com/dongjin
> > >> > > > > > <https://speakerdeck.com/dongjin>*
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Dongjin Lee*
> > >> > > >
> > >> > > > *A hitchhiker in the mathematical world.*
> > >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > > > <https://github.com/dongjinleekr>linkedin:
> > >> > > kr.linkedin.com/in/dongjinleekr
> > >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > > > speakerdeck.com/dongjin
> > >> > > > <https://speakerdeck.com/dongjin>*
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > *Dongjin Lee*
> > >> >
> > >> > *A hitchhiker in the mathematical world.*
> > >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > >> > <https://github.com/dongjinleekr>linkedin:
> > >> kr.linkedin.com/in/dongjinleekr
> > >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > >> > speakerdeck.com/dongjin
> > >> > <https://speakerdeck.com/dongjin>*
> > >> >
> > >>
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Dongjin,

For topic level, you can only have a single compression type so the way it
was before was fine, right? The point you raise is how to set broker
defaults that vary depending on the compression type, correct?

Ismael

On Mon, Jan 14, 2019 at 10:18 AM Dongjin Lee <do...@apache.org> wrote:

> I just realized that there was a missing hole in the KIP, so I fixed it.
> The draft implementation will be updated soon.
>
> In short, the proposed change did not regard the case of the topic or
> broker's 'compression.type' is 'producer'; in this case, the broker has to
> handle all kinds of the supported codec. So I added additional options
> (compression.[gzip,snappy,lz4, zstd].level, compression.[gzip,snappy,lz4,
> zstd].buffer.size) with handling routines.
>
> Please have a look when you are free.
>
> Thanks,
> Dongjin
>
> On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Thanks for pointing out Ismael. It's now updated.
> >
> > Best,
> > Dongjin
> >
> > On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com> wrote:
> >
> >> Thanks Dongjin. One minor suggestion: we should mention that the broker
> >> side configs are also topic configs (i.e. can be set for a given topic).
> >>
> >> Ismael
> >>
> >> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:
> >>
> >> > Happy new year.
> >> >
> >> > I just updated the title and contents of KIP and Jira issue, with
> >> updated
> >> > draft implementation. Now both of compression level and buffer size
> >> options
> >> > are available to producer and broker configuration. You can check the
> >> > updated KIP from modified URL:
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> >> >
> >> > Please have a look when you are free.
> >> >
> >> > Thanks,
> >> > Dongjin
> >> >
> >> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com>
> wrote:
> >> >
> >> > > The updated title sounds fine to me.
> >> > >
> >> > > Ismael
> >> > >
> >> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
> >> > >
> >> > > > Hi Ismael,
> >> > > >
> >> > > > Got it. Your direction is perfectly reasonable. I am now updating
> >> the
> >> > KIP
> >> > > > document and the implementation.
> >> > > >
> >> > > > By allowing the buffer/block size to be configurable, it would be
> >> > better
> >> > > to
> >> > > > update the title of the KIP like 'Allow fine-grained configuration
> >> for
> >> > > > compression'. Is that right?
> >> > > >
> >> > > > @Other committers:
> >> > > >
> >> > > > Is there any other opinion on allowing the buffer/block size to be
> >> > > > configurable?
> >> > > >
> >> > > > Thanks,
> >> > > > Dongjin
> >> > > >
> >> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk>
> >> wrote:
> >> > > >
> >> > > > > Hi Dongjin,
> >> > > > >
> >> > > > > To clarify, I mean a broker topic config with regards to point
> 1.
> >> As
> >> > > you
> >> > > > > know, compression can be done by the producer and/or by the
> >> broker.
> >> > The
> >> > > > > default is for the broker to just use whatever compression was
> >> used
> >> > by
> >> > > > the
> >> > > > > producer, but this can be changed by the user on a per topic
> >> basis.
> >> > It
> >> > > > > seems like it would make sense for the configs to be .
> consistent
> >> > > between
> >> > > > > producer and broker.
> >> > > > >
> >> > > > > For point 2, I haven't looked at the implementation, but we
> could
> >> do
> >> > it
> >> > > > in
> >> > > > > the `CompressionType` enum by invoking the right constructor or
> >> > > > retrieving
> >> > > > > the default value via a constant (if defined). That's an
> >> > implementation
> >> > > > > detail and can be discussed in the PR. The more general point is
> >> to
> >> > > rely
> >> > > > on
> >> > > > > the library defaults instead of choosing one ourselves.
> >> > > > >
> >> > > > > For point 3, I'm in favour of doing that in this KIP.
> >> > > > >
> >> > > > > Ismael
> >> > > > >
> >> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <dongjin@apache.org
> >
> >> > > wrote:
> >> > > > >
> >> > > > > > Thank you Ismael, here are the answers:
> >> > > > > >
> >> > > > > > *1. About topic config*
> >> > > > > >
> >> > > > > > After some consideration, I concluded that topic config
> doesn't
> >> > need
> >> > > to
> >> > > > > > support compression.level. Here is why: since the compression
> is
> >> > > > > conducted
> >> > > > > > by the client, the one who can select the best compression
> >> level is
> >> > > the
> >> > > > > > client itself. Let us assume that the compression level is set
> >> at
> >> > the
> >> > > > > topic
> >> > > > > > config level. In that case, there is a possibility that the
> >> > > compression
> >> > > > > > level is not optimal for some producers. Actually, Kafka's go
> >> > client
> >> > > > also
> >> > > > > > supports compression level functionality for the producer
> config
> >> > > only.
> >> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go>
> >> (wait,
> >> > do
> >> > > we
> >> > > > > > need
> >> > > > > > to add this reasoning in the KIP, rejected alternatives
> >> section?)
> >> > > > > >
> >> > > > > > *2. About default level*
> >> > > > > >
> >> > > > > > As of current draft implementation, the default compression is
> >> set
> >> > on
> >> > > > the
> >> > > > > > CompressionType enum. Of course, changing this strategy into
> >> > relying
> >> > > > on a
> >> > > > > > method from the library to pick the default compression level
> >> seems
> >> > > > > > possible, like `GZIPBlockOutputStream` does. In this case, we
> >> need
> >> > to
> >> > > > add
> >> > > > > > similar wrapper class for zstd and modify lz4 the wrapper
> also.
> >> Add
> >> > > to
> >> > > > > > this, it seems like we need to explicitly state that we follow
> >> the
> >> > > > > default
> >> > > > > > compression level of the codec in the documentation. Is this
> >> what
> >> > you
> >> > > > > > intended?
> >> > > > > >
> >> > > > > > *3. Whether to allow the buffer/block size to be configurable*
> >> > > > > >
> >> > > > > > Well, As of current draft implementation, the lz4 level is
> >> > > implemented
> >> > > > as
> >> > > > > > block size; this is caused by my misunderstanding on lz4.
> After
> >> > > > reviewing
> >> > > > > > lz4 today, I found that it also supports compression level of
> >> 1~16
> >> > > > > > (default: 1), not block size. I will fix it in this weekend by
> >> > > updating
> >> > > > > the
> >> > > > > > wrapper class.
> >> > > > > >
> >> > > > > > For the problem of the buffer/block size, I have no strong
> >> opinion.
> >> > > If
> >> > > > > the
> >> > > > > > community needs it, I will do it all together. How do you
> think?
> >> > > > > >
> >> > > > > > In short, it seems like I need to update the KIP document for
> >> issue
> >> > > #1
> >> > > > > and
> >> > > > > > update the compression wrapper for issue #2, #3. Is this okay?
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Dongjin
> >> > > > > >
> >> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
> ismaelj@gmail.com
> >> >
> >> > > > wrote:
> >> > > > > >
> >> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
> >> > > > > > >
> >> > > > > > > 1. Have we considered whether we want to allow a similar
> topic
> >> > > > config?
> >> > > > > > > 2. Can we rely on a method from the library to pick the
> >> default
> >> > > > > > compression
> >> > > > > > > level if compression.level is not set? We do it for gzip and
> >> it
> >> > > would
> >> > > > > > seem
> >> > > > > > > reasonable to do something similar for the other compression
> >> > > > libraries.
> >> > > > > > > 3. Do we want to allow the buffer/block size to be
> >> configurable?
> >> > > This
> >> > > > > has
> >> > > > > > > an impact on memory usage and people may want to trade
> >> > compression
> >> > > > for
> >> > > > > > > less/more memory in some cases. For example, the default for
> >> LZ4
> >> > is
> >> > > > > 64KB
> >> > > > > > > which is a bit high.
> >> > > > > > >
> >> > > > > > > Ismael
> >> > > > > > >
> >> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
> dongjin@apache.org
> >> > > wrote:
> >> > > > > > >
> >> > > > > > > > Hello dev,
> >> > > > > > > >
> >> > > > > > > > I hope to initiate the discussion of KIP-390: Add producer
> >> > option
> >> > > > to
> >> > > > > > > adjust
> >> > > > > > > > compression level
> >> > > > > > > > <
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> >> > > > > > > > >.
> >> > > > > > > > All feedbacks will be highly appreciated.
> >> > > > > > > >
> >> > > > > > > > Best,
> >> > > > > > > > Dongjin
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > *Dongjin Lee*
> >> > > > > > > >
> >> > > > > > > > *A hitchhiker in the mathematical world.*
> >> > > > > > > >
> >> > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> >> > > > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> >> > > > > > > > www.slideshare.net/dongjinleekr
> >> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > *Dongjin Lee*
> >> > > > > >
> >> > > > > > *A hitchhiker in the mathematical world.*
> >> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > > > <https://github.com/dongjinleekr>linkedin:
> >> > > > > kr.linkedin.com/in/dongjinleekr
> >> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > > > speakerdeck.com/dongjin
> >> > > > > > <https://speakerdeck.com/dongjin>*
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Dongjin Lee*
> >> > > >
> >> > > > *A hitchhiker in the mathematical world.*
> >> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > > > <https://github.com/dongjinleekr>linkedin:
> >> > > kr.linkedin.com/in/dongjinleekr
> >> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > > > speakerdeck.com/dongjin
> >> > > > <https://speakerdeck.com/dongjin>*
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > *Dongjin Lee*
> >> >
> >> > *A hitchhiker in the mathematical world.*
> >> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> >> > <https://github.com/dongjinleekr>linkedin:
> >> kr.linkedin.com/in/dongjinleekr
> >> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> >> > speakerdeck.com/dongjin
> >> > <https://speakerdeck.com/dongjin>*
> >> >
> >>
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
I just updated the draft PR. Please have a look when you are free.

Following the update, I have a minor update to the KIP document -
'compression.snappy.level' and 'compression.zstd.buffer.size' were removed
since these options are meaningless. (Snappy doesn't support compression
level, and zstd doesn't support buffer size.)

If there are no other feedbacks, I hope to put this proposal to the vote by
next Monday, Jan 21.

Thanks,
Dongjin

On Tue, Jan 15, 2019 at 3:17 AM Dongjin Lee <do...@apache.org> wrote:

> I just realized that there was a missing hole in the KIP, so I fixed it.
> The draft implementation will be updated soon.
>
> In short, the proposed change did not regard the case of the topic or
> broker's 'compression.type' is 'producer'; in this case, the broker has to
> handle all kinds of the supported codec. So I added additional options
> (compression.[gzip,snappy,lz4, zstd].level, compression.[gzip,snappy,lz4,
> zstd].buffer.size) with handling routines.
>
> Please have a look when you are free.
>
> Thanks,
> Dongjin
>
> On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org> wrote:
>
>> Thanks for pointing out Ismael. It's now updated.
>>
>> Best,
>> Dongjin
>>
>> On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com> wrote:
>>
>>> Thanks Dongjin. One minor suggestion: we should mention that the broker
>>> side configs are also topic configs (i.e. can be set for a given topic).
>>>
>>> Ismael
>>>
>>> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:
>>>
>>> > Happy new year.
>>> >
>>> > I just updated the title and contents of KIP and Jira issue, with
>>> updated
>>> > draft implementation. Now both of compression level and buffer size
>>> options
>>> > are available to producer and broker configuration. You can check the
>>> > updated KIP from modified URL:
>>> >
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
>>> >
>>> > Please have a look when you are free.
>>> >
>>> > Thanks,
>>> > Dongjin
>>> >
>>> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com> wrote:
>>> >
>>> > > The updated title sounds fine to me.
>>> > >
>>> > > Ismael
>>> > >
>>> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
>>> > >
>>> > > > Hi Ismael,
>>> > > >
>>> > > > Got it. Your direction is perfectly reasonable. I am now updating
>>> the
>>> > KIP
>>> > > > document and the implementation.
>>> > > >
>>> > > > By allowing the buffer/block size to be configurable, it would be
>>> > better
>>> > > to
>>> > > > update the title of the KIP like 'Allow fine-grained configuration
>>> for
>>> > > > compression'. Is that right?
>>> > > >
>>> > > > @Other committers:
>>> > > >
>>> > > > Is there any other opinion on allowing the buffer/block size to be
>>> > > > configurable?
>>> > > >
>>> > > > Thanks,
>>> > > > Dongjin
>>> > > >
>>> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk>
>>> wrote:
>>> > > >
>>> > > > > Hi Dongjin,
>>> > > > >
>>> > > > > To clarify, I mean a broker topic config with regards to point
>>> 1. As
>>> > > you
>>> > > > > know, compression can be done by the producer and/or by the
>>> broker.
>>> > The
>>> > > > > default is for the broker to just use whatever compression was
>>> used
>>> > by
>>> > > > the
>>> > > > > producer, but this can be changed by the user on a per topic
>>> basis.
>>> > It
>>> > > > > seems like it would make sense for the configs to be . consistent
>>> > > between
>>> > > > > producer and broker.
>>> > > > >
>>> > > > > For point 2, I haven't looked at the implementation, but we
>>> could do
>>> > it
>>> > > > in
>>> > > > > the `CompressionType` enum by invoking the right constructor or
>>> > > > retrieving
>>> > > > > the default value via a constant (if defined). That's an
>>> > implementation
>>> > > > > detail and can be discussed in the PR. The more general point is
>>> to
>>> > > rely
>>> > > > on
>>> > > > > the library defaults instead of choosing one ourselves.
>>> > > > >
>>> > > > > For point 3, I'm in favour of doing that in this KIP.
>>> > > > >
>>> > > > > Ismael
>>> > > > >
>>> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org>
>>> > > wrote:
>>> > > > >
>>> > > > > > Thank you Ismael, here are the answers:
>>> > > > > >
>>> > > > > > *1. About topic config*
>>> > > > > >
>>> > > > > > After some consideration, I concluded that topic config doesn't
>>> > need
>>> > > to
>>> > > > > > support compression.level. Here is why: since the compression
>>> is
>>> > > > > conducted
>>> > > > > > by the client, the one who can select the best compression
>>> level is
>>> > > the
>>> > > > > > client itself. Let us assume that the compression level is set
>>> at
>>> > the
>>> > > > > topic
>>> > > > > > config level. In that case, there is a possibility that the
>>> > > compression
>>> > > > > > level is not optimal for some producers. Actually, Kafka's go
>>> > client
>>> > > > also
>>> > > > > > supports compression level functionality for the producer
>>> config
>>> > > only.
>>> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go>
>>> (wait,
>>> > do
>>> > > we
>>> > > > > > need
>>> > > > > > to add this reasoning in the KIP, rejected alternatives
>>> section?)
>>> > > > > >
>>> > > > > > *2. About default level*
>>> > > > > >
>>> > > > > > As of current draft implementation, the default compression is
>>> set
>>> > on
>>> > > > the
>>> > > > > > CompressionType enum. Of course, changing this strategy into
>>> > relying
>>> > > > on a
>>> > > > > > method from the library to pick the default compression level
>>> seems
>>> > > > > > possible, like `GZIPBlockOutputStream` does. In this case, we
>>> need
>>> > to
>>> > > > add
>>> > > > > > similar wrapper class for zstd and modify lz4 the wrapper
>>> also. Add
>>> > > to
>>> > > > > > this, it seems like we need to explicitly state that we follow
>>> the
>>> > > > > default
>>> > > > > > compression level of the codec in the documentation. Is this
>>> what
>>> > you
>>> > > > > > intended?
>>> > > > > >
>>> > > > > > *3. Whether to allow the buffer/block size to be configurable*
>>> > > > > >
>>> > > > > > Well, As of current draft implementation, the lz4 level is
>>> > > implemented
>>> > > > as
>>> > > > > > block size; this is caused by my misunderstanding on lz4. After
>>> > > > reviewing
>>> > > > > > lz4 today, I found that it also supports compression level of
>>> 1~16
>>> > > > > > (default: 1), not block size. I will fix it in this weekend by
>>> > > updating
>>> > > > > the
>>> > > > > > wrapper class.
>>> > > > > >
>>> > > > > > For the problem of the buffer/block size, I have no strong
>>> opinion.
>>> > > If
>>> > > > > the
>>> > > > > > community needs it, I will do it all together. How do you
>>> think?
>>> > > > > >
>>> > > > > > In short, it seems like I need to update the KIP document for
>>> issue
>>> > > #1
>>> > > > > and
>>> > > > > > update the compression wrapper for issue #2, #3. Is this okay?
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > > Dongjin
>>> > > > > >
>>> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <
>>> ismaelj@gmail.com>
>>> > > > wrote:
>>> > > > > >
>>> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
>>> > > > > > >
>>> > > > > > > 1. Have we considered whether we want to allow a similar
>>> topic
>>> > > > config?
>>> > > > > > > 2. Can we rely on a method from the library to pick the
>>> default
>>> > > > > > compression
>>> > > > > > > level if compression.level is not set? We do it for gzip and
>>> it
>>> > > would
>>> > > > > > seem
>>> > > > > > > reasonable to do something similar for the other compression
>>> > > > libraries.
>>> > > > > > > 3. Do we want to allow the buffer/block size to be
>>> configurable?
>>> > > This
>>> > > > > has
>>> > > > > > > an impact on memory usage and people may want to trade
>>> > compression
>>> > > > for
>>> > > > > > > less/more memory in some cases. For example, the default for
>>> LZ4
>>> > is
>>> > > > > 64KB
>>> > > > > > > which is a bit high.
>>> > > > > > >
>>> > > > > > > Ismael
>>> > > > > > >
>>> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <
>>> dongjin@apache.org
>>> > > wrote:
>>> > > > > > >
>>> > > > > > > > Hello dev,
>>> > > > > > > >
>>> > > > > > > > I hope to initiate the discussion of KIP-390: Add producer
>>> > option
>>> > > > to
>>> > > > > > > adjust
>>> > > > > > > > compression level
>>> > > > > > > > <
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>> >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
>>> > > > > > > > >.
>>> > > > > > > > All feedbacks will be highly appreciated.
>>> > > > > > > >
>>> > > > > > > > Best,
>>> > > > > > > > Dongjin
>>> > > > > > > >
>>> > > > > > > > --
>>> > > > > > > > *Dongjin Lee*
>>> > > > > > > >
>>> > > > > > > > *A hitchhiker in the mathematical world.*
>>> > > > > > > >
>>> > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>>> > > > > > > > <http://github.com/dongjinleekr>linkedin:
>>> > > > > > > kr.linkedin.com/in/dongjinleekr
>>> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>>> > > > > > > > www.slideshare.net/dongjinleekr
>>> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > --
>>> > > > > > *Dongjin Lee*
>>> > > > > >
>>> > > > > > *A hitchhiker in the mathematical world.*
>>> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>>> > > > > > <https://github.com/dongjinleekr>linkedin:
>>> > > > > kr.linkedin.com/in/dongjinleekr
>>> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>>> > > > > > speakerdeck.com/dongjin
>>> > > > > > <https://speakerdeck.com/dongjin>*
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > *Dongjin Lee*
>>> > > >
>>> > > > *A hitchhiker in the mathematical world.*
>>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>>> > > > <https://github.com/dongjinleekr>linkedin:
>>> > > kr.linkedin.com/in/dongjinleekr
>>> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>>> > > > speakerdeck.com/dongjin
>>> > > > <https://speakerdeck.com/dongjin>*
>>> > > >
>>> > >
>>> >
>>> >
>>> > --
>>> > *Dongjin Lee*
>>> >
>>> > *A hitchhiker in the mathematical world.*
>>> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>>> > <https://github.com/dongjinleekr>linkedin:
>>> kr.linkedin.com/in/dongjinleekr
>>> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>>> > speakerdeck.com/dongjin
>>> > <https://speakerdeck.com/dongjin>*
>>> >
>>>
>>
>>
>> --
>> *Dongjin Lee*
>>
>> *A hitchhiker in the mathematical world.*
>> *github:  <http://goog_969573159/>github.com/dongjinleekr
>> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
>> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
>> <https://speakerdeck.com/dongjin>*
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
I just realized that there was a missing hole in the KIP, so I fixed it.
The draft implementation will be updated soon.

In short, the proposed change did not regard the case of the topic or
broker's 'compression.type' is 'producer'; in this case, the broker has to
handle all kinds of the supported codec. So I added additional options
(compression.[gzip,snappy,lz4, zstd].level, compression.[gzip,snappy,lz4,
zstd].buffer.size) with handling routines.

Please have a look when you are free.

Thanks,
Dongjin

On Mon, Jan 7, 2019 at 6:23 AM Dongjin Lee <do...@apache.org> wrote:

> Thanks for pointing out Ismael. It's now updated.
>
> Best,
> Dongjin
>
> On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com> wrote:
>
>> Thanks Dongjin. One minor suggestion: we should mention that the broker
>> side configs are also topic configs (i.e. can be set for a given topic).
>>
>> Ismael
>>
>> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:
>>
>> > Happy new year.
>> >
>> > I just updated the title and contents of KIP and Jira issue, with
>> updated
>> > draft implementation. Now both of compression level and buffer size
>> options
>> > are available to producer and broker configuration. You can check the
>> > updated KIP from modified URL:
>> >
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
>> >
>> > Please have a look when you are free.
>> >
>> > Thanks,
>> > Dongjin
>> >
>> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com> wrote:
>> >
>> > > The updated title sounds fine to me.
>> > >
>> > > Ismael
>> > >
>> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
>> > >
>> > > > Hi Ismael,
>> > > >
>> > > > Got it. Your direction is perfectly reasonable. I am now updating
>> the
>> > KIP
>> > > > document and the implementation.
>> > > >
>> > > > By allowing the buffer/block size to be configurable, it would be
>> > better
>> > > to
>> > > > update the title of the KIP like 'Allow fine-grained configuration
>> for
>> > > > compression'. Is that right?
>> > > >
>> > > > @Other committers:
>> > > >
>> > > > Is there any other opinion on allowing the buffer/block size to be
>> > > > configurable?
>> > > >
>> > > > Thanks,
>> > > > Dongjin
>> > > >
>> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk>
>> wrote:
>> > > >
>> > > > > Hi Dongjin,
>> > > > >
>> > > > > To clarify, I mean a broker topic config with regards to point 1.
>> As
>> > > you
>> > > > > know, compression can be done by the producer and/or by the
>> broker.
>> > The
>> > > > > default is for the broker to just use whatever compression was
>> used
>> > by
>> > > > the
>> > > > > producer, but this can be changed by the user on a per topic
>> basis.
>> > It
>> > > > > seems like it would make sense for the configs to be . consistent
>> > > between
>> > > > > producer and broker.
>> > > > >
>> > > > > For point 2, I haven't looked at the implementation, but we could
>> do
>> > it
>> > > > in
>> > > > > the `CompressionType` enum by invoking the right constructor or
>> > > > retrieving
>> > > > > the default value via a constant (if defined). That's an
>> > implementation
>> > > > > detail and can be discussed in the PR. The more general point is
>> to
>> > > rely
>> > > > on
>> > > > > the library defaults instead of choosing one ourselves.
>> > > > >
>> > > > > For point 3, I'm in favour of doing that in this KIP.
>> > > > >
>> > > > > Ismael
>> > > > >
>> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org>
>> > > wrote:
>> > > > >
>> > > > > > Thank you Ismael, here are the answers:
>> > > > > >
>> > > > > > *1. About topic config*
>> > > > > >
>> > > > > > After some consideration, I concluded that topic config doesn't
>> > need
>> > > to
>> > > > > > support compression.level. Here is why: since the compression is
>> > > > > conducted
>> > > > > > by the client, the one who can select the best compression
>> level is
>> > > the
>> > > > > > client itself. Let us assume that the compression level is set
>> at
>> > the
>> > > > > topic
>> > > > > > config level. In that case, there is a possibility that the
>> > > compression
>> > > > > > level is not optimal for some producers. Actually, Kafka's go
>> > client
>> > > > also
>> > > > > > supports compression level functionality for the producer config
>> > > only.
>> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go>
>> (wait,
>> > do
>> > > we
>> > > > > > need
>> > > > > > to add this reasoning in the KIP, rejected alternatives
>> section?)
>> > > > > >
>> > > > > > *2. About default level*
>> > > > > >
>> > > > > > As of current draft implementation, the default compression is
>> set
>> > on
>> > > > the
>> > > > > > CompressionType enum. Of course, changing this strategy into
>> > relying
>> > > > on a
>> > > > > > method from the library to pick the default compression level
>> seems
>> > > > > > possible, like `GZIPBlockOutputStream` does. In this case, we
>> need
>> > to
>> > > > add
>> > > > > > similar wrapper class for zstd and modify lz4 the wrapper also.
>> Add
>> > > to
>> > > > > > this, it seems like we need to explicitly state that we follow
>> the
>> > > > > default
>> > > > > > compression level of the codec in the documentation. Is this
>> what
>> > you
>> > > > > > intended?
>> > > > > >
>> > > > > > *3. Whether to allow the buffer/block size to be configurable*
>> > > > > >
>> > > > > > Well, As of current draft implementation, the lz4 level is
>> > > implemented
>> > > > as
>> > > > > > block size; this is caused by my misunderstanding on lz4. After
>> > > > reviewing
>> > > > > > lz4 today, I found that it also supports compression level of
>> 1~16
>> > > > > > (default: 1), not block size. I will fix it in this weekend by
>> > > updating
>> > > > > the
>> > > > > > wrapper class.
>> > > > > >
>> > > > > > For the problem of the buffer/block size, I have no strong
>> opinion.
>> > > If
>> > > > > the
>> > > > > > community needs it, I will do it all together. How do you think?
>> > > > > >
>> > > > > > In short, it seems like I need to update the KIP document for
>> issue
>> > > #1
>> > > > > and
>> > > > > > update the compression wrapper for issue #2, #3. Is this okay?
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Dongjin
>> > > > > >
>> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <ismaelj@gmail.com
>> >
>> > > > wrote:
>> > > > > >
>> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
>> > > > > > >
>> > > > > > > 1. Have we considered whether we want to allow a similar topic
>> > > > config?
>> > > > > > > 2. Can we rely on a method from the library to pick the
>> default
>> > > > > > compression
>> > > > > > > level if compression.level is not set? We do it for gzip and
>> it
>> > > would
>> > > > > > seem
>> > > > > > > reasonable to do something similar for the other compression
>> > > > libraries.
>> > > > > > > 3. Do we want to allow the buffer/block size to be
>> configurable?
>> > > This
>> > > > > has
>> > > > > > > an impact on memory usage and people may want to trade
>> > compression
>> > > > for
>> > > > > > > less/more memory in some cases. For example, the default for
>> LZ4
>> > is
>> > > > > 64KB
>> > > > > > > which is a bit high.
>> > > > > > >
>> > > > > > > Ismael
>> > > > > > >
>> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org
>> > > wrote:
>> > > > > > >
>> > > > > > > > Hello dev,
>> > > > > > > >
>> > > > > > > > I hope to initiate the discussion of KIP-390: Add producer
>> > option
>> > > > to
>> > > > > > > adjust
>> > > > > > > > compression level
>> > > > > > > > <
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
>> > > > > > > > >.
>> > > > > > > > All feedbacks will be highly appreciated.
>> > > > > > > >
>> > > > > > > > Best,
>> > > > > > > > Dongjin
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > *Dongjin Lee*
>> > > > > > > >
>> > > > > > > > *A hitchhiker in the mathematical world.*
>> > > > > > > >
>> > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > > > > <http://github.com/dongjinleekr>linkedin:
>> > > > > > > kr.linkedin.com/in/dongjinleekr
>> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
>> > > > > > > > www.slideshare.net/dongjinleekr
>> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > *Dongjin Lee*
>> > > > > >
>> > > > > > *A hitchhiker in the mathematical world.*
>> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > > > <https://github.com/dongjinleekr>linkedin:
>> > > > > kr.linkedin.com/in/dongjinleekr
>> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > > > speakerdeck.com/dongjin
>> > > > > > <https://speakerdeck.com/dongjin>*
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > *Dongjin Lee*
>> > > >
>> > > > *A hitchhiker in the mathematical world.*
>> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > > > <https://github.com/dongjinleekr>linkedin:
>> > > kr.linkedin.com/in/dongjinleekr
>> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > > > speakerdeck.com/dongjin
>> > > > <https://speakerdeck.com/dongjin>*
>> > > >
>> > >
>> >
>> >
>> > --
>> > *Dongjin Lee*
>> >
>> > *A hitchhiker in the mathematical world.*
>> > *github:  <http://goog_969573159/>github.com/dongjinleekr
>> > <https://github.com/dongjinleekr>linkedin:
>> kr.linkedin.com/in/dongjinleekr
>> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
>> > speakerdeck.com/dongjin
>> > <https://speakerdeck.com/dongjin>*
>> >
>>
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Thanks for pointing out Ismael. It's now updated.

Best,
Dongjin

On Mon, Jan 7, 2019 at 4:36 AM Ismael Juma <is...@gmail.com> wrote:

> Thanks Dongjin. One minor suggestion: we should mention that the broker
> side configs are also topic configs (i.e. can be set for a given topic).
>
> Ismael
>
> On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:
>
> > Happy new year.
> >
> > I just updated the title and contents of KIP and Jira issue, with updated
> > draft implementation. Now both of compression level and buffer size
> options
> > are available to producer and broker configuration. You can check the
> > updated KIP from modified URL:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
> >
> > Please have a look when you are free.
> >
> > Thanks,
> > Dongjin
> >
> > On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com> wrote:
> >
> > > The updated title sounds fine to me.
> > >
> > > Ismael
> > >
> > > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > Got it. Your direction is perfectly reasonable. I am now updating the
> > KIP
> > > > document and the implementation.
> > > >
> > > > By allowing the buffer/block size to be configurable, it would be
> > better
> > > to
> > > > update the title of the KIP like 'Allow fine-grained configuration
> for
> > > > compression'. Is that right?
> > > >
> > > > @Other committers:
> > > >
> > > > Is there any other opinion on allowing the buffer/block size to be
> > > > configurable?
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk>
> wrote:
> > > >
> > > > > Hi Dongjin,
> > > > >
> > > > > To clarify, I mean a broker topic config with regards to point 1.
> As
> > > you
> > > > > know, compression can be done by the producer and/or by the broker.
> > The
> > > > > default is for the broker to just use whatever compression was used
> > by
> > > > the
> > > > > producer, but this can be changed by the user on a per topic basis.
> > It
> > > > > seems like it would make sense for the configs to be . consistent
> > > between
> > > > > producer and broker.
> > > > >
> > > > > For point 2, I haven't looked at the implementation, but we could
> do
> > it
> > > > in
> > > > > the `CompressionType` enum by invoking the right constructor or
> > > > retrieving
> > > > > the default value via a constant (if defined). That's an
> > implementation
> > > > > detail and can be discussed in the PR. The more general point is to
> > > rely
> > > > on
> > > > > the library defaults instead of choosing one ourselves.
> > > > >
> > > > > For point 3, I'm in favour of doing that in this KIP.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org>
> > > wrote:
> > > > >
> > > > > > Thank you Ismael, here are the answers:
> > > > > >
> > > > > > *1. About topic config*
> > > > > >
> > > > > > After some consideration, I concluded that topic config doesn't
> > need
> > > to
> > > > > > support compression.level. Here is why: since the compression is
> > > > > conducted
> > > > > > by the client, the one who can select the best compression level
> is
> > > the
> > > > > > client itself. Let us assume that the compression level is set at
> > the
> > > > > topic
> > > > > > config level. In that case, there is a possibility that the
> > > compression
> > > > > > level is not optimal for some producers. Actually, Kafka's go
> > client
> > > > also
> > > > > > supports compression level functionality for the producer config
> > > only.
> > > > > > <https://github.com/Shopify/sarama/blob/master/config.go> (wait,
> > do
> > > we
> > > > > > need
> > > > > > to add this reasoning in the KIP, rejected alternatives section?)
> > > > > >
> > > > > > *2. About default level*
> > > > > >
> > > > > > As of current draft implementation, the default compression is
> set
> > on
> > > > the
> > > > > > CompressionType enum. Of course, changing this strategy into
> > relying
> > > > on a
> > > > > > method from the library to pick the default compression level
> seems
> > > > > > possible, like `GZIPBlockOutputStream` does. In this case, we
> need
> > to
> > > > add
> > > > > > similar wrapper class for zstd and modify lz4 the wrapper also.
> Add
> > > to
> > > > > > this, it seems like we need to explicitly state that we follow
> the
> > > > > default
> > > > > > compression level of the codec in the documentation. Is this what
> > you
> > > > > > intended?
> > > > > >
> > > > > > *3. Whether to allow the buffer/block size to be configurable*
> > > > > >
> > > > > > Well, As of current draft implementation, the lz4 level is
> > > implemented
> > > > as
> > > > > > block size; this is caused by my misunderstanding on lz4. After
> > > > reviewing
> > > > > > lz4 today, I found that it also supports compression level of
> 1~16
> > > > > > (default: 1), not block size. I will fix it in this weekend by
> > > updating
> > > > > the
> > > > > > wrapper class.
> > > > > >
> > > > > > For the problem of the buffer/block size, I have no strong
> opinion.
> > > If
> > > > > the
> > > > > > community needs it, I will do it all together. How do you think?
> > > > > >
> > > > > > In short, it seems like I need to update the KIP document for
> issue
> > > #1
> > > > > and
> > > > > > update the compression wrapper for issue #2, #3. Is this okay?
> > > > > >
> > > > > > Thanks,
> > > > > > Dongjin
> > > > > >
> > > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > >  Thanks for the KIP, this is helpful. A few questions:
> > > > > > >
> > > > > > > 1. Have we considered whether we want to allow a similar topic
> > > > config?
> > > > > > > 2. Can we rely on a method from the library to pick the default
> > > > > > compression
> > > > > > > level if compression.level is not set? We do it for gzip and it
> > > would
> > > > > > seem
> > > > > > > reasonable to do something similar for the other compression
> > > > libraries.
> > > > > > > 3. Do we want to allow the buffer/block size to be
> configurable?
> > > This
> > > > > has
> > > > > > > an impact on memory usage and people may want to trade
> > compression
> > > > for
> > > > > > > less/more memory in some cases. For example, the default for
> LZ4
> > is
> > > > > 64KB
> > > > > > > which is a bit high.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org
> > > wrote:
> > > > > > >
> > > > > > > > Hello dev,
> > > > > > > >
> > > > > > > > I hope to initiate the discussion of KIP-390: Add producer
> > option
> > > > to
> > > > > > > adjust
> > > > > > > > compression level
> > > > > > > > <
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > > > > > >.
> > > > > > > > All feedbacks will be highly appreciated.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Dongjin
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Dongjin Lee*
> > > > > > > >
> > > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > > >
> > > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > *Dongjin Lee*
> > > > > >
> > > > > > *A hitchhiker in the mathematical world.*
> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > <https://github.com/dongjinleekr>linkedin:
> > > > > kr.linkedin.com/in/dongjinleekr
> > > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > > > speakerdeck.com/dongjin
> > > > > > <https://speakerdeck.com/dongjin>*
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@gmail.com>.
Thanks Dongjin. One minor suggestion: we should mention that the broker
side configs are also topic configs (i.e. can be set for a given topic).

Ismael

On Sun, Jan 6, 2019, 10:37 AM Dongjin Lee <dongjin@apache.org wrote:

> Happy new year.
>
> I just updated the title and contents of KIP and Jira issue, with updated
> draft implementation. Now both of compression level and buffer size options
> are available to producer and broker configuration. You can check the
> updated KIP from modified URL:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression
>
> Please have a look when you are free.
>
> Thanks,
> Dongjin
>
> On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com> wrote:
>
> > The updated title sounds fine to me.
> >
> > Ismael
> >
> > On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
> >
> > > Hi Ismael,
> > >
> > > Got it. Your direction is perfectly reasonable. I am now updating the
> KIP
> > > document and the implementation.
> > >
> > > By allowing the buffer/block size to be configurable, it would be
> better
> > to
> > > update the title of the KIP like 'Allow fine-grained configuration for
> > > compression'. Is that right?
> > >
> > > @Other committers:
> > >
> > > Is there any other opinion on allowing the buffer/block size to be
> > > configurable?
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk> wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > To clarify, I mean a broker topic config with regards to point 1. As
> > you
> > > > know, compression can be done by the producer and/or by the broker.
> The
> > > > default is for the broker to just use whatever compression was used
> by
> > > the
> > > > producer, but this can be changed by the user on a per topic basis.
> It
> > > > seems like it would make sense for the configs to be . consistent
> > between
> > > > producer and broker.
> > > >
> > > > For point 2, I haven't looked at the implementation, but we could do
> it
> > > in
> > > > the `CompressionType` enum by invoking the right constructor or
> > > retrieving
> > > > the default value via a constant (if defined). That's an
> implementation
> > > > detail and can be discussed in the PR. The more general point is to
> > rely
> > > on
> > > > the library defaults instead of choosing one ourselves.
> > > >
> > > > For point 3, I'm in favour of doing that in this KIP.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org>
> > wrote:
> > > >
> > > > > Thank you Ismael, here are the answers:
> > > > >
> > > > > *1. About topic config*
> > > > >
> > > > > After some consideration, I concluded that topic config doesn't
> need
> > to
> > > > > support compression.level. Here is why: since the compression is
> > > > conducted
> > > > > by the client, the one who can select the best compression level is
> > the
> > > > > client itself. Let us assume that the compression level is set at
> the
> > > > topic
> > > > > config level. In that case, there is a possibility that the
> > compression
> > > > > level is not optimal for some producers. Actually, Kafka's go
> client
> > > also
> > > > > supports compression level functionality for the producer config
> > only.
> > > > > <https://github.com/Shopify/sarama/blob/master/config.go> (wait,
> do
> > we
> > > > > need
> > > > > to add this reasoning in the KIP, rejected alternatives section?)
> > > > >
> > > > > *2. About default level*
> > > > >
> > > > > As of current draft implementation, the default compression is set
> on
> > > the
> > > > > CompressionType enum. Of course, changing this strategy into
> relying
> > > on a
> > > > > method from the library to pick the default compression level seems
> > > > > possible, like `GZIPBlockOutputStream` does. In this case, we need
> to
> > > add
> > > > > similar wrapper class for zstd and modify lz4 the wrapper also. Add
> > to
> > > > > this, it seems like we need to explicitly state that we follow the
> > > > default
> > > > > compression level of the codec in the documentation. Is this what
> you
> > > > > intended?
> > > > >
> > > > > *3. Whether to allow the buffer/block size to be configurable*
> > > > >
> > > > > Well, As of current draft implementation, the lz4 level is
> > implemented
> > > as
> > > > > block size; this is caused by my misunderstanding on lz4. After
> > > reviewing
> > > > > lz4 today, I found that it also supports compression level of 1~16
> > > > > (default: 1), not block size. I will fix it in this weekend by
> > updating
> > > > the
> > > > > wrapper class.
> > > > >
> > > > > For the problem of the buffer/block size, I have no strong opinion.
> > If
> > > > the
> > > > > community needs it, I will do it all together. How do you think?
> > > > >
> > > > > In short, it seems like I need to update the KIP document for issue
> > #1
> > > > and
> > > > > update the compression wrapper for issue #2, #3. Is this okay?
> > > > >
> > > > > Thanks,
> > > > > Dongjin
> > > > >
> > > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com>
> > > wrote:
> > > > >
> > > > > >  Thanks for the KIP, this is helpful. A few questions:
> > > > > >
> > > > > > 1. Have we considered whether we want to allow a similar topic
> > > config?
> > > > > > 2. Can we rely on a method from the library to pick the default
> > > > > compression
> > > > > > level if compression.level is not set? We do it for gzip and it
> > would
> > > > > seem
> > > > > > reasonable to do something similar for the other compression
> > > libraries.
> > > > > > 3. Do we want to allow the buffer/block size to be configurable?
> > This
> > > > has
> > > > > > an impact on memory usage and people may want to trade
> compression
> > > for
> > > > > > less/more memory in some cases. For example, the default for LZ4
> is
> > > > 64KB
> > > > > > which is a bit high.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org
> > wrote:
> > > > > >
> > > > > > > Hello dev,
> > > > > > >
> > > > > > > I hope to initiate the discussion of KIP-390: Add producer
> option
> > > to
> > > > > > adjust
> > > > > > > compression level
> > > > > > > <
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > > > > >.
> > > > > > > All feedbacks will be highly appreciated.
> > > > > > >
> > > > > > > Best,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > --
> > > > > > > *Dongjin Lee*
> > > > > > >
> > > > > > > *A hitchhiker in the mathematical world.*
> > > > > > >
> > > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > > kr.linkedin.com/in/dongjinleekr
> > > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > > www.slideshare.net/dongjinleekr
> > > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <https://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > > speakerdeck.com/dongjin
> > > > > <https://speakerdeck.com/dongjin>*
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Happy new year.

I just updated the title and contents of KIP and Jira issue, with updated
draft implementation. Now both of compression level and buffer size options
are available to producer and broker configuration. You can check the
updated KIP from modified URL:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Allow+fine-grained+configuration+for+compression

Please have a look when you are free.

Thanks,
Dongjin

On Mon, Dec 3, 2018 at 12:50 AM Ismael Juma <is...@gmail.com> wrote:

> The updated title sounds fine to me.
>
> Ismael
>
> On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:
>
> > Hi Ismael,
> >
> > Got it. Your direction is perfectly reasonable. I am now updating the KIP
> > document and the implementation.
> >
> > By allowing the buffer/block size to be configurable, it would be better
> to
> > update the title of the KIP like 'Allow fine-grained configuration for
> > compression'. Is that right?
> >
> > @Other committers:
> >
> > Is there any other opinion on allowing the buffer/block size to be
> > configurable?
> >
> > Thanks,
> > Dongjin
> >
> > On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk> wrote:
> >
> > > Hi Dongjin,
> > >
> > > To clarify, I mean a broker topic config with regards to point 1. As
> you
> > > know, compression can be done by the producer and/or by the broker. The
> > > default is for the broker to just use whatever compression was used by
> > the
> > > producer, but this can be changed by the user on a per topic basis. It
> > > seems like it would make sense for the configs to be . consistent
> between
> > > producer and broker.
> > >
> > > For point 2, I haven't looked at the implementation, but we could do it
> > in
> > > the `CompressionType` enum by invoking the right constructor or
> > retrieving
> > > the default value via a constant (if defined). That's an implementation
> > > detail and can be discussed in the PR. The more general point is to
> rely
> > on
> > > the library defaults instead of choosing one ourselves.
> > >
> > > For point 3, I'm in favour of doing that in this KIP.
> > >
> > > Ismael
> > >
> > > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org>
> wrote:
> > >
> > > > Thank you Ismael, here are the answers:
> > > >
> > > > *1. About topic config*
> > > >
> > > > After some consideration, I concluded that topic config doesn't need
> to
> > > > support compression.level. Here is why: since the compression is
> > > conducted
> > > > by the client, the one who can select the best compression level is
> the
> > > > client itself. Let us assume that the compression level is set at the
> > > topic
> > > > config level. In that case, there is a possibility that the
> compression
> > > > level is not optimal for some producers. Actually, Kafka's go client
> > also
> > > > supports compression level functionality for the producer config
> only.
> > > > <https://github.com/Shopify/sarama/blob/master/config.go> (wait, do
> we
> > > > need
> > > > to add this reasoning in the KIP, rejected alternatives section?)
> > > >
> > > > *2. About default level*
> > > >
> > > > As of current draft implementation, the default compression is set on
> > the
> > > > CompressionType enum. Of course, changing this strategy into relying
> > on a
> > > > method from the library to pick the default compression level seems
> > > > possible, like `GZIPBlockOutputStream` does. In this case, we need to
> > add
> > > > similar wrapper class for zstd and modify lz4 the wrapper also. Add
> to
> > > > this, it seems like we need to explicitly state that we follow the
> > > default
> > > > compression level of the codec in the documentation. Is this what you
> > > > intended?
> > > >
> > > > *3. Whether to allow the buffer/block size to be configurable*
> > > >
> > > > Well, As of current draft implementation, the lz4 level is
> implemented
> > as
> > > > block size; this is caused by my misunderstanding on lz4. After
> > reviewing
> > > > lz4 today, I found that it also supports compression level of 1~16
> > > > (default: 1), not block size. I will fix it in this weekend by
> updating
> > > the
> > > > wrapper class.
> > > >
> > > > For the problem of the buffer/block size, I have no strong opinion.
> If
> > > the
> > > > community needs it, I will do it all together. How do you think?
> > > >
> > > > In short, it seems like I need to update the KIP document for issue
> #1
> > > and
> > > > update the compression wrapper for issue #2, #3. Is this okay?
> > > >
> > > > Thanks,
> > > > Dongjin
> > > >
> > > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com>
> > wrote:
> > > >
> > > > >  Thanks for the KIP, this is helpful. A few questions:
> > > > >
> > > > > 1. Have we considered whether we want to allow a similar topic
> > config?
> > > > > 2. Can we rely on a method from the library to pick the default
> > > > compression
> > > > > level if compression.level is not set? We do it for gzip and it
> would
> > > > seem
> > > > > reasonable to do something similar for the other compression
> > libraries.
> > > > > 3. Do we want to allow the buffer/block size to be configurable?
> This
> > > has
> > > > > an impact on memory usage and people may want to trade compression
> > for
> > > > > less/more memory in some cases. For example, the default for LZ4 is
> > > 64KB
> > > > > which is a bit high.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org
> wrote:
> > > > >
> > > > > > Hello dev,
> > > > > >
> > > > > > I hope to initiate the discussion of KIP-390: Add producer option
> > to
> > > > > adjust
> > > > > > compression level
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > > > >.
> > > > > > All feedbacks will be highly appreciated.
> > > > > >
> > > > > > Best,
> > > > > > Dongjin
> > > > > >
> > > > > > --
> > > > > > *Dongjin Lee*
> > > > > >
> > > > > > *A hitchhiker in the mathematical world.*
> > > > > >
> > > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > > <http://github.com/dongjinleekr>linkedin:
> > > > > kr.linkedin.com/in/dongjinleekr
> > > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > > www.slideshare.net/dongjinleekr
> > > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <https://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > > speakerdeck.com/dongjin
> > > > <https://speakerdeck.com/dongjin>*
> > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@gmail.com>.
The updated title sounds fine to me.

Ismael

On Sun, Dec 2, 2018, 5:25 AM Dongjin Lee <dongjin@apache.org wrote:

> Hi Ismael,
>
> Got it. Your direction is perfectly reasonable. I am now updating the KIP
> document and the implementation.
>
> By allowing the buffer/block size to be configurable, it would be better to
> update the title of the KIP like 'Allow fine-grained configuration for
> compression'. Is that right?
>
> @Other committers:
>
> Is there any other opinion on allowing the buffer/block size to be
> configurable?
>
> Thanks,
> Dongjin
>
> On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk> wrote:
>
> > Hi Dongjin,
> >
> > To clarify, I mean a broker topic config with regards to point 1. As you
> > know, compression can be done by the producer and/or by the broker. The
> > default is for the broker to just use whatever compression was used by
> the
> > producer, but this can be changed by the user on a per topic basis. It
> > seems like it would make sense for the configs to be . consistent between
> > producer and broker.
> >
> > For point 2, I haven't looked at the implementation, but we could do it
> in
> > the `CompressionType` enum by invoking the right constructor or
> retrieving
> > the default value via a constant (if defined). That's an implementation
> > detail and can be discussed in the PR. The more general point is to rely
> on
> > the library defaults instead of choosing one ourselves.
> >
> > For point 3, I'm in favour of doing that in this KIP.
> >
> > Ismael
> >
> > On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org> wrote:
> >
> > > Thank you Ismael, here are the answers:
> > >
> > > *1. About topic config*
> > >
> > > After some consideration, I concluded that topic config doesn't need to
> > > support compression.level. Here is why: since the compression is
> > conducted
> > > by the client, the one who can select the best compression level is the
> > > client itself. Let us assume that the compression level is set at the
> > topic
> > > config level. In that case, there is a possibility that the compression
> > > level is not optimal for some producers. Actually, Kafka's go client
> also
> > > supports compression level functionality for the producer config only.
> > > <https://github.com/Shopify/sarama/blob/master/config.go> (wait, do we
> > > need
> > > to add this reasoning in the KIP, rejected alternatives section?)
> > >
> > > *2. About default level*
> > >
> > > As of current draft implementation, the default compression is set on
> the
> > > CompressionType enum. Of course, changing this strategy into relying
> on a
> > > method from the library to pick the default compression level seems
> > > possible, like `GZIPBlockOutputStream` does. In this case, we need to
> add
> > > similar wrapper class for zstd and modify lz4 the wrapper also. Add to
> > > this, it seems like we need to explicitly state that we follow the
> > default
> > > compression level of the codec in the documentation. Is this what you
> > > intended?
> > >
> > > *3. Whether to allow the buffer/block size to be configurable*
> > >
> > > Well, As of current draft implementation, the lz4 level is implemented
> as
> > > block size; this is caused by my misunderstanding on lz4. After
> reviewing
> > > lz4 today, I found that it also supports compression level of 1~16
> > > (default: 1), not block size. I will fix it in this weekend by updating
> > the
> > > wrapper class.
> > >
> > > For the problem of the buffer/block size, I have no strong opinion. If
> > the
> > > community needs it, I will do it all together. How do you think?
> > >
> > > In short, it seems like I need to update the KIP document for issue #1
> > and
> > > update the compression wrapper for issue #2, #3. Is this okay?
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com>
> wrote:
> > >
> > > >  Thanks for the KIP, this is helpful. A few questions:
> > > >
> > > > 1. Have we considered whether we want to allow a similar topic
> config?
> > > > 2. Can we rely on a method from the library to pick the default
> > > compression
> > > > level if compression.level is not set? We do it for gzip and it would
> > > seem
> > > > reasonable to do something similar for the other compression
> libraries.
> > > > 3. Do we want to allow the buffer/block size to be configurable? This
> > has
> > > > an impact on memory usage and people may want to trade compression
> for
> > > > less/more memory in some cases. For example, the default for LZ4 is
> > 64KB
> > > > which is a bit high.
> > > >
> > > > Ismael
> > > >
> > > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org wrote:
> > > >
> > > > > Hello dev,
> > > > >
> > > > > I hope to initiate the discussion of KIP-390: Add producer option
> to
> > > > adjust
> > > > > compression level
> > > > > <
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > > >.
> > > > > All feedbacks will be highly appreciated.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > >
> > > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > > <http://github.com/dongjinleekr>linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > > www.slideshare.net/dongjinleekr
> > > > > <http://www.slideshare.net/dongjinleekr>*
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <https://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > > speakerdeck.com/dongjin
> > > <https://speakerdeck.com/dongjin>*
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Hi Ismael,

Got it. Your direction is perfectly reasonable. I am now updating the KIP
document and the implementation.

By allowing the buffer/block size to be configurable, it would be better to
update the title of the KIP like 'Allow fine-grained configuration for
compression'. Is that right?

@Other committers:

Is there any other opinion on allowing the buffer/block size to be
configurable?

Thanks,
Dongjin

On Thu, Nov 29, 2018 at 1:45 AM Ismael Juma <is...@juma.me.uk> wrote:

> Hi Dongjin,
>
> To clarify, I mean a broker topic config with regards to point 1. As you
> know, compression can be done by the producer and/or by the broker. The
> default is for the broker to just use whatever compression was used by the
> producer, but this can be changed by the user on a per topic basis. It
> seems like it would make sense for the configs to be . consistent between
> producer and broker.
>
> For point 2, I haven't looked at the implementation, but we could do it in
> the `CompressionType` enum by invoking the right constructor or retrieving
> the default value via a constant (if defined). That's an implementation
> detail and can be discussed in the PR. The more general point is to rely on
> the library defaults instead of choosing one ourselves.
>
> For point 3, I'm in favour of doing that in this KIP.
>
> Ismael
>
> On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org> wrote:
>
> > Thank you Ismael, here are the answers:
> >
> > *1. About topic config*
> >
> > After some consideration, I concluded that topic config doesn't need to
> > support compression.level. Here is why: since the compression is
> conducted
> > by the client, the one who can select the best compression level is the
> > client itself. Let us assume that the compression level is set at the
> topic
> > config level. In that case, there is a possibility that the compression
> > level is not optimal for some producers. Actually, Kafka's go client also
> > supports compression level functionality for the producer config only.
> > <https://github.com/Shopify/sarama/blob/master/config.go> (wait, do we
> > need
> > to add this reasoning in the KIP, rejected alternatives section?)
> >
> > *2. About default level*
> >
> > As of current draft implementation, the default compression is set on the
> > CompressionType enum. Of course, changing this strategy into relying on a
> > method from the library to pick the default compression level seems
> > possible, like `GZIPBlockOutputStream` does. In this case, we need to add
> > similar wrapper class for zstd and modify lz4 the wrapper also. Add to
> > this, it seems like we need to explicitly state that we follow the
> default
> > compression level of the codec in the documentation. Is this what you
> > intended?
> >
> > *3. Whether to allow the buffer/block size to be configurable*
> >
> > Well, As of current draft implementation, the lz4 level is implemented as
> > block size; this is caused by my misunderstanding on lz4. After reviewing
> > lz4 today, I found that it also supports compression level of 1~16
> > (default: 1), not block size. I will fix it in this weekend by updating
> the
> > wrapper class.
> >
> > For the problem of the buffer/block size, I have no strong opinion. If
> the
> > community needs it, I will do it all together. How do you think?
> >
> > In short, it seems like I need to update the KIP document for issue #1
> and
> > update the compression wrapper for issue #2, #3. Is this okay?
> >
> > Thanks,
> > Dongjin
> >
> > On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com> wrote:
> >
> > >  Thanks for the KIP, this is helpful. A few questions:
> > >
> > > 1. Have we considered whether we want to allow a similar topic config?
> > > 2. Can we rely on a method from the library to pick the default
> > compression
> > > level if compression.level is not set? We do it for gzip and it would
> > seem
> > > reasonable to do something similar for the other compression libraries.
> > > 3. Do we want to allow the buffer/block size to be configurable? This
> has
> > > an impact on memory usage and people may want to trade compression for
> > > less/more memory in some cases. For example, the default for LZ4 is
> 64KB
> > > which is a bit high.
> > >
> > > Ismael
> > >
> > > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org wrote:
> > >
> > > > Hello dev,
> > > >
> > > > I hope to initiate the discussion of KIP-390: Add producer option to
> > > adjust
> > > > compression level
> > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > > >.
> > > > All feedbacks will be highly appreciated.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > --
> > > > *Dongjin Lee*
> > > >
> > > > *A hitchhiker in the mathematical world.*
> > > >
> > > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > > <http://github.com/dongjinleekr>linkedin:
> > > kr.linkedin.com/in/dongjinleekr
> > > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > > www.slideshare.net/dongjinleekr
> > > > <http://www.slideshare.net/dongjinleekr>*
> > > >
> > >
> >
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <https://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> > speakerdeck.com/dongjin
> > <https://speakerdeck.com/dongjin>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@juma.me.uk>.
Hi Dongjin,

To clarify, I mean a broker topic config with regards to point 1. As you
know, compression can be done by the producer and/or by the broker. The
default is for the broker to just use whatever compression was used by the
producer, but this can be changed by the user on a per topic basis. It
seems like it would make sense for the configs to be . consistent between
producer and broker.

For point 2, I haven't looked at the implementation, but we could do it in
the `CompressionType` enum by invoking the right constructor or retrieving
the default value via a constant (if defined). That's an implementation
detail and can be discussed in the PR. The more general point is to rely on
the library defaults instead of choosing one ourselves.

For point 3, I'm in favour of doing that in this KIP.

Ismael

On Wed, Nov 28, 2018 at 7:01 AM Dongjin Lee <do...@apache.org> wrote:

> Thank you Ismael, here are the answers:
>
> *1. About topic config*
>
> After some consideration, I concluded that topic config doesn't need to
> support compression.level. Here is why: since the compression is conducted
> by the client, the one who can select the best compression level is the
> client itself. Let us assume that the compression level is set at the topic
> config level. In that case, there is a possibility that the compression
> level is not optimal for some producers. Actually, Kafka's go client also
> supports compression level functionality for the producer config only.
> <https://github.com/Shopify/sarama/blob/master/config.go> (wait, do we
> need
> to add this reasoning in the KIP, rejected alternatives section?)
>
> *2. About default level*
>
> As of current draft implementation, the default compression is set on the
> CompressionType enum. Of course, changing this strategy into relying on a
> method from the library to pick the default compression level seems
> possible, like `GZIPBlockOutputStream` does. In this case, we need to add
> similar wrapper class for zstd and modify lz4 the wrapper also. Add to
> this, it seems like we need to explicitly state that we follow the default
> compression level of the codec in the documentation. Is this what you
> intended?
>
> *3. Whether to allow the buffer/block size to be configurable*
>
> Well, As of current draft implementation, the lz4 level is implemented as
> block size; this is caused by my misunderstanding on lz4. After reviewing
> lz4 today, I found that it also supports compression level of 1~16
> (default: 1), not block size. I will fix it in this weekend by updating the
> wrapper class.
>
> For the problem of the buffer/block size, I have no strong opinion. If the
> community needs it, I will do it all together. How do you think?
>
> In short, it seems like I need to update the KIP document for issue #1 and
> update the compression wrapper for issue #2, #3. Is this okay?
>
> Thanks,
> Dongjin
>
> On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com> wrote:
>
> >  Thanks for the KIP, this is helpful. A few questions:
> >
> > 1. Have we considered whether we want to allow a similar topic config?
> > 2. Can we rely on a method from the library to pick the default
> compression
> > level if compression.level is not set? We do it for gzip and it would
> seem
> > reasonable to do something similar for the other compression libraries.
> > 3. Do we want to allow the buffer/block size to be configurable? This has
> > an impact on memory usage and people may want to trade compression for
> > less/more memory in some cases. For example, the default for LZ4 is 64KB
> > which is a bit high.
> >
> > Ismael
> >
> > On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org wrote:
> >
> > > Hello dev,
> > >
> > > I hope to initiate the discussion of KIP-390: Add producer option to
> > adjust
> > > compression level
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > > >.
> > > All feedbacks will be highly appreciated.
> > >
> > > Best,
> > > Dongjin
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > > <http://github.com/dongjinleekr>linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > > www.slideshare.net/dongjinleekr
> > > <http://www.slideshare.net/dongjinleekr>*
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Dongjin Lee <do...@apache.org>.
Thank you Ismael, here are the answers:

*1. About topic config*

After some consideration, I concluded that topic config doesn't need to
support compression.level. Here is why: since the compression is conducted
by the client, the one who can select the best compression level is the
client itself. Let us assume that the compression level is set at the topic
config level. In that case, there is a possibility that the compression
level is not optimal for some producers. Actually, Kafka's go client also
supports compression level functionality for the producer config only.
<https://github.com/Shopify/sarama/blob/master/config.go> (wait, do we need
to add this reasoning in the KIP, rejected alternatives section?)

*2. About default level*

As of current draft implementation, the default compression is set on the
CompressionType enum. Of course, changing this strategy into relying on a
method from the library to pick the default compression level seems
possible, like `GZIPBlockOutputStream` does. In this case, we need to add
similar wrapper class for zstd and modify lz4 the wrapper also. Add to
this, it seems like we need to explicitly state that we follow the default
compression level of the codec in the documentation. Is this what you
intended?

*3. Whether to allow the buffer/block size to be configurable*

Well, As of current draft implementation, the lz4 level is implemented as
block size; this is caused by my misunderstanding on lz4. After reviewing
lz4 today, I found that it also supports compression level of 1~16
(default: 1), not block size. I will fix it in this weekend by updating the
wrapper class.

For the problem of the buffer/block size, I have no strong opinion. If the
community needs it, I will do it all together. How do you think?

In short, it seems like I need to update the KIP document for issue #1 and
update the compression wrapper for issue #2, #3. Is this okay?

Thanks,
Dongjin

On Wed, Nov 28, 2018 at 12:34 AM Ismael Juma <is...@gmail.com> wrote:

>  Thanks for the KIP, this is helpful. A few questions:
>
> 1. Have we considered whether we want to allow a similar topic config?
> 2. Can we rely on a method from the library to pick the default compression
> level if compression.level is not set? We do it for gzip and it would seem
> reasonable to do something similar for the other compression libraries.
> 3. Do we want to allow the buffer/block size to be configurable? This has
> an impact on memory usage and people may want to trade compression for
> less/more memory in some cases. For example, the default for LZ4 is 64KB
> which is a bit high.
>
> Ismael
>
> On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org wrote:
>
> > Hello dev,
> >
> > I hope to initiate the discussion of KIP-390: Add producer option to
> adjust
> > compression level
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> > >.
> > All feedbacks will be highly appreciated.
> >
> > Best,
> > Dongjin
> >
> > --
> > *Dongjin Lee*
> >
> > *A hitchhiker in the mathematical world.*
> >
> > *github:  <http://goog_969573159/>github.com/dongjinleekr
> > <http://github.com/dongjinleekr>linkedin:
> kr.linkedin.com/in/dongjinleekr
> > <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> > www.slideshare.net/dongjinleekr
> > <http://www.slideshare.net/dongjinleekr>*
> >
>


-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*
*github:  <http://goog_969573159/>github.com/dongjinleekr
<https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
<https://kr.linkedin.com/in/dongjinleekr>speakerdeck: speakerdeck.com/dongjin
<https://speakerdeck.com/dongjin>*

Re: [DISCUSS] KIP-390: Add producer option to adjust compression level

Posted by Ismael Juma <is...@gmail.com>.
 Thanks for the KIP, this is helpful. A few questions:

1. Have we considered whether we want to allow a similar topic config?
2. Can we rely on a method from the library to pick the default compression
level if compression.level is not set? We do it for gzip and it would seem
reasonable to do something similar for the other compression libraries.
3. Do we want to allow the buffer/block size to be configurable? This has
an impact on memory usage and people may want to trade compression for
less/more memory in some cases. For example, the default for LZ4 is 64KB
which is a bit high.

Ismael

On Sun, Nov 18, 2018, 2:07 PM Dongjin Lee <dongjin@apache.org wrote:

> Hello dev,
>
> I hope to initiate the discussion of KIP-390: Add producer option to adjust
> compression level
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-390%3A+Add+producer+option+to+adjust+compression+level
> >.
> All feedbacks will be highly appreciated.
>
> Best,
> Dongjin
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <http://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <http://kr.linkedin.com/in/dongjinleekr>slideshare:
> www.slideshare.net/dongjinleekr
> <http://www.slideshare.net/dongjinleekr>*
>