You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Sophie Blee-Goldman <so...@responsive.dev> on 2024/04/05 21:24:30 UTC

Re: [DISCUSS] KIP-1020: Deprecate `window.size.ms` in `StreamsConfig`

Thanks Matthias -- sorry I forgot to get back to you!

Thanks for kicking off the voting thread on this

On Thu, Mar 28, 2024 at 3:58 AM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks. I think you can start a VOTE.
>
> -Matthias
>
>
> On 3/20/24 9:24 PM, Lucia Cerchie wrote:
> > thanks Sophie, I've made the updates, would appreciate one more look
> before
> > submission
> >
> > On Wed, Mar 20, 2024 at 8:36 AM Sophie Blee-Goldman <
> sophie@responsive.dev>
> > wrote:
> >
> >> A few minor notes on the KIP but otherwise I think you can go ahead and
> >> call for a vote!
> >>
> >> 1. Need to update the Motivation section to say "console clients" or
> >> "console consumer/producer" instead of " plain consumer client"
> >> 2. Remove the first paragraph under "Public Interfaces" (ie the
> KIP-writing
> >> instructions) and also list the new config definitions here. You can
> just
> >> add a code snippet for each class (TimeWindowedDe/Serializer) with the
> >> actual variable definition you'll be adding. Maybe also add a code
> snippet
> >> for StreamsConfig with the @Deprecated annotation added to the two
> configs
> >> we're deprecating
> >> 3. nit: remove the "questions" under "Compatibility, Deprecation, and
> >> Migration Plan", ie the stuff from the KIP-writing template. Just makes
> it
> >> easier to read
> >> 4. In "Test Plan" we should also have a unit test to make sure the new "
> >> windowed.inner.de/serializer.class" takes preference in the case that
> both
> >> it and the old "windowed.inner.serde.class" config are both specified
> >>
> >> Also, this is more of a question than a suggestion, but is the KIP title
> >> perhaps a bit misleading in that people might assume these configs will
> no
> >> longer be available for use at all? What do people think about changing
> it
> >> to something like
> >>
> >> *Move `window.size.ms <http://window.size.ms>` and
> >> `windowed.inner.serde.class` from `StreamsConfig` to
> >> TimeWindowedDe/Serializer class*
> >>
> >> A bit long/clunky but at least it gets the point across about where
> folks
> >> can find these configs going forward.
> >>
> >> On Mon, Mar 18, 2024 at 6:26 PM Lucia Cerchie
> >> <lc...@confluent.io.invalid>
> >> wrote:
> >>
> >>> Thanks for the discussion!
> >>>
> >>> I've updated the KIP here with what I believe are the relevant pieces,
> >>> please let me know if anything is missing:
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=290982804
> >>>
> >>> On Sun, Mar 17, 2024 at 7:09 PM Sophie Blee-Goldman <
> >> sophie@responsive.dev
> >>>>
> >>> wrote:
> >>>
> >>>> Sounds good!
> >>>>
> >>>> @Lucia when you have a moment can you update the KIP with
> >>>> the new proposal, including the details that Matthias pointed
> >>>> out in his last response? After that's done I think you can go
> >>>> ahead and call for a vote whenever you're ready!
> >>>>
> >>>> On Sat, Mar 16, 2024 at 7:35 PM Matthias J. Sax <mj...@apache.org>
> >>> wrote:
> >>>>
> >>>>> Thanks for the summary. Sounds right to me. That is what I would
> >>> propose.
> >>>>>
> >>>>> As you pointed out, we of course still need to support the current
> >>>>> confis, and we should log a warning when in use (even if the new one
> >> is
> >>>>> in use IMHO) -- but that's more an implementation detail.
> >>>>>
> >>>>> I agree that the new config should take preference in case both are
> >>>>> specified. This should be pointed out in the KIP, as it's an
> >> important
> >>>>> contract the user needs to understand.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 3/14/24 6:18 PM, Sophie Blee-Goldman wrote:
> >>>>>>>
> >>>>>>> Should we change it do `.serializer` and `.deserialize`?
> >>>>>>
> >>>>>> That's a good point -- if we're going to split this up by defining
> >>> the
> >>>>>> config
> >>>>>> in both the TimeWindowedSerializer and TimeWindowedDeserializer,
> >>>>>> then it makes perfect sense to go a step further and actually
> >> define
> >>>>>> only the relevant de/serializer class instead of the full serde
> >>>>>>
> >>>>>> Just to put this all together, it sounds like the proposal is to:
> >>>>>>
> >>>>>> 1) Deprecate both these configs where they appear in StreamsConfig
> >>>>>> (as per the original plan in the KIP, just reiterating it here)
> >>>>>>
> >>>>>> 2) Don't "define" either config in any specific client's Config
> >>> class,
> >>>>>> but just define a String variable with the config name in the
> >>> relevant
> >>>>>> de/serializer class, and maybe point people to it in the docs
> >>> somewhere
> >>>>>>
> >>>>>> 3) We would add three new public String variables for three
> >> different
> >>>>>> configs across two classes, specifically:
> >>>>>>
> >>>>>> In TimeWindowedSerializer:
> >>>>>>     - define a constant for "windowed.inner.serializer.class"
> >>>>>> In TimeWindowedDeserializer:
> >>>>>>     - define a constant for "windowed.inner.deserializer.class"
> >>>>>>     - define a constant for "window.size.ms"
> >>>>>>
> >>>>>> 4) Lastly, we would update the windowed de/serializer
> >> implementations
> >>>>>> to check for the new configs (ie "
> >> windowed.inner.de/serializer.class
> >>> ")
> >>>>>> and use the provided de/serializer class, if one was given. If the
> >>> new
> >>>>>> configs are not present, they would fall back to the
> >> original/current
> >>>>>> logic (ie that based on the old "windowed.inner.serde.class"
> >> config)
> >>>>>>
> >>>>>> I think that's everything. Does this sound about right for where we
> >>>> want
> >>>>>> to go with these configs?
> >>>>>>
> >>>>>> On Thu, Mar 14, 2024 at 4:58 PM Matthias J. Sax <mj...@apache.org>
> >>>>> wrote:
> >>>>>>
> >>>>>>>>> By "don't add them" do you just mean we would not have any
> >> actual
> >>>>>>>>> variables defined anywhere for these configs (eg WINDOW_SIZE_MS)
> >>>>>>>>> and simply document -- somewhere -- that one can use the string
> >>>>>>>>> "window.size.ms" when configuring a command-line client with a
> >>>>>>>>> windowed serde?
> >>>>>>>
> >>>>>>> Yes. That's the idea.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> I assume you aren't proposing
> >>>>>>>>> to remove the ability to use and understand this config from the
> >>>>>>>>> implementations themselves, but correct me if that's wrong.
> >>>>>>>
> >>>>>>> No, that would effectively break what we fixed with the original
> >> KIP
> >>>> :)
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>> Are there any other configs in similar situations that we could
> >>> look
> >>>>>>>>> to for precedent?
> >>>>>>>
> >>>>>>> Not aware of any others, either.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>> If these are truly the first/only of their kind, I would vote to
> >>>> just
> >>>>>>> stick
> >>>>>>>>> them in the appropriate class. As for which class to put them
> >> in,
> >>> I
> >>>>>>>>> think I'm convinced that "window.size.ms" should only go in the
> >>>>>>>>> TimeWindowedDeserializer rather than sticking them both in the
> >>>>>>>>> TimeWindowedSerde as I originally suggested. However, I would
> >>>>>>>>> even go a step further and not place the
> >>> "inner.window.class.serde"
> >>>>>>>>> in the TimeWindowedSerde class either. To me, it actually makes
> >>>>>>>>> the most sense to define it in both the TimeWindowedSerializer
> >>>>>>>>> and the TimeWindowedDeserializer.
> >>>>>>>
> >>>>>>> Not sure either. What you propose is fine with me. However, I am
> >>>>>>> wondering about the config names... Why is it `serde` for this
> >> case?
> >>>>>>> Should we change it do `.serializer` and `.deserialize`?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/13/24 8:19 PM, Sophie Blee-Goldman wrote:
> >>>>>>>> By "don't add them" do you just mean we would not have any actual
> >>>>>>>> variables defined anywhere for these configs (eg WINDOW_SIZE_MS)
> >>>>>>>> and simply document -- somewhere -- that one can use the string
> >>>>>>>> "window.size.ms" when configuring a command-line client with a
> >>>>>>>> windowed serde? Or something else? I assume you aren't proposing
> >>>>>>>> to remove the ability to use and understand this config from the
> >>>>>>>> implementations themselves, but correct me if that's wrong.
> >>>>>>>>
> >>>>>>>> Are there any other configs in similar situations that we could
> >>> look
> >>>>>>>> to for precedent? I personally am not aware of any but by
> >>> definition
> >>>>>>>> I suppose these would be hard to discover unless you were
> >> actively
> >>>>>>>> looking for them, so I'm wondering if there might be other
> >> "shadow
> >>>>>>>> configs" elsewhere in the code base.
> >>>>>>>>
> >>>>>>>> If these are truly the first/only of their kind, I would vote to
> >>> just
> >>>>>>> stick
> >>>>>>>> them in the appropriate class. As for which class to put them
> >> in, I
> >>>>>>>> think I'm convinced that "window.size.ms" should only go in the
> >>>>>>>> TimeWindowedDeserializer rather than sticking them both in the
> >>>>>>>> TimeWindowedSerde as I originally suggested. However, I would
> >>>>>>>> even go a step further and not place the
> >> "inner.window.class.serde"
> >>>>>>>> in the TimeWindowedSerde class either. To me, it actually makes
> >>>>>>>> the most sense to define it in both the TimeWindowedSerializer
> >>>>>>>> and the TimeWindowedDeserializer.
> >>>>>>>>
> >>>>>>>> The reason being that, as discussed above, the only use case for
> >>>>>>>> these configs would be in the console consumer/producer which
> >>>>>>>> only uses the Serializer or Deserializer, and would never
> >> actually
> >>>>>>>> be used by/in Streams where we use the Serde version. And while
> >>>>>>>> defining the  "inner.window.class.serde" in two places might seem
> >>>>>>>> redundant, this would mean that all the configs needed to
> >> properly
> >>>>>>>> configure the specific class being used by the particular kind of
> >>>>>>>> consumer client -- that is, Deserializer for a console consumer
> >> and
> >>>>>>>> Serializer for a console producer -- would be located in that
> >> exact
> >>>>>>>> class. I assume this would make them much easier to discover
> >>>>>>>> and be used than having to search for configs defined in classes
> >>>>>>>> you don't even need for the console client, like the Serde form
> >>>>>>>>
> >>>>>>>> Just my two cents -- happy to hear other opinions on this
> >>>>>>>>
> >>>>>>>> On Mon, Mar 11, 2024 at 6:58 PM Matthias J. Sax <
> >> mjsax@apache.org>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Yes, it's used inside `TimeWindowedSerializer` and actually also
> >>>>> inside
> >>>>>>>>> `TimeWindowDeserializer`.
> >>>>>>>>>
> >>>>>>>>> However, it does IMHO not change that we should remove it from
> >>>>>>>>> `StreamsConfig` because both configs are not intended to be used
> >>> in
> >>>>> Java
> >>>>>>>>> code... If one writes Java code, they should use
> >>>>>>>>>
> >>>>>>>>>       new TimeWindowedSerializer(Serializer)
> >>>>>>>>>       new TimeWindowDeserializer(Deserializer, Long)
> >>>>>>>>>       new TimeWindowSerdes(Serde, Long)
> >>>>>>>>>
> >>>>>>>>> and thus they don't need either config.
> >>>>>>>>>
> >>>>>>>>> The configs are only needed for command line tool, that create
> >> the
> >>>>>>>>> (de)serializer via reflection using the default constructor.
> >>>>>>>>>
> >>>>>>>>> Does this make sense?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The only open question is really, if and where to add them...
> >>>> Strictly
> >>>>>>>>> speaking, we don't need either config as public variable as
> >> nobody
> >>>>>>>>> should use them in Java code. To me, it just feels right/better
> >> do
> >>>>> make
> >>>>>>>>> them public for documentation purpose that these configs exists?
> >>>>>>>>>
> >>>>>>>>> `inner.window.class.serde` has "serde" in it's name, so we could
> >>> add
> >>>>> it
> >>>>>>>>> to `TimeWindowSerdes`? For `window.size.ms`, it's only used by
> >>> the
> >>>>>>>>> deserialize to maybe add it there? Just some ideas. -- Or we
> >>>> sidestep
> >>>>>>>>> this question and just don't add them; also fine with me.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>> On 3/11/24 10:53 AM, Lucia Cerchie wrote:
> >>>>>>>>>> PS-- I was re-reading the PR that originated this discussion
> >> and
> >>>>>>> realized
> >>>>>>>>>> that `window.inner.serde.class` is used here
> >>>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://github.com/a0x8o/kafka/blob/master/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedSerializer.java#L44
> >>>>>>>>>>
> >>>>>>>>>> in KStreams. This goes against removing it, yes?
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 11, 2024 at 10:40 AM Lucia Cerchie <
> >>>>> lcerchie@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Sophie, I'll add a paragraph about removing
> >>>>>>> `windowed.inner.serde.class`
> >>>>>>>>>>> to the KIP. I'll also add putting it in the
> >> `TimeWindowedSerde`
> >>>>> class
> >>>>>>>>> with
> >>>>>>>>>>> some add'tl guidance on the docs addition.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, I double-checked setting window.size.ms on the client
> >> and
> >>>> it
> >>>>>>>>>>> doesn't throw an error at all, in response to Matthias's
> >>> question.
> >>>>>>>>> Changing
> >>>>>>>>>>> the KIP in response to that.
> >>>>>>>>>>>
> >>>>>>>>>>> On Sun, Mar 10, 2024 at 6:04 PM Sophie Blee-Goldman <
> >>>>>>>>> sophie@responsive.dev>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks for responding Matthias -- you got there first, but I
> >>> was
> >>>>>>> going
> >>>>>>>>> to
> >>>>>>>>>>>> say exactly the same thing as in your most reply. In other
> >>>> words, I
> >>>>>>> see
> >>>>>>>>>>>> the
> >>>>>>>>>>>> `windowed.inner.serde.class` as being in the same boat as
> >> the `
> >>>>>>>>>>>> window.size.ms` config, so whatever we do with one we should
> >>> do
> >>>>> for
> >>>>>>>>> the
> >>>>>>>>>>>> other.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I do agree with removing these from StreamsConfig, but
> >> defining
> >>>>> them
> >>>>>>> in
> >>>>>>>>>>>> ConsumerConfig feels weird as well. There's really no great
> >>>> answer
> >>>>>>>>> here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> My only concern about adding it to the corresponding
> >>>>>>>>>>>> serde/serializer/deserializer class is that it might be
> >>> difficult
> >>>>> for
> >>>>>>>>>>>> people to find them. I generally assume that people tend not
> >> to
> >>>>> look
> >>>>>>> at
> >>>>>>>>>>>> the
> >>>>>>>>>>>> serde/serializer/deserializer classes/implementations. But
> >>> maybe
> >>>> in
> >>>>>>>>> this
> >>>>>>>>>>>> case, someone who actually needed these configs would be
> >> likely
> >>>> to
> >>>>> be
> >>>>>>>>>>>> motivated enough to find them by looking at the class? And
> >> with
> >>>>>>>>> sufficient
> >>>>>>>>>>>> documentation, it's likely not a problem. So, I'm +1 on
> >> putting
> >>>> it
> >>>>>>> into
> >>>>>>>>>>>> the
> >>>>>>>>>>>> TimeWindowedSerde class
> >>>>>>>>>>>>
> >>>>>>>>>>>> (I would personally stick them into the serde class, rather
> >>> than
> >>>>> the
> >>>>>>>>>>>> serializer and/or deserializer, but I could be convinced
> >> either
> >>>>> way)
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Mar 1, 2024 at 3:00 PM Matthias J. Sax <
> >>> mjsax@apache.org
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> One more thought after I did some more digging on the
> >> related
> >>>> PR.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Should we do the same thing for
> >> `windowed.inner.serde.class`?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Both config belong to windowed serdes (which KS provide) but
> >>> the
> >>>>> KS
> >>>>>>>>> code
> >>>>>>>>>>>>> itself does never use them (and in fact, disallows to use
> >> them
> >>>> and
> >>>>>>>>> would
> >>>>>>>>>>>>> throw an error is used). Both are intended for plain
> >> consumer
> >>>> use
> >>>>>>>>> cases
> >>>>>>>>>>>>> for which the window serdes are used.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The question to me is, should we add them back somewhere
> >> else?
> >>>> It
> >>>>>>> does
> >>>>>>>>>>>>> not really belong into `ConsumerConfig` either, but maybe we
> >>>> could
> >>>>>>> add
> >>>>>>>>>>>>> them to the corresponding serde or (de)serialize classes?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 2/21/24 2:41 PM, Matthias J. Sax wrote:
> >>>>>>>>>>>>>> Thanks for the KIP. Sounds like a nice cleanup.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> window.size.ms  is not a true KafkaStreams config, and
> >>>> results
> >>>>> in
> >>>>>>>>> an
> >>>>>>>>>>>>>>> error when set from a KStreams application
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What error?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Given that the configs is used by
> >> `TimeWindowedDeserializer`
> >>> I
> >>>> am
> >>>>>>>>>>>>>> wondering if we should additionally add
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> public class TimeWindowedDeserializer {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>          public static final String WINDOW_SIZE_MS_CONFIG =
> >> "
> >>>>>>>>>>>> window.size.ms
> >>>>>>>>>>>>> ";
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/15/24 6:32 AM, Lucia Cerchie wrote:
> >>>>>>>>>>>>>>> Hey everyone,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I'd like to discuss KIP-1020
> >>>>>>>>>>>>>>> <
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=290982804
> >>>>>>>>>>>>>> ,
> >>>>>>>>>>>>>>> which would move to deprecate `window.size.ms` in
> >>>>> `StreamsConfig`
> >>>>>>>>>>>>> since `
> >>>>>>>>>>>>>>> window.size.ms` is a client config.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks in advance!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Lucia Cerchie
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>>
> >>>>>>>>>>> [image: Confluent] <https://www.confluent.io>
> >>>>>>>>>>> Lucia Cerchie
> >>>>>>>>>>> Developer Advocate
> >>>>>>>>>>> Follow us: [image: Blog]
> >>>>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
> >>>>>>>>>> [image:
> >>>>>>>>>>> Twitter] <https://twitter.com/ConfluentInc>[image: Slack]
> >>>>>>>>>>> <https://slackpass.io/confluentcommunity>[image: YouTube]
> >>>>>>>>>>> <https://youtube.com/confluent>
> >>>>>>>>>>>
> >>>>>>>>>>> [image: Try Confluent Cloud for Free]
> >>>>>>>>>>> <
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
> https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> [image: Confluent] <https://www.confluent.io>
> >>> Lucia Cerchie
> >>> Developer Advocate
> >>> Follow us: [image: Blog]
> >>> <
> >>>
> >>
> https://www.confluent.io/blog?utm_source=footer&utm_medium=email&utm_campaign=ch.email-signature_type.community_content.blog
> >>>> [image:
> >>> Twitter] <https://twitter.com/ConfluentInc>[image: Slack]
> >>> <https://slackpass.io/confluentcommunity>[image: YouTube]
> >>> <https://youtube.com/confluent>
> >>>
> >>> [image: Try Confluent Cloud for Free]
> >>> <
> >>>
> >>
> https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound&utm_source=gmail&utm_medium=organic
> >>>>
> >>>
> >>
> >
> >
>