You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Nick Telford <ni...@gmail.com> on 2021/12/20 14:14:54 UTC

[DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

This is a KIP for a proposed solution to KAFKA-13549
<https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is very
simple, so the KIP is pretty short.

The suggested changes are implemented by this PR
<https://github.com/apache/kafka/pull/11610>.
--
Nick Telford

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
> The unit of frequency is 1/s. What we are actually setting with this
> config is the period. So frequency is used a bit wrongly here. Since I
> discovered that we use the word "frequency" also for other similar
> configs, I leave it up to you if you want to change it in this case or
> not. I am fine either way.

I'm embarrassed to say this is not the first time I've made this mistake
and been corrected on it!

I've fixed it as I can't imagine anyone objecting, and I've also dropped
the "min." prefix.

For the VOTE thread, since no votes have been made yet, I'm going to assume
that they're made on the latest version of the KIP.

Regards,

Nick

On Thu, 13 Jan 2022 at 10:29, Bruno Cadonna <ca...@apache.org> wrote:

> Hi Nick,
>
> Dropping the `min.` prefix is fine with me.
>
> A really minor nit-picking:
>
> "The minimum frequency in milliseconds ..."
>
> The unit of frequency is 1/s. What we are actually setting with this
> config is the period. So frequency is used a bit wrongly here. Since I
> discovered that we use the word "frequency" also for other similar
> configs, I leave it up to you if you want to change it in this case or
> not. I am fine either way.
>
> Best,
> Bruno
>
> On 13.01.22 10:00, Nick Telford wrote:
> > Hi Matthias,
> >
> > You raise a good point: commit.interval.ms only specifies a minimum
> > interval, because commits are powered by record flow through the
> Topology,
> > if there are no input records, there will be no commits. The parallel
> here
> > is that repartition record purges are powered by commits, and if there
> are
> > no commits, there will be no record purges.
> >
> > Bruno, does this satisfy your misgivings with the name
> > repartition.purge.interval.ms?
> >
> >> For the config description, I would remove
> >
> >>> The default value is the same as the default for commit.interval.ms
> > (30000).
> >
> >> It seems unnecessary to me.
> >
> > My intent was to communicate to users that the default has been set to
> > match the commit interval, for compatibility. But since the default value
> > is provided elsewhere, I can see why it's redundant. I'll remove that.
> >
> > Thanks,
> >
> > Nick
> >
> > On Thu, 13 Jan 2022 at 03:14, Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Thanks for the KIP! Very good discussion.
> >>
> >> I want to raise one point: I am not a fan of the `min.` prefix of the
> >> config, because all configs like this are _minimums_. We also use
> >> `commit.interval.ms` and not `min.commit.interval.ms` for example. I
> >> would suggest to strip the `min.` prefix.
> >>
> >>
> >> For the config description, I would remove
> >>
> >>> The default value is the same as the default for commit.interval.ms
> >> (30000).
> >>
> >> It seems unnecessary to me.
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 1/12/22 7:21 AM, Nick Telford wrote:
> >>> Thanks Bruno, I've made those changes.
> >>>
> >>> I'll call a vote on the KIP later today.
> >>>
> >>> Regards,
> >>>
> >>> Nick Telford
> >>>
> >>> On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <ca...@apache.org>
> wrote:
> >>>
> >>>> Hi Nick,
> >>>>
> >>>> Great!
> >>>>
> >>>> I think the KIP is ready for voting. I have just a couple of minor
> >>>> comments.
> >>>>
> >>>> a.
> >>>> In the config description, I would replace
> >>>>
> >>>> "Purging will occur after at least this value since the last purge,
> but
> >>>> may be delayed until later in order to meet the processing guarantee."
> >>>>
> >>>> with
> >>>>
> >>>> "Purging will occur after at least this value since the last purge,
> but
> >>>> may be delayed until later."
> >>>>
> >>>> I do not really understand what you mean with "meet the processing
> >>>> guarantee" and I think it suffices to say that the purge might be
> >> delayed.
> >>>>
> >>>>
> >>>> b.
> >>>> I would change the title to
> >>>>
> >>>> "Add config min.repartition.purge.interval.ms to Kafka Streams"
> >>>>
> >>>>
> >>>> c.
> >>>> The current rejected alternative leaks implementation details since it
> >>>> refers to the coupling of purge to commit and we agreed to leave that
> >>>> out of the KIP.
> >>>>
> >>>>
> >>>> d.
> >>>> Could you add my proposal to specify the config as a multiple of the
> >>>> commit interval to the rejected alternatives with the reason why we
> >>>> discarded it?
> >>>>
> >>>> For the rest, I am +1.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>>
> >>>>
> >>>> On 11.01.22 16:47, Nick Telford wrote:
> >>>>> Hi Bruno,
> >>>>>
> >>>>> Thanks for your recommendations.
> >>>>>
> >>>>> I've made those changes, and also updated the description for the new
> >>>>> config to read: "The minimum frequency in milliseconds with which to
> >>>> delete
> >>>>> fully consumed records from repartition topics. *Purging will occur
> >> after
> >>>>> at least this value since the last purge, but may be delayed until
> >> later
> >>>> in
> >>>>> order to meet the processing guarantee.* The default value is the
> same
> >> as
> >>>>> the default for commit.interval.ms (30000).  (Note, unlike
> >>>>> commit.interval.ms, the default for this value remains unchanged
> when
> >>>>> processing.guarantee is set to exactly_once_v2)."
> >>>>>
> >>>>> This should make it clear that this is just a minimum interval,
> without
> >>>>> leaking too much detail in to the specification.
> >>>>>
> >>>>> If there are no other issues, I'll put this to a vote.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Nick Telford
> >>>>>
> >>>>> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> Sorry for the delay!
> >>>>>>
> >>>>>> Regarding point 7, I am fine with keeping the config as an interval
> >> and
> >>>>>> keeping it independently of the commit interval. However, I would
> then
> >>>>>> remove the following paragraph from the KIP:
> >>>>>>
> >>>>>> "We will still wait for a commit before explicitly deleting
> >> repartition
> >>>>>> records, but we will only do so if the time since the last record
> >>>>>> deletion is at least repartition.purge.interval.ms. This means the
> >>>>>> lower-bound for repartition.purge.interval.ms  is effectively
> capped
> >> by
> >>>>>> the value of commit.interval.ms."
> >>>>>>
> >>>>>> The reason is that in the previous paragraph you say that the
> configs
> >>>>>> can be modified separately and then in this paragraph you bind the
> >> purge
> >>>>>> interval to the commit interval. This seems a contradiction and
> >>>>>> indicates that you are leaking too much implementation details into
> >> the
> >>>>>> KIP. I think, just saying that the purge interval is a minimum and
> >> name
> >>>>>> it accordingly without talking about the actual implementation makes
> >> the
> >>>>>> KIP more robust against future implementation changes.
> >>>>>>
> >>>>>> My proposal for the config name is
> min.repartition.purge.interval.ms
> >> or
> >>>>>> even min.purge.interval.ms with a preference for the former.
> >>>>>>
> >>>>>> Best,
> >>>>>> Bruno
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 04.01.22 17:21, Nick Telford wrote:
> >>>>>>>>
> >>>>>>>> You missed one "delete.interval.ms" in the last paragraph in
> >> section
> >>>>>>>> "Proposed Changes".
> >>>>>>>>
> >>>>>>>
> >>>>>>> I've fixed these now, including the title that I somehow missed!
> >>>>>>>
> >>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not
> >> make
> >>>>>>>
> >>>>>>> sense to be able to tune repartition.purge.interval.ms and
> >>>>>>>
> >>>>>>> commit.interval.ms separately when the purge can only happen
> during
> >> a
> >>>>>>>
> >>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>>>>>
> >>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be
> >> purged
> >>>> at
> >>>>>>>
> >>>>>>> every second commit, i.e., every 60000 ms. What benefit do users
> have
> >>>> to
> >>>>>>>
> >>>>>>> set repartition.purge.interval.ms separately from
> commit.interval.ms
> >> ?
> >>>>>>>
> >>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>>>>>
> >>>>>>> 2*commit.interval.ms..
> >>>>>>>
> >>>>>>>
> >>>>>>> Could we address this by chaning the name of the configuration to
> >>>>>> something
> >>>>>>> like "repartition.purge.min.interval.ms", to indicate that the
> >>>>>> repartition
> >>>>>>> purge interval will be *at least* this value?
> >>>>>>>
> >>>>>>> If that's still not suitable, are there any other existing
> >>>> configurations
> >>>>>>> that behave in a similar way, i.e. dictate a multiple of another
> >>>>>> interval,
> >>>>>>> that we could use as a basis for a new name for this configuration?
> >>>>>>>
> >>>>>>> Additionally, I have a new point.
> >>>>>>>
> >>>>>>> 8. If user code has access to the processor context (e.g. in the
> >>>>>>>
> >>>>>>> processor API), a commit can also be requested on demand by user
> >> code.
> >>>>>>>
> >>>>>>> The KIP should clarify if purges might also happen during requested
> >>>>>>>
> >>>>>>> commits or if purges only happen during automatic commits.
> >>>>>>>
> >>>>>>>
> >>>>>>> Good point. I'm holding off on amending this for now until we agree
> >> on
> >>>> an
> >>>>>>> outcome for point 7 above, because I suspect it may at least
> >> influence
> >>>>>> the
> >>>>>>> wording of this section.
> >>>>>>>
> >>>>>>> Thanks for the feedback, and sorry for the delay. Now that the
> >> holidays
> >>>>>> are
> >>>>>>> over I'm able to focus on this again.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Nick Telford
> >>>>>>>
> >>>>>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Nick,
> >>>>>>>>
> >>>>>>>> Thanks for the updates!
> >>>>>>>>
> >>>>>>>> The motivation section and the description of the new config is
> much
> >>>>>>>> clearer and informative now!
> >>>>>>>>
> >>>>>>>> You missed one "delete.interval.ms" in the last paragraph in
> >> section
> >>>>>>>> "Proposed Changes".
> >>>>>>>>
> >>>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not
> >> make
> >>>>>>>> sense to be able to tune repartition.purge.interval.ms and
> >>>>>>>> commit.interval.ms separately when the purge can only happen
> >> during a
> >>>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be
> >> purged
> >>>>>> at
> >>>>>>>> every second commit, i.e., every 60000 ms. What benefit do users
> >> have
> >>>> to
> >>>>>>>> set repartition.purge.interval.ms separately from
> >> commit.interval.ms?
> >>>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>>>>>> 2*commit.interval.ms..
> >>>>>>>>
> >>>>>>>> Additionally, I have a new point.
> >>>>>>>> 8. If user code has access to the processor context (e.g. in the
> >>>>>>>> processor API), a commit can also be requested on demand by user
> >> code.
> >>>>>>>> The KIP should clarify if purges might also happen during
> requested
> >>>>>>>> commits or if purges only happen during automatic commits.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Bruno
> >>>>>>>>
> >>>>>>>> On 21.12.21 20:40, Nick Telford wrote:
> >>>>>>>>> Hi everyone,
> >>>>>>>>>
> >>>>>>>>> Thanks for your feedback. I've made the suggested changes to the
> >> KIP
> >>>>>> (1,
> >>>>>>>> 2,
> >>>>>>>>> 3 and 5).
> >>>>>>>>>
> >>>>>>>>> For the new name, I've chosen repartition.purge.interval.ms, as
> I
> >>>> felt
> >>>>>>>> it
> >>>>>>>>> struck a good balance between being self-descriptive and concise.
> >>>>>> Please
> >>>>>>>>> let me know if you'd prefer something else.
> >>>>>>>>>
> >>>>>>>>> On point 6: My PR has basic validation to ensure the value is
> >>>> positive,
> >>>>>>>> but
> >>>>>>>>> I don't think it's necessary to have dynamic validation, to
> ensure
> >>>> it's
> >>>>>>>> not
> >>>>>>>>> less than commit.interval.ms. The reason is that it will be
> >>>> implicitly
> >>>>>>>>> limited to that value anyway, and won't break anything. But I can
> >> add
> >>>>>> it
> >>>>>>>> if
> >>>>>>>>> you'd prefer it.
> >>>>>>>>>
> >>>>>>>>> On point 7: I worry that defaulting it to follow the value of
> >>>>>>>>> commit.interval.ms may confuse users, who will likely expect the
> >>>>>>>> default to
> >>>>>>>>> not be affected by changes to other configuration options. I can
> >> see
> >>>>>> the
> >>>>>>>>> appeal of retaining the existing behaviour (following the commit
> >>>>>>>> interval)
> >>>>>>>>> by default, but I believe that the majority of users who
> customize
> >>>>>>>>> commit.interval.ms do not desire a different frequency of
> >>>> repartition
> >>>>>>>>> record purging as well.
> >>>>>>>>>
> >>>>>>>>> As for multiples of commit interval: I think the argument against
> >>>> that
> >>>>>> is
> >>>>>>>>> that an interval is more intuitive when given as a time, rather
> >> than
> >>>>>> as a
> >>>>>>>>> multiple of some other operation. Users configuring this should
> not
> >>>>>> need
> >>>>>>>> to
> >>>>>>>>> break out a calculator to work out how frequently the records are
> >>>> going
> >>>>>>>> to
> >>>>>>>>> be purged!
> >>>>>>>>>
> >>>>>>>>> I've also updated the PR with the relevant changes.
> >>>>>>>>>
> >>>>>>>>> BTW, for some reason I didn't receive Sophie's email. I'll
> >>>> periodically
> >>>>>>>>> check the thread on the archive to ensure I don't miss any more
> of
> >>>> your
> >>>>>>>>> messages!
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> Nick
> >>>>>>>>>
> >>>>>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com>
> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks, Bruno.
> >>>>>>>>>>
> >>>>>>>>>> I agree my point 4 is more like PR discussion, not KIP
> discussion.
> >>>>>>>>>> @Nick , please update my point 4 in PR directly.
> >>>>>>>>>>
> >>>>>>>>>> Thank you.
> >>>>>>>>>> Luke
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <
> cadonna@apache.org
> >>>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi Nick,
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you for the KIP!
> >>>>>>>>>>>
> >>>>>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I
> >> agree
> >>>>>>>> that
> >>>>>>>>>>> we should update the docs for commit.interval.ms but I am not
> >> sure
> >>>>>> if
> >>>>>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>>>>>>>> discussion. Also on point 2, I agree that we need to add a doc
> >>>> string
> >>>>>>>>>>> but the content should be exemplary not binding. What I want to
> >> say
> >>>>>> is
> >>>>>>>>>>> that, we do not need a KIP to change docs.
> >>>>>>>>>>>
> >>>>>>>>>>> Here my points:
> >>>>>>>>>>>
> >>>>>>>>>>> 5. Could you specify in the motivation that the KIP is about
> >>>> deleting
> >>>>>>>>>>> records from repartition topics? Maybe with a short description
> >>>> when
> >>>>>>>> why
> >>>>>>>>>>> and when records are deleted from the repartition topics. For
> us
> >> it
> >>>>>>>>>>> might be clear, but IMO we should try to write KIPs so that
> >> someone
> >>>>>>>> that
> >>>>>>>>>>> is relatively new to Kafka Streams can understand the KIP
> without
> >>>>>>>>>>> needing to know too much background.
> >>>>>>>>>>>
> >>>>>>>>>>> 6. Does the config need to be validated? For example, does
> >>>>>>>>>>> delete.interval.ms need to be greater than or equal to
> >>>>>>>>>> commit.interval.ms?
> >>>>>>>>>>>
> >>>>>>>>>>> 7. Should the default value for non-EOS be 30s or the same
> value
> >> as
> >>>>>>>>>>> commit.interval.ms? I am just thinking about the case where a
> >> user
> >>>>>>>>>>> explicitly changes commit.interval.ms but not
> delete.interval.ms
> >>>> (or
> >>>>>>>>>>> whatever name you come up for it). Once delete.interval.ms is
> >> set
> >>>>>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar
> >> could
> >>>> be
> >>>>>>>>>>> done for the EOS case.
> >>>>>>>>>>> Alternatively, we could also define delete.interval.ms to
> take a
> >>>>>>>>>>> integral number without a unit that specifies after how many
> >> commit
> >>>>>>>>>>> intervals the records in repartition topics should be deleted.
> >> This
> >>>>>>>>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>>>>>>>> commit.interval.ms. Additionally, it would make the semantics
> of
> >>>> the
> >>>>>>>>>>> config simpler. The name of the config should definitely change
> >> if
> >>>> we
> >>>>>>>> go
> >>>>>>>>>>> down this way.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Bruno
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>>>>>>>> Hi Nick,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>>>
> >>>>>>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>>>>>>>> 3. I think you should mention the behavior change *explicitly*
> >> in
> >>>>>>>>>>>> "Compatibility" section. I know you already mentioned it in
> KIP,
> >>>> in
> >>>>>>>> the
> >>>>>>>>>>>> benefit way. But I think in this section, we should clearly
> >> point
> >>>>>> out
> >>>>>>>>>>> which
> >>>>>>>>>>>> behavior will be change after this KIP. That is, you should
> put
> >> it
> >>>>>>>>>> clear
> >>>>>>>>>>>> that the delete record interval will change from 100ms to 30s
> >> with
> >>>>>> EOS
> >>>>>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html
> >> doc.
> >>>>>>>>>>>> 4. Since this new config has some relationship with
> >>>>>>>> commit.interval.ms
> >>>>>>>>>> ,
> >>>>>>>>>>> I
> >>>>>>>>>>>> think we should also update the doc description for `
> >>>>>>>>>> commit.interval.ms
> >>>>>>>>>>> `,
> >>>>>>>>>>>> to let user know there's another config to control delete
> >> interval
> >>>>>> and
> >>>>>>>>>>>> should be greater than commit.interval.ms. Something like
> that.
> >>>>>> WDYT?
> >>>>>>>>>>> (You
> >>>>>>>>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you.
> >>>>>>>>>>>> Luke
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>>>>>>>> <so...@confluent.io.invalid> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hey Nick,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I think you forgot to link to the KIP document, but I take it
> >>>> this
> >>>>>> is
> >>>>>>>>>>>>> it: KIP-811:
> >>>>>>>>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The overall proposal sounds good to me, just a few minor
> >> things:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>          1. Please specify everything needed to define this
> >> config
> >>>>>>>>>>> explicitly, ie
> >>>>>>>>>>>>>          all the arguments that will be passed in to the
> >>>>>>>>>>>>>          StreamsConfig's ConfigDef: in addition to the
> default
> >>>> value,
> >>>>>> we
> >>>>>>>>>>> need the
> >>>>>>>>>>>>>          config type (presumably a Long), the doc
> >>>>>>>>>>>>>          string, and the importance (probably "low", similar
> to
> >>>>>>>>>>>>> commit.interval.ms
> >>>>>>>>>>>>>          )
> >>>>>>>>>>>>>          2. Might be worth considering a slightly more
> >> descriptive
> >>>>>> name
> >>>>>>>> for
> >>>>>>>>>>> this
> >>>>>>>>>>>>>          config. Most users probably don't think about,
> >>>>>>>>>>>>>          or may not even be aware of, the deletion of
> consumed
> >>>>>> records by
> >>>>>>>>>>> Kafka
> >>>>>>>>>>>>>          Streams, so calling it something along
> >>>>>>>>>>>>>          the lines of "
> repartition.records.delete.interval.ms"
> >> or
> >>>> "
> >>>>>>>>>>>>>          consumed.records.deletion.interval.ms" or so on
> will
> >> help
> >>>>>>>>>>>>>          make it clear what the config refers to and whether
> or
> >> not
> >>>>>> they
> >>>>>>>>>>> need to
> >>>>>>>>>>>>>          care. Maybe you can come up with better
> >>>>>>>>>>>>>          and/or shorter names, just wanted to suggest some
> >> example
> >>>>>> names
> >>>>>>>>>>> that I
> >>>>>>>>>>>>>          think sufficiently get the point across
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sophie
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> >>>>>> nick.telford@gmail.com
> >>>>>>>>>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
> >>>> solution
> >>>>>>>> is
> >>>>>>>>>>>>> very
> >>>>>>>>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Nick Telford
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Nick,

Dropping the `min.` prefix is fine with me.

A really minor nit-picking:

"The minimum frequency in milliseconds ..."

The unit of frequency is 1/s. What we are actually setting with this 
config is the period. So frequency is used a bit wrongly here. Since I 
discovered that we use the word "frequency" also for other similar 
configs, I leave it up to you if you want to change it in this case or 
not. I am fine either way.

Best,
Bruno

On 13.01.22 10:00, Nick Telford wrote:
> Hi Matthias,
> 
> You raise a good point: commit.interval.ms only specifies a minimum
> interval, because commits are powered by record flow through the Topology,
> if there are no input records, there will be no commits. The parallel here
> is that repartition record purges are powered by commits, and if there are
> no commits, there will be no record purges.
> 
> Bruno, does this satisfy your misgivings with the name
> repartition.purge.interval.ms?
> 
>> For the config description, I would remove
> 
>>> The default value is the same as the default for commit.interval.ms
> (30000).
> 
>> It seems unnecessary to me.
> 
> My intent was to communicate to users that the default has been set to
> match the commit interval, for compatibility. But since the default value
> is provided elsewhere, I can see why it's redundant. I'll remove that.
> 
> Thanks,
> 
> Nick
> 
> On Thu, 13 Jan 2022 at 03:14, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Thanks for the KIP! Very good discussion.
>>
>> I want to raise one point: I am not a fan of the `min.` prefix of the
>> config, because all configs like this are _minimums_. We also use
>> `commit.interval.ms` and not `min.commit.interval.ms` for example. I
>> would suggest to strip the `min.` prefix.
>>
>>
>> For the config description, I would remove
>>
>>> The default value is the same as the default for commit.interval.ms
>> (30000).
>>
>> It seems unnecessary to me.
>>
>>
>> -Matthias
>>
>>
>>
>> On 1/12/22 7:21 AM, Nick Telford wrote:
>>> Thanks Bruno, I've made those changes.
>>>
>>> I'll call a vote on the KIP later today.
>>>
>>> Regards,
>>>
>>> Nick Telford
>>>
>>> On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <ca...@apache.org> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Great!
>>>>
>>>> I think the KIP is ready for voting. I have just a couple of minor
>>>> comments.
>>>>
>>>> a.
>>>> In the config description, I would replace
>>>>
>>>> "Purging will occur after at least this value since the last purge, but
>>>> may be delayed until later in order to meet the processing guarantee."
>>>>
>>>> with
>>>>
>>>> "Purging will occur after at least this value since the last purge, but
>>>> may be delayed until later."
>>>>
>>>> I do not really understand what you mean with "meet the processing
>>>> guarantee" and I think it suffices to say that the purge might be
>> delayed.
>>>>
>>>>
>>>> b.
>>>> I would change the title to
>>>>
>>>> "Add config min.repartition.purge.interval.ms to Kafka Streams"
>>>>
>>>>
>>>> c.
>>>> The current rejected alternative leaks implementation details since it
>>>> refers to the coupling of purge to commit and we agreed to leave that
>>>> out of the KIP.
>>>>
>>>>
>>>> d.
>>>> Could you add my proposal to specify the config as a multiple of the
>>>> commit interval to the rejected alternatives with the reason why we
>>>> discarded it?
>>>>
>>>> For the rest, I am +1.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>>
>>>>
>>>> On 11.01.22 16:47, Nick Telford wrote:
>>>>> Hi Bruno,
>>>>>
>>>>> Thanks for your recommendations.
>>>>>
>>>>> I've made those changes, and also updated the description for the new
>>>>> config to read: "The minimum frequency in milliseconds with which to
>>>> delete
>>>>> fully consumed records from repartition topics. *Purging will occur
>> after
>>>>> at least this value since the last purge, but may be delayed until
>> later
>>>> in
>>>>> order to meet the processing guarantee.* The default value is the same
>> as
>>>>> the default for commit.interval.ms (30000).  (Note, unlike
>>>>> commit.interval.ms, the default for this value remains unchanged when
>>>>> processing.guarantee is set to exactly_once_v2)."
>>>>>
>>>>> This should make it clear that this is just a minimum interval, without
>>>>> leaking too much detail in to the specification.
>>>>>
>>>>> If there are no other issues, I'll put this to a vote.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Nick Telford
>>>>>
>>>>> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org>
>> wrote:
>>>>>
>>>>>> Hi Nick,
>>>>>>
>>>>>> Sorry for the delay!
>>>>>>
>>>>>> Regarding point 7, I am fine with keeping the config as an interval
>> and
>>>>>> keeping it independently of the commit interval. However, I would then
>>>>>> remove the following paragraph from the KIP:
>>>>>>
>>>>>> "We will still wait for a commit before explicitly deleting
>> repartition
>>>>>> records, but we will only do so if the time since the last record
>>>>>> deletion is at least repartition.purge.interval.ms. This means the
>>>>>> lower-bound for repartition.purge.interval.ms  is effectively capped
>> by
>>>>>> the value of commit.interval.ms."
>>>>>>
>>>>>> The reason is that in the previous paragraph you say that the configs
>>>>>> can be modified separately and then in this paragraph you bind the
>> purge
>>>>>> interval to the commit interval. This seems a contradiction and
>>>>>> indicates that you are leaking too much implementation details into
>> the
>>>>>> KIP. I think, just saying that the purge interval is a minimum and
>> name
>>>>>> it accordingly without talking about the actual implementation makes
>> the
>>>>>> KIP more robust against future implementation changes.
>>>>>>
>>>>>> My proposal for the config name is min.repartition.purge.interval.ms
>> or
>>>>>> even min.purge.interval.ms with a preference for the former.
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04.01.22 17:21, Nick Telford wrote:
>>>>>>>>
>>>>>>>> You missed one "delete.interval.ms" in the last paragraph in
>> section
>>>>>>>> "Proposed Changes".
>>>>>>>>
>>>>>>>
>>>>>>> I've fixed these now, including the title that I somehow missed!
>>>>>>>
>>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not
>> make
>>>>>>>
>>>>>>> sense to be able to tune repartition.purge.interval.ms and
>>>>>>>
>>>>>>> commit.interval.ms separately when the purge can only happen during
>> a
>>>>>>>
>>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>>>>>
>>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be
>> purged
>>>> at
>>>>>>>
>>>>>>> every second commit, i.e., every 60000 ms. What benefit do users have
>>>> to
>>>>>>>
>>>>>>> set repartition.purge.interval.ms separately from commit.interval.ms
>> ?
>>>>>>>
>>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>>>>>
>>>>>>> 2*commit.interval.ms..
>>>>>>>
>>>>>>>
>>>>>>> Could we address this by chaning the name of the configuration to
>>>>>> something
>>>>>>> like "repartition.purge.min.interval.ms", to indicate that the
>>>>>> repartition
>>>>>>> purge interval will be *at least* this value?
>>>>>>>
>>>>>>> If that's still not suitable, are there any other existing
>>>> configurations
>>>>>>> that behave in a similar way, i.e. dictate a multiple of another
>>>>>> interval,
>>>>>>> that we could use as a basis for a new name for this configuration?
>>>>>>>
>>>>>>> Additionally, I have a new point.
>>>>>>>
>>>>>>> 8. If user code has access to the processor context (e.g. in the
>>>>>>>
>>>>>>> processor API), a commit can also be requested on demand by user
>> code.
>>>>>>>
>>>>>>> The KIP should clarify if purges might also happen during requested
>>>>>>>
>>>>>>> commits or if purges only happen during automatic commits.
>>>>>>>
>>>>>>>
>>>>>>> Good point. I'm holding off on amending this for now until we agree
>> on
>>>> an
>>>>>>> outcome for point 7 above, because I suspect it may at least
>> influence
>>>>>> the
>>>>>>> wording of this section.
>>>>>>>
>>>>>>> Thanks for the feedback, and sorry for the delay. Now that the
>> holidays
>>>>>> are
>>>>>>> over I'm able to focus on this again.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Nick Telford
>>>>>>>
>>>>>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Thanks for the updates!
>>>>>>>>
>>>>>>>> The motivation section and the description of the new config is much
>>>>>>>> clearer and informative now!
>>>>>>>>
>>>>>>>> You missed one "delete.interval.ms" in the last paragraph in
>> section
>>>>>>>> "Proposed Changes".
>>>>>>>>
>>>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not
>> make
>>>>>>>> sense to be able to tune repartition.purge.interval.ms and
>>>>>>>> commit.interval.ms separately when the purge can only happen
>> during a
>>>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be
>> purged
>>>>>> at
>>>>>>>> every second commit, i.e., every 60000 ms. What benefit do users
>> have
>>>> to
>>>>>>>> set repartition.purge.interval.ms separately from
>> commit.interval.ms?
>>>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>>>>>> 2*commit.interval.ms..
>>>>>>>>
>>>>>>>> Additionally, I have a new point.
>>>>>>>> 8. If user code has access to the processor context (e.g. in the
>>>>>>>> processor API), a commit can also be requested on demand by user
>> code.
>>>>>>>> The KIP should clarify if purges might also happen during requested
>>>>>>>> commits or if purges only happen during automatic commits.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>> On 21.12.21 20:40, Nick Telford wrote:
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> Thanks for your feedback. I've made the suggested changes to the
>> KIP
>>>>>> (1,
>>>>>>>> 2,
>>>>>>>>> 3 and 5).
>>>>>>>>>
>>>>>>>>> For the new name, I've chosen repartition.purge.interval.ms, as I
>>>> felt
>>>>>>>> it
>>>>>>>>> struck a good balance between being self-descriptive and concise.
>>>>>> Please
>>>>>>>>> let me know if you'd prefer something else.
>>>>>>>>>
>>>>>>>>> On point 6: My PR has basic validation to ensure the value is
>>>> positive,
>>>>>>>> but
>>>>>>>>> I don't think it's necessary to have dynamic validation, to ensure
>>>> it's
>>>>>>>> not
>>>>>>>>> less than commit.interval.ms. The reason is that it will be
>>>> implicitly
>>>>>>>>> limited to that value anyway, and won't break anything. But I can
>> add
>>>>>> it
>>>>>>>> if
>>>>>>>>> you'd prefer it.
>>>>>>>>>
>>>>>>>>> On point 7: I worry that defaulting it to follow the value of
>>>>>>>>> commit.interval.ms may confuse users, who will likely expect the
>>>>>>>> default to
>>>>>>>>> not be affected by changes to other configuration options. I can
>> see
>>>>>> the
>>>>>>>>> appeal of retaining the existing behaviour (following the commit
>>>>>>>> interval)
>>>>>>>>> by default, but I believe that the majority of users who customize
>>>>>>>>> commit.interval.ms do not desire a different frequency of
>>>> repartition
>>>>>>>>> record purging as well.
>>>>>>>>>
>>>>>>>>> As for multiples of commit interval: I think the argument against
>>>> that
>>>>>> is
>>>>>>>>> that an interval is more intuitive when given as a time, rather
>> than
>>>>>> as a
>>>>>>>>> multiple of some other operation. Users configuring this should not
>>>>>> need
>>>>>>>> to
>>>>>>>>> break out a calculator to work out how frequently the records are
>>>> going
>>>>>>>> to
>>>>>>>>> be purged!
>>>>>>>>>
>>>>>>>>> I've also updated the PR with the relevant changes.
>>>>>>>>>
>>>>>>>>> BTW, for some reason I didn't receive Sophie's email. I'll
>>>> periodically
>>>>>>>>> check the thread on the archive to ensure I don't miss any more of
>>>> your
>>>>>>>>> messages!
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Bruno.
>>>>>>>>>>
>>>>>>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
>>>>>>>>>> @Nick , please update my point 4 in PR directly.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Luke
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cadonna@apache.org
>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Nick,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the KIP!
>>>>>>>>>>>
>>>>>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I
>> agree
>>>>>>>> that
>>>>>>>>>>> we should update the docs for commit.interval.ms but I am not
>> sure
>>>>>> if
>>>>>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
>>>>>>>>>>> discussion. Also on point 2, I agree that we need to add a doc
>>>> string
>>>>>>>>>>> but the content should be exemplary not binding. What I want to
>> say
>>>>>> is
>>>>>>>>>>> that, we do not need a KIP to change docs.
>>>>>>>>>>>
>>>>>>>>>>> Here my points:
>>>>>>>>>>>
>>>>>>>>>>> 5. Could you specify in the motivation that the KIP is about
>>>> deleting
>>>>>>>>>>> records from repartition topics? Maybe with a short description
>>>> when
>>>>>>>> why
>>>>>>>>>>> and when records are deleted from the repartition topics. For us
>> it
>>>>>>>>>>> might be clear, but IMO we should try to write KIPs so that
>> someone
>>>>>>>> that
>>>>>>>>>>> is relatively new to Kafka Streams can understand the KIP without
>>>>>>>>>>> needing to know too much background.
>>>>>>>>>>>
>>>>>>>>>>> 6. Does the config need to be validated? For example, does
>>>>>>>>>>> delete.interval.ms need to be greater than or equal to
>>>>>>>>>> commit.interval.ms?
>>>>>>>>>>>
>>>>>>>>>>> 7. Should the default value for non-EOS be 30s or the same value
>> as
>>>>>>>>>>> commit.interval.ms? I am just thinking about the case where a
>> user
>>>>>>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms
>>>> (or
>>>>>>>>>>> whatever name you come up for it). Once delete.interval.ms is
>> set
>>>>>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar
>> could
>>>> be
>>>>>>>>>>> done for the EOS case.
>>>>>>>>>>> Alternatively, we could also define delete.interval.ms to take a
>>>>>>>>>>> integral number without a unit that specifies after how many
>> commit
>>>>>>>>>>> intervals the records in repartition topics should be deleted.
>> This
>>>>>>>>>>> would make sense since delete.interval.ms is tightly bound to
>>>>>>>>>>> commit.interval.ms. Additionally, it would make the semantics of
>>>> the
>>>>>>>>>>> config simpler. The name of the config should definitely change
>> if
>>>> we
>>>>>>>> go
>>>>>>>>>>> down this way.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Bruno
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 21.12.21 11:14, Luke Chen wrote:
>>>>>>>>>>>> Hi Nick,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>>
>>>>>>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
>>>>>>>>>>>> 3. I think you should mention the behavior change *explicitly*
>> in
>>>>>>>>>>>> "Compatibility" section. I know you already mentioned it in KIP,
>>>> in
>>>>>>>> the
>>>>>>>>>>>> benefit way. But I think in this section, we should clearly
>> point
>>>>>> out
>>>>>>>>>>> which
>>>>>>>>>>>> behavior will be change after this KIP. That is, you should put
>> it
>>>>>>>>>> clear
>>>>>>>>>>>> that the delete record interval will change from 100ms to 30s
>> with
>>>>>> EOS
>>>>>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html
>> doc.
>>>>>>>>>>>> 4. Since this new config has some relationship with
>>>>>>>> commit.interval.ms
>>>>>>>>>> ,
>>>>>>>>>>> I
>>>>>>>>>>>> think we should also update the doc description for `
>>>>>>>>>> commit.interval.ms
>>>>>>>>>>> `,
>>>>>>>>>>>> to let user know there's another config to control delete
>> interval
>>>>>> and
>>>>>>>>>>>> should be greater than commit.interval.ms. Something like that.
>>>>>> WDYT?
>>>>>>>>>>> (You
>>>>>>>>>>>> should put this change in the KIP as Sophie mentioned)
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>> Luke
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
>>>>>>>>>>>> <so...@confluent.io.invalid> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Nick,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think you forgot to link to the KIP document, but I take it
>>>> this
>>>>>> is
>>>>>>>>>>>>> it: KIP-811:
>>>>>>>>>>>>> Add separate delete.interval.ms to Kafka Streams
>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The overall proposal sounds good to me, just a few minor
>> things:
>>>>>>>>>>>>>
>>>>>>>>>>>>>          1. Please specify everything needed to define this
>> config
>>>>>>>>>>> explicitly, ie
>>>>>>>>>>>>>          all the arguments that will be passed in to the
>>>>>>>>>>>>>          StreamsConfig's ConfigDef: in addition to the default
>>>> value,
>>>>>> we
>>>>>>>>>>> need the
>>>>>>>>>>>>>          config type (presumably a Long), the doc
>>>>>>>>>>>>>          string, and the importance (probably "low", similar to
>>>>>>>>>>>>> commit.interval.ms
>>>>>>>>>>>>>          )
>>>>>>>>>>>>>          2. Might be worth considering a slightly more
>> descriptive
>>>>>> name
>>>>>>>> for
>>>>>>>>>>> this
>>>>>>>>>>>>>          config. Most users probably don't think about,
>>>>>>>>>>>>>          or may not even be aware of, the deletion of consumed
>>>>>> records by
>>>>>>>>>>> Kafka
>>>>>>>>>>>>>          Streams, so calling it something along
>>>>>>>>>>>>>          the lines of "repartition.records.delete.interval.ms"
>> or
>>>> "
>>>>>>>>>>>>>          consumed.records.deletion.interval.ms" or so on will
>> help
>>>>>>>>>>>>>          make it clear what the config refers to and whether or
>> not
>>>>>> they
>>>>>>>>>>> need to
>>>>>>>>>>>>>          care. Maybe you can come up with better
>>>>>>>>>>>>>          and/or shorter names, just wanted to suggest some
>> example
>>>>>> names
>>>>>>>>>>> that I
>>>>>>>>>>>>>          think sufficiently get the point across
>>>>>>>>>>>>>
>>>>>>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
>>>>>> nick.telford@gmail.com
>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
>>>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
>>>> solution
>>>>>>>> is
>>>>>>>>>>>>> very
>>>>>>>>>>>>>> simple, so the KIP is pretty short.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The suggested changes are implemented by this PR
>>>>>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nick Telford
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
Hi Matthias,

You raise a good point: commit.interval.ms only specifies a minimum
interval, because commits are powered by record flow through the Topology,
if there are no input records, there will be no commits. The parallel here
is that repartition record purges are powered by commits, and if there are
no commits, there will be no record purges.

Bruno, does this satisfy your misgivings with the name
repartition.purge.interval.ms?

> For the config description, I would remove

> > The default value is the same as the default for commit.interval.ms
(30000).

> It seems unnecessary to me.

My intent was to communicate to users that the default has been set to
match the commit interval, for compatibility. But since the default value
is provided elsewhere, I can see why it's redundant. I'll remove that.

Thanks,

Nick

On Thu, 13 Jan 2022 at 03:14, Matthias J. Sax <mj...@apache.org> wrote:

> Thanks for the KIP! Very good discussion.
>
> I want to raise one point: I am not a fan of the `min.` prefix of the
> config, because all configs like this are _minimums_. We also use
> `commit.interval.ms` and not `min.commit.interval.ms` for example. I
> would suggest to strip the `min.` prefix.
>
>
> For the config description, I would remove
>
> > The default value is the same as the default for commit.interval.ms
> (30000).
>
> It seems unnecessary to me.
>
>
> -Matthias
>
>
>
> On 1/12/22 7:21 AM, Nick Telford wrote:
> > Thanks Bruno, I've made those changes.
> >
> > I'll call a vote on the KIP later today.
> >
> > Regards,
> >
> > Nick Telford
> >
> > On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <ca...@apache.org> wrote:
> >
> >> Hi Nick,
> >>
> >> Great!
> >>
> >> I think the KIP is ready for voting. I have just a couple of minor
> >> comments.
> >>
> >> a.
> >> In the config description, I would replace
> >>
> >> "Purging will occur after at least this value since the last purge, but
> >> may be delayed until later in order to meet the processing guarantee."
> >>
> >> with
> >>
> >> "Purging will occur after at least this value since the last purge, but
> >> may be delayed until later."
> >>
> >> I do not really understand what you mean with "meet the processing
> >> guarantee" and I think it suffices to say that the purge might be
> delayed.
> >>
> >>
> >> b.
> >> I would change the title to
> >>
> >> "Add config min.repartition.purge.interval.ms to Kafka Streams"
> >>
> >>
> >> c.
> >> The current rejected alternative leaks implementation details since it
> >> refers to the coupling of purge to commit and we agreed to leave that
> >> out of the KIP.
> >>
> >>
> >> d.
> >> Could you add my proposal to specify the config as a multiple of the
> >> commit interval to the rejected alternatives with the reason why we
> >> discarded it?
> >>
> >> For the rest, I am +1.
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >>
> >> On 11.01.22 16:47, Nick Telford wrote:
> >>> Hi Bruno,
> >>>
> >>> Thanks for your recommendations.
> >>>
> >>> I've made those changes, and also updated the description for the new
> >>> config to read: "The minimum frequency in milliseconds with which to
> >> delete
> >>> fully consumed records from repartition topics. *Purging will occur
> after
> >>> at least this value since the last purge, but may be delayed until
> later
> >> in
> >>> order to meet the processing guarantee.* The default value is the same
> as
> >>> the default for commit.interval.ms (30000).  (Note, unlike
> >>> commit.interval.ms, the default for this value remains unchanged when
> >>> processing.guarantee is set to exactly_once_v2)."
> >>>
> >>> This should make it clear that this is just a minimum interval, without
> >>> leaking too much detail in to the specification.
> >>>
> >>> If there are no other issues, I'll put this to a vote.
> >>>
> >>> Regards,
> >>>
> >>> Nick Telford
> >>>
> >>> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org>
> wrote:
> >>>
> >>>> Hi Nick,
> >>>>
> >>>> Sorry for the delay!
> >>>>
> >>>> Regarding point 7, I am fine with keeping the config as an interval
> and
> >>>> keeping it independently of the commit interval. However, I would then
> >>>> remove the following paragraph from the KIP:
> >>>>
> >>>> "We will still wait for a commit before explicitly deleting
> repartition
> >>>> records, but we will only do so if the time since the last record
> >>>> deletion is at least repartition.purge.interval.ms. This means the
> >>>> lower-bound for repartition.purge.interval.ms  is effectively capped
> by
> >>>> the value of commit.interval.ms."
> >>>>
> >>>> The reason is that in the previous paragraph you say that the configs
> >>>> can be modified separately and then in this paragraph you bind the
> purge
> >>>> interval to the commit interval. This seems a contradiction and
> >>>> indicates that you are leaking too much implementation details into
> the
> >>>> KIP. I think, just saying that the purge interval is a minimum and
> name
> >>>> it accordingly without talking about the actual implementation makes
> the
> >>>> KIP more robust against future implementation changes.
> >>>>
> >>>> My proposal for the config name is min.repartition.purge.interval.ms
> or
> >>>> even min.purge.interval.ms with a preference for the former.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>>
> >>>>
> >>>> On 04.01.22 17:21, Nick Telford wrote:
> >>>>>>
> >>>>>> You missed one "delete.interval.ms" in the last paragraph in
> section
> >>>>>> "Proposed Changes".
> >>>>>>
> >>>>>
> >>>>> I've fixed these now, including the title that I somehow missed!
> >>>>>
> >>>>> I am afraid, I again need to comment on point 7. IMO, it does not
> make
> >>>>>
> >>>>> sense to be able to tune repartition.purge.interval.ms and
> >>>>>
> >>>>> commit.interval.ms separately when the purge can only happen during
> a
> >>>>>
> >>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>>>
> >>>>> repartition.purge.interval.ms to 35000 ms, the records will be
> purged
> >> at
> >>>>>
> >>>>> every second commit, i.e., every 60000 ms. What benefit do users have
> >> to
> >>>>>
> >>>>> set repartition.purge.interval.ms separately from commit.interval.ms
> ?
> >>>>>
> >>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>>>
> >>>>> 2*commit.interval.ms..
> >>>>>
> >>>>>
> >>>>> Could we address this by chaning the name of the configuration to
> >>>> something
> >>>>> like "repartition.purge.min.interval.ms", to indicate that the
> >>>> repartition
> >>>>> purge interval will be *at least* this value?
> >>>>>
> >>>>> If that's still not suitable, are there any other existing
> >> configurations
> >>>>> that behave in a similar way, i.e. dictate a multiple of another
> >>>> interval,
> >>>>> that we could use as a basis for a new name for this configuration?
> >>>>>
> >>>>> Additionally, I have a new point.
> >>>>>
> >>>>> 8. If user code has access to the processor context (e.g. in the
> >>>>>
> >>>>> processor API), a commit can also be requested on demand by user
> code.
> >>>>>
> >>>>> The KIP should clarify if purges might also happen during requested
> >>>>>
> >>>>> commits or if purges only happen during automatic commits.
> >>>>>
> >>>>>
> >>>>> Good point. I'm holding off on amending this for now until we agree
> on
> >> an
> >>>>> outcome for point 7 above, because I suspect it may at least
> influence
> >>>> the
> >>>>> wording of this section.
> >>>>>
> >>>>> Thanks for the feedback, and sorry for the delay. Now that the
> holidays
> >>>> are
> >>>>> over I'm able to focus on this again.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Nick Telford
> >>>>>
> >>>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> Thanks for the updates!
> >>>>>>
> >>>>>> The motivation section and the description of the new config is much
> >>>>>> clearer and informative now!
> >>>>>>
> >>>>>> You missed one "delete.interval.ms" in the last paragraph in
> section
> >>>>>> "Proposed Changes".
> >>>>>>
> >>>>>> I am afraid, I again need to comment on point 7. IMO, it does not
> make
> >>>>>> sense to be able to tune repartition.purge.interval.ms and
> >>>>>> commit.interval.ms separately when the purge can only happen
> during a
> >>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>>>> repartition.purge.interval.ms to 35000 ms, the records will be
> purged
> >>>> at
> >>>>>> every second commit, i.e., every 60000 ms. What benefit do users
> have
> >> to
> >>>>>> set repartition.purge.interval.ms separately from
> commit.interval.ms?
> >>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>>>> 2*commit.interval.ms..
> >>>>>>
> >>>>>> Additionally, I have a new point.
> >>>>>> 8. If user code has access to the processor context (e.g. in the
> >>>>>> processor API), a commit can also be requested on demand by user
> code.
> >>>>>> The KIP should clarify if purges might also happen during requested
> >>>>>> commits or if purges only happen during automatic commits.
> >>>>>>
> >>>>>> Best,
> >>>>>> Bruno
> >>>>>>
> >>>>>> On 21.12.21 20:40, Nick Telford wrote:
> >>>>>>> Hi everyone,
> >>>>>>>
> >>>>>>> Thanks for your feedback. I've made the suggested changes to the
> KIP
> >>>> (1,
> >>>>>> 2,
> >>>>>>> 3 and 5).
> >>>>>>>
> >>>>>>> For the new name, I've chosen repartition.purge.interval.ms, as I
> >> felt
> >>>>>> it
> >>>>>>> struck a good balance between being self-descriptive and concise.
> >>>> Please
> >>>>>>> let me know if you'd prefer something else.
> >>>>>>>
> >>>>>>> On point 6: My PR has basic validation to ensure the value is
> >> positive,
> >>>>>> but
> >>>>>>> I don't think it's necessary to have dynamic validation, to ensure
> >> it's
> >>>>>> not
> >>>>>>> less than commit.interval.ms. The reason is that it will be
> >> implicitly
> >>>>>>> limited to that value anyway, and won't break anything. But I can
> add
> >>>> it
> >>>>>> if
> >>>>>>> you'd prefer it.
> >>>>>>>
> >>>>>>> On point 7: I worry that defaulting it to follow the value of
> >>>>>>> commit.interval.ms may confuse users, who will likely expect the
> >>>>>> default to
> >>>>>>> not be affected by changes to other configuration options. I can
> see
> >>>> the
> >>>>>>> appeal of retaining the existing behaviour (following the commit
> >>>>>> interval)
> >>>>>>> by default, but I believe that the majority of users who customize
> >>>>>>> commit.interval.ms do not desire a different frequency of
> >> repartition
> >>>>>>> record purging as well.
> >>>>>>>
> >>>>>>> As for multiples of commit interval: I think the argument against
> >> that
> >>>> is
> >>>>>>> that an interval is more intuitive when given as a time, rather
> than
> >>>> as a
> >>>>>>> multiple of some other operation. Users configuring this should not
> >>>> need
> >>>>>> to
> >>>>>>> break out a calculator to work out how frequently the records are
> >> going
> >>>>>> to
> >>>>>>> be purged!
> >>>>>>>
> >>>>>>> I've also updated the PR with the relevant changes.
> >>>>>>>
> >>>>>>> BTW, for some reason I didn't receive Sophie's email. I'll
> >> periodically
> >>>>>>> check the thread on the archive to ensure I don't miss any more of
> >> your
> >>>>>>> messages!
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Nick
> >>>>>>>
> >>>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Thanks, Bruno.
> >>>>>>>>
> >>>>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
> >>>>>>>> @Nick , please update my point 4 in PR directly.
> >>>>>>>>
> >>>>>>>> Thank you.
> >>>>>>>> Luke
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cadonna@apache.org
> >
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Nick,
> >>>>>>>>>
> >>>>>>>>> Thank you for the KIP!
> >>>>>>>>>
> >>>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I
> agree
> >>>>>> that
> >>>>>>>>> we should update the docs for commit.interval.ms but I am not
> sure
> >>>> if
> >>>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>>>>>> discussion. Also on point 2, I agree that we need to add a doc
> >> string
> >>>>>>>>> but the content should be exemplary not binding. What I want to
> say
> >>>> is
> >>>>>>>>> that, we do not need a KIP to change docs.
> >>>>>>>>>
> >>>>>>>>> Here my points:
> >>>>>>>>>
> >>>>>>>>> 5. Could you specify in the motivation that the KIP is about
> >> deleting
> >>>>>>>>> records from repartition topics? Maybe with a short description
> >> when
> >>>>>> why
> >>>>>>>>> and when records are deleted from the repartition topics. For us
> it
> >>>>>>>>> might be clear, but IMO we should try to write KIPs so that
> someone
> >>>>>> that
> >>>>>>>>> is relatively new to Kafka Streams can understand the KIP without
> >>>>>>>>> needing to know too much background.
> >>>>>>>>>
> >>>>>>>>> 6. Does the config need to be validated? For example, does
> >>>>>>>>> delete.interval.ms need to be greater than or equal to
> >>>>>>>> commit.interval.ms?
> >>>>>>>>>
> >>>>>>>>> 7. Should the default value for non-EOS be 30s or the same value
> as
> >>>>>>>>> commit.interval.ms? I am just thinking about the case where a
> user
> >>>>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms
> >> (or
> >>>>>>>>> whatever name you come up for it). Once delete.interval.ms is
> set
> >>>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar
> could
> >> be
> >>>>>>>>> done for the EOS case.
> >>>>>>>>> Alternatively, we could also define delete.interval.ms to take a
> >>>>>>>>> integral number without a unit that specifies after how many
> commit
> >>>>>>>>> intervals the records in repartition topics should be deleted.
> This
> >>>>>>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>>>>>> commit.interval.ms. Additionally, it would make the semantics of
> >> the
> >>>>>>>>> config simpler. The name of the config should definitely change
> if
> >> we
> >>>>>> go
> >>>>>>>>> down this way.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Bruno
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>>>>>> Hi Nick,
> >>>>>>>>>>
> >>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>
> >>>>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>>>>>> 3. I think you should mention the behavior change *explicitly*
> in
> >>>>>>>>>> "Compatibility" section. I know you already mentioned it in KIP,
> >> in
> >>>>>> the
> >>>>>>>>>> benefit way. But I think in this section, we should clearly
> point
> >>>> out
> >>>>>>>>> which
> >>>>>>>>>> behavior will be change after this KIP. That is, you should put
> it
> >>>>>>>> clear
> >>>>>>>>>> that the delete record interval will change from 100ms to 30s
> with
> >>>> EOS
> >>>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html
> doc.
> >>>>>>>>>> 4. Since this new config has some relationship with
> >>>>>> commit.interval.ms
> >>>>>>>> ,
> >>>>>>>>> I
> >>>>>>>>>> think we should also update the doc description for `
> >>>>>>>> commit.interval.ms
> >>>>>>>>> `,
> >>>>>>>>>> to let user know there's another config to control delete
> interval
> >>>> and
> >>>>>>>>>> should be greater than commit.interval.ms. Something like that.
> >>>> WDYT?
> >>>>>>>>> (You
> >>>>>>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>>>>>
> >>>>>>>>>> Thank you.
> >>>>>>>>>> Luke
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>>>>>> <so...@confluent.io.invalid> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hey Nick,
> >>>>>>>>>>>
> >>>>>>>>>>> I think you forgot to link to the KIP document, but I take it
> >> this
> >>>> is
> >>>>>>>>>>> it: KIP-811:
> >>>>>>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>>>>>
> >>>>>>>>>>> The overall proposal sounds good to me, just a few minor
> things:
> >>>>>>>>>>>
> >>>>>>>>>>>         1. Please specify everything needed to define this
> config
> >>>>>>>>> explicitly, ie
> >>>>>>>>>>>         all the arguments that will be passed in to the
> >>>>>>>>>>>         StreamsConfig's ConfigDef: in addition to the default
> >> value,
> >>>> we
> >>>>>>>>> need the
> >>>>>>>>>>>         config type (presumably a Long), the doc
> >>>>>>>>>>>         string, and the importance (probably "low", similar to
> >>>>>>>>>>> commit.interval.ms
> >>>>>>>>>>>         )
> >>>>>>>>>>>         2. Might be worth considering a slightly more
> descriptive
> >>>> name
> >>>>>> for
> >>>>>>>>> this
> >>>>>>>>>>>         config. Most users probably don't think about,
> >>>>>>>>>>>         or may not even be aware of, the deletion of consumed
> >>>> records by
> >>>>>>>>> Kafka
> >>>>>>>>>>>         Streams, so calling it something along
> >>>>>>>>>>>         the lines of "repartition.records.delete.interval.ms"
> or
> >> "
> >>>>>>>>>>>         consumed.records.deletion.interval.ms" or so on will
> help
> >>>>>>>>>>>         make it clear what the config refers to and whether or
> not
> >>>> they
> >>>>>>>>> need to
> >>>>>>>>>>>         care. Maybe you can come up with better
> >>>>>>>>>>>         and/or shorter names, just wanted to suggest some
> example
> >>>> names
> >>>>>>>>> that I
> >>>>>>>>>>>         think sufficiently get the point across
> >>>>>>>>>>>
> >>>>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>>>>>
> >>>>>>>>>>> Sophie
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> >>>> nick.telford@gmail.com
> >>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
> >> solution
> >>>>>> is
> >>>>>>>>>>> very
> >>>>>>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Nick Telford
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by "Matthias J. Sax" <mj...@apache.org>.
Thanks for the KIP! Very good discussion.

I want to raise one point: I am not a fan of the `min.` prefix of the 
config, because all configs like this are _minimums_. We also use 
`commit.interval.ms` and not `min.commit.interval.ms` for example. I 
would suggest to strip the `min.` prefix.


For the config description, I would remove

> The default value is the same as the default for commit.interval.ms (30000).

It seems unnecessary to me.


-Matthias



On 1/12/22 7:21 AM, Nick Telford wrote:
> Thanks Bruno, I've made those changes.
> 
> I'll call a vote on the KIP later today.
> 
> Regards,
> 
> Nick Telford
> 
> On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <ca...@apache.org> wrote:
> 
>> Hi Nick,
>>
>> Great!
>>
>> I think the KIP is ready for voting. I have just a couple of minor
>> comments.
>>
>> a.
>> In the config description, I would replace
>>
>> "Purging will occur after at least this value since the last purge, but
>> may be delayed until later in order to meet the processing guarantee."
>>
>> with
>>
>> "Purging will occur after at least this value since the last purge, but
>> may be delayed until later."
>>
>> I do not really understand what you mean with "meet the processing
>> guarantee" and I think it suffices to say that the purge might be delayed.
>>
>>
>> b.
>> I would change the title to
>>
>> "Add config min.repartition.purge.interval.ms to Kafka Streams"
>>
>>
>> c.
>> The current rejected alternative leaks implementation details since it
>> refers to the coupling of purge to commit and we agreed to leave that
>> out of the KIP.
>>
>>
>> d.
>> Could you add my proposal to specify the config as a multiple of the
>> commit interval to the rejected alternatives with the reason why we
>> discarded it?
>>
>> For the rest, I am +1.
>>
>> Best,
>> Bruno
>>
>>
>>
>> On 11.01.22 16:47, Nick Telford wrote:
>>> Hi Bruno,
>>>
>>> Thanks for your recommendations.
>>>
>>> I've made those changes, and also updated the description for the new
>>> config to read: "The minimum frequency in milliseconds with which to
>> delete
>>> fully consumed records from repartition topics. *Purging will occur after
>>> at least this value since the last purge, but may be delayed until later
>> in
>>> order to meet the processing guarantee.* The default value is the same as
>>> the default for commit.interval.ms (30000).  (Note, unlike
>>> commit.interval.ms, the default for this value remains unchanged when
>>> processing.guarantee is set to exactly_once_v2)."
>>>
>>> This should make it clear that this is just a minimum interval, without
>>> leaking too much detail in to the specification.
>>>
>>> If there are no other issues, I'll put this to a vote.
>>>
>>> Regards,
>>>
>>> Nick Telford
>>>
>>> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Sorry for the delay!
>>>>
>>>> Regarding point 7, I am fine with keeping the config as an interval and
>>>> keeping it independently of the commit interval. However, I would then
>>>> remove the following paragraph from the KIP:
>>>>
>>>> "We will still wait for a commit before explicitly deleting repartition
>>>> records, but we will only do so if the time since the last record
>>>> deletion is at least repartition.purge.interval.ms. This means the
>>>> lower-bound for repartition.purge.interval.ms  is effectively capped by
>>>> the value of commit.interval.ms."
>>>>
>>>> The reason is that in the previous paragraph you say that the configs
>>>> can be modified separately and then in this paragraph you bind the purge
>>>> interval to the commit interval. This seems a contradiction and
>>>> indicates that you are leaking too much implementation details into the
>>>> KIP. I think, just saying that the purge interval is a minimum and name
>>>> it accordingly without talking about the actual implementation makes the
>>>> KIP more robust against future implementation changes.
>>>>
>>>> My proposal for the config name is min.repartition.purge.interval.ms or
>>>> even min.purge.interval.ms with a preference for the former.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>>
>>>>
>>>> On 04.01.22 17:21, Nick Telford wrote:
>>>>>>
>>>>>> You missed one "delete.interval.ms" in the last paragraph in section
>>>>>> "Proposed Changes".
>>>>>>
>>>>>
>>>>> I've fixed these now, including the title that I somehow missed!
>>>>>
>>>>> I am afraid, I again need to comment on point 7. IMO, it does not make
>>>>>
>>>>> sense to be able to tune repartition.purge.interval.ms and
>>>>>
>>>>> commit.interval.ms separately when the purge can only happen during a
>>>>>
>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>>>
>>>>> repartition.purge.interval.ms to 35000 ms, the records will be purged
>> at
>>>>>
>>>>> every second commit, i.e., every 60000 ms. What benefit do users have
>> to
>>>>>
>>>>> set repartition.purge.interval.ms separately from commit.interval.ms?
>>>>>
>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>>>
>>>>> 2*commit.interval.ms..
>>>>>
>>>>>
>>>>> Could we address this by chaning the name of the configuration to
>>>> something
>>>>> like "repartition.purge.min.interval.ms", to indicate that the
>>>> repartition
>>>>> purge interval will be *at least* this value?
>>>>>
>>>>> If that's still not suitable, are there any other existing
>> configurations
>>>>> that behave in a similar way, i.e. dictate a multiple of another
>>>> interval,
>>>>> that we could use as a basis for a new name for this configuration?
>>>>>
>>>>> Additionally, I have a new point.
>>>>>
>>>>> 8. If user code has access to the processor context (e.g. in the
>>>>>
>>>>> processor API), a commit can also be requested on demand by user code.
>>>>>
>>>>> The KIP should clarify if purges might also happen during requested
>>>>>
>>>>> commits or if purges only happen during automatic commits.
>>>>>
>>>>>
>>>>> Good point. I'm holding off on amending this for now until we agree on
>> an
>>>>> outcome for point 7 above, because I suspect it may at least influence
>>>> the
>>>>> wording of this section.
>>>>>
>>>>> Thanks for the feedback, and sorry for the delay. Now that the holidays
>>>> are
>>>>> over I'm able to focus on this again.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Nick Telford
>>>>>
>>>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org>
>> wrote:
>>>>>
>>>>>> Hi Nick,
>>>>>>
>>>>>> Thanks for the updates!
>>>>>>
>>>>>> The motivation section and the description of the new config is much
>>>>>> clearer and informative now!
>>>>>>
>>>>>> You missed one "delete.interval.ms" in the last paragraph in section
>>>>>> "Proposed Changes".
>>>>>>
>>>>>> I am afraid, I again need to comment on point 7. IMO, it does not make
>>>>>> sense to be able to tune repartition.purge.interval.ms and
>>>>>> commit.interval.ms separately when the purge can only happen during a
>>>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>>>> repartition.purge.interval.ms to 35000 ms, the records will be purged
>>>> at
>>>>>> every second commit, i.e., every 60000 ms. What benefit do users have
>> to
>>>>>> set repartition.purge.interval.ms separately from commit.interval.ms?
>>>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>>>> 2*commit.interval.ms..
>>>>>>
>>>>>> Additionally, I have a new point.
>>>>>> 8. If user code has access to the processor context (e.g. in the
>>>>>> processor API), a commit can also be requested on demand by user code.
>>>>>> The KIP should clarify if purges might also happen during requested
>>>>>> commits or if purges only happen during automatic commits.
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>> On 21.12.21 20:40, Nick Telford wrote:
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Thanks for your feedback. I've made the suggested changes to the KIP
>>>> (1,
>>>>>> 2,
>>>>>>> 3 and 5).
>>>>>>>
>>>>>>> For the new name, I've chosen repartition.purge.interval.ms, as I
>> felt
>>>>>> it
>>>>>>> struck a good balance between being self-descriptive and concise.
>>>> Please
>>>>>>> let me know if you'd prefer something else.
>>>>>>>
>>>>>>> On point 6: My PR has basic validation to ensure the value is
>> positive,
>>>>>> but
>>>>>>> I don't think it's necessary to have dynamic validation, to ensure
>> it's
>>>>>> not
>>>>>>> less than commit.interval.ms. The reason is that it will be
>> implicitly
>>>>>>> limited to that value anyway, and won't break anything. But I can add
>>>> it
>>>>>> if
>>>>>>> you'd prefer it.
>>>>>>>
>>>>>>> On point 7: I worry that defaulting it to follow the value of
>>>>>>> commit.interval.ms may confuse users, who will likely expect the
>>>>>> default to
>>>>>>> not be affected by changes to other configuration options. I can see
>>>> the
>>>>>>> appeal of retaining the existing behaviour (following the commit
>>>>>> interval)
>>>>>>> by default, but I believe that the majority of users who customize
>>>>>>> commit.interval.ms do not desire a different frequency of
>> repartition
>>>>>>> record purging as well.
>>>>>>>
>>>>>>> As for multiples of commit interval: I think the argument against
>> that
>>>> is
>>>>>>> that an interval is more intuitive when given as a time, rather than
>>>> as a
>>>>>>> multiple of some other operation. Users configuring this should not
>>>> need
>>>>>> to
>>>>>>> break out a calculator to work out how frequently the records are
>> going
>>>>>> to
>>>>>>> be purged!
>>>>>>>
>>>>>>> I've also updated the PR with the relevant changes.
>>>>>>>
>>>>>>> BTW, for some reason I didn't receive Sophie's email. I'll
>> periodically
>>>>>>> check the thread on the archive to ensure I don't miss any more of
>> your
>>>>>>> messages!
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, Bruno.
>>>>>>>>
>>>>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
>>>>>>>> @Nick , please update my point 4 in PR directly.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Luke
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Nick,
>>>>>>>>>
>>>>>>>>> Thank you for the KIP!
>>>>>>>>>
>>>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
>>>>>> that
>>>>>>>>> we should update the docs for commit.interval.ms but I am not sure
>>>> if
>>>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
>>>>>>>>> discussion. Also on point 2, I agree that we need to add a doc
>> string
>>>>>>>>> but the content should be exemplary not binding. What I want to say
>>>> is
>>>>>>>>> that, we do not need a KIP to change docs.
>>>>>>>>>
>>>>>>>>> Here my points:
>>>>>>>>>
>>>>>>>>> 5. Could you specify in the motivation that the KIP is about
>> deleting
>>>>>>>>> records from repartition topics? Maybe with a short description
>> when
>>>>>> why
>>>>>>>>> and when records are deleted from the repartition topics. For us it
>>>>>>>>> might be clear, but IMO we should try to write KIPs so that someone
>>>>>> that
>>>>>>>>> is relatively new to Kafka Streams can understand the KIP without
>>>>>>>>> needing to know too much background.
>>>>>>>>>
>>>>>>>>> 6. Does the config need to be validated? For example, does
>>>>>>>>> delete.interval.ms need to be greater than or equal to
>>>>>>>> commit.interval.ms?
>>>>>>>>>
>>>>>>>>> 7. Should the default value for non-EOS be 30s or the same value as
>>>>>>>>> commit.interval.ms? I am just thinking about the case where a user
>>>>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms
>> (or
>>>>>>>>> whatever name you come up for it). Once delete.interval.ms is set
>>>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar could
>> be
>>>>>>>>> done for the EOS case.
>>>>>>>>> Alternatively, we could also define delete.interval.ms to take a
>>>>>>>>> integral number without a unit that specifies after how many commit
>>>>>>>>> intervals the records in repartition topics should be deleted. This
>>>>>>>>> would make sense since delete.interval.ms is tightly bound to
>>>>>>>>> commit.interval.ms. Additionally, it would make the semantics of
>> the
>>>>>>>>> config simpler. The name of the config should definitely change if
>> we
>>>>>> go
>>>>>>>>> down this way.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Bruno
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21.12.21 11:14, Luke Chen wrote:
>>>>>>>>>> Hi Nick,
>>>>>>>>>>
>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>
>>>>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
>>>>>>>>>> 3. I think you should mention the behavior change *explicitly* in
>>>>>>>>>> "Compatibility" section. I know you already mentioned it in KIP,
>> in
>>>>>> the
>>>>>>>>>> benefit way. But I think in this section, we should clearly point
>>>> out
>>>>>>>>> which
>>>>>>>>>> behavior will be change after this KIP. That is, you should put it
>>>>>>>> clear
>>>>>>>>>> that the delete record interval will change from 100ms to 30s with
>>>> EOS
>>>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
>>>>>>>>>> 4. Since this new config has some relationship with
>>>>>> commit.interval.ms
>>>>>>>> ,
>>>>>>>>> I
>>>>>>>>>> think we should also update the doc description for `
>>>>>>>> commit.interval.ms
>>>>>>>>> `,
>>>>>>>>>> to let user know there's another config to control delete interval
>>>> and
>>>>>>>>>> should be greater than commit.interval.ms. Something like that.
>>>> WDYT?
>>>>>>>>> (You
>>>>>>>>>> should put this change in the KIP as Sophie mentioned)
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Luke
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
>>>>>>>>>> <so...@confluent.io.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Nick,
>>>>>>>>>>>
>>>>>>>>>>> I think you forgot to link to the KIP document, but I take it
>> this
>>>> is
>>>>>>>>>>> it: KIP-811:
>>>>>>>>>>> Add separate delete.interval.ms to Kafka Streams
>>>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>>>>>>>>>>
>>>>>>>>>>> The overall proposal sounds good to me, just a few minor things:
>>>>>>>>>>>
>>>>>>>>>>>         1. Please specify everything needed to define this config
>>>>>>>>> explicitly, ie
>>>>>>>>>>>         all the arguments that will be passed in to the
>>>>>>>>>>>         StreamsConfig's ConfigDef: in addition to the default
>> value,
>>>> we
>>>>>>>>> need the
>>>>>>>>>>>         config type (presumably a Long), the doc
>>>>>>>>>>>         string, and the importance (probably "low", similar to
>>>>>>>>>>> commit.interval.ms
>>>>>>>>>>>         )
>>>>>>>>>>>         2. Might be worth considering a slightly more descriptive
>>>> name
>>>>>> for
>>>>>>>>> this
>>>>>>>>>>>         config. Most users probably don't think about,
>>>>>>>>>>>         or may not even be aware of, the deletion of consumed
>>>> records by
>>>>>>>>> Kafka
>>>>>>>>>>>         Streams, so calling it something along
>>>>>>>>>>>         the lines of "repartition.records.delete.interval.ms" or
>> "
>>>>>>>>>>>         consumed.records.deletion.interval.ms" or so on will help
>>>>>>>>>>>         make it clear what the config refers to and whether or not
>>>> they
>>>>>>>>> need to
>>>>>>>>>>>         care. Maybe you can come up with better
>>>>>>>>>>>         and/or shorter names, just wanted to suggest some example
>>>> names
>>>>>>>>> that I
>>>>>>>>>>>         think sufficiently get the point across
>>>>>>>>>>>
>>>>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
>>>>>>>>>>>
>>>>>>>>>>> Sophie
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
>>>> nick.telford@gmail.com
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
>> solution
>>>>>> is
>>>>>>>>>>> very
>>>>>>>>>>>> simple, so the KIP is pretty short.
>>>>>>>>>>>>
>>>>>>>>>>>> The suggested changes are implemented by this PR
>>>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
>>>>>>>>>>>> --
>>>>>>>>>>>> Nick Telford
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
Thanks Bruno, I've made those changes.

I'll call a vote on the KIP later today.

Regards,

Nick Telford

On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <ca...@apache.org> wrote:

> Hi Nick,
>
> Great!
>
> I think the KIP is ready for voting. I have just a couple of minor
> comments.
>
> a.
> In the config description, I would replace
>
> "Purging will occur after at least this value since the last purge, but
> may be delayed until later in order to meet the processing guarantee."
>
> with
>
> "Purging will occur after at least this value since the last purge, but
> may be delayed until later."
>
> I do not really understand what you mean with "meet the processing
> guarantee" and I think it suffices to say that the purge might be delayed.
>
>
> b.
> I would change the title to
>
> "Add config min.repartition.purge.interval.ms to Kafka Streams"
>
>
> c.
> The current rejected alternative leaks implementation details since it
> refers to the coupling of purge to commit and we agreed to leave that
> out of the KIP.
>
>
> d.
> Could you add my proposal to specify the config as a multiple of the
> commit interval to the rejected alternatives with the reason why we
> discarded it?
>
> For the rest, I am +1.
>
> Best,
> Bruno
>
>
>
> On 11.01.22 16:47, Nick Telford wrote:
> > Hi Bruno,
> >
> > Thanks for your recommendations.
> >
> > I've made those changes, and also updated the description for the new
> > config to read: "The minimum frequency in milliseconds with which to
> delete
> > fully consumed records from repartition topics. *Purging will occur after
> > at least this value since the last purge, but may be delayed until later
> in
> > order to meet the processing guarantee.* The default value is the same as
> > the default for commit.interval.ms (30000).  (Note, unlike
> > commit.interval.ms, the default for this value remains unchanged when
> > processing.guarantee is set to exactly_once_v2)."
> >
> > This should make it clear that this is just a minimum interval, without
> > leaking too much detail in to the specification.
> >
> > If there are no other issues, I'll put this to a vote.
> >
> > Regards,
> >
> > Nick Telford
> >
> > On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org> wrote:
> >
> >> Hi Nick,
> >>
> >> Sorry for the delay!
> >>
> >> Regarding point 7, I am fine with keeping the config as an interval and
> >> keeping it independently of the commit interval. However, I would then
> >> remove the following paragraph from the KIP:
> >>
> >> "We will still wait for a commit before explicitly deleting repartition
> >> records, but we will only do so if the time since the last record
> >> deletion is at least repartition.purge.interval.ms. This means the
> >> lower-bound for repartition.purge.interval.ms  is effectively capped by
> >> the value of commit.interval.ms."
> >>
> >> The reason is that in the previous paragraph you say that the configs
> >> can be modified separately and then in this paragraph you bind the purge
> >> interval to the commit interval. This seems a contradiction and
> >> indicates that you are leaking too much implementation details into the
> >> KIP. I think, just saying that the purge interval is a minimum and name
> >> it accordingly without talking about the actual implementation makes the
> >> KIP more robust against future implementation changes.
> >>
> >> My proposal for the config name is min.repartition.purge.interval.ms or
> >> even min.purge.interval.ms with a preference for the former.
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >>
> >> On 04.01.22 17:21, Nick Telford wrote:
> >>>>
> >>>> You missed one "delete.interval.ms" in the last paragraph in section
> >>>> "Proposed Changes".
> >>>>
> >>>
> >>> I've fixed these now, including the title that I somehow missed!
> >>>
> >>> I am afraid, I again need to comment on point 7. IMO, it does not make
> >>>
> >>> sense to be able to tune repartition.purge.interval.ms and
> >>>
> >>> commit.interval.ms separately when the purge can only happen during a
> >>>
> >>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>
> >>> repartition.purge.interval.ms to 35000 ms, the records will be purged
> at
> >>>
> >>> every second commit, i.e., every 60000 ms. What benefit do users have
> to
> >>>
> >>> set repartition.purge.interval.ms separately from commit.interval.ms?
> >>>
> >>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>
> >>> 2*commit.interval.ms..
> >>>
> >>>
> >>> Could we address this by chaning the name of the configuration to
> >> something
> >>> like "repartition.purge.min.interval.ms", to indicate that the
> >> repartition
> >>> purge interval will be *at least* this value?
> >>>
> >>> If that's still not suitable, are there any other existing
> configurations
> >>> that behave in a similar way, i.e. dictate a multiple of another
> >> interval,
> >>> that we could use as a basis for a new name for this configuration?
> >>>
> >>> Additionally, I have a new point.
> >>>
> >>> 8. If user code has access to the processor context (e.g. in the
> >>>
> >>> processor API), a commit can also be requested on demand by user code.
> >>>
> >>> The KIP should clarify if purges might also happen during requested
> >>>
> >>> commits or if purges only happen during automatic commits.
> >>>
> >>>
> >>> Good point. I'm holding off on amending this for now until we agree on
> an
> >>> outcome for point 7 above, because I suspect it may at least influence
> >> the
> >>> wording of this section.
> >>>
> >>> Thanks for the feedback, and sorry for the delay. Now that the holidays
> >> are
> >>> over I'm able to focus on this again.
> >>>
> >>> Regards,
> >>>
> >>> Nick Telford
> >>>
> >>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org>
> wrote:
> >>>
> >>>> Hi Nick,
> >>>>
> >>>> Thanks for the updates!
> >>>>
> >>>> The motivation section and the description of the new config is much
> >>>> clearer and informative now!
> >>>>
> >>>> You missed one "delete.interval.ms" in the last paragraph in section
> >>>> "Proposed Changes".
> >>>>
> >>>> I am afraid, I again need to comment on point 7. IMO, it does not make
> >>>> sense to be able to tune repartition.purge.interval.ms and
> >>>> commit.interval.ms separately when the purge can only happen during a
> >>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>> repartition.purge.interval.ms to 35000 ms, the records will be purged
> >> at
> >>>> every second commit, i.e., every 60000 ms. What benefit do users have
> to
> >>>> set repartition.purge.interval.ms separately from commit.interval.ms?
> >>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>> 2*commit.interval.ms..
> >>>>
> >>>> Additionally, I have a new point.
> >>>> 8. If user code has access to the processor context (e.g. in the
> >>>> processor API), a commit can also be requested on demand by user code.
> >>>> The KIP should clarify if purges might also happen during requested
> >>>> commits or if purges only happen during automatic commits.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On 21.12.21 20:40, Nick Telford wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> Thanks for your feedback. I've made the suggested changes to the KIP
> >> (1,
> >>>> 2,
> >>>>> 3 and 5).
> >>>>>
> >>>>> For the new name, I've chosen repartition.purge.interval.ms, as I
> felt
> >>>> it
> >>>>> struck a good balance between being self-descriptive and concise.
> >> Please
> >>>>> let me know if you'd prefer something else.
> >>>>>
> >>>>> On point 6: My PR has basic validation to ensure the value is
> positive,
> >>>> but
> >>>>> I don't think it's necessary to have dynamic validation, to ensure
> it's
> >>>> not
> >>>>> less than commit.interval.ms. The reason is that it will be
> implicitly
> >>>>> limited to that value anyway, and won't break anything. But I can add
> >> it
> >>>> if
> >>>>> you'd prefer it.
> >>>>>
> >>>>> On point 7: I worry that defaulting it to follow the value of
> >>>>> commit.interval.ms may confuse users, who will likely expect the
> >>>> default to
> >>>>> not be affected by changes to other configuration options. I can see
> >> the
> >>>>> appeal of retaining the existing behaviour (following the commit
> >>>> interval)
> >>>>> by default, but I believe that the majority of users who customize
> >>>>> commit.interval.ms do not desire a different frequency of
> repartition
> >>>>> record purging as well.
> >>>>>
> >>>>> As for multiples of commit interval: I think the argument against
> that
> >> is
> >>>>> that an interval is more intuitive when given as a time, rather than
> >> as a
> >>>>> multiple of some other operation. Users configuring this should not
> >> need
> >>>> to
> >>>>> break out a calculator to work out how frequently the records are
> going
> >>>> to
> >>>>> be purged!
> >>>>>
> >>>>> I've also updated the PR with the relevant changes.
> >>>>>
> >>>>> BTW, for some reason I didn't receive Sophie's email. I'll
> periodically
> >>>>> check the thread on the archive to ensure I don't miss any more of
> your
> >>>>> messages!
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Nick
> >>>>>
> >>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks, Bruno.
> >>>>>>
> >>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
> >>>>>> @Nick , please update my point 4 in PR directly.
> >>>>>>
> >>>>>> Thank you.
> >>>>>> Luke
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi Nick,
> >>>>>>>
> >>>>>>> Thank you for the KIP!
> >>>>>>>
> >>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> >>>> that
> >>>>>>> we should update the docs for commit.interval.ms but I am not sure
> >> if
> >>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>>>> discussion. Also on point 2, I agree that we need to add a doc
> string
> >>>>>>> but the content should be exemplary not binding. What I want to say
> >> is
> >>>>>>> that, we do not need a KIP to change docs.
> >>>>>>>
> >>>>>>> Here my points:
> >>>>>>>
> >>>>>>> 5. Could you specify in the motivation that the KIP is about
> deleting
> >>>>>>> records from repartition topics? Maybe with a short description
> when
> >>>> why
> >>>>>>> and when records are deleted from the repartition topics. For us it
> >>>>>>> might be clear, but IMO we should try to write KIPs so that someone
> >>>> that
> >>>>>>> is relatively new to Kafka Streams can understand the KIP without
> >>>>>>> needing to know too much background.
> >>>>>>>
> >>>>>>> 6. Does the config need to be validated? For example, does
> >>>>>>> delete.interval.ms need to be greater than or equal to
> >>>>>> commit.interval.ms?
> >>>>>>>
> >>>>>>> 7. Should the default value for non-EOS be 30s or the same value as
> >>>>>>> commit.interval.ms? I am just thinking about the case where a user
> >>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms
> (or
> >>>>>>> whatever name you come up for it). Once delete.interval.ms is set
> >>>>>>> explicitly it is decoupled from commit.interval.ms. Similar could
> be
> >>>>>>> done for the EOS case.
> >>>>>>> Alternatively, we could also define delete.interval.ms to take a
> >>>>>>> integral number without a unit that specifies after how many commit
> >>>>>>> intervals the records in repartition topics should be deleted. This
> >>>>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>>>> commit.interval.ms. Additionally, it would make the semantics of
> the
> >>>>>>> config simpler. The name of the config should definitely change if
> we
> >>>> go
> >>>>>>> down this way.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Bruno
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>>>> Hi Nick,
> >>>>>>>>
> >>>>>>>> Thanks for the KIP!
> >>>>>>>>
> >>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>>>> 3. I think you should mention the behavior change *explicitly* in
> >>>>>>>> "Compatibility" section. I know you already mentioned it in KIP,
> in
> >>>> the
> >>>>>>>> benefit way. But I think in this section, we should clearly point
> >> out
> >>>>>>> which
> >>>>>>>> behavior will be change after this KIP. That is, you should put it
> >>>>>> clear
> >>>>>>>> that the delete record interval will change from 100ms to 30s with
> >> EOS
> >>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>>>>>> 4. Since this new config has some relationship with
> >>>> commit.interval.ms
> >>>>>> ,
> >>>>>>> I
> >>>>>>>> think we should also update the doc description for `
> >>>>>> commit.interval.ms
> >>>>>>> `,
> >>>>>>>> to let user know there's another config to control delete interval
> >> and
> >>>>>>>> should be greater than commit.interval.ms. Something like that.
> >> WDYT?
> >>>>>>> (You
> >>>>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>>>
> >>>>>>>> Thank you.
> >>>>>>>> Luke
> >>>>>>>>
> >>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>>>> <so...@confluent.io.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Hey Nick,
> >>>>>>>>>
> >>>>>>>>> I think you forgot to link to the KIP document, but I take it
> this
> >> is
> >>>>>>>>> it: KIP-811:
> >>>>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>>>
> >>>>>>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>>>>>
> >>>>>>>>>        1. Please specify everything needed to define this config
> >>>>>>> explicitly, ie
> >>>>>>>>>        all the arguments that will be passed in to the
> >>>>>>>>>        StreamsConfig's ConfigDef: in addition to the default
> value,
> >> we
> >>>>>>> need the
> >>>>>>>>>        config type (presumably a Long), the doc
> >>>>>>>>>        string, and the importance (probably "low", similar to
> >>>>>>>>> commit.interval.ms
> >>>>>>>>>        )
> >>>>>>>>>        2. Might be worth considering a slightly more descriptive
> >> name
> >>>> for
> >>>>>>> this
> >>>>>>>>>        config. Most users probably don't think about,
> >>>>>>>>>        or may not even be aware of, the deletion of consumed
> >> records by
> >>>>>>> Kafka
> >>>>>>>>>        Streams, so calling it something along
> >>>>>>>>>        the lines of "repartition.records.delete.interval.ms" or
> "
> >>>>>>>>>        consumed.records.deletion.interval.ms" or so on will help
> >>>>>>>>>        make it clear what the config refers to and whether or not
> >> they
> >>>>>>> need to
> >>>>>>>>>        care. Maybe you can come up with better
> >>>>>>>>>        and/or shorter names, just wanted to suggest some example
> >> names
> >>>>>>> that I
> >>>>>>>>>        think sufficiently get the point across
> >>>>>>>>>
> >>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>>>
> >>>>>>>>> Sophie
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> >> nick.telford@gmail.com
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
> solution
> >>>> is
> >>>>>>>>> very
> >>>>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>>>
> >>>>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>>>> --
> >>>>>>>>>> Nick Telford
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Nick,

Great!

I think the KIP is ready for voting. I have just a couple of minor comments.

a.
In the config description, I would replace

"Purging will occur after at least this value since the last purge, but 
may be delayed until later in order to meet the processing guarantee."

with

"Purging will occur after at least this value since the last purge, but 
may be delayed until later."

I do not really understand what you mean with "meet the processing 
guarantee" and I think it suffices to say that the purge might be delayed.


b.
I would change the title to

"Add config min.repartition.purge.interval.ms to Kafka Streams"


c.
The current rejected alternative leaks implementation details since it 
refers to the coupling of purge to commit and we agreed to leave that 
out of the KIP.


d.
Could you add my proposal to specify the config as a multiple of the 
commit interval to the rejected alternatives with the reason why we 
discarded it?

For the rest, I am +1.

Best,
Bruno



On 11.01.22 16:47, Nick Telford wrote:
> Hi Bruno,
> 
> Thanks for your recommendations.
> 
> I've made those changes, and also updated the description for the new
> config to read: "The minimum frequency in milliseconds with which to delete
> fully consumed records from repartition topics. *Purging will occur after
> at least this value since the last purge, but may be delayed until later in
> order to meet the processing guarantee.* The default value is the same as
> the default for commit.interval.ms (30000).  (Note, unlike
> commit.interval.ms, the default for this value remains unchanged when
> processing.guarantee is set to exactly_once_v2)."
> 
> This should make it clear that this is just a minimum interval, without
> leaking too much detail in to the specification.
> 
> If there are no other issues, I'll put this to a vote.
> 
> Regards,
> 
> Nick Telford
> 
> On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org> wrote:
> 
>> Hi Nick,
>>
>> Sorry for the delay!
>>
>> Regarding point 7, I am fine with keeping the config as an interval and
>> keeping it independently of the commit interval. However, I would then
>> remove the following paragraph from the KIP:
>>
>> "We will still wait for a commit before explicitly deleting repartition
>> records, but we will only do so if the time since the last record
>> deletion is at least repartition.purge.interval.ms. This means the
>> lower-bound for repartition.purge.interval.ms  is effectively capped by
>> the value of commit.interval.ms."
>>
>> The reason is that in the previous paragraph you say that the configs
>> can be modified separately and then in this paragraph you bind the purge
>> interval to the commit interval. This seems a contradiction and
>> indicates that you are leaking too much implementation details into the
>> KIP. I think, just saying that the purge interval is a minimum and name
>> it accordingly without talking about the actual implementation makes the
>> KIP more robust against future implementation changes.
>>
>> My proposal for the config name is min.repartition.purge.interval.ms or
>> even min.purge.interval.ms with a preference for the former.
>>
>> Best,
>> Bruno
>>
>>
>>
>> On 04.01.22 17:21, Nick Telford wrote:
>>>>
>>>> You missed one "delete.interval.ms" in the last paragraph in section
>>>> "Proposed Changes".
>>>>
>>>
>>> I've fixed these now, including the title that I somehow missed!
>>>
>>> I am afraid, I again need to comment on point 7. IMO, it does not make
>>>
>>> sense to be able to tune repartition.purge.interval.ms and
>>>
>>> commit.interval.ms separately when the purge can only happen during a
>>>
>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>
>>> repartition.purge.interval.ms to 35000 ms, the records will be purged at
>>>
>>> every second commit, i.e., every 60000 ms. What benefit do users have to
>>>
>>> set repartition.purge.interval.ms separately from commit.interval.ms?
>>>
>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>
>>> 2*commit.interval.ms..
>>>
>>>
>>> Could we address this by chaning the name of the configuration to
>> something
>>> like "repartition.purge.min.interval.ms", to indicate that the
>> repartition
>>> purge interval will be *at least* this value?
>>>
>>> If that's still not suitable, are there any other existing configurations
>>> that behave in a similar way, i.e. dictate a multiple of another
>> interval,
>>> that we could use as a basis for a new name for this configuration?
>>>
>>> Additionally, I have a new point.
>>>
>>> 8. If user code has access to the processor context (e.g. in the
>>>
>>> processor API), a commit can also be requested on demand by user code.
>>>
>>> The KIP should clarify if purges might also happen during requested
>>>
>>> commits or if purges only happen during automatic commits.
>>>
>>>
>>> Good point. I'm holding off on amending this for now until we agree on an
>>> outcome for point 7 above, because I suspect it may at least influence
>> the
>>> wording of this section.
>>>
>>> Thanks for the feedback, and sorry for the delay. Now that the holidays
>> are
>>> over I'm able to focus on this again.
>>>
>>> Regards,
>>>
>>> Nick Telford
>>>
>>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org> wrote:
>>>
>>>> Hi Nick,
>>>>
>>>> Thanks for the updates!
>>>>
>>>> The motivation section and the description of the new config is much
>>>> clearer and informative now!
>>>>
>>>> You missed one "delete.interval.ms" in the last paragraph in section
>>>> "Proposed Changes".
>>>>
>>>> I am afraid, I again need to comment on point 7. IMO, it does not make
>>>> sense to be able to tune repartition.purge.interval.ms and
>>>> commit.interval.ms separately when the purge can only happen during a
>>>> commit. For example, if I set commit.interval.ms to 30000 ms and
>>>> repartition.purge.interval.ms to 35000 ms, the records will be purged
>> at
>>>> every second commit, i.e., every 60000 ms. What benefit do users have to
>>>> set repartition.purge.interval.ms separately from commit.interval.ms?
>>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>>>> 2*commit.interval.ms..
>>>>
>>>> Additionally, I have a new point.
>>>> 8. If user code has access to the processor context (e.g. in the
>>>> processor API), a commit can also be requested on demand by user code.
>>>> The KIP should clarify if purges might also happen during requested
>>>> commits or if purges only happen during automatic commits.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>> On 21.12.21 20:40, Nick Telford wrote:
>>>>> Hi everyone,
>>>>>
>>>>> Thanks for your feedback. I've made the suggested changes to the KIP
>> (1,
>>>> 2,
>>>>> 3 and 5).
>>>>>
>>>>> For the new name, I've chosen repartition.purge.interval.ms, as I felt
>>>> it
>>>>> struck a good balance between being self-descriptive and concise.
>> Please
>>>>> let me know if you'd prefer something else.
>>>>>
>>>>> On point 6: My PR has basic validation to ensure the value is positive,
>>>> but
>>>>> I don't think it's necessary to have dynamic validation, to ensure it's
>>>> not
>>>>> less than commit.interval.ms. The reason is that it will be implicitly
>>>>> limited to that value anyway, and won't break anything. But I can add
>> it
>>>> if
>>>>> you'd prefer it.
>>>>>
>>>>> On point 7: I worry that defaulting it to follow the value of
>>>>> commit.interval.ms may confuse users, who will likely expect the
>>>> default to
>>>>> not be affected by changes to other configuration options. I can see
>> the
>>>>> appeal of retaining the existing behaviour (following the commit
>>>> interval)
>>>>> by default, but I believe that the majority of users who customize
>>>>> commit.interval.ms do not desire a different frequency of repartition
>>>>> record purging as well.
>>>>>
>>>>> As for multiples of commit interval: I think the argument against that
>> is
>>>>> that an interval is more intuitive when given as a time, rather than
>> as a
>>>>> multiple of some other operation. Users configuring this should not
>> need
>>>> to
>>>>> break out a calculator to work out how frequently the records are going
>>>> to
>>>>> be purged!
>>>>>
>>>>> I've also updated the PR with the relevant changes.
>>>>>
>>>>> BTW, for some reason I didn't receive Sophie's email. I'll periodically
>>>>> check the thread on the archive to ensure I don't miss any more of your
>>>>> messages!
>>>>>
>>>>> Regards,
>>>>>
>>>>> Nick
>>>>>
>>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
>>>>>
>>>>>> Thanks, Bruno.
>>>>>>
>>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
>>>>>> @Nick , please update my point 4 in PR directly.
>>>>>>
>>>>>> Thank you.
>>>>>> Luke
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
>>>> wrote:
>>>>>>
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> Thank you for the KIP!
>>>>>>>
>>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
>>>> that
>>>>>>> we should update the docs for commit.interval.ms but I am not sure
>> if
>>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
>>>>>>> discussion. Also on point 2, I agree that we need to add a doc string
>>>>>>> but the content should be exemplary not binding. What I want to say
>> is
>>>>>>> that, we do not need a KIP to change docs.
>>>>>>>
>>>>>>> Here my points:
>>>>>>>
>>>>>>> 5. Could you specify in the motivation that the KIP is about deleting
>>>>>>> records from repartition topics? Maybe with a short description when
>>>> why
>>>>>>> and when records are deleted from the repartition topics. For us it
>>>>>>> might be clear, but IMO we should try to write KIPs so that someone
>>>> that
>>>>>>> is relatively new to Kafka Streams can understand the KIP without
>>>>>>> needing to know too much background.
>>>>>>>
>>>>>>> 6. Does the config need to be validated? For example, does
>>>>>>> delete.interval.ms need to be greater than or equal to
>>>>>> commit.interval.ms?
>>>>>>>
>>>>>>> 7. Should the default value for non-EOS be 30s or the same value as
>>>>>>> commit.interval.ms? I am just thinking about the case where a user
>>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms (or
>>>>>>> whatever name you come up for it). Once delete.interval.ms is set
>>>>>>> explicitly it is decoupled from commit.interval.ms. Similar could be
>>>>>>> done for the EOS case.
>>>>>>> Alternatively, we could also define delete.interval.ms to take a
>>>>>>> integral number without a unit that specifies after how many commit
>>>>>>> intervals the records in repartition topics should be deleted. This
>>>>>>> would make sense since delete.interval.ms is tightly bound to
>>>>>>> commit.interval.ms. Additionally, it would make the semantics of the
>>>>>>> config simpler. The name of the config should definitely change if we
>>>> go
>>>>>>> down this way.
>>>>>>>
>>>>>>> Best,
>>>>>>> Bruno
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 21.12.21 11:14, Luke Chen wrote:
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Thanks for the KIP!
>>>>>>>>
>>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
>>>>>>>> 3. I think you should mention the behavior change *explicitly* in
>>>>>>>> "Compatibility" section. I know you already mentioned it in KIP, in
>>>> the
>>>>>>>> benefit way. But I think in this section, we should clearly point
>> out
>>>>>>> which
>>>>>>>> behavior will be change after this KIP. That is, you should put it
>>>>>> clear
>>>>>>>> that the delete record interval will change from 100ms to 30s with
>> EOS
>>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
>>>>>>>> 4. Since this new config has some relationship with
>>>> commit.interval.ms
>>>>>> ,
>>>>>>> I
>>>>>>>> think we should also update the doc description for `
>>>>>> commit.interval.ms
>>>>>>> `,
>>>>>>>> to let user know there's another config to control delete interval
>> and
>>>>>>>> should be greater than commit.interval.ms. Something like that.
>> WDYT?
>>>>>>> (You
>>>>>>>> should put this change in the KIP as Sophie mentioned)
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Luke
>>>>>>>>
>>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
>>>>>>>> <so...@confluent.io.invalid> wrote:
>>>>>>>>
>>>>>>>>> Hey Nick,
>>>>>>>>>
>>>>>>>>> I think you forgot to link to the KIP document, but I take it this
>> is
>>>>>>>>> it: KIP-811:
>>>>>>>>> Add separate delete.interval.ms to Kafka Streams
>>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>>>>>>>>
>>>>>>>>> The overall proposal sounds good to me, just a few minor things:
>>>>>>>>>
>>>>>>>>>        1. Please specify everything needed to define this config
>>>>>>> explicitly, ie
>>>>>>>>>        all the arguments that will be passed in to the
>>>>>>>>>        StreamsConfig's ConfigDef: in addition to the default value,
>> we
>>>>>>> need the
>>>>>>>>>        config type (presumably a Long), the doc
>>>>>>>>>        string, and the importance (probably "low", similar to
>>>>>>>>> commit.interval.ms
>>>>>>>>>        )
>>>>>>>>>        2. Might be worth considering a slightly more descriptive
>> name
>>>> for
>>>>>>> this
>>>>>>>>>        config. Most users probably don't think about,
>>>>>>>>>        or may not even be aware of, the deletion of consumed
>> records by
>>>>>>> Kafka
>>>>>>>>>        Streams, so calling it something along
>>>>>>>>>        the lines of "repartition.records.delete.interval.ms" or "
>>>>>>>>>        consumed.records.deletion.interval.ms" or so on will help
>>>>>>>>>        make it clear what the config refers to and whether or not
>> they
>>>>>>> need to
>>>>>>>>>        care. Maybe you can come up with better
>>>>>>>>>        and/or shorter names, just wanted to suggest some example
>> names
>>>>>>> that I
>>>>>>>>>        think sufficiently get the point across
>>>>>>>>>
>>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
>>>>>>>>>
>>>>>>>>> Sophie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
>> nick.telford@gmail.com
>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
>>>> is
>>>>>>>>> very
>>>>>>>>>> simple, so the KIP is pretty short.
>>>>>>>>>>
>>>>>>>>>> The suggested changes are implemented by this PR
>>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
>>>>>>>>>> --
>>>>>>>>>> Nick Telford
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
Hi Bruno,

Thanks for your recommendations.

I've made those changes, and also updated the description for the new
config to read: "The minimum frequency in milliseconds with which to delete
fully consumed records from repartition topics. *Purging will occur after
at least this value since the last purge, but may be delayed until later in
order to meet the processing guarantee.* The default value is the same as
the default for commit.interval.ms (30000).  (Note, unlike
commit.interval.ms, the default for this value remains unchanged when
processing.guarantee is set to exactly_once_v2)."

This should make it clear that this is just a minimum interval, without
leaking too much detail in to the specification.

If there are no other issues, I'll put this to a vote.

Regards,

Nick Telford

On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <ca...@apache.org> wrote:

> Hi Nick,
>
> Sorry for the delay!
>
> Regarding point 7, I am fine with keeping the config as an interval and
> keeping it independently of the commit interval. However, I would then
> remove the following paragraph from the KIP:
>
> "We will still wait for a commit before explicitly deleting repartition
> records, but we will only do so if the time since the last record
> deletion is at least repartition.purge.interval.ms. This means the
> lower-bound for repartition.purge.interval.ms  is effectively capped by
> the value of commit.interval.ms."
>
> The reason is that in the previous paragraph you say that the configs
> can be modified separately and then in this paragraph you bind the purge
> interval to the commit interval. This seems a contradiction and
> indicates that you are leaking too much implementation details into the
> KIP. I think, just saying that the purge interval is a minimum and name
> it accordingly without talking about the actual implementation makes the
> KIP more robust against future implementation changes.
>
> My proposal for the config name is min.repartition.purge.interval.ms or
> even min.purge.interval.ms with a preference for the former.
>
> Best,
> Bruno
>
>
>
> On 04.01.22 17:21, Nick Telford wrote:
> >>
> >> You missed one "delete.interval.ms" in the last paragraph in section
> >> "Proposed Changes".
> >>
> >
> > I've fixed these now, including the title that I somehow missed!
> >
> > I am afraid, I again need to comment on point 7. IMO, it does not make
> >
> > sense to be able to tune repartition.purge.interval.ms and
> >
> > commit.interval.ms separately when the purge can only happen during a
> >
> > commit. For example, if I set commit.interval.ms to 30000 ms and
> >
> > repartition.purge.interval.ms to 35000 ms, the records will be purged at
> >
> > every second commit, i.e., every 60000 ms. What benefit do users have to
> >
> > set repartition.purge.interval.ms separately from commit.interval.ms?
> >
> > The rate of purging will never be 1 / 35000, the rate will be 1 /
> >
> > 2*commit.interval.ms..
> >
> >
> > Could we address this by chaning the name of the configuration to
> something
> > like "repartition.purge.min.interval.ms", to indicate that the
> repartition
> > purge interval will be *at least* this value?
> >
> > If that's still not suitable, are there any other existing configurations
> > that behave in a similar way, i.e. dictate a multiple of another
> interval,
> > that we could use as a basis for a new name for this configuration?
> >
> > Additionally, I have a new point.
> >
> > 8. If user code has access to the processor context (e.g. in the
> >
> > processor API), a commit can also be requested on demand by user code.
> >
> > The KIP should clarify if purges might also happen during requested
> >
> > commits or if purges only happen during automatic commits.
> >
> >
> > Good point. I'm holding off on amending this for now until we agree on an
> > outcome for point 7 above, because I suspect it may at least influence
> the
> > wording of this section.
> >
> > Thanks for the feedback, and sorry for the delay. Now that the holidays
> are
> > over I'm able to focus on this again.
> >
> > Regards,
> >
> > Nick Telford
> >
> > On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org> wrote:
> >
> >> Hi Nick,
> >>
> >> Thanks for the updates!
> >>
> >> The motivation section and the description of the new config is much
> >> clearer and informative now!
> >>
> >> You missed one "delete.interval.ms" in the last paragraph in section
> >> "Proposed Changes".
> >>
> >> I am afraid, I again need to comment on point 7. IMO, it does not make
> >> sense to be able to tune repartition.purge.interval.ms and
> >> commit.interval.ms separately when the purge can only happen during a
> >> commit. For example, if I set commit.interval.ms to 30000 ms and
> >> repartition.purge.interval.ms to 35000 ms, the records will be purged
> at
> >> every second commit, i.e., every 60000 ms. What benefit do users have to
> >> set repartition.purge.interval.ms separately from commit.interval.ms?
> >> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >> 2*commit.interval.ms..
> >>
> >> Additionally, I have a new point.
> >> 8. If user code has access to the processor context (e.g. in the
> >> processor API), a commit can also be requested on demand by user code.
> >> The KIP should clarify if purges might also happen during requested
> >> commits or if purges only happen during automatic commits.
> >>
> >> Best,
> >> Bruno
> >>
> >> On 21.12.21 20:40, Nick Telford wrote:
> >>> Hi everyone,
> >>>
> >>> Thanks for your feedback. I've made the suggested changes to the KIP
> (1,
> >> 2,
> >>> 3 and 5).
> >>>
> >>> For the new name, I've chosen repartition.purge.interval.ms, as I felt
> >> it
> >>> struck a good balance between being self-descriptive and concise.
> Please
> >>> let me know if you'd prefer something else.
> >>>
> >>> On point 6: My PR has basic validation to ensure the value is positive,
> >> but
> >>> I don't think it's necessary to have dynamic validation, to ensure it's
> >> not
> >>> less than commit.interval.ms. The reason is that it will be implicitly
> >>> limited to that value anyway, and won't break anything. But I can add
> it
> >> if
> >>> you'd prefer it.
> >>>
> >>> On point 7: I worry that defaulting it to follow the value of
> >>> commit.interval.ms may confuse users, who will likely expect the
> >> default to
> >>> not be affected by changes to other configuration options. I can see
> the
> >>> appeal of retaining the existing behaviour (following the commit
> >> interval)
> >>> by default, but I believe that the majority of users who customize
> >>> commit.interval.ms do not desire a different frequency of repartition
> >>> record purging as well.
> >>>
> >>> As for multiples of commit interval: I think the argument against that
> is
> >>> that an interval is more intuitive when given as a time, rather than
> as a
> >>> multiple of some other operation. Users configuring this should not
> need
> >> to
> >>> break out a calculator to work out how frequently the records are going
> >> to
> >>> be purged!
> >>>
> >>> I've also updated the PR with the relevant changes.
> >>>
> >>> BTW, for some reason I didn't receive Sophie's email. I'll periodically
> >>> check the thread on the archive to ensure I don't miss any more of your
> >>> messages!
> >>>
> >>> Regards,
> >>>
> >>> Nick
> >>>
> >>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
> >>>
> >>>> Thanks, Bruno.
> >>>>
> >>>> I agree my point 4 is more like PR discussion, not KIP discussion.
> >>>> @Nick , please update my point 4 in PR directly.
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Nick,
> >>>>>
> >>>>> Thank you for the KIP!
> >>>>>
> >>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> >> that
> >>>>> we should update the docs for commit.interval.ms but I am not sure
> if
> >>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>> discussion. Also on point 2, I agree that we need to add a doc string
> >>>>> but the content should be exemplary not binding. What I want to say
> is
> >>>>> that, we do not need a KIP to change docs.
> >>>>>
> >>>>> Here my points:
> >>>>>
> >>>>> 5. Could you specify in the motivation that the KIP is about deleting
> >>>>> records from repartition topics? Maybe with a short description when
> >> why
> >>>>> and when records are deleted from the repartition topics. For us it
> >>>>> might be clear, but IMO we should try to write KIPs so that someone
> >> that
> >>>>> is relatively new to Kafka Streams can understand the KIP without
> >>>>> needing to know too much background.
> >>>>>
> >>>>> 6. Does the config need to be validated? For example, does
> >>>>> delete.interval.ms need to be greater than or equal to
> >>>> commit.interval.ms?
> >>>>>
> >>>>> 7. Should the default value for non-EOS be 30s or the same value as
> >>>>> commit.interval.ms? I am just thinking about the case where a user
> >>>>> explicitly changes commit.interval.ms but not delete.interval.ms (or
> >>>>> whatever name you come up for it). Once delete.interval.ms is set
> >>>>> explicitly it is decoupled from commit.interval.ms. Similar could be
> >>>>> done for the EOS case.
> >>>>> Alternatively, we could also define delete.interval.ms to take a
> >>>>> integral number without a unit that specifies after how many commit
> >>>>> intervals the records in repartition topics should be deleted. This
> >>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>> commit.interval.ms. Additionally, it would make the semantics of the
> >>>>> config simpler. The name of the config should definitely change if we
> >> go
> >>>>> down this way.
> >>>>>
> >>>>> Best,
> >>>>> Bruno
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>> Hi Nick,
> >>>>>>
> >>>>>> Thanks for the KIP!
> >>>>>>
> >>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>> 3. I think you should mention the behavior change *explicitly* in
> >>>>>> "Compatibility" section. I know you already mentioned it in KIP, in
> >> the
> >>>>>> benefit way. But I think in this section, we should clearly point
> out
> >>>>> which
> >>>>>> behavior will be change after this KIP. That is, you should put it
> >>>> clear
> >>>>>> that the delete record interval will change from 100ms to 30s with
> EOS
> >>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>>>> 4. Since this new config has some relationship with
> >> commit.interval.ms
> >>>> ,
> >>>>> I
> >>>>>> think we should also update the doc description for `
> >>>> commit.interval.ms
> >>>>> `,
> >>>>>> to let user know there's another config to control delete interval
> and
> >>>>>> should be greater than commit.interval.ms. Something like that.
> WDYT?
> >>>>> (You
> >>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>
> >>>>>> Thank you.
> >>>>>> Luke
> >>>>>>
> >>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>> <so...@confluent.io.invalid> wrote:
> >>>>>>
> >>>>>>> Hey Nick,
> >>>>>>>
> >>>>>>> I think you forgot to link to the KIP document, but I take it this
> is
> >>>>>>> it: KIP-811:
> >>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>
> >>>>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>>>
> >>>>>>>       1. Please specify everything needed to define this config
> >>>>> explicitly, ie
> >>>>>>>       all the arguments that will be passed in to the
> >>>>>>>       StreamsConfig's ConfigDef: in addition to the default value,
> we
> >>>>> need the
> >>>>>>>       config type (presumably a Long), the doc
> >>>>>>>       string, and the importance (probably "low", similar to
> >>>>>>> commit.interval.ms
> >>>>>>>       )
> >>>>>>>       2. Might be worth considering a slightly more descriptive
> name
> >> for
> >>>>> this
> >>>>>>>       config. Most users probably don't think about,
> >>>>>>>       or may not even be aware of, the deletion of consumed
> records by
> >>>>> Kafka
> >>>>>>>       Streams, so calling it something along
> >>>>>>>       the lines of "repartition.records.delete.interval.ms" or "
> >>>>>>>       consumed.records.deletion.interval.ms" or so on will help
> >>>>>>>       make it clear what the config refers to and whether or not
> they
> >>>>> need to
> >>>>>>>       care. Maybe you can come up with better
> >>>>>>>       and/or shorter names, just wanted to suggest some example
> names
> >>>>> that I
> >>>>>>>       think sufficiently get the point across
> >>>>>>>
> >>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>
> >>>>>>> Sophie
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> nick.telford@gmail.com
> >>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
> >> is
> >>>>>>> very
> >>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>
> >>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>> --
> >>>>>>>> Nick Telford
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Nick,

Sorry for the delay!

Regarding point 7, I am fine with keeping the config as an interval and 
keeping it independently of the commit interval. However, I would then 
remove the following paragraph from the KIP:

"We will still wait for a commit before explicitly deleting repartition 
records, but we will only do so if the time since the last record 
deletion is at least repartition.purge.interval.ms. This means the 
lower-bound for repartition.purge.interval.ms  is effectively capped by 
the value of commit.interval.ms."

The reason is that in the previous paragraph you say that the configs 
can be modified separately and then in this paragraph you bind the purge 
interval to the commit interval. This seems a contradiction and 
indicates that you are leaking too much implementation details into the 
KIP. I think, just saying that the purge interval is a minimum and name 
it accordingly without talking about the actual implementation makes the 
KIP more robust against future implementation changes.

My proposal for the config name is min.repartition.purge.interval.ms or 
even min.purge.interval.ms with a preference for the former.

Best,
Bruno



On 04.01.22 17:21, Nick Telford wrote:
>>
>> You missed one "delete.interval.ms" in the last paragraph in section
>> "Proposed Changes".
>>
> 
> I've fixed these now, including the title that I somehow missed!
> 
> I am afraid, I again need to comment on point 7. IMO, it does not make
> 
> sense to be able to tune repartition.purge.interval.ms and
> 
> commit.interval.ms separately when the purge can only happen during a
> 
> commit. For example, if I set commit.interval.ms to 30000 ms and
> 
> repartition.purge.interval.ms to 35000 ms, the records will be purged at
> 
> every second commit, i.e., every 60000 ms. What benefit do users have to
> 
> set repartition.purge.interval.ms separately from commit.interval.ms?
> 
> The rate of purging will never be 1 / 35000, the rate will be 1 /
> 
> 2*commit.interval.ms..
> 
> 
> Could we address this by chaning the name of the configuration to something
> like "repartition.purge.min.interval.ms", to indicate that the repartition
> purge interval will be *at least* this value?
> 
> If that's still not suitable, are there any other existing configurations
> that behave in a similar way, i.e. dictate a multiple of another interval,
> that we could use as a basis for a new name for this configuration?
> 
> Additionally, I have a new point.
> 
> 8. If user code has access to the processor context (e.g. in the
> 
> processor API), a commit can also be requested on demand by user code.
> 
> The KIP should clarify if purges might also happen during requested
> 
> commits or if purges only happen during automatic commits.
> 
> 
> Good point. I'm holding off on amending this for now until we agree on an
> outcome for point 7 above, because I suspect it may at least influence the
> wording of this section.
> 
> Thanks for the feedback, and sorry for the delay. Now that the holidays are
> over I'm able to focus on this again.
> 
> Regards,
> 
> Nick Telford
> 
> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org> wrote:
> 
>> Hi Nick,
>>
>> Thanks for the updates!
>>
>> The motivation section and the description of the new config is much
>> clearer and informative now!
>>
>> You missed one "delete.interval.ms" in the last paragraph in section
>> "Proposed Changes".
>>
>> I am afraid, I again need to comment on point 7. IMO, it does not make
>> sense to be able to tune repartition.purge.interval.ms and
>> commit.interval.ms separately when the purge can only happen during a
>> commit. For example, if I set commit.interval.ms to 30000 ms and
>> repartition.purge.interval.ms to 35000 ms, the records will be purged at
>> every second commit, i.e., every 60000 ms. What benefit do users have to
>> set repartition.purge.interval.ms separately from commit.interval.ms?
>> The rate of purging will never be 1 / 35000, the rate will be 1 /
>> 2*commit.interval.ms..
>>
>> Additionally, I have a new point.
>> 8. If user code has access to the processor context (e.g. in the
>> processor API), a commit can also be requested on demand by user code.
>> The KIP should clarify if purges might also happen during requested
>> commits or if purges only happen during automatic commits.
>>
>> Best,
>> Bruno
>>
>> On 21.12.21 20:40, Nick Telford wrote:
>>> Hi everyone,
>>>
>>> Thanks for your feedback. I've made the suggested changes to the KIP (1,
>> 2,
>>> 3 and 5).
>>>
>>> For the new name, I've chosen repartition.purge.interval.ms, as I felt
>> it
>>> struck a good balance between being self-descriptive and concise. Please
>>> let me know if you'd prefer something else.
>>>
>>> On point 6: My PR has basic validation to ensure the value is positive,
>> but
>>> I don't think it's necessary to have dynamic validation, to ensure it's
>> not
>>> less than commit.interval.ms. The reason is that it will be implicitly
>>> limited to that value anyway, and won't break anything. But I can add it
>> if
>>> you'd prefer it.
>>>
>>> On point 7: I worry that defaulting it to follow the value of
>>> commit.interval.ms may confuse users, who will likely expect the
>> default to
>>> not be affected by changes to other configuration options. I can see the
>>> appeal of retaining the existing behaviour (following the commit
>> interval)
>>> by default, but I believe that the majority of users who customize
>>> commit.interval.ms do not desire a different frequency of repartition
>>> record purging as well.
>>>
>>> As for multiples of commit interval: I think the argument against that is
>>> that an interval is more intuitive when given as a time, rather than as a
>>> multiple of some other operation. Users configuring this should not need
>> to
>>> break out a calculator to work out how frequently the records are going
>> to
>>> be purged!
>>>
>>> I've also updated the PR with the relevant changes.
>>>
>>> BTW, for some reason I didn't receive Sophie's email. I'll periodically
>>> check the thread on the archive to ensure I don't miss any more of your
>>> messages!
>>>
>>> Regards,
>>>
>>> Nick
>>>
>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
>>>
>>>> Thanks, Bruno.
>>>>
>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
>>>> @Nick , please update my point 4 in PR directly.
>>>>
>>>> Thank you.
>>>> Luke
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
>> wrote:
>>>>
>>>>> Hi Nick,
>>>>>
>>>>> Thank you for the KIP!
>>>>>
>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
>> that
>>>>> we should update the docs for commit.interval.ms but I am not sure if
>>>>> this needs to mentioned in the KIP. That seems to me more a PR
>>>>> discussion. Also on point 2, I agree that we need to add a doc string
>>>>> but the content should be exemplary not binding. What I want to say is
>>>>> that, we do not need a KIP to change docs.
>>>>>
>>>>> Here my points:
>>>>>
>>>>> 5. Could you specify in the motivation that the KIP is about deleting
>>>>> records from repartition topics? Maybe with a short description when
>> why
>>>>> and when records are deleted from the repartition topics. For us it
>>>>> might be clear, but IMO we should try to write KIPs so that someone
>> that
>>>>> is relatively new to Kafka Streams can understand the KIP without
>>>>> needing to know too much background.
>>>>>
>>>>> 6. Does the config need to be validated? For example, does
>>>>> delete.interval.ms need to be greater than or equal to
>>>> commit.interval.ms?
>>>>>
>>>>> 7. Should the default value for non-EOS be 30s or the same value as
>>>>> commit.interval.ms? I am just thinking about the case where a user
>>>>> explicitly changes commit.interval.ms but not delete.interval.ms (or
>>>>> whatever name you come up for it). Once delete.interval.ms is set
>>>>> explicitly it is decoupled from commit.interval.ms. Similar could be
>>>>> done for the EOS case.
>>>>> Alternatively, we could also define delete.interval.ms to take a
>>>>> integral number without a unit that specifies after how many commit
>>>>> intervals the records in repartition topics should be deleted. This
>>>>> would make sense since delete.interval.ms is tightly bound to
>>>>> commit.interval.ms. Additionally, it would make the semantics of the
>>>>> config simpler. The name of the config should definitely change if we
>> go
>>>>> down this way.
>>>>>
>>>>> Best,
>>>>> Bruno
>>>>>
>>>>>
>>>>>
>>>>> On 21.12.21 11:14, Luke Chen wrote:
>>>>>> Hi Nick,
>>>>>>
>>>>>> Thanks for the KIP!
>>>>>>
>>>>>> In addition to Sophie's comments, I have one more to this KIP:
>>>>>> 3. I think you should mention the behavior change *explicitly* in
>>>>>> "Compatibility" section. I know you already mentioned it in KIP, in
>> the
>>>>>> benefit way. But I think in this section, we should clearly point out
>>>>> which
>>>>>> behavior will be change after this KIP. That is, you should put it
>>>> clear
>>>>>> that the delete record interval will change from 100ms to 30s with EOS
>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
>>>>>> 4. Since this new config has some relationship with
>> commit.interval.ms
>>>> ,
>>>>> I
>>>>>> think we should also update the doc description for `
>>>> commit.interval.ms
>>>>> `,
>>>>>> to let user know there's another config to control delete interval and
>>>>>> should be greater than commit.interval.ms. Something like that. WDYT?
>>>>> (You
>>>>>> should put this change in the KIP as Sophie mentioned)
>>>>>>
>>>>>> Thank you.
>>>>>> Luke
>>>>>>
>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
>>>>>> <so...@confluent.io.invalid> wrote:
>>>>>>
>>>>>>> Hey Nick,
>>>>>>>
>>>>>>> I think you forgot to link to the KIP document, but I take it this is
>>>>>>> it: KIP-811:
>>>>>>> Add separate delete.interval.ms to Kafka Streams
>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>>>>>>
>>>>>>> The overall proposal sounds good to me, just a few minor things:
>>>>>>>
>>>>>>>       1. Please specify everything needed to define this config
>>>>> explicitly, ie
>>>>>>>       all the arguments that will be passed in to the
>>>>>>>       StreamsConfig's ConfigDef: in addition to the default value, we
>>>>> need the
>>>>>>>       config type (presumably a Long), the doc
>>>>>>>       string, and the importance (probably "low", similar to
>>>>>>> commit.interval.ms
>>>>>>>       )
>>>>>>>       2. Might be worth considering a slightly more descriptive name
>> for
>>>>> this
>>>>>>>       config. Most users probably don't think about,
>>>>>>>       or may not even be aware of, the deletion of consumed records by
>>>>> Kafka
>>>>>>>       Streams, so calling it something along
>>>>>>>       the lines of "repartition.records.delete.interval.ms" or "
>>>>>>>       consumed.records.deletion.interval.ms" or so on will help
>>>>>>>       make it clear what the config refers to and whether or not they
>>>>> need to
>>>>>>>       care. Maybe you can come up with better
>>>>>>>       and/or shorter names, just wanted to suggest some example names
>>>>> that I
>>>>>>>       think sufficiently get the point across
>>>>>>>
>>>>>>> Other than that I'm +1 -- thanks for the KIP!
>>>>>>>
>>>>>>> Sophie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <nick.telford@gmail.com
>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
>> is
>>>>>>> very
>>>>>>>> simple, so the KIP is pretty short.
>>>>>>>>
>>>>>>>> The suggested changes are implemented by this PR
>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
>>>>>>>> --
>>>>>>>> Nick Telford
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
>
> You missed one "delete.interval.ms" in the last paragraph in section
> "Proposed Changes".
>

I've fixed these now, including the title that I somehow missed!

I am afraid, I again need to comment on point 7. IMO, it does not make

sense to be able to tune repartition.purge.interval.ms and

commit.interval.ms separately when the purge can only happen during a

commit. For example, if I set commit.interval.ms to 30000 ms and

repartition.purge.interval.ms to 35000 ms, the records will be purged at

every second commit, i.e., every 60000 ms. What benefit do users have to

set repartition.purge.interval.ms separately from commit.interval.ms?

The rate of purging will never be 1 / 35000, the rate will be 1 /

2*commit.interval.ms..


Could we address this by chaning the name of the configuration to something
like "repartition.purge.min.interval.ms", to indicate that the repartition
purge interval will be *at least* this value?

If that's still not suitable, are there any other existing configurations
that behave in a similar way, i.e. dictate a multiple of another interval,
that we could use as a basis for a new name for this configuration?

Additionally, I have a new point.

8. If user code has access to the processor context (e.g. in the

processor API), a commit can also be requested on demand by user code.

The KIP should clarify if purges might also happen during requested

commits or if purges only happen during automatic commits.


Good point. I'm holding off on amending this for now until we agree on an
outcome for point 7 above, because I suspect it may at least influence the
wording of this section.

Thanks for the feedback, and sorry for the delay. Now that the holidays are
over I'm able to focus on this again.

Regards,

Nick Telford

On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <ca...@apache.org> wrote:

> Hi Nick,
>
> Thanks for the updates!
>
> The motivation section and the description of the new config is much
> clearer and informative now!
>
> You missed one "delete.interval.ms" in the last paragraph in section
> "Proposed Changes".
>
> I am afraid, I again need to comment on point 7. IMO, it does not make
> sense to be able to tune repartition.purge.interval.ms and
> commit.interval.ms separately when the purge can only happen during a
> commit. For example, if I set commit.interval.ms to 30000 ms and
> repartition.purge.interval.ms to 35000 ms, the records will be purged at
> every second commit, i.e., every 60000 ms. What benefit do users have to
> set repartition.purge.interval.ms separately from commit.interval.ms?
> The rate of purging will never be 1 / 35000, the rate will be 1 /
> 2*commit.interval.ms..
>
> Additionally, I have a new point.
> 8. If user code has access to the processor context (e.g. in the
> processor API), a commit can also be requested on demand by user code.
> The KIP should clarify if purges might also happen during requested
> commits or if purges only happen during automatic commits.
>
> Best,
> Bruno
>
> On 21.12.21 20:40, Nick Telford wrote:
> > Hi everyone,
> >
> > Thanks for your feedback. I've made the suggested changes to the KIP (1,
> 2,
> > 3 and 5).
> >
> > For the new name, I've chosen repartition.purge.interval.ms, as I felt
> it
> > struck a good balance between being self-descriptive and concise. Please
> > let me know if you'd prefer something else.
> >
> > On point 6: My PR has basic validation to ensure the value is positive,
> but
> > I don't think it's necessary to have dynamic validation, to ensure it's
> not
> > less than commit.interval.ms. The reason is that it will be implicitly
> > limited to that value anyway, and won't break anything. But I can add it
> if
> > you'd prefer it.
> >
> > On point 7: I worry that defaulting it to follow the value of
> > commit.interval.ms may confuse users, who will likely expect the
> default to
> > not be affected by changes to other configuration options. I can see the
> > appeal of retaining the existing behaviour (following the commit
> interval)
> > by default, but I believe that the majority of users who customize
> > commit.interval.ms do not desire a different frequency of repartition
> > record purging as well.
> >
> > As for multiples of commit interval: I think the argument against that is
> > that an interval is more intuitive when given as a time, rather than as a
> > multiple of some other operation. Users configuring this should not need
> to
> > break out a calculator to work out how frequently the records are going
> to
> > be purged!
> >
> > I've also updated the PR with the relevant changes.
> >
> > BTW, for some reason I didn't receive Sophie's email. I'll periodically
> > check the thread on the archive to ensure I don't miss any more of your
> > messages!
> >
> > Regards,
> >
> > Nick
> >
> > On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
> >
> >> Thanks, Bruno.
> >>
> >> I agree my point 4 is more like PR discussion, not KIP discussion.
> >> @Nick , please update my point 4 in PR directly.
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >>
> >>
> >> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org>
> wrote:
> >>
> >>> Hi Nick,
> >>>
> >>> Thank you for the KIP!
> >>>
> >>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> that
> >>> we should update the docs for commit.interval.ms but I am not sure if
> >>> this needs to mentioned in the KIP. That seems to me more a PR
> >>> discussion. Also on point 2, I agree that we need to add a doc string
> >>> but the content should be exemplary not binding. What I want to say is
> >>> that, we do not need a KIP to change docs.
> >>>
> >>> Here my points:
> >>>
> >>> 5. Could you specify in the motivation that the KIP is about deleting
> >>> records from repartition topics? Maybe with a short description when
> why
> >>> and when records are deleted from the repartition topics. For us it
> >>> might be clear, but IMO we should try to write KIPs so that someone
> that
> >>> is relatively new to Kafka Streams can understand the KIP without
> >>> needing to know too much background.
> >>>
> >>> 6. Does the config need to be validated? For example, does
> >>> delete.interval.ms need to be greater than or equal to
> >> commit.interval.ms?
> >>>
> >>> 7. Should the default value for non-EOS be 30s or the same value as
> >>> commit.interval.ms? I am just thinking about the case where a user
> >>> explicitly changes commit.interval.ms but not delete.interval.ms (or
> >>> whatever name you come up for it). Once delete.interval.ms is set
> >>> explicitly it is decoupled from commit.interval.ms. Similar could be
> >>> done for the EOS case.
> >>> Alternatively, we could also define delete.interval.ms to take a
> >>> integral number without a unit that specifies after how many commit
> >>> intervals the records in repartition topics should be deleted. This
> >>> would make sense since delete.interval.ms is tightly bound to
> >>> commit.interval.ms. Additionally, it would make the semantics of the
> >>> config simpler. The name of the config should definitely change if we
> go
> >>> down this way.
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>>
> >>>
> >>> On 21.12.21 11:14, Luke Chen wrote:
> >>>> Hi Nick,
> >>>>
> >>>> Thanks for the KIP!
> >>>>
> >>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>> 3. I think you should mention the behavior change *explicitly* in
> >>>> "Compatibility" section. I know you already mentioned it in KIP, in
> the
> >>>> benefit way. But I think in this section, we should clearly point out
> >>> which
> >>>> behavior will be change after this KIP. That is, you should put it
> >> clear
> >>>> that the delete record interval will change from 100ms to 30s with EOS
> >>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>> 4. Since this new config has some relationship with
> commit.interval.ms
> >> ,
> >>> I
> >>>> think we should also update the doc description for `
> >> commit.interval.ms
> >>> `,
> >>>> to let user know there's another config to control delete interval and
> >>>> should be greater than commit.interval.ms. Something like that. WDYT?
> >>> (You
> >>>> should put this change in the KIP as Sophie mentioned)
> >>>>
> >>>> Thank you.
> >>>> Luke
> >>>>
> >>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>> <so...@confluent.io.invalid> wrote:
> >>>>
> >>>>> Hey Nick,
> >>>>>
> >>>>> I think you forgot to link to the KIP document, but I take it this is
> >>>>> it: KIP-811:
> >>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>
> >>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>
> >>>>>      1. Please specify everything needed to define this config
> >>> explicitly, ie
> >>>>>      all the arguments that will be passed in to the
> >>>>>      StreamsConfig's ConfigDef: in addition to the default value, we
> >>> need the
> >>>>>      config type (presumably a Long), the doc
> >>>>>      string, and the importance (probably "low", similar to
> >>>>> commit.interval.ms
> >>>>>      )
> >>>>>      2. Might be worth considering a slightly more descriptive name
> for
> >>> this
> >>>>>      config. Most users probably don't think about,
> >>>>>      or may not even be aware of, the deletion of consumed records by
> >>> Kafka
> >>>>>      Streams, so calling it something along
> >>>>>      the lines of "repartition.records.delete.interval.ms" or "
> >>>>>      consumed.records.deletion.interval.ms" or so on will help
> >>>>>      make it clear what the config refers to and whether or not they
> >>> need to
> >>>>>      care. Maybe you can come up with better
> >>>>>      and/or shorter names, just wanted to suggest some example names
> >>> that I
> >>>>>      think sufficiently get the point across
> >>>>>
> >>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>
> >>>>> Sophie
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <nick.telford@gmail.com
> >
> >>>>> wrote:
> >>>>>
> >>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution
> is
> >>>>> very
> >>>>>> simple, so the KIP is pretty short.
> >>>>>>
> >>>>>> The suggested changes are implemented by this PR
> >>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>> --
> >>>>>> Nick Telford
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Nick,

Thanks for the updates!

The motivation section and the description of the new config is much 
clearer and informative now!

You missed one "delete.interval.ms" in the last paragraph in section 
"Proposed Changes".

I am afraid, I again need to comment on point 7. IMO, it does not make 
sense to be able to tune repartition.purge.interval.ms and 
commit.interval.ms separately when the purge can only happen during a 
commit. For example, if I set commit.interval.ms to 30000 ms and 
repartition.purge.interval.ms to 35000 ms, the records will be purged at 
every second commit, i.e., every 60000 ms. What benefit do users have to 
set repartition.purge.interval.ms separately from commit.interval.ms? 
The rate of purging will never be 1 / 35000, the rate will be 1 / 
2*commit.interval.ms..

Additionally, I have a new point.
8. If user code has access to the processor context (e.g. in the 
processor API), a commit can also be requested on demand by user code. 
The KIP should clarify if purges might also happen during requested 
commits or if purges only happen during automatic commits.

Best,
Bruno

On 21.12.21 20:40, Nick Telford wrote:
> Hi everyone,
> 
> Thanks for your feedback. I've made the suggested changes to the KIP (1, 2,
> 3 and 5).
> 
> For the new name, I've chosen repartition.purge.interval.ms, as I felt it
> struck a good balance between being self-descriptive and concise. Please
> let me know if you'd prefer something else.
> 
> On point 6: My PR has basic validation to ensure the value is positive, but
> I don't think it's necessary to have dynamic validation, to ensure it's not
> less than commit.interval.ms. The reason is that it will be implicitly
> limited to that value anyway, and won't break anything. But I can add it if
> you'd prefer it.
> 
> On point 7: I worry that defaulting it to follow the value of
> commit.interval.ms may confuse users, who will likely expect the default to
> not be affected by changes to other configuration options. I can see the
> appeal of retaining the existing behaviour (following the commit interval)
> by default, but I believe that the majority of users who customize
> commit.interval.ms do not desire a different frequency of repartition
> record purging as well.
> 
> As for multiples of commit interval: I think the argument against that is
> that an interval is more intuitive when given as a time, rather than as a
> multiple of some other operation. Users configuring this should not need to
> break out a calculator to work out how frequently the records are going to
> be purged!
> 
> I've also updated the PR with the relevant changes.
> 
> BTW, for some reason I didn't receive Sophie's email. I'll periodically
> check the thread on the archive to ensure I don't miss any more of your
> messages!
> 
> Regards,
> 
> Nick
> 
> On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:
> 
>> Thanks, Bruno.
>>
>> I agree my point 4 is more like PR discussion, not KIP discussion.
>> @Nick , please update my point 4 in PR directly.
>>
>> Thank you.
>> Luke
>>
>>
>>
>>
>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org> wrote:
>>
>>> Hi Nick,
>>>
>>> Thank you for the KIP!
>>>
>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree that
>>> we should update the docs for commit.interval.ms but I am not sure if
>>> this needs to mentioned in the KIP. That seems to me more a PR
>>> discussion. Also on point 2, I agree that we need to add a doc string
>>> but the content should be exemplary not binding. What I want to say is
>>> that, we do not need a KIP to change docs.
>>>
>>> Here my points:
>>>
>>> 5. Could you specify in the motivation that the KIP is about deleting
>>> records from repartition topics? Maybe with a short description when why
>>> and when records are deleted from the repartition topics. For us it
>>> might be clear, but IMO we should try to write KIPs so that someone that
>>> is relatively new to Kafka Streams can understand the KIP without
>>> needing to know too much background.
>>>
>>> 6. Does the config need to be validated? For example, does
>>> delete.interval.ms need to be greater than or equal to
>> commit.interval.ms?
>>>
>>> 7. Should the default value for non-EOS be 30s or the same value as
>>> commit.interval.ms? I am just thinking about the case where a user
>>> explicitly changes commit.interval.ms but not delete.interval.ms (or
>>> whatever name you come up for it). Once delete.interval.ms is set
>>> explicitly it is decoupled from commit.interval.ms. Similar could be
>>> done for the EOS case.
>>> Alternatively, we could also define delete.interval.ms to take a
>>> integral number without a unit that specifies after how many commit
>>> intervals the records in repartition topics should be deleted. This
>>> would make sense since delete.interval.ms is tightly bound to
>>> commit.interval.ms. Additionally, it would make the semantics of the
>>> config simpler. The name of the config should definitely change if we go
>>> down this way.
>>>
>>> Best,
>>> Bruno
>>>
>>>
>>>
>>> On 21.12.21 11:14, Luke Chen wrote:
>>>> Hi Nick,
>>>>
>>>> Thanks for the KIP!
>>>>
>>>> In addition to Sophie's comments, I have one more to this KIP:
>>>> 3. I think you should mention the behavior change *explicitly* in
>>>> "Compatibility" section. I know you already mentioned it in KIP, in the
>>>> benefit way. But I think in this section, we should clearly point out
>>> which
>>>> behavior will be change after this KIP. That is, you should put it
>> clear
>>>> that the delete record interval will change from 100ms to 30s with EOS
>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
>>>> 4. Since this new config has some relationship with commit.interval.ms
>> ,
>>> I
>>>> think we should also update the doc description for `
>> commit.interval.ms
>>> `,
>>>> to let user know there's another config to control delete interval and
>>>> should be greater than commit.interval.ms. Something like that. WDYT?
>>> (You
>>>> should put this change in the KIP as Sophie mentioned)
>>>>
>>>> Thank you.
>>>> Luke
>>>>
>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
>>>> <so...@confluent.io.invalid> wrote:
>>>>
>>>>> Hey Nick,
>>>>>
>>>>> I think you forgot to link to the KIP document, but I take it this is
>>>>> it: KIP-811:
>>>>> Add separate delete.interval.ms to Kafka Streams
>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>>>>
>>>>> The overall proposal sounds good to me, just a few minor things:
>>>>>
>>>>>      1. Please specify everything needed to define this config
>>> explicitly, ie
>>>>>      all the arguments that will be passed in to the
>>>>>      StreamsConfig's ConfigDef: in addition to the default value, we
>>> need the
>>>>>      config type (presumably a Long), the doc
>>>>>      string, and the importance (probably "low", similar to
>>>>> commit.interval.ms
>>>>>      )
>>>>>      2. Might be worth considering a slightly more descriptive name for
>>> this
>>>>>      config. Most users probably don't think about,
>>>>>      or may not even be aware of, the deletion of consumed records by
>>> Kafka
>>>>>      Streams, so calling it something along
>>>>>      the lines of "repartition.records.delete.interval.ms" or "
>>>>>      consumed.records.deletion.interval.ms" or so on will help
>>>>>      make it clear what the config refers to and whether or not they
>>> need to
>>>>>      care. Maybe you can come up with better
>>>>>      and/or shorter names, just wanted to suggest some example names
>>> that I
>>>>>      think sufficiently get the point across
>>>>>
>>>>> Other than that I'm +1 -- thanks for the KIP!
>>>>>
>>>>> Sophie
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This is a KIP for a proposed solution to KAFKA-13549
>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is
>>>>> very
>>>>>> simple, so the KIP is pretty short.
>>>>>>
>>>>>> The suggested changes are implemented by this PR
>>>>>> <https://github.com/apache/kafka/pull/11610>.
>>>>>> --
>>>>>> Nick Telford
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Nick Telford <ni...@gmail.com>.
Hi everyone,

Thanks for your feedback. I've made the suggested changes to the KIP (1, 2,
3 and 5).

For the new name, I've chosen repartition.purge.interval.ms, as I felt it
struck a good balance between being self-descriptive and concise. Please
let me know if you'd prefer something else.

On point 6: My PR has basic validation to ensure the value is positive, but
I don't think it's necessary to have dynamic validation, to ensure it's not
less than commit.interval.ms. The reason is that it will be implicitly
limited to that value anyway, and won't break anything. But I can add it if
you'd prefer it.

On point 7: I worry that defaulting it to follow the value of
commit.interval.ms may confuse users, who will likely expect the default to
not be affected by changes to other configuration options. I can see the
appeal of retaining the existing behaviour (following the commit interval)
by default, but I believe that the majority of users who customize
commit.interval.ms do not desire a different frequency of repartition
record purging as well.

As for multiples of commit interval: I think the argument against that is
that an interval is more intuitive when given as a time, rather than as a
multiple of some other operation. Users configuring this should not need to
break out a calculator to work out how frequently the records are going to
be purged!

I've also updated the PR with the relevant changes.

BTW, for some reason I didn't receive Sophie's email. I'll periodically
check the thread on the archive to ensure I don't miss any more of your
messages!

Regards,

Nick

On Tue, 21 Dec 2021 at 12:34, Luke Chen <sh...@gmail.com> wrote:

> Thanks, Bruno.
>
> I agree my point 4 is more like PR discussion, not KIP discussion.
> @Nick , please update my point 4 in PR directly.
>
> Thank you.
> Luke
>
>
>
>
> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org> wrote:
>
> > Hi Nick,
> >
> > Thank you for the KIP!
> >
> > I agree on points 1, 2, and 3. I am not sure about point 4. I agree that
> > we should update the docs for commit.interval.ms but I am not sure if
> > this needs to mentioned in the KIP. That seems to me more a PR
> > discussion. Also on point 2, I agree that we need to add a doc string
> > but the content should be exemplary not binding. What I want to say is
> > that, we do not need a KIP to change docs.
> >
> > Here my points:
> >
> > 5. Could you specify in the motivation that the KIP is about deleting
> > records from repartition topics? Maybe with a short description when why
> > and when records are deleted from the repartition topics. For us it
> > might be clear, but IMO we should try to write KIPs so that someone that
> > is relatively new to Kafka Streams can understand the KIP without
> > needing to know too much background.
> >
> > 6. Does the config need to be validated? For example, does
> > delete.interval.ms need to be greater than or equal to
> commit.interval.ms?
> >
> > 7. Should the default value for non-EOS be 30s or the same value as
> > commit.interval.ms? I am just thinking about the case where a user
> > explicitly changes commit.interval.ms but not delete.interval.ms (or
> > whatever name you come up for it). Once delete.interval.ms is set
> > explicitly it is decoupled from commit.interval.ms. Similar could be
> > done for the EOS case.
> > Alternatively, we could also define delete.interval.ms to take a
> > integral number without a unit that specifies after how many commit
> > intervals the records in repartition topics should be deleted. This
> > would make sense since delete.interval.ms is tightly bound to
> > commit.interval.ms. Additionally, it would make the semantics of the
> > config simpler. The name of the config should definitely change if we go
> > down this way.
> >
> > Best,
> > Bruno
> >
> >
> >
> > On 21.12.21 11:14, Luke Chen wrote:
> > > Hi Nick,
> > >
> > > Thanks for the KIP!
> > >
> > > In addition to Sophie's comments, I have one more to this KIP:
> > > 3. I think you should mention the behavior change *explicitly* in
> > > "Compatibility" section. I know you already mentioned it in KIP, in the
> > > benefit way. But I think in this section, we should clearly point out
> > which
> > > behavior will be change after this KIP. That is, you should put it
> clear
> > > that the delete record interval will change from 100ms to 30s with EOS
> > > enabled. And it should also be mentioned in doc/upgrade.html doc.
> > > 4. Since this new config has some relationship with commit.interval.ms
> ,
> > I
> > > think we should also update the doc description for `
> commit.interval.ms
> > `,
> > > to let user know there's another config to control delete interval and
> > > should be greater than commit.interval.ms. Something like that. WDYT?
> > (You
> > > should put this change in the KIP as Sophie mentioned)
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> > > <so...@confluent.io.invalid> wrote:
> > >
> > >> Hey Nick,
> > >>
> > >> I think you forgot to link to the KIP document, but I take it this is
> > >> it: KIP-811:
> > >> Add separate delete.interval.ms to Kafka Streams
> > >> <https://cwiki.apache.org/confluence/x/JY-kCw>
> > >>
> > >> The overall proposal sounds good to me, just a few minor things:
> > >>
> > >>     1. Please specify everything needed to define this config
> > explicitly, ie
> > >>     all the arguments that will be passed in to the
> > >>     StreamsConfig's ConfigDef: in addition to the default value, we
> > need the
> > >>     config type (presumably a Long), the doc
> > >>     string, and the importance (probably "low", similar to
> > >> commit.interval.ms
> > >>     )
> > >>     2. Might be worth considering a slightly more descriptive name for
> > this
> > >>     config. Most users probably don't think about,
> > >>     or may not even be aware of, the deletion of consumed records by
> > Kafka
> > >>     Streams, so calling it something along
> > >>     the lines of "repartition.records.delete.interval.ms" or "
> > >>     consumed.records.deletion.interval.ms" or so on will help
> > >>     make it clear what the config refers to and whether or not they
> > need to
> > >>     care. Maybe you can come up with better
> > >>     and/or shorter names, just wanted to suggest some example names
> > that I
> > >>     think sufficiently get the point across
> > >>
> > >> Other than that I'm +1 -- thanks for the KIP!
> > >>
> > >> Sophie
> > >>
> > >>
> > >>
> > >> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com>
> > >> wrote:
> > >>
> > >>> This is a KIP for a proposed solution to KAFKA-13549
> > >>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is
> > >> very
> > >>> simple, so the KIP is pretty short.
> > >>>
> > >>> The suggested changes are implemented by this PR
> > >>> <https://github.com/apache/kafka/pull/11610>.
> > >>> --
> > >>> Nick Telford
> > >>>
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Luke Chen <sh...@gmail.com>.
Thanks, Bruno.

I agree my point 4 is more like PR discussion, not KIP discussion.
@Nick , please update my point 4 in PR directly.

Thank you.
Luke




On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <ca...@apache.org> wrote:

> Hi Nick,
>
> Thank you for the KIP!
>
> I agree on points 1, 2, and 3. I am not sure about point 4. I agree that
> we should update the docs for commit.interval.ms but I am not sure if
> this needs to mentioned in the KIP. That seems to me more a PR
> discussion. Also on point 2, I agree that we need to add a doc string
> but the content should be exemplary not binding. What I want to say is
> that, we do not need a KIP to change docs.
>
> Here my points:
>
> 5. Could you specify in the motivation that the KIP is about deleting
> records from repartition topics? Maybe with a short description when why
> and when records are deleted from the repartition topics. For us it
> might be clear, but IMO we should try to write KIPs so that someone that
> is relatively new to Kafka Streams can understand the KIP without
> needing to know too much background.
>
> 6. Does the config need to be validated? For example, does
> delete.interval.ms need to be greater than or equal to commit.interval.ms?
>
> 7. Should the default value for non-EOS be 30s or the same value as
> commit.interval.ms? I am just thinking about the case where a user
> explicitly changes commit.interval.ms but not delete.interval.ms (or
> whatever name you come up for it). Once delete.interval.ms is set
> explicitly it is decoupled from commit.interval.ms. Similar could be
> done for the EOS case.
> Alternatively, we could also define delete.interval.ms to take a
> integral number without a unit that specifies after how many commit
> intervals the records in repartition topics should be deleted. This
> would make sense since delete.interval.ms is tightly bound to
> commit.interval.ms. Additionally, it would make the semantics of the
> config simpler. The name of the config should definitely change if we go
> down this way.
>
> Best,
> Bruno
>
>
>
> On 21.12.21 11:14, Luke Chen wrote:
> > Hi Nick,
> >
> > Thanks for the KIP!
> >
> > In addition to Sophie's comments, I have one more to this KIP:
> > 3. I think you should mention the behavior change *explicitly* in
> > "Compatibility" section. I know you already mentioned it in KIP, in the
> > benefit way. But I think in this section, we should clearly point out
> which
> > behavior will be change after this KIP. That is, you should put it clear
> > that the delete record interval will change from 100ms to 30s with EOS
> > enabled. And it should also be mentioned in doc/upgrade.html doc.
> > 4. Since this new config has some relationship with commit.interval.ms,
> I
> > think we should also update the doc description for `commit.interval.ms
> `,
> > to let user know there's another config to control delete interval and
> > should be greater than commit.interval.ms. Something like that. WDYT?
> (You
> > should put this change in the KIP as Sophie mentioned)
> >
> > Thank you.
> > Luke
> >
> > On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> > <so...@confluent.io.invalid> wrote:
> >
> >> Hey Nick,
> >>
> >> I think you forgot to link to the KIP document, but I take it this is
> >> it: KIP-811:
> >> Add separate delete.interval.ms to Kafka Streams
> >> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>
> >> The overall proposal sounds good to me, just a few minor things:
> >>
> >>     1. Please specify everything needed to define this config
> explicitly, ie
> >>     all the arguments that will be passed in to the
> >>     StreamsConfig's ConfigDef: in addition to the default value, we
> need the
> >>     config type (presumably a Long), the doc
> >>     string, and the importance (probably "low", similar to
> >> commit.interval.ms
> >>     )
> >>     2. Might be worth considering a slightly more descriptive name for
> this
> >>     config. Most users probably don't think about,
> >>     or may not even be aware of, the deletion of consumed records by
> Kafka
> >>     Streams, so calling it something along
> >>     the lines of "repartition.records.delete.interval.ms" or "
> >>     consumed.records.deletion.interval.ms" or so on will help
> >>     make it clear what the config refers to and whether or not they
> need to
> >>     care. Maybe you can come up with better
> >>     and/or shorter names, just wanted to suggest some example names
> that I
> >>     think sufficiently get the point across
> >>
> >> Other than that I'm +1 -- thanks for the KIP!
> >>
> >> Sophie
> >>
> >>
> >>
> >> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com>
> >> wrote:
> >>
> >>> This is a KIP for a proposed solution to KAFKA-13549
> >>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is
> >> very
> >>> simple, so the KIP is pretty short.
> >>>
> >>> The suggested changes are implemented by this PR
> >>> <https://github.com/apache/kafka/pull/11610>.
> >>> --
> >>> Nick Telford
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Nick,

Thank you for the KIP!

I agree on points 1, 2, and 3. I am not sure about point 4. I agree that 
we should update the docs for commit.interval.ms but I am not sure if 
this needs to mentioned in the KIP. That seems to me more a PR 
discussion. Also on point 2, I agree that we need to add a doc string 
but the content should be exemplary not binding. What I want to say is 
that, we do not need a KIP to change docs.

Here my points:

5. Could you specify in the motivation that the KIP is about deleting 
records from repartition topics? Maybe with a short description when why 
and when records are deleted from the repartition topics. For us it 
might be clear, but IMO we should try to write KIPs so that someone that 
is relatively new to Kafka Streams can understand the KIP without 
needing to know too much background.

6. Does the config need to be validated? For example, does 
delete.interval.ms need to be greater than or equal to commit.interval.ms?

7. Should the default value for non-EOS be 30s or the same value as 
commit.interval.ms? I am just thinking about the case where a user 
explicitly changes commit.interval.ms but not delete.interval.ms (or 
whatever name you come up for it). Once delete.interval.ms is set 
explicitly it is decoupled from commit.interval.ms. Similar could be 
done for the EOS case.
Alternatively, we could also define delete.interval.ms to take a 
integral number without a unit that specifies after how many commit 
intervals the records in repartition topics should be deleted. This 
would make sense since delete.interval.ms is tightly bound to 
commit.interval.ms. Additionally, it would make the semantics of the 
config simpler. The name of the config should definitely change if we go 
down this way.

Best,
Bruno



On 21.12.21 11:14, Luke Chen wrote:
> Hi Nick,
> 
> Thanks for the KIP!
> 
> In addition to Sophie's comments, I have one more to this KIP:
> 3. I think you should mention the behavior change *explicitly* in
> "Compatibility" section. I know you already mentioned it in KIP, in the
> benefit way. But I think in this section, we should clearly point out which
> behavior will be change after this KIP. That is, you should put it clear
> that the delete record interval will change from 100ms to 30s with EOS
> enabled. And it should also be mentioned in doc/upgrade.html doc.
> 4. Since this new config has some relationship with commit.interval.ms, I
> think we should also update the doc description for `commit.interval.ms`,
> to let user know there's another config to control delete interval and
> should be greater than commit.interval.ms. Something like that. WDYT? (You
> should put this change in the KIP as Sophie mentioned)
> 
> Thank you.
> Luke
> 
> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> <so...@confluent.io.invalid> wrote:
> 
>> Hey Nick,
>>
>> I think you forgot to link to the KIP document, but I take it this is
>> it: KIP-811:
>> Add separate delete.interval.ms to Kafka Streams
>> <https://cwiki.apache.org/confluence/x/JY-kCw>
>>
>> The overall proposal sounds good to me, just a few minor things:
>>
>>     1. Please specify everything needed to define this config explicitly, ie
>>     all the arguments that will be passed in to the
>>     StreamsConfig's ConfigDef: in addition to the default value, we need the
>>     config type (presumably a Long), the doc
>>     string, and the importance (probably "low", similar to
>> commit.interval.ms
>>     )
>>     2. Might be worth considering a slightly more descriptive name for this
>>     config. Most users probably don't think about,
>>     or may not even be aware of, the deletion of consumed records by Kafka
>>     Streams, so calling it something along
>>     the lines of "repartition.records.delete.interval.ms" or "
>>     consumed.records.deletion.interval.ms" or so on will help
>>     make it clear what the config refers to and whether or not they need to
>>     care. Maybe you can come up with better
>>     and/or shorter names, just wanted to suggest some example names that I
>>     think sufficiently get the point across
>>
>> Other than that I'm +1 -- thanks for the KIP!
>>
>> Sophie
>>
>>
>>
>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com>
>> wrote:
>>
>>> This is a KIP for a proposed solution to KAFKA-13549
>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is
>> very
>>> simple, so the KIP is pretty short.
>>>
>>> The suggested changes are implemented by this PR
>>> <https://github.com/apache/kafka/pull/11610>.
>>> --
>>> Nick Telford
>>>
>>
> 

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Luke Chen <sh...@gmail.com>.
Hi Nick,

Thanks for the KIP!

In addition to Sophie's comments, I have one more to this KIP:
3. I think you should mention the behavior change *explicitly* in
"Compatibility" section. I know you already mentioned it in KIP, in the
benefit way. But I think in this section, we should clearly point out which
behavior will be change after this KIP. That is, you should put it clear
that the delete record interval will change from 100ms to 30s with EOS
enabled. And it should also be mentioned in doc/upgrade.html doc.
4. Since this new config has some relationship with commit.interval.ms, I
think we should also update the doc description for `commit.interval.ms`,
to let user know there's another config to control delete interval and
should be greater than commit.interval.ms. Something like that. WDYT? (You
should put this change in the KIP as Sophie mentioned)

Thank you.
Luke

On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
<so...@confluent.io.invalid> wrote:

> Hey Nick,
>
> I think you forgot to link to the KIP document, but I take it this is
> it: KIP-811:
> Add separate delete.interval.ms to Kafka Streams
> <https://cwiki.apache.org/confluence/x/JY-kCw>
>
> The overall proposal sounds good to me, just a few minor things:
>
>    1. Please specify everything needed to define this config explicitly, ie
>    all the arguments that will be passed in to the
>    StreamsConfig's ConfigDef: in addition to the default value, we need the
>    config type (presumably a Long), the doc
>    string, and the importance (probably "low", similar to
> commit.interval.ms
>    )
>    2. Might be worth considering a slightly more descriptive name for this
>    config. Most users probably don't think about,
>    or may not even be aware of, the deletion of consumed records by Kafka
>    Streams, so calling it something along
>    the lines of "repartition.records.delete.interval.ms" or "
>    consumed.records.deletion.interval.ms" or so on will help
>    make it clear what the config refers to and whether or not they need to
>    care. Maybe you can come up with better
>    and/or shorter names, just wanted to suggest some example names that I
>    think sufficiently get the point across
>
> Other than that I'm +1 -- thanks for the KIP!
>
> Sophie
>
>
>
> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com>
> wrote:
>
> > This is a KIP for a proposed solution to KAFKA-13549
> > <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is
> very
> > simple, so the KIP is pretty short.
> >
> > The suggested changes are implemented by this PR
> > <https://github.com/apache/kafka/pull/11610>.
> > --
> > Nick Telford
> >
>

Re: [DISCUSS] KIP-811 Add separate delete.interval.ms to Kafka Streams

Posted by Sophie Blee-Goldman <so...@confluent.io.INVALID>.
Hey Nick,

I think you forgot to link to the KIP document, but I take it this is
it: KIP-811:
Add separate delete.interval.ms to Kafka Streams
<https://cwiki.apache.org/confluence/x/JY-kCw>

The overall proposal sounds good to me, just a few minor things:

   1. Please specify everything needed to define this config explicitly, ie
   all the arguments that will be passed in to the
   StreamsConfig's ConfigDef: in addition to the default value, we need the
   config type (presumably a Long), the doc
   string, and the importance (probably "low", similar to commit.interval.ms
   )
   2. Might be worth considering a slightly more descriptive name for this
   config. Most users probably don't think about,
   or may not even be aware of, the deletion of consumed records by Kafka
   Streams, so calling it something along
   the lines of "repartition.records.delete.interval.ms" or "
   consumed.records.deletion.interval.ms" or so on will help
   make it clear what the config refers to and whether or not they need to
   care. Maybe you can come up with better
   and/or shorter names, just wanted to suggest some example names that I
   think sufficiently get the point across

Other than that I'm +1 -- thanks for the KIP!

Sophie



On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <ni...@gmail.com> wrote:

> This is a KIP for a proposed solution to KAFKA-13549
> <https://issues.apache.org/jira/browse/KAFKA-13549>. The solution is very
> simple, so the KIP is pretty short.
>
> The suggested changes are implemented by this PR
> <https://github.com/apache/kafka/pull/11610>.
> --
> Nick Telford
>